AWS Database Blog
Volatility classification in PostgreSQL
PostgreSQL database systems often handle complex queries involving user-defined functions that process data and return results. These functions can range from simple calculations to complex data manipulations, each with different behavioral patterns in terms of their output consistency and side effects.
The unpredictability of these functions poses challenges for query optimization and caching. To tackle this, PostgreSQL employs volatility classification, categorizing functions based on their behavior and return value consistency across executions. By labeling functions as IMMUTABLE
, STABLE
, or VOLATILE
, you help the query planner optimize performance by determining when results can be cached and reused or recalculated.
This classification system serves as a vital tool for database optimization, allowing PostgreSQL to balance between performance optimization and result accuracy. Understanding and implementing volatility classification correctly is essential for PostgreSQL developers, because improper use can lead to significant side effects impacting overall database performance.
By properly classifying function volatility, you can achieve several benefits, including:
- Informed decisions about result caching by the query planner
- Improved query performance through optimized execution plans
- Reduced unnecessary function recalculations
- Consistent and predictable query results
- Better resource utilization
To achieve optimal performance, label your functions with the strictest applicable volatility category (IMMUTABLE
, STABLE
, or VOLATILE
). In this post, we discuss different ways you can use volatility classification with functions in PostgreSQL and provide best practices to help you keep your database optimized and develop efficient and reliable database applications.
Understanding volatility classification
PostgreSQL implements a volatility classification system that categorizes functions into three distinct levels, as detailed in the following table.
Volatility Classification | Use Case | Examples |
IMMUTABLE |
Functions that always return the same output for the same input arguments (such as mathematical calculations) | |
STABLE |
Functions that return consistent results within a single table scan but might vary across transactions | |
VOLATILE |
Functions that can return different results even with the same inputs |
Let’s understand each classification in detail through some practical examples.
Immutable
An IMMUTABLE
function must not contain any statements that modify the database state and always returns the same result when given the same arguments. The following code snippet is an example of an IMMUTABLE
function because its output depends solely on input parameters, with no reliance on database state or external factors, providing consistent results across executions:
IMMUTABLE
functions must not contain any DML, DDL, or TCL commands. If you declare a function or procedure as IMMUTABLE
, which modifies the database state, then the function or procedure will be created, but the function reports an error when invoked. For example, see the following code:
We get the following output:
If you run the function, you get the following error:
Stable
Similar to IMMUTABLE
functions, a STABLE
function must also not contain any statements that modify the database state. The difference is that STABLE
functions return consistent results within a single table scan but might vary across transactions. STABLE
functions must not contain any DML, DDL, or TCL, commands, but they can contain SELECT
statements that query database tables. Because the database table state can change between transactions, STABLE
function results might vary for different transactions.
Let’s explore an example to understand STABLE
function behavior:
Let’s test the code and check the function output:
Now let’s understand why the function is defined as STABLE
and not IMMUTABLE
. IMMUTABLE
would be incorrect for the following reasons:
- Data dependency – The function depends on table data that can change
SELECT role FROM users WHERE id = user_id
. Because of this, the same input doesn’t guarantee you will get the same output across transactions. Table data can be modified by other transactions. Even with the same input user_id, the function could return different results if the underlying role data changes. This violates the fundamental principle ofIMMUTABLE
functions, which must return the same output for the same input every time. - Transaction context – Within the same transaction, results are reused as long as there is no database state change. This is appropriate because we want to see consistent data within our transaction. However, using
IMMUTABLE
would be incorrect because it would cache the result permanently.IMMUTABLE
caching could lead to stale data being returned even after the role has been updated in the table.
See the following code:
Volatile
A VOLATILE
function can do anything, including modifying the database. It can return different results on successive calls with the same input arguments. The optimizer makes no assumptions about the behavior of such functions. When a function is labeled as VOLATILE
, the optimizer treats its results as unpredictable and potentially changing with each call. To provide accuracy, the optimizer will rerun the function every time it’s encountered in a query, rather than relying on cached or precomputed results. See the following code:
In this example, fnc_insert_emp_data
is VOLATILE
because the function alters the database state if the employee doesn’t exist and the optimizer will run this function every time it’s called.
Function volatility in transactions
In PostgreSQL, you can start a transaction using BEGIN
. Anything within the transaction follows the ACID property (Atomicity, Consistency, Isolation, Durability). Although functions and procedures have different volatility categories, their behavior might change with respect to the transaction. Let’s dive deeper into each function volatility type with examples.
IMMUTABLE
functions always return the same result for the same input, regardless of whether they’re in a transaction or not:
STABLE
functions, like the function now()
, maintain consistency within a transaction but might change between transactions. Let’s explore the behavior of STABLE
functions within transactions in PostgreSQL:
By defining now()
as STABLE
, PostgreSQL acknowledges that its behavior can change depending on the context, even if the transaction or input remains the same. This distinction is important for optimization and caching purposes, because IMMUTABLE
functions can be safely cached and reused, whereas STABLE
and VOLATILE
functions require more careful consideration.
Unlike IMMUTABLE
and STABLE
functions, VOLATILE
functions are run every time, because they can return different results on each call. Understanding these distinctions is important for writing efficient and predictable database operations. For example, the built-in functions random()
and clock_timestamp()
are inherently non-deterministic, meaning their output will always vary, even when called within a transaction:
Sometimes, you might declare functions as IMMUTABLE
or STABLE
without realizing that some expressions or called functions are neither IMMUTABLE
nor STABLE
. For example, in the following code block, the get_timestamp()
function is labeled as IMMUTABLE
, indicating that its result depends only on the input parameters (none in this case):
On first look, the function code looks fine, but it actually returns the current timestamp, which depends on the TimeZone
setting, a configuration parameter. The problem with this example is that it calls the built-in now()
function, which is defined as STABLE
in the PostgreSQL catalog. As a general rule, IMMUTABLE
PostgreSQL functions must not invoke non-IMMUTABLE
functions, because non-IMMUTABLE
functions like STABLE
or VOLATILE
can change their values within transactions, which contradicts the IMMUTABLE
definition.
In this example, if the TimeZone
setting changes, the result of the get_timestamp function will also change. This violates the contract of an IMMUTABLE
function, which promises consistent results for identical input parameters. See the following code:
By labeling the function as STABLE
instead of IMMUTABLE
, we acknowledge that its result depends on TimeZone
settings. This way, PostgreSQL will recalculate the function result if the TimeZone
setting changes, providing correctness and avoiding unexpected behavior.
Remember, when in doubt about whether a function should be IMMUTABLE
, it’s safer to use STABLE
or VOLATILE
instead. Incorrectly marking a function as IMMUTABLE
can lead to incorrect query results that are hard to debug.
Key considerations for determining function volatility
This section provides a comprehensive explanation of when to use IMMUTABLE
, STABLE
, and VOLATILE
functions in PostgreSQL.
Define a function as IMMUTABLE
for the following cases:
- String manipulations (like
upper()
,lower()
, orsubstr()
) - Mathematical calculations
- Pure computational operations
- Functions that only work with their input parameters
- Date/time format conversions with constant timestamp values (not current time)
- The function only invokes other IMMUTABLE functions
Define STABLE
for the following cases:
- When the function results depend on database lookups or information that remains constant throughout a transaction
- Functions that read from the database but don’t modify data, typically
SELECT
statements - Results should be consistent within the same transaction when called with the same arguments
- Invoked functions are either
IMMUATBLE
orSTABLE
functions inside the function body
Define VOLATILE
for the following cases:
- When results can change within the same transaction
- Use for functions that modify database state or depend on external factors
- Results might vary even with the same inputs and within the same transaction, for example
random()
- This is the default volatility category if none is specified
IMMUTABLE
functions can be optimized most aggressively by PostgreSQL, whereas VOLATILE
functions need to be reevaluated every time. STABLE
functions provide a middle ground, allowing for some optimization while still depending on database state. The following section provides more clarity on function volatility optimization.
Use the strictest volatility category to optimize functions
This section highlights the importance of understanding the performance differences between IMMUTABLE
, STABLE
, and VOLATILE
functions in PostgreSQL, and how you can use this knowledge to optimize database performance and improve query efficiency. For best optimization results, you should label your functions with the strictest volatility category that is valid for them. This means you should use IMMUTABLE
or STABLE
wherever appropriate, rather than defaulting to VOLATILE
. This is one of the optimization techniques that can boost functions performance in PostgreSQL.
Let’s examine this through a practical example:
Because this use case doesn’t depend on the database state or database configuration parameters, you could create the function as IMMUTABLE
:
You can verify the performance difference by running both functions many times in a row and comparing their runtimes:
The output shows that the IMMUTABLE
function took 3 milliseconds, whereas the VOLATILE
function took 22 milliseconds—an 86% improvement just by correctly classifying the volatility category.
This example demonstrates how IMMUTABLE
functions can significantly improve performance.
During the planning phase, PostgreSQL recognizes the function is IMMUTABLE
. It evaluates calculation_immutable(100)
one time and caches the result value. During execution, PostgreSQL uses the cached value for all 1,000 rows. There are no repeated calculations; it returns the same pre-calculated value 1,000 times.
This distinction is relevant for large-scale applications and complex queries, where even slight improvements in runtime can have a significant impact on overall performance. Always analyze before you decide the volatility category of a function or procedure. Using the incorrect volatility category of a function in PostgreSQL can lead to stale data, meaning the system might cache and reuse outdated results instead of recalculating them when needed. This can cause inconsistencies, data accuracy issues, and potential business decisions based on outdated information. Let’s dive deeper into this in the next section.
Incorrect function volatility category side effects
In this section, we discuss how volatility classification might have unintended consequences or side effects when incorrectly used, and why it’s important to be aware of these potential risks.
PostgreSQL doesn’t issue warnings when an IMMUTABLE
function executes a SELECT
statement on a table. However, because IMMUTABLE
functions assume unchanged data, modifying the table can lead to inconsistent results due to cached query plans. If a function is mistakenly marked as IMMUTABLE
instead of STABLE
, it might not reflect real-time data changes, leading to unreliable outputs in queries and reports. See the following code:
Let’s create prepared statements and see how incorrect function labeling can produce unreliable outputs:
Because employee_id = 200
doesn’t exist in Employee
, the result for both preceding statements must be False. Let’s test and verify the output:
Now let’s change the database table state by inserting one row for employee_id=200
and see how the IMMUATBLE
and STABLE
functions behave:
This example shows why functions that depend on table data should be marked as STABLE
rather than IMMUTABLE
, because it affects both data accuracy and query performance. Therefore, using the correct volatility category for functions is critical for maintaining data accuracy and integrity, especially when using prepared statements in PostgreSQL.
PostgreSQL stores function metadata, including volatility categories, in system catalog tables, making it possible to query and verify a function’s volatility. You can check if a function is IMMUTABLE
, STABLE
, or VOLATILE
. Let’s dive deeper into querying this information in next section.
Check the volatility classification of a function
You can use the system catalog pg_proc
to check the volatility classification of a function in the column provolatile. The values are as follows:
- i indicates IMMUTABLE functions
- s indicates STABLE functions
- v indicates VOLATILE functions
See the following code:
Alternatively, you can use the function pg_get_functiondef()
to check the volatility of an individual function:
You can also use the psql terminal to check function volatility classifications and other details:
Best practices
The following are some best practices for using VOLATILE
, IMMUTABLE
, and STABLE
functions in PostgreSQL:
- Choose the correct volatility category based on the function’s behavior and dependencies.
- A function must be labeled
VOLATILE
if its value can change within a single query or has side effects. Some built-in functions examples arerandom()
,currval()
, andtimeofday()
. - A function must be labeled
IMMUTABLE
if it depends only on its input parameters and doesn’t modify the external state or rely on volatile data. This provides predictability and consistency, making the function suitable for optimization and caching. - A function must be labeled
STABLE
when it depends on configuration parameters or the query contains onlySELECT
statements (no DDL, DML, or TCL commands). - If the function is declared
IMMUTABLE
, then the expression must not invoke anynon-IMMUTABLE
functions or operators. - If the function is declared
STABLE
, then the expression must not invoke anyVOLATILE
functions or operators. - When in doubt about whether a function should be
IMMUTABLE
, it’s safer to useSTABLE
orVOLATILE
instead.
Conclusion
In this post, we discussed how properly categorizing functions as VOLATILE
, IMMUTABLE
, or STABLE
allows PostgreSQL’s optimizer to make informed decisions, leading to more efficient query plans. VOLATILE
functions offer freshness, STABLE
functions balance predictability with flexibility, and IMMUTABLE
functions enable the highest degree of optimization through result caching and precomputation. Understanding and using these categories effectively can greatly enhance the performance and scalability of PostgreSQL-based applications.
If you have any questions or feedback about these functions and how to implement them, leave a comment.