Understanding Database isolation level via examples — MySQL and Postgres

Amir So

--

If you are familiar with Transactional Databases, you probably worked with the ACID properties. In the context of transaction processing, the acronym ACID refers to the four key properties of a transaction: atomicity, consistency, isolation, and durability. Here we want to talk about Isolation Levels.

What is Isolation Level?

Transactions specify an isolation level that defines the degree to which one transaction must be isolated from resource or data modifications made by other transactions. Isolation levels are described in terms of which concurrency side-effects, such as dirty reads or phantom reads, are allowed[1].

When working with database transactions, we must ask ourselves; which isolation level is more relevant for our use case. If we don’t choose the right one for our queries, it might cause a problem in the future, which even can have a huge effect on business. So today, we will explore deeper and more in detail to know how each level of isolation work in PostgreSQL and MySQL.

Keep in mind; Although isolation levels are well-defined standards, each database engine might implement them differently, which we will explain in this article.

Requirements

To replicate race conditions, we need to have MySQL and Postgres. I suggest using Docker to create these instances:

docker run -d --name mysql -p 3306:3306 -e ALLOW_EMPTY_PASSWORD=yes bitnami/mysql:latest && \
docker run -d --name postgresql -p 5432:5432 -e ALLOW_EMPTY_PASSWORD=yes bitnami/postgresql:latest

Now, We need to prepare our requirements (Tables, Data). To show the phenomena — We will use the weakest isolation level for this purpose — (queries below— View Sources):

MySQL — (PostgreSQL is available https://bit.ly/3mehCtC)

Phenomena

Issues that can occur when many people read and write to the same rows. These are:

Dirty Read: One transaction reads another transaction’s uncommitted writes.

Dirty Read
Figure 1 — Dirty Read (Bold line means transaction flow)

In Figure 1, user 2 experiences an anomaly. The messages returns an unopened item, but the notifications shows zero unopened because the counter increment has not yet happened by user 1. Isolation would have prevented this issue by ensuring that user 2 sees either the inserted item and the updated unopened counter or neither. (E.g MySQL / Postgres)

User1/Trx1 (Left) Not committed, but User2/Trx2 (Right) can see the new message.

Non-Repeatable Read: A transaction reads the same row twice and sees a different value. This happens when someone else UPDATEs the row between queries.

Figure 2 — Non-Repeatable Read (Bold line means transaction flow)

In Figure 2, after the first query by user 2, user 1 changes the value of the row. So the second query sees the new value and/or result. (E.g MySQL / Postgres)

User2/Trx1 (Left) — User1/Trx2 (Right)

Phantom Read: A transaction reads objects that match some search condition. Another user makes committed INSERTS and/or DELETES that affect the results of that search. This happens when another session inserts or deletes rows that match the WHERE clause of your query. So repeated queries can return different rows. (E.g MySQL / Postgres)

Figure 3 — Phantom Read (Bold line means transaction flow)
User2/Trx1 (Left) — User1/Trx2 (Right)

The SQL-92 standard introduced these above phenomena, But in reality, there are more…

Lost Updates: If two transactions want to change the same columns, the second transaction will overwrite the first one, therefore losing the first transaction update — read-modify-write cycle. (E.g MySQL / Postgres)

Lost Updates
Figure 4 — Lost Updates (Bold line means transaction flow)
User1/Trx1 (Left) — User2/Trx2 (Right)

Read Skew: A user sees different parts of the database at different points in time. Some cases of read skew are also known asNon-Repeatable Read”.

Figure 5 — Read Skew (Bold line means transaction flow)

Alice has $1000 of savings in her wallet, split across two accounts with $500 each. Now a transaction transfers $100 from Account 226 to Account 200. If she is unlucky enough to look at her list of account balances at the same moment as that transaction is being processed, she may see one account balance at a time before the incoming payment has arrived (balance $500), and the other account after the outgoing transfer has been made (the new balance being $400). (E.g MySQL / Postgres)

Transfer/Trx1 (Left) — Alice/Trx2 (Right)

Write Skew: A transaction reads something, makes a decision based on the value it saw, and writes the decision to the database. However, by the time the write is made, the premise of the decision is no longer true[3]. (E.g MySQL / Postgres)

Write Skew
Figure 6 — Write Skew (Bold line means transaction flow)
User2/Trx1 (Left) — User1/Trx2 (Right)

So now the question is, how can we avoid or detect these phenomena? Thanks to isolation implementation, we do not need to manage all these conditions at the application level manually. Keep reading; We are going to introduce the solutions.

Isolation Levels

Several clients can access a database at the same time. It won’t be problematic if they read and write different parts of the database, but you can run into race conditions if they work on the same records.

Read Uncommitted: The lowest level of isolation and actually does almost nothing. In this level, one transaction may read not yet committed changes made by other transactions, thereby allowing dirty reads.

Read Committed: Specifies that statements cannot read data that has been modified but not committed by other transactions. This prevents dirty reads. Data can be changed by other transactions between individual statements within the current transaction, resulting in nonrepeatable reads or phantom data [4] — Postgres uses this isolation level by default.

Repeatable Read: Specifies that statements cannot read data that has been modified but not yet committed by other transactions and that no other transactions can modify data that has been read by the current transaction until the current transaction completes [5] — MySQL uses this level of isolation by default.

Serializable: The strongest level of isolation. Specifies that statements cannot read data that has been modified but not yet committed by other transactions, No other transactions can modify data that has been read by the current transaction until the current transaction completes, and other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes [6].

MySQL Isolation Levels
Figure 7 — MySQL Isolation Levels and Phenomena
Postgres isolation levels
Figure 8— Postgres Isolation Levels and Phenomena

In PostgreSQL, you can request any of the four standard transaction isolation levels. Still, internally only three distinct isolation levels are implemented, i.e., PostgreSQL’s “READ UNCOMMITTED” mode behaves like “READ COMMITTED.” This is because it is the only sensible way to map the standard isolation levels to PostgreSQL’s multiversion concurrency control architecture [6]. So basically, Postgres only has 3 isolation levels instead of 4, as in MySQL.

Serializable isolation level can be implemented in many ways like Serial Execution, Two-phase locking (2PL), Serializable Snapshot Isolation (SSI) — Serializable transactions in PostgreSQL are implemented using SSI— however, each of them has pros and cons, and it’s better to do some research about them. Most databases that provide serializability today use one of these techniques.

How to change the isolation level?

Changing is easy; choosing is challenging because you must consider resources, use cases, etc.

MySQL:

MySQL SET TRANSACTION
SET TRANSACTION Syntax for Transaction Characteristics

In MySQL, You can set transaction characteristics globally, for the current session or the next transaction only [2].

Postgres:

The SET TRANSACTION command sets the characteristics of the current transaction. It does not affect any subsequent transactions. If SET TRANSACTION is executed without a prior START TRANSACTION or BEGIN, it will appear to have no effect since the transaction will immediately end.

To begin a new transaction with the same snapshot as an already existing transaction, first export the snapshot from the existing transaction. That will return the snapshot identifier, for example:

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();
pg_export_snapshot
--------------------
000003A1-1
(1 row)

Then give the snapshot identifier in a SET TRANSACTION SNAPSHOT command at the beginning of the newly opened transaction:

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '000003A1-1';

Keep in mind; Don’t forget to implement a retry mechanism at a higher isolation level (higher chance of timeout, errors, and even deadlock) and also don’t forget to read the database documentation ;)

//Thanks a lot for reading!

Good to read!

References

[1] Isolation Levels in the Database Engine https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/ms189122(v=sql.105)?redirectedfrom=MSDN

[2] https://dev.mysql.com/doc/refman/8.0/en/set-transaction.html

[3] https://www.amazon.nl/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321 — Page 267

[4] https://docs.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql?view=sql-server-ver15&r

[5] https://docs.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql?view=sql-server-ver15&b

[6] https://docs.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql?view=sql-server-ver15&c

[7] https://www.postgresql.org/docs/current/transaction-iso.html

--

--