Four Guarantees, One Label

A transaction takes several reads and writes and makes them one unit: all of it happens, or none of it does. The application gets to think in operations like “transfer 100” while the database worries about crashes, restarts, and the other 50 clients doing the same thing at the same time.

The guarantees behind that unit have a name. Härder and Reuter coined ACID in a 1983 paper on database recovery - atomicity, consistency, isolation, durability. Forty years later the acronym is on every database’s feature page, and a chapter of Martin Kleppmann’s Designing Data-Intensive Applications (the book this post leans on) argues it has drifted into marketing: implementations differ so much that “ACID compliant” does not tell you what you get. Two databases can both claim the label and behave differently under the same workload.

So this post takes the letters one at a time. A and D get short sections. C mostly turns out to be your job, not the database’s. I gets most of the post, because isolation is where databases differ, where the level names mislead, and where the incidents come from.

Atomicity

Atomic here means indivisible, and nothing more. The word carries a concurrency meaning elsewhere (atomic instructions, atomic counters), but the A in ACID says nothing about concurrent access. That is isolation’s job. Kleppmann suggests abortability would have been a better word, and it is the right way to think about the guarantee: a transaction can always be cleanly abandoned, no matter where it stops.

The case that matters is a fault midway. A transfer is 3 writes: debit one account, credit another, record the transfer. Suppose the process dies after the first write.

Without atomicity, a restart finds the in-between state: money gone from one account, never arrived at the other. Worse, you cannot tell how far the transaction got, so you cannot safely retry it - a blind retry might apply the debit twice. With atomicity, recovery rolls the partial writes back on restart and the database behaves as if the transaction never began. That is the real payoff: an aborted transaction leaves nothing behind, so retrying is safe by construction.

The machinery is a log. Before changing a page, the database records enough information to undo the change. Commit itself is one tiny act - a commit record reaching the log - and anything short of that point gets undone during recovery. Everything before commit is provisional, and the database is always prepared to take it all back.

Consistency

Consistency here means your invariants hold: credits and debits balance, every order points at a customer that exists, a seat is never sold twice.

But look at where those rules live. The database can enforce a few of them - uniqueness, foreign keys, CHECK constraints - and has no idea about the rest. Whether “transfer 100” conserves money depends on the statements your application puts inside the transaction. Write one that debits without crediting, and the database will commit it without complaint. It cannot know the rule exists.

So the C is different in kind from the other three letters. Atomicity, isolation and durability are properties the database provides. Consistency is a property of your application, built on top of them: if each of your transactions is individually correct, A, I and D guarantee that correctness survives crashes and concurrency. Joe Hellerstein has remarked (Kleppmann quotes this) that the C was “tossed in to make the acronym work”.

One disambiguation, because the word is badly overloaded: this consistency is unrelated to the C in the CAP theorem and unrelated to replica convergence in eventually consistent systems. When someone says “consistency”, ask which one they mean.

Isolation

In March 2014 the bitcoin exchange Flexcoin shut down. Attackers had fired thousands of concurrent transfers of the same balance: each request read the balance, saw enough funds, and proceeded. The reads interleaved, every check passed against the same stale number, and 896 BTC were gone. The database executed every transaction faithfully. Each one was individually correct. They were allowed to interleave in a way the developers never considered.

Isolation is the letter that governs this: what concurrent transactions may see of each other.

The textbook ideal is serializability: the database may interleave work however it likes, as long as the outcome equals some schedule in which the transactions ran one at a time. Under that guarantee, concurrency cannot corrupt anything that sequential logic would not also corrupt. You reason about one transaction alone, and you are done.

But serializability has a long reputation for being slow, so SQL databases shipped weaker levels and made one of them the default. Postgres, Oracle and SQL Server default to read committed. MySQL’s InnoDB defaults to repeatable read. Few applications change the default, which means most transactions in the world run at a level that permits specific, well-catalogued race conditions. The practical skill is to know those races by name, know which level admits which, and decide on purpose which ones your code can survive.

The rest of this section is that ladder, one anomaly at a time.

Read committed

Read committed makes 2 promises: you only read data that has been committed (no dirty reads), and you only overwrite data that has been committed (no dirty writes).

A dirty read is acting on data from a transaction that has not committed - and might never. If you read a balance mid-transfer, while the debit is applied but the credit is not, you see money that does not exist. If that transfer then aborts, you acted on a value that was never true at all.

A dirty write is two in-flight transactions overwriting each other’s uncommitted writes. The classic shape is one logical update spread over 2 tables. A car sale writes the listing’s buyer and the invoice’s recipient. Two buyers race, the interleaving lands badly, and:

        Alice                                Bob
  t1    UPDATE listings SET buyer = 'alice'
  t2                                         UPDATE listings SET buyer = 'bob'
  t3                                         UPDATE invoices  SET to    = 'bob'
  t4    UPDATE invoices  SET to    = 'alice'

        -- car sold to bob, invoice sent to alice

Read committed prevents both, and the implementation is cheap, which is why it is the default almost everywhere. A writer takes a row-level lock and holds it until commit, so a second writer waits rather than clobbering uncommitted data. The write lock is not special to this level - writers lock at every level, even read uncommitted, because rollback depends on it. What read committed adds is the read side: readers take no locks, and while a row has an uncommitted change pending, the database hands them the old committed value.

What read committed does not promise: that 2 of your own reads agree with each other.

Read skew, and snapshots

You hold 1000 across two accounts, 500 each. A report starts reading. It reads account x: 500. While it runs, a transfer moves 100 from y to x and commits. The report reads account y: 400. It totals 900.

No rule was broken. Both reads returned committed data. They returned data committed at different moments, and the combined picture - 900 - was never true at any moment. This is read skew, and read committed permits it.

Rerun the report and the number comes back right, which is why this anomaly gets dismissed. The damage is to work that cannot rerun. A backup taken from such a view restores a database where the 100 is missing for good. A long analytics scan quietly aggregates numbers from different moments. An integrity checker reports corruption that was never there.

The fix is snapshot isolation: every transaction reads from a consistent snapshot of the database as of the moment it began. A 5-minute report sees the database as of its first millisecond, no matter who commits meanwhile.

The implementation is multi-version concurrency control, MVCC. A write does not overwrite a row in place. It appends a new version of the row, tagged with the writer’s transaction id, and the old version stays. Each transaction is told, at start, which transaction ids were committed at that instant, and its reads return the newest version from that set, ignoring everything later. Old versions cost storage and need garbage collection (Postgres calls its collector vacuum). In exchange, readers never block writers and writers never block readers, which is the property that makes snapshot isolation cheap enough to leave on.

Now the naming problem. Postgres and MySQL expose snapshot isolation under the name “repeatable read”. Oracle exposes it under the name “serializable”, which it is not - the next 2 anomalies survive it. The label on the level does not tell you the guarantee. Kleppmann maintains Hermitage, a suite of runnable tests that documents what each database’s levels do, and it is the reference I trust over any manual.

Lost update

Two transactions read-modify-write the same value concurrently. A counter holds 42. Both read 42, both compute 43, both write 43. Two increments happened, the counter moved by one. The second write was based on a value that was already stale, and it silently cancelled the first.

Nothing here violates read committed - each write replaced a committed value. The Flexcoin race was this cycle with money in it: read balance, compute new balance, write, concurrently.

The defenses, in the order I would reach for them:

Atomic writes. UPDATE likes SET n = n + 1 moves the whole cycle into one statement. The database takes the row lock and the race disappears. The best fix whenever the modification is expressible in SQL.
Explicit locking. SELECT ... FOR UPDATE takes the row lock at the read, so the cycle from read to write runs alone.
Automatic detection. Snapshot isolation has enough information to notice a lost update at commit. Postgres’s repeatable read aborts the second writer with a serialization error, and your code retries. MySQL InnoDB’s repeatable read does not detect it - same level name, weaker promise (Hermitage has the test results).
Compare-and-set. UPDATE ... SET n = 44 WHERE n = 43, then check the affected-row count in the application. The lock-free variant, and easy to get wrong if you forget the check.

A note on that automatic detection point, because it is easy to picture it as lock-free. Both databases (Postgres and MySQL) lock the write either way: a row lock is taken on every UPDATE at every level, repeatable read included, so a second writer to the same row blocks in both. The difference is what happens once the block clears. Postgres aborts the second writer if the row changed since its snapshot began, while MySQL lets it proceed against the current row without aborting, which is how the lost update slips through. Snapshot isolation decides which version your reads see, and in Postgres whether to abort on a write conflict. It does not make the write itself lock-free.

Write skew

Now the anomaly snapshot isolation cannot stop. A hospital requires at least 1 doctor on call. Tonight there are 2, Alice and Bob, and both feel ill at the same moment. Each transaction counts who is on call, sees 2, passes the check, and takes its own doctor off. Both commit. On call: 0.

This is write skew. It generalizes the lost update: the 2 transactions read an overlapping set of rows but write to different rows, so no write-write machinery notices anything. Each one’s write invalidates the premise the other read. Snapshot isolation allows it - each transaction read a consistent snapshot, then committed a decision that was already stale.

The shape is everywhere once you can name it. Check a username is free, then claim it. Check a spending limit, then spend. Check a meeting room is free for a slot, then book it. Read a premise, write a conclusion, and the premise moves under you.

For the doctors there is a manual fix: lock the rows the premise depends on, SELECT ... FOR UPDATE on the on-call rows, and the 2 transactions serialize. But the meeting room exposes the harder version. The premise there is “no booking exists for this slot”, and you cannot lock rows that do not exist. The conflicting write is an insert that creates the row your query would have matched - this is called a phantom. The workaround of materializing conflicts (pre-creating a row per time slot, just to have something to lock) works and is ugly enough that DDIA recommends it last. The honest fix is to run serializable.

Serializable, three ways

Every production implementation of serializability is one of three ideas.

Run them one at a time. Drop concurrency instead of taming it: one thread, one transaction at a time, on data held in RAM. This stopped being absurd around 2007, when memory got big enough to hold OLTP working sets and people noticed that short transactions do not need overlap to keep a core busy. VoltDB and its ancestor H-Store run transactions as pre-registered stored procedures this way, and Redis executes each command, and each MULTI/EXEC block, on its single command thread. The costs: throughput is capped at one core per partition, and an interactive transaction that pauses mid-flight for a network round trip stalls everyone behind it, so the conversational multi-statement style of transaction is ruled out.

Two-phase locking. The classic, and for about 30 years the only choice. Readers take shared locks, writers take exclusive locks, everyone holds locks until commit, so readers block writers and writers block readers. Take the doctors. Each transaction’s count takes a shared lock on both on-call rows, then tries to set its own doctor off, which needs an exclusive lock on that row. The exclusive lock collides with the shared lock the other transaction still holds there, so each blocks on the other - a deadlock, broken by aborting one. The survivor commits, the loser retries, reads a count of 1, and its check fails. Read locks are what snapshot isolation was missing. MySQL and SQL Server serializable mode lock plain reads to get there. Postgres tracks reads instead, the next approach. Phantoms force an extra trick: you cannot lock a row that does not exist, so the database locks the gap instead - predicate locks in theory, index-range locks in practice, guarding the conditions queries searched for rather than just the rows they found. (Two-phase locking is unrelated to two-phase commit, which is about distributed transactions.) It works, and it is the reason “serializable” earned its slow reputation: lock queues pile up behind any slow transaction, deadlocks need detecting and breaking, and tail latencies become unpredictable under contention.

Serializable snapshot isolation. The optimistic one, and the youngest - Cahill, Röhm and Fekete, 2008. Everyone runs on plain snapshot isolation, nobody blocks, and the database additionally tracks which transactions read what. At commit, a transaction whose reads have been invalidated by a concurrent committed write - the doctors pattern - is aborted. The cost is the abort and the retry, and it is paid only when transactions interfere. Postgres’s SERIALIZABLE has been SSI since 9.1, and CockroachDB runs serializable by default, which it can afford for the same reason: under low contention, SSI costs little more than the snapshot isolation you were already running. The contract is that your application treats serialization failures as normal and retries.

Where that leaves the databases you run:

database	default level	what `SERIALIZABLE` gets you
PostgreSQL	read committed	SSI, true serializability (9.1+)
MySQL InnoDB	repeatable read	two-phase locking
Oracle	read committed	snapshot isolation, by another name
SQL Server	read committed	two-phase locking (snapshot opt-in)
CockroachDB	serializable	the default already
SQLite	serializable	one writer at a time, by construction

The posture I would adopt: find out what level you run (SHOW default_transaction_isolation in Postgres, SELECT @@transaction_isolation in MySQL). For each code path where a race costs money or data, name the anomaly you are exposed to at that level. Then either accept it deliberately, restructure into atomic statements and explicit locks, or move that path to serializable and write the retry loop.

Durability

Durability is the promise that once the database says committed, the data survives a crash. The mechanism is the same log atomicity uses: the commit record is written to the write-ahead log and flushed - fsync - to disk before the client hears yes. Crash any time after, and recovery replays the log.

The promise is only as honest as the layers under it, and there are more layers than the word “disk” suggests:

The write path from process buffers through OS page cache and disk cache to persistent media, with brackets showing what a process crash, an OS crash, a power loss, and a drive failure each destroy — What each failure takes with it. fsync is the line most of durability leans on.

Each layer has its own fine print. The fsync line: in 2018 the Postgres developers discovered that after a failed fsync, Linux marks the dirty pages clean anyway, so retrying the fsync would report success without the data ever reaching disk - LWN’s writeup is the canonical account. Postgres now deliberately crashes on fsync failure and replays the log, rather than trust a retry that lies. The drive line: disks have volatile write caches, and a drive without power-loss protection can acknowledge a flush it has not performed, which moves the durability line without telling you. And the last bracket has no local fix at all: when the drive dies, only another machine helps. Replication covers hardware death. Backups cover the failure modes replication faithfully copies to every node, like an errant DELETE.

So durability in practice is layered: fsync for crashes, replication for hardware, backups for yourself, and a tested restore. Kleppmann’s framing is the right one - perfect durability does not exist, only risk reduction, layered.

In one paragraph

A transaction is the database’s unit of all-or-nothing. Atomicity is abortability: a failed transaction leaves nothing behind, which is what makes retries safe. Consistency is your invariants, and it is the application’s letter - the database supplies the other three so your correct transactions stay correct under crashes and concurrency. Isolation comes in levels: read committed stops dirty reads and dirty writes, snapshot isolation gives every transaction one consistent instant to read from, and only serializability - serial execution, 2PL, or SSI - defends the premises your transactions act on, against lost updates, write skew and phantoms. The level names disagree across databases, so check the guarantee, not the label. Durability is the fsync line plus honesty about the layers beneath it, then replication and backups for the failures fsync cannot see. Each letter is a separate guarantee, and each is worth checking against the database you run.