Mini-LSM Course Overview

Course Structure

This course has three parts, or weeks. In the first week, you will focus on the structure and storage format of an LSM storage engine. In the second week, you will explore compaction in depth and add persistence to the storage engine. In the third week, you will implement multiversion concurrency control (MVCC).

Follow the Environment Setup chapter to prepare your development environment.

Overview of LSM

An LSM storage engine generally has three components:

A write-ahead log that persists recent data for recovery.
Sorted string tables (SSTs) on disk that form the LSM-tree structure.
Memtables in memory that batch small writes.

The storage engine generally provides the following interfaces:

Put(key, value): Stores a key-value pair in the LSM tree.
Delete(key): Removes a key and its corresponding value.
Get(key): Retrieves the value associated with a key.
Scan(range): Retrieves a range of key-value pairs.
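To make the four operations concrete, here is a minimal Rust sketch. The `KvStore` trait and the `ToyStore` type are hypothetical names, not the course's API, and the store keeps everything in one in-memory `BTreeMap`. It records a delete as a tombstone (`None`) instead of removing the entry, which is how LSM engines typically implement `Delete`: older copies of the key may still live in SSTs, and the tombstone has to hide them.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical trait mirroring Put / Delete / Get / Scan.
pub trait KvStore {
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn delete(&mut self, key: &[u8]);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn scan(&self, lower: Bound<&[u8]>, upper: Bound<&[u8]>) -> Vec<(Vec<u8>, Vec<u8>)>;
}

/// Toy single-memtable store; `None` values are tombstones.
#[derive(Default)]
pub struct ToyStore {
    map: BTreeMap<Vec<u8>, Option<Vec<u8>>>,
}

impl KvStore for ToyStore {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), Some(value.to_vec()));
    }

    fn delete(&mut self, key: &[u8]) {
        // Write a tombstone rather than removing the entry.
        self.map.insert(key.to_vec(), None);
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        // A missing key and a tombstone both read as "not found".
        self.map.get(key).cloned().flatten()
    }

    fn scan(&self, lower: Bound<&[u8]>, upper: Bound<&[u8]>) -> Vec<(Vec<u8>, Vec<u8>)> {
        self.map
            .range::<[u8], _>((lower, upper))
            // Skip tombstoned keys so only live pairs are returned.
            .filter_map(|(k, v)| v.as_ref().map(|v| (k.clone(), v.clone())))
            .collect()
    }
}
```

A real engine spreads this state across the WAL, memtables, and SSTs, but callers see the same four operations.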

It may also provide an operation that establishes a persistence boundary:

Sync(): Ensures that all preceding operations have been persisted to disk.
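A persistence boundary usually comes down to an fsync-style call on the write-ahead log. The sketch below assumes a hypothetical record layout (a little-endian `u32` length before the key and before the value) and logs puts only; `Wal`, `append`, `sync`, and `replay` are illustrative names. `sync` relies on the standard library's `File::sync_all`, which asks the OS to flush the file's data and metadata to the device. There are no checksums here, so `replay` can only detect a truncated tail, not corrupted bytes.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical minimal WAL. Assumed record layout:
/// [key_len: u32 LE][key][value_len: u32 LE][value]. Only puts are logged.
pub struct Wal {
    file: File,
}

impl Wal {
    pub fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Wal { file })
    }

    /// Append one record. The bytes may still sit in OS buffers afterwards.
    pub fn append(&mut self, key: &[u8], value: &[u8]) -> io::Result<()> {
        let mut buf = Vec::with_capacity(8 + key.len() + value.len());
        buf.extend_from_slice(&(key.len() as u32).to_le_bytes());
        buf.extend_from_slice(key);
        buf.extend_from_slice(&(value.len() as u32).to_le_bytes());
        buf.extend_from_slice(value);
        self.file.write_all(&buf)
    }

    /// Persistence boundary: returns after the OS has been asked to push
    /// everything appended so far (data and metadata) to the device.
    pub fn sync(&mut self) -> io::Result<()> {
        self.file.sync_all()
    }
}

/// Take `n` bytes starting at `*pos`, or None if the data is too short.
fn take<'a>(data: &'a [u8], pos: &mut usize, n: usize) -> Option<&'a [u8]> {
    let end = pos.checked_add(n)?;
    let slice = data.get(*pos..end)?;
    *pos = end;
    Some(slice)
}

/// Read one length-prefixed field.
fn read_field<'a>(data: &'a [u8], pos: &mut usize) -> Option<&'a [u8]> {
    let mut len = [0u8; 4];
    len.copy_from_slice(take(data, pos, 4)?);
    take(data, pos, u32::from_le_bytes(len) as usize)
}

/// Replay every complete record, e.g. during crash recovery.
/// Decoding stops at the first truncated record.
pub fn replay(path: &Path) -> io::Result<Vec<(Vec<u8>, Vec<u8>)>> {
    let data = std::fs::read(path)?;
    let mut pos = 0;
    let mut records = Vec::new();
    loop {
        let Some(key) = read_field(&data, &mut pos) else { break };
        let Some(value) = read_field(&data, &mut pos) else { break };
        records.push((key.to_vec(), value.to_vec()));
    }
    Ok(records)
}
```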

Some engines combine Put and Delete into a single operation called WriteBatch, which accepts a batch of updates.
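A write batch can be modeled as a list of put and delete records applied in order. In this sketch, the `WriteBatchRecord` enum and `write_batch` function are hypothetical, and the memtable is a plain `BTreeMap` in which `None` marks a deleted key. The sketch does not show how a real engine makes the whole batch atomic with respect to crashes and concurrent readers.

```rust
use std::collections::BTreeMap;

/// Hypothetical batch record: a put or a delete.
pub enum WriteBatchRecord<'a> {
    Put(&'a [u8], &'a [u8]),
    Del(&'a [u8]),
}

/// Apply a batch to a memtable; `None` is a tombstone.
/// Later records override earlier ones for the same key.
pub fn write_batch(
    memtable: &mut BTreeMap<Vec<u8>, Option<Vec<u8>>>,
    batch: &[WriteBatchRecord],
) {
    for record in batch {
        match record {
            WriteBatchRecord::Put(k, v) => {
                memtable.insert(k.to_vec(), Some(v.to_vec()));
            }
            WriteBatchRecord::Del(k) => {
                memtable.insert(k.to_vec(), None);
            }
        }
    }
}
```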

The overview diagrams assume a leveled compaction layout, which is common in production systems. In Week 2, you will implement and compare several compaction strategies.
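At the core of compaction is a merge of sorted runs. The sketch assumes each run is a vector of key-sorted entries with unique keys (`None` is a tombstone) and that `newer` comes from the upper level; `merge_runs` and `SortedRun` are hypothetical names. On equal keys the newer entry wins, and tombstones are dropped only when the output lands in the bottom level, where nothing older remains for them to hide. It ignores MVCC snapshots, which would further limit what can be discarded.

```rust
/// A sorted run: entries sorted by key, no duplicate keys; None = tombstone.
pub type SortedRun = Vec<(Vec<u8>, Option<Vec<u8>>)>;

/// Merge a newer run (upper level) into an older run (level below).
pub fn merge_runs(
    newer: &[(Vec<u8>, Option<Vec<u8>>)],
    older: &[(Vec<u8>, Option<Vec<u8>>)],
    is_bottom_level: bool,
) -> SortedRun {
    let mut out = Vec::with_capacity(newer.len() + older.len());
    let (mut i, mut j) = (0, 0);
    while i < newer.len() || j < older.len() {
        // Pick the smaller key; on a tie, prefer the newer run.
        let take_newer = if j == older.len() {
            true
        } else if i == newer.len() {
            false
        } else {
            newer[i].0 <= older[j].0
        };
        let entry = if take_newer {
            // The older version of the same key is shadowed: skip it.
            if j < older.len() && newer[i].0 == older[j].0 {
                j += 1;
            }
            i += 1;
            &newer[i - 1]
        } else {
            j += 1;
            &older[j - 1]
        };
        // Keep tombstones unless the output is the bottom level.
        if entry.1.is_some() || !is_bottom_level {
            out.push(entry.clone());
        }
    }
    out
}
```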

Write Path

The LSM write path has four steps:

1. Write the key-value pair to the write-ahead log so that it can be recovered after a crash.
2. Write the key-value pair to the mutable memtable. After steps 1 and 2 are complete, the engine can report that the write has completed.
3. In the background, once the mutable memtable reaches its size limit, freeze it into an immutable memtable and flush it to disk as an SST file.
4. Also in the background, compact files from one or more levels into lower levels. This maintains the shape of the LSM tree and limits read amplification.
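The first three steps can be modeled in memory to show how they fit together. In this hypothetical `Engine`, the WAL is a `Vec` of records, each SST is a sorted `Vec`, and `memtable_limit` is an assumed size threshold in bytes. The freeze and flush that a real engine runs in background threads are called synchronously here, and compaction (step 4) is omitted.

```rust
use std::collections::BTreeMap;

// None marks a tombstone (deleted key).
type Memtable = BTreeMap<Vec<u8>, Option<Vec<u8>>>;
type Sst = Vec<(Vec<u8>, Option<Vec<u8>>)>;

/// Hypothetical in-memory model of the LSM write path.
pub struct Engine {
    wal: Vec<(Vec<u8>, Option<Vec<u8>>)>, // stands in for the on-disk log
    memtable: Memtable,                   // mutable memtable
    memtable_bytes: usize,                // approximate size of `memtable`
    imm_memtables: Vec<Memtable>,         // frozen memtables, oldest first
    l0_ssts: Vec<Sst>,                    // flushed "SSTs", oldest first
    memtable_limit: usize,                // assumed freeze threshold in bytes
}

impl Engine {
    pub fn new(memtable_limit: usize) -> Self {
        Engine {
            wal: Vec::new(),
            memtable: Memtable::new(),
            memtable_bytes: 0,
            imm_memtables: Vec::new(),
            l0_ssts: Vec::new(),
            memtable_limit,
        }
    }

    /// Steps 1 and 2: log the write, then apply it to the mutable memtable.
    /// `value == None` records a delete.
    pub fn write(&mut self, key: &[u8], value: Option<&[u8]>) {
        let value = value.map(|v| v.to_vec());
        self.wal.push((key.to_vec(), value.clone())); // step 1
        self.memtable_bytes += key.len() + value.as_ref().map_or(0, |v| v.len());
        self.memtable.insert(key.to_vec(), value); // step 2
        // At this point the write could be acknowledged to the caller.
        if self.memtable_bytes >= self.memtable_limit {
            self.freeze();
        }
    }

    /// Step 3, first half: turn the full memtable into an immutable one.
    fn freeze(&mut self) {
        let full = std::mem::take(&mut self.memtable);
        self.imm_memtables.push(full);
        self.memtable_bytes = 0;
    }

    /// Step 3, second half: flush the oldest immutable memtable to an SST.
    /// Returns false if there was nothing to flush.
    pub fn flush_oldest(&mut self) -> bool {
        if self.imm_memtables.is_empty() {
            return false;
        }
        let oldest = self.imm_memtables.remove(0);
        // BTreeMap iterates in key order, so the SST comes out sorted.
        self.l0_ssts.push(oldest.into_iter().collect());
        true
    }
}
```

Keeping the WAL write ahead of the memtable insert is what lets a crash after acknowledgment be repaired by replaying the log.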

Read Path

To read a key, the engine:

Probes the memtables from newest to oldest.
If no memtable contains the key (either as a value or as a deletion marker), searches the SSTs in the LSM tree.
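A lookup that follows these two steps might look like the sketch below. It assumes memtables and SSTs are passed newest first and treats every SST as an independent sorted source; a real engine would also use per-level key ranges and filters to skip SSTs. The function `get` and the `Memtable`/`Sst` aliases are hypothetical. The first source that contains the key decides the answer, and a tombstone (`None`) means the key was deleted.

```rust
use std::collections::BTreeMap;

// None marks a tombstone (deleted key).
type Memtable = BTreeMap<Vec<u8>, Option<Vec<u8>>>;
/// A toy SST: entries sorted by key, unique keys.
type Sst = Vec<(Vec<u8>, Option<Vec<u8>>)>;

/// Look up `key`; `memtables` and `ssts` are ordered newest first.
pub fn get(memtables: &[Memtable], ssts: &[Sst], key: &[u8]) -> Option<Vec<u8>> {
    for mt in memtables {
        if let Some(entry) = mt.get(key) {
            // A value or a tombstone here hides all older versions.
            return entry.clone();
        }
    }
    for sst in ssts {
        // Binary search works because each SST is sorted by key.
        if let Ok(idx) = sst.binary_search_by(|(k, _)| k.as_slice().cmp(key)) {
            return sst[idx].1.clone();
        }
    }
    None
}
```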

There are two types of reads: lookups and scans. A lookup finds one key in the LSM tree, whereas a scan iterates over all keys within a range. The course covers both.
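A scan has to combine every memtable and SST into one sorted stream in which the newest version of each key wins. One common approach, sketched here, is a k-way merge over a `BinaryHeap`. The `merge_scan` function is hypothetical, sources are passed newest first, and range bounds are left out for brevity (assume each source is already restricted to the requested range).

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// A sorted source (memtable or SST): unique keys in order; None = tombstone.
type Source = Vec<(Vec<u8>, Option<Vec<u8>>)>;

/// Merge `sources` (newest first) into one sorted list of live pairs.
pub fn merge_scan(sources: &[Source]) -> Vec<(Vec<u8>, Vec<u8>)> {
    // Heap of (key, source index, position). Reverse turns the max-heap into
    // a min-heap: smallest key first, and for equal keys the newest source.
    let mut heap = BinaryHeap::new();
    for (src, entries) in sources.iter().enumerate() {
        if let Some((k, _)) = entries.first() {
            heap.push(Reverse((k.clone(), src, 0usize)));
        }
    }
    let mut out = Vec::new();
    let mut last_key: Option<Vec<u8>> = None;
    while let Some(Reverse((key, src, pos))) = heap.pop() {
        // Advance the source we just consumed from.
        if let Some((next_key, _)) = sources[src].get(pos + 1) {
            heap.push(Reverse((next_key.clone(), src, pos + 1)));
        }
        // Older versions of a key that was already handled are shadowed.
        if last_key.as_ref() == Some(&key) {
            continue;
        }
        // Emit live values; a tombstone emits nothing but still shadows.
        if let Some(value) = &sources[src][pos].1 {
            out.push((key.clone(), value.clone()));
        }
        last_key = Some(key);
    }
    out
}
```

A production engine would expose this as a lazy iterator instead of collecting into a `Vec`, so a scan does not materialize the whole range in memory.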

Your feedback is greatly appreciated. You are welcome to join our Discord community.
Found an issue? Create an issue / pull request on github.com/skyzh/mini-lsm.
mini-lsm-book © 2022-2026 by Alex Chi Z is licensed under CC BY-NC-SA 4.0.
