Mem Table and Merge Iterators

This is a legacy version of the Mini-LSM tutorial and we will not maintain it anymore. We now have a better version of this tutorial and this chapter is now part of Mini-LSM Week 1 Day 1: Memtable and Mini-LSM Week 1 Day 2: Merge Iterator

In this part, you will need to modify:

  • src/iterators/merge_iterator.rs
  • src/iterators/two_merge_iterator.rs
  • src/mem_table.rs

You can use cargo x copy-test day3 to copy our provided test cases to the starter code directory. After you have finished this part, use cargo x scheck to check the style and run all test cases. If you want to write your own test cases, write a new module #[cfg(test)] mod user_tests { /* your test cases */ } in table.rs. Remember to remove #![allow(...)] at the top of the modules you modified so that cargo clippy can actually check the styles.

This is the last part for the basic building blocks of an LSM tree. After implementing the merge iterators, we can easily merge data from different part of the data structure (mem table + SST) and get an iterator over all data. And in part 4, we will compose all these things together to make a real storage engine.

Task 1 - Mem Table

In this tutorial, we use crossbeam-skiplist as the implementation of memtable. Skiplist is like linked-list, where data is stored in a list node and will not be moved in memory. Instead of using a single pointer for the next element, the nodes in skiplists contain multiple pointers and allow user to "skip some elements", so that we can achieve O(log n) search, insertion, and deletion.

In storage engine, users will create iterators over the data structure. Generally, once user modifies the data structure, the iterator will become invalid (which is the case for C++ STL and Rust containers). However, skiplists allow us to access and modify the data structure at the same time, therefore potentially improving the performance when there is concurrent access. There are some papers argue that skiplists are bad, but the good property that data stays in its place in memory can make the implementation easier for us.

In mem_table.rs, you will need to implement a mem-table based on crossbeam-skiplist. Note that the memtable only supports get, scan, and put without delete. The deletion is represented as a tombstone key -> empty value, and the actual data will be deleted during the compaction process (day 5). Note that all get, scan, put functions only need &self, which means that we can concurrently call these operations.

Task 2 - Mem Table Iterator

You can now implement an iterator MemTableIterator for MemTable. memtable.iter(start, end) will create an iterator that returns all elements within the range start, end. Here, start is std::ops::Bound, which contains 3 variants: Unbounded, Included(key), Excluded(key). The expresiveness of std::ops::Bound eliminates the need to memorizing whether an API has a closed range or open range.

Note that crossbeam-skiplist's iterator has the same lifetime as the skiplist itself, which means that we will always need to provide a lifetime when using the iterator. This is very hard to use. You can use the ouroboros crate to create a self-referential struct that erases the lifetime. You will find the ouroboros examples helpful.

#![allow(unused)]
fn main() {
pub struct MemTableIterator {
    /// hold the reference to the skiplist so that the iterator will be valid.
    map: Arc<SkipList>
    /// then the lifetime of the iterator should be the same as the `MemTableIterator` struct itself
    iter: SkipList::Iter<'this>
}
}

You will also need to convert the Rust-style iterator API to our storage iterator. In Rust, we use next() -> Data. But in this tutorial, next doesn't have a return value, and the data should be fetched by key() and value(). You will need to think a way to implement this.

Spoiler: the MemTableIterator struct
#![allow(unused)]
fn main() {
#[self_referencing]
pub struct MemTableIterator {
    map: Arc<SkipMap<Bytes, Bytes>>,
    #[borrows(map)]
    #[not_covariant]
    iter: SkipMapRangeIter<'this>,
    item: (Bytes, Bytes),
}
}

We have map serving as a reference to the skipmap, iter as a self-referential item of the struct, and item as the last item from the iterator. You might have thought of using something like iter::Peekable, but it requires &mut self when retrieving the key and value. Therefore, one approach is to (1) get the element from the iterator on initializing the MemTableIterator, store it in item (2) when calling next, we get the element from inner iter's next and move the inner iter to the next position.

In this design, you might have noticed that as long as we have the iterator object, the mem-table cannot be freed from the memory. In this tutorial, we assume user operations are short, so that this will not cause big problems. See extra task for possible improvements.

You can also consider using AgateDB's skiplist implementation, which avoids the problem of creating a self-referential struct.

Task 3 - Merge Iterator

Now that you have a lot of mem-tables and SSTs, you might want to merge them to get the latest occurrence of a key. In merge_iterator.rs, we have MergeIterator, which is an iterator that merges all iterators of the same type. The iterator at the lower index position of the new function has higher priority, that is to say, if we have:

iter1: 1->a, 2->b, 3->c
iter2: 1->d
iter: MergeIterator::create(vec![iter1, iter2])

The final iterator will produce 1->a, 2->b, 3->c. The data in iter1 will overwrite the data in other iterators.

You can use a BinaryHeap to implement this merge iterator. Note that you should never put any invalid iterator inside the binary heap. One common pitfall is on error handling. For example,

#![allow(unused)]
fn main() {
let Some(mut inner_iter) = self.iters.peek_mut() {
    inner_iter.next()?; // <- will cause problem
}
}

If next returns an error (i.e., due to disk failure, network failure, checksum error, etc.), it is no longer valid. However, when we go out of the if condition and return the error to the caller, PeekMut's drop will try move the element within the heap, which causes an access to an invalid iterator. Therefore, you will need to do all error handling by yourself instead of using ? within the scope of PeekMut.

You will also need to define a wrapper for the storage iterator so that BinaryHeap can compare across all iterators.

Task 4 - Two Merge Iterator

The LSM has two structures for storing data: the mem-tables in memory, and the SSTs on disk. After we constructed the iterator for all SSTs and all mem-tables respectively, we will need a new iterator to merge iterators of two different types. That is TwoMergeIterator.

You can implement TwoMergeIterator in two_merge_iter.rs. Similar to MergeIterator, if the same key is found in both of the iterator, the first iterator takes precedence.

In this tutorial, we explicitly did not use something like Box<dyn StorageIter> to avoid dynamic dispatch. This is a common optimization in LSM storage engines.

Extra Tasks

  • Implement different mem-table and see how it differs from skiplist. i.e., BTree mem-table. You will notice that it is hard to get an iterator over the B+ tree without holding a lock of the same timespan as the iterator. You might need to think of smart ways of solving this.
  • Async iterator. One interesting thing to explore is to see if it is possible to asynchronize everything in the storage engine. You might find some lifetime related problems and need to workaround them.
  • Foreground iterator. In this tutorial we assumed that all operations are short, so that we can hold reference to mem-table in the iterator. If an iterator is held by users for a long time, the whole mem-table (which might be 256MB) will stay in the memory even if it has been flushed to disk. To solve this, we can provide a ForegroundIterator / LongIterator to our user. The iterator will periodically create new underlying storage iterator so as to allow garbage collection of the resources.

Your feedback is greatly appreciated. Welcome to join our Discord Community.
Found an issue? Create an issue / pull request on github.com/skyzh/mini-lsm.
Copyright © 2022 - 2024 Alex Chi Z. All Rights Reserved.