The idea is: Use a hash map to keep an index of the whole database in memory. The hash map maps each data key to the byte offset of its value in the data file. The data file is an append-only log; every write appends a new record to the file.
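The core idea can be sketched in a few lines. This is a toy illustration, not Bitcask's actual format: records here are plain `key,value` text lines, and the class and file names are made up.

```python
class LogDB:
    """Toy sketch: an in-memory hash index over an append-only log file.
    (Illustrative only; real engines use a binary format with checksums.)"""

    def __init__(self, path):
        self.path = path
        self.index = {}           # key -> byte offset of its record in the log
        open(path, "ab").close()  # create the file if it does not exist

    def put(self, key, value):
        with open(self.path, "ab") as f:
            f.seek(0, 2)          # append at the current end of the log
            offset = f.tell()
            f.write(f"{key},{value}\n".encode())
        self.index[key] = offset  # later writes for the same key win

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)        # one seek, then a short read
            line = f.readline().decode().rstrip("\n")
        return line.split(",", 1)[1]

db = LogDB("data.log")
db.put("user:1", "alice")
db.put("user:1", "bob")   # appends a new record; the index now points at it
print(db.get("user:1"))   # -> bob
```

Note that updating a key never touches the old record; it just appends a new one and moves the index pointer, which is why the log keeps growing.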
Many databases use this approach, for example Bitcask, the default storage engine in Riak.
The values do not need to fit in memory: a value can be read from disk when needed, and this only takes one disk seek, so it's cheap. The keys, however, must all fit in RAM.
This approach is well suited for situations where the value of each key is updated frequently.
We have to keep the data file from growing without bound. The common approach:
- Break the log into segments of a fixed maximum size; when a segment reaches that size, start writing to a new one.
- Perform compaction regularly: discard duplicate keys, keeping only the most recent value for each, and merge segments together.
- Maintain a cache of frequently used segments to reduce disk reads.
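The compaction-and-merge step above can be sketched as follows. This is a simplified in-memory version (real segments are files on disk); a segment here is just a list of `(key, value)` records in write order, oldest segment first.

```python
# Hedged sketch of compaction: replay all records oldest-first, so later
# writes overwrite earlier ones, then emit one record per surviving key.

def compact(segments):
    latest = {}
    for segment in segments:          # segments ordered oldest -> newest
        for key, value in segment:
            latest[key] = value       # the most recent value wins
    # The merged segment contains exactly one record per key.
    return list(latest.items())

old_segments = [
    [("cat", "purr"), ("dog", "woof"), ("cat", "meow")],  # oldest segment
    [("dog", "bark"), ("bird", "tweet")],                 # newest segment
]
print(compact(old_segments))
# -> [('cat', 'meow'), ('dog', 'bark'), ('bird', 'tweet')]
```

Since segments are never modified after being written, the merge can write its output to a brand-new segment file, which is what makes running it concurrently with reads safe.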
About Concurrency: while compaction is running, read operations may still be going on, so compaction has to happen in the background, with the merged output written to a new segment. This way we can keep serving reads from the old segments until the merged segment is ready, at which point reads switch over and the old segments can be deleted.
About Crash Recovery: because the hash maps are maintained in memory, they are lost if the database restarts. They could be rebuilt by scanning every segment, but that is slow, so Bitcask stores a snapshot of each segment's hash map on disk to load them back more quickly.
A crash could also happen partway through appending a record, so Bitcask includes checksums, allowing corrupted parts of the log to be detected and ignored.
Why use an append-only log?
- Appending and segment merging are sequential write operations, which are much faster than random writes.
- Concurrency and crash recovery are much simpler: there is only one writer appending at the end of the log, and a crash cannot leave an old value half-overwritten.
A hash map does not support sequential reads, so for range queries we have to look up each key individually.
Source: Designing Data-Intensive Applications, Chapter 3, pp. 73-75