Log-Structured Merge Tree
Jump to navigation
Jump to search
A Log-Structured Merge Tree is a tree indexing structure that
- Context:
- It is a Disk-based Data Structure.
- It can be designed to provide Low-Cost Indexing for a file experiencing a high rate of Record Inserts and Record Deletion Operations.
- …
- Counter-Example(s):
- a B-Tree.
- See: Associative Array Data Structure.
References
2006
- (Chang et al., 2006) ⇒ Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. (2006). “Bigtable: A Distributed Storage System for Structured Data.” In: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation.
- QUOTE: The manner in which Bigtable uses memtables and SSTables to store updates to tablets is analogous to the way that the Log-Structured Merge Tree (26 ] stores updates to index data. In both systems, sorted data is buffered in memory before being written to disk, and reads must merge data from memory and disk.
1996
- (O'Neil et al., 1996) ⇒ Patrick O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil. (1996). “The Log-structured Merge-tree (LSM-tree).” In: Acta Informatica, 33(4). doi:10.1007/s002360050048
- QUOTE: Unfortunately, standard disk-based index structures such as the B-tree will effectively double the I/O cost of the transaction to maintain an index such as this in real time, increasing the total system cost up to fifty percent. Clearly a method for maintaining a real-time index at low cost is desirable. The Log-Structured Merge-tree (LSM-tree) is a disk-based data structure designed to provide low-cost indexing for a file experiencing a high rate of record inserts (and deletes) over an extended period. The LSM-tree uses an algorithm that defers and batches index changes, cascading the changes from a memory-based component through one or more disk components in an efficient manner reminiscent of merge sort.