Scale and Concurrency of GIGA+: File System Directories with Millions of Files
The authors examine the problem of scalable file system directories, motivated by data-intensive applications that need to ingest millions to billions of small files into a single directory at rates of hundreds of thousands of file creates per second. They introduce GIGA+, a POSIX-compliant scalable directory design that distributes directory entries over a cluster of server nodes. For scalability, each server makes only local, independent decisions about migration for load balancing. GIGA+ rests on two internal implementation tenets, asynchrony and eventual consistency, which allow it to (1) partition an index among all servers without synchronization or serialization, and (2) gracefully tolerate stale index state at the clients. Applications, however, are still provided traditional strong, synchronous consistency semantics.
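
To make the two tenets concrete, here is a minimal, single-process sketch of one way such a design could behave. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the partitioning scheme, so the hash-based splitting, the names `Partition`, `Directory`, `Client`, and `locate`, and the constants `SPLIT_THRESHOLD` and `MAX_DEPTH` are all hypothetical. A partition splits using only its own entry count (a local decision), and a client whose view of the index is stale is corrected by whichever partition it contacts, then retries.

```python
import hashlib

SPLIT_THRESHOLD = 4   # hypothetical: split once a partition holds more entries than this
MAX_DEPTH = 16        # hypothetical: number of hash bits available for splitting


def name_hash(name):
    """Stable hash of a file name, truncated to MAX_DEPTH bits."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") & ((1 << MAX_DEPTH) - 1)


def locate(known, h):
    """Pick the deepest partition index in `known` whose low bits match hash h."""
    for depth in range(MAX_DEPTH, -1, -1):
        index = h & ((1 << depth) - 1)
        if index in known:
            return index
    raise ValueError("partition 0 must always be known")


class Partition:
    """Holds the names whose hash has `index` in its low `depth` bits."""

    def __init__(self, index, depth):
        self.index = index
        self.depth = depth
        self.entries = set()
        self.known = {index}  # local view: itself plus the children it created

    def owns(self, h):
        return h & ((1 << self.depth) - 1) == self.index


class Directory:
    """Toy stand-in for the cluster; each partition would live on some server."""

    def __init__(self):
        self.partitions = {0: Partition(0, 0)}

    def create(self, index, name):
        """Try to create `name` in partition `index`; return (ok, index hint)."""
        part = self.partitions[index]
        if not part.owns(name_hash(name)):
            return False, set(part.known)  # caller's view is stale: send a hint
        part.entries.add(name)
        if len(part.entries) > SPLIT_THRESHOLD and part.depth < MAX_DEPTH:
            self._split(part)  # decided from this partition's state alone
        return True, set(part.known)

    def _split(self, part):
        child = Partition(part.index + (1 << part.depth), part.depth + 1)
        part.depth += 1
        moved = {n for n in part.entries if not part.owns(name_hash(n))}
        part.entries -= moved
        child.entries = moved
        part.known.add(child.index)
        self.partitions[child.index] = child


class Client:
    """Caches a possibly stale set of partition indices and retries on a miss."""

    def __init__(self, directory):
        self.directory = directory
        self.known = {0}
        self.retries = 0

    def create(self, name):
        h = name_hash(name)
        while True:
            ok, hint = self.directory.create(locate(self.known, h), name)
            self.known |= hint  # merge whatever the contacted partition knows
            if ok:
                return
            self.retries += 1


if __name__ == "__main__":
    directory = Directory()
    writer = Client(directory)
    for i in range(200):
        writer.create(f"file{i}")
    late = Client(directory)  # starts with the stale view {0}
    for i in range(50):
        late.create(f"late{i}")
    print(len(directory.partitions), "partitions;", late.retries, "retries by the stale client")
```

In this sketch no client is ever told about a split in advance. A stale client simply addresses a partition that no longer owns the name, receives that partition's local view, and retries against a deeper partition, so each create still completes synchronously from the application's point of view.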