QFS improves performance of Hadoop file system

Open source file system by Quantcast

A new open source file system that takes up half the space and runs significantly faster than HDFS is now available for Hadoop thanks to a firm named Quantcast. Their Quantcast File System (QFS) is being released today under an Apache 2 license and is immediately available for free download on GitHub.

If you’re one of those grumblers (I admit to it) who complains about the widespread tracking of web users for marketing purposes, you can pause to thank Quantcast for funding this significant advance out of their own pockets as a big data company in the advertising space. They started using Hadoop when they launched in 2006, storing a terabyte of data on web audiences each day. Now, using QFS as their primary data store, they add 40 terabytes of new data and their daily Hadoop processing can exceed 20 petabytes.

As they grew, Quantcast tweaked and enhanced the various tools in the Hadoop chain. In 2008, they adopted the Kosmos File System (KFS) and hired its lead developer, Sriram Rao. After much upgrading for reliability, scalability, and manageability, they are now releasing the file system to the public as QFS. They hope other large-scale Hadoop users will evaluate and adopt it for their own big data processing needs and collaborate on its ongoing development. The source code is available on GitHub, along with prebuilt binaries for several popular Linux distributions.

The key enhancement in QFS seems simple in retrospect, but it was tricky to implement. Standard HDFS achieves fault tolerance by storing three complete copies of each file; QFS instead uses Reed-Solomon erasure coding, a technique in wide use since the 1980s in products such as CDs and DVDs.

According to Jim Kelly, vice president of R&D at Quantcast, HDFS’s optimization approach was well chosen when it was invented. Networks were relatively slow, so data locality was important, and HDFS tried to store a complete copy of each file on the node most likely to access it. But in intervening years, networks have grown tenfold in speed, leaving disks as the major performance bottleneck, so it’s now possible to achieve better performance, fault tolerance, and disk space efficiency by distributing data more widely.

The form of Reed-Solomon encoding used in QFS stripes each file across 9 disks, 6 stripes of data plus 3 stripes of parity, and can reconstruct the file from any 6 of the 9 stripes. Whereas HDFS can lose a file if the 3 disks holding its replicas happen to fail, QFS survives the loss of any 3 of the 9 disks holding a file.
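To make the striping-and-reconstruction idea concrete, here is a toy Python sketch. QFS's actual codec does Reed-Solomon arithmetic over a Galois field, which is more than a short example can carry, so this uses single XOR parity, the simplest member of the same erasure-coding family, over 6 data stripes. The function names are illustrative, not QFS APIs:

```python
from functools import reduce

def stripe(data: bytes, n: int = 6) -> list:
    """Split data into n equal-sized stripes, zero-padding the tail."""
    size = -(-len(data) // n)  # ceiling division
    padded = data.ljust(size * n, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(n)]

def xor_parity(stripes: list) -> bytes:
    """One parity stripe: byte-wise XOR across all stripes."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*stripes))

def reconstruct(stripes: list, parity: bytes, lost: int) -> bytes:
    """Rebuild the lost stripe by XOR-ing parity with the survivors."""
    survivors = [s for i, s in enumerate(stripes) if i != lost]
    return xor_parity(survivors + [parity])

data = b"hello, quantcast qfs demo!"
parts = stripe(data)                 # 6 data stripes
parity = xor_parity(parts)           # 1 parity stripe
recovered = reconstruct(parts, parity, lost=2)
assert recovered == parts[2]         # lost stripe rebuilt from the rest
```

Real Reed-Solomon generalizes this: with 3 independent parity stripes instead of 1, any 3 of the 9 stripes can be lost and still recovered.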

More importantly, Reed-Solomon adds only 50% to the size of the data stored, making it twice as efficient as HDFS in terms of storage space, which also has ripple effects in savings on servers, power, cooling, and more.
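The arithmetic behind the "twice as efficient" claim is straightforward. A quick sketch, taking the 6-data/3-parity split implied by "9 stripes, any 6 of which suffice":

```python
# Bytes physically stored per logical byte of data.
# HDFS: 3 full replicas.
# QFS:  9 stripes where any 6 reconstruct the file, i.e. 6 data + 3 parity.
replication_factor = 3
rs_total, rs_needed = 9, 6

hdfs_stored = replication_factor * 1.0   # 3.0 bytes per logical byte
qfs_stored = rs_total / rs_needed        # 9/6 = 1.5 bytes per logical byte

hdfs_overhead_pct = (hdfs_stored - 1) * 100   # 200% extra space
qfs_overhead_pct = (qfs_stored - 1) * 100     # 50% extra, per the article
```

At 1.5x versus 3.0x the raw data size, QFS needs half the disk of HDFS for the same payload, which is where the server, power, and cooling savings follow from.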

Furthermore, the technique increases performance: writes are faster, because only half as much data needs to be written, and reads are faster, because every read is done by six drives working in parallel. Quantcast’s benchmarks of Hadoop jobs using HDFS and QFS show a 47% performance improvement in reads over HDFS, and a 75% improvement in writes.
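A rough model shows where the parallel-read gain comes from. The disk throughput figure below is an assumed round number for illustration, not from Quantcast's benchmarks, and the model ignores seek and network costs:

```python
# Time to read one 6 GB file sequentially.
disk_mb_per_s = 100.0   # assumed streaming throughput of a single disk
file_mb = 6000.0

hdfs_read_s = file_mb / disk_mb_per_s          # one replica read from one disk
qfs_read_s = (file_mb / 6) / disk_mb_per_s     # 6 stripes read in parallel
```

Writes benefit twice over: only 1.5x the file size hits disk instead of 3x, and those bytes are spread across more spindles working concurrently.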

QFS is also somewhat more efficient because it is written in C++ rather than Java; Hadoop communicates with it through existing JNI bindings.

Quantcast expects QFS to be of most interest to established Hadoop shops processing enough data that cost-efficient use of hardware is a significant concern. Smaller environments, those new to Hadoop, or those needing specific HDFS features will probably find HDFS a better fit. They have done intensive testing internally, running QFS in production for over a year, so now it’s time to see how the code holds up in a wider public test.

  • BDD

    This is not an entirely accurate depiction. It is true that Reed-Solomon will reduce the storage requirement, but how will it improve the speed of data access? Anyone who knows storage knows that the major delay associated with disks is latency. When data is written to or read from disk, this latency is the main drawback, and spreading the writes/reads across more disks doesn't solve it. In fact, when you apply a Reed-Solomon encoder and save data to 9 different machines/disks, you are going to incur disk latency in accessing the disk block at all 9 places. In the case of standard HDFS with a replication factor of three, the write has to wait out disk latency at three places. If you do the writes/reads in parallel, the best you can hope for is the latency of a single disk read/write, and so it is going to be the same for any system with or without Reed-Solomon.

    Another disadvantage of Reed-Solomon is that it is computationally heavy and takes extra CPU cycles to do the encoding on the client side.
    So overall Reed-Solomon reduces data replication but doesn't make data access faster.

    • Charles

      That’s not what their performance tests are showing, I’m afraid.

      • BDD

        There has to be a logical explanation for any performance gain. Otherwise the results are suspicious and the testing method needs to be examined.

        • BJ

          That would be true if the data were stored on a single disk. However, since the 9 seeks occur on 9 different disks, the overall seek latency is the same. Each disk also needs to read less data, so the read goes faster as well. And when working with data at this scale, there is far more sequential reading than seeking, so the increased speed of sequential reads is in fact quite noticeable.