In 2010, then-Google CEO Eric Schmidt claimed that five exabytes of information were created between the dawn of civilization and 2003, and that five exabytes of data are now generated "every two days," though that statistic probably falls short of reality. In a similar vein, Mark Zuckerberg said in November 2014 that "If you fast-forward five years, probably most of [Facebook] is going to be video." However, the data being generated today is not equal to the first five exabytes of recorded history.
The key difference is value. A substantial amount of video comes from people using smartphones to record fireworks, concerts, or other public events, and the results are often of such low quality as to be nearly unwatchable. Combine that with the access restrictions that come with social media privacy protections, such as friends-only viewing privileges, and the result is data that needs to be stored but is accessed only rarely, if ever.
For data center engineers, this prompts a challenging question: How should infrequently accessed data such as social media videos, system backups, or old system logs be stored? How can you store this data in a way that saves both money and physical space? This month, two potential answers to that question have been introduced.
World's first 10 TB drive coming later this year
HGST's enterprise drives differ from other vendors' drives in that they are hermetically sealed and filled with helium, which allows more platters to fit in a standard 3.5" form factor than in air-filled drives. HGST's confidence in this technology is high: it announced in September 2014 that all future enterprise drives will use the HelioSeal technology. The next generation of HelioSeal drives combines this design with a new recording method: Shingled Magnetic Recording, or SMR.
SMR allows data to be written on partially overlapping tracks. This addresses a limitation of conventional hard drives: platter densities have become so high that the magnetic write head cannot be shrunk further without losing the ability to flip the magnetization of a bit. SMR works around this by overlapping adjacent tracks like roof shingles; because the read head is narrower than the write head, the exposed part of each track can still be read by normal means. Writing takes a performance hit, however: writing one track overwrites part of its neighbor, so multiple tracks have to be rewritten together to prevent data loss. As such, SMR is much better suited to continuous writing or erasing than to random writes or updates. It is a good fit for write-once binary blobs like video, but not for high-rewrite workloads such as SQL databases.
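The write penalty can be illustrated with a deliberately simplified model. The sketch below assumes tracks are grouped into bands, and that updating one track forces a rewrite of every later track in the same band. The function names and the band size are hypothetical, and real drive firmware uses caching and other strategies not modeled here.

```python
def tracks_to_rewrite(band_size: int, track_index: int) -> int:
    """Tracks that must be written to update one track in a shingled band.

    Simplified model: each track partially overlaps the next one, so
    rewriting track `track_index` damages every later track in the band,
    and those tracks have to be written again as well.
    """
    if not 0 <= track_index < band_size:
        raise ValueError("track_index must lie inside the band")
    return band_size - track_index


def average_rewrite_cost(band_size: int) -> float:
    """Mean tracks written per update if updates hit tracks uniformly."""
    total = sum(tracks_to_rewrite(band_size, i) for i in range(band_size))
    return total / band_size


if __name__ == "__main__":
    band = 8  # hypothetical band size, chosen only for illustration
    print("update first track:", tracks_to_rewrite(band, 0), "tracks written")
    print("append at the end: ", tracks_to_rewrite(band, band - 1), "track written")
    print("random update avg: ", average_rewrite_cost(band), "tracks written")
```

Under this model, appending at the end of a band costs a single track write, while a random update costs several, which is why sequential, write-once data suits SMR best.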
HGST's 10 TB drive, planned for release in the second half of 2015, is not the first SMR drive on the market, but it has substantial benefits over the largest comparable drive from Seagate. Besides being 2 TB larger than the Seagate alternative, it spins at 7200 RPM rather than the Seagate model's 5900 RPM. HGST also says that HelioSeal drives run 4-5°C cooler than other drives and claims a 2.5 million hour MTBF, compared with Seagate's 800,000 hour rating for its SMR drives. In addition, Seagate's warranty is three years, while HGST provides a five-year warranty for HelioSeal drives.
Google Cloud Nearline beta bests Amazon Glacier
This month, Google opened the beta of Nearline Storage in the Google Cloud Platform. Nearline is a new storage class that has slightly lower availability and slightly higher latency than standard storage, but at a much cheaper rate: Nearline Storage costs $0.01 per GB per month, whereas standard storage costs $0.026 per GB per month. Nearline operates using the same APIs and models as the rest of Cloud Platform storage, and converting existing buckets to Nearline is a trivial operation.
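To put the two rates in perspective, here is a minimal sketch of the monthly storage bill at each rate. It uses only the per-GB prices quoted above; the function names are made up for illustration, and request, retrieval, and egress charges are ignored.

```python
from decimal import Decimal

NEARLINE_RATE = Decimal("0.01")   # USD per GB per month, as quoted above
STANDARD_RATE = Decimal("0.026")  # USD per GB per month, as quoted above


def monthly_cost(gigabytes: int, rate: Decimal) -> Decimal:
    """Storage-only cost for one month; ignores request and egress fees."""
    return gigabytes * rate


def monthly_savings(gigabytes: int) -> Decimal:
    """Difference between standard and Nearline storage for one month."""
    return monthly_cost(gigabytes, STANDARD_RATE) - monthly_cost(gigabytes, NEARLINE_RATE)


if __name__ == "__main__":
    stored_gb = 10_000  # hypothetical archive size
    print("standard:", monthly_cost(stored_gb, STANDARD_RATE))
    print("nearline:", monthly_cost(stored_gb, NEARLINE_RATE))
    print("savings: ", monthly_savings(stored_gb))
```

`Decimal` is used instead of floats so that the currency arithmetic stays exact.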
Nearline is positioned as a pseudo-competitor to Amazon's Glacier service, but with crucial differences. Glacier has a retrieval time measured in hours, as opposed to about three seconds with Nearline. Glacier also has steep penalties ($0.03 per GB) for data deleted before 90 days have passed, whereas Nearline bills usage for a full 30 days, even if data is deleted before that time.
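The two early-deletion policies can be compared with a rough sketch. It takes the description above at face value: a flat $0.03 per GB Glacier fee for deletion before 90 days, and a 30-day minimum billing period for Nearline at $0.01 per GB per month. The 30-day month, the daily pro-rating of Nearline storage, and the function names are assumptions made for illustration; the providers' actual fee schedules may differ, for example by pro-rating the Glacier fee.

```python
from decimal import Decimal

NEARLINE_RATE = Decimal("0.01")      # USD per GB per month
NEARLINE_MIN_DAYS = 30               # minimum billed storage duration
GLACIER_EARLY_FEE = Decimal("0.03")  # USD per GB, as described above
GLACIER_MIN_DAYS = 90
DAYS_PER_MONTH = 30                  # assumption: flat 30-day billing month


def nearline_storage_charge(gigabytes: int, days_stored: int) -> Decimal:
    """Storage charge with the 30-day minimum applied."""
    billed_days = max(days_stored, NEARLINE_MIN_DAYS)
    return gigabytes * NEARLINE_RATE * billed_days / DAYS_PER_MONTH


def glacier_early_deletion_fee(gigabytes: int, days_stored: int) -> Decimal:
    """Penalty only; assumes a flat per-GB fee before the 90-day mark."""
    if days_stored >= GLACIER_MIN_DAYS:
        return Decimal("0")
    return gigabytes * GLACIER_EARLY_FEE


if __name__ == "__main__":
    gb, days = 1_000, 10  # hypothetical: 1,000 GB deleted after 10 days
    print("Nearline storage charge:   ", nearline_storage_charge(gb, days))
    print("Glacier early-deletion fee:", glacier_early_deletion_fee(gb, days))
```

Note that the Glacier figure is the penalty alone, since Glacier's storage rate is not given here.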
Amazon allows Glacier storage users to retrieve up to "5% of your average monthly storage, pro-rated daily, for free each month," but users who want their data quickly, and in amounts above the free tier, should prepare to pay according to a labyrinthine and confusing formula. Google's billing is much more streamlined, with egress rates worldwide (excluding China and Australia) of $0.12 per GB for the first TB and lower rates after that.
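As a rough illustration of the two figures quoted above, the sketch below computes the daily free Glacier retrieval allowance and the Google egress charge within the first TB. The 30-day month, the 1 TB = 1024 GB conversion, and the function names are assumptions. Glacier's fees above the free allowance and Google's rates beyond the first TB are not given here, so they are not modeled.

```python
from decimal import Decimal

FREE_FRACTION = Decimal("0.05")         # 5% of average monthly storage
EGRESS_RATE_FIRST_TB = Decimal("0.12")  # USD per GB, first TB only
FIRST_TIER_GB = 1024                    # assumption: 1 TB = 1024 GB


def glacier_free_gb_per_day(avg_stored_gb: int, days_in_month: int = 30) -> Decimal:
    """Free retrieval allowance per day: 5% of storage, pro-rated daily."""
    return avg_stored_gb * FREE_FRACTION / days_in_month


def google_egress_cost(gigabytes: int) -> Decimal:
    """Egress charge for transfers that stay within the first TB."""
    if gigabytes > FIRST_TIER_GB:
        raise ValueError("rates beyond the first TB are not given in the article")
    return gigabytes * EGRESS_RATE_FIRST_TB


if __name__ == "__main__":
    print("Glacier free GB/day on 12,000 GB stored:", glacier_free_gb_per_day(12_000))
    print("Google egress cost for 500 GB:          ", google_egress_cost(500))
```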
How can this be leveraged?
For the example scenario of storing and retrieving video, some creative programming is needed to use Nearline without degrading performance for the end user. One option is to keep the first few seconds of each video in standard, high-availability cloud storage and switch the video source to the Nearline copy once playback has started. In practice this generates a second GET request per video, which in aggregate would increase the bill. At the reduced storage rates, though, the added request cost should be small enough to leave a net savings for your cloud deployment.
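The split described above can be sketched as a small planning function that decides which byte ranges to fetch from which storage class. This is only an outline under stated assumptions: `RangeRequest`, `plan_requests`, and the tier labels are hypothetical names, not part of any Google API, and the actual fetching (for example, HTTP range requests) is left out.

```python
from typing import List, NamedTuple


class RangeRequest(NamedTuple):
    tier: str   # "standard" or "nearline" (plain labels, not API values)
    start: int  # first byte, inclusive
    end: int    # last byte, inclusive


def plan_requests(video_bytes: int, prefix_bytes: int) -> List[RangeRequest]:
    """Split a video into a fast-start prefix and a Nearline remainder."""
    if video_bytes <= 0:
        raise ValueError("video_bytes must be positive")
    if prefix_bytes <= 0:
        raise ValueError("prefix_bytes must be positive")
    # The prefix is served from standard storage so playback starts at once.
    prefix_end = min(prefix_bytes, video_bytes) - 1
    requests = [RangeRequest("standard", 0, prefix_end)]
    # Anything past the prefix comes from the Nearline copy (the second GET).
    if prefix_end < video_bytes - 1:
        requests.append(RangeRequest("nearline", prefix_end + 1, video_bytes - 1))
    return requests


if __name__ == "__main__":
    for request in plan_requests(video_bytes=10_000_000, prefix_bytes=1_000_000):
        print(request)
```

The prefix should be long enough to cover Nearline's retrieval latency of about three seconds, so the second request can complete before the player runs out of buffered video.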
What's your view?
What applications can you think of for these new storage and cloud solutions? Are the reduced costs attractive enough to migrate some of your data to these solutions? Tell us your thoughts in the comments.
- 10-terabyte hard drive coming soon to a server near you (ZDNet)
- HGST goes all in on helium drives (ZDNet)
- Google Cloud Platform adds new on-demand data storage option (ZDNet)
- Amazon launches Glacier cloud storage, hopes enterprise will go cold on tape use (ZDNet)
Note: TechRepublic and ZDNet are CBS Interactive properties.
James Sanders is a Java programmer specializing in software as a service and thin client design, and virtualizing legacy programs for modern hardware.