It takes a technical expert to keep the time on thousands of servers closely aligned with each other and with authoritative time sources. FSMLabs automates this process with TimeKeeper.
Timekeeping is an essential ingredient in cloud computing. A time distribution network is as important to the cloud as computing, storage, and networking. Why? The timestamp.
What has a timestamp got to do with data integrity? Yodaiken explained how workloads no longer run on a single computer. "You can't just use one computer -- you have to use multiple." This means a single transaction has to be synchronized across many computers. "If their times drift off from each other, trying to relate what they've done to the same record is going to be very difficult."
Making sense of many transactions across many computers requires accurate timekeeping. "One of the big uses of our technology in financial firms is distributed trading platforms. They have multiple gateways to multiple exchanges. They want to be able to put together a sequential order book -- they want to know what they did during the day. If you don't have accurate time, everything is going to look very weird."
Yodaiken gave an example of "two computers together in a rack, six inches away from each other. One sends out an order to an exchange, and a confirmation comes back, which is picked up by the second one. If their times are microseconds out, you can easily conclude you did the trade backwards. You got the confirmation before you did the order, which is really going to mix up your trading algorithms."
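The rack example can be sketched in a few lines of Python. This is only an illustration: the `Event` class, the host names, and the 50-microsecond round trip with 40-microsecond clock errors are made-up values chosen to show the effect, not figures from FSMLabs.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One logged event (hypothetical structure, for illustration only)."""
    host: str
    name: str
    true_time_us: int    # when it really happened, on an ideal reference clock
    clock_error_us: int  # how far this host's clock is from the reference

    @property
    def stamped_us(self) -> int:
        # The timestamp the host actually writes into its log.
        return self.true_time_us + self.clock_error_us


def apparent_order(events):
    """Order events the way a log merger would: by recorded timestamp."""
    return [e.name for e in sorted(events, key=lambda e: e.stamped_us)]


# Host A sends the order at t=0; host B sees the confirmation 50 us later.
# Assumed errors: A's clock runs 40 us fast, B's clock runs 40 us slow.
order = Event("A", "order_sent", true_time_us=0, clock_error_us=+40)
confirm = Event("B", "confirmation_received", true_time_us=50, clock_error_us=-40)

print(apparent_order([order, confirm]))
# With both clock errors set to 0, the same events sort in their true order.
```

With these assumed numbers the order is stamped at 40 microseconds and the confirmation at 10, so the merged log shows the confirmation first, which is exactly the "trade backwards" conclusion Yodaiken describes.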
What kind of technology provides the split-second time necessary to avoid these problems? What helps software synchronize distributed workloads?
Time hardware, software, and protocols
It takes a technical expert to keep the time on thousands of servers closely aligned with each other and with authoritative time sources. The time expert builds a distribution network from many hardware and software building blocks.
The authoritative sources of time are a collection of atomic clocks run by government agencies around the world, such as the NIST-F1 in Boulder, Colorado. A collection of hardware, software, and protocols distributes this time to the world's computers.
Enterprise computers run time clients, which talk to time servers using a protocol like Network Time Protocol (NTP) or Precision Time Protocol (PTP). Time clients regularly check with the time servers and update their own clocks. Time servers often get their time from horribly complicated radio technologies like Global Positioning System (GPS) and Code Division Multiple Access (CDMA).
NTP has been synchronizing time across computers for decades. Most virtual machines (VMs) run an NTP client that regularly checks with a higher authority and resets the clock to sub-second accuracy. This is handy because a VM has a pretty shaky grip on reality -- if its host freezes the VM, it is unaware of time passing; when it springs back to life, its clock has fallen behind.
PTP is a newer alternative to NTP. Its selling point is accuracy: while NTP can be accurate to milliseconds, PTP can be accurate to microseconds.
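How does a time client work out how far off its clock is? A minimal sketch of the four-timestamp calculation commonly described for NTP follows. The function name and the sample numbers are illustrative assumptions, and the arithmetic assumes the network delay is the same in both directions, which real networks only approximate.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """
    Estimate clock offset and round-trip delay from one request/response.

    t0: client sends request     (read from the client clock)
    t1: server receives request  (read from the server clock)
    t2: server sends response    (read from the server clock)
    t3: client receives response (read from the client clock)

    Assumes a symmetric path: equal delay out and back.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2  # how far the client is behind the server
    delay = (t3 - t0) - (t2 - t1)         # time on the wire, excluding server processing
    return offset, delay


# Illustrative values in milliseconds: the client clock is 250 ms behind,
# each network leg takes 10 ms, and the server spends 2 ms on the request.
offset_ms, delay_ms = ntp_offset_and_delay(
    t0=100_000, t1=100_260, t2=100_262, t3=100_022
)
print(offset_ms, delay_ms)  # 250.0 20
```

If the two legs of the trip are not equal, the difference shows up directly as error in the offset estimate, which is one reason the quality of the distribution network matters as much as the protocol.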
Yodaiken described how even sophisticated users can find their time distribution networks hard to manage. "One of our customers in New York is a large bank. They had -- supposedly -- a tier-1 service that they could never get to settle properly. They ran a time map [a TimeKeeper component] and got a picture. What the picture showed was that the server was pulling time from GPS. The GPS was failing occasionally and had a backup that was going across the Atlantic on a phone line."
FSMLabs has also seen problems with the NTP and PTP protocols. "There is a standard error that we see with both of these things, where the time comes down from the GPS satellite. It comes down in epoch time -- the time since some date -- the number of nanoseconds. To get it to real time you have to make adjustments for the leap seconds that have [been] added over the years. There are currently 35 seconds since the epoch. What will happen is every now and then the devices that bring GPS time down and broadcast to the network will lose a little bit of memory, forget to compensate for the leap seconds and are 34 or 35 seconds off."
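The leap-second adjustment in that quote is simple arithmetic, which is what makes the failure easy to recognize. The sketch below is a simplification with hypothetical function names: it assumes the incoming epoch time runs ahead of civil time by a fixed count of leap seconds, and it uses the 35-second figure from the quote, which changes whenever a leap second is added.

```python
LEAP_SECONDS = 35  # figure quoted above; not a permanent constant


def to_civil_seconds(epoch_seconds, leap_seconds=LEAP_SECONDS):
    """Convert a continuous epoch count to civil time by removing leap seconds."""
    return epoch_seconds - leap_seconds


def looks_like_missed_leap_correction(reported, reference,
                                      leap_seconds=LEAP_SECONDS, tolerance=1.0):
    """
    Flag a time source whose error is suspiciously close to the leap-second
    count, the '34 or 35 seconds off' signature described in the quote.
    """
    error = abs(reported - reference)
    return abs(error - leap_seconds) <= tolerance


# Illustrative values: a reference civil time and the raw epoch count for it.
reference = 1_400_000_000
raw_epoch = reference + LEAP_SECONDS

healthy = to_civil_seconds(raw_epoch)  # device applies the correction
faulty = raw_epoch                     # device forgot the correction

print(looks_like_missed_leap_correction(healthy, reference))  # False
print(looks_like_missed_leap_correction(faulty, reference))   # True
```

A monitoring check along these lines cannot fix the faulty device, but it can separate this specific failure from ordinary drift, which tends to be far smaller than tens of seconds.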
FSMLabs wants to automate the time expert
FSMLabs provides hardware and software for time distribution. Yodaiken described the three types of product in its system. "Client software goes into the application server and brings in the time over the network. Server software essentially bridges networks or brings in time from a GPS device or provides fault tolerance, all the rest of it. We also sell appliances which bring in GPS time."
Yodaiken said their goal with TimeKeeper is to improve the quality of time distribution. In the traditional enterprise network, Yodaiken said, "You'd have some sort of cobbled-together improvised network that's not reliable and is really hard to maintain. We're trying to replace all that with these snap-together parts that you don't have to be an expert to use -- because all the expertise is in the part. It does the job, and it lets people work on something else."