If your company is looking to make a bet on big data in the cloud, follow these best practices to find out what technologies will be best for your AWS deployment.
On Tuesday, at the 2016 AWS re:Invent conference, Siva Raghupathy, senior manager for solutions architecture at Amazon Web Services, shared five architectural principles that could make it easier for your business to build out a big data solution in the cloud.
Big data is all about the three Vs: volume, velocity, and variety of data. However, those Vs are increasing dramatically, Raghupathy said. And, as big data has evolved from batch processing to stream processing to machine learning, the considerations for building a system have become much more complex.
Raghupathy said that when he speaks with customers, the core challenges usually center on four questions: whether there is a reference architecture, which tool should be used, how it should be used, and why it should be used. To better address these questions, Raghupathy outlined the following five architectural principles for building big data solutions in the cloud. AWS customers can then apply these principles as they choose products to fit their use case.
1. Building decoupled systems is key
When Raghupathy talked about building decoupled systems, he gave the example of an automobile. The engine and wheels in a car are separate, but they are connected by the gearbox, which helps them work together.
In big data, that decoupling mechanism is the storage subsystem. Decoupled systems are important because they allow you to alter one part of the system without affecting the others.
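To make the idea concrete, here is a minimal Python sketch of two components that never call each other and communicate only through a shared store. Everything in it is an assumption for illustration: `InMemoryStore`, `ingest`, `count_by_page`, and the `clicks` topic are hypothetical names, and the in-memory store merely stands in for whatever storage service a real deployment would use.

```python
from collections import defaultdict


class InMemoryStore:
    """Hypothetical stand-in for a shared storage layer (not an AWS API)."""

    def __init__(self):
        self._records = defaultdict(list)

    def put(self, topic, record):
        self._records[topic].append(record)

    def read(self, topic, offset=0):
        # Return a copy so readers cannot mutate what is stored.
        return list(self._records[topic][offset:])


def ingest(store, raw_lines):
    """Producer: knows only how to write to the store."""
    for line in raw_lines:
        store.put("clicks", line.strip())


def count_by_page(store):
    """Consumer: knows only how to read from the store."""
    counts = defaultdict(int)
    for record in store.read("clicks"):
        page = record.split(",")[1]
        counts[page] += 1
    return dict(counts)


store = InMemoryStore()
ingest(store, ["u1,/home\n", "u2,/pricing\n", "u3,/home\n"])
page_counts = count_by_page(store)
print(page_counts)  # {'/home': 2, '/pricing': 1}
```

Because the producer and consumer share only the store's `put` and `read` contract, either one could be rewritten or replaced without touching the other.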
2. Use the right tool for the job
Amazon has many different products for big data systems, so it can be very difficult to choose the right one. However, Raghupathy said that data structure, latency, throughput, and access patterns are all considerations that should be taken into account when you choose a tool.
The data itself can be transactional (in-memory data structures, database records), files (log files, search documents, messages), or events (data streams and messages). Also consider the volume of your data and how often it will be requested.
3. Leverage AWS managed services
Businesses should use AWS managed services, Raghupathy said, because they are scalable and elastic, available, reliable, secure, and low on administration. The services handle the management tasks so the customer can focus on building their own products or tools to remain competitive and go to market faster. Ultimately, this adds business agility, Raghupathy said.
4. Use log-centric design patterns
With the low cost of storage, Raghupathy said, most organizations have no need to delete any of their data. This goes along with the big data adage: "Data is gold," Raghupathy said.
Because of this, organizations should build their big data system in a log-centric fashion. Immutable log files, which are protected from tampering, can help IT keep a record of the original data in case anything happens to their system.
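As a rough illustration of the log-centric idea, the Python sketch below keeps events in an append-only log and rebuilds derived state by replaying it. The hash chaining is one possible way to make tampering detectable; it is an assumption of this sketch and not something Raghupathy described. `AppendOnlyLog` and `replay_balance` are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


class AppendOnlyLog:
    """Hypothetical append-only log; each entry is chained to the previous hash."""

    def __init__(self):
        self._entries = []

    def append(self, event):
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        self._entries.append({"payload": payload, "prev": prev_hash, "hash": digest})

    def entries(self):
        # Decode fresh copies so callers cannot alter stored events.
        return [json.loads(e["payload"]) for e in self._entries]

    def verify(self):
        """Recompute the chain; any edited entry breaks it."""
        prev_hash = GENESIS
        for e in self._entries:
            expected = hashlib.sha256(
                (prev_hash + e["payload"]).encode("utf-8")
            ).hexdigest()
            if e["prev"] != prev_hash or e["hash"] != expected:
                return False
            prev_hash = e["hash"]
        return True


def replay_balance(log):
    """Rebuild derived state from the original events."""
    return sum(event["amount"] for event in log.entries())


log = AppendOnlyLog()
log.append({"type": "deposit", "amount": 100})
log.append({"type": "withdrawal", "amount": -30})
balance = replay_balance(log)  # 70
intact = log.verify()          # True
```

Since the log is never rewritten, any downstream view (here, a running balance) can be thrown away and recomputed from the original records.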
5. Be cost conscious
Big data shouldn't mean big cost, Raghupathy said. When he is in a consultation, he usually doesn't let it go past 20 minutes before he begins calculating the cost of a solution that has been built.
Either the business will realize that it is right on track, or the bill will be too high. In the latter case, you'll need to consider some different products. However, Raghupathy said, the lower-cost tools are often the most used.
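A back-of-the-envelope cost comparison of this kind can be sketched in a few lines of Python. All names and prices below are hypothetical placeholders chosen for illustration, not actual AWS pricing; the point is only that data volume and request rate together determine which option is cheaper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """A candidate storage option with made-up unit prices (USD)."""
    name: str
    storage_per_gb_month: float
    per_million_requests: float


def monthly_cost(option, stored_gb, requests_millions):
    """Estimated monthly bill: storage charge plus request charge."""
    return (option.storage_per_gb_month * stored_gb
            + option.per_million_requests * requests_millions)


def cheapest(options, stored_gb, requests_millions):
    return min(options, key=lambda o: monthly_cost(o, stored_gb, requests_millions))


# Hypothetical price points: cheap storage with costly requests, and the reverse.
options = [
    Option("object_store", storage_per_gb_month=0.02, per_million_requests=5.00),
    Option("key_value_db", storage_per_gb_month=0.25, per_million_requests=0.50),
]

# Large, rarely read data set versus small, heavily read data set.
cold_choice = cheapest(options, stored_gb=10_000, requests_millions=1)
hot_choice = cheapest(options, stored_gb=10, requests_millions=1_000)
print(cold_choice.name, hot_choice.name)  # object_store key_value_db
```

With these placeholder prices, the large and rarely read data set favors the option with cheap storage, while the small and heavily read one favors the option with cheap requests.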