5 architectural principles for building big data systems on AWS

If your company is looking to make a bet on big data in the cloud, follow these best practices to determine which technologies will work best for your AWS deployment.

On Tuesday, at the 2016 AWS re:Invent conference, Siva Raghupathy, senior manager for solutions architecture at Amazon Web Services, shared five architectural principles that could make it easier for your business to build out a big data solution in the cloud.

Big data is all about the three Vs: volume, velocity, and variety of data. However, those Vs are increasing dramatically, Raghupathy said. And as big data has evolved from batch processing to stream processing to machine learning, the considerations for building a system have become much more complex.

When speaking with customers, Raghupathy said, the core challenges usually center on whether there is a reference architecture, which tool should be used, and how and why it should be used. To better address these questions, Raghupathy outlined the following five architectural principles for building big data solutions in the cloud. AWS customers can then apply these principles as they choose products to fit their use case.

1. Building decoupled systems is key

When Raghupathy talked about building decoupled systems, he gave the example of an automobile. The engine and wheels in a car are separate, but they are connected by the gearbox, which helps them work together.

In big data, though, that decoupling mechanism is the storage subsystem. Decoupled systems in big data are important because they allow you to alter one part of the system without affecting the others.
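To make the idea concrete, here is a minimal sketch assuming Amazon S3 as the shared storage layer; the bucket name, key prefix, and function names are placeholders for illustration, not anything Raghupathy presented.

```python
# A minimal sketch of decoupling through a storage layer (Amazon S3 assumed).
# Bucket name and key prefix are hypothetical placeholders.
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "example-data-lake"      # hypothetical bucket
PREFIX = "raw/clickstream/"       # hypothetical key prefix


def ingest(record: dict, record_id: str) -> None:
    """Producer side: write raw records to storage and nothing else."""
    s3.put_object(
        Bucket=BUCKET,
        Key=f"{PREFIX}{record_id}.json",
        Body=json.dumps(record).encode("utf-8"),
    )


def process() -> None:
    """Consumer side: read from the same bucket, independent of the producer."""
    pages = s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX)
    for page in pages:
        for obj in page.get("Contents", []):
            record = json.loads(s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read())
            # ...transform, aggregate, or load into an analytics store here...
```

Because the producer and consumer share only the bucket, either side can be rewritten, scaled, or replaced without touching the other.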

2. Use the right tool for the job

Amazon has many different products for big data systems, so it can be difficult to choose the right one. However, Raghupathy said that data structure, latency, throughput, and access patterns should all be taken into account when choosing a tool.

The data itself can be transactional (in-memory data structures, database records), files (log files, search documents, messages), or events (data streams and messages). Also consider the volume of your data and how frequently it will be requested.
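As an illustration only, and not an official AWS decision tree, a rough mapping from data shape and access pattern to candidate services might look like the following; the categories and service pairings are assumptions made for the example.

```python
# Illustrative mapping of data shape and access pattern to candidate services.
# The pairings are an example, not official AWS guidance.
CANDIDATE_SERVICES = {
    ("transactional", "low-latency lookups"): ["Amazon DynamoDB", "Amazon ElastiCache"],
    ("transactional", "complex SQL queries"): ["Amazon RDS", "Amazon Redshift"],
    ("files", "batch analytics"): ["Amazon S3 + Amazon EMR"],
    ("events", "stream processing"): ["Amazon Kinesis", "Amazon Kinesis + AWS Lambda"],
}


def suggest(data_type: str, access_pattern: str) -> list:
    """Return candidate services, or an empty list if the combination isn't covered."""
    return CANDIDATE_SERVICES.get((data_type, access_pattern), [])


print(suggest("events", "stream processing"))
# ['Amazon Kinesis', 'Amazon Kinesis + AWS Lambda']
```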

3. Leverage AWS managed services

Businesses should use AWS managed services, Raghupathy said, because they are scalable and elastic, available, reliable, secure, and require little administration. They handle the management tasks so the customer can focus on building their own products or tools, stay competitive, and get to market faster. Ultimately, this adds business agility, Raghupathy said.
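As a rough sketch of the "low admin" point, provisioning a managed streaming service is a single API call, and capacity changes do not require touching servers; the stream name and shard counts below are placeholders.

```python
# Sketch: a managed Kinesis stream is provisioned and resized with API calls;
# patching, availability, and scaling of the underlying servers are AWS's job.
# Stream name and shard counts are placeholders.
import boto3

kinesis = boto3.client("kinesis")

kinesis.create_stream(StreamName="example-clickstream", ShardCount=2)

# Later, adjust capacity without administering any infrastructure.
kinesis.update_shard_count(
    StreamName="example-clickstream",
    TargetShardCount=4,
    ScalingType="UNIFORM_SCALING",
)
```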

4. Use log-centric design patterns

With the low cost of storage, most organizations have no need to delete any of their data, Raghupathy said. This goes along with the big data adage: "Data is gold."

Because of this, organizations should build their big data system in a log-centric fashion. Immutable log files, which are protected from tampering, can help IT keep a record of the original data in case anything happens to their system.
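Here is a minimal sketch of an append-only, log-centric write path, assuming S3 as the immutable log: every event becomes a new, uniquely keyed object, and bucket versioning guards against overwrites. The bucket name and key scheme are illustrative.

```python
# Sketch of a log-centric, append-only write path with S3 as the immutable log.
# Bucket name and key layout are illustrative assumptions.
import json
import uuid
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "example-event-log"   # hypothetical bucket

# Keep every version of every object so the original data is never lost.
s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)


def append_event(event: dict) -> str:
    """Append-only: each event becomes a new, uniquely keyed object."""
    ts = datetime.now(timezone.utc)
    key = f"events/{ts:%Y/%m/%d}/{ts:%H%M%S}-{uuid.uuid4()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(event).encode("utf-8"))
    return key
```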

5. Be cost conscious

Big data shouldn't mean big cost, Raghupathy said. When he is in a consultation, he usually doesn't let it run more than 20 minutes before he begins calculating the cost of the solution being built.

Either the business will find that it is on track, or the projected bill will be too high; in the latter case, it will need to consider different products. However, Raghupathy said, the lower-cost tools are often the most heavily used.
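A back-of-the-envelope check in that spirit can be a few lines of arithmetic; the rates below are placeholder numbers, not current AWS prices, so substitute the published pricing for your region.

```python
# Back-of-the-envelope monthly cost check. The per-unit rates are placeholders,
# not actual AWS prices; replace them with published pricing for your region.
S3_PER_GB_MONTH = 0.023          # assumed storage rate, USD
KINESIS_PER_SHARD_HOUR = 0.015   # assumed shard rate, USD


def monthly_estimate(stored_gb: float, shards: int, hours: float = 730.0) -> float:
    """Rough monthly cost: object storage plus stream capacity."""
    return stored_gb * S3_PER_GB_MONTH + shards * KINESIS_PER_SHARD_HOUR * hours


print(f"${monthly_estimate(stored_gb=500, shards=2):,.2f} per month")  # -> $33.40 per month
```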
