To remain relevant, companies are going to need to master the intricacies of 'fast data.' However, there are some challenges to its adoption.
App servers are dead. Long live containers and microservices!
This may seem obvious to those clued in to the container revolution, but wanting to embrace microservices and actually doing it are two very different things. That's just one message that resounds from a new Lightbend developer survey that shows JVM developers are increasingly dropping monolithic app server architectures and embracing container technologies and microservices for distributed applications—with a growing emphasis on real-time data streams.
This convergence of "fast data" and microservices means that applications no longer merely call data stores. Instead, the data streams themselves are being built into the applications.
It sounds exciting. It also sounds hard, and it is both a major opportunity and a major challenge for developers.
Too fast, too furious
The most exciting use cases that enterprises are chasing today (real-time personalization, IoT, machine learning, AI, you name it) have a real-time data component. It's a different way of looking at data compared to the old world, where developers collected data first and analyzed it later. In these new use cases, enterprises face never-ending data streams, so they need always-on data processing, and that has profound implications for the application stack.
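To make the contrast concrete, here is a minimal sketch, not drawn from Lightbend or the survey, of the same calculation done both ways. The function names and sample readings are hypothetical, and a plain Python generator stands in for a real stream processing engine.

```python
from typing import Iterable, Iterator, List


def batch_average(readings: List[float]) -> float:
    """Batch style: wait until all data is collected, then analyze it once."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Streaming style: emit an up-to-date average after every new reading."""
    count = 0
    mean = 0.0
    for value in readings:
        count += 1
        # Incremental update: earlier readings are never stored or re-scanned.
        mean += (value - mean) / count
        yield mean


# Hypothetical sensor readings, used only for illustration.
readings = [10.0, 20.0, 30.0, 40.0]
batch_result = batch_average(readings)
stream_results = list(streaming_average(readings))
```

The batch version produces one answer after everything has arrived, while the streaming version has a usable answer after each event, which is the property an always-on system depends on.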
While new to most enterprises, dealing with fast data is deep in the Lightbend DNA. Not only does the company have years of experience working natively with real-time stream processing engines like Spark, Kafka, and Flink, but Martin Odersky, one of its co-founders, saw this trend more than a decade ago when he created Scala. Odersky saw the need to handle real-time workloads natively with multicore, highly parallelized applications that are always on and properly reactive (resilient, responsive, and elastic) when traffic spikes unpredictably or other problems occur.
One of the things they saw early on, Lightbend CEO Mark Brewer told me, is that "fast data systems need to behave like Reactive microservice-based systems." This is different from the core characteristics of big data systems, so data engineers need to learn the skills of the microservices developers, especially when they need to put their systems into production and run them at scale.
Conversely, traditional applications need to successfully process large volumes of streaming data, so application developers need to get better at handling this data and accounting for various volume, latency, transformation, and integration requirements.
Indeed, according to the Lightbend survey, 60% of senior management can effectively link strategic value to projects when data is in motion. But getting to the point where they're capturing that value can be tricky.
Learning to fly
The good and bad news is that most companies are nowhere near figuring this out. Lightbend's latest survey shows the fast data phenomenon is still fairly nascent, with enterprises at various points on the adoption curve. Even so, developers surveyed said 90% of their data processing workloads include a real-time component. Importantly, this need for speed increases as use cases climb the maturity curve. Rather than batch versus streaming, in other words, enterprises will need batch and streaming to succeed with fast data.
To understand when an organization is ready to embrace real-time data, I asked Brewer which qualities characterize an organization that could benefit from fast data adoption.
There are two categories of companies ready for fast data, he said, with a third category emerging.
The first comprises companies that are accustomed to batch processing and use it for personalization, anomaly detection, better shopping experiences, targeted ads, and so on. "Now," Brewer said, "based on ready access to cheap, powerful computing, they realize they should be extracting this knowledge in real-time, not later, after its value has started to decay." Such enterprises are ready for real-time data processing or, to put it simply, faster ETL.
The second category is enterprises coming from conventional microservices environments, which find that their businesses are facing ever-greater volumes of data generated and collected in real time. These enterprises simply don't know how to process the data as quickly as possible before it's sent downstream to data warehouses or consumed by users in real time on a mobile app.
Lastly, he said, there is also an emerging sub-category of machine learning in a streaming context. These are use cases where a company wants to train models in real time, so that responses to customers' immediate needs are as smart as possible. Training models and using them in a streaming context brings new operational challenges.
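As a rough illustration of what training in a streaming context can mean, the sketch below updates a tiny linear model one event at a time instead of retraining on a stored dataset. It is an assumption-laden toy, not anything described by Brewer: the `online_sgd` function, the learning rate, and the synthetic events (which follow y = 2x + 1) are all hypothetical.

```python
from itertools import cycle, islice
from typing import Iterable, Iterator, Tuple


def online_sgd(
    events: Iterable[Tuple[float, float]], learning_rate: float = 0.05
) -> Iterator[Tuple[float, float, float]]:
    """Predict for each incoming event, then immediately learn from it."""
    weight, bias = 0.0, 0.0
    for x, y in events:
        prediction = weight * x + bias  # serve a prediction with the current model
        error = prediction - y
        # One stochastic gradient step on squared error for this single event.
        weight -= learning_rate * error * x
        bias -= learning_rate * error
        yield prediction, weight, bias


# Hypothetical stream: 3,000 events that follow y = 2x + 1 exactly.
samples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
stream = islice(cycle(samples), 3000)

weight = bias = 0.0
for _, weight, bias in online_sgd(stream):
    pass  # a real system would act on each prediction as it arrives
```

Because the model is updated after every event, it is always available for predictions while it learns, but it also inherits the operational concerns of any long-running service, such as monitoring for drift and recovering state after a failure.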
All of which makes sense in light of the survey data showing that 55% of developers say they are choosing new frameworks and languages based on fast data requirements...yet need help figuring out the right tooling to assist them. Just because a company needs to embrace real-time data doesn't mean it's set up to do so.
Fortunately, just as Cloudera, Hortonworks, and MapR sprouted up to make Hadoop easier to use with improved tooling, so too are a host of fast data-focused companies like Lightbend emerging to help companies meet their real-time data needs. The market is still new, but shows impressive promise.