Combining event streaming with serverless computing often produces a cost-effective solution for handling streaming data that significantly reduces the complexity of managing and maintaining infrastructure. This synergy allows developers to focus more on application logic and less on underlying operational concerns, leading to faster development.
Not so long ago, serverless event streaming meant using an event streaming platform and a stream processing engine (managed by a vendor or in house), complemented with Function as a Service (FaaS) technology where appropriate (such as with short-lived, stateless workloads). Perhaps it’s generous to call such a setup “serverless” considering that FaaS is the only serverless component.
However, due to advances in serverless technologies, we no longer rely exclusively on FaaS. Alternatives such as serverless Container as a Service (CaaS) tools are increasingly used as a foundation for event streaming use cases.
Current State of Serverless Computing
Serverless computing is on an upward trajectory. According to Datadog’s 2023 “The State of Serverless” report, all major cloud providers are seeing significant serverless adoption:
“Over the past year, serverless adoption for organizations running in Azure and Google Cloud grew by 6% and 7%, respectively, with AWS seeing a 3% growth rate. Over 70% of our AWS customers and 60% of Google Cloud customers currently use one or more serverless solutions, with Azure following closely at 49%.”
— The State of Serverless, Datadog, 2023
The growing popularity of serverless is understandable. Numerous organizations spanning every industry are embracing serverless computing, seduced by the promises of cost efficiency, scalability on demand, reduced operational overhead and faster time to market.
The adoption of serverless is also fueled by the emergence of a diverse tooling ecosystem. In addition to FaaS (such as AWS Lambda, Microsoft Azure Functions and Google Cloud Functions), the serverless landscape has expanded to a much broader range of services and capabilities, including:
- Serverless application platforms, e.g., Netlify and Vercel.
- Serverless databases such as MongoDB Atlas, FaunaDB and InfluxDB Cloud.
- Serverless API management platforms, including AWS API Gateway and Azure API Management.
- Serverless frameworks like Zappa, Serverless Framework, Claudia.js and Ruby on Jets.
- Serverless CaaS solutions, e.g., AWS Fargate and Knative.
Serverless Approaches: FaaS vs. CaaS
Among the alternatives to FaaS, serverless CaaS is quickly growing in prominence. Datadog’s 2022 “The State of Serverless” report shows that Google Cloud Run was that year’s fastest-growing method for deploying serverless applications in Google Cloud. The 2023 report indicates that serverless CaaS adoption has continued to intensify across all major cloud providers.
The rise of serverless CaaS is not a surprise, since it brings more flexibility and removes some of the shortcomings of FaaS:
| Criteria | FaaS | CaaS |
| --- | --- | --- |
| Latency | Can be unpredictable and slow due to frequent cold starts. This can be problematic for applications requiring real-time responsiveness. | Starting a container can take more time than initiating a lightweight function. However, containers are long-lived. So, once a container is started, you won’t experience cold-start issues. This leads to lower and more predictable latencies overall. |
| State | Typically stateless, which is a limitation for applications requiring stateful behavior (e.g., windowing, aggregation). | CaaS generally supports both stateful and stateless applications. |
| Runtime and portability | FaaS offerings are often tied to specific cloud providers. Each provider has its own runtime environment and restrictions, which can affect FaaS portability. | Superior portability due to the use of containers. These containers can be moved across environments (e.g., different cloud providers, on premises) with minimal changes, if any. |
| Types of workloads | Short-lived, infrequent event-driven workloads. Medium throughput. | Long-running, continuous workloads. High throughput. |
| Execution | With most FaaS solutions, each instance handles just one request at a time. This can be a significant limitation for high-traffic applications. | Capable of handling multiple requests concurrently, offering more efficient resource usage and better performance for high-traffic applications. |
These differences between FaaS and CaaS are especially relevant in the context of event streaming applications. Overall, the CaaS model is a more reliable, versatile and suitable approach for handling high-frequency data streams.
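To illustrate why state matters for streaming workloads, here is a minimal Python sketch of a tumbling-window count, one of the stateful operations (windowing, aggregation) mentioned in the comparison above. It is not taken from any particular engine: the class name, the event format and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Counts events per key in fixed, non-overlapping time windows.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds
        # In-memory state keyed by (window_start, key). This only works
        # if the same process sees every event that falls in a window.
        self.counts = defaultdict(int)

    def add(self, timestamp: int, key: str) -> None:
        # Align the timestamp to the start of its window.
        window_start = timestamp - (timestamp % self.window_seconds)
        self.counts[(window_start, key)] += 1

    def results(self) -> dict:
        return dict(self.counts)


# Made-up events: (epoch seconds, station id).
events = [(0, "A"), (5, "A"), (9, "B"), (10, "A"), (14, "A"), (21, "B")]

counter = TumblingWindowCounter(window_seconds=10)
for ts, station in events:
    counter.add(ts, station)

window_counts = counter.results()
print(window_counts)
```

The counter depends on memory that survives between events. A long-lived container can hold that memory for the life of the window, whereas a stateless function invocation would typically need to push it to an external store on every call.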
Current State of Event Streaming
Event streaming (or data streaming) has become an integral part of modern architectures, enabling organizations to collect, process, store and analyze data in real time. Per Confluent’s “2023 Data Streaming Report,” data streaming is high on the IT investment agenda:
“89% of respondents say investments in data streaming are important, with 44% citing it as a top strategic priority.”
— Data Streaming Report, Confluent, 2023
Confluent’s report indicates that adopting data streaming technologies leads to positive business outcomes, such as increased efficiency and profitability, improved responsiveness, enhanced customer experience and faster operational decision making.
Organizations looking to embrace data streaming have plenty of solutions to choose from. Due to its proven reliability, scalability, high performance and rich ecosystem, Apache Kafka is usually the first name that comes to mind. But it’s not the only option. Other notable event streaming platforms include Amazon Kinesis, Google Cloud Pub/Sub, Apache Pulsar and Azure Event Hubs. If you’re curious about how Kafka compares to some of these alternatives, check out our comparisons of Kafka versus Pulsar, Kafka versus Redpanda and Kafka versus Kinesis.
Complementing event streaming platforms are a variety of stream processing technologies, such as Apache Flink, Apache Storm, Apache Samza, Apache Beam, Kafka Streams, ksqlDB and Faust, each with its own strengths. For example, Beam provides a single, unified API for handling both batch and streaming data, while ksqlDB streamlines the development of streaming applications that rely only on SQL queries.
Event streaming is, without a doubt, here to stay and continues to grow in importance. That being said, streaming data can be hard to handle. Most streaming technologies available today are difficult to use, and managing a streaming architecture in house is not for the faint of heart or thin of wallet. For instance, I touched on the many challenges of hosting and managing Kafka in a previous article; give it a read to get a feel for what’s involved.
The Intersection of Serverless and Event Streaming
In 2019, Neil Avery (former technologist in the office of the CTO at Confluent) published a blog post analyzing the relationship between event streaming and serverless computing. Neil’s post discusses how FaaS fits in with event streaming. The focus on FaaS makes sense, considering that FaaS was then the predominant form of serverless computing. Neil’s article is a good read, showing how FaaS can be used to complement event streaming, as well as its limitations, such as cold starts and unsuitability for stateful stream processing.
Fast forward to 2023. Due to recent technical advancements, there is better and tighter synergy between serverless and event streaming, which goes well beyond FaaS. Here are some of the emerging tools and trends that combine serverless computing (other than FaaS) with event streaming.
Serverless Stream Processing
Traditional stream processing usually involves an architecture with many moving parts, which means managing distributed infrastructure and operating a complex stream processing engine. For instance, Apache Spark, one of the most popular processing engines, is notoriously difficult to deploy, manage, tune and debug (read more about the good, bad and ugly of using Spark). Implementing a reliable, scalable stream processing capability can take anywhere between a few days and a few weeks, depending on the use case. On top of that, you also need to deal with continuous monitoring, maintenance and optimization. You may even need a dedicated team to handle this overhead. All in all, traditional stream processing is challenging, expensive and time consuming.
In contrast, serverless stream processing eliminates the headache of managing a complex architecture and the underlying infrastructure. It’s also more cost effective, since you pay only for the resources you use. It’s natural that serverless stream processing solutions have started to appear. One example is Spark on Google Cloud. Google claims this is the industry’s first autoscaling serverless Spark, which completely removes manual infrastructure provisioning and tuning.
I mentioned that CaaS is on the rise as a serverless approach. Generally speaking, serverless CaaS stream processing solutions have the following characteristics:
- Predictable low latency, with minimal processing delays.
- High throughput (up to thousands or millions of events per second).
- Suitable for both stateless and stateful processing workloads.
- Adequate for real-time data processing, as well as batch processing.
- Best suited for long-running, computationally intensive processes or processes with variable or unpredictable workloads.
- Capable of handling multiple data processing tasks simultaneously (concurrency).
- No server infrastructure to provision, maintain or scale.
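The concurrency characteristic in the list above can be sketched with plain Python and `asyncio`. This is a simplified illustration, assuming I/O-bound work per event; the function names and the 0.1-second simulated delay are made up and don't reflect any specific CaaS product.

```python
import asyncio
import time


async def handle_event(event_id: int) -> int:
    # Simulate I/O-bound work (for example, a database or API call).
    await asyncio.sleep(0.1)
    return event_id * 2


async def process_batch(event_ids: list) -> list:
    # One long-lived worker schedules all events concurrently instead
    # of handling them strictly one at a time.
    return await asyncio.gather(*(handle_event(e) for e in event_ids))


start = time.perf_counter()
results = asyncio.run(process_batch(list(range(10))))
elapsed = time.perf_counter() - start

print(results)
print(f"processed {len(results)} events in {elapsed:.2f}s")
```

Handled one at a time, ten events with a 0.1-second wait each would take roughly a second in total. Because the waits overlap inside a single process, the batch finishes in a fraction of that, which is the kind of resource efficiency a one-request-per-instance model can't offer.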
Bytewax is one example of a stream processing technology that can be leveraged with the serverless CaaS model. Bytewax is an open source Python library and distributed stream processing engine for building streaming data pipelines. Among other options, you can run Bytewax dataflows using containers. This means that you could, for example, run Bytewax dataflows on Amazon Elastic Kubernetes Service (EKS) or Amazon Elastic Container Service (ECS), with AWS Fargate, Amazon’s serverless compute engine for containers, providing the underlying compute. This way, you’d benefit from a serverless stream processing capability without needing to provision, configure or scale clusters of servers for containers.
Quix Streams is another open source Python stream-processing library that abstracts away the complexities of developing streaming applications and processing data in real time. Being cloud native, it can be deployed to any Kubernetes cluster. It can also be paired with Quix Cloud, which falls in the serverless CaaS category. Under the hood, Quix Cloud is a fully managed platform that uses Kafka, Docker, Git, containerized microservices and a serverless compute environment for hosting streaming applications. The intent is to enable developers to build, deploy and monitor applications while eliminating the operational overhead of configuring, managing and scaling containers and infrastructure.
For example, CKDelta, an AI software business, relies on Quix’s serverless stream processing capabilities to deliver an event streaming app that applies machine learning to 40GB of Wi-Fi data per day from 180 underground train stations in Singapore. Specifically, the app continuously collects high-throughput data and performs predictive analytics to forecast crowd density at train stations.
If you’re curious about other types of serverless event streaming apps you can build with Quix, have a look at these interactive templates.
Serverless Message Brokers
Going beyond serverless stream processing, serverless message brokers are emerging. One example is Amazon MSK Serverless, a new cluster type for Amazon MSK. While regular MSK requires manual setup and management of Kafka clusters and charges for provisioned capacity (regardless of usage), MSK Serverless automatically manages and scales Kafka infrastructure based on demand, charging for actual usage.
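To make the pricing difference more tangible, here is a rough Python sketch comparing a flat provisioned charge with a usage-based charge. Every number in it is a hypothetical placeholder, not actual Amazon MSK pricing, and real serverless pricing usually has several components (such as cluster hours, partitions, throughput and storage) rather than the single per-GB rate assumed here.

```python
def provisioned_cost(hours: float, hourly_rate: float) -> float:
    """Provisioned capacity is billed for every hour, used or not."""
    return hours * hourly_rate


def usage_based_cost(gb_processed: float, rate_per_gb: float) -> float:
    """Usage-based billing scales with the data actually processed."""
    return gb_processed * rate_per_gb


def break_even_gb(hours: float, hourly_rate: float, rate_per_gb: float) -> float:
    """Monthly volume at which both models cost the same."""
    return provisioned_cost(hours, hourly_rate) / rate_per_gb


# Hypothetical figures, chosen only to keep the arithmetic simple.
HOURS_PER_MONTH = 730
HOURLY_RATE = 0.5    # assumed cost per provisioned hour
RATE_PER_GB = 0.125  # assumed cost per GB processed

flat = provisioned_cost(HOURS_PER_MONTH, HOURLY_RATE)
light_month = usage_based_cost(1000, RATE_PER_GB)
heavy_month = usage_based_cost(4000, RATE_PER_GB)
threshold = break_even_gb(HOURS_PER_MONTH, HOURLY_RATE, RATE_PER_GB)

print(f"provisioned: {flat:.2f}")
print(f"usage-based, light month: {light_month:.2f}")
print(f"usage-based, heavy month: {heavy_month:.2f}")
print(f"break-even volume: {threshold:.0f} GB")
```

Under these assumed rates, usage-based billing is cheaper below the break-even volume and more expensive above it. This is why the pay-per-use model tends to suit variable or spiky workloads better than sustained, predictable high throughput.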
Apache EventMesh is another example of serverless event-based middleware. Born at WeBank, EventMesh is now a Top-Level Project at the Apache Software Foundation. While still in its infancy, EventMesh already has nearly 1,500 stars and almost 600 forks on GitHub, which is an encouraging sign. It will be interesting to see how EventMesh evolves and whether similar projects appear.
Conclusion
Event streaming has become a mainstay of modern software architectures. Meanwhile, serverless computing has made impressive progress over the past few years; gone are the days when FaaS was pretty much the only expression of serverless.
Considering how hard it is to deal with event streams and that serverless computing massively simplifies the process of extracting value from streaming data, it’s no surprise to see serverless event streaming solutions emerging (or organizations adopting them). Such tools generally come with a friendly pricing model (you pay only for what you use), and enable businesses to collect and process data streams in real time without having to think about the underlying infrastructure and capacity planning.
A rising trend today is combining serverless CaaS and stream processing. Serverless CaaS brings together the scalability and flexibility of containerization with the simplicity and cost-efficiency of serverless architectures. It’s a strong foundation for handling dynamic, high-volume, high-frequency data streams, so I look forward to seeing more contenders in this space.
The post State of Serverless Computing and Event Streaming in 2024 appeared first on The New Stack.