Cloud Streaming Analytics Platforms Comparison

Cloud Streaming Analytics Platforms Comparison across AWS, Azure, and GCP

In this blog post, we compare the cloud streaming analytics platforms available on AWS, Azure, and GCP.

What is Streaming Data and Analytics?

Streaming data is data generated continuously by thousands of data sources, which typically send records simultaneously and in small sizes (on the order of kilobytes). Streaming analytics is the process of ingesting, analyzing, and acting on data from streaming sources in real time to quickly identify patterns and automate actions.
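To make "identify patterns and automate actions" a little more concrete, here is a minimal sketch in plain Python. It assumes a hypothetical stream of numeric sensor readings and flags any reading that is well above the average of the readings just before it. The function name `detect_spikes`, the window size, and the threshold are illustrative choices, not part of any vendor product.

```python
from collections import deque


def detect_spikes(readings, window_size=5, threshold=1.5):
    """Yield (index, value, recent_mean) for each reading that exceeds
    `threshold` times the mean of the previous `window_size` readings."""
    window = deque(maxlen=window_size)  # keeps only the most recent readings
    for index, value in enumerate(readings):
        # Only compare once a full window of earlier readings is available.
        if len(window) == window_size:
            recent_mean = sum(window) / window_size
            if value > threshold * recent_mean:
                yield index, value, recent_mean
        window.append(value)


if __name__ == "__main__":
    # Hypothetical readings; in practice these would arrive from a stream.
    stream = [10, 11, 9, 10, 10, 30, 10, 11]
    for index, value, recent_mean in detect_spikes(stream):
        # The "action" here is just a print; it could be an alert or an API call.
        print(f"alert: reading {index} = {value} (recent mean {recent_mean:.1f})")
```

Because the function is a generator, it handles one record at a time and never needs the whole data set in memory, which is the main difference from batch analytics.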

Streaming Analytics Workflow

The following is a high-level streaming analytics workflow:

Data Streaming Workflow
  1. Data sources can be mobile apps, application logs, clickstream data, IoT sensors, and smart devices.
  2. Streaming ingestion can be done through cloud vendor product clients or third-party tools.
  3. Stream storage and processing can use one or more cloud vendor products.
  4. Destinations can be data lakes, data warehouses, or databases.
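The four stages above can be sketched end to end in plain Python. This is only an illustration under stated assumptions: the click-stream events are made up, a `queue.Queue` stands in for a managed stream service, a five-second tumbling window stands in for the processing engine, and a list stands in for the destination. All function names are hypothetical.

```python
import json
import queue
from collections import defaultdict


def source():
    """Stage 1: a hypothetical click-stream source emitting small JSON records."""
    events = [
        {"ts": 0, "page": "/home"},
        {"ts": 1, "page": "/cart"},
        {"ts": 2, "page": "/home"},
        {"ts": 5, "page": "/home"},
        {"ts": 6, "page": "/cart"},
    ]
    for event in events:
        yield json.dumps(event)


def ingest(records, buffer):
    """Stage 2: push raw records into a buffer standing in for a stream service."""
    for record in records:
        buffer.put(record)


def process(buffer, window_seconds=5):
    """Stage 3: count page views per tumbling window of `window_seconds`."""
    counts = defaultdict(int)
    while not buffer.empty():
        event = json.loads(buffer.get())
        # Align each event to the start of the window it falls into.
        window_start = (event["ts"] // window_seconds) * window_seconds
        counts[(window_start, event["page"])] += 1
    return [
        {"window_start": start, "page": page, "views": views}
        for (start, page), views in sorted(counts.items())
    ]


def deliver(rows, destination):
    """Stage 4: append results to a list standing in for a data lake or warehouse."""
    destination.extend(rows)


def run_pipeline():
    """Wire the four stages together and return the destination contents."""
    buffer = queue.Queue()
    warehouse = []
    ingest(source(), buffer)
    deliver(process(buffer), warehouse)
    return warehouse


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

In a real deployment each stage would typically be a separate managed service, and the processing stage would run continuously rather than draining the buffer once.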

Cloud Streaming Analytics Platforms Comparison

The following chart lists the cloud streaming analytics services available from each vendor.

| Vendor | Stream Ingestion & Processing | Stream Processing & Analytics | Stream Destination |
|---|---|---|---|
| AWS | Amazon Kinesis Data Streams; Amazon Kinesis Data Firehose; Amazon Managed Streaming for Apache Kafka (Amazon MSK) | EMR Spark Streaming; AWS Lambda; AWS Glue streaming; Amazon Kinesis Data Analytics for SQL; Amazon Kinesis Data Analytics for Apache Flink | Amazon S3; Amazon Redshift; Amazon Elasticsearch Service |
| Azure | Azure Event Hubs; Azure HDInsight (Apache Kafka); Azure IoT Hub | Azure Stream Analytics; Azure HDInsight (Apache Kafka, Apache Spark Streaming); Azure Databricks Spark Streaming; Azure Functions | Azure Synapse; Azure Blobs; Azure SQL |
| GCP | Google Cloud Pub/Sub | Cloud Functions | |

The following is a high-level overview of each platform service:

  1. Amazon Kinesis Data Streams is a fully managed, serverless data streaming service that ingests and stores streaming data in real time at any scale.
  2. Amazon Kinesis Data Firehose is an extract, transform, and load (ETL) service that reliably captures, transforms, and delivers streaming data to data lakes, data stores, and analytics services.
  3. Amazon MSK is a fully managed, secure, and highly available Apache Kafka service that makes it easy to ingest and process streaming data in real time.
  4. Azure Event Hubs is a fully managed, real-time data ingestion service that’s simple, trusted, and scalable. 
  5. Azure HDInsight is a cloud distribution of Hadoop components.
  6. Azure Stream Analytics is a fully managed, real-time analytics service designed to help you analyze and process fast-moving streams of data so that you can get insights, build reports, or trigger alerts and actions.
  7. GCP Dataflow is a unified stream and batch data processing service that’s serverless, fast, and cost-effective.
  8. GCP Dataproc is a fully managed and highly scalable service for running Apache Spark, Apache Flink, Presto, and 30+ open source tools and frameworks.
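As one example of what ingestion looks like from the producer side, here is a hedged sketch of writing a single record to Amazon Kinesis Data Streams. It relies on the `put_record` call of the boto3 Kinesis client, which takes `StreamName`, `Data` (bytes), and `PartitionKey`. The helper name `send_event`, the stream name, and the `device_id` field are assumptions made for illustration. The client is passed in as a parameter so the function can be tried out with a stand-in object before pointing it at a real stream.

```python
import json


def send_event(client, stream_name, event, key_field="device_id"):
    """Serialize `event` as JSON and write it to a Kinesis data stream.

    `client` is expected to behave like boto3.client("kinesis").
    Returns the sequence number Kinesis assigned to the record.
    """
    response = client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),  # Kinesis expects bytes
        # Records with the same partition key are routed to the same shard.
        PartitionKey=str(event[key_field]),
    )
    return response["SequenceNumber"]


# Example usage (requires boto3, AWS credentials, and an existing stream;
# the stream name below is hypothetical):
#
#   import boto3
#   kinesis = boto3.client("kinesis")
#   send_event(kinesis, "sensor-readings", {"device_id": 7, "temp": 21.5})
```

The other ingestion services in the list have their own client libraries with a similar publish-one-record shape, though the parameter names and batching options differ.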