Overview of Amazon Kinesis Services – Data Streams, Firehose, Analytics, Video Streams
What is AWS Kinesis?
Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information.
Amazon Kinesis is a portfolio of AWS services for data streaming and real-time analytics use cases. These services help you ingest, process, and analyze high volumes of high-velocity data from a variety of sources in real time.
The following is a high-level overview of a streaming analytics workflow.
The following are the Amazon Kinesis service offerings.
Amazon Kinesis Data Streams
Amazon Kinesis Data Streams is a serverless streaming data service that makes it easy to capture, process, and store data streams at any scale. The following is its high-level workflow.
- Kinesis Data Streams has two capacity modes:
- On-demand, suited to unpredictable workloads. It scales automatically up to a maximum write capacity of 200 MB/s and 200,000 records/second, and a read capacity of 400 MB/s.
- Provisioned, suited to workloads whose capacity can be reliably estimated. Each shard provides a write capacity of 1 MB/s and 1,000 records/second and a read capacity of 2 MB/s, so the stream's capacity is the per-shard capacity multiplied by the number of shards.
- Data is retained in the stream for 24 hours by default. The retention period can be increased up to 7 days, or up to 365 days with long-term retention.
- Consumers of a stream include Amazon Kinesis Data Analytics, Amazon Kinesis Data Firehose, Apache Spark on Amazon EMR, applications running on Amazon EC2 instances, and AWS Lambda.
- Shared fan-out consumers all share a shard's 2 MB/second of read throughput and its limit of five read transactions per second. Each enhanced fan-out consumer gets its own 2 MB/second of read throughput per shard, allowing multiple consumers to read the same data in parallel.
- Data in Kinesis Data Streams is encrypted in transit and can be encrypted at rest using server-side encryption with AWS KMS.
- Key features of Kinesis Data Streams: records with the same partition key are delivered in order (FIFO) within a shard, multiple applications can consume the stream in parallel, and data collection is decoupled from data processing.
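The ordering and parallelism features above follow from how records are routed: Kinesis Data Streams applies an MD5 hash to each record's partition key to get a 128-bit integer, and the record goes to the shard whose hash key range contains that value. The Python sketch below is a simplified model of that routing, not the service's code. It assumes the digest is read as a big-endian unsigned integer and that the stream's hash key space is split into equal, contiguous ranges (as for a newly created stream that has not been resharded); `hash_key` and `shard_index` are hypothetical helper names.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer with MD5 (big-endian, unsigned)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


def shard_index(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash key range contains the key.

    Hypothetical model: the key space is split into `shard_count` equal,
    contiguous ranges, as for a stream that has never been resharded.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    range_size = HASH_KEY_SPACE // shard_count
    # The last shard absorbs any remainder at the top of the key space.
    return min(hash_key(partition_key) // range_size, shard_count - 1)


if __name__ == "__main__":
    events = [("sensor-1", 20.5), ("sensor-2", 19.0), ("sensor-1", 21.0)]
    for key, value in events:
        print(key, value, "-> shard", shard_index(key, shard_count=4))
```

Because the same partition key always lands on the same shard, all readings for `sensor-1` stay in order relative to each other, while different keys spread across shards and can be processed in parallel.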
Amazon Kinesis Data Firehose
Amazon Kinesis Data Firehose is a streaming extract, transform, and load (ETL) service that reliably captures, transforms, and delivers streaming data to data lakes, data stores, and analytics services. The following is its high-level workflow.
- Capture, transform, and load streaming data into Amazon S3, Amazon Redshift, Amazon OpenSearch Service, and Splunk
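- Kinesis Data Firehose supports built-in data format conversion from JSON into columnar formats such as Apache Parquet and Apache ORC. Other input formats, such as raw CSV, must first be transformed to JSON, for example with a Lambda function.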
- By default, each delivery stream can ingest up to 2,000 transactions/second, 5,000 records/second, and 5 MB/second. These quotas vary by Region and can be increased on request.
- Kinesis Data Firehose limits:
- Maximum Record size is 1 MB
- Buffer interval hints range from 60 seconds to 900 seconds
- Buffer size hints range from 1 MB to 128 MB for Amazon S3, 1 MB to 100 MB for OpenSearch, and 1 MB to 3 MB for Lambda
- Retries on data delivery failures:
- For Amazon S3, Kinesis Data Firehose retries delivery every 5 seconds for up to 24 hours.
- For Amazon Redshift, it retries delivery every 5 minutes for up to 120 minutes.
- For Amazon OpenSearch Service, you can specify a retry duration between 0 and 7,200 seconds.
- Kinesis Data Firehose transformation comparison
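The buffering hints above work together: Kinesis Data Firehose delivers a batch when either the buffer size hint or the buffer interval hint is reached, whichever comes first. The Python sketch below is a toy model of that rule, not the service's implementation; `simulate_flushes` is a hypothetical helper, and it ignores retries, record size limits, and compression.

```python
from typing import List, Optional, Tuple

MB = 1024 * 1024


def simulate_flushes(
    records: List[Tuple[float, int]],
    size_hint_mb: int = 5,       # S3 size hints range from 1 to 128 MB
    interval_s: float = 300.0,   # interval hints range from 60 to 900 seconds
) -> List[Tuple[float, int, str]]:
    """Return (delivery_time_s, batch_bytes, reason) for each delivered batch.

    `records` holds (arrival_time_s, size_bytes) pairs sorted by arrival time.
    Toy model: a batch is delivered as soon as its size reaches the size hint
    or `interval_s` seconds have passed since its first record, whichever
    happens first.
    """
    size_limit = size_hint_mb * MB
    deliveries: List[Tuple[float, int, str]] = []
    batch_bytes = 0
    batch_start: Optional[float] = None

    for arrival, size in records:
        # The open batch's interval expired before this record arrived.
        if batch_start is not None and arrival >= batch_start + interval_s:
            deliveries.append((batch_start + interval_s, batch_bytes, "interval"))
            batch_bytes, batch_start = 0, None
        if batch_start is None:
            batch_start = arrival  # this record opens a new batch
        batch_bytes += size
        # The size hint was reached first.
        if batch_bytes >= size_limit:
            deliveries.append((arrival, batch_bytes, "size"))
            batch_bytes, batch_start = 0, None

    # Any remaining data is delivered when its interval expires.
    if batch_start is not None:
        deliveries.append((batch_start + interval_s, batch_bytes, "interval"))
    return deliveries
```

For example, `simulate_flushes([(0, 2 * MB), (10, 2 * MB), (20, 2 * MB), (400, 1 * MB)])` with a 5 MB size hint and a 300-second interval delivers a size-triggered 6 MB batch at t=20 s, then an interval-triggered 1 MB batch at t=700 s.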
Amazon Kinesis Data Analytics
Amazon Kinesis Data Analytics is the easiest way to transform and analyze streaming data in real time using Apache Flink, an open-source framework and engine for processing data streams. The following is its high-level workflow.
- Kinesis Data Analytics is recommended for streaming ETL, time-series analytics, interactive analysis of data streams, and continuous metric generation.
- Amazon Kinesis Data Analytics Studio provides a single-interface development experience for developing, debugging, and running stream processing applications.
- Kinesis Data Analytics supports standard ANSI SQL and integrates with the AWS Glue Data Catalog.
- Kinesis Data Analytics provisions capacity in the form of Kinesis Processing Units (KPUs). One KPU provides 1 vCPU and 4 GB of memory.
- Kinesis Data Analytics for SQL supports up to three destinations per application. You can persist SQL results to Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service (through Amazon Kinesis Data Firehose), or to Amazon Kinesis Data Streams.
- Stagger Windows – A query that aggregates data using keyed, time-based windows that open as data arrives. The keys allow multiple overlapping windows, which makes stagger windows well suited to time-series analytics.
- Tumbling Windows (Aggregations Using GROUP BY) – A query that aggregates data using distinct time-based windows that open and close at regular intervals.
- Sliding Windows – A query that aggregates data continuously, using a fixed time or rowcount interval.
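The difference between tumbling and sliding windows is easiest to see on a handful of events. The Python sketch below only models the window semantics; it is not Kinesis Data Analytics SQL. `tumbling_counts` assigns each event to exactly one fixed window, while `sliding_counts` emits a count over the trailing interval for every event. Both function names are hypothetical, and stagger windows are not modeled.

```python
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[float, str]  # (event_time_seconds, key)


def tumbling_counts(events: List[Event], window_s: float) -> Dict[Tuple[float, str], int]:
    """Count events per key in fixed, non-overlapping windows.

    Each event falls into exactly one window [start, start + window_s).
    """
    counts: Counter = Counter()
    for ts, key in events:
        window_start = (ts // window_s) * window_s
        counts[(window_start, key)] += 1
    return dict(counts)


def sliding_counts(events: List[Event], window_s: float) -> List[Tuple[float, str, int]]:
    """For every event, count same-key events in the trailing window (ts - window_s, ts].

    The window moves forward with each event, so consecutive results overlap.
    """
    ordered = sorted(events)
    results = []
    for i, (ts, key) in enumerate(ordered):
        in_window = sum(
            1 for t, k in ordered[: i + 1] if k == key and t > ts - window_s
        )
        results.append((ts, key, in_window))
    return results
```

With `events = [(1, "a"), (30, "a"), (65, "a"), (70, "b")]` and a 60-second window, the tumbling version reports two `a` events in the window starting at 0 and one `a` and one `b` in the window starting at 60. The sliding version reports counts of 1, 2, 2, and 1, because the event at t=1 has dropped out of the trailing window by t=65.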
Amazon Kinesis Video Streams
Amazon Kinesis Video Streams makes it easy to securely stream video from connected devices to AWS for analytics, machine learning (ML), playback, and other processing. The following is its high-level workflow.
- Amazon Kinesis Video Streams is fully managed, so there is no infrastructure to manage.
- Capture, process, and store media streams for playback, analytics, and machine learning.
- Build applications with ultra-low latency live streaming and two-way real-time communication.
- Amazon Kinesis Video Streams supports HTTP Live Streaming (HLS) and Dynamic Adaptive Streaming over HTTP (DASH) to enable live and on-demand playback of video ingested from devices in any browser or mobile app.
- Automatic data encryption in transit and at rest
- Supports time-encoded data, that is, data in which the records form a time series and each record is related to its previous and next records.
- Common use cases for Kinesis Video Streams – Smart Home, Smart City, Industrial Automation
- You can publish media data to a Kinesis video stream with the PutMedia API.
- You can use the GetMedia API to retrieve media content from a Kinesis video stream.
Amazon Kinesis FAQs
- What is a shard in Amazon Kinesis?
- A shard is a uniquely identified sequence of data records in a stream. A shard supports 1 MB/second and 1,000 records per second for writes and 2 MB/second for reads.
- What is a record in Amazon Kinesis?
- A record is the unit of data stored in an Amazon Kinesis data stream. A record is composed of a sequence number, a partition key, and a data blob. The maximum size of a data blob is 1 MB.
- What is Enhanced fan-out?
- Enhanced fan-out is an optional feature for Kinesis Data Streams consumers that provides logical 2 MB/second throughput pipes between consumers and shards.
- When should you choose Kinesis Data Streams provisioned capacity mode vs. on-demand capacity mode?
|On-demand mode|Provisioned mode|
|---|---|
|New streams with unknown traffic|Traffic is consistent|
|You prefer the ease of hands-free management|You want tight control of shards|
| |Lower cost of ownership|
|Limited to 200 MB/s of write and 400 MB/s of read throughput|No capacity limits for a provisioned stream|
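For provisioned mode, the per-shard limits quoted above (1 MB/s and 1,000 records/second for writes, 2 MB/s for reads) turn a rough shard estimate into simple arithmetic. The sketch below is a back-of-the-envelope helper, not an AWS tool: `Workload` and `estimate_shards` are hypothetical names, it assumes every consumer reads the whole stream, and it ignores the five-reads-per-second limit, uneven traffic across partition keys, and headroom for spikes.

```python
import math
from dataclasses import dataclass

# Per-shard limits for provisioned mode, as quoted in this post.
SHARD_WRITE_MB_S = 1.0
SHARD_WRITE_RECORDS_S = 1000
SHARD_READ_MB_S = 2.0


@dataclass
class Workload:
    write_mb_s: float       # peak incoming data rate, MB/s
    write_records_s: float  # peak incoming record rate, records/s
    consumers: int          # applications that each read the whole stream
    enhanced_fan_out: bool  # True if every consumer uses enhanced fan-out


def estimate_shards(w: Workload) -> int:
    """Rough minimum shard count for a provisioned stream (hypothetical helper)."""
    by_write_bytes = w.write_mb_s / SHARD_WRITE_MB_S
    by_write_records = w.write_records_s / SHARD_WRITE_RECORDS_S
    if w.enhanced_fan_out:
        # Each consumer gets its own 2 MB/s per shard, so reads never need
        # more shards than writes do.
        by_reads = w.write_mb_s / SHARD_READ_MB_S
    else:
        # Shared fan-out consumers split one 2 MB/s read allotment per shard.
        by_reads = w.write_mb_s * w.consumers / SHARD_READ_MB_S
    return max(1, math.ceil(max(by_write_bytes, by_write_records, by_reads)))


if __name__ == "__main__":
    workload = Workload(write_mb_s=4, write_records_s=3000, consumers=3, enhanced_fan_out=False)
    print(estimate_shards(workload))  # 6: three shared consumers need 12 MB/s of reads
```

With three shared fan-out consumers, read throughput drives this workload to 6 shards; switching the same workload to enhanced fan-out drops the estimate to 4, where the write rate becomes the binding limit.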
Amazon Kinesis Pricing
Amazon Kinesis pricing varies by service. To learn more about Amazon Kinesis pricing, refer to our post: http://www.cloudinfonow.com/amazon-kinesis-pricing/