8.6 Kinesis 101
What is streaming data
Streaming data is data that is generated continuously by thousands of data sources, which typically send their data records simultaneously and in small sizes (on the order of kilobytes). Examples include:
Purchases from online stores (amazon.com)
Stock prices
Game data (as the gamer plays)
Social network data
Geospatial data (uber.com)
IoT sensor data
What is Kinesis
Amazon Kinesis is a platform on AWS to send your streaming data to. Kinesis makes it easy to load and analyze streaming data, and also lets you build your own custom applications for your business needs.
Amazon Kinesis is a fully managed service for real-time processing of streaming data at massive scale. You can configure hundreds of thousands of data producers to continuously put data into an Amazon Kinesis stream.
Examples include data from website clickstreams, application logs, and social media feeds. In less than a second, the data is available for your Amazon Kinesis applications to read and process from the stream.
When to use Kinesis:
Consume big data
Stream large amounts of social media, news feeds, logs, etc. into the cloud
Process large amounts of data:
Redshift for business intelligence
Elastic MapReduce for big data processing
Core Kinesis Services
Kinesis Streams
Kinesis Firehose
Kinesis Analytics
What is Kinesis Streams
EC2 instances, mobile phones, laptops, and IoT devices are all streaming data sources (producers) that send data to Kinesis Streams. The data is stored for 24 hours by default, but you can increase this to up to 7 days. The data is stored in shards. Once the data is in the shards, a fleet of EC2 instances called consumers can take the data from the shards, run calculations or other processing on it, and turn it into something useful (e.g. data aggregation, sentiment analysis on social media feeds, stock prediction algorithms). Once the consumers have done their calculations, they can send the processed data on to be stored (e.g. in S3, DynamoDB, EMR, Redshift, or RDS).
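As a rough illustration of the producer side described above, the sketch below sends one small record to a stream. It assumes a boto3-style Kinesis client (for example `boto3.client("kinesis")`) is passed in; the helper name `send_reading`, the stream name, and the payload fields are hypothetical and not part of the original notes.

```python
import json


def send_reading(client, stream_name, device_id, reading):
    """Send one small record to a Kinesis stream.

    `client` is assumed to be a boto3 Kinesis client, e.g.
    boto3.client("kinesis"). The payload shape is hypothetical.
    """
    payload = json.dumps({"device_id": device_id, "reading": reading})
    # The partition key determines which shard receives the record;
    # records sharing a key are routed to the same shard.
    return client.put_record(
        StreamName=stream_name,
        Data=payload.encode("utf-8"),
        PartitionKey=device_id,
    )
```

With AWS credentials configured, a call might look like `send_reading(boto3.client("kinesis"), "example-stream", "sensor-1", 21.5)`. Passing the client in as an argument keeps the function easy to exercise with a stand-in object.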
Key parts of Kinesis Streams:
Kinesis Streams consist of shards
Each shard supports up to 5 read transactions per second, up to a maximum total read rate of 2 MB per second. For writes, each shard supports up to 1,000 records per second, up to a maximum total write rate of 1 MB per second (including partition keys).
The data capacity of your stream is a function of the number of shards that you specify for the stream. The total capacity of the stream is the sum of the capacities of its shards.
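The capacity arithmetic above can be sketched as a small calculation. This is a minimal sketch that uses only the per-shard limits quoted in these notes; the function names `shards_needed` and `stream_capacity` are hypothetical, and the 5 read transactions per second limit is left out for brevity.

```python
import math

# Per-shard limits as quoted in the notes above.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies every limit."""
    by_write_size = math.ceil(write_mb_per_sec / WRITE_MB_PER_SHARD)
    by_write_count = math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD)
    by_read_size = math.ceil(read_mb_per_sec / READ_MB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, by_write_size, by_write_count, by_read_size)


def stream_capacity(shard_count):
    """Total stream capacity is the sum of its shards' capacities."""
    return {
        "write_mb_per_sec": shard_count * WRITE_MB_PER_SHARD,
        "write_records_per_sec": shard_count * WRITE_RECORDS_PER_SHARD,
        "read_mb_per_sec": shard_count * READ_MB_PER_SHARD,
    }
```

For example, a workload that writes 4.5 MB per second as 3,000 records per second and reads 9 MB per second would need 5 shards under these limits, since the write size and read size requirements each round up to 5.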
What is Kinesis Firehose
EC2 instances, mobile phones, laptops, and IoT devices are all streaming data sources that send data to Kinesis Firehose. With Kinesis Firehose, you don't need to worry about shards or streams, or about manually adding shards to keep up with your data; scaling is automated. You don't even need to worry about consumers going in and mining the data. You can analyze the data in real time using Lambda, and once it has been analyzed you can send it directly to S3. Unlike Kinesis Streams, where data is retained for 24 hours by default and for up to 7 days if you extend it, Kinesis Firehose has no data retention window. As soon as data comes into Firehose, it is either analyzed using Lambda or sent directly to S3 or another destination. For example, you can send the data to S3 first and then copy it into Redshift, or you can write the data to an Elasticsearch cluster.
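To make the contrast with Kinesis Streams concrete, here is a minimal sketch of sending one record to Firehose. It assumes a boto3-style Firehose client (for example `boto3.client("firehose")`) is passed in; the helper name `deliver_event`, the delivery stream name, and the event fields are hypothetical. Note that the call names a delivery stream and carries no partition key, because there are no shards for the caller to think about.

```python
import json


def deliver_event(client, delivery_stream_name, event):
    """Send one event to a Kinesis Firehose delivery stream.

    `client` is assumed to be a boto3 Firehose client, e.g.
    boto3.client("firehose"). The event shape is hypothetical.
    """
    # Newline-delimit each record so that individual events can be
    # separated again after delivery to the destination.
    data = (json.dumps(event) + "\n").encode("utf-8")
    return client.put_record(
        DeliveryStreamName=delivery_stream_name,
        Record={"Data": data},
    )
```

With AWS credentials configured, a call might look like `deliver_event(boto3.client("firehose"), "example-delivery-stream", {"page": "/home"})`. Where the data ends up (S3, Redshift, Elasticsearch) is part of the delivery stream's configuration, not of this call.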
Kinesis Analytics
Kinesis Analytics works alongside Kinesis Firehose and Kinesis Streams: it lets you run SQL queries over the data as it exists within Firehose or Streams, and then use those queries to store the results in S3, Redshift, or an Elasticsearch cluster. It is a way of analyzing the data inside Kinesis using SQL, and it works with both Kinesis Streams and Kinesis Firehose.
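Kinesis Analytics queries are written in SQL, not Python. Purely as an analogy for the kind of continuous aggregation such a query might express, the sketch below counts records per key in fixed time windows. The record shape `(timestamp_seconds, key)`, the 60-second window, and the function name `tumbling_window_counts` are assumptions made for illustration and are not part of Kinesis.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(records, window_seconds=60):
    """Count records per key within fixed, non-overlapping windows.

    `records` is an iterable of (timestamp_seconds, key) pairs.
    Returns {window_start_seconds: {key: count}}.
    """
    counts = defaultdict(Counter)
    for timestamp, key in records:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[window_start][key] += 1
    return {start: dict(per_key) for start, per_key in counts.items()}
```

A usage example might pass ticker symbols as keys, such as `tumbling_window_counts([(0, "AMZN"), (30, "AMZN"), (60, "AMZN")])`, to see how many records arrived for each symbol in each minute.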
Exam Tips
Know the difference between Kinesis Streams and Firehose. You will be given scenario questions and you must choose the most relevant service.
Understand what Kinesis Analytics is.