Tel: +44 (0)1786 430076 email: info@objectiveassociates.co.uk
With the volume of events that need to be monitored increasing daily, there has to be a better way than batch processing.
AWS Kinesis provides a realtime capability to monitor data streams continuously, so you can instantly detect and analyse important events over a moving 24 hour data window.
Talk to us today to discuss how you can be reacting faster and reacting better to your live data sources.
Amazon are careful not to use the phrase Event Stream Processing in their material, but in simple terms that is what it is.
Each data element that passes through your data stream can be queried and analysed in combination with up to the previous 24 hours' worth of data (longer if really required; up to 7 days).
It is a managed service, as AWS Lambda is, allowing you to use it as required and to scale up and scale down as loads demand.
AWS Kinesis ingests data streams and makes them available for further processing and storage downstream.
Data Producers feed Kinesis with the data to be dealt with. It could come from the clicks on a website or from log and transaction files; basically anywhere.
You can have any number of Kinesis applications running concurrently on one or more streams. Perhaps one is dealing with dashboard visualisation, while another is pre-processing data for subsequent storage.
There are a few basic terms we can quickly cover here to get you started.
A Shard: This is a uniquely identified group of data records in your stream, and is basically how you buy compute power. The shard unit supports 1MB/sec of data input and 2MB/sec of data output. You add Shards to increase compute power and to Scale Up.
A Data record: This simply consists of a sequence number (allocated by Kinesis), a partition key (a routing code created by the Data Producer), and the actual data (called a data blob and up to 1MB in size).
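To illustrate how a partition key acts as a routing code, here is a minimal Python sketch. It relies on the documented behaviour that Kinesis maps each partition key to a 128-bit integer using an MD5 hash, and that each shard owns a range of that hash key space. The even split across shards, the big-endian reading of the digest, and the function names (shard_ranges, hash_key, shard_for) are assumptions made for illustration; they are not part of any AWS SDK.

```python
import hashlib

# Size of the 128-bit hash key space that partition keys are mapped into.
HASH_SPACE = 2 ** 128


def shard_ranges(shard_count):
    """Split the hash key space into contiguous ranges, one per shard.

    Assumption: an even split. A real stream's ranges can differ after
    shards have been split or merged.
    """
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def hash_key(partition_key):
    """MD5 of the UTF-8 partition key, read as a 128-bit integer (assumed big-endian)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key, ranges):
    """Return the index of the shard whose range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key outside every shard range")


if __name__ == "__main__":
    ranges = shard_ranges(4)
    for key in ("user-1", "user-2", "user-3"):
        print(key, "-> shard", shard_for(key, ranges))
```

The practical point is that records sharing a partition key always land on the same shard, so a Data Producer that uses only a handful of distinct keys can overload one shard while the others sit idle.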
AWS Kinesis is fully managed in the same way that AWS Lambda is. It provides an elastic, scalable infrastructure that has been designed specifically to manage huge volumes of data.
You can turn it on and off as you require and scale up and down by simply altering the number of Shards that you need.
It is secured in the same ways that AWS is at its core, for instance using VPCs. And, as you would expect, you can monitor and control access.
The key to understanding Kinesis limitations is the Shard. Your stream is made up of a number of shards, where each shard is a group of records.
Each shard can support up to 5 transactions per second for reads, up to a total data read rate of 2MB per second. It also supports up to 1,000 records per second for writes, up to a total data write rate of 1MB per second.
The more shards you have the bigger the data capacity of your stream can be. You can increase or decrease the number of shards allocated to your stream.
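As a rough guide, the per-shard limits above can be turned into a shard count for a given workload. The following Python sketch takes the largest of the three requirements (write volume, write record rate, read volume). The function name shards_needed and the example workload figures are hypothetical, and the sketch ignores the limit of 5 read transactions per second.

```python
import math

# Per-shard limits as quoted in the text above.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Smallest shard count that satisfies every per-shard limit at once."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SHARD),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_sec / READ_MB_PER_SHARD),
    )


if __name__ == "__main__":
    # Hypothetical workload: 3.5MB/sec in, 2,500 records/sec, 7MB/sec out.
    print(shards_needed(3.5, 2500, 7.0))  # prints 4
```

Note that many small records can demand more shards than the raw data volume suggests, because the 1,000 records per second limit applies regardless of record size.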
AWS Kinesis can provide solutions where you are required to process large amounts of incoming data in realtime (or near realtime).
Typical use cases involve handling large amounts of clicks from a website; this could be to analyse behaviour or it could be more pressing activities such as managing the purchase of popular concert tickets.
Other use cases involve detecting unusual events or anomalies in a stream where immediate action is required. This could be to address a fraud risk for instance.
There is no end to the possible uses; even video analysis is possible with AWS Kinesis.
If you are familiar with standard SQL then you are already able to program solutions using AWS Kinesis.
Admittedly you will find some differences; you are after all analysing a moving target. Because the data stream is dynamic you'll find that there are specific ways to access data within a timeframe or a fixed set of records. But importantly, you are not learning a new programming language.
You build your queries within the Amazon Kinesis environment, which allows you to build and test as you go.
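To show what querying "within a timeframe" means on a moving stream, here is a small Python sketch of a sliding time window. It is not Kinesis SQL and does not use any Kinesis API; it only illustrates the idea that each new event is evaluated against the events from the last N seconds. The class name SlidingWindowCounter is hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts the events seen within the last window_seconds.

    Assumption: timestamps (in seconds) are added in non-decreasing order.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def add(self, timestamp):
        """Record an event and return how many events are in the current window."""
        self.timestamps.append(timestamp)
        self._evict(timestamp)
        return len(self.timestamps)

    def _evict(self, now):
        # Drop events that have fallen out of the window ending at `now`.
        cutoff = now - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()


if __name__ == "__main__":
    window = SlidingWindowCounter(window_seconds=60)
    for ts in (0, 30, 59, 60, 200):
        print(ts, window.add(ts))
```

A windowed streaming query expresses the same idea declaratively: you state the window and the aggregate, and the results update as new records arrive.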
Yes, you can have multiple Kinesis applications running against the one data stream, accessing the same data.
This gives you a powerful mechanism to address all of your business needs. You could be filtering data to detect unusual events, updating a live dashboard for visualisation purposes and directing data to specific S3 locations, all at the same time.
All you need to consider is the number of shards that you will require.
As a core part of AWS, Kinesis is integrated with CloudWatch. This provides you with access to all the metrics you need to check on the behaviour of your data streams and the shards operating within them. With AWS CloudTrail you can also monitor API calls and view log files.
And as you would expect you can control access to Kinesis through AWS IAM services, to ensure only those with the correct permissions can add data to your streams.
AWS Kinesis is priced based on compute power (Shard Hour), usage (PUT Payload Unit) and data retention (extendable, but 24 hours by default). Put simply, the more Shard Hours you apply to your stream and the more data you PUT into it, the more it will cost you.
To help reduce anxiety there is a price estimator available on the AWS site to give you a decent indication of the costs.
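The pricing model described above can be sketched as a small Python calculation. A PUT Payload Unit is counted in 25KB chunks, so a 30KB record counts as two units. The function names and the prices passed in the example are placeholders chosen for illustration and are not real AWS prices; extended data retention charges are left out. Use the AWS estimator for actual figures.

```python
import math

# A record is billed in 25KB "PUT Payload Unit" chunks.
PUT_PAYLOAD_UNIT_KB = 25


def put_payload_units(record_size_kb):
    """Number of PUT Payload Units one record of this size counts as."""
    return max(1, math.ceil(record_size_kb / PUT_PAYLOAD_UNIT_KB))


def estimate_cost(shards, hours, records, record_size_kb,
                  price_per_shard_hour, price_per_million_units):
    """Shard Hour cost plus PUT Payload Unit cost (retention excluded)."""
    shard_cost = shards * hours * price_per_shard_hour
    units = records * put_payload_units(record_size_kb)
    put_cost = units / 1_000_000 * price_per_million_units
    return shard_cost + put_cost


if __name__ == "__main__":
    # Placeholder prices, NOT real AWS prices: substitute current regional rates.
    cost = estimate_cost(
        shards=2, hours=24, records=1_000_000, record_size_kb=30,
        price_per_shard_hour=0.02, price_per_million_units=0.01,
    )
    print(f"Estimated daily cost: {cost:.2f}")
```

Because units are rounded up per record, a record just over 25KB costs the same to PUT as a 50KB record.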
Amazon Kinesis is a fully managed service giving you the capability to run streaming applications with no admin overhead.
It gives you the power to process streaming data in realtime, providing you with a valuable realtime view of your systems and your business world.
When you realise that you are consuming more data than you can analyse in a sensible timeframe, it's time to look at data stream processing with AWS Kinesis. But can you construct the data stream you need? Can you build the realtime queries to analyse the data? Can you offload the query results to the correct downstream services? Can you be sure that you can scale when required?
Call us to discuss how you can benefit from realtime data stream processing. We can define and implement the correct realtime data processing strategies for you.
Call us to start thinking in realtime.