Friday, May 5, 2017

ELK/EKK - AWS Implementation

This article is about implementing the ELK stack (quite the buzzword these days) on AWS, in its EKK variant: Amazon Elasticsearch Service, Amazon Kinesis, and Kibana.

The ELK stack consists of Elasticsearch, Logstash, and Kibana.





Logstash is a tool for log data intake, processing, and output. This covers virtually any type of log you manage: system logs, webserver logs, error logs, and application logs.
In this post, Logstash will be replaced by AWS CloudWatch (first implementation) and AWS Kinesis Firehose (second implementation).

Elasticsearch is a NoSQL database based on the Lucene search engine, and a popular open-source search and analytics engine. It is designed to be distributed across multiple nodes, which lets it work with large datasets. It handles use cases such as log analytics, real-time application monitoring, clickstream analytics, and text search.
In this post, the AWS Elasticsearch Service will be used for the Elasticsearch component.

Kibana is your log-data dashboard: a stylish interface for visualizing logs and other time-stamped data.
It gives you a better grip on your large data stores with point-and-click pie charts, bar graphs, trendlines, maps, and scatter plots.

First Implementation – ELK with CloudTrail/CloudWatch (as Logstash)

We’ll list a few easy steps to do so:

-          Go to the AWS Elasticsearch Service
-          Create an ES domain – amelasticsearchdomain
o   Set the access policy to allow all/your ID (a CLI sketch follows below)
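o   For reference, a minimal CLI sketch of the same domain creation, assuming the AWS CLI is configured. The version, instance type, region, and account id 123456789012 are placeholder assumptions; the wide-open access policy mirrors the console step above and should be tightened for real use:

                        # Hypothetical sketch - values below are placeholders, not the author's exact settings
                        aws es create-elasticsearch-domain \
                          --domain-name amelasticsearchdomain \
                          --elasticsearch-version 5.1 \
                          --elasticsearch-cluster-config InstanceType=t2.small.elasticsearch,InstanceCount=1 \
                          --access-policies '{
                            "Version": "2012-10-17",
                            "Statement": [{
                              "Effect": "Allow",
                              "Principal": {"AWS": "*"},
                              "Action": "es:*",
                              "Resource": "arn:aws:es:us-east-1:123456789012:domain/amelasticsearchdomain/*"
                            }]
                          }'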

-          Go to the AWS CloudTrail service
-          Create a trail - amElasticSearchCloudTrail
o   Create an S3 bucket – amelasticsearchbucket (used to hold the CloudTrail data)
o   Create a CloudWatch Logs group - amElasticSearchCloudWatchGroup
o   To deliver CloudTrail events to the CloudWatch Logs log group, CloudTrail will assume a role with the two permissions below (a CLI sketch follows the list)
§  CreateLogStream: Create a CloudWatch Logs log stream in the CloudWatch Logs log group you specify
§  PutLogEvents: Deliver CloudTrail events to the CloudWatch Logs log stream
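o   A minimal sketch of that role from the AWS CLI. The role/policy names, region, and account id are illustrative placeholders; the two actions are exactly the ones listed above:

                        # Hypothetical sketch - role and policy names are illustrative
                        aws iam create-role \
                          --role-name AM_CloudTrail_CloudWatch_Role \
                          --assume-role-policy-document '{
                            "Version": "2012-10-17",
                            "Statement": [{
                              "Effect": "Allow",
                              "Principal": {"Service": "cloudtrail.amazonaws.com"},
                              "Action": "sts:AssumeRole"
                            }]
                          }'

                        aws iam put-role-policy \
                          --role-name AM_CloudTrail_CloudWatch_Role \
                          --policy-name AM_CloudTrail_CloudWatch_Policy \
                          --policy-document '{
                            "Version": "2012-10-17",
                            "Statement": [{
                              "Effect": "Allow",
                              "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                              "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:amElasticSearchCloudWatchGroup:*"
                            }]
                          }'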

-          Go & Setup Cloud Watch,
-          Select Group and Then Action to Stream data to Elastic Search Domain
o   Create New Role - AM_lambda_elasticsearch_execution
o   Create Lambda (Automatically) LogsToElasticsearch_amelasticsearchdomain - CloudWatch Logs uses Lambda to deliver log data to Amazon Elasticsearch Service / Amazon Elasticsearch Service Cluster.
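o   Under the hood, the console creates a subscription filter on the log group pointing at that Lambda. A hedged CLI equivalent, where the Lambda ARN, region, and account id are placeholders (the console also grants CloudWatch Logs permission to invoke the function):

                        # Hypothetical sketch - the console normally wires this up for you
                        aws logs put-subscription-filter \
                          --log-group-name amElasticSearchCloudWatchGroup \
                          --filter-name amElasticSearchFilter \
                          --filter-pattern "" \
                          --destination-arn arn:aws:lambda:us-east-1:123456789012:function:LogsToElasticsearch_amelasticsearchdomain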

-          Go to Elasticsearch
o   Hit the Kibana link
o   In Kibana, configure an index pattern (a quick sanity check follows below)
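o   Before creating the pattern, it can help to confirm that log indexes are actually arriving. A quick check against the domain endpoint (the endpoint below is a placeholder; the auto-created LogsToElasticsearch Lambda typically writes daily indexes named cwl-YYYY.MM.DD, so an index pattern of cwl-* usually fits):

                        # Hypothetical sketch - replace with your domain's real endpoint
                        curl -s "https://search-amelasticsearchdomain-xxxxxxxx.us-east-1.es.amazonaws.com/_cat/indices?v"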


Second Implementation – ELK with AWS Kinesis Firehose/CloudWatch (as Logstash)

We’ll list a few easy steps to do so:

-          Go to the AWS Elasticsearch Service
-          Create an ES domain - amelasticsearchdomain (as in the first implementation)
o   Set the access policy to allow all/your ID
      
-          Create a Kinesis Firehose delivery stream - amelasticsearchkinesisfirehosestream (a CLI sketch follows this list)
o   Attach it to the ES domain above
o   Create a Lambda (optional) - amelasticsearchkinesisfirehoselambda
o   Create an S3 bucket for backup - amelasticsearchkinesisfirehosebucket
o   Create a role - am_kinesisfirehose_delivery_role
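o   A minimal CLI sketch of the stream creation, assuming the AWS CLI. The ARNs, region, and account id are placeholders; the "logs" index name matches the Kibana index pattern used later:

                        # Hypothetical sketch - ARNs and account id are placeholders
                        aws firehose create-delivery-stream \
                          --delivery-stream-name amelasticsearchkinesisfirehosestream \
                          --elasticsearch-destination-configuration '{
                            "RoleARN": "arn:aws:iam::123456789012:role/am_kinesisfirehose_delivery_role",
                            "DomainARN": "arn:aws:es:us-east-1:123456789012:domain/amelasticsearchdomain",
                            "IndexName": "logs",
                            "TypeName": "log",
                            "S3BackupMode": "FailedDocumentsOnly",
                            "S3Configuration": {
                              "RoleARN": "arn:aws:iam::123456789012:role/am_kinesisfirehose_delivery_role",
                              "BucketARN": "arn:aws:s3:::amelasticsearchkinesisfirehosebucket"
                            }
                          }'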

-          Create an EC2 system (to send log data to the Kinesis Firehose configured above)
o   This will use the 1995 NASA Apache log (http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html) to feed Kinesis Firehose.
o   The EC2 instance uses the Amazon Kinesis Agent to flow data from its file system into the Firehose stream.
o   The Amazon Kinesis Agent is a standalone Java application that offers an easy way to collect and send data to Amazon Kinesis Streams and Kinesis Firehose.
               
- Steps:
       - Launch an EC2 instance (t2.micro) running the Amazon Linux Amazon Machine Image (AMI)
       - PuTTY/SSH into the instance
       - Install the Kinesis Agent - sudo yum install -y aws-kinesis-agent
       - Go to the directory - /etc/aws-kinesis/
       - Open the file - nano agent.json
       - Make sure it has this data:
                       {
                         "cloudwatch.emitMetrics": true,
                         "firehose.endpoint": "https://firehose.us-east-1.amazonaws.com",

                         "flows": [
                                       {
                                         "filePattern": "/tmp/mylog.txt",
                                         "deliveryStream": "amelasticsearchkinesisfirehosestream",
                                         "initialPosition": "START_OF_FILE"
                                       }
                         ]
                       }
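       - Start the agent, and restart it after any agent.json change (per the Kinesis Agent documentation):
                        sudo service aws-kinesis-agent start
                        sudo chkconfig aws-kinesis-agent on    # optional: start the agent on boot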
       - Now download the NASA access log file to your local desktop and upload it to S3
                        - URL - http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html
                        - File to download - Jul 01 to Jul 31, ASCII format, 20.7 MB gzip compressed
                        - Unzip and upload this file to any S3 bucket (other than any used above)
                        - Make sure the file is public (a sketch follows below)
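                        - A sketch of the unzip/upload from the local desktop, assuming the AWS CLI is configured there (the bucket name is a placeholder; --acl public-read covers the public-access step):
                                        gunzip NASA_access_log_Jul95.gz
                                        aws s3 cp NASA_access_log_Jul95 s3://your-other-bucket/access_log_Jul95 --acl public-read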
                      
       - Again go to the EC2 PuTTY session
                        - Go to the directory - /etc/aws-kinesis/
                        - Download the file from S3 - wget https://s3-us-west-1.amazonaws.com/arunm/access_log_Jul95
                        - Append it to the file the agent tails - cat access_log_Jul95 >> /tmp/mylog.txt
                      
       - Again go to the EC2 PuTTY session
                        - Return to the home directory - cd ~
                        - Go to the directory - /var/log/aws-kinesis-agent/
                        - Monitor the agent's log at /var/log/aws-kinesis-agent/aws-kinesis-agent.log
                        - Open the file - nano aws-kinesis-agent.log
                        - You'll find log lines like: 2017-03-01 21:46:38.476+0000 ip-10-0-0-55 (Agent.MetricsEmitter RUNNING) com.amazon.kinesis.streaming.agent.Agent [INFO] Agent: Progress: 1891715 records parsed (205242369 bytes), and 1891715 records sent successfully to destinations. Uptime: 630024ms

-          Use Kibana (to visualize the data)
o   Go to AWS Elasticsearch
o   Click on the link to Kibana
o   The first thing you need to do is configure an index pattern. Use the index root you set when you created the Firehose stream (in our case, logs*).
o   Kibana should recognize the logs indexes and let you set the Time-field name value. Firehose provides two possibilities:
§  @timestamp – the time as recorded in the file
§  @timestamp_utc – available when time zone information is present in the log data
o   Choose either one, and you should see a summary of the fields detected.
o   Select the Discover tab, and you'll see a graph of events over time along with expandable details for each event.
o   As we are using the NASA dataset, we first get a message that there are no results. That's because the data is from way back in 1995.
o   Expand the time selector in the top right of the Kibana dashboard and choose an absolute time range: a start of June 30, 1995, and an end of August 1, 1995. The NASA events then appear.



Hope this helps.

Regards,
Arun Manglick