Wednesday, June 21, 2017

10 New AWS Cloud Services - As of June 2017

Glue: Collecting and preparing data is often the most challenging part of data science. Glue, an Amazon service, runs Python-based scripts that collect data and transform it into a usable format before moving it into the cloud for further use. You can use the Python layer even if you don’t know how to write the Python script yourself. Glue is built to run these jobs without interruption, so you can say goodbye to much of the manual effort.

FPGA: FPGAs, or Field Programmable Gate Arrays, have long been used by hardware designers to run software logic directly in hardware. With an FPGA, you don’t need to build custom masks or worry about fitting all the transistors onto one piece of silicon. An FPGA can be rewired to act like a purpose-built chip, according to the configuration you load into it.

Who needs this kind of technology? Bitcoin miners, who evaluate cryptographically secure hash functions a gazillion times in their searches, can benefit from it extensively. You can rent machines to run repetitive algorithms that are effectively embedded in silicon, which makes the work of programmers and software developers considerably easier.

Blox: As Docker grows in popularity, Amazon steps in again to make it easier to run Docker containers. Blox is designed to run the optimum number of instances, and since it is event driven, the logic is simple to write. The best part is that you do not need to keep tracking the machines to know which applications are running on them.

X-Ray: Loading job instances and managing efficiency has always been a tedious task. To keep a cluster running smoothly, developers had to write elaborate code to track progress at every step, which prompted many people to buy third-party tools to do the job. Amazon’s X-Ray goes a step further by watching your stack for you, which puts it in direct competition with those tools.

In simple terms, as soon as your website receives a request for data, X-Ray traces the flow across your network of machines and services. It then aggregates the trace data from different zones and regions into one place, so that authorized personnel can access it as and when required.

Rekognition: Rekognition is aimed at image work and lets your apps do a lot more than just store images. It chews through images, searching for objects using well-known, well-tested algorithms. All you need to do is point it at an image (the image needs to be stored in the Amazon Cloud) and you get back a list of the objects it found. Payment is on a per-image basis.

Athena: Amazon’s Athena lets you run queries directly on data stored in S3. Forget about writing extensive, complicated looping code; you write SQL queries and Athena does the rest. Athena charges you per byte of data scanned. Thankfully, the price is around $5 per TB, so it turns out to be rather cost effective for developers and companies alike.
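To make the pricing concrete, here is a minimal sketch of the arithmetic. It assumes the "around $5 per TB" figure quoted above and treats 1 TB as 10^12 bytes; that byte definition, and the absence of any per-query minimum, are assumptions made for illustration, and the function name is made up.

```python
# Rough Athena cost estimate based on the "around $5 per TB" figure in the post.
# Assumption: 1 TB is treated as 10**12 bytes and no per-query minimum applies;
# check the current AWS pricing page before relying on the numbers.

PRICE_PER_TB_USD = 5.0
BYTES_PER_TB = 10 ** 12


def estimate_athena_cost(bytes_scanned: int) -> float:
    """Return the approximate cost in USD for scanning the given number of bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must not be negative")
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    # Scanning 200 GB (2 * 10**11 bytes) at this rate:
    print(f"${estimate_athena_cost(200 * 10 ** 9):.2f}")  # prints "$1.00"
```

Because the charge depends on bytes scanned, anything that reduces the data a query has to read also reduces the bill.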

Lambda@Edge: Amazon loves to make the lives of developers easier, and Lambda@Edge is another tool in its kit. It lets you push Node.js code out to the edges of the network, where it runs and performs the desired tasks. Instead of everything sitting in one centralized location on a server, waiting for requests to pour in from different sources, Lambda@Edge replicates your code out to the edge to improve efficiency, so results reach users much faster. You get billed only when the code is running, so you save on the cost of renting machines.

Snowball Edge: Some people are particular about having all their data in one place. Snowball Edge is the answer for them: it lets data be shipped wherever you want, with a shipping label. If you have data stored in the Amazon Cloud and want it physically delivered somewhere, Amazon will copy it into a box and ship it to the desired location. Since downloading large blocks of data can be slow and very time consuming, it is often easier to have Amazon do the work for you.

Pinpoint: As your company grows, so does your network of customers, subscribers, employees, etc. There will be times when you want to push updates to them through messages. Most of the time, however, such mails end up in the junk folder, which makes them difficult to track. Pinpoint offers targeted messaging for exactly this situation. As soon as your users are ready to receive the mails, Pinpoint sends them out so that they land in the users’ inboxes. Once the messages are delivered, Pinpoint helps you collect and report data efficiently, so that your campaigns can be fine-tuned as and when required.

Polly: Audio interfaces often have a better conversion rate than plain text-based messages. That is where Polly comes in: it uses speech synthesis to convert text into sound that people can easily understand, which helps conversions.

Regards,
Arun

Friday, May 5, 2017

ELK/EKK - AWS Implementation

This article is about implementing ELK (a buzzword right now) on AWS.

The ELK stack consists of Elasticsearch, Logstash, and Kibana.

Logstash is a tool for log data intake, processing, and output. This includes virtually any type of log that you manage: system logs, webserver logs, error logs, and app logs.

Here in this post, 'Logstash' will be replaced by 'AWS CloudWatch' and 'AWS Kinesis Firehose'.

Elasticsearch is a NoSQL database based on the Lucene search engine, and a popular open-source search and analytics engine. It is designed to be distributed across multiple nodes, which enables it to work with large datasets. It handles use cases such as log analytics, real-time application monitoring, clickstream analytics, and text search.

In this post, the 'AWS Elasticsearch' Service will be used for the 'Elasticsearch' component.

Kibana is your log-data dashboard. It’s a stylish interface for visualizing logs and other time-stamped data.

It gives you a better grip on your large data stores with point-and-click pie charts, bar graphs, trendlines, maps, and scatter plots.

First Implementation – ELK With CloudTrail/CloudWatch (as LogStash)

We’ll try to list a few easy steps to do so:

- Go to AWS Elastic Search

- Create ES Domain – amelasticsearchdomain

o Set Access Policy to Allow All/Your Id

- Go to AWS CloudTrail Service

- Create Cloud Trail - amElasticSearchCloudTrail

o Create S3 Bucket – amelasticsearchbucket (Used to hold cloudtrail data)

o Create CloudWatch Group - amElasticSearchCloudWatchGroup

o In order to deliver CloudTrail events to the CloudWatch Logs log group, CloudTrail will assume a role with the two permissions below

§ CreateLogStream: Create a CloudWatch Logs log stream in the CloudWatch Logs log group you specify

§ PutLogEvents: Deliver CloudTrail events to the CloudWatch Logs log stream

- Go to CloudWatch and set it up

- Select the Log Group, then use the Action to stream data to the Elasticsearch domain

o Create New Role - AM_lambda_elasticsearch_execution

o Create Lambda (Automatically) LogsToElasticsearch_amelasticsearchdomain - CloudWatch Logs uses Lambda to deliver log data to Amazon Elasticsearch Service / Amazon Elasticsearch Service Cluster.

- Go to Elastic Search

o Hit Kibana link

o On Kibana - Configure an index pattern
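The two permissions named in the CloudTrail step (CreateLogStream and PutLogEvents) can be written down as an IAM policy document. The sketch below only builds that JSON in Python; it does not call AWS. The region and account ID are placeholders, the helper name is hypothetical, and the console normally generates an equivalent policy for you when it creates the role.

```python
import json


def cloudtrail_logs_policy(region: str, account_id: str, log_group: str) -> dict:
    """Build a policy document granting the two permissions listed above."""
    # ARN covering every log stream inside the given CloudWatch Logs log group.
    resource = (
        f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}:log-stream:*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": [resource],
            }
        ],
    }


if __name__ == "__main__":
    # "123456789012" is a placeholder account ID, not a real one.
    policy = cloudtrail_logs_policy(
        "us-east-1", "123456789012", "amElasticSearchCloudWatchGroup"
    )
    print(json.dumps(policy, indent=2))
```

Scoping the resource to the one log group keeps the role from writing to any other log group in the account.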

Second Implementation – ELK With AWS KinesisFirehose/CloudWatch (as LogStash)

We’ll try to list a few easy steps to do so:

- Go to AWS Elastic Search

- Create ES Domain - amelasticsearchdomain

o Set Access Policy to Allow All/Your Id

- Create Kinesis Firehose Delivery Stream - amelasticsearchkinesisfirehosestream

o Attach it to above ES Domain

o Create Lambda (Optional) - amelasticsearchkinesisfirehoselambda

o Create S3 Bucket for Backup - amelasticsearchkinesisfirehosebucket

o Create Role - am_kinesisfirehose_delivery_role

- Create EC2 System - (To send log data to above configured Kinesis Firehose)

o This uses the 1995 NASA Apache log (http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html) to feed into Kinesis Firehose.

o The EC2 instance uses the Amazon Kinesis Agent to flow data from its file system into the Firehose stream.

o Amazon Kinesis Agent is a standalone Java software application that offers an easy way to collect and send data to Amazon Kinesis and to Firehose.

- Steps:

- Launch an EC2 Instance (t2.micro) running the Amazon Linux Amazon Machine Image (AMI)

- PuTTY into the instance

- Install Kinesis Agent - sudo yum install -y aws-kinesis-agent

- Go to directory - /etc/aws-kinesis/

- Open file - nano agent.json

- Make sure it has this data:

{

"cloudwatch.emitMetrics": true,

"firehose.endpoint": "https://firehose.us-east-1.amazonaws.com",

"flows": [

{

"filePattern": "/tmp/mylog.txt",

"deliveryStream": "amelasticsearchkinesisfirehosestream",

"initialPosition": "START_OF_FILE"

}

]

}
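If you prefer to generate this file rather than edit it by hand, a small script can produce the same structure. This is only a sketch: the helper name is made up, and it reproduces just the keys shown in the agent.json above.

```python
import json


def build_agent_config(file_pattern: str, delivery_stream: str, region: str) -> dict:
    """Return a dict mirroring the agent.json shown above."""
    return {
        "cloudwatch.emitMetrics": True,
        "firehose.endpoint": f"https://firehose.{region}.amazonaws.com",
        "flows": [
            {
                "filePattern": file_pattern,
                "deliveryStream": delivery_stream,
                "initialPosition": "START_OF_FILE",
            }
        ],
    }


if __name__ == "__main__":
    config = build_agent_config(
        "/tmp/mylog.txt", "amelasticsearchkinesisfirehosestream", "us-east-1"
    )
    # Redirect this output to /etc/aws-kinesis/agent.json (needs sudo on the instance).
    print(json.dumps(config, indent=2))
```

Serializing with json.dumps also guards against the stray commas and quote mistakes that are easy to make when editing JSON in nano.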

- Now download the NASA access log file to your local desktop and upload it to S3

- URL - http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html

- File download - Jul 01 to Jul 31, ASCII format, 20.7 MB gzip compressed,

- Unzip and upload this file to any S3 bucket (other than any used above)

- Make sure the file is public

- Again go to EC2 Putty

- Go to directory - /etc/aws-kinesis/

- Download file from S3 - wget https://s3-us-west-1.amazonaws.com/arunm/access_log_Jul95

- Concatenate this file to mylog.txt - cat access_log_Jul95 >> /tmp/mylog.txt

- Again go to EC2 Putty

- Go to the home directory - cd ~

- Go to directory - /var/log/aws-kinesis-agent/

- Monitor the agent’s log at /var/log/aws-kinesis-agent/aws-kinesis-agent.log.

- Open file - nano aws-kinesis-agent.log

- You’ll find log lines like : 2017-03-01 21:46:38.476+0000 ip-10-0-0-55 (Agent.MetricsEmitter RUNNING) com.amazon.kinesis.streaming.agent.Agent [INFO] Agent: Progress: 1891715 records parsed (205242369 bytes), and 1891715 records sent successfully to destinations. Uptime: 630024ms
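Rather than reading the agent log by eye, you can pull the counters out of the progress line with a regular expression. The sketch below is written against the exact line format quoted above; other agent versions may log a different format, so treat the pattern as an assumption.

```python
import re

# Pattern for the "Progress:" line quoted above from aws-kinesis-agent.log.
PROGRESS_RE = re.compile(
    r"Progress: (?P<parsed>\d+) records parsed \((?P<bytes>\d+) bytes\), "
    r"and (?P<sent>\d+) records sent successfully"
)


def parse_progress(line: str):
    """Return (records_parsed, bytes_parsed, records_sent), or None if no match."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return int(match["parsed"]), int(match["bytes"]), int(match["sent"])


if __name__ == "__main__":
    sample = (
        "2017-03-01 21:46:38.476+0000 ip-10-0-0-55 (Agent.MetricsEmitter RUNNING) "
        "com.amazon.kinesis.streaming.agent.Agent [INFO] Agent: Progress: 1891715 "
        "records parsed (205242369 bytes), and 1891715 records sent successfully "
        "to destinations. Uptime: 630024ms"
    )
    parsed, size, sent = parse_progress(sample)
    # When parsed == sent, everything read from the file has reached Firehose.
    print(parsed, size, sent, parsed == sent)
```

Comparing the parsed and sent counts is a quick way to tell whether the agent has caught up with the file.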

- Create Kibana ( To Visualize data)

o Go to AWS Elasticsearch

o Click on link to Kibana

o The first thing you need to do is configure an index pattern. Use the index root you set when you created the Firehose stream (in our case, logs*).

o Kibana should recognize the logs indexes and let you set the Time-field name value. Firehose provides two possibilities:

§ @timestamp – the time as recorded in the file

§ @timestamp_utc – available when time zone information is present in the log data

o Choose either one, and you should see a summary of the fields detected.

o Select the discover tab, and you see a graph of events by time along with some expandable details for each event.

o As we are using the NASA dataset, we get a message that there are no results. That’s because the data is way back in 1995.

o Expand the time selector in the top right of the Kibana dashboard and choose an absolute time. Pick a start of June 30, 1995, and an end of August 1, 1995, and the events from the log should appear.
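The reason for the empty default view is simply that the event timestamps come from the log file, not from the upload time. A small sketch makes that visible; it assumes the NASA file is in Common Log Format, and the sample line is only an illustrative entry in that format.

```python
from datetime import date, datetime


def parse_clf_timestamp(line: str) -> datetime:
    """Extract the [dd/Mon/yyyy:HH:MM:SS zone] timestamp from a Common Log Format line."""
    start = line.index("[") + 1
    end = line.index("]", start)
    return datetime.strptime(line[start:end], "%d/%b/%Y:%H:%M:%S %z")


def in_kibana_window(ts: datetime, start: date, end: date) -> bool:
    """True if the event date falls inside the absolute range picked in Kibana."""
    return start <= ts.date() <= end


if __name__ == "__main__":
    # Illustrative line in Common Log Format (assumed to match the dataset's layout).
    sample = (
        '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] '
        '"GET /history/apollo/ HTTP/1.0" 200 6245'
    )
    ts = parse_clf_timestamp(sample)
    print(ts.year)
    print(in_kibana_window(ts, date(1995, 6, 30), date(1995, 8, 1)))
```

Any relative range such as "last 15 minutes" excludes these events, which is why an absolute range around July 1995 is needed.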

Hope this helps.

Regards,

Arun Manglick

Wednesday, April 5, 2017

AWS Data Pipeline

Regards,
Arun Manglick

Saturday, March 4, 2017

Amazon Echo || Echo Dot || Alexa || TAP

Amazon Echo - 10-inch-tall cylindrical device
- Is a smart home appliance that can also act as a portable speaker.
- Plays all your music from Amazon Music, Spotify, Pandora, iHeartRadio, TuneIn, and more
- Hears you from across the room with far-field voice recognition, even while music is playing
- Answers questions, reads audiobooks and the news, reports traffic and weather, and gives info on local businesses
- Provides sports scores and schedules, and more, using the Alexa Voice Service
- Controls lights, switches, and thermostats with compatible WeMo, Philips Hue, Samsung SmartThings, Wink, Insteon, Nest, and ecobee smart home devices

The Echo's beam-forming and noise cancellation technology enables it to hear your commands from the other side of the room, and this is where Alexa comes in.
-------------------------------------------------------------------
ALEXA:

Alexa is the Amazon Echo's pre-installed personal virtual assistant, which is capable of listening and responding to commands. Think of it as Amazon's answer to Siri.
Using Alexa on the Amazon Echo (just say "Alexa"), you can ask for a news update, create to-do lists, ask for facts and measurements, set timers and alarms, play music, and more.
--------------------------------------------------------------------------------------------------
Amazon TAP:

Tap is a slightly smaller (6.5 inches tall and 2.6 inches wide) and more portable version of the Echo, which doesn't require plugging in to the mains. However, in order to conserve power, the Tap isn't always on like the Echo, meaning you can't say "Alexa" to activate it. Instead, you need to physically press the microphone button on the device.

-------------------------------------------------------------------
Amazon Echo Dot:

Echo Dot is even smaller yet, but it's actually another plugged in device that does exactly the same as the full-sized Echo. The only difference is that it lacks the meaty speaker, so you'll need to hook it into your home stereo set-up if you want decent music playback.

Keep Blogging

Arun Manglick

Wednesday, February 22, 2017

AWS - Key Components Limits

Hi,

Sharing the gist of various AWS courses, which should serve as a key for exam success.

This has been accumulated over a period of time, and I will keep updating this post.

Port:

· SSH: 22

· RDP: 3389

· HTTP: 80

· HTTPS: 443

· MySQL: 3306

· Redshift: 5439

· MS SQL: 1433

ELB

· Supported Protocols: HTTP/HTTPS/TCP/SSL

· Supported Ports: 25, 80, 443, 1024 - 65535

Compute

EC2

· Max Number of Tags – 10 Per EC2 Instance

· How many regions are there on the AWS platform currently - 11

· Total Regions Supported for EC2 - 9

· EC2 Instance Limits (Per Region)

· On-Demand : 20

· Reserved: 20

· SPOT – No Limit

Auto Scaling

· Default Cooldown Period – 5 mins

Elastic Beanstalk

· Default Limit

· Applications: 75

· Application Version: 1000

· Environments: 200

Storage

S3

· Per AWS A/c S3 Buckets – 100 (Call AWS to increase limit)

· File Size: 1 Byte – 5 TB

· Object/File Size in Single PUT – 5 GB

· Multipart Upload – Greater than 100 MB
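The S3 size limits above translate into a simple decision rule for uploads. The sketch below encodes the figures from this list as they stood when the post was written; reading "Greater than 100 MB" as the point where multipart upload is recommended, and using binary units (1 MB = 1024 * 1024 bytes), are assumptions made here for illustration.

```python
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4

# Figures taken from the S3 list above.
MAX_OBJECT_SIZE = 5 * TB        # largest object S3 will store
MAX_SINGLE_PUT = 5 * GB         # largest object in one PUT
MULTIPART_THRESHOLD = 100 * MB  # above this, multipart is treated as recommended


def choose_upload_method(size_bytes: int) -> str:
    """Pick an upload approach for an object of the given size."""
    if size_bytes < 1 or size_bytes > MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 1 byte and 5 TB")
    if size_bytes > MAX_SINGLE_PUT:
        return "multipart (required)"
    if size_bytes > MULTIPART_THRESHOLD:
        return "multipart (recommended)"
    return "single PUT"


if __name__ == "__main__":
    for size in (50 * MB, 200 * MB, 6 * GB):
        print(size, choose_upload_method(size))
```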

Glacier

· 1000 Vaults – Per A/c Per Region

· No Max Limit to the total amount of data.

· Individual Archives Limit: 1 Byte – 40 TB

· Object/File Size in Single PUT – 4 GB

Storage Gateway

· Gateway Stored Volume – 16 TB, 32 Volumes: 512 TB

· Gateway Cached Volume – 32 TB, 32 Volumes: 1 PB

· Virtual Tape – 1 PB (1500 Virtual Tapes) (Takes 24 Hours for retrieval)

Import/Export

· Max Device Capacity – 16 TB

· Snowball - Max Device Capacity – 50 TB

CloudFront: (Can be Writable)

· 1000 – Request Per Second

· Max File Size that can be delivered thru CloudFront – 20 GB

· TTL – 24 Hrs (86400 secs)

· Origin cannot be – RDS, Glacier

· Origin can be – S3, EC2, ELB, Route53

Database

RDS:

· Limit – 40 RDS Instances

· Max DB on Single SQL-Server Instance – 30

· Max DB on Single Oracle Instance – 01

· RDS Backup Retention Period – 1 - 35 Days

· Read-Replicas – 05

· MySQL DB Size – 6 TB

· Maximum RDS Volume size using RDS PIOPS storage with MySQL & Oracle DB Engine - 6 TB

· Maximum PIOPS capacity on an MySQL and Oracle RDS instance is 30,000 IOPS (Default)

· Maximum size for a Microsoft SQL Server DB Instance with SQL Server Express edition – 10 GB (SA Mega Quiz #20)

Dynamo DB

· Storage – No Limit

· Single Item Size (Row Size): 1 – 400 KB

· Local/Global Secondary Index – 05 per Table

· Streams – Stored for 24 Hours only.

· Maximum Write Throughput – Can go beyond 10,000 capacity units, but contact AWS first.

· Projected Non-Key Attributes – 20 Per Table

· LSIs - Limit the total size of all elements (tables and indexes) to 10 GB per partition key value. (GSI does not have any such limitations)

· Tags: 50 Tags Per DynamoDB Table

· Triggers for a Table - Unlimited

RedShift

· Block Size – 1024 KB

· Maintain 3 copies

· Compute Node: 1 – 128

Aurora

· Maintain 6 copies in 3 AZs

ElastiCache

· Reserved Cache Nodes – 20

Networking

ELB:

· Allowed Load balancer : 20

· Protocols Supported: HTTP, HTTPS, SSL, TCP

· Acceptable ports for both the HTTPS/SSL and HTTP/TCP connections are 25, 80, 443, and 1024-65535
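The port rule for ELB listeners is easy to get wrong from memory, so here it is as a one-line check. This is just the rule from the list above expressed in Python; the function name is made up.

```python
def is_allowed_elb_port(port: int) -> bool:
    """True if the port is 25, 80, 443, or within 1024-65535, per the list above."""
    return port in (25, 80, 443) or 1024 <= port <= 65535


if __name__ == "__main__":
    for port in (22, 25, 80, 443, 1023, 1024, 8080, 65535, 65536):
        print(port, is_allowed_elb_port(port))
```

Note that port 22 (SSH) falls outside the allowed set, which is a common exam trap.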

Per Region

· 05 - VPCs

· 05 - EIP

· 05 – Virtual Private Gateway

· 50 - VPN Connections

· 50 – Customer Gateway

Per VPC

· 01 - Internet Gateway

· 01 – IP Address Range

· 200 – Subnets

· 20 - EC2 Instances (Default)

Per Subnet

· 01 – AZ

· 01 - ACL

Notes:

· An instance retains its private IP, which persists across starts and stops

· You can assign multiple IP addresses to your instances

· EIP is associated with your AWS account and not a particular instance. It remains associated with your account until you explicitly release it.

Route53

· Number of domains you can manage using Route 53 is 50 (however it is a soft limit and can be raised by contacting AWS support)

Management

Cloud Watch

· Logs – Unlimited/ Indefinitely

· Alarms – 2 Weeks (14 Days)

· Metrics – 2 Weeks (14 Days)

· For more, use API – GetMetricStatistics or some third party tools

· EC2 Metrics Monitoring

· Standard – 5 Mins

· Detailed – 1 Min

· Custom Metrics Monitoring – Minimum 1 Min

Cloud Formation

· Templates – No Limits

· Stack – 200 Per A/c

· Parameters – 60 per Template

· Output – 60 per Template

· Description Field Size – 4096 Characters

Cloud Trail

· 5 Trails – Per Region

· Deliver Log Files – Every 5 mins

· Capture API Activity – Last 7 Days

OpsWorks

· 40 Stacks

· 40 Layers per stack

· 40 Instances per stack

· 40 - Apps per stack

Security

· Roles: 250 Per AWS Account

· KMS

· Master Keys – 1000 Per AWS A/c

· Data Key – No Limit

· Resource-Based Permissions:

· S3, Glacier, EBS

· SNS, SQS

· Elastic Beanstalk, Cloud Trail

Analytics

EMR:

· EC2 Instances Across All clusters - 20

Application

SQS:

· Visibility Timeout – 30 Secs (Default). Value must be between 0 seconds and 12 hours.

· Retention Period – 4 Days (Default). Value must be between 1 minute and 14 days (2 weeks).

· Max Long Polling Timeout – 20 Secs. Value must be between 0 and 20 seconds.

· Message Size – 256 KB. Value must be between 1 and 256 KB.

· Number of Queues – Unlimited

· Number of messages per queue – Unlimited

· Queue name: 80 characters
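The SQS ranges above can be turned into a small validator, which is a handy way to memorize them. The sketch checks only the queue name length, visibility timeout, retention period, and long polling wait time as listed in this post; the function and parameter names are hypothetical and do not correspond to any AWS SDK call.

```python
# Limits as listed in the SQS section above.
MAX_QUEUE_NAME_LEN = 80
MAX_VISIBILITY_TIMEOUT_S = 12 * 60 * 60   # 12 hours
MIN_RETENTION_S = 60                       # 1 minute
MAX_RETENTION_S = 14 * 24 * 60 * 60        # 14 days
MAX_WAIT_TIME_S = 20                       # long polling


def validate_queue_settings(name, visibility_timeout_s=30,
                            retention_s=4 * 24 * 60 * 60, wait_time_s=0):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    if not 1 <= len(name) <= MAX_QUEUE_NAME_LEN:
        problems.append("queue name must be 1-80 characters")
    if not 0 <= visibility_timeout_s <= MAX_VISIBILITY_TIMEOUT_S:
        problems.append("visibility timeout must be between 0 seconds and 12 hours")
    if not MIN_RETENTION_S <= retention_s <= MAX_RETENTION_S:
        problems.append("retention period must be between 1 minute and 14 days")
    if not 0 <= wait_time_s <= MAX_WAIT_TIME_S:
        problems.append("long polling wait time must be between 0 and 20 seconds")
    return problems


if __name__ == "__main__":
    print(validate_queue_settings("orders"))                  # defaults are valid
    print(validate_queue_settings("orders", wait_time_s=30))  # one problem reported
```

The defaults in the signature are the default values from the list (30 seconds visibility timeout, 4 days retention).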

SES:

· SES Email Size – 10 MB (including Attachments)

· SES Recipients – 50 for every message

· Sending Limits:

SNS:

· Topics – 100,000 (1 Lakh) (Per A/c)

· Subscription – 10 Million Per Topic (Per A/c)

· TTL – 04 Weeks

SWF:

· Retention Period – 1 Year

· Max workflow execution – 1 Year

· History of Execution – 90 Days Max

· Max Workflow and Activity Types – 10,000

· Max Amazon SWF domains – 100

· Max open executions in a domain – 100,000

Elastic Transcoder

· Jobs – 10k Per Pipeline

Hope this helps.

Keep Blogging!!!

Regards,

Arun Manglick