AWS Elastic Map Reduce (EMR) Service

An AWS Elastic Map Reduce (EMR) Service is a distributed data analytics platform service that is provided by AWS.

Context:
- It can be used to manage an AWS EMR Cluster (with Hadoop, Spark, Hive, ...)
- It can be accessed by an EMR Web Console or an EMR CLI Command (such as create-cluster or describe-cluster).
- It can provide Instance Fleets, so that backup instances are available during peak hours.
- It can provide APIs for tracking/recording cluster setup.
- …
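
The cluster-management items above can be sketched with the AWS SDK for Python (boto3), whose EMR client exposes `run_job_flow` and `describe_cluster`. This is only a minimal sketch under stated assumptions: the function names, fleet names, instance types, and capacities are hypothetical, and the default role names are assumed to already exist in the account. The request builder is pure Python; the AWS call is isolated in a second function that needs boto3 and valid credentials.

```python
from typing import Any, Dict


def build_cluster_request(name: str, release_label: str = "emr-6.3.0") -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR `run_job_flow` call.

    Fleet names, instance types, and capacities are illustrative assumptions.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}, {"Name": "Hive"}],
        "Instances": {
            "InstanceFleets": [
                {
                    "Name": "primary",  # hypothetical fleet name
                    "InstanceFleetType": "MASTER",
                    "TargetOnDemandCapacity": 1,
                    "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}],
                },
                {
                    "Name": "core",  # hypothetical fleet name
                    "InstanceFleetType": "CORE",
                    "TargetOnDemandCapacity": 2,
                    "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}],
                },
                {
                    # Extra capacity from Spot, with two candidate instance types.
                    "Name": "task",  # hypothetical fleet name
                    "InstanceFleetType": "TASK",
                    "TargetSpotCapacity": 4,
                    "InstanceTypeConfigs": [
                        {"InstanceType": "m5.xlarge", "WeightedCapacity": 1},
                        {"InstanceType": "m5.2xlarge", "WeightedCapacity": 2},
                    ],
                },
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumed to exist in the account (the EMR default role names).
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_and_check(request: Dict[str, Any]) -> str:
    """Create the cluster and return its id and current state.

    Requires boto3, AWS credentials, and a configured region.
    """
    import boto3  # third-party; imported lazily so the builder works without it

    emr = boto3.client("emr")
    cluster_id = emr.run_job_flow(**request)["JobFlowId"]
    state = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["Status"]["State"]
    return f"{cluster_id}: {state}"
```

Calling `launch_and_check(build_cluster_request("demo"))` would start a billable cluster, so the sketch keeps request construction separate from submission.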
Example(s):
- EMR v6.x [1]
  - EMR v6.3 (2021-05):
    - Apache Spark v3.1.1-amzn-0
  - …
- EMR v5.x [2]
  - EMR v5.28 (2019-07):
    - Apache Spark v2.4.4 ...
    - Apache Zeppelin v0.8.2 ...
    - JupyterHub v1.0.0 ...
    - Apache MXNet v1.5.1 ...
    - Apache Hadoop v2.8.5 ...
    - Apache Flink v1.9.0
    - Apache Phoenix v4.14.3
    - Apache Presto v0.227
    - …
  - EMR v5.12 (2018-07):
    - Apache Spark v2.3.1 ...
    - Apache Zeppelin v0.7.3 ...
    - JupyterHub v0.8.1 ...
    - Apache MXNet v1.2 ...
    - Apache Hadoop v2.8.4 ...
    - …
- EMR v4.x [3]
  - EMR v4.7.0 (2016-05):
    - Core Hadoop: Hadoop 2.7.2 with Ganglia 3.7.2, Apache Hive 1.0.0, Apache Hue 3.7.1, Apache Mahout 0.12.0, and Apache Pig 0.14.0
    - HBase: HBase 1.2.1 with Ganglia 3.7.2, Hadoop 2.7.2, Apache Hive 1.0.0, Apache Hue 3.7.1, Phoenix 4.7.0, and ZooKeeper 3.4.8
    - Presto-Sandbox: PrestoDB 0.147 with Hadoop 2.7.2 HDFS and Apache Hive 1.0.0 Hive Metastore.
    - Spark: Apache Spark 1.6.1 on Hadoop 2.7.2 YARN with Ganglia 3.7.2
- …
Counter-Example(s):
- GCP Dataproc Service within Google's GCP Cloud Platform.
- Databricks Platform.
- AWS Data Pipeline - a lightweight orchestration service for periodic, data-driven workflows.
- AWS Elasticsearch Service - a managed service that makes it easy to deploy, operate, and scale Elasticsearch, a popular open-source search and analytics engine.
- AWS Kinesis - a service that makes it easy to work with real-time streaming data in the AWS cloud.
- AWS Machine Learning - a service that enables you to easily build smart applications.
- AWS DynamoDB, AWS RDS, AWS Redshift.
See: Apache Spark, Map Reduce Framework, Hadoop.

References

http://aws.amazon.com/elasticmapreduce/

2018

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-5x.html
- QUOTE:

2016

https://en.wikipedia.org/wiki/Apache_Hadoop#Amazon_Elastic_MapReduce
- Elastic MapReduce (EMR)^[1] was introduced by Amazon.com in April 2009. Provisioning of the Hadoop cluster, running and terminating jobs, and handling data transfer between EC2(VM) and S3(Object Storage) are automated by Elastic MapReduce. Apache Hive, which is built on top of Hadoop for providing data warehouse services, is also offered in Elastic MapReduce.^[2]
  Support for using Spot Instances^[3] was later added in August 2011.^[4] Elastic MapReduce is fault-tolerant for slave failures,^[5] and it is recommended to only run the Task Instance Group on spot instances to take advantage of the lower cost while maintaining availability.

[1] "AWS | Amazon Elastic MapReduce (EMR) | Hadoop MapReduce in the Cloud". Aws.amazon.com. http://aws.amazon.com/elasticmapreduce/. Retrieved 2014-07-22.
[2] "Amazon Elastic MapReduce Developer Guide" (PDF). http://s3.amazonaws.com/awsdocs/ElasticMapReduce/latest/emr-dg.pdf. Retrieved 2013-10-17.
[3] "Amazon EC2 Spot Instances". Aws.amazon.com. http://aws.amazon.com/ec2/spot-instances/. Retrieved 2014-07-22.
[4] "Amazon Elastic MapReduce Now Supports Spot Instances". Amazon.com. 2011-08-18. http://aws.amazon.com/about-aws/whats-new/2011/08/18/amazon-elastic-mapreduce-now-supports-spot-instances/. Retrieved 2013-10-17.
[5] "Amazon Elastic MapReduce FAQs". Amazon.com. http://aws.amazon.com/elasticmapreduce/faqs/#cluster-10. Retrieved 2013-10-17.

2016b

https://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-release-components.html
- QUOTE:

2014

http://aws.amazon.com/elasticmapreduce/
- QUOTE: Amazon Elastic MapReduce (Amazon EMR) is a web service that makes it easy to quickly and cost-effectively process vast amounts of data.
  Amazon EMR simplifies big data processing, providing a managed Hadoop framework that makes it easy, fast, and cost-effective for you to distribute and process vast amounts of your data across dynamically scalable Amazon EC2 instances. You can also run other popular distributed frameworks such as Spark and Presto in Amazon EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB.
  Amazon EMR securely and reliably handles your big data use cases, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics.

