AWS EMR-based Cluster
An AWS EMR-based Cluster is a computing cluster that is provisioned and managed through the Amazon EMR (Elastic MapReduce) service.
- Context:
- It can (typically) be composed of an EMR Master Node and EMR Core Nodes (and, optionally, EMR Task Nodes).
- It can (typically) run Apache Hadoop, and possibly Apache Spark, Apache Hive, Hue, Apache Pig, Ganglia, ...
- It can be managed by an AWS EMR Management Task, such as a create EMR cluster task.
- …
- Example(s):
- an AWS EMR-based Spark Cluster: Apache Spark 1.6.1 on Hadoop 2.7.2 YARN with Ganglia 3.7.2.
- Core Hadoop: Hadoop 2.7.2 with Ganglia 3.7.2, Apache Hive 1.0.0, Hue 3.7.1, Apache Mahout 0.12.0, and Apache Pig 0.14.0.
- HBase: HBase 1.2.1 with Ganglia 3.7.2, Hadoop 2.7.2, Apache Hive 1.0.0, Hue 3.7.1, Phoenix 4.7.0, and ZooKeeper 3.4.8.
- Presto-Sandbox: PrestoDB 0.147 with Hadoop 2.7.2 HDFS and the Apache Hive 1.0.0 Hive Metastore.
- …
- Counter-Example(s):
- See: Elastic Map Reduce Cluster.
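The node and application composition described under Context can be sketched as a cluster-creation request. The following is a minimal sketch, not a definitive implementation: the helper name `build_cluster_request`, the instance type, and the default role names are assumptions made for illustration, and the release label is left to the caller because this page does not name one. The dictionary keys follow the request shape of the boto3 EMR client's `run_job_flow` call.

```python
def build_cluster_request(name, release_label, applications,
                          instance_type, core_count, task_count=0):
    """Build a request dict for one master node plus core (and optional task) nodes.

    Hypothetical helper: it only assembles the request, it does not call AWS.
    """
    # Exactly one master node, as stated in the node-type description.
    groups = [{
        "Name": "Master",
        "InstanceRole": "MASTER",
        "InstanceType": instance_type,
        "InstanceCount": 1,
    }]
    # Core nodes run tasks and store HDFS data.
    if core_count > 0:
        groups.append({
            "Name": "Core",
            "InstanceRole": "CORE",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        })
    # Task nodes are optional and only run tasks.
    if task_count > 0:
        groups.append({
            "Name": "Task",
            "InstanceRole": "TASK",
            "InstanceType": instance_type,
            "InstanceCount": task_count,
        })
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "InstanceGroups": groups,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumed default role names; an account may use different ones.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Example: a Spark cluster in the style of the first example above.
# The release label and instance type are placeholders to be verified.
request = build_cluster_request(
    name="spark-cluster",
    release_label="emr-4.7.0",
    applications=["Hadoop", "Spark", "Ganglia"],
    instance_type="m4.large",
    core_count=2,
)
```

With AWS credentials configured, such a request could be submitted with `boto3.client("emr").run_job_flow(**request)`. Keeping request construction separate from submission lets the composition be inspected without creating any resources.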
References
2018
- https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-web-interfaces.html
- QUOTE: Hadoop and other applications you install on your Amazon EMR cluster, publish user interfaces as web sites hosted on the master node. For security reasons, when using EMR-Managed Security Groups, these web sites are only available on the master node's local web server, so you need to connect to the master node to view them. For more information, see Connect to the Master Node Using SSH. Hadoop also publishes user interfaces as web sites hosted on the core and task (slave) nodes. These web sites are also only available on local web servers on the nodes.
- YARN ResourceManager http://master-public-dns-name:8088/
- YARN NodeManager http://slave-public-dns-name:8042/
- Hadoop HDFS NameNode http://master-public-dns-name:50070/
- Hadoop HDFS DataNode http://slave-public-dns-name:50075/
- Spark HistoryServer http://master-public-dns-name:18080/
- Zeppelin http://master-public-dns-name:8890/
- Hue http://master-public-dns-name:8888/
- Ganglia http://master-public-dns-name/ganglia/
- HBase UI http://master-public-dns-name:16010/
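The interface list above can be sketched as a small lookup that fills in the node's DNS name. This is an illustrative sketch only: the names `WEB_INTERFACES` and `interface_url` are hypothetical, and the ports and paths are copied from the list above rather than checked against any particular EMR release. As the quote notes, the resulting URLs are reachable only after connecting to the node.

```python
# (node kind, port, path) for each interface in the list above.
# A port of None means the URL in the list carries no explicit port.
WEB_INTERFACES = {
    "YARN ResourceManager": ("master", 8088, "/"),
    "YARN NodeManager": ("slave", 8042, "/"),
    "Hadoop HDFS NameNode": ("master", 50070, "/"),
    "Hadoop HDFS DataNode": ("slave", 50075, "/"),
    "Spark HistoryServer": ("master", 18080, "/"),
    "Zeppelin": ("master", 8890, "/"),
    "Hue": ("master", 8888, "/"),
    "Ganglia": ("master", None, "/ganglia/"),
    "HBase UI": ("master", 16010, "/"),
}


def interface_url(name, master_dns, slave_dns=None):
    """Return the URL for a named interface, given the node DNS names."""
    node, port, path = WEB_INTERFACES[name]
    host = master_dns if node == "master" else slave_dns
    if host is None:
        raise ValueError(f"{name} is served from a core or task node; pass slave_dns")
    authority = host if port is None else f"{host}:{port}"
    return f"http://{authority}{path}"


# Placeholder DNS name, for illustration only.
print(interface_url("Spark HistoryServer", "master-public-dns-name"))
```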
2015
- http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-nodes.html
- QUOTE: Amazon EMR defines three roles for the servers in a cluster. These different roles are referred to as node types. The Amazon EMR node types map to the master and slave roles defined in Hadoop.
- Master node — Manages the cluster: coordinating the distribution of the MapReduce executable and subsets of the raw data, to the core and task instance groups. It also tracks the status of each task performed, and monitors the health of the instance groups. There is only one master node in a cluster. This maps to the Hadoop master node.
- Core nodes — Runs tasks and stores data using the Hadoop Distributed File System (HDFS). This maps to a Hadoop slave node.
- Task nodes (optional) — Run tasks. This maps to a Hadoop slave node.
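The three node types quoted above can be summarized as a small data model. This is a sketch under stated assumptions: the class `NodeType`, its field names, and the `validate_counts` helper are hypothetical, and the `runs_tasks=False` entry for the master node is one reading of the quote, which describes the master only as coordinating and tracking tasks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    hadoop_role: str        # Hadoop role that the EMR node type maps to
    runs_tasks: bool        # whether the quote says the node runs tasks
    stores_hdfs_data: bool  # whether the quote says the node stores HDFS data
    optional: bool          # whether the quote marks the node type as optional


NODE_TYPES = {
    "master": NodeType("master", runs_tasks=False, stores_hdfs_data=False, optional=False),
    "core": NodeType("slave", runs_tasks=True, stores_hdfs_data=True, optional=False),
    "task": NodeType("slave", runs_tasks=True, stores_hdfs_data=False, optional=True),
}


def validate_counts(counts):
    """Check node counts against the constraints stated in the quote."""
    for role, count in counts.items():
        if role not in NODE_TYPES:
            raise ValueError(f"unknown node type: {role}")
        if count < 0:
            raise ValueError(f"negative count for {role}")
    # The quote states that there is only one master node in a cluster.
    if counts.get("master", 0) != 1:
        raise ValueError("a cluster has exactly one master node")
    return True


# Task nodes may be omitted because they are optional.
validate_counts({"master": 1, "core": 2})
```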