Dell PowerEdge C8000, R720xd, R720, Force10 S4810, Z9000 Apache Hadoop Solution Reference Architecture Guide


Dell | Cloudera Solution Reference Architecture Guide v5.1

Dell | Cloudera Solution

Reference Architecture v5.1

A Dell Reference Architecture Guide

July 14, 2014

Summary

This document presents the reference architecture of the Dell™ | Cloudera™ Solution for Apache Hadoop, which Dell designed jointly with Cloudera.

The reference architecture introduces all the high-level components, hardware, and software that are included in the stack. Each high-level component is then described individually.

5.1 1 Dell Confidential


Table of Contents


Tables

Figures

Dell | Cloudera Apache Hadoop Solution Overview

Solution Use Case Summary

Dell | Cloudera Hadoop Solution Components

Cloudera Enterprise Software Overview

Cluster Architecture

High-level Node Architecture

Network Fabric Architecture

Cluster Sizing

High Availability

Hardware Architecture

Server Infrastructure Options

Network Architecture

Physical Network Components

Network Connectivity Summary

IPv6 Capabilities

Cloudera Enterprise Software

Cloudera Manager

Cloudera RTQ (Impala)

Cloudera Search

Cloudera BDR

Cloudera Navigator

Cloudera Support

Dell | Cloudera Solution Deployment Methodology

Appendix A: Physical Configuration – PowerEdge C8000 Series

Appendix B: Bill of Materials – PowerEdge C8000 Series

Appendix C: Physical Configuration – PowerEdge R720xd

Appendix D: Bill of Materials – PowerEdge R720 Nodes

Appendix E: Bill of Materials – PowerEdge R720xd 3.5” Data Node

Appendix F: Bill of Materials – PowerEdge R720xd 2.5” Data Node

Appendix G: Part Numbers – Force10 Network Equipment

Networking Equipment notes

Server Racks and Power

Appendix H: Bill of Materials – Software and Support

Appendix I: JBOD versus Single Disk RAID 0 Configuration

Appendix J: Abbreviations

Update History

Changes in Version 5.1

To Learn More


Tables

Table 1: Solution Use Cases

Table 2: Solution Support Matrix

Table 3: Service Locations

Table 4: Cluster Sizes by Server Model

Table 5: Server Platform Attributes

Table 6: Hardware Configurations – PowerEdge C8000 Compute Sleds

Table 7: Hardware Configurations – PowerEdge C8000 Storage Sleds

Table 8: Hardware Configurations – PowerEdge R720 Infrastructure Nodes

Table 9: Hardware Configurations – PowerEdge R720xd Data Nodes

Table 10: Per Rack Network Equipment

Table 11: Aggregation Network Switches for 3 or more racks

Table 12: Network Cables Required – 10GbE Configurations

Table 13: Chassis Configuration – PowerEdge C8000 Master Chassis

Table 14: Chassis Configuration – PowerEdge C8000 High Availability Chassis

Table 15: Chassis Configuration – PowerEdge C8000 Data Nodes

Table 16: Chassis Configuration – PowerEdge C8000 ‘Heavy’ Data Nodes

Table 17: Rack Configuration – PowerEdge C8000

Table 18: Rack Configuration – PowerEdge C8000 ‘Heavy’ Nodes

Table 19: Master Chassis – PowerEdge C8000

Table 20: HA Chassis – PowerEdge C8000

Table 21: Data Node Chassis – PowerEdge C8000

Table 22: Heavy Data Node Chassis – PowerEdge C8000

Table 23: Rack Configuration – PowerEdge R720xd (or R720/R720xd)

Table 24: Active and Standby Name, Admin, Edge and HA Nodes – PowerEdge R720

Table 25: Data node – PowerEdge R720xd

Table 26: Data node – PowerEdge R720xd

Table 27: Network Equipment – 1GbE – Dell Force10

Table 28: Network Equipment – 10GbE – Dell Force10


Figures

Figure 1: Dell | Cloudera Solution Components

Figure 2: Cluster Architecture

Figure 3: Cluster Network Fabric Architecture

Figure 4: PowerEdge C8000 Chassis

Figure 5: PowerEdge R720xd Servers – 2.5” and 3.5” Chassis Options

Figure 6: Hadoop Logical Network Diagram

Figure 7: PowerEdge R720xd Node 1GbE Network Interconnects

Figure 8: Single Rack Networking Equipment

Figure 9: S4810 Multi-rack Networking Equipment

Figure 10: Multi-Rack View Using Force10 Z9000 Switches (Based on Layer-2)

Figure 11: Network Connections for 10GbE

THIS PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.

© 2011 – 2014 Dell Inc. All rights reserved. Dell, the DELL logo, the DELL badge and PowerEdge are trademarks of Dell Inc. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell disclaims proprietary interest in the marks and names of others. This document is for informational purposes only. Dell reserves the right to make changes without further notice to the products herein. The content provided is as-is and without expressed or implied warranties of any kind.


Dell | Cloudera Apache Hadoop Solution Overview

The Dell™ | Cloudera™ Apache Hadoop Solution lowers the barrier to adoption for organizations intending to use Apache™ Hadoop® in production.

Hadoop is an Apache project built and used by a global community of contributors and written in the Java programming language. Yahoo! has been the largest contributor to this project, and uses Apache Hadoop extensively across its businesses. Core committers on the Hadoop project include employees from Cloudera, eBay, Facebook, Getopt, Hortonworks, Huawei, IBM, InMobi, INRIA, LinkedIn, MapR, Microsoft, Pivotal, Twitter, UC Berkeley, VMware, WANdisco, and Yahoo!, with contributions from many more individuals and organizations.

Although Hadoop is popular and widely used, installing, configuring, and running a production Hadoop cluster involves multiple considerations, including:

The appropriate Hadoop software distribution and extensions

Monitoring and management software

Allocation of Hadoop services to physical nodes

Selection of appropriate server hardware

Design of the network fabric

Sizing and Scalability

Performance

These considerations are complicated by the need to understand the type of workloads that will be running on the cluster, the fast-moving pace of the core Hadoop project and the challenges of managing a system designed to scale to thousands of nodes in a single instance.

Dell’s customer-centered approach is to create rapidly deployable and highly optimized end-to-end Hadoop solutions running on hyperscale hardware. Dell listened to its customers and designed a Hadoop solution that is unique in the marketplace, combining optimized hardware, software and services to streamline deployment and improve the customer experience.

The Dell | Cloudera Apache Hadoop Solution was jointly designed by Dell and Cloudera, and embodies all the hardware, software, resources and services needed to run Hadoop in a production environment. This end-to-end solution approach means that you can be in production with Hadoop in a shorter time than is typically possible with homegrown solutions.

The solution is based on the Cloudera Distribution for Apache Hadoop, and Dell PowerEdge and Force10 hardware. This solution includes components that span the entire solution stack:

Reference architecture and best practices

Optimized server configurations

Optimized network infrastructure

Cloudera Distribution for Apache Hadoop

Solution Use Case Summary

The Dell | Cloudera Apache Hadoop Solution is designed to address the following use cases:

Table 1: Solution Use Cases

Use case | Description
Big data analytics | Ability to query petabyte-scale unstructured and semi-structured data in real time using HBase and Hive.
ETL Offload | Offload the Extract, Transform, Load (ETL) process from a relational database management system (RDBMS) or enterprise data warehouse (EDW) into a Hadoop cluster.
Data Warehouse Optimization | Augment the traditional RDBMS or enterprise data warehouse with Hadoop. Hadoop acts as a single data hub for all data types.
Data storage | Collect and store unstructured and semi-structured data in a secure, fault-resilient, scalable data store that can be organized and sorted for indexing and analysis.
Batch processing of unstructured data | Ability to batch-process (index, analyze, etc.) tens to hundreds of petabytes of unstructured and semi-structured data.
Data archive | Active archival of medium-term (12–36 months) data from EDW/DBMS to expedite access, increase data retention time, or meet data retention policies or compliance requirements.
Integration with data warehouse | Extract, transform and load data in and out of Hadoop into a separate DBMS for advanced analytics.
Big data visualization | Capture, index and visualize unstructured and semi-structured big data in real time.
Search and predictive analytics | Crawl, extract, index and transform semi-structured and unstructured data for search and predictive analytics.

Dell | Cloudera Hadoop Solution Components

Figure 1: Dell | Cloudera Solution Components

Figure 1 illustrates the primary components in the Dell | Cloudera Solution.


The PowerEdge servers, Force10 networking, the operating system and the Java Virtual Machine make up the foundation on which the Hadoop software stack runs.

The Hadoop components provide multiple layers of functionality on top of this foundation. Apache ZooKeeper provides a coordination layer for the distributed processing in the Hadoop system. The Hadoop Distributed File System (HDFS) provides the core storage for data files in the system. HDFS is a distributed, scalable, reliable and portable file system. Apache HBase is a layer that provides record-oriented storage on top of HDFS. HBase is optimized for real-time data serving environments and can be used either as an alternative to direct data file access or alongside it.

YARN provides a resource management framework for running distributed applications under Hadoop, without being tied to MapReduce. The most popular distributed application is Hadoop’s MapReduce, but other applications also run under YARN, such as Apache Spark, Apache Hive, Apache Pig, etc.

Sitting on top of these storage layers are four complementary access layers providing data processing, in-memory processing, data query and data search. MapReduce is the core processing framework in the Hadoop system, and provides a massively parallel data processing framework inspired by Google’s MapReduce papers.

Another processing framework is Spark, a real-time, in-memory processing framework. The Data Query layer provides real-time query access to data using Cloudera Impala. The Data Search layer provides real-time search of indexed data using Apache SolrCloud technology. All four of these layers can be used simultaneously or independently, depending on the workload and problems being solved.

Above these layers are a number of Hadoop end-user tools, providing a higher level of abstraction for data access and processing. Apache Pig and Apache Hive are data access and processing languages, while Apache Mahout provides machine learning capabilities. Apache Oozie provides a general workflow capability for coordinating complex sequences of production jobs, and Hue provides a web interface for analyzing data.

The left side of the diagram shows the integration components that can be used to move data in and out of the Hadoop system. Apache Sqoop provides data transfer to and from relational databases, while Apache Flume is optimized for processing event and log data. The HDFS API and tools can be used to move data files to and from the Hadoop system.

The right side of the diagram shows the capabilities that are integrated across the entire system. Hadoop administration and management is provided by Cloudera Manager, while enterprise-grade security (via Apache Sentry) is integrated through the entire stack.

Support Matrix

The supported components and operating environments for the Dell | Cloudera® Apache Hadoop Solution are shown in Table 2.

Table 2: Solution Support Matrix

Category | Component | Version | Available Support
Operating System | Red Hat Enterprise Linux | 6.5 | Red Hat Linux support
Operating System | CentOS | 6.5 | Dell Hardware support
Java Virtual Machine | Sun/Oracle JVM | Java 7 (1.7.0_25 minimum) | N/A
Hadoop | Cloudera Distribution for Apache Hadoop (CDH) | 5.1 | Cloudera support
Hadoop | Cloudera Manager | 5.1 | Cloudera support
Hadoop | Cloudera Navigator | 1.2 | Cloudera support

Cloudera Enterprise Software Overview

Hadoop for the Enterprise

Cloudera Enterprise helps you become information-driven by leveraging the best of the open source community with the enterprise capabilities you need to succeed with Apache Hadoop in your organization.

Designed specifically for mission-critical environments, Cloudera Enterprise includes CDH, the world’s most popular open source Hadoop-based platform, as well as advanced system management and data management tools plus dedicated support and community advocacy from our world-class team of Hadoop developers and experts. Cloudera is your partner on the path to big data.

Cloudera Enterprise, with Apache Hadoop at the core, is:

Unified – one integrated system, bringing diverse users and application workloads to one pool of data on common infrastructure; no data movement required

Secure – perimeter security, authentication, granular authorization, and data protection

Governed – enterprise-grade data auditing, data lineage, and data discovery

Managed – native high-availability, fault-tolerance and self-healing storage, automated backup and disaster recovery, and advanced system and data management

Open – Apache-licensed open source to ensure your data and applications remain yours, and an open platform to connect with all of your existing investments in technology and skills

Rethink Data Management

One massively scalable platform to store any amount or type of data, in its original form, for as long as desired or required

Integrated with your existing infrastructure and tools

Flexible enough to run a variety of enterprise workloads, including batch processing, interactive SQL, enterprise search and advanced analytics

Robust security, governance, data protection, and management that enterprises require

With Cloudera Enterprise, today’s leading organizations put their data at the center of their operations, to increase business visibility and reduce costs, while successfully managing risk and compliance requirements.

What's Inside?

CDH - At the core of Cloudera Enterprise is CDH, which combines Apache Hadoop with a number of other open source projects to create a single, massively scalable system where you can unite storage with an array of powerful processing and analytic frameworks.

Automated Cluster Management – Cloudera Manager - Cloudera Enterprise includes Cloudera Manager to help you easily deploy, manage, monitor, and diagnose issues with your cluster. Cloudera Manager is critical for operating clusters at scale.

Cloudera Support - Get the industry’s best technical support for Hadoop. With Cloudera Support, you’ll experience more uptime, faster issue resolution, better performance to support your mission critical applications, and faster delivery of the platform features you care about.

Cloudera Enterprise Data Hub

Cloudera Enterprise also offers support for several advanced components that extend and complement the value of Apache Hadoop:

Online NoSQL – HBase

HBase is a distributed key-value store that helps you build real-time applications on massive tables (billions of rows, millions of columns) with fast, random access.

Analytic SQL – Impala

Impala is the industry’s leading massively parallel processing (MPP) SQL engine built for Hadoop.

Search – Cloudera Search

Cloudera Search, based on Apache Solr, lets your users query and browse data in Hadoop just as they would search Google or their favorite e-commerce site.

In-Memory Machine Learning and Stream Processing – Apache Spark

Spark delivers fast, in-memory analytics and real-time stream processing for Hadoop.

Data Management – Cloudera Navigator

Cloudera Navigator provides critical enterprise data audit, lineage, and data discovery capabilities that enterprises require.

Cluster Architecture

The overall architecture of the solution addresses all aspects of a production Hadoop cluster, including the software layers, the physical server hardware, the network fabric, as well as scalability, performance, and ongoing management.

This Cluster Architecture section summarizes the main aspects of the solution architecture. The remaining sections of the document cover the details in depth.

High-level Node Architecture

Figure 2: Cluster Architecture

The cluster environment consists of multiple software services running on multiple physical server nodes. The implementation divides the server nodes into several roles, and each node has a configuration optimized for its role in the cluster. The physical server configurations are divided into two broad classes: data nodes, which handle the bulk of the Hadoop processing, and infrastructure nodes, which support services needed for cluster operation. A high-performance network fabric connects the cluster nodes together and separates the core data network from management functions.

Figure 2 shows the roles for the nodes in a basic cluster.

The minimum configuration supported is six nodes, although at least seven are recommended. The nodes have the following roles:

Node Role | Requirement | Hardware Configuration
Administration Node | Optional | Infrastructure
Active Name Node | Required | Infrastructure
Standby Name Node | Required | Infrastructure
High Availability (HA) Node | Required | Infrastructure
Edge (or Gateway) Node | Recommended | Infrastructure
Data Node 1 | Required | Data
Data Node 2 | Required | Data
Data Node 3 | Required | Data

Administration Node: provides cluster deployment and management capabilities. The administration node is optional in cluster deployments, depending on whether existing provisioning, monitoring, and management infrastructure will be used.

Active Name Node: runs all the services needed to manage HDFS data storage and YARN resource management. This is sometimes called the “master name node.” There are four primary services running on the active name node:

Resource Manager (to support cluster resource management, including MapReduce jobs)

NameNode (to support HDFS data storage)

JournalNode (to support high availability)

ZooKeeper (to support coordination)

Standby Name Node: when quorum-based HA mode is used, this node runs the standby NameNode process, a second JournalNode, and an optional standby Resource Manager. This node also runs a second ZooKeeper service.

High Availability (HA) Node: this node runs the third JournalNode for HA; the active and standby name nodes run the first and second JournalNodes. It also runs a third ZooKeeper service.

Edge Node: provides an interface between the data and processing capacity available in the Hadoop cluster and a user of that capacity. The edge node is connected to the main access LAN, and is sometimes called a “gateway node.” Edge nodes are optional, but highly recommended.

Data Node: runs all the services required to store blocks of data on the local hard drives and execute processing tasks against that data. A minimum of three data nodes is required, and larger clusters are scaled primarily by adding data nodes. There are two types of services running on the data nodes:

NodeManager Daemon (to support YARN job execution)

DataNode Daemon (to support HDFS data storage)
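As an illustration only, the role descriptions above can be captured in a small Python mapping and used to check that the minimum six-node layout yields three JournalNodes and three ZooKeeper servers. The role keys, service labels, and helper functions are hypothetical (this is not a Cloudera Manager API); the optional standby Resource Manager is always included for simplicity, and the administration and edge nodes are left out because no Hadoop daemons are assigned to them here. The "odd number, at least three" check follows the JournalNode rule stated in the HA section and the usual majority-quorum practice for ZooKeeper ensembles.

```python
from collections import Counter

# Hypothetical encoding of the node roles described above. Role keys and
# service labels are illustrative; this is not a Cloudera Manager API.
ROLE_SERVICES = {
    "active_name_node": ["NameNode", "ResourceManager", "JournalNode", "ZooKeeper"],
    # The standby Resource Manager is optional; it is included here for simplicity.
    "standby_name_node": ["NameNode", "ResourceManager", "JournalNode", "ZooKeeper"],
    "ha_node": ["JournalNode", "ZooKeeper"],
    "data_node": ["DataNode", "NodeManager"],
}


def service_counts(cluster):
    """Count service instances for a cluster given as {role: number_of_nodes}."""
    counts = Counter()
    for role, n_nodes in cluster.items():
        for service in ROLE_SERVICES[role]:
            counts[service] += n_nodes
    return counts


def quorum_problems(counts):
    """Flag JournalNode/ZooKeeper ensembles that are not an odd number >= 3."""
    problems = []
    for service in ("JournalNode", "ZooKeeper"):
        n = counts[service]
        if n < 3 or n % 2 == 0:
            problems.append(f"{service}: {n} instance(s), expected an odd number >= 3")
    return problems


minimal = {"active_name_node": 1, "standby_name_node": 1, "ha_node": 1, "data_node": 3}
counts = service_counts(minimal)
print(counts["JournalNode"], counts["ZooKeeper"], counts["DataNode"])  # 3 3 3
print(quorum_problems(counts))  # []
```

Dropping the HA node from the dictionary leaves two JournalNodes and two ZooKeeper servers, and `quorum_problems` reports both, which is why the HA node is listed as required.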

Table 3: Service Locations

Physical Node | Software Function
Administration Node | Operating System Provisioning, Yum Repositories, Monitoring Functions, Cloudera Manager
Edge Node | -
Active Name Node | NameNode, Resource Manager, ZooKeeper, Quorum Journal Node, HMaster
Standby Name Node | Standby NameNode, Standby Resource Manager, ZooKeeper, Quorum Journal Node
HA Node | ZooKeeper, Quorum Journal Node
Data Node(x) | DataNode, NodeManager, RegionServer

Network Fabric Architecture

Figure 3: Cluster Network Fabric Architecture

The cluster network is architected to meet the needs of a high performance and scalable cluster, while providing redundancy and access to management capabilities.

Four distinct networks are used in the cluster:

Logical Network | Connection | Switch
Cluster Data Network | Bonded 10GbE | Dual top-of-rack switches
Management Network | 1GbE | Switch per rack, dedicated or shared with BMC network
BMC/IPMI Network | 1GbE | Switch per rack, dedicated or shared with Management network
Edge Network | Bonded | Top-of-rack or aggregation switch

Cluster Data Network: the data network carries the bulk of the traffic within the cluster. This network is aggregated within each rack, and racks are aggregated into the cluster switch. Each node uses dual connections with active load balancing, which provides increased bandwidth, as well as redundancy if a cable or switch fails.

Management Network: the management network is used to provide cluster management and provisioning capabilities.

BMC / IPMI Network: the BMC network connects the BMC or iDRAC ports and the out-of-band management ports of the switches. It is aggregated into a dedicated switch in each rack, and optionally connected to the top-of-rack or cluster switches with a dedicated VLAN.

Edge Network: the Edge network provides connectivity from edge nodes to the existing core network via the top-of-rack or cluster switch.

Connectivity between the cluster and existing network infrastructure can be adapted to specific installations.

Normally, the cluster data nodes are isolated from any existing network, but they can be exposed and optionally routed through an application gateway or firewall.

Cluster Sizing

The architecture is organized into three units for sizing as the Hadoop environment grows. From smallest to largest, they are rack, pod and cluster. Each has specific characteristics and sizing considerations documented in this reference architecture. The design goal for the Hadoop environment is to enable you to scale the environment by adding the additional capacity as needed, without the need to replace any existing components.

Rack

A rack is the smallest size designation for a Hadoop environment. A rack consists of all the necessary power, the network cabling and the two Ethernet switches necessary to support up to 20 data nodes. A rack should use its own power and space within the data center, separate from other racks, and should be treated as a fault zone.

Pod

A pod is an installation composed of three racks, based on server and network sizing. A pod is capable of supporting enough Hadoop server nodes and network switches for a minimum commercial scale installation.

Cluster

A cluster is a single Hadoop environment attached to a pair of distribution switches providing an aggregation layer for the entire cluster. A cluster can range in size from a single rack to a set of pods. A cluster shares the infrastructure nodes and management tools for operating the Hadoop environment. The size of the cluster can vary depending on the capacity of the aggregation network. For example, a Dell™ Force10™ Z9000 aggregation switch can run a larger cluster than the Dell™ Force10™ S4810 switches.

Sizing Constraints

The minimum configuration supported is six nodes:

Active name node

Standby name node

High availability (HA) node

Three data nodes

The hardware configurations for the infrastructure nodes support clusters in the petabyte storage range.

Beyond the infrastructure nodes, cluster size is primarily a function of the server platform and disk drives chosen, and the number of data nodes. Table 4 shows the approximate number of data nodes per rack, pod and cluster for the various server models. In practice, the actual density per rack will be influenced by physical constraints such as power and cooling, as well as by available network ports.

A minimum of one edge node is recommended per cluster. Larger clusters and clusters with high ingest volumes or rates may benefit from additional edge nodes.

Table 4: Cluster Sizes by Server Model

Server Model | Max Per Rack | Max Per Pod | Max Per Cluster
R720 Data Node | 20 | 60 | To be determined based on sizing criteria
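A rough planning calculation follows directly from the figures above: at most 20 data nodes per rack, three racks per pod, a minimum of three data nodes, and three required infrastructure nodes (active name node, standby name node, HA node). The Python sketch below is a hypothetical helper, not a Dell sizing tool; it counts only data nodes against rack capacity, does not count the optional administration node, and treats the per-rack maximum as an upper bound that power, cooling, and switch ports may lower.

```python
import math

# Planning figures taken from this guide; treat them as upper bounds, since
# power, cooling, and switch ports may lower the real per-rack density.
MAX_DATA_NODES_PER_RACK = 20   # Table 4, R720 data nodes
RACKS_PER_POD = 3              # a pod is three racks
MIN_DATA_NODES = 3
REQUIRED_INFRA_NODES = 3       # active name node, standby name node, HA node


def plan_cluster(data_nodes, edge_nodes=1):
    """Return a rough rack/pod/node count for a given number of data nodes.

    Simplification: only data nodes are counted against rack capacity;
    infrastructure and edge nodes are not counted toward it.
    """
    if data_nodes < MIN_DATA_NODES:
        raise ValueError(f"at least {MIN_DATA_NODES} data nodes are required")
    racks = math.ceil(data_nodes / MAX_DATA_NODES_PER_RACK)
    pods = math.ceil(racks / RACKS_PER_POD)
    total_nodes = REQUIRED_INFRA_NODES + data_nodes + edge_nodes
    return {"racks": racks, "pods": pods, "total_nodes": total_nodes}


print(plan_cluster(3, edge_nodes=0))  # {'racks': 1, 'pods': 1, 'total_nodes': 6}
print(plan_cluster(45))               # {'racks': 3, 'pods': 1, 'total_nodes': 49}
```

The first call reproduces the six-node minimum configuration; the second shows that 45 data nodes fill three racks, which is one pod.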

High Availability

The architecture implements high availability at multiple levels through a combination of hardware redundancy and software support.

Hadoop Redundancy

The Hadoop distributed filesystem implements redundant storage for data resiliency. Data is replicated across multiple nodes, and across racks. This provides multiple copies of data for reliability in the case of disk failure or node failure, and can also increase performance. The number of replicas defaults to three, and can easily be changed. Hadoop will automatically maintain replicas when a node fails – the bonded network provides enough bandwidth to handle replication traffic as well as production traffic.
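"Across racks" has a specific meaning in HDFS. Its default block placement policy, as described in the Apache HDFS architecture guide, puts the first replica on the writer's node when the client runs on a DataNode, the second on a node in a different rack, and the third on a different node in the same rack as the second. The Python sketch below is a simplified model of that rule under stated assumptions (a hypothetical rack-to-node map, a writer that is itself a DataNode, random choice among eligible nodes); it is not the HDFS implementation, which also weighs node load and free space.

```python
import random


def place_replicas(writer, topology, replication=3, rng=None):
    """Choose DataNodes for one block, loosely following HDFS's default policy.

    writer:   node where the writing client runs (assumed to be a DataNode)
    topology: hypothetical {rack_name: [node_name, ...]} map of the cluster
    """
    rng = rng or random.Random()
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    chosen = [writer]  # replica 1: the writer's own node

    def pick(candidates):
        """Append a random node not already chosen; return False if none is left."""
        free = [n for n in candidates if n not in chosen]
        if not free:
            return False
        chosen.append(rng.choice(free))
        return True

    if replication >= 2:
        remote_racks = [r for r in topology if r != rack_of[writer]]
        if remote_racks:
            pick(topology[rng.choice(remote_racks)])  # replica 2: a different rack
        else:
            pick(topology[rack_of[writer]])  # single-rack cluster: same rack
    if replication >= 3:
        # replica 3: a different node on the same rack as replica 2
        if not pick(topology[rack_of[chosen[-1]]]):
            pick(list(rack_of))  # fallback: any remaining node
    while len(chosen) < replication and pick(list(rack_of)):
        pass  # any further replicas: any remaining node
    return chosen


topology = {"rack1": ["dn1", "dn2", "dn3"], "rack2": ["dn4", "dn5", "dn6"]}
print(place_replicas("dn1", topology, rng=random.Random(42)))
```

Because the three replicas always span two racks, losing an entire rack (the fault zone described under Cluster Sizing) still leaves at least one copy of every block.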

The Hadoop job parallelism model can scale to larger and smaller numbers of nodes, allowing jobs to run when parts of the cluster are offline.

Network Redundancy

The production network uses bonded connections to multiple switches in each rack. This allows operation at reduced capacity in the event of a network port, network cable, or switch failure.

HDFS Highly Available Name Nodes

The architecture implements high availability for the HDFS namespace through a quorum mechanism that replicates critical name node data across multiple physical nodes. Production clusters normally implement name node HA.

In quorum-based HA, there are typically two NameNode processes running on two physical servers. At any point in time, one of the NameNodes is in an Active state, and the other is in a Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby simply acts as a backup, maintaining enough state to provide a fast failover if necessary.

For the Standby node to keep its state synchronized with the Active node in this implementation, both nodes communicate with a group of separate daemons called JournalNodes. When the Active node performs any namespace modification, it durably logs a record of the modification to a majority of these JournalNodes. The Standby node reads the edits from the JournalNodes and constantly watches them for changes to the edit log. As the Standby node sees the edits, it applies them to its own namespace. In the event of a failover, the Standby ensures that it has read all of the edits from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.
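The edit-log flow described above can be illustrated with a toy, in-memory Python model: the active node logs each namespace edit to a majority of JournalNodes before applying it, and the standby tails those logs and catches up completely before promotion. All class and method names are hypothetical and do not correspond to HDFS classes; the namespace is just a set of paths, and the majority check is done up front instead of handling partially acknowledged writes.

```python
class JournalNode:
    """Toy JournalNode: an append-only list of (txid, edit) records."""

    def __init__(self):
        self.edits = []
        self.up = True


def apply_edit(namespace, edit):
    """Apply one (operation, path) edit to a namespace modeled as a set of paths."""
    op, path = edit
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)


class ActiveNameNode:
    def __init__(self, journals):
        self.journals = journals
        self.namespace = set()
        self.txid = 0

    def modify(self, op, path):
        live = [j for j in self.journals if j.up]
        # Simplification: check for a majority up front instead of handling
        # partial writes; a real NameNode aborts if it cannot reach a quorum.
        if len(live) <= len(self.journals) // 2:
            raise RuntimeError("cannot reach a majority of JournalNodes")
        self.txid += 1
        for j in live:
            j.edits.append((self.txid, (op, path)))  # durably logged first...
        apply_edit(self.namespace, (op, path))      # ...then applied locally


class StandbyNameNode:
    def __init__(self, journals):
        self.journals = journals
        self.namespace = set()
        self.last_applied = 0

    def tail_edits(self):
        """Read new edits from reachable JournalNodes and apply them in txid order."""
        pending = {}
        for j in self.journals:
            if j.up:
                for txid, edit in j.edits:
                    if txid > self.last_applied:
                        pending[txid] = edit
        for txid in sorted(pending):
            if txid != self.last_applied + 1:
                break  # gap in the log: wait for the missing edit
            apply_edit(self.namespace, pending[txid])
            self.last_applied = txid

    def promote(self):
        """Catch up on every edit before taking over as the active node."""
        self.tail_edits()
        return self.namespace


journals = [JournalNode() for _ in range(3)]
active, standby = ActiveNameNode(journals), StandbyNameNode(journals)
active.modify("create", "/data/a")
journals[2].up = False  # one JournalNode fails; a majority remains
active.modify("create", "/data/b")
active.modify("delete", "/data/a")
print(standby.promote())  # {'/data/b'}
```

With one of three JournalNodes down, every edit still reaches a majority, so the promoted standby ends up with exactly the active node's namespace.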

To provide a fast failover, the Standby node must also have up-to-date information about the location of blocks in the cluster. To achieve this, the DataNodes are configured with the location of both NameNodes, and they send block location information and heartbeats to both.

There should be an odd number of JournalNode daemons, and at least three, since edit log modifications must be written to a majority of JournalNodes. In this reference architecture, the JournalNode daemons run on the active name node, the standby name node, and the HA node.
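The reason for an odd count is simple majority arithmetic. With n JournalNodes, an edit must be acknowledged by floor(n/2) + 1 of them, so the cluster tolerates n minus that many failures. The short Python sketch below (a hypothetical helper, not part of any Hadoop tool) tabulates this.

```python
def journal_quorum(n):
    """Return (acknowledgements needed for a majority, failures tolerated)."""
    if n < 1:
        raise ValueError("need at least one JournalNode")
    majority = n // 2 + 1
    return majority, n - majority


for n in range(1, 7):
    majority, tolerated = journal_quorum(n)
    print(f"{n} JournalNodes: majority = {majority}, tolerates {tolerated} failure(s)")
```

Three JournalNodes need two acknowledgements and survive one failure. A fourth raises the majority to three without tolerating any additional failure, so even counts add hardware without adding resilience.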

Resource Manager High Availability

The architecture supports high availability for the Hadoop YARN resource manager. Without resource manager HA, a Hadoop resource manager failure causes currently executing jobs to fail. When resource manager HA is enabled, jobs can continue running in the event of a resource manager failure. Furthermore, upon failover the applications can resume from their last check-pointed state; for example, completed map tasks in a MapReduce job are not rerun on a subsequent attempt. This allows events such as machine crashes or planned maintenance to be handled without any significant performance effect on running applications.

Resource manager HA is implemented by means of an Active/Standby pair of resource managers. On start-up, each resource manager is in the standby state: the process is started, but the state is not loaded. When transitioning to active, the resource manager loads the internal state from the designated state store and starts all the internal services. The trigger to transition to active comes either from the administrator or, when automatic failover is enabled, from the integrated failover controller.
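The Active/Standby behavior described above can be sketched as a small state machine in Python. This is a toy model under stated assumptions: a plain dictionary stands in for the designated state store, "progress" is reduced to a count of completed map tasks per application, and `fail_over` is a hypothetical stand-in for an administrator command or the failover controller. None of these names are YARN APIs.

```python
from enum import Enum


class RMState(Enum):
    STANDBY = "standby"
    ACTIVE = "active"


class ResourceManager:
    """Toy resource manager: state is loaded from the shared store only when active."""

    def __init__(self, name, state_store):
        self.name = name
        self.state_store = state_store  # a plain dict stands in for the state store
        self.state = RMState.STANDBY    # every RM starts in standby
        self.apps = None                # process running, but state not loaded

    def transition_to_active(self):
        self.apps = dict(self.state_store)  # load internal state from the store
        self.state = RMState.ACTIVE

    def transition_to_standby(self):
        self.apps = None
        self.state = RMState.STANDBY

    def checkpoint(self, app_id, completed_maps):
        """Record application progress so a new active RM can resume from it."""
        if self.state is not RMState.ACTIVE:
            raise RuntimeError(f"{self.name} is not active")
        self.state_store[app_id] = completed_maps
        self.apps[app_id] = completed_maps


def fail_over(old, new):
    """Stand-in for an admin command or failover controller decision."""
    old.transition_to_standby()
    new.transition_to_active()


store = {}
rm1, rm2 = ResourceManager("rm1", store), ResourceManager("rm2", store)
rm1.transition_to_active()
rm1.checkpoint("app_0001", completed_maps=40)
fail_over(rm1, rm2)
print(rm2.state.value, rm2.apps)  # active {'app_0001': 40}
```

The new active resource manager starts from the checkpointed progress (40 completed map tasks) rather than from zero, which mirrors the "completed map tasks are not rerun" behavior described above.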

This feature is not always implemented in production clusters.


Hardware Architecture

Server Infrastructure Options

The Dell | Cloudera Solution includes two choices for server infrastructure:

Dell™ PowerEdge™ C8000 series

Dell™ PowerEdge™ R720(xd) series

These alternatives provide density and capacity choices to match customer requirements. The appropriate choice depends on the intended cluster usage and workload, cluster size, and the planned customer environment. Table 5 summarizes the high-level attributes involved in choosing a server platform.

Table 5: Server Platform Attributes

Platform | Customer Environment Attributes (Shared Infrastructure) | Workload Attributes
PowerEdge R720xd | Choose R720 if: standardized on monolithic servers; rack density of 10–20 servers per rack; power per rack < 10 kW; standard rack with rear cabling | Choose R720 if: higher-frequency CPUs are needed; high memory density is required (768GB, 24 DIMMs); a high spindle count is required (more than 12 x 2.5-inch drives); ideal for small to medium Hadoop clusters
PowerEdge C8000 Series | Choose C8000 if: open to shared infrastructure; rack density of 20+ servers per rack; power per rack > 10 kW; wide, deep rack with front cabling | Choose C8000 if: a high spindle count is needed (more than 12 x 3.5-inch drives); you intend to run multiple server types per chassis; you need future configuration flexibility; ideal for medium to large Hadoop clusters
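As a rough illustration of how Table 5 might be applied, the Python sketch below counts how many of the table's environment and spindle criteria point to each platform. The `Environment` fields, the voting scheme, and the tie-break in favor of the R720 are all hypothetical choices made for this example; only a subset of the table's criteria is encoded (CPU frequency and memory density are left out), and the result is a starting point for discussion, not a Dell sizing recommendation.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    """Hypothetical summary of a customer's requirements."""
    servers_per_rack: int
    power_per_rack_kw: float
    shared_infrastructure_ok: bool
    needs_over_12_3_5_inch_drives: bool
    mixed_server_types_per_chassis: bool


def suggest_platform(env):
    """Count how many Table 5 criteria point to each platform (ties go to R720)."""
    c8000_votes = sum([
        env.shared_infrastructure_ok,
        env.servers_per_rack >= 20,       # "20+ servers per rack"
        env.power_per_rack_kw > 10,       # "> 10 kW per rack"
        env.needs_over_12_3_5_inch_drives,
        env.mixed_server_types_per_chassis,
    ])
    r720_votes = sum([
        not env.shared_infrastructure_ok,  # standardized on monolithic servers
        10 <= env.servers_per_rack <= 20,  # "10-20 servers per rack"
        env.power_per_rack_kw < 10,        # "< 10 kW per rack"
    ])
    return "PowerEdge C8000" if c8000_votes > r720_votes else "PowerEdge R720/R720xd"


print(suggest_platform(Environment(16, 8.0, False, False, False)))
print(suggest_platform(Environment(32, 14.0, True, True, False)))
```

The first environment matches every R720 criterion and none of the C8000 criteria; the second matches four C8000 criteria, so it is steered toward the shared-infrastructure platform.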

The following sections describe the supported server models and configurations required. Detailed part lists and rack layouts are included in the appendices. The PowerEdge C8000 series and PowerEdge R720 series are recommended for new installations.

PowerEdge C8000 Series

The PowerEdge C8000 series is Dell’s hyperscale-inspired 4U shared infrastructure server that allows compute, storage and GPU sleds to be mixed and matched in one chassis. The PowerEdge C8000 chassis holds up to eight single-wide PowerEdge C8220 compute sleds, up to four double-wide PowerEdge C8220X compute/GPU sleds or PowerEdge C8000XD storage sleds, or a combination of these, plus two power sleds.

This design lets you strike the right CPU-to-memory-to-disk balance and build large-scale storage nodes with 24 or more hard drives to run big data applications faster. The flexible PowerEdge C8000 can run Hadoop name nodes, data nodes, edge nodes, and multiple workloads from the same chassis or across racks, allowing better use of IT resources, lower total cost of ownership over the server lifecycle, and more efficient use of space while increasing Hadoop POD compute/storage density and performance.


Figure 4: PowerEdge C8000 Chassis

PowerEdge C8000 feature summary:

• Up to eight independently serviceable PowerEdge C8220 compute sleds, four PowerEdge C8220X compute sleds, or four PowerEdge C8000XD storage sleds in a 4U rack chassis
• Cold-aisle service
• Intel® E5-2600v2 series processors with up to ten cores and support for up to 130W TDP
• Up to 256GB of memory with 16 DDR3 slots at 1600MHz per node (512GB RTS+)

PowerEdge C8220 Single Width Compute (SWC)
• Up to two 2.5-inch non-hot-plug hard drives per PowerEdge C8220 compute sled

PowerEdge C8220X Double Width Compute (DWC)
• Up to 12 x 2.5-inch or four 3.5-inch hot-plug hard drives per PowerEdge C8220X compute sled
• Up to two 2.5-inch non-hot-plug hard drives per PowerEdge C8220X compute sled
• Up to two 2.5-inch hot-plug hard drives per PowerEdge C8220X compute sled

PowerEdge C8000XD Double Width Storage (DWS)
• Up to 12 x 3.5-inch or 12 x 2.5-inch hot-plug hard drives, or 24 x 2.5-inch SSDs, per PowerEdge C8000XD storage sled
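The slot rules above reduce to simple arithmetic: a single-width sled uses one of the chassis's eight sled slots and a double-width sled uses two, while the two power sleds sit outside those slots. The sketch below assumes exactly that model; the sled names come from this section, and "blank" stands for the single-width sled blanks listed in the reference BOMs. As a check, the master chassis (three C8220X sleds plus two single-width blanks, per Table 19 in Appendix B) fills all eight slots, while a second heavy data node (one C8220X plus two C8000XD sleds each) does not fit in the same chassis.

```python
SLOTS_PER_CHASSIS = 8  # single-width sled slots; the two power sleds are separate

# Slot width per sled type, from the C8000 feature summary.
SLED_WIDTH = {
    "C8220": 1,    # single-width compute (SWC)
    "C8220X": 2,   # double-width compute (DWC)
    "C8000XD": 2,  # double-width storage (DWS)
    "blank": 1,    # single-width sled blank
}


def slots_used(sleds: list) -> int:
    return sum(SLED_WIDTH[s] for s in sleds)


def fits_in_chassis(sleds: list) -> bool:
    return slots_used(sleds) <= SLOTS_PER_CHASSIS


master = ["C8220X", "C8220X", "C8220X", "blank", "blank"]
heavy_node = ["C8220X", "C8000XD", "C8000XD"]
print(slots_used(master), fits_in_chassis(master))                  # 8 True
print(slots_used(heavy_node * 2), fits_in_chassis(heavy_node * 2))  # 12 False
```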


PowerEdge C8000 Hardware Configurations

Table 6: Hardware Configurations – PowerEdge C8000 Compute Sleds

Machine Function | Infrastructure Nodes | Data Node | Heavy Data Node
Sled 1 | PowerEdge C8220X | PowerEdge C8220X | PowerEdge C8220X
Processor | 2 x E5-2670v2 (10-core) | 2 x E5-2670v2 (10-core) | 2 x E5-2670v2 (10-core)
RAM (minimum) | 128GB | 64GB | 64GB
LOM | 2 x 1GbE | 2 x 1GbE | 2 x 1GbE
Network Controller | 2 x Intel X520 10GbE NIC, Dual Port, SFP+, Low Profile | Intel X520 10GbE NIC, Dual Port, SFP+, Low Profile | Intel X520 10GbE NIC, Dual Port, SFP+, Low Profile
DISK (onboard) | None | None | None
DISK (hot-swap) | N/A | 2 x 2.5-in. 1TB | 2 x 2.5-in. 1TB
DISK (side) | 6 x 1TB 2.5-in. SATA | 4 x 4TB 3.5-in. NL-SAS | 4 x 4TB 3.5-in. NL-SAS
DISK (expansion) | None | 1 x C8000XD (48TB) | 2 x C8000XD (96TB)
Storage Controller | LSI 2008 (Mezzanine) | LSI 2008 (Mezzanine) | LSI 2008 (Mezzanine)
Storage Controller 2 | None | LSI 9202 (PCI) | LSI 9202 (PCI)
RAID | RAID 10 | JBOD | JBOD

Table 7: Hardware Configurations – PowerEdge C8000 Storage Sleds

Machine Function | Infrastructure Nodes | Data Node
Sled 2 | N/A | PowerEdge C8000XD
DISK | N/A | 12 x 4TB 3.5-in. Nearline SAS (NL-SAS)
Sled 3 (heavy data nodes only) | N/A | PowerEdge C8000XD
DISK | N/A | 12 x 4TB 3.5-in. Nearline SAS (NL-SAS)
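To relate Tables 6 and 7 to usable HDFS capacity, the sketch below counts only the drives in the C8000XD storage sleds (12 x 4TB each) and divides raw capacity by the replication factor. The replication factor of 3 is the HDFS default (dfs.replication) and is assumed here; the sketch ignores drives on the compute sled, filesystem overhead, and space reserved for non-HDFS use, so treat the results as upper bounds.

```python
DRIVES_PER_C8000XD = 12
DRIVE_TB = 4  # 4TB NL-SAS drives, as in the reference BOMs


def node_raw_tb(storage_sleds: int) -> int:
    """Raw capacity of one data node from its attached C8000XD sleds."""
    return storage_sleds * DRIVES_PER_C8000XD * DRIVE_TB


def cluster_hdfs_tb(data_nodes: int, storage_sleds: int, replication: int = 3) -> float:
    """Upper bound on logical HDFS capacity; replication=3 is the HDFS default."""
    return data_nodes * node_raw_tb(storage_sleds) / replication


print(node_raw_tb(1), node_raw_tb(2))  # 48 96  (data node, heavy data node)
# One rack of six data node chassis holds 12 data nodes (Appendix A, Table 17).
print(cluster_hdfs_tb(12, 1))          # 192.0
```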

C8000 Configuration Notes

Appendix A illustrates the recommended chassis and rack layout for C8000 clusters.


Appendix B contains the complete bill of materials (BOM) listing for the C8000 server configurations.

Data nodes are configured with the onboard chipset controller connected to the front hot-swap drives in the PowerEdge C8220X compute sled.

Either 3TB or 4TB drives can be used; both are fully supported. The reference BOMs include 4TB drives.

The two “rear” motherboard drives in the PowerEdge C8220X compute sled are not supported for Hadoop configurations.

Data nodes require one PowerEdge C8000XD storage sled. Alternatively, data nodes can be configured with two PowerEdge C8000XD storage sleds; these are referred to as ‘heavy’ data nodes.

When building a cluster using ‘heavy’ data nodes, remove the single data node from the HA chassis (Appendix A, Table 14).

Data nodes use an LSI 9202 PCI HBA to connect to one or two PowerEdge C8000XD storage sleds. The connection requires one SAS extender cable per external sled.

The reference BOMs in the appendices have been organized by chassis to simplify ordering.

Some configurations may require sled blanks for empty slots; the reference BOMs in the appendices account for this.

A SAS extension cable connects each data node's compute sled to its storage sled. “Heavy” data nodes use two cables, one per storage sled. Do not connect a single storage sled using multiple SAS extension cables. All required cables are included in the BOM listings.

The PowerEdge C8000 series is designed for cold-aisle service, with cabling at the front of the chassis. Verify that your rack configuration supports front cabling.

Be sure to consult your Dell account representative before changing the recommended disk sizes.

A minimum configuration can be implemented in three PowerEdge C8000 chassis, if one of the data nodes is installed in the HA chassis.

PowerEdge R720 / R720xd Server

The PowerEdge R720 and R720xd servers are Dell’s 12G PowerEdge mainstream dual-socket 2U rack servers. They are designed to deliver the most competitive feature set, best performance and best value. In this generation, Dell offers a large storage footprint, best-in-class I/O capabilities and more advanced management features. The PowerEdge R720 and R720xd are technically similar, except that the R720xd has a backplane that accommodates more drives (up to 24).


Figure 5: PowerEdge R720xd Servers – 2.5” and 3.5” Chassis Options

PowerEdge R720xd feature summary:

Intel® Romley platform and Intel® Xeon® E5-2600v2 processors

1600MHz DDR3

Network daughter cards for customer choice of LOM speed, fabric and brand at point of sale

PCIe SSD in a front-accessible, hot-plug format

Internal GPGPU support

Intel® Node Manager power management technology

Software RAID

Platinum efficiency power supplies, common across 600 and 700 series platforms


PowerEdge R720 / R720xd Hardware Configurations

Table 8: Hardware Configurations – PowerEdge R720 Infrastructure Nodes

Machine Function | Infrastructure Nodes
Platform | PowerEdge R720
CPU | 2 x E5-2670v2 (10-core)
RAM (minimum) | 128GB
LOM | 4 x 1GbE
Add In Network | 2 x Intel X520 DP 10Gb DA/SFP+ (for 10GbE networking)
DISK | 8 x 1TB 7.2K SATA 3.5-in.
Storage Controller | PERC H710
RAID | RAID 10

Notes:

Be sure to consult your Dell account representative before changing the recommended disk sizes.

Table 9: Hardware Configurations – PowerEdge R720xd Data Nodes

Machine Function | Data Nodes | Data Nodes
Platform | PowerEdge R720xd | PowerEdge R720xd
CPU | 2 x E5-2670v2 (10-core) | 2 x E5-2670v2 (10-core)
RAM (minimum) | 64GB | 64GB
LOM | 4 x 1GbE | 4 x 1GbE
DISK | 12 x 4TB 7.2K RPM SATA 3Gbps 3.5-in. | 24 x 1TB 7.2K RPM SATA 2.5-in.
Add In Network | 1 x Intel X520 DP 10Gb DA/SFP+ (for 10GbE networking) | 1 x Intel X520 DP 10Gb DA/SFP+ (for 10GbE networking)
Storage Controller | LSI 9207 | LSI 9207
RAID | JBOD | JBOD

Notes:

Be sure to consult your Dell account representative before changing the recommended disk sizes.


PowerEdge R720xd Configuration Notes

Appendix C illustrates the recommended rack layout for R720 clusters.

Appendix D and Appendix E contain the full bill of materials (BOM) listings for the PowerEdge R720 and R720xd server configurations.

The R720 and R720xd configurations can be used with 10GbE networking, which requires additional network cards: infrastructure nodes require two dual-port cards, and data nodes require one dual-port card. The BOM listings include the required cards for 10GbE networking.

Data nodes can be configured with either the LSI 9207 or the PERC H710 disk controller. The LSI 9207 is recommended for new deployments; the PERC H710 is supported as an alternative, primarily for compatibility with existing clusters. Refer to the “JBOD versus Single Disk RAID 0 Configuration” section for more information.

Storage Sizing Notes

For drive capacities greater than 3TB or node storage density over 36TB, HDFS setup requires special consideration. Configurations of this size are close to the limit of Hadoop per-node storage capacity. At a minimum, increase the HDFS block size to 128MB or more. Because the number of files, blocks per file, compression, and reserved space all factor into the calculations, the configuration requires an analysis of the intended cluster usage and data.
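The effect of block size can be seen with back-of-the-envelope arithmetic: the number of block replicas a data node holds scales with its usable capacity divided by the block size, so doubling the block size roughly halves the blocks that must be tracked for that node. The hypothetical helper below assumes every block is full (small files make the real count higher) and treats 25% of raw capacity as reserved for non-HDFS use; that reserve figure is an illustrative assumption, not a recommendation from this guide.

```python
TB = 10**12  # drive capacities are quoted in decimal terabytes
MB = 2**20   # HDFS block sizes are binary (128MB = 134217728 bytes)


def max_blocks_per_node(raw_tb: float, block_mb: int, reserved_fraction: float = 0.25) -> int:
    """Upper-bound estimate of block replicas stored on one data node.

    Assumes every block is full; reserved_fraction (hypothetical default of 25%)
    is disk space kept back from HDFS.
    """
    usable_bytes = raw_tb * TB * (1 - reserved_fraction)
    return int(usable_bytes // (block_mb * MB))


# A 12 x 4TB data node (48TB raw) at two block sizes.
for block_mb in (64, 128):
    print(block_mb, max_blocks_per_node(48, block_mb))
# 64 536441
# 128 268220
```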

Large per-node density also affects cluster performance in the event of a node failure, because more data must be re-replicated. The bonded 10GbE configuration is recommended for large node densities to minimize the performance impact in this case.
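To see why link speed matters at high per-node density, consider how long one node's worth of data would take to pass through a single server's network links at line rate. This is a deliberately simplified model: in practice HDFS spreads re-replication across many nodes, protocol overhead and disk throughput lower the effective rate (represented here by an assumed efficiency factor), and treating a bonded pair as its aggregate 20 Gb/s is itself an assumption.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to move data_tb terabytes over a link of link_gbps at the given efficiency."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


for label, gbps in [("1GbE", 1), ("bonded 2 x 10GbE", 20)]:
    print(f"{label}: {transfer_hours(48, gbps):.1f} h for 48TB")
# 1GbE: 106.7 h for 48TB
# bonded 2 x 10GbE: 5.3 h for 48TB
```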

Your Dell representative can assist with these estimates and calculations.

Network Architecture

The cluster network is architected to meet the needs of a high performance and scalable cluster, while providing redundancy and access to management capabilities.

The architecture supports two options for networking: 1GbE and 10GbE. The 1GbE option uses Dell™ Force10™ S60 switches for top-of-rack connectivity to all Hadoop-related nodes, while the 10GbE option uses Dell™ Force10™ S4810 switches. Hadoop applications are increasingly deployed on 10GbE servers for the scale and price advantages they bring, and 10GbE is the recommended configuration for new clusters.

Four distinct networks are used in the cluster:

Logical Network | Connection | Switch
Cluster Data Network | Bonded 10GbE | Dual top-of-rack switches
Management Network | 1GbE | Dedicated switch per rack
BMC Network | 1GbE | Dedicated switch per rack
Edge Network | Bonded 10GbE | Top-of-rack or cluster switch

Each network uses a separate VLAN, and dedicated components when possible. Figure 6 shows the logical organization of the network.


Figure 6: Hadoop Logical Network Diagram

Physical Network Components

Server Node Connections

Server connections to the network switches for the data network are bonded and use an active-active link aggregation group (LAG) in a load-balancing configuration (under Linux, this is balance-alb, or mode 6, bonding). The connections are made to a pair of ToR switches to provide redundancy in case of port, cable, or switch failure. The switch ports are configured as a LAG. Each server has an additional 1GbE connection to the management network to facilitate server management and provisioning.

Connections to the BMC network use a single connection from the BMC port to a dedicated switch in each rack.

Edge nodes have an additional pair of 10GbE connections to the ToR switch. These connections provide high-performance data ingest and cluster access between applications running on those nodes and the core datacenter network.
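On RHEL/CentOS-style systems, the mode 6 (balance-alb) bond described in this section is commonly defined with ifcfg files. The sketch below renders those files as strings; the interface names (eth2, eth3), the bond name, and the IP addressing are hypothetical, and the miimon=100 link-monitoring interval is an assumed, commonly used value rather than one specified in this guide. Other distributions configure bonding differently.

```python
def bond_ifcfg(bond: str, slaves: list, ipaddr: str, netmask: str) -> dict:
    """Return {filename: contents} for a balance-alb (mode 6) bond and its slave NICs."""
    files = {
        f"ifcfg-{bond}": "\n".join([
            f"DEVICE={bond}",
            "ONBOOT=yes",
            "BOOTPROTO=none",
            f"IPADDR={ipaddr}",
            f"NETMASK={netmask}",
            # mode=balance-alb is Linux bonding mode 6 (adaptive load balancing);
            # miimon=100 checks link state every 100 ms.
            'BONDING_OPTS="mode=balance-alb miimon=100"',
        ]) + "\n"
    }
    for nic in slaves:
        # Each physical 10GbE port is enslaved to the bond and carries no IP itself.
        files[f"ifcfg-{nic}"] = "\n".join([
            f"DEVICE={nic}",
            "ONBOOT=yes",
            "BOOTPROTO=none",
            f"MASTER={bond}",
            "SLAVE=yes",
        ]) + "\n"
    return files


# Hypothetical data-network addressing for one data node.
for name, body in bond_ifcfg("bond0", ["eth2", "eth3"], "10.10.1.11", "255.255.255.0").items():
    print(f"# /etc/sysconfig/network-scripts/{name}\n{body}")
```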


Figure 7: PowerEdge R720xd Node 1GbE Network Interconnects

Top of Rack (ToR) Switches

Each rack uses a pair of Force10 S4810 switches as top-of-rack switches. These switches are configured for high availability using the Virtual Link Trunking (VLT) feature. VLT allows the servers to terminate their LAG interfaces into two different switches instead of one. This provides redundancy within the rack if a switch fails or needs maintenance, while providing active-active bandwidth utilization.

Figure 8: Single Rack Networking Equipment

Figure 8 shows the single rack network configuration, with a pair of Force10 S4810 switches aggregating the rack traffic.


For a single rack, the top of rack switches can act as the cluster aggregation layer. For larger clusters, a cluster aggregation layer is required.

In this architecture, each rack is managed as a separate entity from a switching perspective, and ToR switches connect only to the aggregation switches.

Cluster Aggregation Switches

For clusters consisting of one or more pods, the architecture uses either the Force10 S4810 or the Force10 Z9000 for aggregation switches. The choice depends on the initial size and planned scaling. The Force10 S4810-based aggregation design is preferred for lower cost and medium scalability; it can handle up to six racks, or two pods. The Z9000 is recommended for larger deployments.
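The rack-count thresholds in this section can be summarized as a small decision function. The limits come from the text (a single rack needs no aggregation layer, S4810 aggregation scales to six racks or two pods, and the Z9000 aggregates up to 15 racks), and the cable count follows from each rack's two ToR switches each using two 40G uplinks, which matches the 4 x 40Gb QSFP+ cables per rack in Table 11. The function name and return format are hypothetical, and the function ignores planned growth, which the text says can justify the Z9000 even for a smaller initial cluster.

```python
RACKS_PER_POD = 3  # six racks = two pods and 15 racks = five pods in this guide


def aggregation_plan(racks: int) -> dict:
    """Suggest an aggregation layer and 40Gb QSFP+ uplink cable count (hypothetical helper)."""
    if racks < 1:
        raise ValueError("a cluster needs at least one rack")
    if racks > 15:
        raise ValueError("beyond the 15-rack range covered by this reference architecture")
    if racks == 1:
        # The ToR pair acts as the aggregation layer for a single rack.
        return {"aggregation": None, "aggregation_uplink_cables": 0}
    # S4810 aggregation handles up to two pods; larger clusters use the Z9000.
    switch = "Force10 S4810" if racks <= 2 * RACKS_PER_POD else "Force10 Z9000"
    # Two ToR switches per rack, each with two 40G uplinks to the aggregation pair.
    return {"aggregation": f"2 x {switch}", "aggregation_uplink_cables": 4 * racks}


for n in (1, 6, 9):
    print(n, aggregation_plan(n))
```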

Like the ToR switches, the aggregation switches are connected in pairs using VLT. The uplink from each S4810 ToR switch to the aggregation pair is 80Gb, using a pair of 40G interfaces. Since both S4810 switches in a rack connect to the aggregation pair, each rack has a collective uplink bandwidth of 160G.
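The 160G per-rack figure makes it straightforward to estimate a rack's uplink oversubscription ratio, that is, total server-facing bandwidth divided by uplink bandwidth. The sketch below assumes each node contributes two 10GbE links (the bonded data-network connections described in this guide); the node counts in the example are illustrative, so substitute your own rack layout. Edge nodes, with their extra 10GbE pair, would add to the server-facing total.

```python
RACK_UPLINK_GBPS = 2 * 80  # two S4810 ToR switches, each with 2 x 40G uplinks


def oversubscription(nodes: int, links_per_node: int = 2, link_gbps: int = 10) -> float:
    """Ratio of server-facing bandwidth to rack uplink bandwidth (1.0 = non-blocking)."""
    server_gbps = nodes * links_per_node * link_gbps
    return server_gbps / RACK_UPLINK_GBPS


print(oversubscription(12))  # 1.5  (240G of server bandwidth over 160G of uplink)
print(oversubscription(8))   # 1.0
```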

S4810 Cluster Aggregation

Figure 9 illustrates the configuration for a multiple rack cluster using the S4810 as a cluster aggregation switch.


Figure 9: S4810 Multi-rack Networking Equipment

Force10 Z9000 Cluster Aggregation

For larger initial deployments, deployments where future growth is planned, or instances where the cluster needs to be co-located with other applications in different racks, the recommended option is the Force10 Z9000 core switch. The Force10 Z9000 is a 32-port, 40G high-capacity switch that can aggregate up to 15 racks of high-density PowerEdge C8000 servers. The rack-to-rack bandwidth needed in Hadoop is best addressed by a 40G-capable, non-blocking switch, and the Force10 Z9000 can provide a cumulative bandwidth of 1.5TB of throughput at line-rate traffic from every port. In many cases, the Force10 Z9000 does not need to connect into any higher-tier core switches, because its capacity is enough for a data center with hundreds of servers.

Figure 10 illustrates the configuration for a multiple rack cluster using the Z9000 as a cluster aggregation switch. This is an example of a Clos fabric that grows horizontally. This technique of network fabric deployment has been used in the data centers of some of the largest web companies, whose businesses range from social media to public cloud. Some of the largest recent Hadoop deployments also use this new approach to networking.

Each switch in Figure 10 forms a layer-2 LAG. This assumes that the Force10 Z9000 pair in the aggregation layer forms a VLT pair for HA. The result is two tiers of VLT: one at the ToR for servers, and another at the aggregation layer for the ToR switches.

Figure 10: Multi-Rack View Using Force10 Z9000 Switches (Based on Layer-2)

Core Network

The aggregation layer functions as the network core for the cluster. In most instances, the cluster will connect to a larger core within the enterprise, represented by the cloud in Figure 9. Details of the connection are site-specific and need to be determined as part of the deployment planning.

Layer-2 and Layer-3

The boundary between layer 2 and layer 3 can be placed at either the ToR or the aggregation layer; both options are equally viable. The colors blue and red in Figure 10 represent the layer-2 and layer-3 boundaries.

This document uses layer 2 as the reference up to the aggregation layer.

Management Network

The management network for all servers and switches is aggregated into a Dell™ Force10™ S55 switch located in each rack of the POD. Each S55 uplinks over a 10G link to the aggregation switches, or directly to the core, wherever the out-of-band split is required.

Network Equipment Summary

Table 10 and Table 11 summarize the required cluster networking equipment. Table 12 summarizes the number of cables needed for a cluster.

Table 10: Per Rack Network Equipment

Total Racks | 1 (6-20 Nodes)
Top-of-rack switches | 1 x Force10 S55; 2 x Force10 S4810
Aggregation switch | Not needed for a single rack
Switch interconnect cables | 2 x 40Gb QSFP+ cables
Modules in each ToR | 1 x 12G 2-port stacking, 1 x 10G 2-port uplink

Table 11: Aggregation Network Switches for 3 or more racks

Total racks | 3 to 15 racks (1-5 pods)
Aggregation layer switches | 2 x Force10 Z9000
Pod-interconnect cabling | 4 x 40Gb QSFP+ cables per rack
Switch interconnect cables | 4 x 40Gb QSFP+ cables, 1 m

Table 12: Network Cables Required – 10GbE Configurations

Description | 1GbE Cables Required | 10GbE Cables with SFP+ Required
Name and HA nodes | 2 x number of nodes | 2 x number of nodes
Edge nodes | 2 x number of nodes | 4 x number of nodes
Data nodes | 2 x number of nodes | 2 x number of nodes
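Table 12 translates directly into a cable count for a given node mix. The hypothetical helper below multiplies the per-node figures from the table; the role keys and the example node counts are illustrative only.

```python
# Per-node cable counts from Table 12: (1GbE cables, 10GbE cables with SFP+).
CABLES_PER_NODE = {
    "name_ha": (2, 2),  # name and HA nodes
    "edge": (2, 4),     # edge nodes carry an extra 10GbE pair
    "data": (2, 2),
}


def cable_totals(counts: dict) -> tuple:
    """Return (total 1GbE cables, total 10GbE SFP+ cables) for {role: node_count}."""
    one_gbe = sum(CABLES_PER_NODE[role][0] * n for role, n in counts.items())
    ten_gbe = sum(CABLES_PER_NODE[role][1] * n for role, n in counts.items())
    return one_gbe, ten_gbe


# Example: 3 name/HA nodes, 1 edge node, 12 data nodes.
print(cable_totals({"name_ha": 3, "edge": 1, "data": 12}))  # (32, 34)
```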

Network Connectivity Summary

The network interconnects between various hardware components of the solution are depicted in Figure 11.

For more information, see the Dell | Cloudera Apache Hadoop Solution Deployment Guide.


Figure 11: Network Connections for 10GbE

IPv6 Capabilities

At this time, the architecture does not support or allow for the use of IPv6 for network connectivity.


Cloudera Enterprise Software

The Dell | Cloudera Solution is based on Cloudera Enterprise, which includes Cloudera’s distribution for Hadoop (CDH) 5.0 and Cloudera Manager.

Cloudera Manager

Cloudera Manager is designed to make administration of CDH simple and straightforward, at any scale. With Cloudera Manager, you can easily deploy and centrally operate the complete Hadoop stack. The application automates the installation process, reducing deployment time from weeks to minutes; gives you a cluster-wide, real-time view of the nodes and services running; provides a single, central console to enact configuration changes across your cluster; and incorporates a full range of reporting and diagnostic tools to help you optimize performance and utilization.

Cloudera Manager is available as part of both the Cloudera Standard and Cloudera Enterprise product offerings. With Cloudera Standard, you get a full set of functionality to deploy, configure, manage, monitor, diagnose and scale your cluster: the most comprehensive and advanced set of management capabilities available from any vendor. When you upgrade to Cloudera Enterprise, you get additional capabilities for integration, process automation and disaster recovery that are focused on helping you operate your cluster successfully in enterprise environments.

Cloudera RTQ (Impala)

Cloudera Impala is an open source Massively Parallel Processing (MPP) query engine that runs natively in Apache™ Hadoop®. The Apache-licensed Impala project brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries to data stored in HDFS and Apache HBase™ without requiring data movement or transformation. Impala is integrated from the ground up as part of the Hadoop ecosystem and leverages the same flexible file and data formats, metadata, security and resource management frameworks used by MapReduce, Apache Hive™, Apache Pig™ and other components of the Hadoop stack.

Designed to complement MapReduce, which specializes in large-scale batch processing, Impala is an independent processing framework optimized for interactive queries. With Impala, analysts and data scientists can perform real-time, “speed of thought” analytics on data stored in Hadoop via SQL or through business intelligence (BI) tools. As a result, large-scale data processing and interactive queries can be done on the same system using the same data and metadata, removing the need to migrate data sets into specialized systems and/or proprietary formats simply to perform analysis.

Cloudera Search

Cloudera Search delivers full-text, interactive search to CDH, Cloudera’s 100% open source distribution including Apache Hadoop™. Powered by Apache Solr, Cloudera Search enriches the Hadoop platform and enables a new generation of search – Big Data search – through scalable indexing of data within HDFS and Apache HBase™. Cloudera Search gains the same fault tolerance, scale, visibility, and flexibility provided to other Hadoop workloads, due to its integration with CDH.

Apache Solr has been the enterprise standard for open source search since its release in 2006. Its active and mature community drives wide adoption across verticals and industries, and its APIs are feature-rich and extensible. Cloudera Search extends the value of Apache Solr by tightly integrating and optimizing it to run on CDH and Cloudera Manager.

Cloudera BDR

BDR is an add-on subscription to Cloudera Enterprise that provides end-to-end business continuity. When you add BDR to your Cloudera Enterprise subscription, you’ll get the management capabilities and support you need to get maximum value from the powerful disaster recovery features available in CDH.

Cloudera BDR makes it easy to configure and manage disaster recovery policies for data stored in CDH. With BDR you can:

Centrally configure and manage disaster recovery workflows for files (HDFS) and metadata (Hive) through an easy-to-use graphical interface


Consistently meet or exceed service level agreements (SLAs) and recovery time objectives (RTOs) through simplified management and process automation

BDR includes:

• Centralized management for HDFS replication through Cloudera Manager
• Centralized management for Hive replication through Cloudera Manager
• 8x5 or 24x7 Cloudera Support

Key features of BDR:

Define file and directory-level replication policies

Schedule replication jobs

Monitor progress through a centralized console

Identify discrepancies between primary and secondary system(s)

Cloudera Navigator

Navigator is an add-on subscription to Cloudera Enterprise that provides the first fully integrated data management tool for Cloudera Enterprise. It's designed to provide all of the capabilities required for administrators, data managers and analysts to secure, govern, and explore the large amounts of diverse data that land in CDH. The first release of Cloudera Navigator (v1.0) was developed specifically to address data security concerns most typically associated with highly regulated industries, such as financial services, healthcare and government. It includes a full suite of auditing capabilities across all CDH components that store data.

The Navigator subscription gives you access to all of the capabilities of the Cloudera Navigator application.

With Navigator, you can:

Store sensitive data in CDH while maintaining compliance with regulations and internal audit policies

Verify access permissions to files and directories

Maintain a full audit history of HDFS, Hive and HBase data access

Report on data access by user and type

Integrate with third-party SIEM tools

Navigator includes:

Centralized audit management and reporting for HDFS, Hive and HBase

8x5 or 24x7 Cloudera Support

Key features of Cloudera Navigator:

Configuration of audit information for HDFS, HBase and Hive

Centralized view of data access and permissions

Simple, queryable interface with filters for type of data or access patterns

Export of full or filtered audit history for integration with third-party SIEM tools

Cloudera Support

As the use of Hadoop grows and an increasing number of groups and applications move into production, your Hadoop users will expect greater levels of performance and consistency. Cloudera’s proactive production-level support gives your administrators the expertise and responsiveness they need.

Cloudera Support includes:

Flexible Support Windows

Choose 8×5 or 24×7 to meet SLA requirements.

Configuration Checks

Verify that your Hadoop cluster is fine-tuned for your environment.

Escalation and Issue Resolution

Resolve support cases with maximum efficiency.


Comprehensive Knowledge Base

Expand your Hadoop knowledge with hundreds of articles and tech notes.

Support for Certified Integration

Connect your Hadoop cluster to your existing data analysis tools.

Proactive Notification

Stay up-to-speed on new developments and events.

With Cloudera Enterprise, you can leverage your existing team’s experience and Cloudera’s expertise to put your Hadoop system into effective operation. Built-in predictive capabilities anticipate shifts in the Hadoop infrastructure to support reliable function.

Cloudera Enterprise makes it easy to run open source Hadoop in production, by:

• Simplifying and accelerating Hadoop deployment
• Reducing the costs and risks of adopting Hadoop in production
• Reliably operating Hadoop in production with repeatable success
• Applying SLAs to Hadoop
• Increasing control over Hadoop cluster provisioning and management


Dell | Cloudera Solution Deployment Methodology

A suggested deployment workflow is documented in the Dell | Cloudera Solution Deployment Guide, which complements this reference architecture.


Appendix A : Physical Configuration — PowerEdge C8000 Series

C8000 Chassis Configuration

Table 13: Chassis Configuration – PowerEdge C8000 Master Chassis

Sleds: C8220X DWC (Master) | C8220X DWC (Admin) | Power | Power | C8220X DWC (Edge)

Refer to Table 19 in Appendix B for the bill of materials for this chassis.

Table 14: Chassis Configuration – PowerEdge C8000 High Availability Chassis

Sleds: Empty | Empty | C8220X DWC (Secondary) | C8220X DWC (HA) | Power | Power | C8220X DWC

Refer to Table 20 in Appendix B for the bill of materials for this chassis.

Table 15: Chassis Configuration – PowerEdge C8000 Data Nodes

C8220XD

DWS

C8220X

DWC

C8220XD

DWS

Power Power

Refer to Table 21 in Appendix B for the bill of materials for this chassis.

C8220X

DWC

C8220XD

DWS


C8220XD

DWS


Table 16: Chassis Configuration – PowerEdge C8000 ‘Heavy’ Data Nodes

C8220X

DWC

Power Power

C8220XD

DWS

C8220XD

DWS

C8220XD

DWS

C8220X

DWC

Power Power

C8220XD

DWS

C8220X

DWC

C8220XD

DWS

C8220X

DWC

Power Power

C8220XD

DWS

C8220XD

DWS

Refer to Table 21 and Table 22 in Appendix B for the bill of materials for these chassis. The “heavy” data node configuration is ordered in groups of three chassis: two “heavy” data node chassis and one data node chassis.
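Because heavy data nodes are ordered in three-chassis groups and four heavy data nodes occupy 12U (per the note with Table 18), rack space for heavy nodes grows in 12U steps. The sketch below assumes every group is ordered complete, so a partial group still consumes three full chassis; the 4U height is that of the PowerEdge C8000 chassis.

```python
import math

NODES_PER_GROUP = 4    # four heavy data nodes per three-chassis group
CHASSIS_PER_GROUP = 3
CHASSIS_HEIGHT_U = 4   # PowerEdge C8000 is a 4U chassis


def heavy_node_footprint(heavy_nodes: int) -> tuple:
    """Return (chassis, rack units) needed for a number of heavy data nodes."""
    groups = math.ceil(heavy_nodes / NODES_PER_GROUP)
    chassis = groups * CHASSIS_PER_GROUP
    return chassis, chassis * CHASSIS_HEIGHT_U


print(heavy_node_footprint(4))  # (3, 12)
print(heavy_node_footprint(8))  # (6, 24)
print(heavy_node_footprint(5))  # (6, 24)
```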


Master Chassis

Empty


Table 17: Rack Configuration – PowerEdge C8000

RACK1

R1- Switch 2: 10Gb S4810

R1- Switch 1: 10Gb S4810

Cable Management

Cable Management

Cable Management

Cable Management

R1- Chassis06: Data node x 2

R1 - S55 iDRAC Mgmt switch

R1- Chassis05: Data node x 2

R1- Chassis04: Data node x 2

R1- Chassis03: Data node x 2

R1- Chassis02: Data node x 2

R1- Chassis01: Data node x 2

RACK2

R2- Switch2: 10Gb S4810

R2- Switch1: 10Gb S4810

Cable Management

Cable Management

HA Chassis

Cable Management

Cable Management

R2- Chassis06: Data node x 2

R2 - S55 iDRAC Mgmt switch

Empty

R2- Chassis05: Data node x 2

R2- Chassis04: Data node x 2

R2- Chassis03: Data node x 2

R2- Chassis02: Data node x 2

R2- Chassis01: Data node x 2


RACK3

R3- Switch2: 10Gb S4810

R3- Switch1: 10Gb S4810

Cable Management

Cable Management

R3 - Switch 1: Force10 S4810 (1 RU)

OR Force10 Z9000 (2 RU)

R3 - Switch 2: Force10 S4810 (1 RU)

OR Force10 Z9000 (2 RU)

Cable Management

Cable Management

R3- Chassis06: Data node x 2

R3 - S55 iDRAC Mgmt switch

Empty

R3- Chassis05: Data node x 2

R3- Chassis04: Data node x 2

R3- Chassis03: Data node x 2

R3- Chassis02: Data node x 2

R3- Chassis01: Data node x 2


Table 18: Rack Configuration – PowerEdge C8000 ‘Heavy’ Nodes

RACK1

R1- Switch 2: 10Gb S4810

R1- Switch 1: 10Gb S4810

Cable Management

Cable Management

Master Chassis

Cable Management

Cable Management

R1 - S55 iDRAC Mgmt switch

Empty

R1- Chassis06:

Data node x 4

(chassis 1 of 3)

R1- Chassis05:

Data node x 4

(chassis 2 of 3)

R1- Chassis04:

Data node x 4

(chassis 3 of 3)

R1- Chassis03:

Data node x 4

(chassis 1 of 3)

R1- Chassis02:

Data node x 4

(chassis 2 of 3)

R1- Chassis01:

Data node x 4

(chassis 3 of 3)

RACK2

R2- Switch2: 10Gb S4810

R2- Switch1: 10Gb S4810

Cable Management

Cable Management

HA Chassis

Cable Management

Cable Management

R2 - S55 iDRAC Mgmt switch

Empty

R2- Chassis06:

Data node x 4

(chassis 1 of 3)

R2- Chassis05:

Data node x 4

(chassis 2 of 3)

R2- Chassis04:

Data node x 4

(chassis 3 of 3)

R2- Chassis03:

Data node x 4

(chassis 1 of 3)

R2- Chassis02:

Data node x 4

(chassis 2 of 3)

R2- Chassis01:

Data node x 4

(chassis 3 of 3)

NOTE: Four “heavy” data nodes require 12U of rack space.

RACK3

R3- Switch2: 10Gb S4810

R3- Switch1: 10Gb S4810

Cable Management

Cable Management

R3 - Switch 1: Force10 S4810 (1 RU)

OR Force10 Z9000 (2 RU)

R3 - Switch 2: Force10 S4810 (1 RU)

OR Force10 Z9000 (2 RU)

Cable Management

Cable Management

R3 - S55 iDRAC Mgmt switch

Empty

R3- Chassis06:

Data node x 4

(chassis 1 of 3)

R3- Chassis05:

Data node x 4

(chassis 2 of 3)

R3- Chassis04:

Data node x 4

(chassis 3 of 3)

R3- Chassis03:

Data node x 4

(chassis 1 of 3)

R3- Chassis02:

Data node x 4

(chassis 2 of 3)

R3- Chassis01:

Data node x 4

(chassis 3 of 3)


Appendix B : Bill of Materials – PowerEdge C8000 Series

For the PowerEdge C8000 series, the bill of materials is organized by chassis rather than node, to simplify ordering.

Table 19: Master Chassis – PowerEdge C8000

The master chassis includes the administration node, a master name node, and an edge node.

SKU

Group: 1

225-3550

331-9573

331-8341

420-3323

331-8218

330-7353

318-2363

936-6035

936-4705

936-6145

936-4695

989-3439

936-3965

900-9997

973-2426

331-3282

Group: 2

210-ABBZ

318-2308

338-BDBG


338-BDBV

317-8810

317-4928

317-9095

319-1811

331-4424

331-4428

780-BBDB

342-5079

331-8996

342-4983

Component

Quantity: 1

PE C8000 Enclosure, Two Sleds with Dual PSU

SHIP,C8000,DAO

PowerEdge C8000 Shipping

No Factory Installed Operating System

PowerEdge C8000 Static Rails, Toolless

Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 4

PowerEdge C8000 Sled Blank, Single Width Quantity 2

Dell Hardware Limited Warranty Plus On Site Service Initial Year

ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year

Dell Hardware Limited Warranty Plus On Site Service Extended Year

ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended

Dell ProSupport. For tech support, visit http://support.dell.com/ProSupport or call 1-800-945-3355

ProSupport: Next Business Day Onsite Service After ProblemDiagnosis, Initial Year

On-Site Installation Declined

Declined Remote Consulting Service

CLOUD COMPUTE NODE,DCS, INFO MOD,HADOOP

Quantity: 3

PowerEdge C8220X Double Width Compute Sled, X6

Thermal Heatsink Quantity 2

Intel Xeon E5-2670v2 2.5GHz,25M Cache, 8.0GT/s QPI, Turbo, HT, 10C, 115W, Max Mem 1866MHz

Intel Xeon E5-2670v2 2.5GHz,25M Cache, 8.0GT/s QPI, Turbo, HT, 10C, 115W, Max Mem

1866MHz,2nd Proc

Memory Filler Blank Dimm Quantity 8

Dual Processor Option

Memory Filler Blank DIMM Quantity 6

8GB RDIMM, 1600MT/s, Low Volt, Dual Rank, x4 Data Width Quantity 16

1600 MHz RDIMMS

Performance Optimized

C10A,LSI 2008 Controller

LSI 2008 SAS Controller Card, 6G, PE C8XXX

Cable for 2.5in Rear Hard Drives, PE-C8220X

Hot Plug Hard Drive Carrier,PE-C8220X

37 Dell Confidential

342-5057

342-4871

342-4821

342-4986

342-0088

430-3643

421-8663

330-4118

934-9845

996-9927

935-0585

935-0575

989-3439

934-0626

900-9997

973-2426

331-3282

Dell | Cloudera Solution Reference Architecture Guide v5.1

2.5in HDD Blank, PE-C8220X

1TB 7.2K RPM SATA 3Gbps 2.5in Hard Drive Quantity 6

Hard Drive Carrier 2.5 C8000 Quantity 6

2.5in HDD Enclosure, PE-C8220X

No Hard Drive

Intel DA 10GbE NIC, Dual Port, SFP+,Low Profile Quantity 2

No Factory Installed Operating System, v.2

System ordered as part of Multipack order

ProSupport: Next Business Day Onsite Service After ProblemDiagnosis, Initial Year

Dell Hardware Limited Warranty Plus On Site Service Initial Year

ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year

ProSupport: Next Business Day Onsite Service After ProblemDiagnosis, 2 Year Extended

Dell ProSupport. For tech support, visit http://support.dell.com/ProSupport or call 1-800-945-3355

Dell Hardware Limited Warranty Plus On Site Service Extended Year

On-Site Installation Declined

Declined Remote Consulting Service

CLOUD COMPUTE NODE,DCS, INFO MOD,HADOOP

5.1 38 Dell Confidential

Dell | Cloudera Solution Reference Architecture Guide v5.1

Table 20: HA Chassis – PowerEdge C8000

The HA chassis includes the secondary name node, the HA node, and one data node.

SKU       Component

Group: 1 (Quantity: 1)
225-3550  PE C8000 Enclosure, Two Sleds with Dual PSU
331-8341  PowerEdge C8000 Shipping
331-9573  SHIP,C8000,DAO
420-3323  No Factory Installed Operating System
331-8218  PowerEdge C8000 Static Rails, Toolless
330-7353  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 4
318-2363  PowerEdge C8000 Sled Blank, Single Width Quantity 2
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/ProSupport or call 1-800-945-3355
936-3965  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
936-4695  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
936-4705  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
936-6035  Dell Hardware Limited Warranty Plus On Site Service Initial Year
936-6145  Dell Hardware Limited Warranty Plus On Site Service Extended Year
900-9997  On-Site Installation Declined
973-2426  Declined Remote Consulting Service
331-9532  LSI 9202 SAS Controller Cable Quantity 2
331-3282  CLOUD COMPUTE NODE,DCS, INFO MOD,HADOOP

Group: 2 (Quantity: 2)
210-ABBZ  PowerEdge C8220X Double Width Compute Sled, X6
338-BDBG  Intel Xeon E5-2670v2 2.5GHz,25M Cache, 8.0GT/s QPI, Turbo, HT, 10C, 115W, Max Mem 1866MHz
317-8810  Memory Filler Blank Dimm Quantity 8
317-9095  Memory Filler Blank DIMM Quantity 6
338-BDBV  Intel Xeon E5-2670v2 2.5GHz,25M Cache, 8.0GT/s QPI, Turbo, HT, 10C, 115W, Max Mem 1866MHz,2nd Proc
317-4928  Dual Processor Option
318-2308  Thermal Heatsink
319-1811  8GB RDIMM, 1600MT/s, Low Volt, Dual Rank, x4 Data Width Quantity 16
331-4424  1600 MHz RDIMMS
331-4428  Performance Optimized
780-BBDB  C10A,LSI 2008 Controller
331-8996  Cable for 2.5in Rear Hard Drives, PE-C8220X
342-5079  LSI 2008 SAS Controller Card, 6G, PE C8XXX
342-4983  Hot Plug Hard Drive Carrier,PE-C8220X
342-5057  2.5in HDD Blank, PE-C8220X
342-4871  1TB 7.2K RPM SATA 3Gbps 2.5in Hard Drive Quantity 6
342-4986  2.5in HDD Enclosure, PE-C8220X
342-4821  Hard Drive Carrier 2.5 C8000 Quantity 6
342-0088  No Hard Drive
430-3643  Intel DA 10GbE NIC, Dual Port, SFP+,Low Profile Quantity 2
421-8663  No Factory Installed Operating System, v.2
330-4118  System ordered as part of Multipack order
934-0626  Dell Hardware Limited Warranty Plus On Site Service Extended Year
935-0585  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
996-9927  Dell Hardware Limited Warranty Plus On Site Service Initial Year
934-9845  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
935-0575  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/ProSupport or call 1-800-945-3355
900-9997  On-Site Installation Declined
973-2426  Declined Remote Consulting Service
331-3282  CLOUD COMPUTE NODE,DCS, INFO MOD,HADOOP

Group: 3 (Quantity: 1)
210-ABBZ  PowerEdge C8220X Double Width Compute Sled, X6
317-4928  Dual Processor Option
318-2308  Thermal Heatsink Quantity 2
338-BDBG  Intel Xeon E5-2670v2 2.5GHz,25M Cache, 8.0GT/s QPI, Turbo, HT, 10C, 115W, Max Mem 1866MHz
317-9095  Memory Filler Blank DIMM Quantity 6
317-8810  Memory Filler Blank Dimm Quantity 8
338-BDBV  Intel Xeon E5-2670v2 2.5GHz,25M Cache, 8.0GT/s QPI, Turbo, HT, 10C, 115W, Max Mem 1866MHz,2nd Proc
319-1811  8GB RDIMM, 1600MT/s, Low Volt, Dual Rank, x4 Data Width Quantity 8
331-4424  1600 MHz RDIMMS
331-4428  Performance Optimized
342-5079  LSI 2008 SAS Controller Card, 6G, PE C8XXX
780-BBDB  C10A,LSI 2008 Controller
331-8999  SAS Controller Cable, PE-C8220X
342-4983  Hot Plug Hard Drive Carrier,PE-C8220X
342-4820  Hard Drive Carrier 3.5 C8000 Quantity 4
342-5855  4TB,Near Line SAS 6Gbps,7.2K RPM, 3.5in Hard Drive Quantity 4
342-4987  3.5in HDD Enclosure, PE-C8220X
342-0088  No Hard Drive
342-4851  LSI 9202-16E, LP, Controller, CE
430-3643  Intel DA 10GbE NIC, Dual Port, SFP+,Low Profile
421-8663  No Factory Installed Operating System, v.2
330-4118  System ordered as part of Multipack order
934-0626  Dell Hardware Limited Warranty Plus On Site Service Extended Year
935-0585  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
934-9845  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/ProSupport or call 1-800-945-3355
996-9927  Dell Hardware Limited Warranty Plus On Site Service Initial Year
935-0575  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
900-9997  On-Site Installation Declined
973-2426  Declined Remote Consulting Service
331-3282  CLOUD COMPUTE NODE,DCS, INFO MOD,HADOOP

Group: 4 (Quantity: 1)
225-3558  PowerEdge C8000XD Storage Sled, Single, 12 Hard Drives
420-3323  No Factory Installed Operating System
342-4824  Hard Drive Carrier,3.5,Expanded,Double Wide Storage,C8000 Quantity 12
342-5855  4TB,Near Line SAS 6Gbps,7.2K RPM, 3.5in Hard Drive Quantity 12
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/ProSupport or call 1-800-945-3355
934-4706  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
934-4716  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
934-6156  Dell Hardware Limited Warranty Plus On Site Service Extended Year
934-6046  Dell Hardware Limited Warranty Plus On Site Service Initial Year
934-3976  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
900-9997  On-Site Installation Declined
973-2426  Declined Remote Consulting Service
330-4118  System ordered as part of Multipack order
331-3282  CLOUD COMPUTE NODE,DCS, INFO MOD,HADOOP

Software and Accessories
332-0727  External Cable for LSI9202, Customer Install C8xxx (Quantity: 1)

Table 21: Data Node Chassis – PowerEdge C8000

The data node chassis includes two data nodes.

SKU       Component

Group: 1 (Quantity: 1)
225-3550  PE C8000 Enclosure, Two Sleds with Dual PSU
331-8341  PowerEdge C8000 Shipping
331-9573  SHIP,C8000,DAO
420-3323  No Factory Installed Operating System
331-8218  PowerEdge C8000 Static Rails, Toolless
330-7353  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 1
936-4695  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
936-3965  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/ProSupport or call 1-800-945-3355
936-6145  Dell Hardware Limited Warranty Plus On Site Service Extended Year
936-4705  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
936-6035  Dell Hardware Limited Warranty Plus On Site Service Initial Year
900-9997  On-Site Installation Declined
331-3282  CLOUD COMPUTE NODE,DCS, INFO MOD,HADOOP

Group: 2 (Quantity: 2)
331-9532  LSI 9202 SAS Controller Cable
210-ABBZ  PowerEdge C8220X Double Width Compute Sled, X6
338-BDBV  Intel Xeon E5-2670v2 2.5GHz,25M Cache, 8.0GT/s QPI, Turbo, HT, 10C, 115W, Max Mem 1866MHz,2nd Proc
338-BDBG  Intel Xeon E5-2670v2 2.5GHz,25M Cache, 8.0GT/s QPI, Turbo, HT, 10C, 115W, Max Mem 1866MHz
317-9095  Memory Filler Blank DIMM Quantity 6
317-8810  Memory Filler Blank Dimm Quantity 8
317-4928  Dual Processor Option
318-2308  Thermal Heatsink Quantity 2
319-1811  8GB RDIMM, 1600MT/s, Low Volt, Dual Rank, x4 Data Width Quantity 8
331-4424  1600 MHz RDIMMS
331-4428  Performance Optimized
331-8999  SAS Controller Cable, PE-C8220X
342-5079  LSI 2008 SAS Controller Card, 6G, PE C8XXX
780-BBCT  C10B,LSI 2008 and Onboard Controller
342-4983  Hot Plug Hard Drive Carrier,PE-C8220X
342-4987  3.5in HDD Enclosure, PE-C8220X
342-4820  Hard Drive Carrier 3.5 C8000 Quantity 4
342-5855  4TB,Near Line SAS 6Gbps,7.2K RPM, 3.5in Hard Drive Quantity 4
342-4861  1TB,7.2K RPM,SATA,3Gbps,2.5in, Hard Drive Quantity 2
342-4841  Hard Drive,2.5 Rear Carrier,C8220 Quantity 2
342-4851  LSI 9202-16E, LP, Controller, CE
430-3643  Intel DA 10GbE NIC, Dual Port, SFP+,Low Profile
421-8663  No Factory Installed Operating System, v.2
330-4118  System ordered as part of Multipack order
934-0626  Dell Hardware Limited Warranty Plus On Site Service Extended Year
996-9927  Dell Hardware Limited Warranty Plus On Site Service Initial Year
934-9845  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
935-0585  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
935-0575  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/ProSupport or call 1-800-945-3355
900-9997  On-Site Installation Declined
973-2426  Declined Remote Consulting Service
331-3282  CLOUD COMPUTE NODE,DCS, INFO MOD,HADOOP

Group: 3 (Quantity: 2)
225-3558  PowerEdge C8000XD Storage Sled, Single, 12 Hard Drives
420-3323  No Factory Installed Operating System
342-4824  Hard Drive Carrier,3.5,Expanded,Double Wide Storage,C8000 Quantity 12
342-5855  4TB,Near Line SAS 6Gbps,7.2K RPM, 3.5in Hard Drive Quantity 12
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/ProSupport or call 1-800-945-3355
934-4706  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
934-4716  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
934-6156  Dell Hardware Limited Warranty Plus On Site Service Extended Year
934-6046  Dell Hardware Limited Warranty Plus On Site Service Initial Year
934-3976  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
900-9997  On-Site Installation Declined
973-2426  Declined Remote Consulting Service
330-4118  System ordered as part of Multipack order
331-3282  CLOUD COMPUTE NODE,DCS, INFO MOD,HADOOP

Table 22: Heavy Data Node Chassis – PowerEdge C8000

The heavy data node chassis is used to build four heavy data nodes across three chassis. For this configuration, order two heavy data node chassis (this table) and one data node chassis (Table 21).

SKU       Component

Group: 1 (Quantity: 1)
225-3550  PE C8000 Enclosure, Two Sleds with Dual PSU
331-8341  PowerEdge C8000 Shipping
331-9573  SHIP,C8000,DAO
420-3323  No Factory Installed Operating System
331-8218  PowerEdge C8000 Static Rails, Toolless
330-7353  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 1
936-4695  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
936-3965  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/ProSupport or call 1-800-945-3355
936-6145  Dell Hardware Limited Warranty Plus On Site Service Extended Year
936-4705  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
936-6035  Dell Hardware Limited Warranty Plus On Site Service Initial Year
900-9997  On-Site Installation Declined
973-2426  Declined Remote Consulting Service
331-9532  LSI 9202 SAS Controller Cable Quantity 3
331-3282  CLOUD COMPUTE NODE,DCS, INFO MOD,HADOOP

Group: 2 (Quantity: 1)
210-ABBZ  PowerEdge C8220X Double Width Compute Sled, X6
338-BDBV  Intel Xeon E5-2670v2 2.5GHz,25M Cache, 8.0GT/s QPI, Turbo, HT, 10C, 115W, Max Mem 1866MHz,2nd Proc
338-BDBG  Intel Xeon E5-2670v2 2.5GHz,25M Cache, 8.0GT/s QPI, Turbo, HT, 10C, 115W, Max Mem 1866MHz
317-9095  Memory Filler Blank DIMM Quantity 6
317-8810  Memory Filler Blank Dimm Quantity 8
317-4928  Dual Processor Option
318-2308  Thermal Heatsink
319-1811  8GB RDIMM, 1600MT/s, Low Volt, Dual Rank, x4 Data Width Quantity 8
331-4424  1600 MHz RDIMMS
331-4428  Performance Optimized
331-8999  SAS Controller Cable, PE-C8220X
342-5079  LSI 2008 SAS Controller Card, 6G, PE C8XXX
780-BBCT  C10B,LSI 2008 and Onboard Controller
342-4983  Hot Plug Hard Drive Carrier,PE-C8220X
342-4987  3.5in HDD Enclosure, PE-C8220X
342-4820  Hard Drive Carrier 3.5 C8000 Quantity 4
342-5855  4TB,Near Line SAS 6Gbps,7.2K RPM, 3.5in Hard Drive Quantity 4
342-4861  1TB,7.2K RPM,SATA,3Gbps,2.5in, Hard Drive Quantity 2
342-4841  Hard Drive,2.5 Rear Carrier,C8220 Quantity 2
342-4851  LSI 9202-16E, LP, Controller, CE
430-3643  Intel DA 10GbE NIC, Dual Port, SFP+,Low Profile
421-8663  No Factory Installed Operating System, v.2
330-4118  System ordered as part of Multipack order
934-0626  Dell Hardware Limited Warranty Plus On Site Service Extended Year
996-9927  Dell Hardware Limited Warranty Plus On Site Service Initial Year
934-9845  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
935-0585  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
935-0575  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/ProSupport or call 1-800-945-3355
900-9997  On-Site Installation Declined
973-2426  Declined Remote Consulting Service
331-3282  CLOUD COMPUTE NODE,DCS, INFO MOD,HADOOP

Group: 3 (Quantity: 3)
225-3558  PowerEdge C8000XD Storage Sled, Single, 12 Hard Drives
420-3323  No Factory Installed Operating System
342-4824  Hard Drive Carrier,3.5,Expanded,Double Wide Storage,C8000 Quantity 12
342-5855  4TB,Near Line SAS 6Gbps,7.2K RPM, 3.5in Hard Drive Quantity 12
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/ProSupport or call 1-800-945-3355
934-4706  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
934-4716  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
934-6156  Dell Hardware Limited Warranty Plus On Site Service Extended Year
934-6046  Dell Hardware Limited Warranty Plus On Site Service Initial Year
934-3976  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
900-9997  On-Site Installation Declined
973-2426  Declined Remote Consulting Service
330-4118  System ordered as part of Multipack order
331-3282  CLOUD COMPUTE NODE,DCS, INFO MOD,HADOOP
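The chassis counts above can be cross-checked with some simple arithmetic. The sketch below is a minimal illustration, not a Dell sizing tool. It takes its sled quantities from Table 21 (two compute sleds and two storage sleds) and Table 22 (one compute sled and three storage sleds). It also uses the 4U chassis height implied by the 12U rack note in Appendix A. It rests on two assumptions that the BOM does not state: each heavy data node is one compute sled plus an equal share of the storage sleds, and only the 4TB drives hold HDFS data (the 1TB 2.5in drives are left out).

```python
# Minimal sketch: node count, rack units, and raw data capacity for the
# four-heavy-data-node build (2 x Table 22 chassis + 1 x Table 21 chassis).
# Assumptions (not stated in the BOM): storage sleds are split evenly across
# heavy nodes, and only the 4TB drives are counted as HDFS data drives.
from dataclasses import dataclass


@dataclass(frozen=True)
class Chassis:
    compute_sleds: int  # PowerEdge C8220X compute sleds
    storage_sleds: int  # PowerEdge C8000XD storage sleds


RU_PER_CHASSIS = 4            # three chassis occupy 12U (Appendix A rack note)
DATA_DRIVES_PER_COMPUTE = 4   # 4TB 3.5in drives per compute sled
DATA_DRIVES_PER_STORAGE = 12  # 4TB 3.5in drives per storage sled
DRIVE_TB = 4

HEAVY_CHASSIS = Chassis(compute_sleds=1, storage_sleds=3)  # Table 22
DATA_CHASSIS = Chassis(compute_sleds=2, storage_sleds=2)   # Table 21


def heavy_node_layout(chassis_list):
    compute = sum(c.compute_sleds for c in chassis_list)
    storage = sum(c.storage_sleds for c in chassis_list)
    nodes = compute  # one heavy data node per compute sled
    if storage % nodes:
        raise ValueError("storage sleds do not divide evenly across nodes")
    storage_per_node = storage // nodes
    drives_per_node = DATA_DRIVES_PER_COMPUTE + storage_per_node * DATA_DRIVES_PER_STORAGE
    return {
        "nodes": nodes,
        "rack_units": len(chassis_list) * RU_PER_CHASSIS,
        "storage_sleds_per_node": storage_per_node,
        "data_drives_per_node": drives_per_node,
        "raw_tb_per_node": drives_per_node * DRIVE_TB,
    }


layout = heavy_node_layout([HEAVY_CHASSIS, HEAVY_CHASSIS, DATA_CHASSIS])
print(layout)
```

Under these assumptions the result is four nodes in 12U. Each node has two storage sleds and 28 data drives (112 TB raw), which agrees with the 12U figure in the rack note.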


Appendix C : Physical Configuration — PowerEdge R720xd

Table 23: Rack Configuration – PowerEdge R720xd (or R720/R720xd)

RACK1
R1- Switch 2: Force10 S4810
R1- Switch 1: Force10 S4810
Cable Management
Cable Management
Master Name Node: R720xd or R720
Cable Management
Cable Management
Admin Node: R720xd or R720
R1 - S55 iDRAC Mgmt switch
Empty
R1- Chassis10: R720xd
R1- Chassis09: R720xd
R1- Chassis08: R720xd
R1- Chassis07: R720xd
R1- Chassis06: R720xd
R1- Chassis05: R720xd
R1- Chassis04: R720xd
R1- Chassis03: R720xd
R1- Chassis02: R720xd
R1- Chassis01: R720xd

RACK2
R2- Switch2: Force10 S4810
R2- Switch1: Force10 S4810
Cable Management
Cable Management
Edge01: R720xd or R720
Cable Management
Cable Management
Secondary Name Node: R720xd or R720
R2 - S55 iDRAC Mgmt switch
Empty
R2- Chassis10: R720xd
R2- Chassis09: R720xd
R2- Chassis08: R720xd
R2- Chassis07: R720xd
R2- Chassis06: R720xd
R2- Chassis05: R720xd
R2- Chassis04: R720xd
R2- Chassis03: R720xd
R2- Chassis02: R720xd
R2- Chassis01: R720xd

RACK3
R3- Switch2: Force10 S4810
R3- Switch1: Force10 S4810
Cable Management
Cable Management
R3 - Switch 1: Force10 S4810
R3 - Switch 2: Force10 S4810
Cable Management
Cable Management
HA Node: R720xd or R720
R3 - S55 iDRAC Mgmt switch
Empty
R3- Chassis10: R720xd
R3- Chassis09: R720xd
R3- Chassis08: R720xd
R3- Chassis07: R720xd
R3- Chassis06: R720xd
R3- Chassis05: R720xd
R3- Chassis04: R720xd
R3- Chassis03: R720xd
R3- Chassis02: R720xd
R3- Chassis01: R720xd

Appendix D : Bill of Materials – PowerEdge R720 Nodes

Table 24: Active and Standby Name, Admin, Edge and HA Nodes – PowerEdge R720

SKU       Component
331-3765  UEFI BIOS Setting
591-BBBP  PowerEdge R720 Motherboard, TPM
210-ABVP  PowerEdge R720, Intel Xeon E-26XX v2 Processors
342-3587  3.5" Chassis with up to 8 Hard Drives
331-4437  PowerEdge R720 Shipping
338-BDBG  Intel Xeon E5-2670v2 2.5GHz,25M Cache, 8.0GT/s QPI, Turbo, HT, 10C, 115W, Max Mem 1866MHz
331-4508  Heat Sink for PowerEdge R720 and R720xd Quantity 2
338-BDBV  Intel Xeon E5-2670v2 2.5GHz,25M Cache, 8.0GT/s QPI, Turbo, HT, 10C, 115W, Max Mem 1866MHz,2nd Proc
317-8688  DIMM Blanks for Systems with 2 Processors
331-4424  1600 MHz RDIMMS
331-4428  Performance Optimized
319-1812  16GB RDIMM, 1600MT/s, Low Volt, Dual Rank, x4 Data Width Quantity 8
331-4403  Unconfigured RAID for H710P/H710/H310 (1-16 HDDs)
342-3529  PERC H710 Integrated RAID Controller, 512MB NV Cache
341-8730  1TB 7.2K RPM SATA 3Gbps 3.5in Hot-plug Hard Drive Quantity 8
421-5339  iDRAC7 Enterprise
430-4447  Intel Ethernet I350 QP 1Gb Network Daughter Card
331-4440  Risers with up to 6, x8 PCIe Slots + 1, x16 PCIe Slot
430-4445  Intel X520 DP 10Gb DA/SFP+ Server Adapter Quantity 2
331-4605  Dual, Hot-plug, Redundant Power Supply (1+1), 750W
330-3151  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 2
330-5116  Power Saving Dell Active Power Controller
331-4433  ReadyRails Sliding Rails With Cable Management Arm
318-1375  Bezel
313-9092  DVD ROM, SATA, INTERNAL
310-5171  No System Documentation, No OpenManage DVD Kit
420-6320  No Operating System
421-5736  No Media Required
939-2768  Dell Hardware Limited Warranty Plus On Site Service Initial Year
936-4603  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
939-2678  Dell Hardware Limited Warranty Plus On Site Service Extended Year
936-4593  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
988-9281  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
900-9997  On-Site Installation Declined
926-2979  Proactive Maintenance Service Declined
331-3282  CLOUD COMPUTE NODE,DCS, INFO MOD,HADOOP

Appendix E : Bill of Materials – PowerEdge R720xd 3.5” Data Node

Table 25: Data node – PowerEdge R720xd

SKU       Component
331-3765  UEFI BIOS Setting
210-ABMY  PowerEdge R720xd, Intel Xeon E-26XX v2 Processors
591-BBBP  PowerEdge R720 Motherboard, TPM
342-3567  Chassis with up to 12, 3.5" Hard Drives
331-4437  PowerEdge R720 Shipping
338-BDBG  Intel Xeon E5-2670v2 2.5GHz,25M Cache, 8.0GT/s QPI, Turbo, HT, 10C, 115W, Max Mem 1866MHz
317-8688  DIMM Blanks for Systems with 2 Processors
331-4508  Heat Sink for PowerEdge R720 and R720xd Quantity 2
338-BDBV  Intel Xeon E5-2670v2 2.5GHz,25M Cache, 8.0GT/s QPI, Turbo, HT, 10C, 115W, Max Mem 1866MHz,2nd Proc
331-4424  1600 MHz RDIMMS
331-4428  Performance Optimized
319-1811  8GB RDIMM, 1600MT/s, Low Volt, Dual Rank, x4 Data Width Quantity 8
331-4533  No RAID for H310 (1-16 HDDs)
428-BBBX  LSI 9207, Internal Passthrough Host Bus Adapter Card for R720 and R720 XD with 3.5in HDDs
342-5272  4TB 7.2K RPM SATA 3Gbps 3.5in Hot-plug Hard Drive Quantity 12
421-5339  iDRAC7 Enterprise
430-4447  Intel Ethernet I350 QP 1Gb Network Daughter Card
430-4445  Intel X520 DP 10Gb DA/SFP+ Server Adapter
331-4605  Dual, Hot-plug, Redundant Power Supply (1+1), 750W
330-3151  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 2
330-5116  Power Saving Dell Active Power Controller
331-4433  ReadyRails Sliding Rails With Cable Management Arm
318-1375  Bezel
331-5914  Electronic System Documentation and OpenManage DVD Kit for R720 and R720xd
420-6320  No Operating System
421-5736  No Media Required
939-3398  Dell Hardware Limited Warranty Plus On Site Service Extended Year
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/ProSupport or call 1-800-945-3355
936-7243  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
936-7263  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
936-0967  Dell Hardware Limited Warranty Plus On Site Service Initial Year
989-2701  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
900-9997  On-Site Installation Declined
926-2979  Proactive Maintenance Service Declined
331-3282  CLOUD COMPUTE NODE,DCS, INFO MOD,HADOOP

Appendix F : Bill of Materials – PowerEdge R720xd 2.5” Data Node

Table 26: Data node – PowerEdge R720xd

SKU       Component
331-3765  UEFI BIOS Setting
210-ABMY  PowerEdge R720xd, Intel Xeon E-26XX v2 Processors
591-BBBP  PowerEdge R720 Motherboard, TPM
342-3566  Chassis with up to 24, 2.5" Hard Drives
331-4437  PowerEdge R720 Shipping
331-4508  Heat Sink for PowerEdge R720 and R720xd Quantity 2
338-BDBG  Intel Xeon E5-2670v2 2.5GHz,25M Cache, 8.0GT/s QPI, Turbo, HT, 10C, 115W, Max Mem 1866MHz
317-8688  DIMM Blanks for Systems with 2 Processors
338-BDBV  Intel Xeon E5-2670v2 2.5GHz,25M Cache, 8.0GT/s QPI, Turbo, HT, 10C, 115W, Max Mem 1866MHz,2nd Proc
331-4424  1600 MHz RDIMMS
331-4428  Performance Optimized
319-1811  8GB RDIMM, 1600MT/s, Low Volt, Dual Rank, x4 Data Width Quantity 8
331-4533  No RAID for H310 (1-16 HDDs)
342-5964  LSI 9207, Internal Passthrough Host Bus Adapter Card for R720 and R720 XD with 2.5in HDDs
342-1998  1TB 7.2K RPM SATA 3Gbps 2.5in Hot-plug Hard Drive Quantity 24
421-5339  iDRAC7 Enterprise
430-4447  Intel Ethernet I350 QP 1Gb Network Daughter Card
430-4445  Intel X520 DP 10Gb DA/SFP+ Server Adapter
331-4605  Dual, Hot-plug, Redundant Power Supply (1+1), 750W
330-3151  Power Cord, C13 to C14, PDU Style, 12 Amps, 2 meter, Qty 2
330-5116  Power Saving Dell Active Power Controller
331-4433  ReadyRails Sliding Rails With Cable Management Arm
318-1375  Bezel
331-5914  Electronic System Documentation and OpenManage DVD Kit for R720 and R720xd
420-6320  No Operating System
421-5736  No Media Required
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/ProSupport or call 1-800-945-3355
939-3398  Dell Hardware Limited Warranty Plus On Site Service Extended Year
936-0967  Dell Hardware Limited Warranty Plus On Site Service Initial Year
936-7243  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, 2 Year Extended
936-7263  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Year
989-2701  ProSupport: Next Business Day Onsite Service After Problem Diagnosis, Initial Year
900-9997  On-Site Installation Declined
926-2979  Proactive Maintenance Service Declined
331-3282  CLOUD COMPUTE NODE,DCS, INFO MOD,HADOOP
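The two R720xd data node builds trade raw capacity against spindle count. The sketch below compares them and is illustrative only. It makes the following assumptions:

- Ten data nodes per rack (Chassis01 to Chassis10 in Table 23).
- An HDFS replication factor of 3 (the HDFS default).
- Decimal terabytes.
- No allowance for file system overhead or non-HDFS reserved space.

```python
# Minimal sketch: raw and replicated (usable) HDFS capacity plus spindle count
# for the R720xd data node builds in Tables 25 and 26.
# Assumptions: 10 data nodes per rack (Table 23), replication factor 3,
# decimal TB, no file system overhead or non-HDFS reserved space.
from dataclasses import dataclass

NODES_PER_RACK = 10
REPLICATION = 3


@dataclass(frozen=True)
class DataNodeBuild:
    name: str
    drives: int
    drive_tb: int


BUILDS = [
    DataNodeBuild("R720xd 3.5in (Table 25)", drives=12, drive_tb=4),
    DataNodeBuild("R720xd 2.5in (Table 26)", drives=24, drive_tb=1),
]


def summarize(build, racks=1):
    nodes = NODES_PER_RACK * racks
    raw_tb = build.drives * build.drive_tb * nodes
    return {
        "nodes": nodes,
        "spindles": build.drives * nodes,
        "raw_tb": raw_tb,
        "usable_tb": raw_tb / REPLICATION,
    }


for build in BUILDS:
    print(build.name, summarize(build, racks=3))
```

For the three racks in Table 23, the 3.5in build provides twice the raw capacity. The 2.5in build provides twice as many spindles for HDFS to spread I/O across.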

Appendix G : Part Numbers – Force10 Network Equipment

Table 27: Network Equipment – 1GbE – Dell Force10

SKU       Description
225-2446  Force10, Z9000, 2U, 32 x 40GbE QSFP+ Ports, 1 AC Pwr Supply, Fan w/IO Panel to PSU (Normal) Airflow (Non-Redundant Pwr)
331-5996  Force10, Power Cord, 125V, 15A, 10 Feet, NEMA 5-15/C13, S-Series
331-5343  Force10, Z9000, AC Power Supply for Chassis with IO Panel to PSU (Normal) Airflow
430-4543  Force10, Transceiver, 40GE QSFP+ Short Reach Optics, 850nm Wavelength, 100-150m Reach on OM3/OM4
331-7279  Force10, Z9000 Cable Management Kit
225-2477  Force10, S4810P, 48 x 10GbE SFP+, 4 x QSFP 40GbE, 1 x AC PSU, 2 x Fans, IO Panel to PSU Airflow
225-2479  Force10, S4810P, 48 x 10GbE SFP+, 4 x QSFP 40GbE, 1 x AC PSU, 2 x Fans, PSU to IO Panel Airflow
331-5103  Force10, S4810, AC Power Supply, IO Panel to PSU Airflow
331-5105  Force10, S4810, AC Power Supply, PSU to IO Panel Airflow
331-5258  Force10, Cable, SFP+ to SFP+, 10GbE, Copper Twinax Direct Attach Cable, 2 Meters
331-5996  Force10, Power Cord, 125V, 15A, 10 Feet, NEMA 5-15/C13, S-Series
421-6981  Force10, Software, L3 Latest Version, S4810
430-4543  Force10, Transceiver, 40GE QSFP+ Short Reach Optics, 850nm Wavelength, 100-150m Reach on OM3/OM4
331-5274  Force10, Transceiver, SFP+, 10GbE, SR, 850nm Wavelength, 300m Reach
430-4543  Force10, Transceiver, 40GE QSFP+ Short Reach Optics, 850nm Wavelength, 100-150m Reach on OM3/OM4
331-5393  Force10, Rear Rack Mounting Bracket, 4 Post, S4810
225-2450  Force10, S60, 44 x 10/100/1000 BASE-T, 4 x SFP, 2 Expansion Slots, 1 x AC PSU, 2 x fans, PSU to IO Panel Airflow
331-5233  Force10, SFP+ Expansion Module, 2 x 10 GbE Ports, S60 Series (SFP+ optics required)
331-5996  Force10, Power Cord, 125V, 15A, 10 Feet, NEMA 5-15/C13, S-Series
331-5226  Force10, S60, AC Power Supply, PSU to IO Panel Airflow
331-5398  Force10, Rear Rack Mounting Bracket, Metal, 4 Post, S60
          Force10 S60 2 port, 12G, Stacking module
          Force10 S60 12 Gig 60cms stacking cable

Table 28: Network Equipment – 10GbE – Dell Force10

SKU       Description

Cluster Network
331-5274  Dell Networking, Transceiver, SFP+, 10GbE, SR, 850nm Wavelength, 300m Reach
330-8723  SFP+, Short Range, Optical Transceiver, LC Connector, 10Gb and 1Gb compatible (Intel 10G SFP+)
225-2477  Force10, S4810P, 48 x 10GbE SFP+, 4 x QSFP 40GbE, 1 x AC PSU, 2 x Fans, IO Panel to PSU Airflow
331-5996  Force10, Power Cord, 125V, 15A, 10 Feet, NEMA 5-15/C13, S-Series
331-5272  Dell Networking, Transceiver, SFP, 1000BASE-LX, 1310nm Wavelength, 10km Reach
331-5393  Force10, Rear Rack Mounting Bracket, 4 Post, S4810
331-6279  Force10, User Documentation for S4810, DAO/BCC
935-0103  SW Support, Force10 Software, 3 Years
935-0143  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Years
931-3856  ProSupport: 4-Hour 7x24 Parts Only After Problem Diagnosis, Initial Year
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/ProSupport or call 1-800-945-3355
996-2760  Dell Hardware Limited Warranty Extended Year(s)
935-0123  ProSupport: 4-Hour 7x24 Parts Only After Problem Diagnosis, 2 Year Extended
996-2670  Dell Hardware Limited Warranty Initial Year
900-9997  On-Site Installation Declined
996-3080  ProSupport for Force10 Layer 3 Enablement, 1 Year
331-9460  Force10, Software, iSCSI-Optimized Configuration, S4810
331-5217  Customer Kit, Dell Networking, Cable, QSFP+, 40GbE SFP+ Passive Copper Direct Attach Cable, 1 Meter
331-3282  CLOUD COMPUTE NODE,DCS, INFO MOD,HADOOP

Administration Network
225-2503  Force10, S55, 44 x 10/100/1000 BASE-T, 4 x SFP, 2 Expansion Slots, 1 x AC PSU, 2 x Fans, IO Panel to PSU Airflow
331-5233  Force10 SFP+ Expansion Module, 2 x 10 GbE Ports
331-5243  Force10, S55, AC Power Supply, IO Panel to PSU Airflow
331-5996  Force10, Power Cord, 125V, 15A, 10 Feet, NEMA 5-15/C13, S-Series
331-5252  Force10, Rear Rack Mounting Bracket, 4 Post, S55
331-9233  No Returns Allowed on Dell Force10 Switches
331-6271  Force10, User Documentation for S55/S60, DAO/BCC
935-1367  Dell Hardware Limited Warranty Initial Year
938-7578  Dell Hardware Limited Warranty Extended Year(s)
989-3439  Dell ProSupport. For tech support, visit http://support.dell.com/ProSupport or call 1-800-945-3355
995-0592  ProSupport: Next Business Day Parts Delivery, 2 Year Extended
995-0622  ProSupport: 7x24 HW / SW Tech Support and Assistance, 3 Years
995-9649  SW Support, Force10 Software, 5 Years
996-0530  ProSupport: Next Business Day Parts Delivery, Initial Year
996-0540  Force10, 5 Year Return To Depot Service, Base Warranty
900-9997  On-Site Installation Declined
331-3282  CLOUD COMPUTE NODE,DCS, INFO MOD,HADOOP

Networking Equipment notes

These SKUs are provided for reference. The actual quantities of switches and connections required depend on the cluster size and the final rack layout.

The SKUs above include switches with specific airflow options: both I/O-panel-to-PSU and PSU-to-I/O-panel (reverse airflow) models are available. Any redundant fans beyond the minimum supplied with the chassis should have the same airflow direction as the base switch. Airflow cannot be reversed in the field at this time.

The list above shows AC power supplies only; all switch models are also available with DC power supplies.

The list above includes the cables needed for the switch-to-switch uplinks and interconnects.

The BOMs do not include the cables for connecting individual servers to the cluster, because the exact cables depend on the final rack layout and the choice of cable is often a matter of customer preference. Refer to Table 12 for the required cable quantities.
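When adapting the switch quantities to a different cluster size, check the uplink oversubscription of each top-of-rack switch. The sketch below uses only the S4810P port complement listed in Tables 27 and 28: 48 x 10GbE SFP+ and 4 x 40GbE QSFP+. Two inputs are assumptions that depend on the final topology: how many SFP+ ports carry server traffic, and whether each QSFP+ port is an uplink or a switch interconnect.

```python
# Minimal sketch: downlink-to-uplink bandwidth ratio for one top-of-rack switch.
# Port counts are the S4810P figures from Tables 27/28; how many ports are
# actually cabled to servers or used as uplinks is a deployment assumption.
from fractions import Fraction


def oversubscription(down_ports, down_gbps, up_ports, up_gbps):
    """Return downlink:uplink bandwidth as an exact ratio."""
    if up_ports * up_gbps == 0:
        raise ValueError("switch needs at least one uplink")
    return Fraction(down_ports * down_gbps, up_ports * up_gbps)


def as_ratio(frac):
    return f"{frac.numerator}:{frac.denominator}"


# Worst case: all 48 SFP+ ports carry server traffic, all 4 QSFP+ ports are uplinks.
full = oversubscription(48, 10, 4, 40)
# Hypothetical partially populated switch with 24 server-facing ports.
partial = oversubscription(24, 10, 4, 40)
print(as_ratio(full), as_ratio(partial))  # 3:1 3:2
```

If one QSFP+ port were used as a switch-to-switch interconnect instead of an uplink, the fully populated ratio would rise to 4:1.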

Server Racks and Power

The server SKU lists above do not include racks or power distribution units, because these are generally site specific. The PowerEdge C8000 server line requires 240V power; the other servers are dual voltage (110V/240V). Review the physical dimensions and power requirements before ordering. The PowerEdge C8000 requires extra rack depth as well as extra space for front-side cable management and rear power distribution. The PowerEdge R720 requires rear cable management and power distribution.


Appendix H : Bill of Materials – Software and Support

Software, training, and support SKUs change regularly and vary by global region. Refer to the “Hadoop Solution SKUs” document on Dell SalesEdge (Dell internal link) or contact your Dell account representative for the latest information.

The Sample Bill of Materials appendices include service and support SKUs for the United States. These SKUs need to be changed for other regions.


Appendix I : JBOD versus Single Disk RAID 0 Configuration

The Hadoop community’s strong advocacy for the “non-RAIDed” drive configuration known as “Just a Bunch of Disks” (JBOD) has caused some confusion for readers of our reference architecture. We fully endorse this approach, but some clarification is needed because there are multiple valid ways to achieve this configuration.

Normally, the optimum disk configuration for Hadoop data nodes is considered to be JBOD mode rather than RAID. This is because HDFS provides its own data replication, eliminating the need for the redundancy provided by RAID levels 1-6. HDFS also implements efficient round-robin parallel I/O across multiple drives, eliminating the need for the parallelism provided by the striping capabilities of RAID 0.
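By default, an HDFS DataNode spreads new blocks across the directories listed in its data-directory setting using a round-robin volume-choosing policy, so each block lives wholly on one disk while successive blocks land on different disks. The Python sketch below is a simplified model of that idea, not HDFS source code. Volume, RoundRobinChooser, and the /data/N paths are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Volume:
    """One DataNode data directory, i.e. one physical disk (hypothetical model)."""
    path: str
    free_bytes: int


class RoundRobinChooser:
    """Cycle through volumes in order, skipping any that lack room for the block."""

    def __init__(self, volumes: List[Volume]):
        if not volumes:
            raise ValueError("at least one volume is required")
        self.volumes = volumes
        self.next_index = 0

    def choose(self, block_size: int) -> str:
        n = len(self.volumes)
        for _ in range(n):
            vol = self.volumes[self.next_index]
            self.next_index = (self.next_index + 1) % n
            if vol.free_bytes >= block_size:
                vol.free_bytes -= block_size  # reserve space for the new block
                return vol.path
        raise RuntimeError("no volume has enough free space for the block")


BLOCK = 128 * 1024 * 1024  # 128 MiB
disks = [Volume(f"/data/{i}", 10 * BLOCK) for i in range(1, 5)]
chooser = RoundRobinChooser(disks)
print([chooser.choose(BLOCK) for _ in range(6)])
# ['/data/1', '/data/2', '/data/3', '/data/4', '/data/1', '/data/2']
```

Concurrent readers and writers therefore hit different spindles without any RAID 0 striping, and a single disk failure loses only the blocks on that disk, which HDFS then re-replicates from other nodes.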

The LSI 9207 controller is a SAS/SATA controller that provides JBOD capability as a standard host bus adapter (HBA).

Some drive controllers, such as the PERC H710, support only RAID mode and therefore cannot be used as a plain host bus adapter (HBA) for JBOD. In these situations, configuring the controller to present each disk as its own single-disk RAID 0 “array” allows HDFS to manage each disk as an individual drive. In this configuration, the controller effectively operates like a standard HBA in JBOD mode, and the RAID 0 and JBOD performance characteristics are comparable. Although the RAID controller adds a small amount of latency, this is offset by the controller's adaptive read-ahead caching.
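Whether the disks are exposed in true JBOD mode or as single-disk RAID 0 virtual disks, the operating system sees one block device per drive. Each drive is typically mounted separately and listed individually in the HDFS dfs.datanode.data.dir property, which takes a comma-separated list of directories. Cloudera Manager normally manages this setting. The Python sketch below only shows what the resulting value looks like, assuming hypothetical mount points /data/1 through /data/N and a hypothetical dfs/dn subdirectory on each. datanode_data_dirs and hdfs_site_property are illustrative helper names.

```python
from typing import List


def datanode_data_dirs(mount_points: List[str], subdir: str = "dfs/dn") -> str:
    """Return a dfs.datanode.data.dir value with one directory per physical disk.

    mount_points: one mount point per disk (JBOD disk or single-disk RAID 0 virtual disk).
    subdir: hypothetical directory layout under each mount point.
    """
    cleaned = [mp.rstrip("/") for mp in mount_points]
    if len(set(cleaned)) != len(cleaned):
        # Two entries on the same disk would defeat per-disk parallel I/O.
        raise ValueError("each disk must have its own mount point")
    return ",".join(f"{mp}/{subdir}" for mp in cleaned)


def hdfs_site_property(name: str, value: str) -> str:
    """Render a single <property> element in hdfs-site.xml style."""
    return (
        "<property>\n"
        f"  <name>{name}</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


disks = [f"/data/{i}" for i in range(1, 4)]  # hypothetical: three data disks
print(hdfs_site_property("dfs.datanode.data.dir", datanode_data_dirs(disks)))
```

The same value results whichever controller mode is used, which is why JBOD and single-disk RAID 0 look identical to HDFS.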


Appendix J : Abbreviations

Abbreviation    Definition
BMC             Baseboard management controller
CDH             Cloudera Distribution for Hadoop
DBMS            Database management system
EDW             Enterprise data warehouse
EoR             End-of-row switch/router
HDFS            Hadoop Distributed File System
IPMI            Intelligent Platform Management Interface
NIC             Network interface card
LOM             Local area network on motherboard
OS              Operating system
ToR             Top-of-rack switch/router


Update History

Changes in Version 5.1

The following changes have been made to this guide since the 5.0 release:

Updated to CDH 5.1 and Cloudera Manager 5.1

Updated to Red Hat Enterprise Linux 6.5

Changed network bonding to mode 6 (balance-alb, active-active adaptive load balancing)
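After the bonding change, each node's active bonding mode can be confirmed from the Linux bonding driver's status file, /proc/net/bonding/&lt;bond&gt;. For mode 6 (balance-alb), that file reports "Bonding Mode: adaptive load balancing". On Red Hat Enterprise Linux 6, the mode is usually set through BONDING_OPTS in the bond's ifcfg file. The Python sketch below parses that status text. The interface name bond0 and the helper names bonding_mode and is_mode6 are assumptions for illustration.

```python
import re
from typing import Optional

# Mode name the Linux bonding driver reports for mode 6 (balance-alb).
MODE6_NAME = "adaptive load balancing"


def bonding_mode(proc_text: str) -> Optional[str]:
    """Extract the 'Bonding Mode:' value from /proc/net/bonding/<bond> text."""
    m = re.search(r"^Bonding Mode:\s*(.+)$", proc_text, re.MULTILINE)
    return m.group(1).strip() if m else None


def is_mode6(proc_text: str) -> bool:
    """True if the bond reports adaptive load balancing (mode 6)."""
    return bonding_mode(proc_text) == MODE6_NAME


# Usage on a node (bond0 is an assumed interface name):
# with open("/proc/net/bonding/bond0") as f:
#     print(is_mode6(f.read()))
```

Parsing the status file, rather than the ifcfg file, verifies the mode the kernel is actually running, not only the configured intent.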

To Learn More

For more information on the Dell | Cloudera Solution, visit:

www.Dell.com/Hadoop

©2011 – 2014 Dell Inc. All rights reserved. Trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products.

Specifications are correct at date of publication but are subject to availability or change without notice at any time. Dell and its affiliates cannot be responsible for errors or omissions in typography or photography. Dell’s Terms and Conditions of Sales and Service apply and are available on request. Dell service offerings do not affect consumer’s statutory rights.

Dell, the DELL logo, and the DELL badge, PowerConnect, and PowerVault are trademarks of Dell Inc.


Key Features

  • Hyperscale-inspired 4U shared infrastructure server
  • High performance and scalable cluster architecture
  • High availability at multiple levels
  • Optimized server configurations
  • Optimized network infrastructure
  • Cloudera Distribution for Apache Hadoop

Frequently Asked Questions

What are the supported server models for the Dell | Cloudera Solution?
The supported server models are Dell™ PowerEdge™ C8000 series and Dell™ PowerEdge™ R720(xd) series.
What are the different logical networks used in the cluster architecture?
The different logical networks used in the cluster architecture are: Cluster Data Network, Management Network, BMC/IPMI Network, and Edge Network.
What are the minimum and recommended configurations for a Hadoop cluster?
The minimum configuration supported is six nodes, although at least seven are recommended.
What are the different roles of the nodes in a Hadoop cluster?
The roles for the nodes in a basic cluster are: Administration Node, Active Name Node, Standby Name Node, High Availability (HA) Node, Edge Node, Data Node.
What are the different physical server configurations?
The physical server configurations are divided into two broad classes—data nodes, which handle the bulk of the Hadoop processing, and infrastructure nodes, which support services needed for the cluster operation.
