Configuring Hadoop Security with Cloudera Manager
Important Notice
(c) 2010-2015 Cloudera, Inc. All rights reserved.
Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service
names or slogans contained in this document are trademarks of Cloudera and its
suppliers or licensors, and may not be copied, imitated or used, in whole or in part,
without the prior written permission of Cloudera or the applicable trademark holder.
Hadoop and the Hadoop elephant logo are trademarks of the Apache Software
Foundation. All other trademarks, registered trademarks, product names and
company names or logos mentioned in this document are the property of their
respective owners. Reference to any products, services, processes or other
information, by trade name, trademark, manufacturer, supplier or otherwise does
not constitute or imply endorsement, sponsorship or recommendation thereof by
us.
Complying with all applicable copyright laws is the responsibility of the user. Without
limiting the rights under copyright, no part of this document may be reproduced,
stored in or introduced into a retrieval system, or transmitted in any form or by any
means (electronic, mechanical, photocopying, recording, or otherwise), or for any
purpose, without the express written permission of Cloudera.
Cloudera may have patents, patent applications, trademarks, copyrights, or other
intellectual property rights covering subject matter in this document. Except as
expressly provided in any written license agreement from Cloudera, the furnishing
of this document does not give you any license to these patents, trademarks,
copyrights, or other intellectual property. For information about patents covering
Cloudera products, see http://tiny.cloudera.com/patents.
The information in this document is subject to change without notice. Cloudera
shall not be liable for any damages resulting from technical errors or omissions
which may be present in this document, or from use of this document.
Cloudera, Inc.
1001 Page Mill Road Bldg 2
Palo Alto, CA 94304
[email protected]
US: 1-888-789-1488
Intl: 1-650-362-0488
www.cloudera.com
Release Information
Version: 5.0.x
Date: April 14, 2015
Table of Contents
About Configuring Hadoop Security with Cloudera Manager
Kerberos Principals and Keytabs
Why Use Cloudera Manager to Implement Hadoop Security?
Using Cloudera Manager to Configure Hadoop Security
    Step 1: Install Cloudera Manager and CDH
        Overview of the User Accounts and Groups in CDH and Cloudera Manager to Support Security
    Step 2: Set up a Cluster-dedicated KDC and Default Domain for the Hadoop Cluster
        When to use kadmin.local and kadmin
        Setting up a Cluster-Dedicated KDC and Default Realm for the Hadoop Cluster
    Step 3: If You are Using AES-256 Encryption, Install the JCE Policy File
    Step 4: Get or Create a Kerberos Principal and Keytab File for the Cloudera Manager Server
    Step 5: Deploying the Cloudera Manager Server Keytab
    Step 6: Configure the Kerberos Default Realm in the Cloudera Manager Admin Console
    Step 7: Stop All Services
    Step 8: Enable Hadoop Security
    Step 9: Wait for the Generate Credentials Command to Finish
    Step 10: Enable Hue to Work with Hadoop Security using Cloudera Manager
    Step 11: (Flume Only) Use Substitution Variables for the Kerberos Principal and Keytab
    Step 12: (CDH 4.0 and 4.1 only) Configure Hue to Use a Local Hive Metastore
    Step 13: Start All Services
    Step 14: Deploy Client Configurations
    Step 15: Create the HDFS Superuser Principal
    Step 16: Get or Create a Kerberos Principal or Keytab for Each User Account
    Step 17: Prepare the Cluster for Each User
    Step 18: Verify that Kerberos Security is Working
    Step 19: (Optional) Enable Authentication for HTTP Web Consoles for Hadoop Roles
Hadoop Users in Cloudera Manager
Viewing and Regenerating Kerberos Principals
Configuring LDAP Group Mappings
Security-Related Issues in Cloudera Manager
Troubleshooting Security Issues
About Configuring Hadoop Security with Cloudera Manager
This guide describes how to use Cloudera Manager to automate many of the manual tasks of implementing
Kerberos security on your CDH cluster.
• Background - These instructions assume you know how to install and configure Kerberos, you already have
a working Kerberos key distribution center (KDC) and realm setup, and that you've installed the Kerberos
user packages on all cluster hosts and hosts that will be used to access the cluster. Furthermore, Oozie and
Hue require that the realm support renewable tickets. For more information about installing and configuring
Kerberos, see:
– MIT Kerberos Home
– MIT Kerberos Documentation
• Support
  – Kerberos security in Cloudera Manager has been tested on the following version of MIT Kerberos 5:
    – krb5-1.6.1 on Red Hat Enterprise Linux 5 and CentOS 5
  – Kerberos security in Cloudera Manager is supported on the following versions of MIT Kerberos 5:
    – krb5-1.6.3 on SUSE Linux Enterprise Server 11 Service Pack 1
    – krb5-1.8.1 on Ubuntu
    – krb5-1.8.2 on Red Hat Enterprise Linux 6 and CentOS 6
    – krb5-1.9 on Red Hat Enterprise Linux 6.1
Kerberos Principals and Keytabs
Hadoop security uses Kerberos principals and keytabs to perform user authentication on all remote procedure
calls.
A user in Kerberos is called a principal, which is made up of three distinct components: the primary, instance,
and realm. A Kerberos principal is used in a Kerberos-secured system to represent a unique identity. The first
component of the principal is called the primary, or sometimes the user component. The primary component is
an arbitrary string and may be the operating system username of the user or the name of a service. The primary
component is followed by an optional section called the instance, which is used to create principals that are
used by users in special roles or to define the host on which a service runs, for example. An instance, if it exists,
is separated from the primary by a slash and then the content is used to disambiguate multiple principals for
a single user or service. The final component of the principal is the realm. The realm is similar to a domain in
DNS in that it logically defines a related group of objects, although rather than hostnames as in DNS, the Kerberos
realm defines a group of principals. Each realm can have its own settings including the location of the KDC on
the network and supported encryption algorithms. Large organizations commonly create distinct realms to
delegate administration of a realm to a group within the enterprise. Realms, by convention, are written in
uppercase characters.
Kerberos assigns tickets to Kerberos principals to enable them to access Kerberos-secured Hadoop services.
For the Hadoop daemon principals, the principal names should be of the format
username/fully.qualified.domain.name@YOUR-REALM.COM. In this guide, username in the
username/fully.qualified.domain.name@YOUR-REALM.COM principal refers to the username of an existing
Unix account that is used by Hadoop daemons, such as hdfs or mapred. Human users who want to access the
Hadoop cluster also need to have Kerberos principals; in this case, username refers to the username of the user's
Unix account, such as joe or jane. Single-component principal names (such as joe@YOUR-REALM.COM) are
acceptable for client user accounts. Hadoop does not support more than two-component principal names.
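For example, in the hypothetical principal hdfs/node1.example.com@EXAMPLE.COM, the primary is hdfs, the instance is node1.example.com (the host the daemon runs on), and the realm is EXAMPLE.COM; the corresponding single-component principal for a human user would be something like joe@EXAMPLE.COM.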
A keytab is a file containing pairs of Kerberos principals and an encrypted copy of that principal's key. A keytab
file for a Hadoop daemon is unique to each host since the principal names include the hostname. This file is
used to authenticate a principal on a host to Kerberos without human interaction or storing a password in a
plain text file. Because having access to the keytab file for a principal allows one to act as that principal, access
to the keytab files should be tightly secured. They should be readable by a minimal set of users, should be stored
on local disk, and should not be included in host backups, unless access to those backups is as secure as access
to the local host.
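Because access to keytabs must be tightly controlled, it is worth verifying a keytab's contents and permissions directly on the host that stores it. For example (the path shown is illustrative):

$ klist -kt /etc/hadoop/conf/hdfs.keytab
$ ls -l /etc/hadoop/conf/hdfs.keytab
$ sudo chmod 600 /etc/hadoop/conf/hdfs.keytab

The klist -kt command lists the principals and key version numbers stored in the file without exposing the keys themselves.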
For more details about the security features in CDH 4 and CDH 5, see the "Introduction to Hadoop Security"
sections of the CDH 4 Security Guide and CDH 5 Security Guide.
Why Use Cloudera Manager to Implement Hadoop Security?
If you don't use Cloudera Manager to implement Hadoop security, you must manually create and deploy the
Kerberos principals and keytabs on every host in your cluster. If you have a large number of hosts, this can be
a time-consuming and error-prone process. After creating and deploying the keytabs, you must also manually
configure properties in the core-site.xml, hdfs-site.xml, mapred-site.xml, and taskcontroller.cfg
files on every host in the cluster to enable and configure Hadoop security in HDFS and MapReduce. You must
also manually configure properties in the oozie-site.xml and hue.ini files on certain cluster hosts in order
to enable and configure Hadoop security in Oozie and Hue.
Cloudera Manager enables you to automate all of those manual tasks. Cloudera Manager can automatically
create and deploy a keytab file for the hdfs user and a keytab file for the mapred user on every host in your
cluster, as well as keytab files for the oozie and hue users on select hosts. The hdfs keytab file contains entries
for the hdfs principal and a host principal, and the mapred keytab file contains entries for the mapred principal
and a host principal. The host principal is the same in both keytab files. The oozie keytab file contains
entries for the oozie principal and an HTTP principal. The hue keytab file contains an entry for the hue principal.
Cloudera Manager can also automatically configure the appropriate properties in the core-site.xml,
hdfs-site.xml, mapred-site.xml, and taskcontroller.cfg files on every host in the cluster, and the
appropriate properties in oozie-site.xml and hue.ini for select hosts. Lastly, Cloudera Manager can
automatically start up the NameNode, DataNode, Secondary NameNode, JobTracker, TaskTracker, Oozie Server,
and Hue roles once all the appropriate configuration changes have been made.
Using Cloudera Manager to Configure Hadoop Security
Important: Ensure you have secured communication between the Cloudera Manager Server and
Agents before you enable Kerberos on your cluster. Kerberos keytabs are sent from the Cloudera
Manager Server to the Agents, and must be encrypted to prevent potential misuse of leaked keytabs.
For secure communication, you should have at least Level 1 TLS enabled as described in Configuring
TLS Security for Cloudera Manager (Level 1).
Here are the general steps to using Cloudera Manager to configure Hadoop security on your cluster, each of
which is described in more detail in the following sections:
Step 1: Install Cloudera Manager and CDH
If you have not already done so, Cloudera strongly recommends that you install and configure the Cloudera
Manager Server and Cloudera Manager Agents and CDH to set up a fully-functional CDH cluster before you begin
doing the following steps to implement Hadoop security features.
Overview of the User Accounts and Groups in CDH and Cloudera Manager to Support Security
When you install the CDH packages and the Cloudera Manager Agents on your cluster hosts, Cloudera Manager
takes some steps to provide system security such as creating the following Unix accounts and setting directory
permissions as shown in the following table. These Unix accounts and directory permissions work with the
Hadoop Kerberos security requirements.
This User    Runs These Roles
hdfs         NameNode, DataNodes, and Secondary NameNode
mapred       JobTracker and TaskTrackers (MR1) and Job History Server (YARN)
yarn         ResourceManager and NodeManagers (YARN)
oozie        Oozie Server
hue          Hue Server, Beeswax Server, Authorization Manager, and Job Designer
The hdfs user also acts as the HDFS superuser.
When you install the Cloudera Manager Server on the server host, a new Unix user account called cloudera-scm
is created automatically to support security. The Cloudera Manager Server uses this account to create and deploy
the host principals and keytabs on your cluster.
If You Installed CDH and Cloudera Manager at the Same Time
If you have a new installation and you installed CDH and Cloudera Manager at the same time, when you started
the Cloudera Manager Agents on your cluster hosts, the Cloudera Manager Agent on each host automatically
configured the directory owners shown in the following table to support security. Assuming the owners are
configured as shown, the Hadoop daemons can then automatically set the permissions for each of the directories
specified by the properties shown below to make sure they are properly restricted. It's critical that the owners
are configured exactly as shown below, so don't change them:
Directory Specified in this Property                   Owner
dfs.name.dir                                           hdfs:hadoop
dfs.data.dir                                           hdfs:hadoop
mapred.local.dir                                       mapred:hadoop
mapred.system.dir in HDFS                              mapred:hadoop
yarn.nodemanager.local-dirs                            yarn:yarn
yarn.nodemanager.log-dirs                              yarn:yarn
oozie.service.StoreService.jdbc.url (if using Derby)   oozie:oozie
[[database]] name                                      hue:hue
javax.jdo.option.ConnectionURL                         hue:hue
If You Installed and Used CDH Before Installing Cloudera Manager
If you have been using HDFS and running MapReduce jobs in an existing installation of CDH before you installed
Cloudera Manager, you must manually configure the owners of the directories shown in the table above. Doing
so enables the Hadoop daemons to automatically set the permissions for each of the directories. It's critical that
you manually configure the owners exactly as shown above.
Step 2: Set up a Cluster-dedicated KDC and Default Domain for the Hadoop
Cluster
Important: If you have existing Kerberos host keytabs at your site, it's important that you read this
section to prevent your existing host keytabs from becoming invalid.
If you use the following instructions to use Cloudera Manager to enable Hadoop security on your cluster, the
Cloudera Manager Server will create the hdfs, mapred, oozie, HTTP, hue, and host principals and then generate
keytabs for those principals. Cloudera Manager will then deploy the keytab files on every host in the cluster.
Note: The following instructions illustrate an example of creating and deploying the principals and
keytab files for MIT Kerberos. (If you are using another version of Kerberos, refer to the Kerberos
documentation for the version of the operating system you are using for instructions.)
When to use kadmin.local and kadmin
When performing the Kerberos commands in this document, you can use kadmin.local or kadmin depending
on your access and account:
• If you can log on to the KDC host directly, and have root access or a Kerberos admin account, use the
kadmin.local command.
• When accessing the KDC from a remote host, use the kadmin command.
To start kadmin.local on the KDC host or kadmin from any host, run one of the following:
• $ sudo kadmin.local
• $ kadmin
Note:
• In this guide, kadmin is shown as the prompt for commands in the kadmin shell, but you can type
the same commands at the kadmin.local prompt in the kadmin.local shell.
• Running kadmin.local may prompt you for a password because it is being run via sudo. You
should provide your Unix password. Running kadmin may prompt you for a password because
you need Kerberos admin privileges. You should provide your Kerberos admin password.
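As a quick sanity check that your account can administer the KDC, you can list the existing principals from either shell; the admin principal below is an illustrative placeholder:

kadmin: listprincs

or, non-interactively from a remote host:

$ kadmin -p admin/admin@YOUR-LOCAL-REALM.COM -q "listprincs"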
Setting up a Cluster-Dedicated KDC and Default Realm for the Hadoop Cluster
Cloudera has tested the following configuration approaches to Kerberos security for clusters managed by
Cloudera Manager. For administration teams that are just getting started with Kerberos security, we recommend
starting with these approaches to the configuration of KDC services for a number of reasons.
The number of service principal names (SPNs) that are created and managed by the Cloudera Manager Server for a CDH
cluster can be significant, so it is important to understand the impact on cluster uptime and overall operations when
keytabs must be managed by hand. The Cloudera Manager Server manages the creation of service keytabs
on the proper hosts based on the current configuration of the database. Manual keytab management
is error prone and can introduce delays when deploying new services or moving services within the cluster, especially under
time-sensitive conditions.
Cloudera Manager creates service principal names (SPNs) within a KDC that it can access with the kadmin
command and reach based on the configuration of /etc/krb5.conf on all systems participating in the cluster.
SPNs must be created in the form service-name/host.fully.qualified.domain.name@YOUR-LOCAL-REALM.COM, where
service-name is the relevant CDH service name, such as hue, hbase, or hdfs.
If your site already has a working KDC and keytabs for any of the principals that Cloudera Manager creates, as
described in the following sections, the Cloudera Manager Server will randomize the key stored in the keytab
file and consequently cause your existing host keytabs to become invalid.
For this reason, Cloudera recommends that you prevent your existing host keytabs from becoming invalid by using a
dedicated local MIT Kerberos KDC and default realm for the Hadoop cluster, and creating all Hadoop hdfs, mapred,
oozie, HTTP, hue, and host service principals in that realm. You can also set up one-way cross-realm trust from
the cluster-dedicated KDC and realm to your existing central MIT Kerberos KDC, or to an existing Active Directory
realm. With this method, there is no need to create service principals in the central MIT Kerberos KDC or in
Active Directory, but principals (users) in the central MIT KDC or in Active Directory can still be authenticated to
Hadoop. The steps to implement this approach are as follows:
1. Install and configure a cluster-dedicated MIT Kerberos KDC that will be managed by Cloudera Manager for
creating and storing the service principals for your Hadoop cluster.
2. See the example kdc.conf and krb5.conf files here for configuration considerations for the KDC and
Kerberos clients.
3. Configure a default Kerberos realm for the cluster you want Cloudera Manager to manage and set up one-way
cross-realm trust between the cluster-dedicated KDC and either your central KDC or Active Directory. Follow
the appropriate instructions below for your deployment: Using a Cluster-Dedicated KDC with a Central MIT
KDC or Using a Cluster-Dedicated MIT KDC with Active Directory.
Using a Cluster-Dedicated KDC with a Central MIT KDC
Important: If you plan to use Oozie or the Hue Kerberos ticket renewer in your cluster, you must
configure your KDC to allow tickets to be renewed, and you must configure krb5.conf to request
renewable tickets. Typically, you can do this by adding the max_renewable_life setting to your
realm in kdc.conf, and by adding the renew_lifetime parameter to the libdefaults section of
krb5.conf. For more information about renewable tickets, see the Kerberos documentation. This is
covered in our example krb5.conf and kdc.conf files.
1. In the /var/kerberos/krb5kdc/kdc.conf file on the local dedicated KDC server host (for example,
KDC-Server-hostname.CLUSTER-REALM.com), configure the default realm for the Hadoop cluster by
substituting your cluster-dedicated realm for CLUSTER-REALM.COMPANY.COM in the following realms property. Refer to our
example kdc.conf file for more information:
[realms]
CLUSTER-REALM.COMPANY.COM = {
2. In the /etc/krb5.conf file on all cluster hosts and all Hadoop client user hosts, configure the default realm
for the Hadoop cluster by substituting your cluster-dedicated realm for CLUSTER-REALM.COMPANY.COM in the following
realms property. Also specify the local dedicated KDC server hostname in the /etc/krb5.conf file (for example,
KDC-Server-hostname.your-local-realm.company.com).
[libdefaults]
default_realm = CLUSTER-REALM.COMPANY.COM
[realms]
CLUSTER-REALM.COMPANY.COM = {
kdc = KDC-Server-hostname.CLUSTER-REALM.company.com:88
admin_server = KDC-Server-hostname.CLUSTER-REALM.company.com:749
default_domain = CLUSTER-REALM.company.com
}
YOUR-CENTRAL-REALM.COMPANY.COM = {
kdc = KDC-Server-hostname.your-central-realm.company.com:88
admin_server = KDC-Server-hostname.your-central-realm.company.com:749
default_domain = your-central-realm.company.com
}
[domain_realm]
.CLUSTER-REALM.company.com = CLUSTER-REALM.COMPANY.COM
CLUSTER-REALM.company.com = CLUSTER-REALM.COMPANY.COM
.your-central-realm.company.com = YOUR-CENTRAL-REALM.COMPANY.COM
your-central-realm.company.com = YOUR-CENTRAL-REALM.COMPANY.COM
3. To set up the cross-realm trust in the cluster-dedicated KDC, type the following command in the kadmin.local
or kadmin shell on the cluster-dedicated KDC host to create a krbtgt principal. Substitute your
cluster-dedicated KDC realm for CLUSTER-REALM.COMPANY.COM, and substitute your central KDC realm for
YOUR-CENTRAL-REALM.COMPANY.COM. Enter a password when prompted. Note the password because you
will need to enter the same exact password in the central KDC in the next step.
kadmin: addprinc krbtgt/CLUSTER-REALM.COMPANY.COM@YOUR-CENTRAL-REALM.COMPANY.COM
4. To set up the cross-realm trust in the central KDC, type the same command in the kadmin.local or kadmin
shell on the central KDC host to create the exact same krbtgt principal and password.
kadmin: addprinc krbtgt/CLUSTER-REALM.COMPANY.COM@YOUR-CENTRAL-REALM.COMPANY.COM
Important: In order for a cross-realm trust to operate properly, both KDCs must have the same
krbtgt principal and password, and both KDCs must be configured to use the same encryption
type.
5. To properly translate principal names from the central KDC realm into the cluster-dedicated KDC realm for
the Hadoop cluster, configure the Trusted Kerberos Realms property of the HDFS service.
a. Open the Cloudera Manager Admin Console.
b. Go to the HDFS service.
c. Select Configuration > View and Edit.
d. Expand the Service-Wide category and click Security.
e. Scroll down to the Trusted Kerberos Realms property, and click the Value field to add the name of your
   central KDC realm. If you need more advanced mappings that do more than allow principals
   from another domain, you can enter them in the Additional Rules to Map Kerberos Principals to Short
   Names property. For more information about name mapping rules, see (CDH 4) Appendix C - Configuring
   the Mapping from Kerberos Principals to Short Names or (CDH 5) Appendix C - Configuring the Mapping
   from Kerberos Principals to Short Names.
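For illustration, rules in the Additional Rules property use Hadoop's auth_to_local syntax. A hypothetical rule that maps any single-component principal from the central realm to its short name, falling back to the default mapping otherwise, looks like this:

RULE:[1:$1@$0](.*@YOUR-CENTRAL-REALM\.COMPANY\.COM)s/@.*//
DEFAULT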
6. Each of your Hadoop client users must also place this information in their local core-site.xml file. The
easiest way to do so is by using the Cloudera Manager Admin Console to generate a client configuration file.
7. Proceed to Step 3: If You are Using AES-256 Encryption, Install the JCE Policy File. Later in this
procedure, you will restart the services to have the configuration changes in core-site.xml take effect.
Using a Cluster-Dedicated MIT KDC with Active Directory
1. In the /var/kerberos/krb5kdc/kdc.conf file on the local dedicated KDC server host (for example,
kdc-server-hostname.cluster.corp.company.com), configure the default realm for the Hadoop cluster
by substituting your cluster-dedicated realm for CLUSTER-REALM.COMPANY.COM in the following realms property:
[realms]
CLUSTER-REALM.COMPANY.COM = {
2. In the /etc/krb5.conf file on all cluster hosts and all Hadoop client user hosts, configure both Kerberos
realms. Note that the default realm and the domain realm should be configured as the local MIT Kerberos
realm for the cluster. Your krb5.conf will contain more configuration properties than those provided below.
This example has been provided to clarify REALM/KDC configuration. Please see our example krb5.conf
file for more information.
[libdefaults]
default_realm = CLUSTER-REALM.COMPANY.COM
[realms]
AD-REALM.COMPANY.COM = {
kdc = ad.corp.company.com:88
admin_server = ad.corp.company.com:749
default_domain = ad.corp.company.com
}
CLUSTER-REALM.COMPANY.COM = {
kdc = kdc-server-hostname.cluster.corp.company.com:88
admin_server = kdc-server-hostname.cluster.corp.company.com:749
default_domain = cluster.corp.company.com
}
[domain_realm]
.cluster.corp.company.com = CLUSTER-REALM.COMPANY.COM
cluster.corp.company.com = CLUSTER-REALM.COMPANY.COM
.corp.company.com = AD-REALM.COMPANY.COM
corp.company.com = AD-REALM.COMPANY.COM
3. To properly translate principal names from the Active Directory realm into the cluster-dedicated KDC realm
for the Hadoop cluster, configure the Trusted Kerberos realms property of the HDFS service:
a. Open the Cloudera Manager Admin Console.
b. Go to the HDFS service.
c. Select Configuration > View and Edit.
d. Expand the Service-Wide category and click Security.
e. Scroll down to the Trusted Kerberos Realms property, and click the Value field to add the name of your
   Active Directory realm. If you need more advanced mappings that do more than allow principals
   from another domain, you can enter them in the Additional Rules to Map Kerberos Principals to Short
   Names property. For more information about name mapping rules, see (CDH 4) Appendix C - Configuring
   the Mapping from Kerberos Principals to Short Names or (CDH 5) Appendix C - Configuring the Mapping
   from Kerberos Principals to Short Names.
4. Each of your Hadoop client users must also place this information in their local core-site.xml file. The
easiest way to do so is by using the Cloudera Manager Admin Console to generate a client configuration file.
5. On the Active Directory server host, type the following command to specify the local MIT KDC host name (for
example, kdc-server-hostname.cluster.corp.company.com) and local realm (for example,
CLUSTER-REALM.COMPANY.COM):
ksetup /addkdc CLUSTER-REALM.COMPANY.COM
kdc-server-hostname.cluster.corp.company.com
Run this command on every domain controller that will be referenced by the cluster's krb5.conf file. If load
balancing is being used and a single KDC hostname has to be provided to all domain controllers, refer to the
Microsoft documentation instead of explicitly using the ksetup command on individual domain controllers.
6. On the Active Directory server host, type the following command to add the local realm trust to Active Directory:
netdom trust CLUSTER-REALM.COMPANY.COM /Domain:AD-REALM.COMPANY.COM /add /realm
/passwordt:TrustPassword
7. On the Active Directory server host, type the following command to set the proper encryption type:
Windows 2003 R2
Windows 2003 server installations do not support AES encryption for Kerberos. Therefore RC4 should be
used. Please see the Microsoft reference documentation for more information.
ktpass /MITRealmName CLUSTER-REALM.COMPANY.COM /TrustEncryp RC4
Windows 2008
Note: When using AES 256 encryption with Windows 2008 you must update the proper Java
Cryptography Extension (JCE) policy files for the version of JDK you are using.
• JCE Policy Files - JDK 1.6
• JCE Policy Files - JDK 1.7
ksetup /SetEncTypeAttr CLUSTER-REALM.COMPANY.COM <enc_type>
Where the <enc_type> parameter can be replaced with parameter strings for AES, DES, or RC4 encryption
modes. For example, for AES encryption, replace <enc_type> with AES256-CTS-HMAC-SHA1-96 or
AES128-CTS-HMAC-SHA1-96 and for RC4 encryption, replace with RC4-HMAC-MD5. See the Microsoft reference
documentation for more information.
Important: Make sure the encryption type you specify is supported on both your version of Windows
Active Directory and your version of MIT Kerberos.
8. On the local MIT KDC server host, type the following command in the kadmin.local or kadmin shell to add the
cross-realm krbtgt principal:
kadmin: addprinc -e "<enc_type_list>" krbtgt/CLUSTER-REALM.COMPANY.COM@AD-REALM.COMPANY.COM
where the <enc_type_list> parameter specifies the types of encryption this cross-realm krbtgt principal will
support: AES, DES, or RC4 encryption. You can specify multiple encryption types using the parameter
in the command above; what matters is that at least one of the encryption types corresponds to the
encryption type found in the tickets granted by the KDC in the remote realm.
Examples by Active Directory Domain or Forest "Functional level"
Active Directory will, based on the Domain or Forest functional level, use encryption types supported by that
release of the Windows Server operating system. It is not possible to use AES encryption types with an AD
2003 functional level. If you notice that DES encryption types are being used when authenticating or requesting
service tickets to Active Directory, it might be necessary to enable weak encryption types in
/etc/krb5.conf. See the example krb5.conf file for more information.
• Windows 2003
kadmin: addprinc -e "rc4-hmac:normal" krbtgt/CLUSTER-REALM.COMPANY.COM@AD-REALM.COMPANY.COM
• Windows 2008
kadmin: addprinc -e "aes256-cts:normal aes128-cts:normal rc4-hmac:normal" krbtgt/CLUSTER-REALM.COMPANY.COM@AD-REALM.COMPANY.COM
Note: The cross-realm krbtgt principal that you add in this step must have at least one entry
that uses the same encryption type as the tickets that are issued by the remote KDC. If no entries
have the same encryption type, then the problem you will see is that authenticating as a principal
in the local realm will allow you to successfully run Hadoop commands, but authenticating as a
principal in the remote realm will not allow you to run Hadoop commands.
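One way to confirm the trust end to end (assuming an illustrative user account in the Active Directory realm) is to authenticate as a remote-realm principal and run a simple Hadoop command:

$ kinit ad-user@AD-REALM.COMPANY.COM
$ hadoop fs -ls /

If the command fails with a GSS "no valid credentials" error for remote-realm principals only, revisit the encryption types on the cross-realm krbtgt principal.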
9. Proceed to Step 3: If You are Using AES-256 Encryption, Install the JCE Policy File. Later in this
procedure, you will restart the services to have the configuration changes in core-site.xml take effect.
Step 3: If You are Using AES-256 Encryption, Install the JCE Policy File
If you are using CentOS or Red Hat Enterprise Linux 5.5 or later, which use AES-256 encryption by default for
tickets, you must install the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy File on all
cluster and Hadoop user hosts. For JCE Policy File installation instructions, see the README.txt file included in
the jce_policy-x.zip file.
Alternatively, you can configure Kerberos to not use AES-256 by removing aes256-cts:normal from the
supported_enctypes field of the kdc.conf or krb5.conf file. Note that after changing the kdc.conf file, you
must restart both the KDC and the kadmin server for those changes to take effect. You may also need to
recreate or change the password of the relevant principals, including potentially the Ticket Granting Ticket
principal (krbtgt/YOUR-LOCAL-REALM.COM@YOUR-LOCAL-REALM.COM). If AES-256 is still used after all of those steps,
it is because the aes256-cts:normal setting existed when the Kerberos database was created. To fix this, create a new
Kerberos database and then restart both the KDC and the kadmin server.
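For example, a minimal sketch of a kdc.conf realm entry with aes256-cts:normal removed might look like the following (the remaining encryption types are illustrative; keep whichever types your site actually requires):

[realms]
 YOUR-LOCAL-REALM.COM = {
  supported_enctypes = aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal
 }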
To verify the type of encryption used in your cluster:
1. On the local KDC host, type this command in the kadmin.local or kadmin shell to create a test principal:
kadmin:
addprinc test
2. On a cluster host, type this command to start a Kerberos session as test:
$ kinit test
3. On a cluster host, type this command to view the encryption type in use:
$ klist -e
If AES is being used, output like the following is displayed after you type the klist command (note that
AES-256 is included in the output):
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: test@YOUR-LOCAL-REALM.COM

Valid starting     Expires            Service principal
05/19/11 13:25:04  05/20/11 13:25:04  krbtgt/YOUR-LOCAL-REALM.COM@YOUR-LOCAL-REALM.COM
    Etype (skey, tkt): AES-256 CTS mode with 96-bit SHA-1 HMAC, AES-256 CTS mode
    with 96-bit SHA-1 HMAC
Step 4: Get or Create a Kerberos Principal and Keytab File for the Cloudera
Manager Server
In order to create and deploy the host principals and keytabs on your cluster, the Cloudera Manager Server must
have the correct Kerberos principal and keytab file. Specifically, the Cloudera Manager Server must have a
Kerberos principal that has administrator privileges. Typically, principals with a second component of admin
in the principal name (for example, username/admin@YOUR-LOCAL-REALM.COM) have administrator privileges.
This is why admin is shown in the following instructions and examples.
To get or create the Kerberos principal and keytab file for the Cloudera Manager Server, you can do either of the
following:
• Ask your Kerberos administrator to create a Kerberos administrator principal and keytab file for the Cloudera
Manager Server. After you get the Cloudera Manager Server keytab file from your administrator, proceed to
Step 5: Deploying the Cloudera Manager Server Keytab.
• Create the Kerberos principal and keytab file for the Cloudera Manager Server yourself by using the following
instructions in this step.
The instructions in this section illustrate an example of creating the Cloudera Manager Server principal and
keytab file for MIT Kerberos. (If you are using another version of Kerberos, refer to your Kerberos documentation
for instructions.)
Note: If you are running kadmin and the Kerberos Key Distribution Center (KDC) on the same host,
use kadmin.local in the following steps. If the Kerberos KDC is running on a remote host, you must
use kadmin instead of kadmin.local.
To create the Cloudera Manager Server principal and keytab:
1. In the kadmin.local or kadmin shell, type the following command to create the Cloudera Manager Server
principal, replacing YOUR-LOCAL-REALM.COM with the name of your realm:
kadmin: addprinc -randkey cloudera-scm/admin@YOUR-LOCAL-REALM.COM
2. Create the Cloudera Manager Server cmf.keytab file:
kadmin: xst -k cmf.keytab cloudera-scm/admin@YOUR-LOCAL-REALM.COM
Important: The Cloudera Manager Server keytab file must be named cmf.keytab because that
name is hard-coded in Cloudera Manager.
Step 5: Deploying the Cloudera Manager Server Keytab
After obtaining or creating the Cloudera Manager Server principal and keytab, follow these instructions to deploy
them.
1. Move the cmf.keytab file to the /etc/cloudera-scm-server/ directory on the host where you are running
the Cloudera Manager Server.
$ mv cmf.keytab /etc/cloudera-scm-server/
2. Make sure that the cmf.keytab file is only readable by the Cloudera Manager Server user account
cloudera-scm.
$ sudo chown cloudera-scm:cloudera-scm /etc/cloudera-scm-server/cmf.keytab
$ sudo chmod 600 /etc/cloudera-scm-server/cmf.keytab
3. Add the Cloudera Manager Server principal (cloudera-scm/admin@YOUR-LOCAL-REALM.COM) to a text file
named cmf.principal and store the cmf.principal file in the /etc/cloudera-scm-server/ directory
on the host where you are running the Cloudera Manager Server.
4. Make sure that the cmf.principal file is only readable by the Cloudera Manager Server user account
cloudera-scm.
$ sudo chown cloudera-scm:cloudera-scm /etc/cloudera-scm-server/cmf.principal
$ sudo chmod 600 /etc/cloudera-scm-server/cmf.principal
Note: The Cloudera Manager Server assumes that the kadmin command is located in a directory
either on the path or in the /usr/kerberos/sbin directory. If this is not the case, you must create
a symbolic link to kadmin from a location accessible on the path.
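For example, if kadmin is installed in a different location (the source path below is an assumption; check where your distribution actually installs kadmin), you could create the link with:

$ sudo ln -s /usr/sbin/kadmin /usr/local/bin/kadmin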
Step 6: Configure the Kerberos Default Realm in the Cloudera Manager
Admin Console
Important: Hadoop is unable to use a non-default realm. The Kerberos default realm is configured
in the libdefaults property in the /etc/krb5.conf file on every host in the cluster:
[libdefaults]
default_realm = YOUR-LOCAL-REALM.COM
1. In the Cloudera Manager Admin Console, select Administration > Settings.
2. Click the Security category, and enter the Kerberos realm for the cluster in the Kerberos Security Realm field
(for example, YOUR-LOCAL-REALM.COM or YOUR-SUB-REALM.YOUR-LOCAL-REALM.COM) that you configured
in the krb5.conf file.
3. Click Save Changes.
Step 7: Stop All Services
Before you enable security in CDH, you must stop all Hadoop daemons in your cluster and then change some
configuration properties. You must stop all daemons in the cluster because, after one Hadoop daemon has been
restarted with the configuration properties set to enable security, daemons running without security enabled
will be unable to communicate with that daemon. This requirement to stop all daemons makes it impossible to
do a rolling upgrade to enable security on a Hadoop cluster.
Stop all running services, and the Cloudera Management service, as follows:
Stopping All Services
1. On the Home page, click the dropdown arrow to the right of the cluster name and select Stop.
2. Click Stop in the confirmation screen. The Command Details window shows the progress of stopping services.
When All services successfully stopped appears, the task is complete and you can close the Command Details
window.
Stopping the Cloudera Management Service
1. On the Home page, click the dropdown arrow to the right of mgmt and select Stop.
2. Click Stop to confirm. The Command Details window shows the progress of stopping the roles.
3. When Command completed with n/n successful subcommands appears, the task is complete. Click Close.
Step 8: Enable Hadoop Security
To enable Hadoop security for the cluster, you enable it on an HDFS service. After you do so, the Cloudera Manager
Server automatically enables Hadoop security on the MapReduce and YARN services associated with that HDFS
service.
1. Navigate to the HDFS Service > Configuration tab and click View and Edit.
2. In the Search field, type Hadoop Secure to show the Hadoop security properties (found under the Service-Wide
> Security category).
3. Click the value for the Hadoop Secure Authentication property and select the kerberos option to enable
Hadoop security on the selected HDFS service.
4. Click the value for the Hadoop Secure Authorization property and select the checkbox to enable service-level
authorization on the selected HDFS service. You can specify comma-separated lists of users and groups
authorized to use Hadoop services and/or perform admin operations using the following properties under
the Service-Wide > Security section:
• Authorized Users: Comma-separated list of users authorized to use Hadoop services.
• Authorized Groups: Comma-separated list of groups authorized to use Hadoop services.
• Authorized Admin Users: Comma-separated list of users authorized to perform admin operations on
Hadoop.
• Authorized Admin Groups: Comma-separated list of groups authorized to perform admin operations on
Hadoop.
Important: For Cloudera Manager's Monitoring services to work, the hue user should always be
added as an authorized user.
5. In the Search field, type DataNode Transceiver to find the DataNode Transceiver Port property.
6. Click the value for the DataNode Transceiver Port property and specify a privileged port number (below 1024).
Cloudera recommends 1004.
Note: If there is more than one DataNode Role Group, you must specify a privileged port number
for each DataNode Transceiver Port property.
7. In the Search field, type DataNode HTTP to find the DataNode HTTP Web UI Port property and specify a
privileged port number (below 1024). Cloudera recommends 1006.
Note: These port numbers for the two DataNode properties must be below 1024 in order to provide
part of the security mechanism to make it impossible for a user to run a MapReduce task that
impersonates a DataNode. The port numbers for the NameNode and Secondary NameNode can
be anything you want, but the default port numbers are good ones to use.
8. In the Search field type Data Directory Permissions to find the DataNode Data Directory Permissions property.
9. Reset the value for the DataNode Data Directory Permissions property to the default value of 700 if not
already set to that.
10. Make sure you have changed the DataNode Transceiver Port, DataNode Data Directory Permissions and
DataNode HTTP Web UI Port properties for every DataNode role group.
11. Click Save Changes to save the configuration settings.
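For reference, after these changes the DataNode settings correspond to properties like the following in hdfs-site.xml. This is a sketch using the recommended values; Cloudera Manager generates the actual configuration for you, so it is illustrative only:

<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:1004</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:1006</value>
</property>
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>700</value>
</property>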
To enable ZooKeeper security:
1. Navigate to the ZooKeeper Service > Configuration tab and click View and Edit.
2. Click the value for Enable Kerberos Authentication property.
3. Click Save Changes to save the configuration settings.
To enable HBase security:
1. Navigate to the HBase Service > Configuration tab and click View and Edit.
2. In the Search field, type HBase Secure to show the Hadoop security properties (found under the Service-Wide
> Security category).
3. Click the value for the HBase Secure Authorization property and select the checkbox to enable authorization
on the selected HBase service.
4. Click the value for the HBase Secure Authentication property and select kerberos to enable authorization
on the selected HBase service.
5. Click Save Changes to save the configuration settings.
(CDH 4.3 or later) To enable Solr security:
1. Navigate to the Solr Service > Configuration tab and click View and Edit.
2. In the Search field, type Solr Secure to show the Solr security properties (found under the Service-Wide >
Security category).
3. Click the value for the Solr Secure Authentication property and select kerberos to enable authorization on
the selected Solr service.
4. Click Save Changes to save the configuration settings.
Note: If you use the Cloudera Manager Admin Console to generate a client configuration file after
you enable Hadoop security on your cluster, the generated configuration file will not contain the
Kerberos principal and keytab file that end users need to authenticate. Users must obtain their Kerberos
principal and keytab file from your Kerberos administrator and then run the kinit command
themselves.
Step 9: Wait for the Generate Credentials Command to Finish
After you enable security for any of the services in Cloudera Manager, a command called Generate Credentials
will be triggered automatically. You can watch the progress of the command on the top right corner of the screen
that shows the running commands. Wait for this command to finish (indicated by a grey box containing "0" in
it).
Step 10: Enable Hue to Work with Hadoop Security using Cloudera Manager
If you are using a Hue service, you must add a role instance of Kerberos Ticket Renewer to the Hue service to
enable Hue to work properly with the secure Hadoop cluster using Cloudera Manager:
1. In the Cloudera Manager Admin Console, click the Clusters tab.
2. Click the Hue service.
3. Click the Instances tab.
4. Click the Add button.
5. Assign the Kerberos Ticket Renewer role instance to the same host as the Hue server.
6. When the wizard is finished, the status will display Finished and the Kerberos Ticket Renewer role instance
   is configured. The Hue service will now work with the secure Hadoop cluster.
Step 11: (Flume Only) Use Substitution Variables for the Kerberos Principal
and Keytab
As described in Flume Security Configuration in the CDH 4 Security Guide, if you are using Flume on a secure
cluster you must configure the HDFS sink with the following configuration options in the flume.conf file:
• hdfs.kerberosPrincipal - fully-qualified principal.
• hdfs.kerberosKeytab - location on the local host of the keytab containing the user and host keys for the
above principal
Here is an example of an HDFS sink configuration in the flume.conf file (the majority of the HDFS sink
configuration options have been omitted):
agent.sinks.sink-1.type = HDFS
agent.sinks.sink-1.hdfs.kerberosPrincipal = flume/fully.qualified.domain.name@YOUR-REALM.COM
agent.sinks.sink-1.hdfs.kerberosKeytab = /etc/flume-ng/conf/flume.keytab
agent.sinks.sink-1.hdfs.proxyUser = weblogs
Since Cloudera Manager generates the Flume keytab files for you, and the locations of the keytab files cannot
be known beforehand, substitution variables are required for Flume. Cloudera Manager provides two Flume
substitution variables called $KERBEROS_PRINCIPAL and $KERBEROS_KEYTAB to configure the principal name
and the keytab file path respectively on each host.
Here is an example of using the substitution variables to configure the options shown in the previous example:
agent.sinks.sink-1.type = hdfs
agent.sinks.sink-1.hdfs.kerberosPrincipal = $KERBEROS_PRINCIPAL
agent.sinks.sink-1.hdfs.kerberosKeytab = $KERBEROS_KEYTAB
agent.sinks.sink-1.hdfs.proxyUser = weblogs
Use the following instructions to have Cloudera Manager add these variables to the flume.conf file on every
host that Cloudera Manager manages.
To use the Flume substitution variables for the Kerberos principal and keytab:
1. Go to the Flume service > Configuration page in Cloudera Manager.
2. Click Agent.
3. In the Configuration File property, add the configuration options with the substitution variables. For example:
agent.sinks.sink-1.type = hdfs
agent.sinks.sink-1.hdfs.kerberosPrincipal = $KERBEROS_PRINCIPAL
agent.sinks.sink-1.hdfs.kerberosKeytab = $KERBEROS_KEYTAB
agent.sinks.sink-1.hdfs.proxyUser = weblogs
4. Click Save.
Step 12: (CDH 4.0 and 4.1 only) Configure Hue to Use a Local Hive Metastore
If using Hue and the Bypass Hive Metastore Server option is not selected (metastore bypass is disabled by
default), then Hue will not be able to communicate with Hive with CDH 4.0 or CDH 4.1. This is not a problem with
CDH 4.2 or later.
If you are using CDH 4.0 or 4.1, you can work around this issue by following the instructions in the Known Issues
section of the Cloudera Manager 4 Release Notes.
Step 13: Start All Services
Start all services on your cluster:
Starting All Services
1. On the Home page, click the dropdown arrow to the right of the cluster name and select Start.
2. Click Start in the next screen to confirm. The Command Details window shows the progress of
starting services.
When All services successfully started appears, the task is complete and you can close the Command Details
window.
Starting the Cloudera Management Service
1. On the Home page, click the dropdown arrow to the right of mgmt and select Start.
2. Click Start to confirm. The Command Details window shows the progress of starting the roles.
3. When Command completed with n/n successful subcommands appears, the task is complete. Click Close.
Step 14: Deploy Client Configurations
1. On the Home page, click the dropdown arrow to the right of the cluster name and select Deploy Client Configuration.
2. Click Deploy Client Configuration.
Step 15: Create the HDFS Superuser Principal
In order to be able to create home directories for users in Step 17: Prepare the Cluster for Each User, you will
need access to the HDFS superuser account. (CDH automatically created the HDFS superuser account on each
cluster host during CDH installation.) When you enabled Kerberos for the HDFS service in Step 8: Enable Hadoop
Security, you lost access to the HDFS superuser account via sudo -u hdfs commands. To regain
access to the HDFS superuser account now that Kerberos is enabled, you must create a Kerberos principal
whose first component is hdfs:
1. In the kadmin.local or kadmin shell, type the following command to create a Kerberos principal called hdfs:
kadmin: addprinc hdfs@YOUR-LOCAL-REALM.COM
Note: This command prompts you to create a password for the hdfs principal. You should use a
strong password because having access to this principal provides superuser access to all of the
files in HDFS.
2. To run commands as the HDFS superuser, you must obtain Kerberos credentials for the hdfs principal. To
do so, run the following command and provide the appropriate password when prompted.
$ kinit hdfs@YOUR-LOCAL-REALM.COM
Step 16: Get or Create a Kerberos Principal or Keytab for Each User Account
Now that Kerberos is configured and enabled on your cluster, you and every other Hadoop user must have a
Kerberos principal or keytab to obtain Kerberos credentials to be allowed to access the cluster and use the
Hadoop services. In the next step of this procedure, you will need to create your own Kerberos principals in order
to verify that Kerberos security is working on your cluster. If you and the other Hadoop users already have a
Kerberos principal or keytab, or if your Kerberos administrator can provide them, you can skip to Step 16: Prepare
the Cluster for Each User.
The following instructions explain to create a Kerberos principal for a user account.
To create a Kerberos principal for a user account:
1. In the kadmin.local or kadmin shell, use the following command to create a principal for your account by
replacing YOUR-LOCAL-REALM.COM with the name of your realm, and replacing USERNAME with a username:
kadmin: addprinc USERNAME@YOUR-LOCAL-REALM.COM
2. When prompted, enter a password twice.
Step 17: Prepare the Cluster for Each User
Before you and other users can access the cluster, there are a few tasks you must do to prepare the hosts for
each user.
1. Make sure all hosts in the cluster have a Unix user account with the same name as the first component of
that user's principal name. For example, the Unix account joe should exist on every host if the user's principal
name is joe@YOUR-REALM.COM. You can use LDAP for this step if it is available in your organization.
Note: Each account must have a user ID that is greater than or equal to 1000. In the
/etc/hadoop/conf/taskcontroller.cfg file, the default setting for the banned.users property
is mapred, hdfs, and bin, to prevent jobs from being submitted via those user accounts. The default
setting for the min.user.id property is 1000, to prevent jobs from being submitted with a user
ID less than 1000, because those IDs are conventionally reserved for system accounts.
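To check whether an existing account meets this requirement on a given host (joe is an illustrative username), verify that the printed user ID is 1000 or greater:

$ id -u joe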
2. Create a subdirectory under /user on HDFS for each user account (for example, /user/joe). Change the
owner and group of that directory to be the user.
$ hadoop fs -mkdir /user/joe
$ hadoop fs -chown joe /user/joe
Note: sudo -u hdfs is not included in the commands above because it is not required if
Kerberos is enabled on your cluster. You will, however, need Kerberos credentials for the HDFS
superuser in order to successfully run these commands. For information on gaining access to the
HDFS superuser account, see Step 15: Create the HDFS Superuser Principal.
Step 18: Verify that Kerberos Security is Working
After you have Kerberos credentials, you can verify that Kerberos security is working on your cluster by trying
to run MapReduce jobs. To confirm, try launching a sleep or a pi job from the provided Hadoop examples
(/usr/lib/hadoop/hadoop-examples.jar).
Note:
This section assumes you have a fully-functional CDH cluster and you have been able to access HDFS
and run MapReduce jobs before you followed these instructions to configure and enable Kerberos
on your cluster. If you have not already done so, you should at a minimum use the Cloudera Manager
Admin Console to generate a client configuration file to enable you to access the cluster. For
instructions, see Deploying Client Configuration Files.
To verify that Kerberos security is working:
1. Acquire Kerberos credentials for your user account.
$ kinit USERNAME@YOUR-LOCAL-REALM.COM
2. Enter a password when prompted.
3. Submit a sample pi calculation as a test MapReduce job. Use the following command if you use a
package-based setup for Cloudera Manager:
$ hadoop jar /usr/lib/hadoop-0.20/hadoop-0.20.2*examples.jar pi 10 10000
Number of Maps = 10
Samples per Map = 10000
...
Job Finished in 38.572 seconds
Estimated value of Pi is 3.14120000000000000000
If you have a parcel-based setup, use the following command instead:
$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar
pi 10 10000
Number of Maps = 10
Samples per Map = 10000
...
Job Finished in 30.958 seconds
Estimated value of Pi is 3.14120000000000000000
You have now verified that Kerberos security is working on your cluster.
Important:
Running a MapReduce job will fail if you do not have a valid Kerberos ticket in your credentials cache.
You can examine the Kerberos tickets currently in your credentials cache by running the klist
command. You can obtain a ticket by running the kinit command and either specifying a keytab file
containing credentials, or entering the password for your principal. If you do not have a valid ticket,
you will receive an error such as:
11/01/04 12:08:12 WARN ipc.Client:
Exception encountered while connecting to the server :
javax.security.sasl.SaslException:GSS initiate failed
[Caused by GSSException: No valid credentials provided (Mechanism level: Failed
to find any
Kerberos tgt)]
Bad connection to FS. command aborted. exception: Call to nn-host/10.0.0.2:8020
failed on local exception:
java.io.IOException:javax.security.sasl.SaslException: GSS initiate failed
[Caused by GSSException: No valid credentials provided
(Mechanism level: Failed to find any Kerberos tgt)]
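For example, to inspect your credentials cache and then obtain a ticket, either interactively or from a keytab (the keytab file name below is hypothetical):

$ klist
$ kinit USERNAME@YOUR-LOCAL-REALM.COM
$ kinit -kt USERNAME.keytab USERNAME@YOUR-LOCAL-REALM.COM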
Step 19: (Optional) Enable Authentication for HTTP Web Consoles for
Hadoop Roles
Authentication for access to the HDFS, MapReduce, and YARN roles' web consoles can be enabled via a
configuration option for the appropriate service. To enable this authentication:
1. From the Clusters tab, select the service (HDFS, MapReduce, or YARN) for which you want to enable
authentication.
2. Select Configuration > View and Edit.
3. Expand Service-Wide > Security, check the Enable Authentication for HTTP Web-Consoles property, and save
your changes.
A command is triggered to generate the new required credentials.
4. Once the command finishes, restart all roles of that service.
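To spot-check that authentication is now required, you can request a console page using SPNEGO after obtaining a ticket; the NameNode hostname below is illustrative, and 50070 is the default NameNode web UI port:

$ kinit USERNAME@YOUR-LOCAL-REALM.COM
$ curl --negotiate -u : http://namenode-host.example.com:50070/

Without a valid Kerberos ticket, the same request should be rejected with an HTTP 401 response.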
Hadoop Users in Cloudera Manager
A number of special users are created by default when you install and use CDH and Cloudera Manager. The list
below shows the users and groups as of the latest Cloudera Manager 5.0.x release, along with the corresponding
Kerberos principals and keytab files that should be created when you configure Kerberos security on your cluster.
Table 1: Cloudera Manager Users & Groups

Cloudera Manager: Unix user ID cloudera-scm, group cloudera-scm. Cloudera Manager processes such as the CM Server and the monitoring daemons run as this user; it is not configurable. The Cloudera Manager keytab file must be named cmf.keytab, since that name is hard-coded in Cloudera Manager. (Note: applicable only to clusters managed by Cloudera Manager.)

Apache Avro: No special users.

Apache Flume: Unix user ID flume, group flume. The sink that writes to HDFS as this user must have write privileges.

Apache HBase: Unix user ID hbase, group hbase. The Master and the RegionServer processes run as this user.

HDFS: Unix user ID hdfs, group hdfs (group members also include impala). The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it. The hdfs user is also part of the hadoop group.

Apache Hive: Unix user ID hive, group hive (group members also include impala). The HiveServer2 process and the Hive Metastore processes run as this user. A user must be defined for Hive access to its Metastore DB (for example, MySQL or Postgres), but it can be any identifier and does not correspond to a Unix uid. This is javax.jdo.option.ConnectionUserName in hive-site.xml.

Apache HCatalog: Unix user ID hive, group hive. The WebHCat service (for REST access to Hive functionality) runs as the hive user. It is not configurable.

HttpFS: Unix user ID httpfs, group httpfs. The HttpFS service runs as this user. See HttpFS Security Configuration for instructions on how to generate the merged httpfs-http.keytab file.

Hue: Unix user ID hue, group hue. Hue runs as this user. It is not configurable.

Cloudera Impala: Unix user ID impala, group impala. An interactive query tool. The impala user also belongs to the hive and hdfs groups.

Llama: Unix user ID llama, group llama. Llama runs as this user.

Apache Mahout: No special users.

MapReduce: Unix user ID mapred, group mapred. Without Kerberos, the JobTracker and tasks run as this user. The LinuxTaskController binary is owned by this user for Kerberos; it would be complicated to use a different user ID.

Apache Oozie: Unix user ID oozie, group oozie. The Oozie service runs as this user.

Parquet: No special users.

Apache Pig: No special users.

Cloudera Search: Unix user ID solr, group solr. The Solr process runs as this user. It is not configurable.

Apache Spark: Unix user ID spark, group spark. The Spark process runs as this user. It is not configurable.

Apache Sentry (incubating): No special users.

Apache Sqoop: Unix user ID sqoop, group sqoop. This user is only for the Sqoop1 Metastore, a configuration option that is not recommended.

Apache Sqoop2: Unix user ID sqoop2, group sqoop. The Sqoop2 service runs as this user.

Apache Whirr: No special users.

YARN: Unix user ID yarn, group yarn. Without Kerberos, all YARN services and applications run as this user. The LinuxContainerExecutor binary is owned by this user for Kerberos; it would be complicated to use a different user ID. The yarn user also belongs to the hadoop group.

Apache ZooKeeper: Unix user ID zookeeper, group zookeeper. The ZooKeeper process runs as this user. It is not configurable.

Other: group hadoop (group members: yarn, hdfs, mapred). This is a group with no associated Unix user ID or keytab.
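To spot-check that these accounts exist on a cluster host, you can query the local user and group databases. The user and group names below come from the table above:

$ id hdfs              # shows the hdfs user's uid, gid, and supplementary groups (should include hadoop)
$ getent group hadoop  # lists the members of the hadoop group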
Note:
The Kerberos principal names should be of the format username/[email protected], where
username refers to the username of an existing UNIX account, such as hdfs or mapred. Table 2 below lists
the usernames to be used for the Kerberos principal names. For example, the Kerberos principal for Apache
Flume would be flume/[email protected].
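For example, to confirm that principals with a given primary were created in your KDC, you can query it from the kadmin shell; the primary and pattern here are illustrative:

$ kadmin.local -q "listprincs flume/*"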
Table 2: Cloudera Manager Keytabs & Keytab File Permissions

Unless noted otherwise, every keytab file below is owned by user cloudera-scm and group cloudera-scm, with octal file permission 600.

Cloudera Manager (cloudera-scm): service NA; Kerberos principal primary cloudera-scm; keytab filename cmf.keytab.

Cloudera Management Service (cloudera-scm): services cloudera-mgmt-REPORTSMANAGER, cloudera-mgmt-ACTIVITYMONITOR, cloudera-mgmt-SERVICEMONITOR, cloudera-mgmt-HOSTMONITOR; Kerberos principal primary hdfs; keytab filename hdfs.keytab.

Flume (flume): service flume-AGENT; Kerberos principal primary flume; keytab filename flume.keytab.

HBase (hbase): services hbase-MASTER, hbase-REGIONSERVER, hbase-HBASETHRIFTSERVER, hbase-HBASERESTSERVER; Kerberos principal primary hbase; keytab filename hbase.keytab.

HDFS (hdfs): services hdfs-NAMENODE, hdfs-DATANODE, hdfs-SECONDARYNAMENODE; Kerberos principal primary hdfs (secondary: merge hdfs and HTTP); keytab filename hdfs.keytab.

Hive (hive): services hive-HIVESERVER2 and hive-HIVEMETASTORE; Kerberos principal primary hive; keytab filename hive.keytab. For the hive-WEBHCAT service, the Kerberos principal primary is HTTP and the keytab filename is HTTP.keytab.

HttpFS (httpfs): service hdfs-HTTPFS; Kerberos principal primary httpfs; keytab filename httpfs.keytab.

Hue (hue): service hue-KT_RENEWER; Kerberos principal primary hue; keytab filename hue.keytab.

Impala (impala): services impala-IMPALAD, impala-STATESTORE, impala-CATALOGSERVER; Kerberos principal primary impala; keytab filename impala.keytab.

Llama (llama): no keytab entries listed.

MapReduce (mapred): services mapreduce-JOBTRACKER and mapreduce-TASKTRACKER; Kerberos principal primary mapred (secondary: merge mapred and HTTP); keytab filename mapred.keytab.

Oozie (oozie): service oozie-OOZIE_SERVER; Kerberos principal primary oozie (secondary: merge oozie and HTTP); keytab filename oozie.keytab.

Search (solr): service solr-SOLR_SERVER; Kerberos principal primary solr (secondary: merge solr and HTTP); keytab filename solr.keytab.

Sentry (sentry): no keytab entries listed.

Spark (spark): service spark_on_yarn-SPARK_YARN_HISTORY_SERVER; Kerberos principal primary spark; keytab filename spark.keytab.

Sqoop (sqoop): no keytab entries listed.

Sqoop2 (sqoop2): no keytab entries listed.

YARN (yarn): service yarn-NODEMANAGER; Kerberos principal primary yarn (secondary: merge yarn and HTTP); keytab filename yarn.keytab; file permission 644. Services yarn-RESOURCEMANAGER and yarn-JOBHISTORY use the same principal and keytab filename, with file permission 600.

ZooKeeper (zookeeper): service zookeeper-server; Kerberos principal primary zookeeper; keytab filename zookeeper.keytab.
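To inspect the principals stored in a generated keytab on a managed host, you can list its entries with klist; the process directory name below is a placeholder, since Cloudera Manager creates a numbered directory per role:

$ sudo klist -kt /var/run/cloudera-scm-agent/process/<process-dir>/hdfs.keytab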
Viewing and Regenerating Kerberos Principals
As soon as you enable Hadoop secure authentication for HDFS and MapReduce service instances, Cloudera
Manager starts creating the Kerberos principals for each of the role instances. The amount of time this process
will take depends on the number of hosts and HDFS and MapReduce role instances on your cluster. The process
can take from a few seconds for a small cluster to several minutes for a larger cluster. After the process is
completed, you can use the Cloudera Manager Admin Console to view the list of Kerberos principals that Cloudera
Manager has created for the cluster. Make sure there are principals for each of the hosts and HDFS and
MapReduce role instances on your cluster. If there are no principals after 10 minutes, then there is most likely
a problem with the principal creation. See Troubleshooting Security Issues on page 37 for more information. If
necessary, you can use Cloudera Manager to regenerate the principals.
If you make a global configuration change in your cluster, such as changing the encryption type, you must use
the following instructions to regenerate the principals for your cluster.
Important:
• Regenerate principals using the following steps in the Cloudera Manager Admin Console, not directly
using the kadmin shell.
• Do not regenerate the principals for your cluster unless you have made a global configuration
change. Before regenerating, be sure to read Step 2: Set up a Cluster-dedicated KDC and Default
Domain for the Hadoop Cluster on page 12 to avoid invalidating your existing host keytabs.
To view and regenerate the Kerberos principals for your cluster:
1. Select Administration > Kerberos.
2. The currently configured Kerberos principals are displayed. If you are running HDFS, the hdfs/hostname and
host/hostname principals are listed. If you are running MapReduce, the mapred/hostname and
host/hostname principals are listed. The principals for other running services are also listed.
3. Only if necessary, select the principals you want to regenerate.
4. Click Regenerate.
Configuring LDAP Group Mappings
To set up LDAP (Active Directory) group mappings for Hadoop, make the following changes to the HDFS service's
security configuration:
1. Open the Cloudera Manager Admin Console and navigate to the HDFS service.
2. Go to Configuration > View and Edit.
3. Modify the following configuration properties under the Service-Wide > Security section, setting each
property to the value shown:

Hadoop User Group Mapping Implementation: org.apache.hadoop.security.LdapGroupsMapping
Hadoop User Group Mapping LDAP URL: ldap://<server>
Hadoop User Group Mapping LDAP Bind User: [email protected]
Hadoop User Group Mapping LDAP Bind User Password: ***
Hadoop User Group Mapping Search Base: dc=example-ad,dc=local
Although the above changes are sufficient to configure group mappings for Active Directory, some changes to
the remaining default configurations might be required for OpenLDAP.
Important: Ensure all your services are registered users in LDAP.
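After the HDFS service is restarted with these settings, one way to spot-check the mapping is to ask Hadoop which groups it resolves for a given user; the username below is a placeholder:

$ hdfs groups alice    # prints the groups resolved for user alice via the configured LDAP mapping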
Security-Related Issues in Cloudera Manager
The following is a known issue in Cloudera Manager:
• Cloudera Manager is unable to use a non-default realm. You must specify the default realm.
Troubleshooting Security Issues
Typically, if Kerberos security is not working on your cluster, Hadoop will display generic messages about the
cause of the problem. If you have problems, try these troubleshooting suggestions:
• To make sure that the Cloudera Manager Server created the host and hdfs principals, run the following
command in the kadmin.local or kadmin shell:

kadmin: listprincs

• Verify that the keytab files exist in the /var/run/cloudera-scm-agent/process directory on the Cloudera
Manager Agent hosts and are not 0 bytes (see the example below).
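For instance, a quick check for missing or empty keytab files on an Agent host might look like the following; run it as root, because the process directories are protected:

$ sudo find /var/run/cloudera-scm-agent/process -name "*.keytab" -ls      # list all generated keytabs
$ sudo find /var/run/cloudera-scm-agent/process -name "*.keytab" -size 0  # flag any that are 0 bytes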
The following list contains solutions to some common Kerberos problems. You can also check the Server or
Agent logs for any errors associated with keytab generation or for more information about the problems.

Problem: After you enable Hadoop Secure Authentication in HDFS and MapReduce service instances, no
principals are generated in the Kerberos tab after about 20 seconds.
Possible cause: There is a problem with credential resolution.
Solution: Check the Cloudera Manager Server log file
(/var/log/cloudera-scm-server/cloudera-scm-server.log) on the Server host to help you debug the
problem. The log file may show why the Cloudera Manager Server cannot generate the principals using the gen
or merge scripts.

Problem: Services are not started.
Possible cause: There is a problem with credential usage in the cluster.
Solution: If you are using AES-256 encryption for tickets, you must install the "Java Cryptography Extension
(JCE) Unlimited Strength Jurisdiction Policy File". For more information about this issue, see Appendix A -
Troubleshooting in CDH 4 or Troubleshooting in CDH 5.

Problem: No principals are generated by Cloudera Manager, and the server log contains the following message:

kadmin: GSS-API (or Kerberos) error while initializing kadmin interface

Possible cause: Because of a bug in Cloudera Manager, Cloudera Manager is unable to use a non-default realm;
you must specify the Kerberos default realm on the Cloudera Manager Administration > Settings page.
Solution: See Step 6: Configure the Kerberos Default Realm in the Cloudera Manager Admin Console on page 19.
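When debugging principal generation, it can also help to watch the Server log while you trigger the operation; this filter pattern is only a suggestion:

$ sudo tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log | grep -i -E "kerberos|keytab|kadmin"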