Monday, July 18, 2016

Hadoop cluster prerequisite setup in Linux server part3

Networking and Security Requirements

The hosts in a Cloudera Manager deployment must satisfy the following networking and security requirements:

Cluster hosts must have a working network name resolution system and a correctly formatted /etc/hosts file. All cluster hosts must have properly configured forward and reverse host resolution through DNS. The /etc/hosts files must:

  • Contain consistent information about hostnames and IP addresses across all hosts
  • Not contain uppercase hostnames
  • Not contain duplicate IP addresses

Also, do not use aliases, either in /etc/hosts or in configuring DNS. A properly formatted /etc/hosts file should be similar to the following example:

127.0.0.1 localhost.localdomain localhost
192.168.2.1 bdcluster-01.linuxtipss.blogspot.mx bdcluster-01
192.168.2.2 bdcluster-02.linuxtipss.blogspot.mx bdcluster-02
192.168.2.3 bdcluster-03.linuxtipss.blogspot.mx bdcluster-03
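Before running the installer it can help to lint each host's /etc/hosts against the rules above. The Python sketch below is only an illustration: the function names are hypothetical, it checks uppercase hostnames and duplicate IP addresses, and it does a simple forward-then-reverse DNS lookup for the local FQDN. It does not check the "no aliases" rule.

```python
import ipaddress
import socket
from collections import defaultdict


def lint_hosts(text):
    """Return a list of problems found in /etc/hosts-style text.

    Hypothetical helper: flags uppercase hostnames and IP addresses that
    appear on more than one line.
    """
    problems = []
    seen = defaultdict(list)  # IP address -> line numbers where it appears
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        try:
            ipaddress.ip_address(ip)
        except ValueError:
            problems.append(f"line {lineno}: {ip!r} is not an IP address")
            continue
        seen[ip].append(lineno)
        for name in names:
            if name != name.lower():
                problems.append(f"line {lineno}: uppercase hostname {name!r}")
    for ip, lines in seen.items():
        if len(lines) > 1:
            problems.append(f"duplicate IP {ip} on lines {lines}")
    return problems


def check_resolution(fqdn):
    """Forward-resolve fqdn, reverse-resolve the address, compare the names."""
    ip = socket.gethostbyname(fqdn)
    reverse_name, _aliases, _addrs = socket.gethostbyaddr(ip)
    return ip, reverse_name, reverse_name.lower() == fqdn.lower()


if __name__ == "__main__":
    with open("/etc/hosts") as f:
        for problem in lint_hosts(f.read()):
            print(problem)
    try:
        print(check_resolution(socket.getfqdn()))
    except OSError as exc:  # gaierror / herror: name does not resolve
        print("DNS lookup failed:", exc)
```

Run it on every host. An empty problem list and a `True` in the resolution tuple are what you want to see.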

Password-less SSH Setup

In most cases, the Cloudera Manager Server must have SSH access to the cluster hosts when you run the installation or upgrade wizard. You must log in using a root account or an account that has password-less sudo permission. For authentication during the installation and upgrade procedures, you must either enter the password or upload a public and private key pair for the root or sudo user account. If you want to use a public and private key pair, the public key must be installed on the cluster hosts before you use Cloudera Manager.

Cloudera Manager uses SSH only during the initial install or upgrade. Once the cluster is set up, you can disable root SSH access or change the root password. Cloudera Manager does not save SSH credentials, and all credential information is discarded when the installation is complete. For more information, see Permission Requirements for Package-based Installations and Upgrades of CDH.
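If you use a key pair, you can confirm beforehand that the key-based login and the password-less sudo both work. Below is a small Python sketch, with hypothetical function names, that wraps the standard `ssh` client. `BatchMode=yes` makes ssh fail instead of prompting for a password, and `sudo -n` fails instead of prompting, so a zero exit code means no password was needed at either step.

```python
import subprocess


def ssh_check_command(user, host, timeout=5):
    """Build an ssh command that errors out instead of asking for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",              # never prompt for password/passphrase
        "-o", f"ConnectTimeout={timeout}",  # seconds to wait for the TCP connect
        f"{user}@{host}",
        "sudo", "-n", "true",               # -n: fail if sudo would need a password
    ]


def passwordless_ok(user, host):
    """True if user@host accepts key login and allows password-less sudo."""
    result = subprocess.run(ssh_check_command(user, host),
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

For example, `passwordless_ok("root", "bdcluster-01")` run from the Cloudera Manager Server host. The hostnames come from the /etc/hosts example above.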

SELinux

Security-Enhanced Linux (SELinux) must not block any cluster operations.

Important: Cloudera Enterprise is supported on platforms with Security-Enhanced Linux (SELinux) enabled. However, policies need to be provided by other parties or created by the administrator of the cluster deployment. Cloudera is not responsible for policy support, policy enforcement, or any issues arising from them. If you experience issues with SELinux, contact your OS support provider.

IPv6 must be disabled.
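A quick way to see where a host stands on both points is to read the SELinux config file and the kernel's IPv6 switch. This Python sketch (hypothetical helper names) reports the `SELINUX=` mode from /etc/selinux/config and whether `/proc/sys/net/ipv6/conf/all/disable_ipv6` is 1. Treating a missing sysctl as "IPv6 disabled" is an assumption (it usually means the IPv6 module is not loaded). Whether an `enforcing` mode is acceptable depends on the policies described in the note above.

```python
def selinux_mode(config_text):
    """Return the SELINUX= value from /etc/selinux/config text, or None."""
    for line in config_text.splitlines():
        line = line.strip()
        # "SELINUXTYPE=" does not match because of the "=" right after SELINUX
        if line.startswith("SELINUX="):
            return line.split("=", 1)[1].strip()
    return None


def ipv6_disabled(path="/proc/sys/net/ipv6/conf/all/disable_ipv6"):
    """True if the kernel reports IPv6 disabled (value 1).

    Assumption: a missing file means IPv6 is not loaded, so it counts as disabled.
    """
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except FileNotFoundError:
        return True


if __name__ == "__main__":
    try:
        with open("/etc/selinux/config") as f:
            print("SELinux mode:", selinux_mode(f.read()))
    except FileNotFoundError:
        print("SELinux config not found")
    print("IPv6 disabled:", ipv6_disabled())
```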

Iptables

iptables and other firewalls must not block traffic between cluster hosts. Port 7180 must be open, because it is used to access Cloudera Manager after installation. Cloudera Manager also communicates on other specific ports, which must be open as well.
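To confirm a port is reachable through any firewall in the path, try a plain TCP connect to it. The sketch below is a minimal Python version with a hypothetical helper name. A successful connect only shows that something is listening and not filtered. It does not prove the service is healthy.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolvable
        return False
```

For example, after installation `port_open("bdcluster-01", 7180)` should return `True` from a machine that needs the Cloudera Manager web UI (bdcluster-01 is the example host from above).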

Users and Groups

Component (Version) | Unix User ID | Groups | Notes
Cloudera Manager (all versions) | cloudera-scm | cloudera-scm | Cloudera Manager processes such as the Cloudera Manager Server and the monitoring roles run as this user. The Cloudera Manager keytab file must be named cmf.keytab, since that name is hard-coded in Cloudera Manager. Note: Applicable to clusters managed by Cloudera Manager only.
Apache Accumulo (Accumulo 1.4.3 and higher) | accumulo | accumulo | Accumulo processes run as this user.
Apache Avro | (none) | (none) | No special users.
Apache Flume (CDH 4, CDH 5) | flume | flume | The sink that writes to HDFS as this user must have write privileges.
Apache HBase (CDH 4, CDH 5) | hbase | hbase | The Master and the RegionServer processes run as this user.
HDFS (CDH 4, CDH 5) | hdfs | hdfs, hadoop | The NameNode and DataNodes run as this user, and the HDFS root directory as well as the directories used for edit logs should be owned by it.
Apache Hive (CDH 4, CDH 5) | hive | hive | The HiveServer2 process and the Hive Metastore processes run as this user. A user must be defined for Hive access to its Metastore DB (for example, MySQL or Postgres), but it can be any identifier and does not correspond to a Unix uid. This is javax.jdo.option.ConnectionUserName in hive-site.xml.
Apache HCatalog (CDH 4.2 and higher, CDH 5) | hive | hive | The WebHCat service (for REST access to Hive functionality) runs as the hive user.
HttpFS (CDH 4, CDH 5) | httpfs | httpfs | The HttpFS service runs as this user. See HttpFS Security Configuration for instructions on how to generate the merged httpfs-http.keytab file.
Hue (CDH 4, CDH 5) | hue | hue | Hue services run as this user.
Cloudera Impala (CDH 4.1 and higher, CDH 5) | impala | impala, hive | Impala services run as this user.
Apache Kafka (Cloudera Distribution of Kafka 1.2.0) | kafka | kafka | Kafka services run as this user.
Java KeyStore KMS (CDH 5.2.1 and higher) | kms | kms | The Java KeyStore KMS service runs as this user.
Key Trustee KMS (CDH 5.3 and higher) | kms | kms | The Key Trustee KMS service runs as this user.
Key Trustee Server (CDH 5.4 and higher) | keytrustee | keytrustee | The Key Trustee Server service runs as this user.
Kudu | kudu | kudu | Kudu services run as this user.
Llama (CDH 5) | llama | llama | Llama runs as this user.
Apache Mahout | (none) | (none) | No special users.
MapReduce (CDH 4, CDH 5) | mapred | mapred, hadoop | Without Kerberos, the JobTracker and tasks run as this user. With Kerberos, the LinuxTaskController binary is owned by this user.
Apache Oozie (CDH 4, CDH 5) | oozie | oozie | The Oozie service runs as this user.
Parquet | (none) | (none) | No special users.
Apache Pig | (none) | (none) | No special users.
Cloudera Search (CDH 4.3 and higher, CDH 5) | solr | solr | The Solr processes run as this user.
Apache Spark (CDH 5) | spark | spark | The Spark History Server process runs as this user.
Apache Sentry (incubating) (CDH 5.1 and higher) | sentry | sentry | The Sentry service runs as this user.
Apache Sqoop (CDH 4, CDH 5) | sqoop | sqoop | This user is only for the Sqoop1 Metastore, a configuration option that is not recommended.
Apache Sqoop2 (CDH 4.2 and higher, CDH 5) | sqoop2 | sqoop, sqoop2 | The Sqoop2 service runs as this user.
Apache Whirr | (none) | (none) | No special users.
YARN (CDH 4, CDH 5) | yarn | yarn, hadoop | Without Kerberos, all YARN services and applications run as this user. With Kerberos, the LinuxContainerExecutor binary is owned by this user.
Apache ZooKeeper (CDH 4, CDH 5) | zookeeper | zookeeper | The ZooKeeper processes run as this user. It is not configurable.
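Normally the packages create these accounts for you. After installation you can verify the user-to-group mapping by reading /etc/passwd and /etc/group. The Python sketch below uses hypothetical names, and `EXPECTED` is only a hand-picked subset of the table. It counts a user as a member of a group either through its primary GID or through the group's member list. It does not consult LDAP/NIS or other name services.

```python
# Hypothetical subset of the users/groups table: user -> groups it must be in.
EXPECTED = {
    "cloudera-scm": ["cloudera-scm"],
    "hdfs": ["hdfs", "hadoop"],
    "mapred": ["mapred", "hadoop"],
    "yarn": ["yarn", "hadoop"],
    "impala": ["impala", "hive"],
    "zookeeper": ["zookeeper"],
}


def parse_accounts(passwd_text, group_text):
    """Return ({user: primary gid}, {group: (gid, set of member names)})."""
    users = {}
    for line in passwd_text.splitlines():
        if line and not line.startswith("#"):
            name, _pw, _uid, gid = line.split(":")[:4]
            users[name] = int(gid)
    groups = {}
    for line in group_text.splitlines():
        if line and not line.startswith("#"):
            name, _pw, gid, members = line.split(":")[:4]
            groups[name] = (int(gid), set(filter(None, members.split(","))))
    return users, groups


def missing_memberships(expected, passwd_text, group_text):
    """List every expected user or membership that is not present."""
    users, groups = parse_accounts(passwd_text, group_text)
    problems = []
    for user, wanted in expected.items():
        if user not in users:
            problems.append(f"user {user} does not exist")
            continue
        for group in wanted:
            if group not in groups:
                problems.append(f"group {group} does not exist")
                continue
            gid, members = groups[group]
            # Membership via primary gid (passwd) or supplementary list (group).
            if users[user] != gid and user not in members:
                problems.append(f"user {user} is not in group {group}")
    return problems


if __name__ == "__main__":
    with open("/etc/passwd") as p, open("/etc/group") as g:
        for problem in missing_memberships(EXPECTED, p.read(), g.read()):
            print(problem)
```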


Sunday, July 17, 2016

Hadoop cluster prerequisite setup in Linux server part2

Supported Databases

  1. Cloudera's recommendations are:
    • For Red Hat and similar systems:
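      • Use MySQL server version 5.0 (or higher) and version 5.0 client shared libraries on Red Hat 5 and similar systems.
      • Use MySQL server version 5.1 (or higher) and version 5.1 client shared libraries on Red Hat 6 and similar systems.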
    • For SLES systems, use MySQL server version 5.0 (or higher) and version 5.0 client shared libraries.
    • For Ubuntu systems:
      • Use MySQL server version 5.5 (or higher) and version 5.0 client shared libraries on Precise (12.04).
  2. For connectivity purposes only, Sqoop 1 supports MySQL 5.1, PostgreSQL 9.1.4, Oracle 10.2, Teradata 13.1, and Netezza TwinFin 5.0. The Sqoop metastore works only with HSQLDB (1.8.0 and higher 1.x versions; the metastore does not work with any HSQLDB 2.x versions).
  3. Sqoop 2 can transport data to and from MySQL 5.1, PostgreSQL 9.1.4, Oracle 10.2, and Microsoft SQL Server 2012. The Sqoop 2 repository is supported only on Derby.
  4. Derby is supported as shown in the table, but not always recommended. 
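To compare an existing MySQL server against these minimums (for example 5.5 on Ubuntu Precise or 5.0 on SLES), a version-string check is enough. This Python sketch (hypothetical names) pulls the first `major.minor.patch` out of a string such as the result of `SELECT VERSION();`. Keep in mind that `mysql --version` reports the client library version, not the server version.

```python
import re


def parse_mysql_version(text):
    """Extract (major, minor, patch) from a MySQL version string."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in m.groups())


def meets_minimum(version, minimum):
    """Tuple comparison: (5, 5, 47) >= (5, 5) is True, (5, 0, 95) >= (5, 1) is False."""
    return version >= minimum
```

Usage: `meets_minimum(parse_mysql_version("5.5.47-0ubuntu0.12.04.1"), (5, 5))` returns `True`.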

Supported Transport Layer Security Versions

The following components support Transport Layer Security (TLS):

Components Supported by TLS

Role                           Port     Version
Cloudera Manager Server        7182     TLS 1.2
Cloudera Manager Server        7183     TLS 1.2
Flume                          9099     TLS 1.2
HBase Master                   60010    TLS 1.2
NameNode                       50470    TLS 1.2
Secondary NameNode             50495    TLS 1.2
Hive HiveServer2               10000    TLS 1.2
Hue Server                     8888     TLS 1.2
Impala Daemon                  21000    TLS 1.2
Impala Daemon                  21050    TLS 1.2
Impala Daemon                  22000    TLS 1.2
Impala Daemon                  25000    TLS 1.2
Impala StateStore              24000    TLS 1.2
Impala StateStore              25010    TLS 1.2
Impala Catalog Server          25020    TLS 1.2
Impala Catalog Server          26000    TLS 1.2
Oozie Server                   11443    TLS 1.1
Solr                           8983     TLS 1.1
Solr                           8985     TLS 1.1
YARN ResourceManager           8090     TLS 1.2
JobHistory Server              19890    TLS 1.2
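Once TLS is switched on, you can see which protocol version a given port actually negotiates using Python's `ssl` module. The sketch below uses hypothetical helper names. It disables certificate verification because it only probes the protocol version, so do not reuse that context for real traffic. One caveat: recent Python/OpenSSL builds may refuse TLS 1.0/1.1 outright, so probing the TLS 1.1 roles (Oozie, Solr) can raise `ssl.SSLError` instead of returning a version.

```python
import socket
import ssl

TLS_ORDER = ["TLSv1", "TLSv1.1", "TLSv1.2", "TLSv1.3"]


def negotiated_tls_version(host, port, timeout=5.0):
    """Handshake with host:port and return e.g. 'TLSv1.2'.

    Certificate checks are off: this is a protocol probe only.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be cleared before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()


def at_least(version, minimum):
    """Compare protocol names such as 'TLSv1.2' >= 'TLSv1.1'."""
    return TLS_ORDER.index(version) >= TLS_ORDER.index(minimum)
```

For example, `at_least(negotiated_tls_version("bdcluster-01", 7183), "TLSv1.2")` should be `True` for the Cloudera Manager Server row of the table (the host name is the example from part 3).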

Resource Requirements

Cloudera Manager requires the following resources:

Disk Space - Cloudera Manager Server
5 GB on the partition hosting /var.
500 MB on the partition hosting /usr.

For parcels, the space required depends on the number of parcels you download to the Cloudera Manager Server and distribute to Agent hosts. You can download multiple parcels of the same product, of different versions and builds. If you are managing multiple clusters, only one parcel of a product/version/build/distribution is downloaded on the Cloudera Manager Server, not one per cluster. In the local parcel repository on the Cloudera Manager Server, the approximate sizes of the various parcels are as follows:

Cloudera Impala - 200 MB per parcel
Cloudera Search - 400 MB per parcel

Cloudera Management Service - The Host Monitor and Service Monitor databases are stored on the partition hosting /var. Ensure that you have at least 20 GB available on this partition. By default, unpacked parcels are located in /opt/cloudera/parcels.
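A short free-space check for the Cloudera Manager Server host can be written with `shutil.disk_usage`. In the Python sketch below, `REQUIREMENTS` and the function name are hypothetical and only encode the numbers quoted above (1 GB taken as 1024^3 bytes). If /var and /usr sit on the same partition, the requirements add up, which this simple check does not account for.

```python
import shutil

GB = 1024 ** 3
MB = 1024 ** 2

# Minimums quoted above for the Cloudera Manager Server host.
REQUIREMENTS = {
    "/var": 20 * GB,  # Host/Service Monitor databases (covers the 5 GB minimum too)
    "/usr": 500 * MB,
}


def free_space_report(requirements):
    """Return {path: (free_bytes, needed_bytes, ok)} for each path."""
    report = {}
    for path, needed in requirements.items():
        free = shutil.disk_usage(path).free
        report[path] = (free, needed, free >= needed)
    return report


if __name__ == "__main__":
    for path, (free, needed, ok) in free_space_report(REQUIREMENTS).items():
        status = "OK" if ok else "LOW"
        print(f"{path}: {free / GB:.1f} GB free, need {needed / GB:.1f} GB -> {status}")
```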

RAM - 4 GB is recommended for most cases and is required when using Oracle databases. 2 GB may be sufficient for non-Oracle deployments with fewer than 100 hosts. However, to run the Cloudera Manager Server on a machine with 2 GB of RAM, you must tune down its maximum heap size (by modifying -Xmx in /etc/default/cloudera-scm-server). Otherwise the kernel may kill the Server for consuming too much RAM.
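Lowering the heap means editing the `-Xmx` value in /etc/default/cloudera-scm-server. The Python sketch below assumes the setting is on a line like `export CMF_JAVA_OPTS="-Xmx2G ..."`. The variable name and the helper names are assumptions, so check your own file first. The sketch rewrites only the `-Xmx` token and leaves other lines untouched. The `1G` in the usage note is illustrative, not a recommendation.

```python
import re

# Matches a JVM max-heap flag such as -Xmx2G or -Xmx2048m.
XMX = re.compile(r"-Xmx\d+[kKmMgG]?")


def set_max_heap(line, size):
    """Replace any -Xmx value in one line; lines without -Xmx are unchanged."""
    return XMX.sub(lambda _m: f"-Xmx{size}", line)


def tune_file(path, size):
    """Rewrite every -Xmx flag in path (assumed: /etc/default/cloudera-scm-server)."""
    with open(path) as f:
        lines = f.read().splitlines(keepends=True)
    with open(path, "w") as f:
        f.writelines(set_max_heap(line, size) for line in lines)
```

After `tune_file("/etc/default/cloudera-scm-server", "1G")`, restart the server (for example `service cloudera-scm-server restart`) so the new heap size takes effect.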

Python - Cloudera Manager and CDH 4 require Python 2.4 or higher, but Hue in CDH 5 and package installs of CDH 5 require Python 2.6 or 2.7. All supported operating systems include Python version 2.4 or higher.

Perl - Cloudera Manager requires Perl.
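You can check both interpreters on each host with a few lines of Python. This sketch uses hypothetical names. It looks for `perl` on the PATH and parses the output of `python -V`, which Python 2 prints to stderr and Python 3 to stdout, so both streams are merged. `python_ok_for_cdh5` encodes only the "2.6 or 2.7" rule for Hue and package installs of CDH 5 stated above.

```python
import re
import shutil
import subprocess


def parse_python_version(output):
    """Parse 'Python 2.6.6' (from `python -V`) into (2, 6, 6)."""
    m = re.search(r"Python (\d+)\.(\d+)(?:\.(\d+))?", output)
    if m is None:
        raise ValueError(f"unrecognized output {output!r}")
    return tuple(int(g) for g in m.groups() if g is not None)


def python_ok_for_cdh5(version):
    """Hue and package installs of CDH 5 need Python 2.6 or 2.7."""
    return version[:2] in ((2, 6), (2, 7))


def system_python_version(exe="python"):
    # Python 2 writes the version to stderr, Python 3 to stdout: merge both.
    proc = subprocess.run([exe, "-V"], stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return parse_python_version(proc.stdout)


if __name__ == "__main__":
    print("perl found:", shutil.which("perl") is not None)
    if shutil.which("python"):
        v = system_python_version()
        print("python", v, "OK for CDH 5:", python_ok_for_cdh5(v))
```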


Saturday, July 16, 2016

Hadoop cluster prerequisite setup in Linux server

Supported Operating Systems for Installing Cloudera's Distribution Including Apache Hadoop (CDH)

  • Red Hat Enterprise Linux and CentOS
      • 5.10, 64-bit
      • 5.7, 64-bit
      • 6.4, 64-bit
      • 6.5 in SELinux mode
      • 6.5, 64-bit
      • 6.6, 64-bit
  • Oracle Enterprise Linux (OEL) with Unbreakable Enterprise Kernel (UEK), 64-bit
      • 5.6 (UEK R2)
      • 6.4 (UEK R2)
      • 6.5 (UEK R2, UEK R3)
      • 6.6 (UEK R3)
  • SLES - SUSE Linux Enterprise Server 11, 64-bit, Service Pack 2 or later
  • Debian - Wheezy (7.0 and 7.1), Squeeze (6.0) (deprecated), 64-bit
  • Ubuntu - Trusty (14.04), Precise (12.04), Lucid (10.04) (deprecated), 64-bit
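On Red Hat-family hosts, /etc/redhat-release plus the machine architecture is enough for a first check against the list. The Python sketch below covers only the RHEL/CentOS entries, and the set and function names are hypothetical. OEL, SLES, Debian and Ubuntu identify themselves differently and would need their own parsing. It does not check the 6.5 SELinux-mode variant separately.

```python
import platform
import re

# RHEL/CentOS versions from the list above (64-bit only).
SUPPORTED_RHEL = {"5.7", "5.10", "6.4", "6.5", "6.6"}


def parse_redhat_release(text):
    """Parse /etc/redhat-release, e.g. 'CentOS release 6.5 (Final)'."""
    m = re.match(r"(.+?) release (\d+\.\d+)", text.strip())
    if m is None:
        raise ValueError(f"unrecognized release string {text!r}")
    return m.group(1), m.group(2)


def host_supported(release_text, machine=None):
    """True if the release is in SUPPORTED_RHEL and the CPU is x86_64."""
    _distro, version = parse_redhat_release(release_text)
    machine = machine or platform.machine()
    return version in SUPPORTED_RHEL and machine == "x86_64"
```

Typical use: `host_supported(open("/etc/redhat-release").read())`.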
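
Supported Java Versions

CDH version        JDK    Minimum supported    Recommended
CDH 5              1.7    1.7.0_55             1.7.0_67 or 1.7.0_75
CDH 5              1.8    1.8.0_60             1.8.0_60
CDH 4 and CDH 5    1.7    1.7.0_55             1.7.0_67 or 1.7.0_75
CDH 4 and CDH 5    1.8    1.8.0_60             1.8.0_60
CDH 4              1.6    1.6.0_31             1.6.0_31 or higher
CDH 4              1.7    1.7.0_55             1.7.0_67 or 1.7.0_75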
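To check the installed JDK, parse `java -version` (it writes to stderr) and compare the update number within the same 1.x line. The Python sketch below uses hypothetical names. It assumes the first version in each pair above is the minimum supported update, and `CDH5_MINIMUMS` encodes only the CDH 5 rows. Java 9+ version strings such as "11.0.2" deliberately fail to parse, since they are not on the list.

```python
import re
import subprocess

# Assumed minimum updates for CDH 5 (first value of each pair above).
CDH5_MINIMUMS = [(1, 7, 0, 55), (1, 8, 0, 60)]


def parse_java_version(output):
    """Parse `java -version` output such as 'java version "1.7.0_67"'."""
    m = re.search(r'version "(\d+)\.(\d+)\.(\d+)_(\d+)"', output)
    if m is None:
        raise ValueError(f"unrecognized java -version output {output!r}")
    return tuple(int(g) for g in m.groups())


def jdk_meets(installed, minimum):
    """Same 1.x line (e.g. 1.7.0) and at least the minimum update number."""
    return installed[:3] == minimum[:3] and installed[3] >= minimum[3]


def supported(installed, minimums=CDH5_MINIMUMS):
    return any(jdk_meets(installed, m) for m in minimums)


def installed_java(exe="java"):
    # `java -version` prints to stderr, so merge it into stdout.
    proc = subprocess.run([exe, "-version"], stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return parse_java_version(proc.stdout)
```

On a cluster host, `supported(installed_java())` should be `True` before you install CDH 5.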