Install Hadoop 2.2 in Ubuntu

Prerequisites:
$sudo apt-get install openjdk-7-jdk
$ java -version
java version "1.7.0_25"
OpenJDK Runtime Environment (IcedTea 2.3.12) (7u25-2.3.12-4ubuntu3)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
$cd /usr/lib/jvm
$sudo ln -s java-7-openjdk-amd64 jdk
$sudo apt-get install openssh-server
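A quick sanity check that the JDK symlink and the SSH server are in place (on Ubuntu the SSH service is simply named ssh):
$ls -ld /usr/lib/jvm/jdk
$sudo service ssh status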

Add Hadoop Group and User
$sudo addgroup hadoop
$sudo adduser --ingroup hadoop hduser
$sudo adduser hduser sudo
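You can verify the new user's group memberships before switching over:
$groups hduser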

After the user is created, log back into Ubuntu as hduser.

Setup SSH Key
$ssh-keygen -t rsa -P ''
...
$cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ssh localhost
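The ssh localhost login should now succeed without a password prompt (type exit to return). If you are still asked for a password, ~/.ssh permissions are the usual cause; tightening them generally fixes it:
$chmod 700 ~/.ssh
$chmod 600 ~/.ssh/authorized_keys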


Download Hadoop 2.2.0
Download from hadoop.apache.org or one of its mirrors:
$cd ~
$wget http://www.trieuvan.com/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
$sudo tar vxzf hadoop-2.2.0.tar.gz -C /usr/local
$cd /usr/local
$sudo mv hadoop-2.2.0 hadoop
$sudo chown -R hduser:hadoop hadoop
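A quick check that the install directory now belongs to hduser:
$ls -ld /usr/local/hadoop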

Setup Hadoop Environment Variables
$cd ~
$vi .bashrc
Paste the following at the end of the file:

#Hadoop variables
export JAVA_HOME=/usr/lib/jvm/jdk/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL

###end of paste
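To apply the new variables in the current shell without logging out and back in:
$source ~/.bashrc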

$cd /usr/local/hadoop/etc/hadoop
$vi hadoop-env.sh
#modify JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/jdk/
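Alternatively, the same edit can be scripted; this one-liner assumes hadoop-env.sh still contains its stock export JAVA_HOME=${JAVA_HOME} line:
$sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/jdk/|' hadoop-env.sh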

Log back into Ubuntu as hduser and check the Hadoop version:
$hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 7953ce7994d1628b240f08af81e1af4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar

If the version is not shown, reload the shell and inspect the PATH:
$exec bash
$echo $PATH
Make sure the directories you added to .bashrc appear in the output.

At this point, Hadoop is installed.

Configure Hadoop
$cd /usr/local/hadoop/etc/hadoop
$vi core-site.xml
#Paste the following between the <configuration> and </configuration> tags:

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
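Note: Hadoop 2.x deprecates fs.default.name in favor of fs.defaultFS; both keys still work in 2.2.0, so the snippet above is fine either way.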

$vi yarn-site.xml
#Paste the following between the <configuration> and </configuration> tags:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
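Note: Hadoop 2.2 renamed this auxiliary service from mapreduce.shuffle to mapreduce_shuffle (dots are no longer allowed in service names), so older guides using the dotted value will leave the NodeManager unable to start.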


$mv mapred-site.xml.template mapred-site.xml
$vi mapred-site.xml
#Paste the following between the <configuration> and </configuration> tags:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
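Without this property, mapreduce.framework.name defaults to local and jobs run in-process instead of on YARN.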

$cd ~
$mkdir -p mydata/hdfs/namenode
$mkdir -p mydata/hdfs/datanode
$cd /usr/local/hadoop/etc/hadoop
$vi hdfs-site.xml
Paste the following between the <configuration> and </configuration> tags:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>
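Both directories must exist and be writable by hduser before the NameNode is formatted; a quick check:
$ls -ld ~/mydata/hdfs/namenode ~/mydata/hdfs/datanode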


Format Namenode
hduser@ubuntu$hdfs namenode -format
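Run the format only once, before the first start; re-running it later wipes the HDFS metadata and can leave DataNodes with a mismatched cluster ID.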

Start Hadoop Service
$start-dfs.sh
..
$start-yarn.sh
$jps
If everything is successful, you should see the following services running (your process IDs will differ):
2583 DataNode
2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode
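If any daemon is missing from the jps output, check its log file; by default logs are written under the install directory:
$ls /usr/local/hadoop/logs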

Run Hadoop Example:

hduser@ubuntu$cd /usr/local/hadoop
$hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5

Number of Maps = 2
........
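Running the examples jar without arguments prints the list of bundled example programs (pi, wordcount, grep, and so on), which is handy for exploring further:
$hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar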



To install Hive:

Download Hive from hive.apache.org, extract it (with tar or a GUI tool), and move it to /usr/local/hive:

$tar -zxvf hive-0.12.0.tar.gz
$sudo mv hive-0.12.0 /usr/local/hive
$export HIVE_PREFIX=/usr/local/hive
$export PATH=$PATH:$HIVE_PREFIX/bin
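These exports only last for the current session; add them to ~/.bashrc to make them permanent. Hive also keeps its table data in HDFS, by default under /user/hive/warehouse, and uses /tmp for scratch space; if your first create table or query fails with a permissions error, creating these directories up front usually helps:
$hadoop fs -mkdir -p /tmp /user/hive/warehouse
$hadoop fs -chmod g+w /tmp /user/hive/warehouse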
$hive
>set -v
>show tables;
>create table book(word string) 
>row format delimited
>fields terminated by ' '
>lines terminated by '\n';
>load data inpath '/user/hduser/war_and_peace.txt' into table book;
>select count(*) from book;
>select lower(word), count(*) as count
>from book
>where lower(substring(word,1,1)) = 'w'
>group by word
>having count > 50
>sort by count desc;
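Note: sort by only orders rows within each reducer; for a single total ordering use order by (with one reducer, as here, the result is the same).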

To upload data:
$hadoop fs -copyFromLocal /home/hsinay/war_and_peace.txt /user/hduser/war_and_peace.txt
or
$hadoop fs -copyFromLocal /home/hsinay/war_and_peace.txt hdfs:/user/hduser/war_and_peace.txt
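You can confirm the upload with a listing. Keep in mind that Hive's load data inpath moves the file into the table's warehouse directory rather than copying it, so it will disappear from /user/hduser after the load:
$hadoop fs -ls /user/hduser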

Some web-based URLs
localhost:50070 -------> NameNode web UI, including the HDFS file browser.
localhost:8088 ------> ResourceManager UI, to check the cluster.
