HDP yarn and hadoop configuration

Import Hadoop Source Project Into Eclipse And Build Hadoop-2.2.0 On Mac OS X

The Official Building Document is where I started. There you can find all the steps and cautions you need to build a binary version of Hadoop from source. The most essential URLs are as follows:
  1. Read-only version of hadoop source provided by Apache
  2. The latest version of BUILDING.txt (You can find the corresponding BUILDING.txt from the root path of the hadoop source project)
  3. Native Libraries Guide for Hadoop-2.2.0
  4. Working with Hadoop under Eclipse
Now let's start! My environment is as follows:
Mac OS X-10.9.4
Protocol Buffer-2.5.0

After deploying all the items above, add the relevant environment variables to '~/.profile':
export LD_LIBRARY_PATH="/usr/local/lib"  # For protocol buffer
export JAVA_HOME=$(/usr/libexec/java_home)
export ANT_HOME="/Users/jasonzhu/general/ant-1.9.4"
export FINDBUGS_HOME="/Users/jasonzhu/general/findbugs-3.0.0"
export HADOOP_HOME="/Users/jasonzhu/general/hadoop-2.2.0"

Remember to make them take effect via 'source ~/.profile' after editing. You can double-check by issuing the following commands. If all the versions print out normally, just move on!
java -version
hadoop version
mvn -version
protoc --version
findbugs -version
ant -version

Another prerequisite is required by BUILDING.txt, which says "A one-time manual step is required to enable building Hadoop on OS X with Java 7 every time the JDK is updated":
sudo mkdir `/usr/libexec/java_home`/Classes
sudo ln -s `/usr/libexec/java_home`/lib/tools.jar `/usr/libexec/java_home`/Classes/classes.jar

Then we are going to git clone the hadoop source project to our local filesystem:
git clone git://git.apache.org/hadoop.git

When it is done, we can list all the remote branches of the hadoop project by issuing 'git branch -r' in the root path of the project. Switch to the 2.2.0 branch via 'git checkout branch-2.2.0'. Open pom.xml in the root path of the project to confirm it has switched to branch-2.2.0:
<description>Apache Hadoop Main</description> 
<name>Apache Hadoop Main</name>

Still in the root path of the project, execute commands as below:
mvn install -DskipTests
mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true

Possible Problem #1:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hadoop-hdfs: Compilation failure
[ERROR] Failure executing javac, but could not parse the error:
[ERROR] The system is out of resources.
[ERROR] Consult the following stack trace for details.
[ERROR] java.lang.OutOfMemoryError: Java heap space
Add 'export MAVEN_OPTS="-Xmx2048m -XX:MaxPermSize=2048m"' to '~/.profile' and source it to make it effective.

Possible Problem #2:
[ERROR] Failed to execute goal on project hadoop-hdfs-httpfs: Could not resolve dependencies for project org.apache.hadoop:hadoop-hdfs-httpfs:war:3.0.0-SNAPSHOT: Could not find artifact org.apache.hadoop:hadoop-hdfs:jar:tests:3.0.0-SNAPSHOT in apache.snapshots.https (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
At first, I used 'mvn install -Dmaven.test.skip=true', and the error above was thrown. Then I found out that there is a difference between '-DskipTests' and '-Dmaven.test.skip=true': the former compiles the tests but does not execute them, whereas the latter neither compiles nor executes them. We should be aware of that.

Finally, install 'm2e' in Eclipse and import the hadoop source project:
Eclipse -> Import -> Existing Maven Projects.

Once the project is imported, don't be surprised by the many errors in almost all sub-projects of hadoop (well, at least for me, there were soooo many red crosses on my projects). The most common one is "Plugin execution not covered by lifecycle configuration ... Maven Project Build Lifecycle Mapping Problem", which is caused by the m2e Eclipse plugin and Maven itself being developed out of step. So far I have found no good solution to this problem; if anyone has a better idea, please leave a message, big thanks! Anyway, we can still browse, read and revise the source code in Eclipse, then build the project from the command line.

Building hadoop is much easier than I had thought. There are detailed instructions in BUILDING.txt, too. The most essential part is as follows:
Maven build goals:

 * Clean                     : mvn clean
 * Compile                   : mvn compile [-Pnative]
 * Run tests                 : mvn test [-Pnative]
 * Create JAR                : mvn package
 * Run findbugs              : mvn compile findbugs:findbugs
 * Run checkstyle            : mvn compile checkstyle:checkstyle
 * Install JAR in M2 cache   : mvn install
 * Deploy JAR to Maven repo  : mvn deploy
 * Run clover                : mvn test -Pclover [-DcloverLicenseLocation=${user.name}/.clover.license]
 * Run Rat                   : mvn apache-rat:check
 * Build javadocs            : mvn javadoc:javadoc
 * Build distribution        : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar]
 * Change Hadoop version     : mvn versions:set -DnewVersion=NEWVERSION

 Build options:

  * Use -Pnative to compile/bundle native code
  * Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist)
  * Use -Psrc to create a project source TAR.GZ
  * Use -Dtar to create a TAR with the distribution (using -Pdist)

As stated above in the Native Libraries Guide for Hadoop-2.2.0, the native hadoop library is supported on *nix platforms only. The library does not work with Cygwin or the Mac OS X platform. Consequently, we build hadoop without the native profile:
mvn package -Pdist,docs,src -DskipTests -Dtar

© 2014-2017 jason4zhu.blogspot.com All Rights Reserved
If transferring, please annotate the origin: Jason4Zhu

Wednesday, October 29, 2014

InputFormat In Hive And The Way To Customize CombineHiveInputFormat

Part.1 InputFormat In Hive

There are two places where we can specify an InputFormat in hive: when creating a table, and before executing an HQL query, respectively.
For the first case, we can specify the InputFormat and OutputFormat when creating a hive table, just like:
CREATE TABLE example_tbl (
  id int,
  name string
)
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
We could check out the specified InputFormat and OutputFormat for a table by:
hive> DESC FORMATTED example_tbl;
# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
In this case, the InputFormat and OutputFormat are responsible for storing data into, as well as retrieving data out of, HDFS; thus they are transparent to hive itself. For instance, suppose some text content is saved in binary format in HDFS and mapped to a particular hive table. When we invoke a hive task on this table, it will load the data via its InputFormat so as to get the 'decoded' text content. After executing the HQL, the hive task will write the result to whatever the destination is (HDFS, local file system, screen, etc.) via its OutputFormat.
For the second case, we can set 'hive.input.format' before invoking an HQL query:
hive> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
hive> select * from example_tbl where id > 10000;
If we set this parameter in hive-site.xml, it becomes the default Hive InputFormat whenever 'hive.input.format' is not set explicitly before the HQL.
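For example, a fragment along these lines in hive-site.xml (a sketch; the property name and value are the ones used above) would make CombineHiveInputFormat the default:

```xml
<!-- hive-site.xml sketch: default InputFormat for Hive queries -->
<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
</property>
```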
The InputFormat in this scenario serves a different function from the former one. First, let's take a glance at 'org.apache.hadoop.mapred.FileInputFormat', the base class for all file-based InputFormats. There are three essential methods in this class:
boolean isSplitable(FileSystem fs, Path filename)
InputSplit[] getSplits(JobConf job, int numSplits)
RecordReader<K, V> getRecordReader(InputSplit split, JobConf job, Reporter reporter)
'isSplitable' is self-explanatory: it returns whether the given file is splittable. This method matters when writing a MapReduce program directly; for a Hive task, we can instead set 'mapreduce.input.fileinputformat.split.minsize' in hive-site.xml to a very large value to achieve the same effect.
'getSplits' returns an array of InputSplit objects, whose length corresponds to the number of mappers for the HQL task. Every InputSplit contains one or more file chunks in the current file system; the details will be discussed later.
'getRecordReader' returns an 'org.apache.hadoop.mapred.RecordReader' object, whose function is to read data record by record from the underlying file system. Its main methods are as follows:
K createKey()
V createValue()
boolean next(K key, V value)
float getProgress()
'createKey', 'createValue' and 'getProgress' are self-explanatory. 'next' populates the key and value parameters from the current read position and returns true, or returns false at EOF.
In the former case mentioned above, only the 'getRecordReader' method is used, whereas in the latter case, only the 'getSplits' method is used.

Part.2 Customize CombineHiveInputFormat

In my daily work, there was a need for me to rewrite the CombineHiveInputFormat class. Our data in HDFS is partitioned by yyyyMMdd; in each partition, all files are named in the pattern 'part-i' (i ∈ [0, 64)).
This experimental hive table is created by:
CREATE EXTERNAL table hive_combine_test
(id string,
rdm string)
PARTITIONED BY (dateid string)
row format delimited fields terminated by '\t'
stored as textfile;

ALTER TABLE hive_combine_test
ADD PARTITION (dateid='20140901')
location '/user/supertool/zhudi/hiveTest/20140901';

ALTER TABLE hive_combine_test
ADD PARTITION (dateid='20140902')
location '/user/supertool/zhudi/hiveTest/20140902';

ALTER TABLE hive_combine_test
ADD PARTITION (dateid='20140903')
location '/user/supertool/zhudi/hiveTest/20140903';
What we intend to do is to package all the files with the same i, across different partitions, into one InputSplit, so that they go to one mapper. Overall, there should be 64 mappers no matter how many days (partitions) are involved in the HQL.
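The grouping idea can be sketched in plain shell (an illustration of the bucketing only, not the actual Hive Java code; the paths come from the example partitions above):

```shell
# Bucket file paths by their trailing 'part-i' index; each bucket would
# become one combined InputSplit.
out=$(printf '%s\n' \
  /user/supertool/zhudi/hiveTest/20140901/part-0 \
  /user/supertool/zhudi/hiveTest/20140902/part-0 \
  /user/supertool/zhudi/hiveTest/20140901/part-1 \
  /user/supertool/zhudi/hiveTest/20140902/part-1 |
  awk -F'part-' '{ bucket[$2] = bucket[$2] " " $0 }
                 END { for (i in bucket) print "split " i ":" bucket[i] }' |
  sort)
echo "$out"
# split 0: .../20140901/part-0 .../20140902/part-0
# split 1: .../20140901/part-1 .../20140902/part-1
```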
The way to customize CombineHiveInputFormat in Eclipse is as follows:
In eclipse, File-->New-->Other-->Maven Project-->Create a simple project.
Revise pom.xml according to your own hadoop and hive versions.
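As a rough sketch (the artifact IDs and versions below are assumptions; match them to your own cluster), the essential dependencies are hadoop-client and hive-exec:

```xml
<!-- Hypothetical pom.xml dependencies; versions must match your cluster -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>0.13.1</version>
  </dependency>
</dependencies>
```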
At the same time, we should insert the maven-assembly-plugin in pom.xml in order to package a jar with dependencies:
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <id>make-assembly</id> <!-- this is used for inheritance merges -->
      <phase>package</phase> <!-- bind to the packaging phase -->
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
</plugin>
After all the peripheral settings, we can create a new class derived from CombineHiveInputFormat. What we intend to do is to reconstruct the array of InputSplit returned by CombineHiveInputFormat.getSplits():
public class JudCombineHiveInputFormatOld<K extends WritableComparable, V extends Writable>
    extends CombineHiveInputFormat<WritableComparable, Writable> {

  @Override
  public InputSplit[] getSplits(JobConf job, int numSplits)
      throws IOException {
    InputSplit[] iss = super.getSplits(job, numSplits);
    // TODO: Reconstruct iss into the splits we want, then return them.
    return iss;
  }
}
Consequently, it is time to get some knowledge of InputSplit. In CombineHiveInputFormat, the implementation class for InputSplit is CombineHiveInputSplit, which wraps an 'org.apache.hadoop.hive.shims.HadoopShimsSecure.InputSplitShim'. The constructor of InputSplitShim takes an 'org.apache.hadoop.mapred.lib.CombineFileSplit' object, whose constructor looks like:
CombineFileSplit(JobConf job, Path[] files, long[] start, long[] lengths, String[] locations)
All the parameters correspond to the content of an InputSplit in MapReduce: the JobConf, the file paths, the start position within each file, the size of each file chunk, and the hosts to which the split should preferably be assigned, respectively.
After getting familiar with the structure of the InputSplit class, we can simply rearrange all the files into InputSplits according to the file name pattern.
Just one more thing: CombineHiveInputSplit has a field named 'inputFormatClassName', which is the name of the InputFormat configured when the hive table was created (the former case stated above). In the process of executing a hive task, files may come from different sources with different InputFormats (some come from the hive table's source data, some from hive temporary data). Thus, splits should also be grouped by inputFormatClassName when we rearrange them.
Here's a code snippet for reconstruction of CombineHiveInputFormat:
Path[] files = new Path[curSplitInfos.size()];
long[] starts = new long[curSplitInfos.size()];
long[] lengths = new long[curSplitInfos.size()];
for (int i = 0; i < curSplitInfos.size(); ++i) {
  SplitInfo si = curSplitInfos.get(i);
  files[i] = si.getFile();
  starts[i] = si.getStart();
  lengths[i] = si.getLength();
}
String[] locations = new String[1];
locations[0] = slice2host.get(sliceid);
org.apache.hadoop.mapred.lib.CombineFileSplit cfs =
  new org.apache.hadoop.mapred.lib.CombineFileSplit(job, files, starts, lengths, locations);
org.apache.hadoop.hive.shims.HadoopShimsSecure.InputSplitShim iqo =
  new org.apache.hadoop.hive.shims.HadoopShimsSecure.InputSplitShim(cfs);
CombineHiveInputSplit chis = new CombineHiveInputSplit(job, iqo);
After implementing it, we can simply issue 'mvn clean package -Dmaven.test.skip=true', then copy '*jar-with-dependencies*.jar' from the project's target folder to $HIVE_HOME/lib as well as $HADOOP_HOME/share/hadoop/common/lib on every node of the hive cluster.
At last, we can set hive.input.format to our own version by 'set hive.input.format=com.judking.hive.inputformat.JudCombineHiveInputFormat;' before invoking a HQL.
If debugging is needed, we can print with System.out in our InputFormat class, in which way the info will be printed to the screen. Alternatively, we can use commons-logging's 'LogFactory.getLog()' to retrieve a Log object; its output goes to '/tmp/(current_user)/hive.log'.

Tuesday, October 28, 2014

Find And Replace Specific Keyword In All Files From A Directory Recursively

If we intend to replace keyword '<abc>' with '<def>' in all *.java files from $ROOT_PATH, we can simply achieve this by:
find . -name "*.java" -print | xargs sed -i "" "s/<abc>/<def>/g"

The first part, 'find . -name "*.java" -print', prints the relative paths of all files matching the pattern "*.java", such as './a.java' and './dir1/b.java'.

Then all the output lines are passed to the sed command via xargs; thus the latter part is equivalent to:
sed -i "" "s/<abc>/<def>/g" ./a.java
sed -i "" "s/<abc>/<def>/g" ./dir1/b.java

'-i' stands for:
-i[SUFFIX], --in-place[=SUFFIX]

    edit files in place (makes backup if extension supplied)

i.e., sed writes the replaced content back into the file itself (keeping a backup copy if a suffix is supplied).

But here's a tricky part. As mentioned above, '-i's argument is optional with GNU sed on Ubuntu, whereas BSD sed on Mac OS X requires a value (an empty string will do), or an error like 'invalid command code .' is thrown. Consequently, it is recommended that we add '-i ""' to our command on Mac, although it's a little bit cumbersome :)
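Here is a small self-contained demo of the whole pipeline (file names and contents are made up; the GNU sed form is used, so on Mac OS X replace '-i' with '-i ""'):

```shell
# Create a throwaway directory tree with two *.java files containing '<abc>'.
tmp=$(mktemp -d)
mkdir -p "$tmp/dir1"
echo 'foo <abc> bar' > "$tmp/a.java"
echo 'baz <abc> qux' > "$tmp/dir1/b.java"
# GNU sed form; on Mac OS X use: xargs sed -i "" "s/<abc>/<def>/g"
find "$tmp" -name "*.java" -print | xargs sed -i "s/<abc>/<def>/g"
cat "$tmp/a.java"   # foo <def> bar
```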


Saturday, October 25, 2014

How To Set The Queue Where A MapReduce Task Or Hive Task Will Run

There is always a need for us to specify the queue for our MR or hive task. Here's the way:

An example for MapReduce task:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi -Dmapred.job.queue.name=root.example_queue 10 10000

For a Hive task, insert the following statement before invoking the real HQL:
set mapred.job.queue.name=root.example_queue;

To generalize, we can safely conclude that most Hadoop or Hive configurations can be set in the forms above, respectively. 'Most' means that some configurations cannot be revised at runtime, being marked as 'final'.


Friday, October 24, 2014

Linux: Kill Processes With Specific Keyword

When we are going to kill a process matching a specific keyword, we can simply find its PID (process id) with the following command (the PID is shown in the second column of the result):
ps aux | grep "keyword"

Then the process can be forcibly killed by:
kill -9 PID

This method works fine when there are only one or a few processes to be terminated. As the number of processes grows, it becomes too painful to do manually. Consequently, we have to deal with it in some other way.

Note: all the solutions below will kill the processes listed by the command 'ps aux | grep "keyword"'.

Solution 1:
pgrep keyword | xargs kill -9

Solution 2:
ps aux | grep keyword | awk '{print $2}' | xargs kill -9

Solution 3:
kill -s 9 `pgrep keyword`

Solution 4:
pkill -9 keyword

Pick one that you're most comfortable with :)
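To see what the awk step in Solution 2 extracts, here is a harmless demo on canned 'ps aux'-style text (the sample process lines are made up; no real processes are killed):

```shell
# Two of the three sample lines contain the keyword; awk pulls the
# second column (the PID) from each matching line.
ps_sample='jason  4242  0.0  0.1  java -jar keyword.jar
jason  4343  0.0  0.1  python keyword.py
jason  5555  0.0  0.1  vim notes.txt'
pids=$(printf '%s\n' "$ps_sample" | grep keyword | awk '{print $2}')
echo $pids   # 4242 4343
# In real use these would be fed to kill: ... | xargs kill -9
```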


VCore Configuration In Hadoop

Just like memory, vcores, short for virtual cores, are another type of resource in a Hadoop cluster: an abstraction of CPU capacity.

We can see from class 'org.apache.hadoop.yarn.api.records.Resource', which is the abstraction of resource in YARN, that memory and vcores are equally treated in Hadoop:
int getMemory();
void setMemory(int memory);
int getVirtualCores();
void setVirtualCores(int vCores);

Here are all the relevant params for vcores in the Hadoop configuration files:
mapreduce.map.cpu.vcores: The number of virtual cores required for each map task.
mapreduce.reduce.cpu.vcores: The number of virtual cores required for each reduce task.
yarn.app.mapreduce.am.resource.cpu-vcores: The number of virtual CPU cores the MR AppMaster needs.
yarn.scheduler.maximum-allocation-vcores: The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.
yarn.scheduler.minimum-allocation-vcores: The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the minimum will get allocated instead.
yarn.nodemanager.resource.cpu-vcores: Number of CPU cores that can be allocated for containers.
The first three params are configured in mapred-site.xml, the rest are in yarn-site.xml.

mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores are easy to understand: they represent the number of vcores provided to each map or reduce task respectively.

yarn.app.mapreduce.am.resource.cpu-vcores stands for the number of vcores for the MapReduce ApplicationMaster. (The MR ApplicationMaster is the per-application master in MR on YARN, which cooperates with NodeManagers using resources granted by the ResourceManager.)

The last three params are well explained in the descriptions above. As for yarn.nodemanager.resource.cpu-vcores, it should be set to the number of processors on a single node in most cases (one virtual core corresponding to one physical processor). At the same time, it is recommended that yarn.scheduler.maximum-allocation-vcores be set to no more than yarn.nodemanager.resource.cpu-vcores.

For instance, if a single node has 24 processors, then an appropriate set of params can be set as follows:
mapreduce.map.cpu.vcores: 1
mapreduce.reduce.cpu.vcores: 1
yarn.app.mapreduce.am.resource.cpu-vcores: 1
yarn.scheduler.maximum-allocation-vcores: 24
yarn.scheduler.minimum-allocation-vcores: 1
yarn.nodemanager.resource.cpu-vcores: 24
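A sketch of the corresponding entries in yarn-site.xml for such a node (values taken from the table above):

```xml
<!-- yarn-site.xml sketch for a 24-processor node -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>24</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>24</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
</property>
```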

Related Posts:
·  Memory Configuration In Hadoop


Thursday, October 23, 2014

Hadoop_Troubleshooting: fair-scheduler.xml Does Not Take Effect After Revising

When using Fair Scheduler in YARN, we don't need to restart Hadoop cluster when fair-scheduler.xml is altered, as stated in official document:

The Fair Scheduler contains configuration in two places -- algorithm parameters are set in HADOOP_CONF_DIR/mapred-site.xml, while a separate XML file called the allocation file, located by default in HADOOP_CONF_DIR/fair-scheduler.xml, is used to configure pools, minimum shares, running job limits and preemption timeouts. The allocation file is reloaded periodically at runtime, allowing you to change pool settings without restarting your Hadoop cluster.

However, there are times when changes in fair-scheduler.xml don't take effect and we have no idea what's going wrong. Here's the way to find out!

Firstly, go into directory '$HADOOP_HOME/logs'.
cd $HADOOP_HOME/logs

Then open file 'yarn-hadoop-resourcemanager-*.log'.
vim yarn-hadoop-resourcemanager-[it_depends].log

In it, search for 'ERROR' from the bottom, looking for records like the following:
2014-10-24 10:39:39,037 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager: Failed to reload fair scheduler config file - will use existing allocations.
java.util.IllegalFormatConversionException: d != org.apache.hadoop.yarn.api.records.impl.pb.ResourcePBImpl
        at java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:4045)
        at java.util.Formatter$FormatSpecifier.printInteger(Formatter.java:2748)
        at java.util.Formatter$FormatSpecifier.print(Formatter.java:2702)
        at java.util.Formatter.format(Formatter.java:2488)
        at java.util.Formatter.format(Formatter.java:2423)
        at java.lang.String.format(String.java:2797)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.loadQueue(QueueManager.java:460)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.reloadAllocs(QueueManager.java:312)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.reloadAllocsIfNecessary(QueueManager.java:243)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:270)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
        at java.lang.Thread.run(Thread.java:722)

From above, we can see the exception is thrown at line 460 in class 'org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager'.

Tracking into the source code, we can find the corresponding statement:
LOG.warn(String.format("Queue %s has max resources %d less than min resources %d", queueName, maxQueueResources.get(queueName), minQueueResources.get(queueName)));
The '%d' conversions receive Resource objects rather than integers, which is what triggers the IllegalFormatConversionException; the underlying cause is a queue whose maxResources is less than its minResources in fair-scheduler.xml.


Wednesday, October 22, 2014

Keep the Original Format Of Text When Pasting to Vim (A.K.A. Turning Off Auto-Indent Feature Of Vim)

Sometimes when we paste text into vim, we would like to retain its original format. But by default, vim automatically indents the pasted text.

Solution 1:
Vim provides the 'paste' option to keep the pasted text unmodified. Turn it on before pasting and off afterwards:
:set paste
:set nopaste

Solution 2:
Alternatively, vim offers the 'pastetoggle' option to turn 'paste' on and off with a key press. Append the following line to ~/.vimrc; then pressing 'F2' switches between 'paste on' and 'paste off'.
set pastetoggle=<F2>


Generate Public Key From A Private Key

When we've got the private key, say id_rsa, we can regenerate the public key by:
ssh-keygen -f ./id_rsa -y > ./id_rsa.pub

It is well-explained in ‘man ssh-keygen’:
-y    This option will read a private OpenSSH format file and print an OpenSSH public key to stdout.
-f    Specify the private key file.
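A quick round-trip check (throwaway key in a temp dir; the key type and paths here are arbitrary):

```shell
# Generate a key pair, delete the public half, then regenerate it with -y.
tmp=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -f "$tmp/id_rsa"
awk '{print $1, $2}' "$tmp/id_rsa.pub" > "$tmp/expected"   # type + key material
rm "$tmp/id_rsa.pub"
ssh-keygen -f "$tmp/id_rsa" -y > "$tmp/id_rsa.pub"
awk '{print $1, $2}' "$tmp/id_rsa.pub" > "$tmp/actual"
cmp "$tmp/expected" "$tmp/actual" && echo "public key regenerated identically"
```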


Hadoop_Troubleshooting: Removing Queues From fair-scheduler.xml Does Not Take Effect On The YARN Monitoring Webpage

In fair-scheduler.xml, there's a queue named "test_queue", whose configuration is as below:
<queue name="test_queue">
  <minResources>120000 mb, 600vcores</minResources>
  <maxResources>200000 mb, 720vcores</maxResources>
</queue>

After deleting these settings, the queue is not removed from the YARN monitoring webpage, even though all the parameters (Min Resources, Max Resources, Fair Share) under "test_queue" are blank. I'm sure that fair-scheduler.xml was reloaded correctly.

Then I checked it with the command below; the queue state is running, just like all the other queues.
K11:/>hadoop queue -info root.test_queue
DEPRECATED: Use of this script to execute mapred command is deprecated.
Instead use the mapred command for it.
14/10/21 22:11:53 INFO client.RMProxy: Connecting to ResourceManager at server/ip:port
Queue Name : root.test_queue
Queue State : running
Scheduling Info : Capacity: 0.0, MaximumCapacity: UNDEFINED, CurrentCapacity: 0.0

Curiously enough, I tested whether I could still submit my hive task to this queue, and it STILL worked!

As you can see, the MaxResources of this queue bulges to 100% of the total resources after the queue is deleted from fair-scheduler.xml, so anyone can "escape" the ACL and use the resources arbitrarily. Consequently, attention should be paid to this scenario.

The solution is to restart the yarn service (stop-yarn.sh => start-yarn.sh), at the price of failing all ongoing tasks. (If anyone has a better solution, please let me know by leaving a message!)


Tuesday, October 21, 2014

Hadoop_Troubleshooting: 'AclSubmitApps' In fair-scheduler.xml Is Not Working.

After configuring "aclSubmitApps" for a specific queue in fair-scheduler.xml, I could still submit a hive task as a user who is not in the "aclSubmitApps" list, which is not expected according to the official document.

The configuration of my testmonitor queue (with an aclSubmitApps user list configured):
<queue name="testmonitor">
  <minResources>10000 mb,20vcores</minResources>
  <maxResources>30000 mb,50vcores</maxResources>
  <aclSubmitApps>...</aclSubmitApps>
</queue>

The YARN monitoring webpage nonetheless showed my hive task running in that queue.

The solution is simple and easy: add "aclAdministerApps" to the queue correspondingly:
<queue name="testmonitor">
  <minResources>10000 mb,20vcores</minResources>
  <maxResources>30000 mb,50vcores</maxResources>
  <aclSubmitApps>...</aclSubmitApps>
  <aclAdministerApps>...</aclAdministerApps>
</queue>

Then we can check the ACL settings of the queues via 'hadoop queue -showacls'. This time, for a user outside those lists, queue root.testmonitor grants neither SUBMIT_APPLICATIONS nor ADMINISTER_QUEUE.

When I submit a task as an unauthorized user, the following exception is thrown as expected:
java.io.IOException: Failed to run job : User supertool cannot submit applications to queue root.mbmonitor
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:299)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:430)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)

Honestly, I don't know exactly why "aclAdministerApps" has to be added to make it work; the official document says nothing about it either. If anyone knows the essential reason for this solution, please leave a message; I'd really appreciate it :)


Hadoop_Troubleshooting: Job hangs at "map 0% reduce 0%" with logs "Reduce slow start threshold not met"

When I submitted an example hadoop task as below:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi -Dmapred.job.queue.name=root.supertool 10 10000
The progress gets stuck at "map 0% reduce 0%", with job logs:
2014-10-22 13:10:46,703 INFO [Thread-48] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: mapResourceReqt:1536
2014-10-22 13:10:46,770 INFO [Thread-48] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: reduceResourceReqt:3072
2014-10-22 13:10:46,794 INFO [eventHandlingThread] org.apache.hadoop.conf.Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
2014-10-22 13:10:47,368 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:1 ScheduledMaps:10 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
2014-10-22 13:10:47,602 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1413952943653_0002: ask=8 release= 0 newContainers=0 finishedContainers=0 resourcelimit= knownNMs=6
2014-10-22 13:10:47,605 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
2014-10-22 13:10:47,605 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 0
2014-10-22 13:10:47,607 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=0
2014-10-22 13:10:47,607 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 1

After some googling and experimenting, it appears that a hadoop job hangs at "Reduce slow start threshold not met" when there are not enough resources, such as memory or vcores.

In my case, I rechecked $HADOOP_HOME/etc/hadoop/fair-scheduler.xml, and found that the vcores in the root.supertool queue were accidentally set to zero:
<queue name="supertool">
  <minResources>10000 mb,0vcores</minResources>
  <maxResources>90000 mb,0vcores</maxResources>
</queue>

When I set the vcores back to normal, the stuck job continued right away.

P.S. Besides the condition above, unreasonable memory or vcore configuration can also lead to this scenario. Please visit Memory Configuration in Hadoop and VCore Configuration in Hadoop for more reference. 

P.S. again: after I installed hadoop on my Mac and ran "hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi -Dmapred.job.queue.name=root.test 3 100000000", I found it hanging at "Reduce slow start threshold not met. completedMapsForReduceSlowstart 1" again, even though I had configured memory and vcores as instructed in the links above. It turned out that maxResources for queue root.test in fair-scheduler.xml was set to 500MB, while 'yarn.scheduler.minimum-allocation-mb', 'mapreduce.map.memory.mb' and 'mapreduce.reduce.memory.mb' were all above 500MB; that is to say, not even a single mapper or reducer could be allocated in this queue. Consequently, be aware that the maxResources of a queue must be greater than all three of those parameters.
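As a hypothetical illustration of the fix (the values below are illustrative, not taken from my cluster), the queue's maxResources must be large enough for at least one container:

```xml
<!-- fair-scheduler.xml sketch: give root.test room for at least one
     container, assuming the three memory params above are <= 1024 mb -->
<queue name="test">
  <minResources>1024 mb,1vcores</minResources>
  <maxResources>4096 mb,4vcores</maxResources>
</queue>
```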


Memory Configuration In Hadoop

In this post, there are some recommendations on how to configure YARN and MapReduce memory allocation settings based on the node hardware specifications.
YARN takes into account all of the available compute resources on each machine in the cluster. Based on the available resources, YARN negotiates resource requests from applications (such as MapReduce) running in the cluster. YARN then provides processing capacity to each application by allocating Containers. A Container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, cpu etc.).
In a Hadoop cluster, it is vital to balance the usage of memory (RAM), processors (CPU cores) and disks so that processing is not constrained by any one of these cluster resources. As a general recommendation, allowing for two Containers per disk and per core gives the best balance for cluster utilization.
When determining the appropriate YARN and MapReduce memory configurations for a cluster node, start with the available hardware resources. Specifically, note the following values on each node:
  • RAM (Amount of memory)
  • CORES (Number of CPU cores)
  • DISKS (Number of disks)
The total available RAM for YARN and MapReduce should take into account the Reserved Memory. Reserved Memory is the RAM needed by system processes and other Hadoop processes (such as HBase).
Reserved Memory = Reserved for stack memory + Reserved for HBase Memory (If HBase is on the same node)

Use the following table to determine the Reserved Memory per node.
Reserved Memory Recommendations
Total Memory per Node | Recommended Reserved System Memory | Recommended Reserved HBase Memory
4 GB                  | 1 GB                               | 1 GB
8 GB                  | 2 GB                               | 1 GB
16 GB                 | 2 GB                               | 2 GB
24 GB                 | 4 GB                               | 4 GB
48 GB                 | 6 GB                               | 8 GB
64 GB                 | 8 GB                               | 8 GB
72 GB                 | 8 GB                               | 8 GB
96 GB                 | 12 GB                              | 16 GB
128 GB                | 24 GB                              | 24 GB
256 GB                | 32 GB                              | 32 GB
512 GB                | 64 GB                              | 64 GB
The next calculation is to determine the maximum number of containers allowed per node. The following formula can be used:
# of containers = min (2*CORES, 1.8*DISKS, (Total available RAM) / MIN_CONTAINER_SIZE)

Where MIN_CONTAINER_SIZE is the minimum container size (in RAM). This value is dependent on the amount of RAM available -- in smaller memory nodes, the minimum container size should also be smaller. The following table outlines the recommended values:
Total RAM per Node     | Recommended Minimum Container Size
Less than 4 GB         | 256 MB
Between 4 GB and 8 GB  | 512 MB
Between 8 GB and 24 GB | 1024 MB
Above 24 GB            | 2048 MB
The final calculation is to determine the amount of RAM per container:
RAM-per-container = max(MIN_CONTAINER_SIZE, (Total Available RAM) / containers)
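The two formulas above can be sketched in Python (a minimal sketch; the function and variable names are mine, not part of any Hadoop API):

```python
def containers_per_node(cores, disks, available_ram_gb, min_container_gb):
    """Maximum number of containers per node.

    available_ram_gb is the total RAM minus the Reserved Memory."""
    return int(min(2 * cores, 1.8 * disks, available_ram_gb / min_container_gb))

def ram_per_container(available_ram_gb, containers, min_container_gb):
    """RAM (in GB) to allocate to each container."""
    return max(min_container_gb, available_ram_gb / containers)

# Example: 12 cores, 12 disks, 48 GB RAM, 6 GB reserved, 2 GB min container
n = containers_per_node(12, 12, 48 - 6, 2)  # min(24, 21.6, 21) = 21
r = ram_per_container(48 - 6, n, 2)         # max(2, 2.0) = 2.0
```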
With these calculations, the YARN and MapReduce configurations can be set:
Configuration File    | Configuration Setting                | Value Calculation
yarn-site.xml         | yarn.nodemanager.resource.memory-mb  | = containers * RAM-per-container
yarn-site.xml         | yarn.scheduler.minimum-allocation-mb | = RAM-per-container
yarn-site.xml         | yarn.scheduler.maximum-allocation-mb | = containers * RAM-per-container
mapred-site.xml       | mapreduce.map.memory.mb              | = RAM-per-container
mapred-site.xml       | mapreduce.reduce.memory.mb           | = 2 * RAM-per-container
mapred-site.xml       | mapreduce.map.java.opts              | = 0.8 * RAM-per-container
mapred-site.xml       | mapreduce.reduce.java.opts           | = 0.8 * 2 * RAM-per-container
yarn-site.xml (check) | yarn.app.mapreduce.am.resource.mb    | = 2 * RAM-per-container
yarn-site.xml (check) | yarn.app.mapreduce.am.command-opts   | = 0.8 * 2 * RAM-per-container
Note: After installation, both yarn-site.xml and mapred-site.xml are located in the /etc/hadoop/conf folder.
As an example, suppose cluster nodes have 12 CPU cores, 48 GB RAM, and 12 disks.
Reserved Memory = 6 GB reserved for system memory + (if HBase) 8 GB for HBase
Min container size = 2 GB
If there is no HBase:
# of containers = min (2*12, 1.8* 12, (48-6)/2) = min (24, 21.6, 21) = 21
RAM-per-container = max (2, (48-6)/21) = max (2, 2) = 2
Configuration                        | Value Calculation
yarn.nodemanager.resource.memory-mb  | = 21 * 2 = 42*1024 MB
yarn.scheduler.minimum-allocation-mb | = 2*1024 MB
yarn.scheduler.maximum-allocation-mb | = 21 * 2 = 42*1024 MB
mapreduce.map.memory.mb              | = 2*1024 MB
mapreduce.reduce.memory.mb           | = 2 * 2 = 4*1024 MB
mapreduce.map.java.opts              | = 0.8 * 2 = 1.6*1024 MB
mapreduce.reduce.java.opts           | = 0.8 * 2 * 2 = 3.2*1024 MB
yarn.app.mapreduce.am.resource.mb    | = 2 * 2 = 4*1024 MB
yarn.app.mapreduce.am.command-opts   | = 0.8 * 2 * 2 = 3.2*1024 MB
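Plugging the no-HBase values in, the corresponding settings would look roughly like this (a sketch of the two config files; values are in MB, and the java.opts figures are rounded):

```xml
<!-- yarn-site.xml -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>43008</value>  <!-- 42 * 1024 -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>43008</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value>  <!-- 0.8 * 2048, rounded -->
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3277m</value>  <!-- 0.8 * 4096, rounded -->
</property>
```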
If HBase is included:
# of containers = min (2*12, 1.8* 12, (48-6-8)/2) = min (24, 21.6, 17) = 17
RAM-per-container = max (2, (48-6-8)/17) = max (2, 2) = 2
Configuration                        | Value Calculation
yarn.nodemanager.resource.memory-mb  | = 17 * 2 = 34*1024 MB
yarn.scheduler.minimum-allocation-mb | = 2*1024 MB
yarn.scheduler.maximum-allocation-mb | = 17 * 2 = 34*1024 MB
mapreduce.map.memory.mb              | = 2*1024 MB
mapreduce.reduce.memory.mb           | = 2 * 2 = 4*1024 MB
mapreduce.map.java.opts              | = 0.8 * 2 = 1.6*1024 MB
mapreduce.reduce.java.opts           | = 0.8 * 2 * 2 = 3.2*1024 MB
yarn.app.mapreduce.am.resource.mb    | = 2 * 2 = 4*1024 MB
yarn.app.mapreduce.am.command-opts   | = 0.8 * 2 * 2 = 3.2*1024 MB
Linked from http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-
