Wednesday 12 July 2017

How to Change the Priority of Hadoop Jobs (Hive, MapReduce, Pig)?

Hadoop provides a command-line utility for changing the priority of a running job. There are five priority levels, listed below from highest to lowest:

VERY_HIGH
HIGH
NORMAL
LOW
VERY_LOW

The syntax for changing the priority of a running Hadoop job is shown below:

 hadoop job -set-priority <job-id> <priority>  

Example:

 hadoop job -set-priority job_20170111540_64444 VERY_HIGH  

As the names suggest, jobs with priority level VERY_HIGH are given the highest priority and jobs with priority level VERY_LOW are given the lowest.
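
The command above can be wrapped in a small shell helper that rejects anything other than the five levels before calling Hadoop. This is only a sketch: the function names is_valid_priority and set_job_priority and the DRY_RUN variable are invented for illustration and are not part of Hadoop. With DRY_RUN left at its default, the command is printed rather than run.

```shell
# Hypothetical helper, not part of Hadoop.
# Returns 0 only for the five priority levels listed above.
is_valid_priority() {
    case "$1" in
        VERY_HIGH|HIGH|NORMAL|LOW|VERY_LOW) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: set_job_priority <job-id> <priority>
# DRY_RUN defaults to 1, so the command is printed instead of executed.
set_job_priority() {
    job_id=$1
    # Upper-case the level so "high" and "HIGH" are treated the same.
    priority=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')

    if ! is_valid_priority "$priority"; then
        echo "invalid priority: $2" >&2
        return 1
    fi

    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "hadoop job -set-priority $job_id $priority"
    else
        hadoop job -set-priority "$job_id" "$priority"
    fi
}
```

For example, set_job_priority job_20170111540_64444 very_high prints the same command as the example above, with the level upper-cased. Set DRY_RUN=0 to actually run it.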

Priority in MapReduce Program:

To set the priority for a job on a Hadoop 1 cluster, use the following property:

 -> SET mapred.job.priority=<priority_value>;  

To set the priority for a job on a Hadoop 2 cluster, use the following property:

 -> SET mapreduce.job.priority=<priority_value>;  
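
When a MapReduce job is launched from the command line, one common way to pass such a property is the generic -D option. The sketch below picks the property name by Hadoop major version, as described above. The function name priority_option is hypothetical, and the -D option only takes effect if the job's driver parses generic options (for example through ToolRunner), which is an assumption about your program.

```shell
# Hypothetical helper: print the -D option for a given Hadoop major version.
# Usage: priority_option <hadoop-major-version> <priority>
priority_option() {
    major=$1
    priority=$2
    case "$major" in
        1) prop="mapred.job.priority" ;;      # Hadoop 1 property name
        *) prop="mapreduce.job.priority" ;;   # Hadoop 2 property name
    esac
    printf '%s %s=%s\n' "-D" "$prop" "$priority"
}

# Example (jar, class, and paths are placeholders):
# hadoop jar my-job.jar MyDriver $(priority_option 2 HIGH) /input /output
```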

Priority in Pig Program:

The following property sets the job priority in a Pig program:

 grunt> SET job.priority 'high'  

The following <priority_value> values are supported:
very_low, low, normal, high, very_high

Priority for Hive Query:

To set the priority for a Hive query on a Hadoop 1 cluster, use the following:

 hive> SET mapred.job.priority=VERY_HIGH;  

To set the priority for a Hive query on a Hadoop 2 cluster, use the following:

 hive> SET mapreduce.job.priority=VERY_HIGH;  


If you face any problems changing the priority of Hadoop jobs, please leave a comment here.

Sunday 8 January 2017

Difference between filter and where in Scala Spark SQL?

The Spark documentation for where says: "Filters rows using the given condition. This is an alias for filter."

So there is no real difference between filter and where in Spark SQL: both give the same result.

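// Requires: import spark.implicits._ (for the $"..." column syntax)
// The following are equivalent:
employees.filter($"age" > 15)
employees.where($"age" > 15)
// items is assumed to be a Seq of emp_id values
employees.filter($"emp_id".isin(items: _*)).show
employees.where($"emp_id".isin(items: _*)).show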

The result is the same for both. filter is simply the standard Scala name for such a function, and where is for people who prefer SQL.