Sunday, 24 April 2016
HDFS Commands Reference
In this article, the basic syntax of the Hadoop file system (HDFS) shell is explained with examples. This is useful for beginners who want to explore the big data world, and HDFS is the gateway to that world.
Hadoop is open-source software (a Java framework) that runs on a cluster of commodity hardware machines. It provides both storage (HDFS) and processing (MapReduce) in a distributed manner, and it is capable of processing huge volumes of data, ranging from gigabytes to petabytes.
HDFS Commands
hdfs dfs
hadoop fs
Both forms are equivalent for file system operations; the examples below use hdfs dfs, but hadoop fs works the same way.
1. Creating a directory in HDFS
Syntax:
- hdfs dfs -mkdir <directory name along with path details>
Example:
- hdfs dfs -mkdir /user/root/hadoop_mahendhar
2. Listing the contents of a Hadoop directory
Syntax:
- hdfs dfs -ls <absolute path of the file or directory>
Example:
- hdfs dfs -ls /user/root/hadoop_mahendhar
3. Create a file in the local file system and put it in HDFS
Create a file in the local file system with vi <file_name>, add some text, then save and exit.
Example: vi First_hadoop.txt
Putting the local file into the Hadoop file system
Syntax:
- hdfs dfs -put <local file path with file name> <HDFS destination path>
Example:
- hdfs dfs -put First_hadoop.txt /user/root/hadoop_mahendhar
4. Moving a local file to the Hadoop file system
Syntax:
- hdfs dfs -moveFromLocal <local file path with file name> <HDFS destination path>
Example:
- hdfs dfs -moveFromLocal /root/Second_hadoop.txt /user/root/hadoop_mahendhar
Note:
1. Before executing the above command, ensure that the Second_hadoop.txt file has been created in the local file system.
2. This operation moves the local file, so no local copy of the file exists after this operation.
5. Listing all directories and sub-directories recursively
Syntax:
- hdfs dfs -lsr <hadoop directory>
Example:
- hdfs dfs -lsr /user/root/hadoop_mahendhar/
- Note: Create more directories and sub-directories to validate this command properly. In newer Hadoop releases -lsr is deprecated in favour of hdfs dfs -ls -R, which behaves the same way.
6. Check the size of a file in HDFS
Syntax:
- hdfs dfs -du <file path with file name>
Example:
- hdfs dfs -du /user/root/hadoop_mahendhar/
7. Download a file from HDFS to the local file system
Syntax:
- hdfs dfs -get <HDFS file path with file name> <local destination path>
Example:
- hdfs dfs -get /user/root/hadoop_mahendhar/Second_hadoop.txt /root/local_files/
8. Getting a directory of files from HDFS and merging them into a single file in the local file system
Syntax:
- hdfs dfs -getmerge <HDFS source directory> <local destination file> [addnl]
Example:
- hdfs dfs -getmerge /user/root/hadoop_mahendhar/ /root/local_files/hadoop_merge_file.txt
Note:
1. The addnl argument is optional; it just adds a newline at the end of each merged file.
2. Before this, make sure you have created 2-3 files in HDFS so that you can validate the merged file's content and size against the originals.
9. Copying data from one cluster/node to another in HDFS
Syntax:
- hadoop distcp <source path> <destination path>
Example:
- hadoop distcp hdfs://namenode1:8020/user/root/hadoop_mahendhar hdfs://namenode2:8020/user/root/
Note: DistCp is a separate tool invoked as hadoop distcp, not an hdfs dfs sub-command; the namenode hostnames above are placeholders for your own cluster addresses.
10. Display the content of a file
Syntax:
- hdfs dfs -cat <file path with file name>
Example:
- hdfs dfs -cat /user/root/hadoop_mahendhar/First_hadoop.txt
11. Change the group, owner, or permissions of a file or directory
Syntax:
- hdfs dfs -chgrp [-R] <new group name> <file or directory>
- hdfs dfs -chmod [-R] <mode> <file or directory>
- hdfs dfs -chown [-R] <new owner>[:<new group>] <file or directory>
Example:
- hdfs dfs -chmod -R 755 /user/root/hadoop_mahendhar
12. Copying and Moving files within HDFS
Syntax:
- hdfs dfs -cp <source file path> <destination path>
- hdfs dfs -mv <source file path> <destination path>
13. Empty the Hadoop trash
Syntax:
- hdfs dfs -expunge
Saturday, 23 April 2016
Apache Hadoop Abbreviations/Terms
Hadoop Terms
HDFS - Hadoop Distributed File System
GFS - Google File System
NN - NameNode
DN - DataNode
SNN - Secondary NameNode
JT - JobTracker
TT - TaskTracker
HA NN - Highly Available NameNode (or NN HA - NameNode High Availability)
REST - Representational State Transfer
HiveQL - Hive Query Language
HAR - Hadoop Archive
ORC - Optimized Row Columnar
JSON - JavaScript Object Notation
CDH - Cloudera's Distribution Including Apache Hadoop
ZKFC - ZooKeeper Failover Controller
FUSE - Filesystem in Userspace
YARN - Yet Another Resource Negotiator
Amazon EC2 - Amazon Elastic Compute Cloud
Amazon S3 - Amazon Simple Storage Service
WASB - Windows Azure Storage Blobs
EMR - Elastic MapReduce
JAR - Java ARchive
RPC - Remote Procedure Call
UDF - User-Defined Function
ETL - Extract/Transform/Load
Hadoop versions
This article will help you understand all the Apache Hadoop versions released so far.
Hadoop 2.7.2 (released on 25 January, 2016) - 2.X.X current stable version
Hadoop 2.7.1 (released on 06 July, 2015)
Hadoop 2.7.0 (released on 21 April, 2015)
Hadoop 2.6.4 (released on 11 February, 2016)
Hadoop 2.6.3 (released on 17 December, 2015)
Hadoop 2.6.2 (released on 28 October, 2015)
Hadoop 2.6.1 (released on 23 September, 2015)
Hadoop 2.6.0 (released on 18 November, 2014)
Hadoop 2.5.2 (released on 19 November, 2014)
Hadoop 2.5.1 (released on 12 September, 2014)
Hadoop 2.5.0 (released on 11 August, 2014)
Hadoop 2.4.1 (released on 30 June, 2014)
Hadoop 2.4.0 (released on 07 April, 2014)
Hadoop 2.3.0 (released on 20 February, 2014)
Hadoop 2.2.0 (released on 15 October, 2013)
Hadoop 2.1.1 (released on 23 September, 2013) - 2.X.X beta version
Hadoop 2.1.0 (released on 25 August, 2013)
Hadoop 2.0.6 (released on 23 August, 2013) - 2.X.X alpha version
Hadoop 2.0.5 (released on 6 June, 2013)
Hadoop 2.0.4 (released on 25 April, 2013)
Hadoop 2.0.3-alpha (released on 14 February, 2013)
Hadoop 2.0.2-alpha (released on 9 October, 2012)
Hadoop 2.0.1-alpha (released on 26 July, 2012)
Hadoop 2.0.0-alpha (released on 23 May, 2012)
Hadoop 1.2.1 (released on 1 August, 2013) - 1.2.1 stable version
Hadoop 1.2.0 (released on 13 May, 2013)
Hadoop 1.1.2 (released on 15 February, 2013) - 1.1.X beta version
Hadoop 1.1.1 (released on 1 December, 2012)
Hadoop 1.1.0 (released on 13 October, 2012)
Hadoop 1.0.4 (released on 12 October, 2012) - 1.0.X stable version
Hadoop 1.0.3 (released on 16 May, 2012)
Hadoop 1.0.2 (released on 3 April, 2012)
Hadoop 1.0.1 (released on 10 March, 2012)
Hadoop 1.0.0 (released on 27 December, 2011)
Hadoop 0.23.11 (released on 27 June, 2014)
Hadoop 0.23.10 (released on 11 December, 2013)
Hadoop 0.23.9 (released on 8 July, 2013)
Hadoop 0.23.8 (released on 5 June, 2013)
Hadoop 0.23.7 (released on 18 April, 2013)
Hadoop 0.23.6 (released on 7 February, 2013) - 0.23.X similar to 2.X.X but missing NN HA
Hadoop 0.23.5 (released on 28 November, 2012)
Hadoop 0.23.4 (released on 15 October, 2012)
Hadoop 0.23.3 (released on 17 September, 2012)
Hadoop 0.23.1 (released on 27 February, 2012)
Hadoop 0.22.0 (released on 10 December, 2011) - 0.22.X does not include security
Hadoop 0.23.0 (released on 11 November, 2011)
Hadoop 0.20.205.0 (released on 17 October, 2011)
Hadoop 0.20.204.0 (released on 5 September, 2011)
Hadoop 0.20.203.0 (released on 11 May, 2011) - 0.20.203.X old legacy stable version
Hadoop 0.21.0 (released on 23 August, 2010)
Hadoop 0.20.2 (released on 26 February, 2010) - 0.20.X old legacy version
Hadoop 0.20.1 (released on 14 September, 2009)
Hadoop 0.19.2 (released on 23 July, 2009)
Hadoop 0.20.0 (released on 22 April, 2009)
Hadoop 0.19.1 (released on 24 February, 2009)
Hadoop 0.18.3 (released on 29 January, 2009)
Hadoop 0.19.0 (released on 21 November, 2008)
Hadoop 0.18.2 (released on 3 November, 2008)
Hadoop 0.18.1 (released on 17 September, 2008)
Hadoop 0.18.0 (released on 22 August, 2008)
Hadoop 0.17.2 (released on 19 August, 2008)
Hadoop 0.17.1 (released on 23 June, 2008)
Hadoop 0.17.0 (released on 20 May, 2008)
Hadoop 0.16.4 (released on 5 May, 2008)
Hadoop 0.16.3 (released on 16 April, 2008)
Hadoop 0.16.2 (released on 2 April, 2008)
Hadoop 0.16.1 (released on 13 March, 2008)
Hadoop 0.16.0 (released on 7 February, 2008)
Hadoop 0.15.3 (released on 18 January, 2008)
Hadoop 0.15.2 (released on 2 January, 2008)
Hadoop 0.15.1 (released on 27 November, 2007)
Hadoop 0.14.4 (released on 26 November, 2007)
Hadoop 0.15.0 (released on 29 October, 2007)
Hadoop 0.14.3 (released on 19 October, 2007)
Hadoop 0.14.1 (released on 4 September, 2007)
Thursday, 21 April 2016
How to use the UNIX tool AWK
This article will help you understand how to work with the AWK utility in UNIX. It also explains the meaning of some of the AWK built-in variables.
These AWK one-liners give basic, random examples that will help you understand the basics of this UNIX tool.
Meaning of some of the Awk Built-in Variables used below:
NF : Number of fields in current line/record
NR : Ordinal number of current line/record
FS : Field Separator (Also -F can be used)
OFS : Output Field Separator (default=blank)
FILENAME : Name of current input file
All of the following AWK one-liner examples are based on the input file 'test1.txt':
Continent:Val
AS:12000
AF:9800
AS:12300
NA:3400
OC:12000
AF:500
AS:1000
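To follow along, the input file can be recreated locally. The here-document below is a sketch of how to do that; it includes a final AS:1000 record because the outputs later in this article show eight lines and refer to AS:1000.

```shell
# Recreate the sample input file test1.txt used by all the one-liners below.
cat > test1.txt <<'EOF'
Continent:Val
AS:12000
AF:9800
AS:12300
NA:3400
OC:12000
AF:500
AS:1000
EOF

# Sanity check: the file should have 8 colon-delimited lines.
wc -l < test1.txt
```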
Scenario: Print 'line number' NR and 'number of fields' NF for each line
Command: awk -F ":" '{print NR,NF}' test1.txt
Output:
1 2
2 2
3 2
4 2
5 2
6 2
7 2
8 2
Scenario: Print the first field, colon delimited
Command: awk -F ":" '{print $1}' test1.txt
Output:
Continent
AS
AF
AS
NA
OC
AF
AS
Scenario: Print the first field, colon delimited, but excluding the first line (NR!=1)
Command: awk -F ":" 'NR!=1 {print $1}' test1.txt
Output:
AS
AF
AS
NA
OC
AF
AS
Scenario: Print FILENAME, but only once, using the END clause
Command: awk -F ":" 'END {print FILENAME}' test1.txt
Output:
test1.txt
Scenario: Print the last field of each line; same as printing $2, as there are only 2 fields
Command: awk -F ":" '{print $NF}' test1.txt
Output:
Val
12000
9800
12300
3400
12000
500
1000
Scenario: Matching; print lines beginning with "AS"
Command: awk -F ":" '/^AS/' test1.txt
Output:
AS:12000
AS:12300
AS:1000
Scenario: Matching; print lines not beginning with "AS"
Command: awk -F ":" '!/^AS/' test1.txt
Output:
Continent:Val
AF:9800
NA:3400
OC:12000
AF:500
Scenario: Direct matching; first field equals "AS"
Command: awk -F ":" '$1=="AS"' test1.txt
Output:
AS:12000
AS:12300
AS:1000
Scenario: Direct matching; first field equals "AS"; print the 2nd column
Command: awk -F ":" '$1=="AS" {print $2}' test1.txt
Output:
12000
12300
1000
Scenario: $0 prints the full line, same as {print}
Command 1: awk -F ":" '$1=="AS" {print $0}' test1.txt
Output:
AS:12000
AS:12300
AS:1000
Command 2: awk -F ":" '$1=="AS" {print}' test1.txt
Output:
AS:12000
AS:12300
AS:1000
Scenario: 'OR' and 'AND' together
Command: awk -F ":" '($1=="AS" || $1=="OC") && $NF > 11000 {print}' test1.txt
Output:
AS:12000
AS:12300
OC:12000
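The compound condition can be sanity-checked against a throwaway copy of the sample data; the /tmp path below is just an arbitrary scratch location, not part of the original article.

```shell
# Self-contained copy of the sample data.
printf '%s\n' 'Continent:Val' 'AS:12000' 'AF:9800' 'AS:12300' \
              'NA:3400' 'OC:12000' 'AF:500' 'AS:1000' > /tmp/awk_demo_or_and.txt

# Keep rows whose first field is AS or OC AND whose numeric last field
# exceeds 11000; AS:1000 is dropped by the second condition.
awk -F ":" '($1=="AS" || $1=="OC") && $NF > 11000 {print}' /tmp/awk_demo_or_and.txt
```

Note that the header line is excluded automatically: its first field is "Continent", which matches neither equality test.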
Scenario: Partial matching; first field contains "A"
Command: awk -F ":" '$1 ~ /A/ {print}' test1.txt
Output:
AS:12000
AF:9800
AS:12300
NA:3400
AF:500
AS:1000
Scenario: Reading from standard input (a pipe)
Command: cat test1.txt | awk -F ":" '!/Continent/ {print $1}' | sort | uniq
Output:
AF
AS
NA
OC
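The cat in the pipeline above is not strictly needed: awk can read the file directly, and an associative array can replace sort | uniq. A sketch (the /tmp file name is arbitrary; note this version prints names in order of first appearance rather than sorted):

```shell
# Self-contained copy of the sample data.
printf '%s\n' 'Continent:Val' 'AS:12000' 'AF:9800' 'AS:12300' \
              'NA:3400' 'OC:12000' 'AF:500' 'AS:1000' > /tmp/awk_demo_uniq.txt

# Skip the header, then print each continent code only the first time it is
# seen: seen[$1]++ evaluates to 0 (false) on first sight, so !seen[$1]++
# is true exactly once per distinct value.
awk -F ":" 'NR!=1 && !seen[$1]++ {print $1}' /tmp/awk_demo_uniq.txt
```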
Scenario: Add 1000 to the 2nd field where the first field is "AF", then print all lines
Command: awk -F ":" '$1=="AF" {$2+=1000} {print}' test1.txt
Output:
Continent:Val
AS:12000
AF 10800
AS:12300
NA:3400
OC:12000
AF 1500
AS:1000
(The modified lines print with a space because assigning to a field makes awk rebuild the record using the output field separator, which defaults to a space.)
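To keep the colon delimiter on the rebuilt lines, OFS can be set in a BEGIN block. A self-contained sketch (the /tmp file name is arbitrary):

```shell
# Self-contained copy of the sample data.
printf '%s\n' 'Continent:Val' 'AS:12000' 'AF:9800' 'AS:12300' \
              'NA:3400' 'OC:12000' 'AF:500' 'AS:1000' > /tmp/awk_demo_ofs.txt

# Assigning to $2 forces awk to rebuild the record using OFS; with OFS=":",
# the rebuilt AF lines keep the colon, and untouched lines are printed as-is.
awk -F ":" 'BEGIN {OFS=":"} $1=="AF" {$2+=1000} {print}' /tmp/awk_demo_ofs.txt
```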
Scenario: Sum of the 2nd fields, excluding the first line
Command: awk -F ":" 'NR!=1 {sum+=$NF} END {print sum}' test1.txt
Output:
51000
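The total checks out by hand: 12000 + 9800 + 12300 + 3400 + 12000 + 500 + 1000 = 51000. A self-contained version of the one-liner (the /tmp file name is arbitrary):

```shell
# Self-contained copy of the sample data.
printf '%s\n' 'Continent:Val' 'AS:12000' 'AF:9800' 'AS:12300' \
              'NA:3400' 'OC:12000' 'AF:500' 'AS:1000' > /tmp/awk_demo_sum.txt

# Skip the header line, accumulate the last field, print the total at EOF.
awk -F ":" 'NR!=1 {sum+=$NF} END {print sum}' /tmp/awk_demo_sum.txt
```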
Scenario: Set the 2nd field to 0 where the first field is "AS"
Command: awk -F ":" 'BEGIN {OFS=":"} $1=="AS" {$2=0} {print}' test1.txt
Output:
Continent:Val
AS:0
AF:9800
AS:0
NA:3400
OC:12000
AF:500
AS:0