Tuesday 17 November 2015

Sqoop does not support Hive external tables at the moment

Sqoop does not support creating Hive external tables. Instead, you can work around it in two steps:

Step 1: Import the data from MySQL into a Hive (managed) table.

sqoop import --connect jdbc:mysql://localhost/<database> --username training --password training --table <mysql_table> --hive-import --hive-table <hive_table> -m 1 --fields-terminated-by ','

Step 2: In Hive, change the table type from managed to external.

ALTER TABLE <table_name> SET TBLPROPERTIES('EXTERNAL'='TRUE');
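To confirm that the change took effect, you can check the table type from the shell. This is just a quick sanity check; the exact label text can vary between Hive versions, and <table_name> is the same placeholder used above:

hive -e "DESCRIBE FORMATTED <table_name>;" | grep -i "Table Type"
# Expected after the ALTER statement: Table Type: EXTERNAL_TABLE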

Refer to SQOOP-816 for details.

Monday 16 November 2015

Shell scripting to check the memory space in Unix

This script checks the memory and swap space usage of the system.

#!/bin/bash
##################################################
# Author: java2bigdata                           #
# Description: check the memory space in Unix    #
##################################################
# Total memory space details

echo "Memory Space Details"
free -t -m | grep "Total" | awk '{ print "Total Memory space : "$2 " MB";
print "Used Memory Space : "$3" MB";
print "Free Memory : "$4" MB";
}'

echo "Swap memory Details"
free -t -m | grep "Swap" | awk '{ print "Total Swap space : "$2 " MB";
print "Used Swap Space : "$3" MB";
print "Free Swap : "$4" MB";

}'
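
To try the script, save it to a file, make it executable, and run it (the file name mem_check.sh is just an example):

chmod +x mem_check.sh
./mem_check.sh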




Delete Files Older Than 2 years on Linux

This logic automates the deletion of unnecessary files that keep accumulating on the server, without any manual intervention.

// To delete the Cost* files older than 2 years directly under /usr/test/reports:

find /usr/test/reports -maxdepth 1 -type f -name 'Cost*' -mtime +730 -print -exec rm {} + > log.txt

// The above command deletes all the files matching the criteria and, thanks to the -print action, writes the names of the deleted files to log.txt. We can also use \; instead of + . The only difference is that \; runs rm once for each file as it is found, while + collects the file names and removes them all with a single rm at the end.

// To delete all the files directly under /usr/test/reports older than 2 years:

find /usr/test/reports -maxdepth 1 -type f -mtime +730 -exec rm {} \;

// To delete all the files under /usr/test/reports older than 2 years (including subdirectories):

find /usr/test/reports -type f -mtime +730 -exec rm {} \;

Note: the statements after // are comments describing the commands; they are not part of the script.
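
Before scheduling any of these commands (for example from cron), it is safer to do a dry run first that only lists what would be removed, using the same criteria without the rm:

find /usr/test/reports -maxdepth 1 -type f -name 'Cost*' -mtime +730 -print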



Sunday 15 November 2015

Checking Disk Space using Java program

With this code you can check the total, free, and usable disk space of a drive or directory.

/** To Check the Disk Space **/
package in.blogspot.java2bigdata;

import java.io.File;
public class DiskSpaceCheck {

    public DiskSpaceCheck() {
        // Note: all values returned by these methods are in bytes
        File file = new File("E:");
        System.out.println("E:");
        System.out.println("Total:  " + file.getTotalSpace());
        System.out.println("Free:   " + file.getFreeSpace());
        System.out.println("Usable: " + file.getUsableSpace());

        file = new File("E://movie");
        System.out.println("E://movie");
        System.out.println("Total:  " + file.getTotalSpace());
        System.out.println("Free:   " + file.getFreeSpace());
        System.out.println("Usable: " + file.getUsableSpace());

        file = new File("/");
        System.out.println("/");
        System.out.println("Total:  " + file.getTotalSpace());
        System.out.println("Free:   " + file.getFreeSpace());
        System.out.println("Usable: " + file.getUsableSpace());
    }

    public static void main(String[] args) {
        new DiskSpaceCheck();
    }
}


Saturday 14 November 2015

Shell scripting routines

Most programming languages have a set of "best practices" that should be followed when writing code in that language. Based on my experience, I have decided to mention some generic best practices for shell scripting.

Here is my list of best routines for shell scripting (a short sketch illustrating most of them follows the list):

1. Use functions
2. Document your functions
3. Use shift to read function arguments
4. Declare your variables
5. Quote all parameter expansions
6. Use arrays where appropriate
7. Use "$@" to refer to all arguments
8. Use uppercase variable names for environment variables only
9. Prefer shell builtins over external programs
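
A minimal sketch illustrating most of these routines; the function and variable names are made up for illustration, and echo, local, shift and declare are all shell builtins:

#!/bin/bash

# print_header: print a banner line followed by each remaining argument.
# Arguments: $1 = banner text, $2..$n = items to print
print_header() {
  local banner="$1"   # variables declared local to the function
  shift               # drop the banner; "$@" now holds only the items

  echo "===== ${banner} ====="
  local item
  for item in "$@"; do      # quoting "$@" keeps items containing spaces intact
    echo "- ${item}"
  done
}

# Arrays: collect values first, then pass them all to the function.
declare -a reports=("cost report.txt" "usage report.txt")
print_header "Reports" "${reports[@]}"

# Uppercase names are reserved for environment variables such as HOME or PATH;
# ordinary script variables stay lowercase (banner, item, reports above).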

Basic shell scripting.......

==============================================================================
Variables:
==============================================================================
variable="value"
variable=`unix command`

index=1

==============================================================================
Incrementing a numeric variable:
==============================================================================
((index++))        # or: index=$((index + 1))

==============================================================================
Arrays:
==============================================================================
variable[1]="value"
variable[2]=`unix command`

==============================================================================
Print to console:
==============================================================================
echo "Prining my first line.."
echo $variableName

echo ${variable[${index}]}


==============================================================================
Generating a unique Number/Name:
==============================================================================
date +%Y%m%d%H%M
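
For example, the timestamp can be embedded in a file name to make it unique (the report_ prefix below is just an illustration):

ts=$(date +%Y%m%d%H%M)
filename="report_${ts}.log"
echo "${filename}"    # prints report_<YYYYMMDDHHMM>.log for the current date and time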

How to add a new column to a table in Hive

Use the following command to add a new column to a table in Hive

    ALTER TABLE students ADD COLUMNS (new_col INT);

How to copy a table in Hive to another table or create temp table?


If you want to store the results of a query in another Hive table:

    Create the schema of the temp table first, using a CREATE TABLE .. statement.

    Then execute the following command:

INSERT OVERWRITE TABLE temp_tablename SELECT * FROM table_name LIMIT 10;
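
A minimal worked example using the hive command line, with one hypothetical name (temp_students is made up; students is the table used earlier in this post):

hive -e "CREATE TABLE temp_students LIKE students;"
hive -e "INSERT OVERWRITE TABLE temp_students SELECT * FROM students LIMIT 10;"

CREATE TABLE ... LIKE copies only the schema of the source table, so you do not have to write out the column list by hand.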

How to create output in gzip files in Hadoop Hive?

Sometimes it is required to output Hive results as gzip files to reduce the file size so that the files can be transferred over the network.
To do this, set the following options in Hive before running the query. The code below sets the options and then runs the Hive query; its output will be stored in gzip files.

set mapred.output.compress=true;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;
INSERT OVERWRITE DIRECTORY 'hive_out' SELECT * FROM tables LIMIT 10000;
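
To check the result, list the output directory; with the settings above the part files should carry a .gz suffix (exact file names vary, and the relative path 'hive_out' normally resolves under your HDFS home directory). hadoop fs -text decompresses known codecs, so it can be used to peek at the data:

hadoop fs -ls hive_out
hadoop fs -text hive_out/* | head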


Java Program to List Contents of Directory in Hadoop (HDFS)

How do you list the files and subdirectories of a given directory in Hadoop HDFS using a Java program?

The following Java program prints the contents (files and directories) of a given HDFS directory (/user/hadoop):

/*Java Program to Print Contents of HDFS Directory*/
package in.blogspot.java2bigdata;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class ListDirectoryContents {
  public static void main(String[] args) throws IOException, URISyntaxException
  {
    //1. Get the Configuration instance
    Configuration configuration = new Configuration();
    //2. Get the instance of the HDFS
    FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost:54310"), configuration);
    //3. Get the metadata of the desired directory
    FileStatus[] fileStatus = hdfs.listStatus(new Path("hdfs://localhost:54310/user/hadoop"));
    //4. Using FileUtil, getting the Paths for all the FileStatus
    Path[] paths = FileUtil.stat2Paths(fileStatus);
    //5. Iterate through the directory and display the files in it
    System.out.println("***** Contents of the Directory *****");
    for(Path path : paths)
    {
      System.out.println(path);
    }
  }
}

In the above program, hdfs://localhost:54310 is the namenode URI.
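
To compile and run the program against a cluster, the Hadoop client jars must be on the classpath. A minimal sketch (the classes output directory is arbitrary; hadoop classpath prints the client classpath):

mkdir -p classes
javac -cp "$(hadoop classpath)" -d classes ListDirectoryContents.java
java -cp "classes:$(hadoop classpath)" in.blogspot.java2bigdata.ListDirectoryContents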

Java program which can list all files in a given directory

 With this code you can list all files in a given directory:

package in.blogspot.java2bigdata;

import java.io.File;
 
public class ListFiles {

    public static void main(String[] args) {
        // Directory path here
        String path = ".";

        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();

        for (int i = 0; i < listOfFiles.length; i++) {
            if (listOfFiles[i].isFile()) {
                String files = listOfFiles[i].getName();
                System.out.println(files);
            }
        }
    }
}

If you want to list only .txt files, for example, use this code:

package in.blogspot.java2bigdata;

import java.io.File;
 
public class ListTxtFiles {

    public static void main(String[] args) {
        // Directory path here
        String path = ".";

        File folder = new File(path);
        File[] listOfFiles = folder.listFiles();

        for (int i = 0; i < listOfFiles.length; i++) {
            if (listOfFiles[i].isFile()) {
                String files = listOfFiles[i].getName();
                if (files.endsWith(".txt") || files.endsWith(".TXT")) {
                    System.out.println(files);
                }
            }
        }
    }
}

You can modify the .txt or .TXT to be whatever file extension you wish.