Wednesday, 23 January 2019

Delete files older than 30 days on HDFS

This post shows how to clean up HDFS directories older than a given age (30 days here) using a shell script.
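 #!/bin/sh
 # Delete HDFS directories under /file/Path/ last modified more than 30 days ago.
 # Note: `date -d` requires GNU date.
 today=$(date +%s)
 # Keep only directory entries (permission string starts with "d").
 hdfs dfs -ls /file/Path/ | grep "^d" | while read -r line; do
   # Column 6 of the listing is the modification date (YYYY-MM-DD).
   dir_date=$(printf '%s\n' "$line" | awk '{print $6}')
   # Age of the directory in whole days.
   difference=$(( (today - $(date -d "$dir_date" +%s)) / (24*60*60) ))
   # Column 8 is the full path.
   filePath=$(printf '%s\n' "$line" | awk '{print $8}')
   if [ "$difference" -gt 30 ]; then
     hdfs dfs -rm -r "$filePath"
   fi
 done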
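
The script above only looks at directories directly under the given path, and `awk '{print $8}'` cuts off paths that contain spaces. As a rough sketch of one way to handle both, the filter below reads `hdfs dfs -ls` output on standard input and prints every file or directory older than a given number of days. The function name `old_paths` and its arguments are made up for this example, it assumes the usual eight-column listing format, and it still relies on GNU `date -d`.

```shell
#!/bin/sh
# Sketch only: old_paths is a hypothetical helper, not part of the original script.
# Reads `hdfs dfs -ls` output on stdin and prints paths older than $1 days.
# $2 is "now" in epoch seconds (defaults to the current time).
old_paths() {
    max_days=$1
    now=${2:-$(date +%s)}
    # Assumed columns: perms, replication, owner, group, size, date, time, path.
    # The last variable receives the rest of the line, so paths with spaces survive.
    while read -r perms repl owner group size mod_date mod_time path; do
        case $perms in
            [d-]*) ;;        # directories and regular files
            *) continue ;;   # skip the "Found N items" header and anything else
        esac
        # Skip entries whose date cannot be parsed (GNU date assumed).
        mod_epoch=$(date -d "$mod_date" +%s 2>/dev/null) || continue
        age=$(( (now - mod_epoch) / 86400 ))
        if [ "$age" -gt "$max_days" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Possible usage (commented out so the sketch has no side effects):
# hdfs dfs -ls /file/Path/ | old_paths 30 | while read -r p; do
#     hdfs dfs -rm -r -skipTrash "$p"
# done
```

Keeping the age check separate from the delete makes it easy to review the list of paths before anything is removed. The `-skipTrash` option in the usage comment frees space immediately instead of moving data to the trash, so leave it out if you want a safety net.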

If you run into any problems deleting files, please leave a comment here.

3 comments:

  1. When reading the date, the output comes out as some weird number:

    sh test.sh
    1569308362


    sh test.sh
    1569308239
    test.sh: line 5: hdfs: command not found

  2. I looked at the script and saw that you only check for directories, not files. You also don't check subfolders or the files inside them.
    And when you delete, you don't use the -skipTrash option, so if you have space issues the deleted data will still be sitting in the trash after the script runs.

  3. Wow, great. I was wondering how to cure acne naturally and found your site through Google. I learned a lot and things are a bit clearer now. I've bookmarked your site and added the RSS feed. Keep us updated.
    self deleting files
