This post shows how to clean up HDFS directories older than a certain age (30 days) using a shell script.
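#!/bin/sh
# Delete every directory directly under /file/Path/ whose modification date is more than 30 days old.
# Requires the hdfs client on the PATH and GNU date (for "date -d").
today=$(date +'%s')
hdfs dfs -ls /file/Path/ | grep "^d" | while read -r line ; do
    # In "hdfs dfs -ls" output, column 6 is the modification date (YYYY-MM-DD) and column 8 is the path
    dir_date=$(echo "${line}" | awk '{print $6}')
    filePath=$(echo "${line}" | awk '{print $8}')
    # Age of the directory in whole days
    difference=$(( ( ${today} - $(date -d "${dir_date}" +%s) ) / ( 24*60*60 ) ))
    if [ "${difference}" -gt 30 ]; then
        hdfs dfs -rm -r "${filePath}"
    fi
done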
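Before running the deletion for real, it can help to see which directories would be removed. The sketch below is my own addition, not part of the original script: the hypothetical function `list_old_dirs` reads `hdfs dfs -ls` output on stdin and prints the path of every directory older than a given number of days, using the same columns (date in column 6, path in column 8) and GNU `date -d` as the script above. Passing the current time in as an optional second argument makes it easy to check against saved listing output.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print, without deleting,
# every directory in "hdfs dfs -ls" output (read from stdin) that is older
# than $1 days, measured against the epoch time $2 (defaults to now).
# Requires GNU date for "date -d".
list_old_dirs() {
  max_age_days=$1
  now=${2:-$(date +%s)}
  # Keep only directory lines ("d" permission flag); this also skips the "Found N items" header
  grep '^d' | while read -r perms repl owner group size mdate mtime path; do
    # Whole days between the directory's modification date and "now"
    age=$(( ( now - $(date -d "$mdate" +%s) ) / 86400 ))
    if [ "$age" -gt "$max_age_days" ]; then
      echo "$path"
    fi
  done
}
```

For example, `hdfs dfs -ls /file/Path/ | list_old_dirs 30` prints the candidates, so you can review the list before switching back to `hdfs dfs -rm -r`.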
If you face any problems deleting files, please leave a comment here.
While reading the date, the output comes out as some weird number:
sh test.sh
1569308362
sh test.sh
1569308239
test.sh: line 5: hdfs: command not found
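Two separate things show up in the output above. The number is most likely the value of `today`: `date +'%s'` returns the current time as seconds since 1970-01-01 00:00:00 UTC (Unix epoch time), so it is expected rather than an error, and GNU `date -d @SECONDS` converts it back to a readable date. The `hdfs: command not found` error means the shell cannot find the Hadoop client on its `PATH`, so the script has to run on a machine where the Hadoop client is installed, or with its `bin` directory added to `PATH`. The helper names `epoch_to_date` and `require_cmd` below are hypothetical; this is only a sketch of both checks.

```shell
#!/bin/sh
# Hypothetical helper: print a Unix epoch value (seconds since 1970-01-01 UTC)
# as a readable UTC timestamp. "date -d @SECONDS" is a GNU date feature.
epoch_to_date() {
  date -u -d "@$1" +'%Y-%m-%d %H:%M:%S'
}

# Hypothetical helper: fail with a clear message if a required command
# (such as the hdfs client) is not on the PATH. "command -v" is POSIX.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "error: '$1' not found on PATH; run this where the Hadoop client is installed" >&2
    return 1
  fi
}
```

For example, `epoch_to_date 1569308362` prints `2019-09-24 06:59:22`, and putting `require_cmd hdfs || exit 1` near the top of the script stops it before the listing pipeline runs on empty output.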
I looked at the script and saw that you only check directories, not files, and you don't check the subfolders and files inside them.
Also, when you delete you don't use the -skipTrash option, so the deleted data goes to the HDFS trash and, if you have space issues, the space is not freed when the script runs.
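Following the suggestions in this comment, the listing can be made recursive with `hdfs dfs -ls -R` and filtered on file lines (permission string starting with `-`) instead of directories, and `-skipTrash` makes `hdfs dfs -rm` free the space immediately instead of moving files to the trash. This is a sketch under my own assumptions, not the original author's script: the function name `delete_old_files` and the `DRY_RUN` variable are hypothetical, it only deletes files (directories are left in place, possibly empty), and with `-skipTrash` deleted files cannot be recovered.

```shell
#!/bin/sh
# Hypothetical sketch (not the original script): delete files, including those in
# subfolders, under directory $1 whose modification date is more than $2 days old.
# Requires GNU date; set DRY_RUN=1 to print the rm commands instead of running them.
delete_old_files() {
  base_dir=$1
  max_age_days=$2
  now=$(date +%s)
  # -R lists recursively; lines starting with "-" are files, "d" are directories
  hdfs dfs -ls -R "$base_dir" | grep '^-' |
  while read -r perms repl owner group size mdate mtime path; do
    # Whole days since the file's modification date
    age=$(( ( now - $(date -d "$mdate" +%s) ) / 86400 ))
    if [ "$age" -gt "$max_age_days" ]; then
      if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "hdfs dfs -rm -skipTrash $path"
      else
        # -skipTrash deletes immediately instead of moving the file to .Trash
        hdfs dfs -rm -skipTrash "$path"
      fi
    fi
  done
}
```

Running `DRY_RUN=1 delete_old_files /file/Path/ 30` first shows exactly which files would go; dropping `DRY_RUN=1` performs the deletion.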