@carlosa8c
Forked from ace-subido/list-old-hdfs-files.sh
Created October 10, 2017 15:43
Script to list/delete old files in an HDFS Directory
#!/bin/bash
usage="Usage: ./list-old-hdfs-files.sh [path] [days]"
# Both arguments are required
if [ -z "$1" ] || [ -z "$2" ]
then
echo "$usage" >&2
exit 1
fi
now=$(date +%s);
# Loop through files
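sudo -u hdfs hdfs dfs -ls "$1" | while read -r f; do
# Get File Date and File Name (a path containing spaces would need fields 8 through NF)
file_date=$(echo "$f" | awk '{print $6}')
file_name=$(echo "$f" | awk '{print $8}')
# Skip the "Found N items" header line, which has no file name column
[ -z "$file_name" ] && continue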
# Calculate Days Difference (date -d requires GNU date)
difference=$(( (now - $(date -d "$file_date" +%s)) / (24 * 60 * 60) ))
if [ "$difference" -gt "$2" ]; then
# Insert delete logic here
echo "This file $file_name is dated $file_date.";
fi
done
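The script above leaves the delete step as a placeholder. One way to fill it in is sketched below, under stated assumptions: `age_in_days` and `delete_old_file` are hypothetical helper names that are not part of the original gist, `DRY_RUN` is an illustrative environment variable that defaults to printing the command instead of running it, and GNU `date` is assumed for the `-d` option. Treat it as a starting point rather than a finished implementation.

```bash
#!/bin/bash
# Hypothetical helper: whole days between a YYYY-MM-DD date and "now".
# The second argument (epoch seconds) is optional and defaults to the current time.
age_in_days() {
  local file_date="$1"
  local now="${2:-$(date +%s)}"
  echo $(( (now - $(date -d "$file_date" +%s)) / (24 * 60 * 60) ))
}

# Hypothetical helper: delete one HDFS path, or only print the command
# when DRY_RUN is 1 (the default in this sketch).
delete_old_file() {
  local file_name="$1"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "Would run: hdfs dfs -rm $file_name"
  else
    sudo -u hdfs hdfs dfs -rm "$file_name"
  fi
}

# Possible use inside the loop, in place of the echo:
#   if [ "$(age_in_days "$file_date" "$now")" -gt "$2" ]; then
#     delete_old_file "$file_name"
#   fi
```

Defaulting to a dry run means the listing can be checked before anything is removed; setting `DRY_RUN=0` would run the actual `hdfs dfs -rm`. Without `-skipTrash`, deleted files go to the HDFS trash when trash is enabled on the cluster.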