[Discuss] fdupes opposite?
Patrick
NixNoob-sneaking at sneakEmail.com
Wed Mar 12 14:54:17 PDT 2008
On Wed, 12 Mar 2008 02:20:20 -0700
Murray Strome wrote:
> Does anyone know of a utility that does the opposite of fdupes?
>
> fdupes - finds duplicate files in a given set of directories
>
> This has several options, including searching through subdirectories. It
> uses md5sum and finds duplicates even though the names may be different
> (see man fdupes for details).
>
> I am looking for a utility that would be similar, except that instead of
> finding and listing those files which are duplicated in a directory
> tree, it would find those that are unique and list those.
>
> Does anyone know of a utility that does this?
Umm... no.
I tried fdupes and found that it wouldn't do what I wanted, so I
wrote something that did. See below.
WARNING: This `dropDupes' script does exactly what its name says. It
*deletes* duplicated files. That's what I wrote it for. But you could
tweak it to simply echo the pathnames of files that don't match
any other files.
Caveat: It only accepts one pathname on the command line, so you'd
probably have to do a little more tweaking if you want it to
compare two directories that aren't under the same parent directory.
#!/bin/bash
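# Remember the default IFS; the loops below split on newlines only.
ifs_old=$IFS
ifs_new=$'\n'
myName="$(basename "$0")"
# Print the first / second of the arguments passed in.
function ech1 () { echo "$1"; }
function ech2 () { echo "$2"; }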
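# Accept the directory either relative to the current directory
# or as an absolute path.
if [ -d "$(pwd)/$1" ]
then
    d1re="$(pwd)/$1"
elif [ -d "$1" ]
then
    d1re="$1"
else
    echo "Invalid directory name."
    exit 1
fi
# Normalize away any trailing slash.
d1re="$(dirname "$d1re")/$(basename "$d1re")"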
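echo "$myName; Scanning $d1re/"
# Bookkeeping files, all kept as dotfiles in the scanned directory.
md5db="$d1re"/.md5db
dupez="$d1re"/.dupesDropped
direz="$d1re"/.dirsDropped
direm="$d1re"/.dirsMatched
# Carry over the logs from an earlier run, if there was one.
[ -f "$dupez" ] && cp "$dupez" "$dupez".tmp
[ -f "$direz" ] && cp "$direz" "$direz".tmp
[ -f "$direm" ] && cp "$direm" "$direm".tmp
rm "$md5db" 2>/dev/null && \
echo "
$myName $d1re; Cleared stale database."
echo "
$myName $d1re; Rewriting database..."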
IFS=$ifs_new
for phile in $(find "$d1re" -type f)
do
    # Skip dotfiles (this also skips the bookkeeping files above).
    if [ "$(basename "$phile" | grep '^\.')" = "" ]
    then
        md5sum "$phile" >> "$md5db"
    else
        echo "$myName; Ignoring $phile"
    fi
done
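echo "
$myName $d1re; Built database.
$myName $d1re; Removing duplicate files..."
lastSome=""
lastFile=""
# Sorting puts identical checksums on adjacent lines, so any line whose
# checksum equals the previous one is a duplicate of an earlier file.
for phile in $(sort -u "$md5db")
do
    # md5sum (text mode) prints "<checksum>  <path>". Cut at the first
    # two-space separator only, so paths containing spaces stay intact.
    nextSome="${phile%% *}"
    nextFile="${phile#*  }"
    if [ "$nextSome" = "$lastSome" ]
    then
        rm -f "$nextFile" && echo -n '.'
        echo "$lastFile = $nextFile" >> "$dupez".tmp
        echo "$(dirname "$lastFile") = $(dirname "$nextFile")" >> "$direm".tmp
    else
        lastSome=$nextSome
        lastFile=$nextFile
    fi
done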
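echo "
$myName $d1re; Removed duplicate files.
$myName $d1re; Removing empty directories (if any)..."
# Reverse-sorted, so subdirectories are visited before their parents.
for dire in $(find "$d1re" -type d | sort -r)
do
    # ls -A also counts dotfiles, so only truly empty directories match.
    if [ "$(ls -A "$dire")" = "" ]
    then
        rmdir "$dire" && echo -n '.' && \
        echo "$dire" >> "$direz".tmp
    fi
done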
# Deduplicate the logs and drop the temporary copies.
sort -u "$dupez".tmp > "$dupez" && rm "$dupez".tmp
sort -u "$direz".tmp > "$direz" && rm "$direz".tmp
sort -u "$direm".tmp > "$direm" && rm "$direm".tmp
echo "
$myName $d1re; Done."
# End.
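A rough sketch of the listing-only tweak mentioned above, kept separate from dropDupes. It hashes every non-dot file under one or more directories and prints only the paths whose checksum occurs exactly once; nothing is deleted. The name list_unique is made up for this example, and the sketch assumes GNU find, md5sum and awk, and pathnames without newlines or backslashes.

```shell
# list_unique: hypothetical helper, not part of dropDupes.
# Prints every regular file under the given directories whose md5sum
# matches no other file. Deletes nothing.
# Assumption: GNU find/md5sum; no newlines or backslashes in pathnames.
list_unique () {
    find "$@" -type f ! -name '.*' -exec md5sum {} + |
    sort |
    awk '
        # Field 1 is the 32-character checksum, followed by a
        # two-character separator; the path starts at column 35.
        { sum = $1; path = substr($0, 35) }
        sum != last {
            # A new checksum starts: report the previous one if it
            # was seen on exactly one line.
            if (count == 1) print keep
            last = sum; keep = path; count = 0
        }
        { count++ }
        END { if (count == 1) print keep }
    '
}
```

Called as `list_unique dirA dirB` (placeholder names), it also handles two directories that don't share a parent, since find accepts several starting points.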
>
> Thanks.
>
> Murray
No problem,
Patrick.
--
Alas, I am dying beyond my means.
-- Oscar Wilde [as he sipped champagne on his
deathbed]