[Discuss] fdupes opposite?

Patrick NixNoob-sneaking at sneakEmail.com
Wed Mar 12 14:54:17 PDT 2008


On Wed, 12 Mar 2008 02:20:20 -0700
Murray Strome wrote:

> Does anyone know of a utility that does the opposite of fdupes?
> 
> fdupes - finds duplicate files in a given set of directories
> 
> This has several options, including searching through subdirectories. It 
> uses md5sum and finds duplicates even though the names may be different 
> (see man fdupes for details).
> 
> I am looking for a utility that would be similar, except that instead of 
> finding and listing those files which are duplicated in a directory 
> tree, it would find those that are unique and list those.
> 
> Does anyone know of a utility that does this?

Umm... no.

I tried fdupes and found that it wouldn't do what I wanted, so I
wrote something that did.  See below.

WARNING: This `dropDupes' script does just what its name says.  It
*deletes* duplicated files, keeping one copy of each.  That's what
I wrote it for.  But you could tweak it to simply echo the
pathnames of files that don't match any other files.

Caveat: It only accepts one pathname on the command line, so you'd
probably have to do a little more tweaking if you want it to
compare two directories that aren't under the same parent directory.
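
For the listing Murray asked about, something along these lines might
be a starting point.  It's a sketch only, separate from the dropDupes
script that follows: `list_unique' is a made-up name, and it assumes
GNU md5sum and GNU uniq (for the -w option), plus pathnames without
newlines or backslashes.  It deletes nothing and takes any number of
directories.

```shell
#!/bin/bash
#  list_unique DIR...  (hypothetical helper, not part of dropDupes)
#  Prints the pathname of every non-dotfile whose md5sum is shared
#  with no other file under the given directories.  Deletes nothing.
list_unique () {
	#  md5sum prints "<32-digit sum>  <pathname>".  After sorting,
	#  uniq -u -w32 keeps only lines whose first 32 characters (the
	#  sum) occur once, and cut strips the sum plus the 2-character
	#  separator, leaving just the pathname.
	find "$@" -type f ! -name '.*' -exec md5sum {} + |
	LC_ALL=C sort |
	LC_ALL=C uniq -u -w32 |
	cut -c35-
}
```

Calling `list_unique dirA dirB' should print one pathname per line
for each file that has no twin under either directory.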

#!/bin/bash

ifs_old=$IFS
ifs_new='
'
myName="$(basename "$0")"
function ech1 () { echo $1; }
function ech2 () { echo $2; }

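if [ -d "`pwd`/$1" ]
then
	#  Relative pathname (or none at all, meaning the current directory).
	d1re="`pwd`/$1"

elif [ -d "$1" ]
then
	#  Absolute pathname.
	d1re="$1"
else
	echo "Invalid directory name."
	exit 1
fi
#  Normalize away any trailing slash; quoted so odd names survive.
d1re="$(dirname "$d1re")/$(basename "$d1re")"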
echo "$myName;  Scanning $d1re/"

md5db="$d1re"/.md5db
dupez="$d1re"/.dupesDropped
direz="$d1re"/.dirsDropped
direm="$d1re"/.dirsMatched

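#  Carry over the logs from earlier runs; on a first run there are
#  none, so start from empty files instead.
cp "$dupez" "$dupez".tmp 2>/dev/null || : > "$dupez".tmp
cp "$direz" "$direz".tmp 2>/dev/null || : > "$direz".tmp
cp "$direm" "$direm".tmp 2>/dev/null || : > "$direm".tmp

[ -f "$md5db" ] && rm "$md5db" && \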
echo "
$myName $d1re;  Cleared stale database."
echo "
$myName $d1re;  Rewriting database..."

IFS=$ifs_new
for phile in `find "$d1re" -type f`
do
	if [ "$(basename "$phile" | grep '^\.')" = "" ]
	#  Skip dotfiles (including this script's own .md5db and logs).
	then
		md5sum "$phile" >> "$md5db"
	else
		echo "$myName;  Ignoring $phile"
	fi
done

echo "
$myName $d1re;  Built database.

$myName $d1re;  Removing duplicate files..."

lastSome=""
lastFile=""

for phile in `sort -u "$md5db"`
do
	#  Each line is "<32-digit sum>  <pathname>".  Split it with
	#  parameter expansion so pathnames containing spaces survive.
	nextSome=${phile%% *}
	nextFile=${phile#* ?}

	if [ "$nextSome" = "$lastSome" ]
	then
		#  Same sum as the previous line: delete this copy and log it.
		rm -f "$nextFile" && echo -n '.'
		echo "$lastFile = $nextFile" >> "$dupez".tmp
		echo "$(dirname "$lastFile") = $(dirname "$nextFile")" >> "$direm".tmp
	else
		lastSome=$nextSome
		lastFile=$nextFile
	fi
done

echo "

$myName $d1re;  Removed duplicate files.

$myName $d1re;  Removing empty directories (if any)..."

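#  sort -r lists deeper directories before their parents.
for dire in `find "$d1re" -type d | sort -r`
do
	if [ "$(ls "$dire")" = "" ]
	then
		#  Log the directory only if rmdir really removed it.
		rmdir "$dire" && echo -n '.' && \
		echo "$dire" >> "$direz".tmp
	fi
done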

sort -u "$dupez".tmp > "$dupez"  &&  rm "$dupez".tmp
sort -u "$direz".tmp > "$direz"  &&  rm "$direz".tmp
sort -u "$direm".tmp > "$direm"  &&  rm "$direm".tmp


echo "

$myName $d1re;  Done."

#  End.
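
To see what the script above would delete before letting it loose, a
read-only pass like this may help.  Again only a sketch: `list_dupes'
is a hypothetical name, not part of dropDupes, and it assumes GNU
md5sum and pathnames without newlines or backslashes.  It prints pairs
in the same `kept = duplicate' form that the script logs, though which
copy counts as the kept one depends on sort order and may differ from
what the script picks.

```shell
#!/bin/bash
#  list_dupes DIR...  (hypothetical helper, not part of dropDupes)
#  Prints "kept = duplicate" for each file whose md5sum matches an
#  earlier one in sorted order.  Read-only: nothing is deleted.
list_dupes () {
	find "$@" -type f ! -name '.*' -exec md5sum {} + |
	LC_ALL=C sort | {
		lastSum=""
		lastPath=""
		while IFS= read -r line
		do
			#  Checksum: everything before the first space.
			sum=${line%% *}
			#  Pathname: drop the sum and the 2-character separator.
			path=${line#* ?}
			if [ "$sum" = "$lastSum" ]
			then
				printf '%s = %s\n' "$lastPath" "$path"
			else
				lastSum=$sum
				lastPath=$path
			fi
		done
	}
}
```

Running `list_dupes somedir' and reading the output over first seems
a cheap safeguard, since the real script removes files with rm -f.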


> 
> Thanks.
> 
> Murray

No problem,

Patrick.

-- 
Alas, I am dying beyond my means.
		-- Oscar Wilde [as he sipped champagne on his
		   deathbed]

