It would be very useful if all duplicate files were sorted by file size, so that I could delete only the ones that take up the most disk space.
Also, how about ignoring empty files? Some servers use lock files or /tmp/status_ok for specific purposes, and deleting these empty files affects functionality.
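Both requests can be combined in a short pipeline. The following is a minimal sketch, not the article's script. It assumes bash, GNU find, GNU `stat -c` and `md5sum`, and file names without tabs or newlines. `list_dups_by_size` is a hypothetical name. It skips zero-byte files and prints each duplicate as size, hash and path, largest first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list duplicate files under DIR, largest first,
# skipping zero-byte files (lock files, status flags and similar).
# Assumes bash, GNU find/stat/md5sum, and no tabs or newlines in names.
list_dups_by_size() {
  local dir=${1:-.}
  # -size +0 excludes empty files before any hashing is done
  find "$dir" -type f -size +0 -print0 |
    while IFS= read -r -d '' f; do
      size=$(stat -c %s -- "$f")
      # Reading from stdin keeps md5sum from escaping unusual file names
      sum=$(md5sum < "$f" | cut -d' ' -f1)
      printf '%s\t%s\t%s\n' "$size" "$sum" "$f"
    done |
    # Keep only rows whose hash occurs more than once
    awk -F'\t' '{ n[$2]++; row[NR] = $0; h[NR] = $2 }
                END { for (i = 1; i <= NR; i++) if (n[h[i]] > 1) print row[i] }' |
    # Largest files first; identical files end up next to each other
    sort -t $'\t' -k1,1nr -k2,2 -k3,3
}
```

For example, `list_dups_by_size ~/Documents | less` shows the biggest space-wasters at the top, while empty marker files are never listed.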
Nivas0522,

In my mind, having ten copies of the same document is always bad, regardless of its size, because it slows down backups, file searches and other operations. But if I need to keep a file, I need it regardless of its size.

In other words, to me duplicates are a problem to solve in themselves, regardless of their size or of recovering disk space. Recovering space is a separate issue that comes afterwards, and that's why I have never considered adding a sorting function like the one you suggest.
Besides, the reason I don't worry about empty files is that I only run these scripts on folders containing the documents that I create or recover from backups, never on folders like /tmp that the system uses to work.
Finding duplicates and removing them with simple scripts has always been easy. In practice, though, we may keep the same file in multiple locations for valid reasons. So, instead of just 'rm'-ing the dups, I have always soft-linked them to the original copy. I save a lot of space by avoiding dups, but still don't risk breaking anything.

hth.

lamp19,
I see your point. In my experience, however, the "same file in multiple locations for valid purposes" situation has happened to me many times, but always and only in specific, special directories (for example, those where I compiled software). I handle those directories in other ways, including revision control systems.

The scripts I describe here, on the other hand, are designed only for the cases and folders (e.g. archives of my articles) in which duplicates are completely useless and the sooner they disappear the better, and I only use them in such folders. So I agree with you; it's just that in my experience "duplicates that have a purpose" and "duplicates that have no purpose" never end up in the same folders.

Marco
Symlinking dups? How do you do that? Can you explain? Thanks.
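The soft-linking approach described earlier in the thread can be sketched roughly as follows. This is an illustration, not the commenter's actual script. It assumes bash and GNU coreutils (`realpath`, `ln -sfn`), and `link_dup` is a hypothetical name. It replaces a duplicate with a symbolic link only after `cmp` confirms the two files are byte-for-byte identical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace DUPLICATE with a symlink to ORIGINAL,
# but only if both files have identical contents.
# Assumes bash and GNU coreutils (realpath, ln -sfn).
link_dup() {
  local original=$1 duplicate=$2
  # cmp -s prints nothing and exits 0 only when the contents are identical
  if ! cmp -s -- "$original" "$duplicate"; then
    echo "not identical, skipping: $duplicate" >&2
    return 1
  fi
  # An absolute target keeps the link valid even if the link itself is moved;
  # -f replaces the existing duplicate file with the link
  ln -sfn -- "$(realpath -- "$original")" "$duplicate"
}
```

For example, `link_dup ~/articles/draft.odt ~/backup/draft.odt` leaves the backup path in place as a link, so anything that expects a file there keeps working. Deleting or moving the original, however, breaks every link pointing to it.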
FSlint does the same job just as easily through a graphical interface, and is included in nearly every distribution.
I am looking for a duplicate finder tool that can compare two files by CRC and remove the largest files. Can anyone here give me some suggestions?
I urgently need that kind of duplicate search tool. Thanks.
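A CRC comparison of two files needs only standard tools. The sketch below is illustrative only. It assumes bash and the POSIX `cksum` utility (CRC-32 plus byte count), and `same_crc` is a hypothetical name. Because different files can share a CRC, it confirms a match with `cmp` before reporting the files as duplicates:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two files by POSIX cksum (CRC + size),
# then confirm byte-by-byte, since distinct files can share a CRC.
# Exit status: 0 = identical, 1 = different, 2 = a file could not be read.
same_crc() {
  local a b
  # Reading from stdin makes cksum print only "CRC SIZE", with no file name
  a=$(cksum < "$1") || return 2
  b=$(cksum < "$2") || return 2
  [ "$a" = "$b" ] || return 1
  cmp -s -- "$1" "$2"    # rules out CRC collisions
}
```

Usage: `same_crc a.pdf b.pdf && echo duplicate`. Identical files always have the same size, so "removing the largest" only makes sense among files that are not exact duplicates.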
Hi, thanks for a very useful article.

In your last step, is there a reason why you didn't use find -print0 and xargs to handle tricky directory names? Something like:

find . -depth -type d -empty -print0 | xargs -0 -n1 rmdir
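The `-print0 | xargs -0` pipeline above copes with odd names. However, because find tests `-empty` on a parent before xargs has removed its children, nested empty directories may need more than one run. A possible alternative is the `-delete` action, which is supported by GNU and BSD find but is not required by POSIX, so treat its availability as an assumption. It implies `-depth`, so each child is removed before its parent is tested. `prune_empty_dirs` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove every empty directory under DIR, including
# directories that only become empty once their empty children are gone.
# -delete implies -depth (children first); -mindepth 1 keeps DIR itself.
prune_empty_dirs() {
  find "${1:-.}" -mindepth 1 -type d -empty -delete
}
```

Since find deletes the entries itself, no file names pass through a pipe, so spaces or newlines in directory names are not an issue.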