About Dupecheck
Dupechecking means checking uploaded files for "dupes", that is, duplicates. The check is simplistic: it only compares the uploaded filename against a database of previously uploaded filenames.
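To make the filename-only comparison concrete, here is a minimal Python sketch. It is not the module's actual code (the module itself is the shared library libwzd_dupecheck.so). The table name dupelog comes from this page. Everything else is an assumption made for illustration: the columns (filename, path), the helper functions is_dupe and record_upload, and the exact, case-sensitive comparison.

```python
import posixpath
import sqlite3

# Hypothetical schema: the real dupelog table's columns are not documented here.
SCHEMA = "CREATE TABLE IF NOT EXISTS dupelog (filename TEXT NOT NULL, path TEXT NOT NULL)"


def is_dupe(conn: sqlite3.Connection, upload_path: str) -> bool:
    """Return True if a file with the same name was uploaded before, in any directory."""
    # Only the last path component is compared; the directory is ignored.
    filename = posixpath.basename(upload_path)
    row = conn.execute(
        "SELECT 1 FROM dupelog WHERE filename = ? LIMIT 1", (filename,)
    ).fetchone()
    return row is not None


def record_upload(conn: sqlite3.Connection, upload_path: str) -> None:
    """Add a finished upload to the log."""
    conn.execute(
        "INSERT INTO dupelog (filename, path) VALUES (?, ?)",
        (posixpath.basename(upload_path), upload_path),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    record_upload(conn, "/incoming/mp3/song.mp3")
    print(is_dupe(conn, "/incoming/other/song.mp3"))  # True: same filename, other directory
    print(is_dupe(conn, "/incoming/mp3/other.mp3"))   # False
```

Because only the basename is compared, song.mp3 uploaded into two different directories counts as a dupe, while the same content under a different name does not.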
Dupecheck in wzdftpd
wzdftpd has a module named 'dupecheck' that handles this. It is part of the default wzdftpd distribution (probably included in official releases >= 0.8.4) but is not enabled by default. To enable it, edit your wzd.cfg so that it contains a line like this:
/some/prefix/lib/wzdftpd/modules/libwzd_dupecheck.so = allow
A similar line should already be present with "deny" instead of "allow"; change it to "allow".
The dupecheck module is configured in wzd.cfg, which should contain a section like this:
[dupecheck]
## Where should dupecheck keep its sqlite database?
# database = /some/prefix/var/lib/dupelog
The database path defaults to /some/prefix/var/lib/dupelog but can be changed with the database option. This file is where the dupecheck module stores its log of previously uploaded files, and the module cannot work without it. It is an sqlite3 database on disk; if you have the sqlite3 command-line tool installed, you can view its contents with:
sqlite3 -column -header /some/prefix/var/lib/dupelog 'select * from dupelog'
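If the sqlite3 command-line tool is not available, Python's standard sqlite3 module can run the same query. The following is a minimal sketch: it opens the database read-only through an SQLite URI (mode=ro) so that inspecting the log cannot change it, and it prints whatever columns the table has, since this page does not document the schema. The function names dump_dupelog and print_columns are hypothetical.

```python
import sqlite3
import sys
from pathlib import Path
from typing import List, Tuple


def dump_dupelog(db_path: str) -> Tuple[List[str], List[tuple]]:
    """Return (column names, rows) for every entry in the dupelog table."""
    # mode=ro opens the file read-only, so inspecting the log cannot modify it.
    uri = Path(db_path).absolute().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        cur = conn.execute("SELECT * FROM dupelog")
        headers = [col[0] for col in cur.description]
        return headers, cur.fetchall()
    finally:
        conn.close()


def print_columns(headers: List[str], rows: List[tuple]) -> None:
    """Print a header line and rows in left-aligned columns, roughly like `-column -header`."""
    table = [headers] + [[str(v) for v in row] for row in rows]
    widths = [max(len(r[i]) for r in table) for i in range(len(headers))]
    for r in table:
        print("  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip())


if __name__ == "__main__" and len(sys.argv) > 1:
    print_columns(*dump_dupelog(sys.argv[1]))
```

Saved as, for example, dump_dupelog.py, it would be run as `python3 dump_dupelog.py /some/prefix/var/lib/dupelog`.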
You MUST make sure the directory containing the database exists, or the dupecheck module will fail. Run something like:
mkdir -p /some/prefix/var/lib/
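If you provision the server from a script, the same precaution can be taken programmatically. This Python sketch (ensure_db_dir is a hypothetical helper) does what mkdir -p does for the database's parent directory, and additionally checks that the directory is writable, because SQLite writes journal files next to the database. It assumes it runs as the same user as wzdftpd; otherwise the writability check says nothing about the server.

```python
import os


def ensure_db_dir(db_path: str) -> str:
    """Create the database's parent directory if needed (like `mkdir -p`) and return it."""
    parent = os.path.dirname(os.path.abspath(db_path))
    # exist_ok=True makes this a no-op when the directory is already there.
    os.makedirs(parent, exist_ok=True)
    # SQLite creates journal files next to the database, so the directory
    # must also be writable by the user the server runs as.
    if not os.access(parent, os.W_OK):
        raise PermissionError(f"{parent} exists but is not writable")
    return parent
```

For example, ensure_db_dir("/some/prefix/var/lib/dupelog") creates /some/prefix/var/lib/ if it is missing and returns that path.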
TODO
Quite a few features are still missing. Here is a list of planned or suggested changes:
- Allow for more extensive site dupe syntax:
  - match filenames
  - match within a specific time range
- Strip the user's ftproot from matches, and match only the part of the path after their ftproot.
- Add configuration options for:
  - nocheckdirs
    - Directories that are not checked for dupes when files are uploaded. Files uploaded into these directories are not added to the dupelog either.
  - retentiontime
    - How many days (?) to keep files in the dupelog, to keep its size manageable.
  - hidedirs
    - Directories in this list are checked for dupes and their files are added to the dupelog, but the file's path is not stored. Searching for the path therefore yields no results, while the filename is still matched on a dupecheck.
  - bannedpattern
    - A pattern that all uploads (except those in nocheckdirs) are matched against. If the pattern matches, the upload is denied. This pattern is _ONLY_ matched against the filename, not the path.
- Configurable rights for site commands (dupe = anyone, undupe = +O?)
