PHP: Automatic file renamer script

I have been working on a personal, non-commercial, PHP-based programming project in the evenings for the last couple of days.

I download quite a lot of files through P2P networks, and unpacking archives and renaming files takes up a lot of my free time in the evenings and on weekends.

So what if you could automate the process of unpacking downloaded archives, renaming files, moving files, putting them into ready-to-burn folders with a maximum size of 4.5 GB, and creating the file list? Sounds like a dream?

Not at all, it's reality (almost).

Anyway, I ran the almost-finished prototype yesterday, and it successfully renamed a directory containing 44 GB of files in various subdirectories. The PHP script finished in about one second (a rename only updates directory entries and copies no data, so the file sizes hardly matter); doing the same job by hand would easily have taken me 20 minutes.

I thought I would walk you through the process.

A directory is scanned, say “/movies”.
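A minimal Python sketch of the scanning step (the author's script is PHP; this is not his code). It assumes the release folders sit directly under the scanned directory and simply lists them, ignoring loose files. The function name `list_release_dirs` is hypothetical.

```python
import os

def list_release_dirs(root):
    """Return the names of the folders directly inside root, sorted alphabetically."""
    with os.scandir(root) as entries:
        # Skip loose files; only release folders are of interest.
        return sorted(entry.name for entry in entries if entry.is_dir())
```

For example, `list_release_dirs("/movies")` would return the release folder names that the following steps then clean up.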

Most movies are released according to “scene” standards. If it is a DivX movie, the individual movie directory contains two folders called “CD1” and “CD2”.

movie/latest.movie.divx.screener.omg.wtf.lol/CD1

movie/latest.movie.divx.screener.omg.wtf.lol/CD2

Each CD folder contains a set of RAR-packed files. Unpack them and you get one AVI file in each “CD” directory, with names something like “prd-bfma.avi” and “prd-bfmb.avi”.

movie/latest.movie.divx.screener.omg.wtf.lol/CD1/prd-bfma.avi

movie/latest.movie.divx.screener.omg.wtf.lol/CD2/prd-bfmb.avi

Directories

The first task is renaming the directory. Take a look at the name of the directory of the movie again:

latest.movie.divx.screener.omg.wtf.lol

By analyzing lots of scene-release names I found that the first part of the directory name is always the name of the movie; this is the part we want to keep.

So how do you know what to chop off?

By analyzing lots of “scene” releases I built a small database of words that mark where the part of the directory name we do not want to keep begins.

Check out these examples:

lord.of.da.rings.screener.we.are.a.cool.rlz.group.lol

terminater.4.divx.dmning.2007.da.bst.rlz.group.omg

wargamez.vcd.complete.repack.rofl

The names are fictional (maybe you noticed that). Anyway, in example one “.screener” is the word that marks where the nonsense in the directory name begins. In example two it is “.divx”, and in example three it is “.vcd”.

So the database of marker words now contains:

.screener

.divx

.vcd

Every time one of these words is found in a directory name, the name is chopped, keeping everything to the left of the word. I put this process in a loop that goes round and round until no more marker words are detected.

So…

lord.of.da.rings.screener.we.are.a.cool.rlz.group.lol -> lord.of.da.rings

terminater.4.divx.dmning.2007.da.bst.rlz.group.omg -> terminater.4

wargamez.vcd.complete.repack.rofl -> wargamez
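The chopping loop could look like the following Python sketch. The marker list holds only the three example words from this post, the function name `strip_release_junk` is made up for illustration, and matching case-insensitively is an assumption the post does not state.

```python
# Only the three example markers from the post; a real list would be longer.
MARKERS = [".screener", ".divx", ".vcd"]

def strip_release_junk(name, markers=MARKERS):
    """Cut the name at each marker word found, repeating until none is left."""
    changed = True
    while changed:
        changed = False
        for marker in markers:
            pos = name.lower().find(marker)  # case-insensitive search (assumption)
            if pos != -1:
                name = name[:pos]  # keep everything to the left of the marker
                changed = True
    return name
```

Because every cut only shortens the name, a single pass over the marker list would already be enough; the outer loop simply mirrors the "round and round" approach described above.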

The directories are now renamed, but personally I do not like dots in file and directory names. So I run the array of directory names through another loop that replaces every “.” with a space. Result: no more dots in the names.

lord.of.da.rings -> lord of da rings

terminater.4 -> terminater 4

wargamez -> wargamez
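A Python sketch of the dot-to-space step plus the actual rename on disk. `dots_to_spaces` and `rename_dir` are hypothetical names. Refusing to overwrite an existing folder is my own safety assumption, not something the post describes.

```python
import os

def dots_to_spaces(name):
    """Turn 'terminater.4' into 'terminater 4'."""
    return name.replace(".", " ").strip()

def rename_dir(root, old_name, new_name):
    """Rename root/old_name to root/new_name and return the new path."""
    src = os.path.join(root, old_name)
    dst = os.path.join(root, new_name)
    if os.path.exists(dst):
        # Never clobber a folder that already has the target name.
        raise FileExistsError(dst)
    os.rename(src, dst)
    return dst
```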

Files

Let's assume that the RAR archives are already unpacked (there are RAR unpackers that can monitor a directory and unpack files automatically).

So every movie directory contains a CD1 and a CD2 folder (plus CD3, CD4 if there are more than two AVI files).

For example:

movie/terminater 4/CD1/prd-trmora.avi

movie/terminater 4/CD2/prd-trmorb.avi

However, I do not like having the files of one release spread over different subfolders, so my RAR unpacker puts every AVI file directly in the movie folder instead:

movie/terminater 4/prd-trmora.avi

movie/terminater 4/prd-trmorb.avi

Look at the example above: the directory name is already right, and that is the name we want the files to get. By analyzing movie filenames I found that when there is more than one AVI file, the last character of the filename before the extension dot is always “a” for CD1, “b” for CD2, and so on.

So the first task is to determine which file is CD1 and which file is CD2.

prd-trmora.avi -> prd-trmora -> a -> CD1 -> terminater 4 CD1 -> terminater 4 CD1.avi

prd-trmorb.avi -> prd-trmorb -> b -> CD2 -> terminater 4 CD2 -> terminater 4 CD2.avi

Follow the flow of the example above. First I strip off the last four characters (“.avi”), then I take the last character of what remains (either “a” or “b”) and feed it to an if/else statement that decides whether the file is CD1 or CD2. Then I build a string out of the directory name (“terminater 4”), the CD label (“CD1” or “CD2”) and the file extension “.avi”. Finally, PHP's rename() function renames each file from the old name to the new one.
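Here is a Python sketch of that flow. Instead of an if/else chain it maps the letter arithmetically (a → CD1, b → CD2, c → CD3, …), which is a swap-in of my own. It also gives a lone AVI file the plain folder name, since the letter rule only holds when there are several files. All function names are hypothetical.

```python
import os

def cd_label(filename):
    """Map the last letter before the extension to a CD label: a -> CD1, b -> CD2, ..."""
    stem, _ext = os.path.splitext(filename)  # 'prd-trmora', '.avi'
    letter = stem[-1:].lower()
    if not "a" <= letter <= "z":
        raise ValueError(f"no CD letter in {filename!r}")
    return f"CD{ord(letter) - ord('a') + 1}"

def new_movie_name(movie_dir_name, filename):
    """Build e.g. 'terminater 4 CD1.avi' from the folder name and the old filename."""
    ext = os.path.splitext(filename)[1]
    return f"{movie_dir_name} {cd_label(filename)}{ext}"

def rename_movie_files(movie_path):
    """Rename every .avi directly inside movie_path; returns (old, new) pairs."""
    movie_dir_name = os.path.basename(os.path.normpath(movie_path))
    avis = sorted(n for n in os.listdir(movie_path) if n.lower().endswith(".avi"))
    renamed = []
    for name in avis:
        if len(avis) == 1:
            # Single-file release: no CD suffix needed.
            new = movie_dir_name + os.path.splitext(name)[1]
        else:
            new = new_movie_name(movie_dir_name, name)
        os.rename(os.path.join(movie_path, name), os.path.join(movie_path, new))
        renamed.append((name, new))
    return renamed
```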

Covers and cleaning up

Do not forget that there are two more folders in the movie directory. One is the “sample” directory, which we can delete: we do not need a sample of the movie once the whole movie is downloaded.

Sample

So…

movie/terminater 4/sample

contains…

movie/terminater 4/sample/prd.trmor-sample.avi

But we do not need it, so we delete this directory and everything in it, which leaves…

movie/terminater 4/
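Deleting the sample folder takes very little code. In Python, `shutil.rmtree` removes a whole directory tree. The sketch below assumes the folder may be spelled “sample” or “Sample” and matches it case-insensitively. `remove_sample_dirs` is a hypothetical name.

```python
import os
import shutil

def remove_sample_dirs(movie_path):
    """Delete any 'sample' folder (any letter case) inside movie_path, with all its contents.
    Returns the names of the folders that were removed."""
    with os.scandir(movie_path) as it:
        entries = list(it)  # read the whole listing before deleting anything
    removed = []
    for entry in entries:
        if entry.is_dir(follow_symlinks=False) and entry.name.lower() == "sample":
            shutil.rmtree(entry.path)
            removed.append(entry.name)
    return removed
```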

Covers

How about covers?

Covers usually reside in the “covers” directory, like so…

movie/terminater 4/covers

The “covers” directory usually contains two files with names like these…

movie/terminater 4/covers/prd-trmorb.jpg

movie/terminater 4/covers/prd-trmorf.jpg

Some people like to keep covers; with an HTPC you can usually have the cover picture shown automatically while you browse your movie directory. I do not have an HTPC, but I still like to keep covers.

The first thing to do is to determine which JPG file is the front and which is the back. That is easy to figure out, because the last letter before the dot in the filename is either “b” or “f”: “b” is the back and “f” is the front. After that, renaming these files is a piece of cake.
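A Python sketch of the cover renaming. The post does not say what the covers end up being called, so the target names “&lt;movie&gt; front.jpg” and “&lt;movie&gt; back.jpg” are an assumption, as are the function names.

```python
import os

# Assumed target names: the post only says 'f' means front and 'b' means back.
SIDES = {"f": "front", "b": "back"}

def cover_name(movie_dir_name, filename):
    """'prd-trmorf.jpg' -> 'terminater 4 front.jpg'; None if the letter is not f or b."""
    stem, ext = os.path.splitext(filename)
    side = SIDES.get(stem[-1:].lower())
    if side is None:
        return None
    return f"{movie_dir_name} {side}{ext.lower()}"

def rename_covers(covers_path, movie_dir_name):
    """Rename the front/back JPGs in covers_path; returns (old, new) pairs."""
    renamed = []
    for name in sorted(os.listdir(covers_path)):
        new = cover_name(movie_dir_name, name) if name.lower().endswith(".jpg") else None
        if new:
            os.rename(os.path.join(covers_path, name), os.path.join(covers_path, new))
            renamed.append((name, new))
    return renamed
```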

Improvements

Currently I have the basic renaming engine up and running. It is coded in PHP and controlled through a browser. One thing I would really like to add is an automatic file list creator that puts the name of each movie into a database, so I can keep track of the files I already have.
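As a sketch of that file list, here is a small SQLite catalog using Python's built-in `sqlite3` module. The post does not name a database, so SQLite, the `movies` table and the helper names are all assumptions.

```python
import sqlite3

def open_catalog(path=":memory:"):
    """Open (or create) a catalog database with one row per movie name."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS movies (name TEXT PRIMARY KEY)")
    return conn

def add_movie(conn, name):
    """Record a movie; return True if it was new, False if it was already listed."""
    cur = conn.execute("INSERT OR IGNORE INTO movies (name) VALUES (?)", (name,))
    conn.commit()
    return cur.rowcount == 1  # 0 rows changed means the name was already there

def have_movie(conn, name):
    """Check whether a movie name is already in the catalog."""
    row = conn.execute("SELECT 1 FROM movies WHERE name = ?", (name,)).fetchone()
    return row is not None
```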

I am also brainstorming about letting a PHP script move the files into folders with a maximum size of 4.5-4.6 GB, to eliminate yet another tedious manual step.
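Filling folders up to a size limit is the bin-packing problem, for which no fast exact method is known, but the simple first-fit-decreasing heuristic usually does well: sort the items from largest to smallest and put each one into the first folder that still has room. The Python sketch below assumes sizes in bytes and treats 4.5 GB as 4.5 × 10^9 bytes (4.5 GiB would not fit on a single-layer DVD, which holds about 4.7 × 10^9 bytes). `pack_into_discs` is a hypothetical name.

```python
GB = 1000 ** 3  # decimal gigabytes (assumption, see above)
DISC_LIMIT = int(4.5 * GB)

def pack_into_discs(items, limit=DISC_LIMIT):
    """First-fit decreasing. items is a list of (name, size_in_bytes); returns a list
    of discs, each a list of names, with every disc's total size <= limit."""
    discs = []  # each entry: [remaining_bytes, [names]]
    for name, size in sorted(items, key=lambda item: item[1], reverse=True):
        if size > limit:
            raise ValueError(f"{name} ({size} bytes) does not fit on one disc")
        for disc in discs:
            if disc[0] >= size:  # first disc with enough room left
                disc[0] -= size
                disc[1].append(name)
                break
        else:
            discs.append([limit - size, [name]])  # no room anywhere: start a new disc
    return [names for _, names in discs]
```

Passing whole movie folders (with their total size) as items rather than individual files keeps CD1 and CD2 of a film on the same disc.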

A final thought I had was to build a kind of scraper: a script that automatically downloads relevant .torrent files from a tracker page every day and adds them to a BitTorrent client. The downloaded .torrent files are checked against a database of files already downloaded, and the BitTorrent client then downloads only the new ones, which the RAR unpacker unpacks automatically when each download finishes. Once the files are unpacked, my automatic renaming PHP script runs, renames the files and puts them into ready-to-burn directories, while the databases are updated with the new entries. The possibilities are endless.
