Process ALL the FILES!
There's More Than One Way to Find Files
The very first release of Perl 5 included File::Find. It provided a mechanism for searching a file hierarchy for files and, presumably, doing stuff with them. You use it like this:
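The listing that belongs here is missing from this copy, so what follows is a stand-in rather than the original code: a minimal sketch of ordinary File::Find usage. The task (collecting .pm files) and the name find_modules are made up for illustration.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical task: collect every regular file ending in .pm
# under the given directories.
sub find_modules {
    my @dirs = @_;
    my @found;
    find(
        sub {
            # Inside the callback, $_ is the entry's basename (File::Find
            # has chdir'd into its directory by default) and
            # $File::Find::name is the path including the starting directory.
            return unless -f $_ && /\.pm\z/;
            push @found, $File::Find::name;
        },
        @dirs,
    );
    return @found;
}

# Usage: perl find_modules.pl lib t
print "$_\n" for find_modules(@ARGV);
```

The first argument to find is the callback that File::Find's documentation calls wanted; everything after it is a directory to search.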
Like many libraries that have their origins in the early days of Perl 5, its interface can seem a bit weird today. The usual complaint is that the wanted argument is not actually a test as to whether we want the file. It's a combination of testing for whether we want a file, whether we want to descend into a directory, and doing whatever we want to do.
Oh, and the whole thing works with package variables and used to have a bunch of problems with reentrancy.
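To make the three jobs concrete, here is a sketch of a single wanted callback doing all of them, with its state passed through package variables. The task (totalling the size of .pm files while skipping .git directories) and the name module_bytes are hypothetical.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical task: add up the sizes of .pm files, skipping .git directories.
sub module_bytes {
    my @dirs = @_;
    my $total = 0;
    find(
        sub {
            # Job 1: decide whether to descend. Setting the package
            # variable $File::Find::prune tells File::Find not to enter
            # the directory it just handed us.
            if (-d $_ && $_ eq '.git') {
                $File::Find::prune = 1;
                return;
            }
            # Job 2: decide whether we want this entry at all.
            return unless -f $_ && /\.pm\z/;
            # Job 3: do the actual work.
            $total += -s $_;
        },
        @dirs,
    );
    return $total;
}

# Usage: perl module_bytes.pl lib
print module_bytes(@ARGV), "\n";
```

Nothing in the callback's signature tells you which of those jobs a given line is doing; you only know by reading which global it touches.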
Quite a while ago, we got a much simpler API for doing this sort of thing in File::Find::Rule. It lets you build up a query that finds the files you want, and then you can iterate over them doing stuff. It gives you a nice separation of "find" and "do," and as a bonus, throws in a lot of nice methods for writing your query quickly.
We'd write the above something like this:
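The original listing is missing here as well. As a stand-in, this sketch uses File::Find::Rule's chained file, name and in methods on an assumed task: listing .pm files. The name find_modules is made up for illustration.

```perl
use strict;
use warnings;
use File::Find::Rule;   # CPAN module, not in core

# Hypothetical task: regular files named *.pm under the given directories.
sub find_modules {
    my @dirs = @_;
    # Build the query ("find"), then run it against the directories.
    return File::Find::Rule->file->name('*.pm')->in(@dirs);
}

# Usage: perl find_modules.pl lib t
if (@ARGV) {
    # The "do" part is an ordinary loop over the results.
    print "$_\n" for find_modules(@ARGV);
}
```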
This looks a lot simpler – at least to me. There's a whole heap of extra simple rules, too, and you can add your own. I've used File::Find::Rule happily for years, except for the one case where it becomes completely intolerable. Allow me to demonstrate:
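The demonstration program itself is missing from this copy. A rough reconstruction with plain File::Find might look like the sketch below. The name first_found is made up, and it uses die inside an eval to abandon the search so that it can be called as a function, where the program described next simply exits.

```perl
use strict;
use warnings;
use File::Find;

# Return the first thing File::Find reports under $root, abandoning the
# traversal as soon as the callback has been called once.
sub first_found {
    my ($root) = @_;
    my $first;
    eval {
        find(
            {
                # no_chdir keeps the current directory unchanged, so
                # bailing out early leaves the process where it started.
                no_chdir => 1,
                wanted   => sub {
                    $first = $File::Find::name;
                    die "stop\n";    # unwind out of find()
                },
            },
            $root,
        );
        1;
    } or do {
        # Re-throw anything that is not our own sentinel.
        die $@ unless $@ eq "stop\n";
    };
    return $first;
}

# Usage: perl first_found.pl /
print first_found(@ARGV ? $ARGV[0] : '/'), "\n";
```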
This program takes nearly no time at all to run. It starts looking for files in /, finds something, and the wanted coderef exits immediately.
File::Find::Rule actually compiles your rules down to use File::Find, but it loses one of File::Find's key properties: laziness. Even if you ask for an iterator, it slurps up all the files, then iterates over that list. Run the same search from / with File::Find::Rule and you wait for the whole filesystem to be walked before you see the first result. If you want to look through millions of files, this is just not going to cut it.
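For reference, asking File::Find::Rule for an iterator looks roughly like this sketch, which uses its start and match methods. The name first_file is made up for illustration.

```perl
use strict;
use warnings;
use File::Find::Rule;   # CPAN module, not in core

# Return the first regular file the rule reports under $root.
sub first_file {
    my ($root) = @_;
    my $rule = File::Find::Rule->file->start($root);
    # match() hands back one result per call and we stop after the first,
    # but as described above, the search behind start() is not deferred:
    # the full list has been gathered before match() returns anything.
    return $rule->match;
}
```

Calling first_file('/') is the intolerable case: the answer is the first file, but you pay for all of them.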
That doesn't mean you need to go back to File::Find, though.
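The listing is missing here too. A sketch of the same "first file and stop" search with Path::Class::Rule could look like this, assuming its new, file and iter methods; first_file is again a made-up name.

```perl
use strict;
use warnings;
use Path::Class::Rule;   # CPAN module, not in core

# Return the first regular file found under $root as a Path::Class object.
sub first_file {
    my ($root) = @_;
    # iter() returns a coderef; each call yields the next match.
    my $next = Path::Class::Rule->new->file->iter($root);
    # Only as much of the tree is walked as it takes to produce one match.
    return $next->();
}
```

The caller never asks for a second result, so the rest of the tree is never visited.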
Okay, so it's fast. What is it?
Path::Class::Rule is yet another file finder, with an interface very much like that of File::Find::Rule, with two key differences: it provides Path::Class objects instead of filename strings, and its iterator is actually lazy.
We could write our search as:
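The original listing is not preserved, so this is a sketch on an assumed task (visiting every .pm file) rather than the author's code. The name each_module and the callback style are illustrative choices.

```perl
use strict;
use warnings;
use Path::Class::Rule;   # CPAN module, not in core

# Call $callback once for each regular file named *.pm under @dirs.
sub each_module {
    my ($callback, @dirs) = @_;
    my $next = Path::Class::Rule->new->file->name('*.pm')->iter(@dirs);
    # Each $file is a Path::Class::File object, not a plain string.
    while (defined(my $file = $next->())) {
        $callback->($file);
    }
    return;
}

# Usage: perl each_module.pl lib t
each_module(sub { print "$_[0]\n" }, @ARGV) if @ARGV;
```

Handing each file to a callback as it turns up, instead of collecting a list first, is what lets the iterator's laziness pay off.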
There are a bunch of little differences between File::Find::Rule and Path::Class::Rule, but it's worth getting over them so that when you need to take your program and run it against that huge set of files you accidentally let accumulate under /var because you thought that the other guy was taking care of it (I mean, it's pretty much his responsibility, right?)... well, you just want the program to work without having to read the whole filesystem into memory first, right?