Perl Advent Calendar 2006-12-08

Olly Olly Oxen Free

by Jerrad Pierce

Sometimes an RDBMS is overkill; even something as simple as SQLite is more of a dependency than you might like. More importantly, what if you only need "pseudorandom" access? There are several alternatives, including a DBM or a hand-rolled index. Another option, which may allow you to leave your data file unaltered, is File::SortedSeek. If your data happens to be sorted, like a dictionary, this module can provide speedier access to records than a linear search through the file, because it does a binary search instead. It provides three search functions: alphabetic, numeric, and temporal (find_time). The last is a special case of numeric that can be used to locate records in logfiles. File::SortedSeek also provides functions to access the records between two search results (get_between) and to retrieve the last N records (get_last). Also note that Search::Dict, which is part of the core, provides the same functionality as alphabetic, plus enough hooks that, with the proper parameters, other File::SortedSeek features could be reproduced.
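To make the "speedier than a linear search" claim concrete, here is a minimal core-only sketch of a binary search over byte offsets in a sorted file. It is not how File::SortedSeek or Search::Dict are actually implemented; seek_sorted and next_line_start are hypothetical helpers invented for this illustration, and the word list is made-up sample data written to a temporary file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: leave $fh at the first line that starts at or
# after byte $pos and return that offset.
sub next_line_start {
    my ($fh, $pos) = @_;
    if ($pos > 0) {
        # Step back one byte so a $pos that already is a line start is kept
        seek($fh, $pos - 1, 0) or die "seek failed: $!";
        my $partial = <$fh>;    # consume the rest of that line
    }
    else {
        seek($fh, 0, 0) or die "seek failed: $!";
    }
    return tell($fh);
}

# Hypothetical helper: position $fh at the first line that sorts ge $key
# (or at end of file if there is none) and return that byte offset.
sub seek_sorted {
    my ($fh, $key) = @_;
    my $lo = 0;
    my $hi = (-s $fh) || 0;
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        next_line_start($fh, $mid);
        my $line = <$fh>;
        chomp $line if defined $line;
        # Past the end, or at/after the key: the answer is at or before $mid
        if (!defined $line or $line ge $key) { $hi = $mid }
        else                                 { $lo = $mid + 1 }
    }
    return next_line_start($fh, $lo);
}

# Demo on a small, already sorted, temporary word list (sample data)
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "$_\n" for qw(holly ivy mistletoe reindeer sleigh tinsel wassail yule);
close($out) or die "close failed: $!";

open(my $in, '<', $path) or die "Cannot open $path: $!";
my $exact_pos  = seek_sorted($in, 'mistletoe');
my $exact_line = <$in>;
my $near_pos   = seek_sorted($in, 'snow');    # not in the list
my $near_line  = <$in>;
print "byte $exact_pos: $exact_line";
print "byte $near_pos: $near_line";
```

Each probe halves the remaining byte range, so the number of seeks grows with the logarithm of the file size rather than with the number of lines. A missing key leaves the handle at the next line in sort order, which is the same "closest result" behaviour the listing below relies on.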

    use File::SortedSeek ':all';
    use Search::Dict;

    my $file = '/usr/dict/words';
    open(my $DICT, '<', $file) && do {
      #Discard the returned position, the tell() of the closest result: Wasserman
      #The optional custom munge routine normalizes case; without it the result is: would
      alphabetic($DICT, 'wassail', \&munge);

      #And print it; scalar context to avoid reading the whole file
      print scalar <$DICT>;

      #Try again, it automatically seeks so no need to reset
      my $pos = alphabetic($DICT, 'mistletoe', \&munge);

      #This time check whether we got exactly what we wanted; alas, :all *isn't*
      #quite all, so was_exact() has to be fully qualified
      printf("In %s at byte %i found exactly: %s", $file, $pos, scalar <$DICT>)
        if File::SortedSeek::was_exact();

      #Core distribution alternative; look() stops at the first line ge the key,
      #which is not necessarily an exact match
      $pos = look($DICT, 'mistletoe');
      printf("In %s at byte %i found: %s", $file, $pos, scalar <$DICT>);
    };

    #Normalize a line before comparison: strip the newline and lowercase it
    sub munge{
      local $_ = shift || return undef;
      chomp;
      return lc $_;
    }
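
As a rough illustration of the Search::Dict "hooks" mentioned above, the sketch below uses look()'s optional third and fourth arguments (dictionary order and case folding) to get a case-insensitive lookup without a munge routine. The word list is made-up sample data in a temporary file, and the exact-match test at the end is an assumption of this example, since look() itself only reports a position.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Search::Dict;

# Sample data: sorted case-insensitively, which is what folding expects
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "$_\n" for qw(Holly ivy Mistletoe reindeer Wassail yule);
close($out) or die "close failed: $!";

open(my $fh, '<', $path) or die "Cannot open $path: $!";

my $key = 'mistletoe';
# Third argument: dictionary order off; fourth argument: case folding on
my $pos = look($fh, $key, 0, 1);
die "look() reported an error" if $pos < 0;

# look() leaves the handle at the first line ge the key, so read it and
# compare for ourselves to learn whether the match was exact
my $line  = <$fh>;
my $exact = (defined $line && lc($line) eq "$key\n") ? 1 : 0;

printf("At byte %i found %s: %s", $pos, $exact ? 'exactly' : 'nearby', $line)
  if defined $line;
```

Folding only helps when the file itself is sorted without regard to case; on a file sorted in plain byte order, a folded search can land on the wrong line.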