The 2003 Perl Advent Calendar

On the 16th day of Advent my True Language brought to me..

Math::BigInt

Scalars in Perl are highly magical things. Unlike much more statically typed languages like C, they can contain many different types of information. They can hold strings, integers, and floating point numbers and more. You gets so used to them holding whatever you throw at them is can be a big shock when you reach the limitations of your platform.

The problem comes when you run out of space to store the data in the slot. This isn't so much a problem with a string, but when you're storing numbers, there's only so big a number you can store before you start losing accuracy - that is, unless you switch to using a Math::BigInt object.

The Devel::Peek module can be used to inspect the inside of Perl scalars. For example:

#!/usr/bin/perl

# turn on perl's safety features use strict; use warnings;

# load the 'Dump' routine use Devel::Peek qw(Dump);

my $foo = 42; print Dump($foo);

This prints out the rather cryptic looking thing

SV = IV(0x8172b74) at 0x8170efc REFCNT = 1 FLAGS = (PADBUSY,PADMY,IOK,pIOK) IV = 42

The important bit in here is the IV, which contains the number stored within the scalar, and the IOK flag which indicates that that value is correct. In a similar fashion, we can print out a non-integer like so:

my $bar = 2/3; print Dump($bar);

We get a slightly different type of output:

SV = NV(0x817ddf8) at 0x8170efc REFCNT = 1 FLAGS = (PADBUSY,PADMY,NOK,pNOK) NV = 0.666666666666667

The IV type can no longer hold our result...this time the scalar's NV field has been used, and the NOK flag has been set to indicate that we're using that value instead. So, we use IV for integer values and NV for floating point. Let's try something a little more complicated:

$foo = 42; $foo = $foo + 0.4077; print Dump($foo);

We take an integer and add a floating point number to it. What we see is rather odd...

SV = PVNV(0x81adb10) at 0x8170efc REFCNT = 1 FLAGS = (PADBUSY,PADMY,NOK,pNOK) IV = 42 NV = 42.4077 PV = 0

...unless you know how how to read this. Now there's two values in here, the IV (integer value) and the NV (floating point.) The important thing to read is that only the NOK flags is there - meaning that only the NV value is valid and should be used by the system (and anything else, like the old IV, should be ignored.)

The same thing happens when you get a string and add something to it. A normal string like this:

my $baz = "42"; print Dump($baz);

Looks like this:

SV = PV(0x815de80) at 0x8170efc REFCNT = 1 FLAGS = (PADBUSY,PADMY,POK,pPOK) PV = 0x816db20 "42"\0 CUR = 2 LEN = 3

Note we've got a PV value now and a POK flag. As soon as we try and do any maths on it:

my $baz = "42"; $baz = $baz + 1; print Dump($baz);

It loses it's POK flag and updates the IV flag.

SV = PVIV(0x815e290) at 0x8170efc REFCNT = 1 FLAGS = (PADBUSY,PADMY,IOK,pIOK) IV = 43 PV = 0x816db20 "42"\0 CUR = 2 LEN = 3

Where, you ask, am I going with this? Well besides giving you an insight into how Perl works I have a point to make about the size of data you can store in an IV and maintain accuracy. Well we've seen how perl automatically updates it's scalars from an IV to a NV when it can't represent the number as an IV. Let's look at some other cases where this happens:

my $buzz = "100000000000000000"; $buzz = $buzz + 0; print Dump($buzz);

This prints the confusing hodge podge that is:

SV = PVNV(0x81adb08) at 0x8170efc REFCNT = 1 FLAGS = (PADBUSY,PADMY,NOK,pNOK) IV = -1 NV = 1e+17 PV = 0x816de10 "100000000000000000"\0 CUR = 18 LEN = 19

What we can see here is that the string has been converted into a number, but that number was too big to fit inside an IV on my platform. The biggest number you can fit in an IV depends on your platform and the version of perl you're running (With perl 5.8.X and a 64 bit platform it's quite big indeed, but with my lowly 32 bit system it's quite small.) Once a number is bigger than the number of bits that your system has available to represent an integer, the only field that can hold the number is the NV. However this stores it as 1 plus ten to the power of 17 rather than counting each number one by one. The trouble is the representation of the number is approximate. For example, if we do this:

my $buzz = "100000000000000001"; $buzz = $buzz + 0; print $buzz;

We get "1e+17" printed out because the floating point number is unable to accurately store 1.00000000000000001e+17 in it in much the same way as the figure was unable to be stored in the IV value.

This disturbing approximation means that this program, which we think should go on for ever, actually does terminate:

my $c = 0; $c++ while ($c+1 > $c); print "$c\n";

On my computer this eventually will terminate and print

9.00719925474099e+15

(Of course, it will take it about two hundred and thirty years to do so)

So the question is, if we need to use really big numbers, how do we maintain precision?

The solution, as often is the case, is to use a module. In this case
it's the **Math::BigInt** module that creates objects that represent
numbers that can be arbitrary sized scalars. Because of the wonders of
overloading these objects can be manipulated with the standard
mathematical operators

#!/usr/bin/perl

# turn on perl's safety features use strict; use warnings;

# load the extension library use Math::BigInt;

# create some numbers my $op = Math::BigInt->new(4); my $aj = Math::BigInt->new(3);

# work out the hypotenuse my $hy = sqrt($op**2 + $aj**2); print "hy: $hy\n";

The snazzy thing about Math::BigInt is that you can use numbers as large as you want for calculations

my $x = Math::BigInt->new("100_000_000_000_000_000";); my $y = Math::BigInt->new("1"); print $x+$y;

This correctly prints:

100000000000000001

And it it hasn't lost the precision. Isn't **Math::BigInt** wonderfully
simple to use? Of course, the price you pay for all this extra
precision is speed. We can attempt to benchmark the comparative speed
of running normal increment and using increment on a Math::BigInt
object:

#!/usr/bin/perl

use strict; use warnings;

use Benchmark qw(:all); use Math::BigInt;

cmpthese(10, { 'native' => q{my $c = 1; $c++ while $c<10000; print "$c\n"}, 'bigint' => q{my $c = Math::BigInt->new("1"); $c++ while $c<10000; print "$c\n"} });

This completely fails to produce any meaningful results. We simply get this out the other end:

(warning: too few iterations for a reliable count) s/iter bigint native bigint 3.15 -- -100% native 4.00e-03 78700% --

Which indicates that either the native version of the code is so fast compared to the Math::BigInt code that it's impossible to do a meaningful comparison. The other worrying thing is the memory requirements to store a Math::BigInt number compared to a normal scalar:

#!/usr/bin/perl

# turn on perl's safety features use strict; use warnings;

# load the extensions use Devel::Size qw(total_size); use Math::BigInt;

# output print "native is ".total_size(3)."\n"; print "bigint is ".total_size(Math::BigInt->new("3"))."\n";

This prints:

native is 16 bigint is 271

Meaning that **Math::BigInt** objects are 16 times bigger than normal
numbers on my system.

If you want even more precision, and by this I mean floating point
numbers, then you can use **Math::BigFloat** instead. This creates
objects that are like **Math::BigInt** numbers but instead these
numbers can hold floating point numbers:

use Math::BigFloat; my $x = Math::BigFloat->new("0.000000000000000001"); my $y = Math::BigFloat->new("0.0000000000000000001"); print $x + $y;

This, rather predictably, prints out

0.0000000000000000011

Of course, this extra precision comes again at the code of yet another
speed hit and **Math::BigFloat**'s even slower to use that **Math::BigInt**.

One thing you can do to make your code more efficient is to use the
**Math::BigInt::Lite** class. This class uses less storage than a
**Math::BigInt** but can only hold values that an IV would normally
hold. The cunning thing is that the **Math::BigInt::Lite** object will
auto-promote itself to a proper **Math::BigInt** (in much the same way an IV will to a NV) when the
object runs out of space.

use Math::BigInt::Lite; my $num = Math::BigInt::Lite->new("10"); print "num, '$num' isa ".ref($num)."\n"; $num = $num**100; print "num, '$num' isa ".ref($num)."\n";

You can see it's possible to do some very smart things with large numbers very easily with Perl, but only if you're willing to take the massive speed an memory hits. There's no such thing as a free lunch.

Copyright 2000-2004 Mark Fowler, all rights reserved.

This documentation may be distributed under the Academic Free License

Comments/Complaints/Suggestions re this site: webmaster

This documentation may be distributed under the Academic Free License

Comments/Complaints/Suggestions re this site: webmaster