TIP #237: Arbitrary-Precision Integers for Tcl


TIP:237
Title:Arbitrary-Precision Integers for Tcl
Version:$Revision: 1.19 $
Authors: Kevin B. Kenny <kennykb at acm dot org>
Don Porter <dgp at users dot sf dot net>
State:Final
Type:Project
Tcl-Version:8.5
Vote:Done
Created:Friday, 14 January 2005

Abstract

This TIP adds the capability to perform computations on integer values of arbitrary precision.

Rationale

There have been a number of issues in Tcl 8.4 dealing with the limited range of native integers and wide values. The original ideas of TIP #72, while they have given at least the basics of 64-bit integer support, have also introduced some subtle violations of the doctrine that "everything is a string." Some of these have been insidious bugs - [1] and [2] are illustrative - while others are perhaps not "bugs" in the strictest sense, but are still surprising behaviour.

For instance, it is possible for a script to tell integers from wide integers even when their string representations are equal, by performing any arithmetic operation that overflows:

    % set x 2147483647    % set x [expr { wide(2146483647) }]
    2147483647            2147483647
    % incr x              % incr x
    -2147483648           2147483648

With things as they stand, [3] is nearly unfixable. It causes misconversion of large floating point numbers that look like integers:

    % set x -9223372036854775809
    -9223372036854775809
    % expr {double($x)}
    9.22337203685e+018
    % scan $x %g
    -9.22337203685e+018

The reason here is that the string of digits is first converted to a 64-bit unsigned integer (Tcl_WideUInt). The '-' sign causes the unsigned integer to be interpreted as a signed integer, and the sign reversed. Since interpreting the unsigned integer as signed yields an overflow, the result is positive rather than negative and gives the odd behaviour shown above.

Of course, even if the implementation of 64-bit arithmetic were bug free, there would be uses for arithmetic of higher precision. One example is that several Tcl users have attempted pure-Tcl implementations of RSA and Diffie-Hellman cryptographic algorithms. These algorithms depend on arithmetic of high precision; implementing them efficiently in today's Tcl requires a C extension because the high-precision algorithms implemented in Tcl are simply too slow to be acceptable.

Finally, studying the references for accurate conversion of floating point numbers TIP #132 reveals that input and output conversions both require arbitrary-precision arithmetic in their implementation. The reference implementation supplied with that TIP, alas, is code that is poorly commented and difficult to integrate into Tcl's build system. Reimplementing it according to Tcl's engineering practices means implementing a large part of an arbitrary-precision arithmetic library.

Proposal

This TIP proposes augmenting Tcl with code for processing integers of arbitrary precision. Specifically:

  1. The libtommath library [4] shall be added in a subdirectory of the Tcl source tree, parallel to generic/, compat/, etc. This library implements arithmetic on integers of arbitrary precision. For the rationale behind this library, and some of the precise integration details, see "Choice of Libary" below.

  2. New functions, Tcl_NewBignumObj, Tcl_SetBignumObj, Tcl_GetBignumFromObj and Tcl_GetBignumAndClearObj shall be added to the Tcl library. They shall be specified as follows:

    Tcl_NewBignumObj

    Creates an object containing an integer of arbitrary precision. The value argument gives the integer in the format native to libtommath. Upon return, the value argument is cleared, because its digit array has had its ownership transferred to the Tcl library.

    Tcl_Obj *Tcl_NewBignumObj(mp_int *value)

    Tcl_SetBignumObj

    Changes the value of an object to an integer of arbitrary precision. The value argument gives the integer in the format native to libtommath. As with other Tcl_SetFooObj routines, the Tcl_Obj having its value set must be unshared (copy on write). As with Tcl_NewBignumObj, the value argument is cleared on return and the digit list is owned by Tcl thereafter.

    void Tcl_SetBignumObj(Tcl_Obj *objPtr, mp_int *value)

    Tcl_GetBignumFromObj

    Interprets an object as a large integer, and constructs a copy of that large integer in the value argument. Returns TCL_OK if successful. On failure, stores an error message in the result of interp and returns TCL_ERROR.

    int Tcl_GetBignumFromObj(Tcl_Interp *interp, Tcl_Obj*objPtr, mp_int *value)

    Tcl_GetBignumAndClearObj

    Interprets an object as a large integer, and stores the large integer in the value argument. Returns TCL_OK if successful. On failure, stores an error message in the result of interp and returns TCL_ERROR. Calls Tcl_Panic if the object is shared. The object is reset to the empty state prior to return from this call.

    int Tcl_GetBignumAndClearObj(Tcl_Interp *interp, Tcl_Obj*objPtr, mp_int *value)

    The memory management of these routines deserves some explanation. For performance, it is desirable that copying bignums be avoided as far as possible. For this reason, the digit array stored in the mp_int object will be stored by pointer in the Tcl internal representation. This will result in memory problems unless the mp_int appears to be destroyed after it has been placed in a Tcl object, since further calls to the libtommath functions may reallocate the array.

    Similarly, when retrieving a large integer from an object, if the object retains an internal representation with the mp_int, the mp_int must be copied. Code that intends to overwrite or destroy an unshared object immediately can avoid the copy by using the call that clears the object; this call returns the original mp_int without needing to do any memory management.

  3. The internalRep union in the Tcl_Obj structure shall be augmented with a ptrAndLongRep field. This field shall be a structure comprising a pointer and a long integer. The pointer will designate an array of digits (the dp member of an mp_int structure in libtommath). The long integer will be an encoding comprising:

  4. A new tclBignumType (with type name bignum) shall be added to the internal set of object types; it will have the ptrAndLongRep internal representation, and the usual four conversion routines.

  5. The wideIntRep field in the internalRep union shall remain (lest there be extensions that use it), but all code in the Tcl library that deals with it shall be removed. The routines, Tcl_GetWideIntFromObj, Tcl_SetWideIntObj, and Tcl_NewWideIntObj shall remain as part of the API, but will create objects with either int or bignum internal representations.

  6. The expr command shall be reworked so that all integer arithmetic is performed as if all integers are of arbitrary precision. In practice, numbers between LONG_MIN and LONG_MAX shall be stored in native long integers. The operators shall perform type promotions as needed. The command shall avoid (as far as practicable) performing arbitrary-precision arithmetic when native long ints are presented. Specifically:

  7. The incr and dict incr commands shall work on arbitary-precision values. Specifically, [incr a $n] will behave like [set a [expr { $a + $n }]] with the constraint that $a and $b must both be integers.

  8. The format and scan commands will acquire %lld, %lli, %llo, %llx and %llX specifiers that format their arguments as arbitrary-precision decimal, (decimal/any format), octal, and hexadecimal integers, respectively. The format specifier %llu is invalid and will cause an error. The %llo and %llx specifiers, unlike their native-integer counterparts, will format signed numbers; the result of [format %#llx -12345] will not be 0xffffcfc7, but rather -0x3039. (If an application requires hexadecimal numbers in two's complement notation, it can get them by forcing a number to be positive:

         set x -12345
         set x64bit [expr { $x & ((1<<64) - 1) }]
         format %#llx $x64bit
    

    will yield '0xffffffffffffcfc7'.

  9. User defined math functions will be able to gain access to a bignumValue only if they are created using the techniques described in TIP #232. The Tcl command that implements the user defined math function will be able to receive a bignum Tcl_Obj value just as it can receive any other Tcl_ObjType. The legacy Tcl_Value structure, will not be updated to add a bignum-valued field.

  10. The number parser detailed in TIP #249 will be adopted into the Tcl internals. See TIP #249 for details on the implications.

Integration Details

The libtommath source code shall be extracted from the distributions available at [5] and brought into the CVS repository as a vendor branch (see [6] for a discussion of managing vendor branches. It appears that all the necessary modifications to integrate this source code with Tcl can be made in the single file, tommath.h; it is likely that the tools/ directory in the Tcl source tree will contain a Tcl script to make the modifications automatically when importing a new version. CVS can maintain local modifications effectively, should we find it necessary to patch the other sources.

The chief modification is that all the external symbols in the library will have TclBN prepended to their names, so that they will not give linking conflicts if an extension uses a 'tommath' library not supplied with Tcl - or uses any other library compatible with the Berkeley mp API.

Choice of Library

The libtommath code is chosen from among a fairly large set of possible third-party bignum codes. Among the ones considered were:

GMP

The Gnu GMP library is probably the fastest and most complete of the codes available. Alas, it is licensed under LGPL, which would require all distributors who prepare binary Tcl distributions to include the GMP source code. I chose to avoid any legal entanglements, and avoided GMP for this reason.

The original Berkeley mp library

This library is atrociously slow, and was avoided for that reason.

Gnu Calc

GPL licensed. A non-starter for that reason.

mpexpr

This Tcl extension has been available for many years and is released under the BSD license. (It was the basis for Gnu Calc but predates the fork, and hence is not GPL-contaminated.) It would certainly be a possibility, but the code is not terribly well documented, is slow by today's standards, and still uses string-based Tcl API's. It would be a considerable amount of work to bring it into conformance with today's Tcl core engineering practices.

OpenSSL

The OpenSSL library includes a fast and well-documented bignum library (developed so that OpenSSL can do RSA cryptography). Alas, the code is licensed under the original BSD license agreement, and has several parties added to the dreaded Advertising Clause. The Advertising Clause would present serious difficulties for our distributors, and so OpenSSL is not suitable. (The OpenSSL developers are not amenable to removing the Advertising Clause from the license.)

Several libraries implemented in C++ were dismissed out of hand, because of the deployment issues associated with C++ runtime libraries and static constructors.

The libtommath code is released to the Public Domain. Its author, Tom St. Denis, explicitly and irrevocably authorizes its use for any purpose, without fee, and without attribution. So license incompatibilities aren't going to be an issue. The documentation is wonderful (Tom has written a book of several hundred pages [7] describing the algorithms used), and the code is lucid.

The chief disadvantage to libtommath is that it is a one-person effort, and hence has the risk of becoming orphaned. I consider this risk to be of acceptable severity, because of the quality of the code. The Tcl maintainers could take it on quite readily.

Additional Possibilities

The Tcl library will actually use only about one-third of libtommath to implement the expr command. In particular, the library functions for squaring, fast modular reduction, modular exponentiation, greatest common divisor, least common multiple, Jacobi symbol, modular inverse, primality testing and the solution of linear Diophantine equations are not needed, and shall not be include in the Tcl library to save memory footprint.

Nevertheless, these operations would be extremely useful in an extension, so that programs that don't live with tight memory constraints can do things like fast Diffie-Hellman or RSA cryptography. We should probably consider a 'bignum' package to be bundled with the core distribution that would add appropriate math functions wrapping these operations.

Reference Implementation

A feature complete implementation is present on the CVS branch called 'kennykb-numerics-branch'.

Copyright

Copyright © 2005 by Kevin B. Kenny. All rights reserved.

This document may be distributed subject to the terms and conditions set forth in the Open Publication License, version 1.0 [8].


Powered by Tcl[Index] [History] [HTML Format] [Source Format] [LaTeX Format] [Text Format] [XML Format] [*roff Format (experimental)] [RTF Format (experimental)]

TIP AutoGenerator - written by Donal K. Fellows