------------------------------------------------------------------------------
                       Inform/Z-machine FP library                 Release 1/2

                               User Manual

                               Kevin Bracey
                             26th March 2002
------------------------------------------------------------------------------
                   

Introduction
============
The FP library provides IEEE-754 compliant floating-point support to Inform
programmers targetting the Z-machine.

Floating-point numbers are held in static memory. This might be as a static
array, or as an array property of an object. Routines providing floating point
operations are passed pointers to their operands, and a pointer to the
destination. There is no restriction on the destination being the same as
either operand.

Single-precision numbers occupy 4 bytes, or 2 words of memory.
Extended-precision (and packed-decimal) numbers occupy 6 bytes, or 3 words of
memory. So, to allocate storage for a single-precision number, you might use:

    Array my_num --> 2;           ! Two words of storage (initialised to 0.0)
    Array one    --> $3F80 $0000; ! =1.0
    
 or
 
    Object xxx
      with my_num 0 0,
           one    $3F80 $0000;

For all FP formats, if memory is initialised to zero, the value of the number
is 0. Another possible initialisation value might be $7F80 $0001, a
signalling "not-a-number" (which will raise an exception if used, indicating
use of an uninitialised variable).

If (as is likely) you don't fancy initialising your variables in raw
hexadecimal, you can use the following routines in your initialisation code:

   finit(one, "1");   ! Initialise one from high string
or fitos(one, 1);     ! Initialise one from integer

or

   finit(xxx.&one, "1");
or fitos(xxx.&one, 1);

Then, to calculate 1 + 1, store the result in my_num, then print it, you would
write:

    fadd(my_num, one, one);
    print (fp) my_num;
    
 or

    fadd(xxx.&my_num, xxx.&one, xxx.&one);
    print (fp) xxx.&my_num;


Formats
=======
The library provides two IEEE-754 binary formats, and a packed decimal (BCD)
format used for binary<->decimal conversion.

Single
------
The basic format is the IEEE-defined 32-bit single format. It can be viewed
as an array of 2 Z-machine words thus:


      bit 15 14  7 6      0  15           0
          +-+-----+-------+  +------------+
          |S| Exp |  Frac |  |    Frac    |
          +-+-----+-------+  +------------+
          msb           lsb  msb        lsb
          
    S Exp Frac      Meaning
    -----------------------------------------------------------
    x  0  0         +/- Zero
    x  0  non-0     Subnormal number:  +/- 0.frac x 2^(-126)
    x $FF 0         +/- Infinity
    x $FF non-0     NaN (signalling if Frac MSB is clear)
     others         Normal number:     +/- 1.frac x 2^(exp-127)

Single format provides 24 bits of precision and 8 bits of range, roughly
equivalent to at least 6 (very nearly 7) decimal digits of precision with a
range of 1.1x10^(-38) to 3.4x10^38. The smallest subnormal number is
approximately 1.4x10^(-45).

Single extended
---------------
For extra precision and range during intermediate calculations, a form of
IEEE single extended format is provided. It can be viewed as an array of
3 Z-machine words:

      bit 15      10      0  15 14        0  15          0
          +-+----+--------+  +-+----------+  +-----------+
          |S|zero|  Exp   |  |J|  Frac    |  |   Frac    |
          +-+----+--------+  +-+----------+  +-----------+
          msb           lsb  msb        lsb  msb       lsb

   S  Exp J Frac    Meaning
   --------------------------------------------------------------------
   x   0  0 0       +/- Zero
   x   0  0 non-0   Subnormal number:         +/- 0.frac x 2^(-1023)
   x $7FF 0 0       +/- Infinity
   x $7FF 0 non-0   NaN (signalling if Frac MSB is clear)
   x   x  1 x       Normal number (exp<$7FF): +/- 1.frac x 2^(exp-1023)
     others         illegal
          
Single extended format provides 32 bits of precision and 11 bits of range,
roughly equivalent to at least 9 decimal digits of precision with a range of
1.1x10^(-308) to 1.8x10^308. The smallest subnormal number is approximately
5.2x10^(-318).

No support for I/O of single extended precision numbers is provided (apart
from hexadecimal output) - they can only be used as intermediate values, or
by manual raw initialisation of constants. 

Packed decimal
--------------
Packed decimal format is a raw IEEE-like representation of a number in
decimal. The IEEE-mandated conversions between binary and decimal take the
form of conversions between single and packed decimal format. Formatted
decimal printing and input routines go via packed decimal format.

      bit   12  8  4  0    12  8  4  0    12  8  4  0
          +--+--+--+--+  +--+--+--+--+  +--+--+--+--+ 
          | S|e0|e1|d0|  |d1|d2|d3|d4|  |d5|d6|d7|d8| 
          +--+--+--+--+  +--+--+--+--+  +--+--+--+--+ 
          msb       lsb  msb       lsb  msb       lsb
          
      ee       ddddddddd       Meaning
    -----------------------------------------------------------------------
       0           0           +/- Zero
    [0-99]  [1-9.99999999]     Finite number:  +/- d.dddddddd x 10^(+/- ee)
      FF           0           +/- Infinity
      FF        non-zero       NaN (signalling if MSB of d0 is clear)
         others                Illegal

The format is best viewed in terms of nibbles, rather than bits. The top bit
of the S nibble holds the sign of the number, the next highest bit holds the
sign of the exponent (and the lowest two are zero). Each of the remaining
nibbles holds a decimal digit from 0-9. The special value $FF stored in the
exponent indicates a non-finite number.

Packed decimal format can accurately represent any single format number
(including NaNs), so single->packed->single conversion is the identity (if
rounding to nearest). Packed->single conversion, on the other hand, may over-
or underflow, and lose precision.


Floating-point environment
==========================
The FP system has a number of global state variables. These are reflectively
referred to as the "floating-point environment".

The floating-point environment comprises the following settings:

       Setting                     Example manipulation function
       ---------------------------------------------------------
       rounding mode               fesetround
       exception flags             fetestexcept
       trap enables                feenabletrap
       decimal printing mode       fesetprintmode
       decimal printing precision  fesetprintprecision
       trap handlers               fesettraphandler

To manipulate the current settings use the "fe"-prefixed functions - these are
intended to resemble the FP functions specified initially by the C Numerical
Extensions Group, and later by ISO C (1999).

The entire floating point environment status can be stored by declaring
an FEnv object and using fegetenv or similar.

Please do not manipulate the properties of an FEnv directly - they are subject
to change. Also, all items with names beginning with "_" in the library are
for internal use only.


Rounding
========
By default, all floating-point operations round their result to the nearest
representable value (rounding to even in the halfway case). The rounding mode
can be altered either globally, or on a per-operation basis.

To set the rounding mode, call

    fesetround(mode);
     
where mode is one of

    FE_TONEAREST
    FE_DOWNWARD
    FE_UPWARD
    FE_TOWARDZERO

The rounding mode can be read by calling fegetround().

Floating-point operations by default use the globally set rounding mode. To
use a different rounding mode in an individual operation, pass the rounding
mode as an extra trailing argument to the routine (where documented).


Operations
==========
The following section documents the core IEEE-754 operations supplied by the
library.

All operations in the library that take a destination as their first argument
store a value of the same format as their operands, unless stated otherwise.
Operands are always of the same format.


Copy operations
---------------

The following routines are used to copy floating-point values:

    fcpy[x]    fabs[x]     fneg[x]

They are non-floating-point operations, so signal no exceptions, even on
signalling NaNs. fcpy copies a number as is, fabs clears the sign bit, and
fneg reverses the sign bit.

Usage:        fcpy(destination, operand);

Return value: None

Exceptions:   None


Arithmetic
----------

The five core arithmetic routines are:

    fadd[x]    fsub[x]     fmul[x]     fdiv[x]      frem[x]

They provide the IEEE-754 add (+), subtract (-), multiply (x), divide (/) and
remainder (%) operations.

Usage:        fadd(destination, operand1, operand2 [, rounding] );

Return value: None

Exceptions:   Invalid operation (signalling NaN operand, magnitude subtraction
              of infinities, 0 x Infinity, Infinity x 0, 0 / 0,
              Infinity / Infinity, n % 0, Infinity % n)
              Divide by zero (fdiv[x] only)
              Overflow
              Underflow
              Inexact


Square root
-----------

Square root is provided by:

    fsqrt[x]

Usage:        fsqrt(destination, operand [, rounding]);

Return value: None

Exceptions:   Invalid operation (signalling NaN operand, operand < 0)
              Inexact


Format conversions
------------------

The following routines convert between the three floating-point formats,
single (s), extended (x) and packed decimal (p):

    fstox      fxtos       fstop       fptos

All except fstox can take an optional extra rounding argument.

Usage:        fxtos(destination, source [, rounding]);
              fstox(destination, source);

Return value: None

Exceptions:   Invalid operation (signalling NaN)
              Overflow
              Underflow
              Inexact


Floating-point <-> integer conversions
--------------------------------------

Conversion between floating-point and integer values are provided by:

    fitos      fitox       fstoi       fxtoi

Conversions to floating-point are always exact.

Usage:        fitos(destination, value);
              fstoi(source [, rounding]);              

Return value: Integer result (fstoi and fxtoi)
              None (fitos and fitox)

Exceptions:   Invalid operation (signalling NaN, quiet NaN, infinity,
              out-of-range) when converting to integer
              Inexact when converting to integer


Rounding to integer value               
-------------------------

The following routine rounds to an integral floating-point value:

    frnd[x]

It never raises an inexact exception.

Usage:        frnd(destination, operand [, rounding]);

Return value: None

Exceptions:   Invalid operation (signalling NaN)


Comparison
----------

The primary comparison routines are:

    fcmp[x]    fcmpe[x].

They return one of FCMP_L, FCMP_G, FCMP_E, or FCMP_U, depending on the
comparison of the first operand to the second; less-than (<), greater-than
(>), equal (=) or unordered (?). In addition fcmpe signals an invalid
operation exception if the operands are unordered.

The return values take the form of disjoint bits, so expressions like
"fcmp(a,b) & (FCMP_L|FCMP_E)" can be used.

Usage:        fcmp(operand1, operand2);

Return value: One of FCMP_L, FCMP_G, FCMP_E and FCMP_U.

Exceptions:   Invalid operation (signalling NaN operand, or quiet NaN operand
              to fcmpe[x])

Wrapper routines providing fourteen of the IEEE predicates are provided (all
with corresponding "x" forms. They return non-zero if and only if the
respective predicate is true.

  Routine     IEEE Predicate     Exception if unordered?
  -------     --------------     -----------------------
   feq[x]           =                   No
   fne[x]          ?<>                  No
   fgt[x]           >                   Yes
   fge[x]           >=                  Yes
   flt[x]           <                   Yes
   fle[x]           <=                  Yes
   funordered[x]    ?                   No
   flg[x]           <>                  Yes
   fleg[x]         <=>                  Yes
   fug[x]           ?>                  No
   fuge[x]         ?>=                  No
   ful[x]           ?<                  No
   fule[x]         ?<=                  No
   fue[x]           ?=                  No

Usage:        feq(operand1, operand2);

Return value: Non-zero iff the predicate is true

Exceptions:   Invalid operation (signalling NaN operand, or quiet NaN operand
              to fgt, fge, flt, fle, flg or fleg)


NaNs
====
Signalling and quiet NaNs are distinguished by the most-significant bit of
the fraction (see "Formats" above) - quiet NaNs have the bit set, signalling
clear.

When a NaN is generated, a number giving a reason for the generation is stored
in the lower bits of the fraction, as suggested by IEEE-754. In single format,
the number is aligned at the bottom of the stored fraction. In extended, it
is aligned 8 bits above the bottom of the stored fraction. Single<->extended
conversion of NaNs preserves the reason code, by adding or truncating the
normally unused low-order bits.

In single<->packed decimal conversion of NaNs, the low order 22 bits of the
single number are converted as an integer to decimal, then stored at the
bottom of the decimal fraction, again preserving the reason code.

The reason codes used are as follows:

       InvReason_SigNan    0       Operand was a signalling NaN
       InvReason_InitNaN   1       A manually initialised NaN
       InvReason_MagSubInf 4       Magnitude subtraction of infinities
       InvReason_InfTimes0 5       Infinity times zero
       InvReason_0TimesInf 6       Zero times infinity
       InvReason_0Div0     7       Zero divided by zero
       InvReason_InfDivInf 8       Infinity divided by infinity
       InvReason_InfRemX   9       Infinity REM x
       InvReason_XRem0     10      x REM 0
       InvReason_SqrtNeg   11      Square root of number < 0
       InvReason_FixQNaN   12      (Conversion of a quiet NaN to integer)
       InvReason_FixInf    13      (Conversion of infinity to integer)
       InvReason_FixRange  14      (Conversion to integer out of range)
       InvReason_CompQNaN  15      (Comparison of quiet NaNs (fcmpe))

Some of these reason codes (those in parentheses) do not normally occur in
NaNs, but may appear as reason codes to invalid operation trap handlers (see
below).


Exceptions
==========
The FP library supports the standard set of five exceptions specified by
IEEE 754.

The exceptions are represented by the five constants

     FE_INVALID  FE_DIVBYZERO  FE_OVERFLOW  FE_UNDERFLOW  FE_INEXACT

or by bitwise-ORs of them. FE_ALL_EXCEPT is the bitwise-OR of all five
constants.

The environment stores cumulative status flags that are set whenever an
untrapped exception occurs. The flags can only be cleared manually.

The following routines manipulate the exception flags.

feclearexcept(excepts)
----------------------
Clears the specified exception flags.

fesetexcept(excepts)
--------------------
Sets the specified exception flags.

feraiseexcept(excepts)
----------------------
Raises the specified exceptions; this means setting the flags if the exceptions
are untrapped, or calling the trap handlers if they are trapped. Note that
trap handlers called through feraiseexcept (or feupdateenv) do not receive
information about the cause of the original exception.

fetestexcept(excepts)
---------------------
Returns the state of the specified exception flags.

fegetenv(env)
-------------
Stores the current state of the floating-point environment in the FEnv object
env.

feholdexcept(env)
-----------------
Stores the current state of the floating-point environment in the FEnv object
env. It then disables all traps and clears all exception flags.

fesetenv(env)
-------------
Establishes the floating-point environment stored in the FEnv object env.

feupdateenv(env)
----------------
Notes the current exception flag status, installs the floating-point environment
stored in the FEnv object env, and then raises the saved exceptions.

Example:           FEnv saved_env;
                   [ f;
                     feholdexcept(saved_env);
                     ! Perform calculation, for which we wish to hide
                     ! spurious underflows
                     if (spurious_underflow)
                       feclearexcept(FE_UNDERFLOW);
                     feupdateenv(saved_env);
                     return result;
                   ];


Traps
=====
Traps can be enabled for each of the five exceptions; by default all traps
are disabled.

The default trap handlers print a diagnostic message, then quit. It is possible
to replace individual trap handlers with your own routine.

The following routines control trap handlers.

fetesttrap(excepts)
-------------------
Returns a bitfield indicating which of the specified traps are enabled.

fedisabletrap(excepts)
----------------------
Disable traps for the specified exceptions.

feenabletrap(excepts)
---------------------
Enable traps for the specified exceptions.

fesettraphandler(routine, excepts)
----------------------------------
Set the trap handler for the exception(s) specified.

fegettraphandler(except)
------------------------
Return the trap handler for the exception specified.


Trap handler calling conventions
--------------------------------
The trap handler is given information about the operation that was being
performed, and either references to its operands or the result.

The overflow, underflow and inexact trap handlers are passed the following
arguments:

    dest  -  pointer to destination, containing the rounded (and biased)
             result, in destination format (except FE_FMT_I case)
             rounded result (FE_FMT_I case)
    fmt   -  format of the destination (FE_FMT_S, FE_FMT_X or FE_FMT_I)
    op    -  the operation being performed (FE_OP_ADD etc)
    round -  rounding mode for this operation
    ae    -  (overflow/underflow only) the bias of the result (the correct
             result is dest * 2^ae)

The invalid operation and divide by zero trap handlers are passed the following
arguments:

    dest   -  pointer to destination, not filled in (except FE_FMT_I case)
    fmt    -  format of the destination (FE_FMT_S, FE_FMT_X, FE_FMT_P or
              FE_FMT_I)
    op     -  the operation being performed (FE_OP_CONV etc)
    round  -  rounding mode for this operation
    reason -  For invalid operation, a NaNReason_XXX constant giving the
              reason for the exception, else zero
    OP1    -  operand 1 in extended format
    OP2    -  operand 2 (if any) in extended format
    
The trap handlers must either quit the program, execute a @throw to return
control back to user code, or fill in dest with an appropriate result, and
return. Trap handlers may also update the cumulative exception flags.

If fmt is FE_FMT_I, the operation that trapped was one that returned an integer
value (such as fstoi or fcmp). In this case, dest is undefined, and the
trap handler should instead return an integer result.

The operands or results passed in may be held in areas of memory liable to be
overwritten by arithmetic operations in the floating point library (but not
non-arithmetic ones such as fcpy, fxtos or hex printing), so the trap handler
should take a copy if necessary.

If exceptions are raised manually (through feraiseexcept or feupdateenv),
the ae, OP1 and OP2 parameters are not supplied, dest is NULL, and fmt
is FE_FMT_N.

The following chart summarises the possible combinations of op and fmt for
each trap handler. It is suggested that your trap handler does not attempt
to continue execution if an unrecognised operation or destination format
occurs.

Operation        Invalid    DivByZero   Overflow    Underflow   Inexact
-----------------------------------------------------------------------
ADD               SX                     SX          SX          SX
SUB               SX                     SX          SX          SX
MUL               SX                     SX          SX          SX
DIV               SX         SX          SX          SX          SX
REM               SX                                 SX
CMP[E]               I
CONV              SX                     S           S           S
DEC               S P                                              P
FIX                  I                                              I
FLT
RND               SX
RAISE                 N          N           N           N           N
SQRT              SX                                             SX


Utility functions
=================
The library also supplies a number of utility functions.

Formatted output
----------------
The following printing rules are supplied:

     (hex)          print an integer as 4-digit hexadecimal (like C's %04X)
     (hexdigit)     print the least-significant hex digit
     (fp)           print single as formatted decimal (like C's %F/%G/%E)
     (fhex)         print single as formatted hexadecimal (like C's %A)
     (fhexx)        print extended as formatted hexadecimal
     (fraw)         print single as a raw 32-bit representation
     (frawx)        print extended or packed as raw 48-bit memory representation

The behaviour of (fp) can be modified by the following routines:

fesetprintmode(mode)
--------------------
Sets the (fp) printing mode to one of FE_PRINT_G, FE_PRINT_F or FE_PRINT_E,
and returns the previous mode.

FE_PRINT_G is the default, and is analogous to C's %G specifier. It prints
<precision> significant digits, either in FE_PRINT_E or FE_PRINT_F format,
with trailing zeros suppressed.

FE_PRINT_F prints with a fixed number (<precision>) of decimal places.

FE_PRINT_E prints in exponent form, as n.nnnnnEnn, with a fixed number
(<precision>) of decimal places.

fesetprintprecision(precision)
------------------------------
Sets the precision for (fp) printing, returning the previous value. See above
for details.

Formatted input
---------------
To read a floating-point number, eg from user input, the strtof routine can
be used:

          strtof(destination, string, length)
          
String should be a pointer to a ZSCII string, and length should be its
length. The string should not be terminated, or be preceded by a length
byte/word.

The routine returns the number of characters consumed. If 0, the input was
not recognised at all, and destination is set to zero; otherwise destination
is filled in with the single-precision result. If the return value is less
than length, then the input has some trailing text that did not form part of
the number.

Input is case insensitive and ignores leading spaces. Numbers can take the
following forms:

       23       23E2      $4      $4P12       NaN
       23.      23.E2     $4.     $4.P12      Infinity
       23.5     23.5E22   $44.E   $22.P18     -Infinity
       .23      .23E-2    $.EF    $.2EP-7     Inf

Enn signifies "x 10^nn", and Pxx signifies "x 2^nn".

Formatted initialisation
------------------------
For initialisation purposes, the finit() function can be used, thus:

         finit(destination, "12.8");

This prints its parameter into temporary storage, and calls strtof on it.
Diagnostics are output if the parameter is unrecognised.

Classification
--------------
A variety of routines for testing the property of a number are
supplied. None of these cause exceptions.

   isnan[x]          returns true if its argument is a NaN
   issignalling[x]   returns true if its argument is a signalling NaN
   isinf[x]          returns true if its argument is infinite
   isfinite[x]       returns true if its argument is finite (not infinite/NaN)

More generally, fpclassify[x] returns a constant indicating the type of its
argument:

   FP_NANS           signalling NaN
   FP_NANQ           quiet NaN
   FP_ZEROM          -0
   FP_ZEROP          +0
   FP_SUBNORMALM     negative subnormal number
   FP_SUBNORMALP     positive subnormal number
   FP_NORMALM        negative normal number
   FP_NORMALP        positive normal number
   FP_INFINITYM      -infinity
   FP_INFINITYP      +infinity
   
These constants are mutually exclusive bit values, but should be tested as
flags (in case of further subdivision):

      if (fpclassify(num) & FP_NORMALP) ...

Some combinations are predefined:

   FP_NAN        = FP_NANS | FP_NANQ
   FP_ZERO       = FP_ZEROM | FP_ZEROP
   FP_SUBNORMAL  = FP_SUBNORMALM | FP_SUBNORMALP
   FP_NORMAL     = FP_NORMALM | FP_NORMALP
   FP_INFINITY   = FP_INFINITYM | FP_INFINITYP
  
Minima and maxima
-----------------
fmin[x] and fmax[x] return the minimum/maximum of their two operands:

   fmin(destination, operand1, operand2);
   
If one operand (only) is a NaN, the non-NaN argument is returned (as
specified in C99).


Change history
==============
Release 1/2: * fptos and fstop now reported to trap handlers as FE_OP_DEC
             * fptos and fstop now trigger signalling NaNs
             * fstop now accepts rounding mode parameter
             * fegetenv and fesetenv fixed
             * Rounding mode now passed to trap handlers
             * fsqt renamed to fsqrt
             * Removed some stray debugging (font on/off statements)


Notice of copyright
===================
This library is intended to be used under the same conditions as Inform
itself. Specifically:

The FP library is copyright (C) Kevin Bracey 2002.

The FP library may be freely distributed provided that: (a) distributed
copies are not substantially different from those archived by the author, (b)
this and other copyright messages are always retained in full, and (c) no
profit is involved. Exceptions to these conditions may be negotiated directly
with the author.

A story file produced using the FP library belongs to whoever wrote it and may
be sold for profit if so desired, without the need for royalty payment.

The author assumes no liability for errors and omissions in this manual, or
for damages or loss of revenue resulting from the use of the information
contained herein, or the use of any of the software described herein. The FP
library is supplied "as is" and carries no warranty, actual or implied.