------------------------------------------------------------------------------
Inform/Z-machine FP library Release 1/2
User Manual
Kevin Bracey
26th March 2002
------------------------------------------------------------------------------
Introduction
============
The FP library provides IEEE-754 compliant floating-point support to Inform
programmers targetting the Z-machine.
Floating-point numbers are held in static memory. This might be as a static
array, or as an array property of an object. Routines providing floating point
operations are passed pointers to their operands, and a pointer to the
destination. There is no restriction on the destination being the same as
either operand.
Single-precision numbers occupy 4 bytes, or 2 words of memory.
Extended-precision (and packed-decimal) numbers occupy 6 bytes, or 3 words of
memory. So, to allocate storage for a single-precision number, you might use:
Array my_num --> 2; ! Two words of storage (initialised to 0.0)
Array one --> $3F80 $0000; ! =1.0
or
Object xxx
with my_num 0 0,
one $3F80 $0000;
For all FP formats, if memory is initialised to zero, the value of the number
is 0. Another possible initialisation value might be $7F80 $0001, a
signalling "not-a-number" (which will raise an exception if used, indicating
use of an uninitialised variable).
If (as is likely) you don't fancy initialising your variables in raw
hexadecimal, you can use the following routines in your initialisation code:
finit(one, "1"); ! Initialise one from high string
or fitos(one, 1); ! Initialise one from integer
or
finit(xxx.&one, "1");
or fitos(xxx.&one, 1);
Then, to calculate 1 + 1, store the result in my_num, then print it, you would
write:
fadd(my_num, one, one);
print (fp) my_num;
or
fadd(xxx.&my_num, xxx.&one, xxx.&one);
print (fp) xxx.&my_num;
Formats
=======
The library provides two IEEE-754 binary formats, and a packed decimal (BCD)
format used for binary<->decimal conversion.
Single
------
The basic format is the IEEE-defined 32-bit single format. It can be viewed
as an array of 2 Z-machine words thus:
bit 15 14 7 6 0 15 0
+-+-----+-------+ +------------+
|S| Exp | Frac | | Frac |
+-+-----+-------+ +------------+
msb lsb msb lsb
S Exp Frac Meaning
-----------------------------------------------------------
x 0 0 +/- Zero
x 0 non-0 Subnormal number: +/- 0.frac x 2^(-126)
x $FF 0 +/- Infinity
x $FF non-0 NaN (signalling if Frac MSB is clear)
others Normal number: +/- 1.frac x 2^(exp-127)
Single format provides 24 bits of precision and 8 bits of range, roughly
equivalent to at least 6 (very nearly 7) decimal digits of precision with a
range of 1.1x10^(-38) to 3.4x10^38. The smallest subnormal number is
approximately 1.4x10^(-45).
Single extended
---------------
For extra precision and range during intermediate calculations, a form of
IEEE single extended format is provided. It can be viewed as an array of
3 Z-machine words:
bit 15 10 0 15 14 0 15 0
+-+----+--------+ +-+----------+ +-----------+
|S|zero| Exp | |J| Frac | | Frac |
+-+----+--------+ +-+----------+ +-----------+
msb lsb msb lsb msb lsb
S Exp J Frac Meaning
--------------------------------------------------------------------
x 0 0 0 +/- Zero
x 0 0 non-0 Subnormal number: +/- 0.frac x 2^(-1023)
x $7FF 0 0 +/- Infinity
x $7FF 0 non-0 NaN (signalling if Frac MSB is clear)
x x 1 x Normal number (exp<$7FF): +/- 1.frac x 2^(exp-1023)
others illegal
Single extended format provides 32 bits of precision and 11 bits of range,
roughly equivalent to at least 9 decimal digits of precision with a range of
1.1x10^(-308) to 1.8x10^308. The smallest subnormal number is approximately
5.2x10^(-318).
No support for I/O of single extended precision numbers is provided (apart
from hexadecimal output) - they can only be used as intermediate values, or
by manual raw initialisation of constants.
Packed decimal
--------------
Packed decimal format is a raw IEEE-like representation of a number in
decimal. The IEEE-mandated conversions between binary and decimal take the
form of conversions between single and packed decimal format. Formatted
decimal printing and input routines go via packed decimal format.
bit 12 8 4 0 12 8 4 0 12 8 4 0
+--+--+--+--+ +--+--+--+--+ +--+--+--+--+
| S|e0|e1|d0| |d1|d2|d3|d4| |d5|d6|d7|d8|
+--+--+--+--+ +--+--+--+--+ +--+--+--+--+
msb lsb msb lsb msb lsb
ee ddddddddd Meaning
-----------------------------------------------------------------------
0 0 +/- Zero
[0-99] [1-9.99999999] Finite number: +/- d.dddddddd x 10^(+/- ee)
FF 0 +/- Infinity
FF non-zero NaN (signalling if MSB of d0 is clear)
others Illegal
The format is best viewed in terms of nibbles, rather than bits. The top bit
of the S nibble holds the sign of the number, the next highest bit holds the
sign of the exponent (and the lowest two are zero). Each of the remaining
nibbles holds a decimal digit from 0-9. The special value $FF stored in the
exponent indicates a non-finite number.
Packed decimal format can accurately represent any single format number
(including NaNs), so single->packed->single conversion is the identity (if
rounding to nearest). Packed->single conversion, on the other hand, may over-
or underflow, and lose precision.
Floating-point environment
==========================
The FP system has a number of global state variables. These are reflectively
referred to as the "floating-point environment".
The floating-point environment comprises the following settings:
Setting Example manipulation function
---------------------------------------------------------
rounding mode fesetround
exception flags fetestexcept
trap enables feenabletrap
decimal printing mode fesetprintmode
decimal printing precision fesetprintprecision
trap handlers fesettraphandler
To manipulate the current settings use the "fe"-prefixed functions - these are
intended to resemble the FP functions specified initially by the C Numerical
Extensions Group, and later by ISO C (1999).
The entire floating point environment status can be stored by declaring
an FEnv object and using fegetenv or similar.
Please do not manipulate the properties of an FEnv directly - they are subject
to change. Also, all items with names beginning with "_" in the library are
for internal use only.
Rounding
========
By default, all floating-point operations round their result to the nearest
representable value (rounding to even in the halfway case). The rounding mode
can be altered either globally, or on a per-operation basis.
To set the rounding mode, call
fesetround(mode);
where mode is one of
FE_TONEAREST
FE_DOWNWARD
FE_UPWARD
FE_TOWARDZERO
The rounding mode can be read by calling fegetround().
Floating-point operations by default use the globally set rounding mode. To
use a different rounding mode in an individual operation, pass the rounding
mode as an extra trailing argument to the routine (where documented).
Operations
==========
The following section documents the core IEEE-754 operations supplied by the
library.
All operations in the library that take a destination as their first argument
store a value of the same format as their operands, unless stated otherwise.
Operands are always of the same format.
Copy operations
---------------
The following routines are used to copy floating-point values:
fcpy[x] fabs[x] fneg[x]
They are non-floating-point operations, so signal no exceptions, even on
signalling NaNs. fcpy copies a number as is, fabs clears the sign bit, and
fneg reverses the sign bit.
Usage: fcpy(destination, operand);
Return value: None
Exceptions: None
Arithmetic
----------
The five core arithmetic routines are:
fadd[x] fsub[x] fmul[x] fdiv[x] frem[x]
They provide the IEEE-754 add (+), subtract (-), multiply (x), divide (/) and
remainder (%) operations.
Usage: fadd(destination, operand1, operand2 [, rounding] );
Return value: None
Exceptions: Invalid operation (signalling NaN operand, magnitude subtraction
of infinities, 0 x Infinity, Infinity x 0, 0 / 0,
Infinity / Infinity, n % 0, Infinity % n)
Divide by zero (fdiv[x] only)
Overflow
Underflow
Inexact
Square root
-----------
Square root is provided by:
fsqrt[x]
Usage: fsqrt(destination, operand [, rounding]);
Return value: None
Exceptions: Invalid operation (signalling NaN operand, operand < 0)
Inexact
Format conversions
------------------
The following routines convert between the three floating-point formats,
single (s), extended (x) and packed decimal (p):
fstox fxtos fstop fptos
All except fstox can take an optional extra rounding argument.
Usage: fxtos(destination, source [, rounding]);
fstox(destination, source);
Return value: None
Exceptions: Invalid operation (signalling NaN)
Overflow
Underflow
Inexact
Floating-point <-> integer conversions
--------------------------------------
Conversion between floating-point and integer values are provided by:
fitos fitox fstoi fxtoi
Conversions to floating-point are always exact.
Usage: fitos(destination, value);
fstoi(source [, rounding]);
Return value: Integer result (fstoi and fxtoi)
None (fitos and fitox)
Exceptions: Invalid operation (signalling NaN, quiet NaN, infinity,
out-of-range) when converting to integer
Inexact when converting to integer
Rounding to integer value
-------------------------
The following routine rounds to an integral floating-point value:
frnd[x]
It never raises an inexact exception.
Usage: frnd(destination, operand [, rounding]);
Return value: None
Exceptions: Invalid operation (signalling NaN)
Comparison
----------
The primary comparison routines are:
fcmp[x] fcmpe[x].
They return one of FCMP_L, FCMP_G, FCMP_E, or FCMP_U, depending on the
comparison of the first operand to the second; less-than (<), greater-than
(>), equal (=) or unordered (?). In addition fcmpe signals an invalid
operation exception if the operands are unordered.
The return values take the form of disjoint bits, so expressions like
"fcmp(a,b) & (FCMP_L|FCMP_E)" can be used.
Usage: fcmp(operand1, operand2);
Return value: One of FCMP_L, FCMP_G, FCMP_E and FCMP_U.
Exceptions: Invalid operation (signalling NaN operand, or quiet NaN operand
to fcmpe[x])
Wrapper routines providing fourteen of the IEEE predicates are provided (all
with corresponding "x" forms. They return non-zero if and only if the
respective predicate is true.
Routine IEEE Predicate Exception if unordered?
------- -------------- -----------------------
feq[x] = No
fne[x] ?<> No
fgt[x] > Yes
fge[x] >= Yes
flt[x] < Yes
fle[x] <= Yes
funordered[x] ? No
flg[x] <> Yes
fleg[x] <=> Yes
fug[x] ?> No
fuge[x] ?>= No
ful[x] ?< No
fule[x] ?<= No
fue[x] ?= No
Usage: feq(operand1, operand2);
Return value: Non-zero iff the predicate is true
Exceptions: Invalid operation (signalling NaN operand, or quiet NaN operand
to fgt, fge, flt, fle, flg or fleg)
NaNs
====
Signalling and quiet NaNs are distinguished by the most-significant bit of
the fraction (see "Formats" above) - quiet NaNs have the bit set, signalling
clear.
When a NaN is generated, a number giving a reason for the generation is stored
in the lower bits of the fraction, as suggested by IEEE-754. In single format,
the number is aligned at the bottom of the stored fraction. In extended, it
is aligned 8 bits above the bottom of the stored fraction. Single<->extended
conversion of NaNs preserves the reason code, by adding or truncating the
normally unused low-order bits.
In single<->packed decimal conversion of NaNs, the low order 22 bits of the
single number are converted as an integer to decimal, then stored at the
bottom of the decimal fraction, again preserving the reason code.
The reason codes used are as follows:
InvReason_SigNan 0 Operand was a signalling NaN
InvReason_InitNaN 1 A manually initialised NaN
InvReason_MagSubInf 4 Magnitude subtraction of infinities
InvReason_InfTimes0 5 Infinity times zero
InvReason_0TimesInf 6 Zero times infinity
InvReason_0Div0 7 Zero divided by zero
InvReason_InfDivInf 8 Infinity divided by infinity
InvReason_InfRemX 9 Infinity REM x
InvReason_XRem0 10 x REM 0
InvReason_SqrtNeg 11 Square root of number < 0
InvReason_FixQNaN 12 (Conversion of a quiet NaN to integer)
InvReason_FixInf 13 (Conversion of infinity to integer)
InvReason_FixRange 14 (Conversion to integer out of range)
InvReason_CompQNaN 15 (Comparison of quiet NaNs (fcmpe))
Some of these reason codes (those in parentheses) do not normally occur in
NaNs, but may appear as reason codes to invalid operation trap handlers (see
below).
Exceptions
==========
The FP library supports the standard set of five exceptions specified by
IEEE 754.
The exceptions are represented by the five constants
FE_INVALID FE_DIVBYZERO FE_OVERFLOW FE_UNDERFLOW FE_INEXACT
or by bitwise-ORs of them. FE_ALL_EXCEPT is the bitwise-OR of all five
constants.
The environment stores cumulative status flags that are set whenever an
untrapped exception occurs. The flags can only be cleared manually.
The following routines manipulate the exception flags.
feclearexcept(excepts)
----------------------
Clears the specified exception flags.
fesetexcept(excepts)
--------------------
Sets the specified exception flags.
feraiseexcept(excepts)
----------------------
Raises the specified exceptions; this means setting the flags if the exceptions
are untrapped, or calling the trap handlers if they are trapped. Note that
trap handlers called through feraiseexcept (or feupdateenv) do not receive
information about the cause of the original exception.
fetestexcept(excepts)
---------------------
Returns the state of the specified exception flags.
fegetenv(env)
-------------
Stores the current state of the floating-point environment in the FEnv object
env.
feholdexcept(env)
-----------------
Stores the current state of the floating-point environment in the FEnv object
env. It then disables all traps and clears all exception flags.
fesetenv(env)
-------------
Establishes the floating-point environment stored in the FEnv object env.
feupdateenv(env)
----------------
Notes the current exception flag status, installs the floating-point environment
stored in the FEnv object env, and then raises the saved exceptions.
Example: FEnv saved_env;
[ f;
feholdexcept(saved_env);
! Perform calculation, for which we wish to hide
! spurious underflows
if (spurious_underflow)
feclearexcept(FE_UNDERFLOW);
feupdateenv(saved_env);
return result;
];
Traps
=====
Traps can be enabled for each of the five exceptions; by default all traps
are disabled.
The default trap handlers print a diagnostic message, then quit. It is possible
to replace individual trap handlers with your own routine.
The following routines control trap handlers.
fetesttrap(excepts)
-------------------
Returns a bitfield indicating which of the specified traps are enabled.
fedisabletrap(excepts)
----------------------
Disable traps for the specified exceptions.
feenabletrap(excepts)
---------------------
Enable traps for the specified exceptions.
fesettraphandler(routine, excepts)
----------------------------------
Set the trap handler for the exception(s) specified.
fegettraphandler(except)
------------------------
Return the trap handler for the exception specified.
Trap handler calling conventions
--------------------------------
The trap handler is given information about the operation that was being
performed, and either references to its operands or the result.
The overflow, underflow and inexact trap handlers are passed the following
arguments:
dest - pointer to destination, containing the rounded (and biased)
result, in destination format (except FE_FMT_I case)
rounded result (FE_FMT_I case)
fmt - format of the destination (FE_FMT_S, FE_FMT_X or FE_FMT_I)
op - the operation being performed (FE_OP_ADD etc)
round - rounding mode for this operation
ae - (overflow/underflow only) the bias of the result (the correct
result is dest * 2^ae)
The invalid operation and divide by zero trap handlers are passed the following
arguments:
dest - pointer to destination, not filled in (except FE_FMT_I case)
fmt - format of the destination (FE_FMT_S, FE_FMT_X, FE_FMT_P or
FE_FMT_I)
op - the operation being performed (FE_OP_CONV etc)
round - rounding mode for this operation
reason - For invalid operation, a NaNReason_XXX constant giving the
reason for the exception, else zero
OP1 - operand 1 in extended format
OP2 - operand 2 (if any) in extended format
The trap handlers must either quit the program, execute a @throw to return
control back to user code, or fill in dest with an appropriate result, and
return. Trap handlers may also update the cumulative exception flags.
If fmt is FE_FMT_I, the operation that trapped was one that returned an integer
value (such as fstoi or fcmp). In this case, dest is undefined, and the
trap handler should instead return an integer result.
The operands or results passed in may be held in areas of memory liable to be
overwritten by arithmetic operations in the floating point library (but not
non-arithmetic ones such as fcpy, fxtos or hex printing), so the trap handler
should take a copy if necessary.
If exceptions are raised manually (through feraiseexcept or feupdateenv),
the ae, OP1 and OP2 parameters are not supplied, dest is NULL, and fmt
is FE_FMT_N.
The following chart summarises the possible combinations of op and fmt for
each trap handler. It is suggested that your trap handler does not attempt
to continue execution if an unrecognised operation or destination format
occurs.
Operation Invalid DivByZero Overflow Underflow Inexact
-----------------------------------------------------------------------
ADD SX SX SX SX
SUB SX SX SX SX
MUL SX SX SX SX
DIV SX SX SX SX SX
REM SX SX
CMP[E] I
CONV SX S S S
DEC S P P
FIX I I
FLT
RND SX
RAISE N N N N N
SQRT SX SX
Utility functions
=================
The library also supplies a number of utility functions.
Formatted output
----------------
The following printing rules are supplied:
(hex) print an integer as 4-digit hexadecimal (like C's %04X)
(hexdigit) print the least-significant hex digit
(fp) print single as formatted decimal (like C's %F/%G/%E)
(fhex) print single as formatted hexadecimal (like C's %A)
(fhexx) print extended as formatted hexadecimal
(fraw) print single as a raw 32-bit representation
(frawx) print extended or packed as raw 48-bit memory representation
The behaviour of (fp) can be modified by the following routines:
fesetprintmode(mode)
--------------------
Sets the (fp) printing mode to one of FE_PRINT_G, FE_PRINT_F or FE_PRINT_E,
and returns the previous mode.
FE_PRINT_G is the default, and is analogous to C's %G specifier. It prints
significant digits, either in FE_PRINT_E or FE_PRINT_F format,
with trailing zeros suppressed.
FE_PRINT_F prints with a fixed number () of decimal places.
FE_PRINT_E prints in exponent form, as n.nnnnnEnn, with a fixed number
() of decimal places.
fesetprintprecision(precision)
------------------------------
Sets the precision for (fp) printing, returning the previous value. See above
for details.
Formatted input
---------------
To read a floating-point number, eg from user input, the strtof routine can
be used:
strtof(destination, string, length)
String should be a pointer to a ZSCII string, and length should be its
length. The string should not be terminated, or be preceded by a length
byte/word.
The routine returns the number of characters consumed. If 0, the input was
not recognised at all, and destination is set to zero; otherwise destination
is filled in with the single-precision result. If the return value is less
than length, then the input has some trailing text that did not form part of
the number.
Input is case insensitive and ignores leading spaces. Numbers can take the
following forms:
23 23E2 $4 $4P12 NaN
23. 23.E2 $4. $4.P12 Infinity
23.5 23.5E22 $44.E $22.P18 -Infinity
.23 .23E-2 $.EF $.2EP-7 Inf
Enn signifies "x 10^nn", and Pxx signifies "x 2^nn".
Formatted initialisation
------------------------
For initialisation purposes, the finit() function can be used, thus:
finit(destination, "12.8");
This prints its parameter into temporary storage, and calls strtof on it.
Diagnostics are output if the parameter is unrecognised.
Classification
--------------
A variety of routines for testing the property of a number are
supplied. None of these cause exceptions.
isnan[x] returns true if its argument is a NaN
issignalling[x] returns true if its argument is a signalling NaN
isinf[x] returns true if its argument is infinite
isfinite[x] returns true if its argument is finite (not infinite/NaN)
More generally, fpclassify[x] returns a constant indicating the type of its
argument:
FP_NANS signalling NaN
FP_NANQ quiet NaN
FP_ZEROM -0
FP_ZEROP +0
FP_SUBNORMALM negative subnormal number
FP_SUBNORMALP positive subnormal number
FP_NORMALM negative normal number
FP_NORMALP positive normal number
FP_INFINITYM -infinity
FP_INFINITYP +infinity
These constants are mutually exclusive bit values, but should be tested as
flags (in case of further subdivision):
if (fpclassify(num) & FP_NORMALP) ...
Some combinations are predefined:
FP_NAN = FP_NANS | FP_NANQ
FP_ZERO = FP_ZEROM | FP_ZEROP
FP_SUBNORMAL = FP_SUBNORMALM | FP_SUBNORMALP
FP_NORMAL = FP_NORMALM | FP_NORMALP
FP_INFINITY = FP_INFINITYM | FP_INFINITYP
Minima and maxima
-----------------
fmin[x] and fmax[x] return the minimum/maximum of their two operands:
fmin(destination, operand1, operand2);
If one operand (only) is a NaN, the non-NaN argument is returned (as
specified in C99).
Change history
==============
Release 1/2: * fptos and fstop now reported to trap handlers as FE_OP_DEC
* fptos and fstop now trigger signalling NaNs
* fstop now accepts rounding mode parameter
* fegetenv and fesetenv fixed
* Rounding mode now passed to trap handlers
* fsqt renamed to fsqrt
* Removed some stray debugging (font on/off statements)
Notice of copyright
===================
This library is intended to be used under the same conditions as Inform
itself. Specifically:
The FP library is copyright (C) Kevin Bracey 2002.
The FP library may be freely distributed provided that: (a) distributed
copies are not substantially different from those archived by the author, (b)
this and other copyright messages are always retained in full, and (c) no
profit is involved. Exceptions to these conditions may be negotiated directly
with the author.
A story file produced using the FP library belongs to whoever wrote it and may
be sold for profit if so desired, without the need for royalty payment.
The author assumes no liability for errors and omissions in this manual, or
for damages or loss of revenue resulting from the use of the information
contained herein, or the use of any of the software described herein. The FP
library is supplied "as is" and carries no warranty, actual or implied.