------------------------------------------------------------------------------ Inform/Z-machine FP library Release 1/2 User Manual Kevin Bracey 26th March 2002 ------------------------------------------------------------------------------ Introduction ============ The FP library provides IEEE-754 compliant floating-point support to Inform programmers targetting the Z-machine. Floating-point numbers are held in static memory. This might be as a static array, or as an array property of an object. Routines providing floating point operations are passed pointers to their operands, and a pointer to the destination. There is no restriction on the destination being the same as either operand. Single-precision numbers occupy 4 bytes, or 2 words of memory. Extended-precision (and packed-decimal) numbers occupy 6 bytes, or 3 words of memory. So, to allocate storage for a single-precision number, you might use: Array my_num --> 2; ! Two words of storage (initialised to 0.0) Array one --> $3F80 $0000; ! =1.0 or Object xxx with my_num 0 0, one $3F80 $0000; For all FP formats, if memory is initialised to zero, the value of the number is 0. Another possible initialisation value might be $7F80 $0001, a signalling "not-a-number" (which will raise an exception if used, indicating use of an uninitialised variable). If (as is likely) you don't fancy initialising your variables in raw hexadecimal, you can use the following routines in your initialisation code: finit(one, "1"); ! Initialise one from high string or fitos(one, 1); ! Initialise one from integer or finit(xxx.&one, "1"); or fitos(xxx.&one, 1); Then, to calculate 1 + 1, store the result in my_num, then print it, you would write: fadd(my_num, one, one); print (fp) my_num; or fadd(xxx.&my_num, xxx.&one, xxx.&one); print (fp) xxx.&my_num; Formats ======= The library provides two IEEE-754 binary formats, and a packed decimal (BCD) format used for binary<->decimal conversion. Single ------ The basic format is the IEEE-defined 32-bit single format. It can be viewed as an array of 2 Z-machine words thus: bit 15 14 7 6 0 15 0 +-+-----+-------+ +------------+ |S| Exp | Frac | | Frac | +-+-----+-------+ +------------+ msb lsb msb lsb S Exp Frac Meaning ----------------------------------------------------------- x 0 0 +/- Zero x 0 non-0 Subnormal number: +/- 0.frac x 2^(-126) x $FF 0 +/- Infinity x $FF non-0 NaN (signalling if Frac MSB is clear) others Normal number: +/- 1.frac x 2^(exp-127) Single format provides 24 bits of precision and 8 bits of range, roughly equivalent to at least 6 (very nearly 7) decimal digits of precision with a range of 1.1x10^(-38) to 3.4x10^38. The smallest subnormal number is approximately 1.4x10^(-45). Single extended --------------- For extra precision and range during intermediate calculations, a form of IEEE single extended format is provided. It can be viewed as an array of 3 Z-machine words: bit 15 10 0 15 14 0 15 0 +-+----+--------+ +-+----------+ +-----------+ |S|zero| Exp | |J| Frac | | Frac | +-+----+--------+ +-+----------+ +-----------+ msb lsb msb lsb msb lsb S Exp J Frac Meaning -------------------------------------------------------------------- x 0 0 0 +/- Zero x 0 0 non-0 Subnormal number: +/- 0.frac x 2^(-1023) x $7FF 0 0 +/- Infinity x $7FF 0 non-0 NaN (signalling if Frac MSB is clear) x x 1 x Normal number (exp<$7FF): +/- 1.frac x 2^(exp-1023) others illegal Single extended format provides 32 bits of precision and 11 bits of range, roughly equivalent to at least 9 decimal digits of precision with a range of 1.1x10^(-308) to 1.8x10^308. The smallest subnormal number is approximately 5.2x10^(-318). No support for I/O of single extended precision numbers is provided (apart from hexadecimal output) - they can only be used as intermediate values, or by manual raw initialisation of constants. Packed decimal -------------- Packed decimal format is a raw IEEE-like representation of a number in decimal. The IEEE-mandated conversions between binary and decimal take the form of conversions between single and packed decimal format. Formatted decimal printing and input routines go via packed decimal format. bit 12 8 4 0 12 8 4 0 12 8 4 0 +--+--+--+--+ +--+--+--+--+ +--+--+--+--+ | S|e0|e1|d0| |d1|d2|d3|d4| |d5|d6|d7|d8| +--+--+--+--+ +--+--+--+--+ +--+--+--+--+ msb lsb msb lsb msb lsb ee ddddddddd Meaning ----------------------------------------------------------------------- 0 0 +/- Zero [0-99] [1-9.99999999] Finite number: +/- d.dddddddd x 10^(+/- ee) FF 0 +/- Infinity FF non-zero NaN (signalling if MSB of d0 is clear) others Illegal The format is best viewed in terms of nibbles, rather than bits. The top bit of the S nibble holds the sign of the number, the next highest bit holds the sign of the exponent (and the lowest two are zero). Each of the remaining nibbles holds a decimal digit from 0-9. The special value $FF stored in the exponent indicates a non-finite number. Packed decimal format can accurately represent any single format number (including NaNs), so single->packed->single conversion is the identity (if rounding to nearest). Packed->single conversion, on the other hand, may over- or underflow, and lose precision. Floating-point environment ========================== The FP system has a number of global state variables. These are reflectively referred to as the "floating-point environment". The floating-point environment comprises the following settings: Setting Example manipulation function --------------------------------------------------------- rounding mode fesetround exception flags fetestexcept trap enables feenabletrap decimal printing mode fesetprintmode decimal printing precision fesetprintprecision trap handlers fesettraphandler To manipulate the current settings use the "fe"-prefixed functions - these are intended to resemble the FP functions specified initially by the C Numerical Extensions Group, and later by ISO C (1999). The entire floating point environment status can be stored by declaring an FEnv object and using fegetenv or similar. Please do not manipulate the properties of an FEnv directly - they are subject to change. Also, all items with names beginning with "_" in the library are for internal use only. Rounding ======== By default, all floating-point operations round their result to the nearest representable value (rounding to even in the halfway case). The rounding mode can be altered either globally, or on a per-operation basis. To set the rounding mode, call fesetround(mode); where mode is one of FE_TONEAREST FE_DOWNWARD FE_UPWARD FE_TOWARDZERO The rounding mode can be read by calling fegetround(). Floating-point operations by default use the globally set rounding mode. To use a different rounding mode in an individual operation, pass the rounding mode as an extra trailing argument to the routine (where documented). Operations ========== The following section documents the core IEEE-754 operations supplied by the library. All operations in the library that take a destination as their first argument store a value of the same format as their operands, unless stated otherwise. Operands are always of the same format. Copy operations --------------- The following routines are used to copy floating-point values: fcpy[x] fabs[x] fneg[x] They are non-floating-point operations, so signal no exceptions, even on signalling NaNs. fcpy copies a number as is, fabs clears the sign bit, and fneg reverses the sign bit. Usage: fcpy(destination, operand); Return value: None Exceptions: None Arithmetic ---------- The five core arithmetic routines are: fadd[x] fsub[x] fmul[x] fdiv[x] frem[x] They provide the IEEE-754 add (+), subtract (-), multiply (x), divide (/) and remainder (%) operations. Usage: fadd(destination, operand1, operand2 [, rounding] ); Return value: None Exceptions: Invalid operation (signalling NaN operand, magnitude subtraction of infinities, 0 x Infinity, Infinity x 0, 0 / 0, Infinity / Infinity, n % 0, Infinity % n) Divide by zero (fdiv[x] only) Overflow Underflow Inexact Square root ----------- Square root is provided by: fsqrt[x] Usage: fsqrt(destination, operand [, rounding]); Return value: None Exceptions: Invalid operation (signalling NaN operand, operand < 0) Inexact Format conversions ------------------ The following routines convert between the three floating-point formats, single (s), extended (x) and packed decimal (p): fstox fxtos fstop fptos All except fstox can take an optional extra rounding argument. Usage: fxtos(destination, source [, rounding]); fstox(destination, source); Return value: None Exceptions: Invalid operation (signalling NaN) Overflow Underflow Inexact Floating-point <-> integer conversions -------------------------------------- Conversion between floating-point and integer values are provided by: fitos fitox fstoi fxtoi Conversions to floating-point are always exact. Usage: fitos(destination, value); fstoi(source [, rounding]); Return value: Integer result (fstoi and fxtoi) None (fitos and fitox) Exceptions: Invalid operation (signalling NaN, quiet NaN, infinity, out-of-range) when converting to integer Inexact when converting to integer Rounding to integer value ------------------------- The following routine rounds to an integral floating-point value: frnd[x] It never raises an inexact exception. Usage: frnd(destination, operand [, rounding]); Return value: None Exceptions: Invalid operation (signalling NaN) Comparison ---------- The primary comparison routines are: fcmp[x] fcmpe[x]. They return one of FCMP_L, FCMP_G, FCMP_E, or FCMP_U, depending on the comparison of the first operand to the second; less-than (<), greater-than (>), equal (=) or unordered (?). In addition fcmpe signals an invalid operation exception if the operands are unordered. The return values take the form of disjoint bits, so expressions like "fcmp(a,b) & (FCMP_L|FCMP_E)" can be used. Usage: fcmp(operand1, operand2); Return value: One of FCMP_L, FCMP_G, FCMP_E and FCMP_U. Exceptions: Invalid operation (signalling NaN operand, or quiet NaN operand to fcmpe[x]) Wrapper routines providing fourteen of the IEEE predicates are provided (all with corresponding "x" forms. They return non-zero if and only if the respective predicate is true. Routine IEEE Predicate Exception if unordered? ------- -------------- ----------------------- feq[x] = No fne[x] ?<> No fgt[x] > Yes fge[x] >= Yes flt[x] < Yes fle[x] <= Yes funordered[x] ? No flg[x] <> Yes fleg[x] <=> Yes fug[x] ?> No fuge[x] ?>= No ful[x] ?< No fule[x] ?<= No fue[x] ?= No Usage: feq(operand1, operand2); Return value: Non-zero iff the predicate is true Exceptions: Invalid operation (signalling NaN operand, or quiet NaN operand to fgt, fge, flt, fle, flg or fleg) NaNs ==== Signalling and quiet NaNs are distinguished by the most-significant bit of the fraction (see "Formats" above) - quiet NaNs have the bit set, signalling clear. When a NaN is generated, a number giving a reason for the generation is stored in the lower bits of the fraction, as suggested by IEEE-754. In single format, the number is aligned at the bottom of the stored fraction. In extended, it is aligned 8 bits above the bottom of the stored fraction. Single<->extended conversion of NaNs preserves the reason code, by adding or truncating the normally unused low-order bits. In single<->packed decimal conversion of NaNs, the low order 22 bits of the single number are converted as an integer to decimal, then stored at the bottom of the decimal fraction, again preserving the reason code. The reason codes used are as follows: InvReason_SigNan 0 Operand was a signalling NaN InvReason_InitNaN 1 A manually initialised NaN InvReason_MagSubInf 4 Magnitude subtraction of infinities InvReason_InfTimes0 5 Infinity times zero InvReason_0TimesInf 6 Zero times infinity InvReason_0Div0 7 Zero divided by zero InvReason_InfDivInf 8 Infinity divided by infinity InvReason_InfRemX 9 Infinity REM x InvReason_XRem0 10 x REM 0 InvReason_SqrtNeg 11 Square root of number < 0 InvReason_FixQNaN 12 (Conversion of a quiet NaN to integer) InvReason_FixInf 13 (Conversion of infinity to integer) InvReason_FixRange 14 (Conversion to integer out of range) InvReason_CompQNaN 15 (Comparison of quiet NaNs (fcmpe)) Some of these reason codes (those in parentheses) do not normally occur in NaNs, but may appear as reason codes to invalid operation trap handlers (see below). Exceptions ========== The FP library supports the standard set of five exceptions specified by IEEE 754. The exceptions are represented by the five constants FE_INVALID FE_DIVBYZERO FE_OVERFLOW FE_UNDERFLOW FE_INEXACT or by bitwise-ORs of them. FE_ALL_EXCEPT is the bitwise-OR of all five constants. The environment stores cumulative status flags that are set whenever an untrapped exception occurs. The flags can only be cleared manually. The following routines manipulate the exception flags. feclearexcept(excepts) ---------------------- Clears the specified exception flags. fesetexcept(excepts) -------------------- Sets the specified exception flags. feraiseexcept(excepts) ---------------------- Raises the specified exceptions; this means setting the flags if the exceptions are untrapped, or calling the trap handlers if they are trapped. Note that trap handlers called through feraiseexcept (or feupdateenv) do not receive information about the cause of the original exception. fetestexcept(excepts) --------------------- Returns the state of the specified exception flags. fegetenv(env) ------------- Stores the current state of the floating-point environment in the FEnv object env. feholdexcept(env) ----------------- Stores the current state of the floating-point environment in the FEnv object env. It then disables all traps and clears all exception flags. fesetenv(env) ------------- Establishes the floating-point environment stored in the FEnv object env. feupdateenv(env) ---------------- Notes the current exception flag status, installs the floating-point environment stored in the FEnv object env, and then raises the saved exceptions. Example: FEnv saved_env; [ f; feholdexcept(saved_env); ! Perform calculation, for which we wish to hide ! spurious underflows if (spurious_underflow) feclearexcept(FE_UNDERFLOW); feupdateenv(saved_env); return result; ]; Traps ===== Traps can be enabled for each of the five exceptions; by default all traps are disabled. The default trap handlers print a diagnostic message, then quit. It is possible to replace individual trap handlers with your own routine. The following routines control trap handlers. fetesttrap(excepts) ------------------- Returns a bitfield indicating which of the specified traps are enabled. fedisabletrap(excepts) ---------------------- Disable traps for the specified exceptions. feenabletrap(excepts) --------------------- Enable traps for the specified exceptions. fesettraphandler(routine, excepts) ---------------------------------- Set the trap handler for the exception(s) specified. fegettraphandler(except) ------------------------ Return the trap handler for the exception specified. Trap handler calling conventions -------------------------------- The trap handler is given information about the operation that was being performed, and either references to its operands or the result. The overflow, underflow and inexact trap handlers are passed the following arguments: dest - pointer to destination, containing the rounded (and biased) result, in destination format (except FE_FMT_I case) rounded result (FE_FMT_I case) fmt - format of the destination (FE_FMT_S, FE_FMT_X or FE_FMT_I) op - the operation being performed (FE_OP_ADD etc) round - rounding mode for this operation ae - (overflow/underflow only) the bias of the result (the correct result is dest * 2^ae) The invalid operation and divide by zero trap handlers are passed the following arguments: dest - pointer to destination, not filled in (except FE_FMT_I case) fmt - format of the destination (FE_FMT_S, FE_FMT_X, FE_FMT_P or FE_FMT_I) op - the operation being performed (FE_OP_CONV etc) round - rounding mode for this operation reason - For invalid operation, a NaNReason_XXX constant giving the reason for the exception, else zero OP1 - operand 1 in extended format OP2 - operand 2 (if any) in extended format The trap handlers must either quit the program, execute a @throw to return control back to user code, or fill in dest with an appropriate result, and return. Trap handlers may also update the cumulative exception flags. If fmt is FE_FMT_I, the operation that trapped was one that returned an integer value (such as fstoi or fcmp). In this case, dest is undefined, and the trap handler should instead return an integer result. The operands or results passed in may be held in areas of memory liable to be overwritten by arithmetic operations in the floating point library (but not non-arithmetic ones such as fcpy, fxtos or hex printing), so the trap handler should take a copy if necessary. If exceptions are raised manually (through feraiseexcept or feupdateenv), the ae, OP1 and OP2 parameters are not supplied, dest is NULL, and fmt is FE_FMT_N. The following chart summarises the possible combinations of op and fmt for each trap handler. It is suggested that your trap handler does not attempt to continue execution if an unrecognised operation or destination format occurs. Operation Invalid DivByZero Overflow Underflow Inexact ----------------------------------------------------------------------- ADD SX SX SX SX SUB SX SX SX SX MUL SX SX SX SX DIV SX SX SX SX SX REM SX SX CMP[E] I CONV SX S S S DEC S P P FIX I I FLT RND SX RAISE N N N N N SQRT SX SX Utility functions ================= The library also supplies a number of utility functions. Formatted output ---------------- The following printing rules are supplied: (hex) print an integer as 4-digit hexadecimal (like C's %04X) (hexdigit) print the least-significant hex digit (fp) print single as formatted decimal (like C's %F/%G/%E) (fhex) print single as formatted hexadecimal (like C's %A) (fhexx) print extended as formatted hexadecimal (fraw) print single as a raw 32-bit representation (frawx) print extended or packed as raw 48-bit memory representation The behaviour of (fp) can be modified by the following routines: fesetprintmode(mode) -------------------- Sets the (fp) printing mode to one of FE_PRINT_G, FE_PRINT_F or FE_PRINT_E, and returns the previous mode. FE_PRINT_G is the default, and is analogous to C's %G specifier. It prints significant digits, either in FE_PRINT_E or FE_PRINT_F format, with trailing zeros suppressed. FE_PRINT_F prints with a fixed number () of decimal places. FE_PRINT_E prints in exponent form, as n.nnnnnEnn, with a fixed number () of decimal places. fesetprintprecision(precision) ------------------------------ Sets the precision for (fp) printing, returning the previous value. See above for details. Formatted input --------------- To read a floating-point number, eg from user input, the strtof routine can be used: strtof(destination, string, length) String should be a pointer to a ZSCII string, and length should be its length. The string should not be terminated, or be preceded by a length byte/word. The routine returns the number of characters consumed. If 0, the input was not recognised at all, and destination is set to zero; otherwise destination is filled in with the single-precision result. If the return value is less than length, then the input has some trailing text that did not form part of the number. Input is case insensitive and ignores leading spaces. Numbers can take the following forms: 23 23E2 $4 $4P12 NaN 23. 23.E2 $4. $4.P12 Infinity 23.5 23.5E22 $44.E $22.P18 -Infinity .23 .23E-2 $.EF $.2EP-7 Inf Enn signifies "x 10^nn", and Pxx signifies "x 2^nn". Formatted initialisation ------------------------ For initialisation purposes, the finit() function can be used, thus: finit(destination, "12.8"); This prints its parameter into temporary storage, and calls strtof on it. Diagnostics are output if the parameter is unrecognised. Classification -------------- A variety of routines for testing the property of a number are supplied. None of these cause exceptions. isnan[x] returns true if its argument is a NaN issignalling[x] returns true if its argument is a signalling NaN isinf[x] returns true if its argument is infinite isfinite[x] returns true if its argument is finite (not infinite/NaN) More generally, fpclassify[x] returns a constant indicating the type of its argument: FP_NANS signalling NaN FP_NANQ quiet NaN FP_ZEROM -0 FP_ZEROP +0 FP_SUBNORMALM negative subnormal number FP_SUBNORMALP positive subnormal number FP_NORMALM negative normal number FP_NORMALP positive normal number FP_INFINITYM -infinity FP_INFINITYP +infinity These constants are mutually exclusive bit values, but should be tested as flags (in case of further subdivision): if (fpclassify(num) & FP_NORMALP) ... Some combinations are predefined: FP_NAN = FP_NANS | FP_NANQ FP_ZERO = FP_ZEROM | FP_ZEROP FP_SUBNORMAL = FP_SUBNORMALM | FP_SUBNORMALP FP_NORMAL = FP_NORMALM | FP_NORMALP FP_INFINITY = FP_INFINITYM | FP_INFINITYP Minima and maxima ----------------- fmin[x] and fmax[x] return the minimum/maximum of their two operands: fmin(destination, operand1, operand2); If one operand (only) is a NaN, the non-NaN argument is returned (as specified in C99). Change history ============== Release 1/2: * fptos and fstop now reported to trap handlers as FE_OP_DEC * fptos and fstop now trigger signalling NaNs * fstop now accepts rounding mode parameter * fegetenv and fesetenv fixed * Rounding mode now passed to trap handlers * fsqt renamed to fsqrt * Removed some stray debugging (font on/off statements) Notice of copyright =================== This library is intended to be used under the same conditions as Inform itself. Specifically: The FP library is copyright (C) Kevin Bracey 2002. The FP library may be freely distributed provided that: (a) distributed copies are not substantially different from those archived by the author, (b) this and other copyright messages are always retained in full, and (c) no profit is involved. Exceptions to these conditions may be negotiated directly with the author. A story file produced using the FP library belongs to whoever wrote it and may be sold for profit if so desired, without the need for royalty payment. The author assumes no liability for errors and omissions in this manual, or for damages or loss of revenue resulting from the use of the information contained herein, or the use of any of the software described herein. The FP library is supplied "as is" and carries no warranty, actual or implied.