Understanding Floating-point Performance

Denormal Computations

A denormal number is one in which the mantissa is nonzero but the exponent field is zero in an IEEE* floating-point representation. The smallest normal single-precision floating-point number greater than zero is about 1.175494350822288e-38. Smaller numbers are possible, but they are denormal and require hardware or operating system intervention to handle, which can cost hundreds of clock cycles.
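The definition above can be checked directly on the bit pattern of a value. The following is a minimal sketch, assuming that float is the 32-bit IEEE single-precision format (1 sign bit, 8 exponent bits, 23 mantissa bits); the function names is_denormal and below_smallest_normal are illustrative and are not part of any Intel library.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if x is denormal: exponent field all zeros, mantissa nonzero.
   Assumes float is IEEE single precision (binary32). */
int is_denormal(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);            /* copy the raw bit pattern */
    uint32_t exponent = (bits >> 23) & 0xFFu;  /* 8-bit exponent field */
    uint32_t mantissa = bits & 0x7FFFFFu;      /* 23-bit mantissa field */
    return exponent == 0 && mantissa != 0;
}

/* Equivalent test by magnitude: nonzero but smaller than FLT_MIN,
   the smallest normal single-precision value. */
int below_smallest_normal(float x)
{
    float mag = x < 0.0f ? -x : x;
    return mag != 0.0f && mag < FLT_MIN;
}
```

Both functions should agree for any finite input as long as flush-to-zero and denormals-are-zero modes are off; with those modes on, the magnitude comparison may treat a denormal input as zero.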

In many cases, denormal numbers are evidence of an algorithmic problem: a poor choice of algorithm causes excessive computation in the denormal range. There are several ways to handle denormal numbers. For example, you can translate to normal, which means multiplying by a large scalar, doing the remaining computations in the normal range, and then scaling back down to the denormal range. Do this whenever the small denormal values benefit the program design. In many cases, denormals that can be treated as zero may be flushed to zero.
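The translate-to-normal approach can be sketched as follows. This is an illustration only: the scale factor 2^64, the three-operand product, and the function names are assumptions chosen for the example. A power of two is used so that the scaling itself introduces no rounding error.

```c
/* Power-of-two scale factors: multiplying by them is exact as long as
   the result neither overflows nor underflows. */
static const float SCALE_UP   = 0x1p64f;   /* 2^64  */
static const float SCALE_DOWN = 0x1p-64f;  /* 2^-64 */

/* Direct evaluation: for small a and b the intermediate a*b can fall
   into the denormal range and lose precision. */
float product_naive(float a, float b, float c)
{
    return (a * b) * c;
}

/* Scaled evaluation: lift a into a larger range first, do the
   remaining multiplications there, then scale the result back down. */
float product_scaled(float a, float b, float c)
{
    float t = (a * SCALE_UP) * b;  /* intermediate is 2^64 times larger */
    t = t * c;
    return t * SCALE_DOWN;         /* undo the scaling */
}
```

For example, with a = 1.0e-20f, b = 1.0e-20f, and c = 1.0e20f, the intermediate a*b is about 1e-40 (denormal) in product_naive, while every intermediate in product_scaled stays in the normal range. The scale factor must be chosen for the value range of the actual problem so that the scaled intermediates do not overflow.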

Denormals are computed in software on Itanium® processors. Hundreds of clock cycles are required, resulting in excessive kernel time. Attempt to understand why denormal results occur, and determine if they are justified. If you determine they are not justified, then use the following suggestions to handle the results:

Translate the problem to the normal range by scaling values.
Increase precision and range by using a wider data type.
Set flush-to-zero mode in the floating-point control register: -ftz (Linux*) or /Qftz (Windows*).
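As an illustration of the second suggestion, the sketch below carries the intermediate values in double precision and converts only the final result back to single precision. The function name and the three-operand product are hypothetical, chosen only to show the idea.

```c
/* Evaluate (a * b) * c with double-precision intermediates.
   A product such as 1e-20 * 1e-20 = 1e-40 is denormal in single
   precision but comfortably normal in double precision. */
float product_wide(float a, float b, float c)
{
    double t = (double)a * (double)b;  /* intermediate held as double */
    t = t * (double)c;
    return (float)t;                   /* round once, at the end */
}
```

The final conversion can still produce a denormal if the result itself is below the single-precision normal range; the wider type only protects the intermediate steps.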

Denormal numbers always indicate a loss of precision, an underflow condition, and usually an error (or at least a less than desirable condition). On the Intel® Pentium® 4 processor and the Intel Itanium® processor, denormal results of floating-point computations can be flushed to zero, which improves performance.

The Intel compiler disables the FTZ and DAZ bits when you specify value-safe options, including the strict, precise, source, double, and extended models supported by the -fp-model (Linux*) or /fp (Windows*) option.

IA-32 and Intel® EM64T compilers

The Intel® compiler automatically sets flush-to-zero mode in the SSE Control Register (MXCSR) when running on a processor that supports SSE instructions. SSE instructions are enabled by default in the Intel® EM64T compiler. Enable SSE instructions in the IA-32 compiler by using -xK, -xW, -xN, -xB, or -xP (Linux) or /QxK, /QxW, /QxN, /QxB, or /QxP (Windows). The MXCSR flush-to-zero setting only affects the behavior of SSE, SSE2, and SSE3 instructions. x87 floating-point instructions are not affected.

When SSE instructions are used, options -no-ftz (Linux and Mac OS*) and /Qftz- (Windows) are ignored. However, you can enable gradual underflow by calling a C function that clears the FTZ and DAZ bits in the MXCSR, or by using the run-time function for_set_fpe to clear those bits. Be aware that denormal processing can significantly slow down computation.
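A C function of the kind described above might look like the following sketch. It assumes the standard MXCSR layout (FTZ is bit 15, DAZ is bit 6) and the _mm_getcsr and _mm_setcsr intrinsics from xmmintrin.h; the names clear_ftz_daz and enable_gradual_underflow, and the HAVE_MXCSR guard macro, are illustrative. On targets without SSE the wrapper does nothing and returns 0.

```c
#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr */
#define HAVE_MXCSR 1
#endif

#define MXCSR_FTZ 0x8000u  /* bit 15: flush denormal results to zero */
#define MXCSR_DAZ 0x0040u  /* bit 6: treat denormal inputs as zero */

/* Pure helper: returns the given MXCSR value with FTZ and DAZ cleared
   and all other bits unchanged. */
unsigned int clear_ftz_daz(unsigned int mxcsr)
{
    return mxcsr & ~(MXCSR_FTZ | MXCSR_DAZ);
}

/* Clears FTZ and DAZ in the calling thread's MXCSR.
   Returns 1 if the register was updated, 0 if SSE is unavailable. */
int enable_gradual_underflow(void)
{
#ifdef HAVE_MXCSR
    _mm_setcsr(clear_ftz_daz(_mm_getcsr()));
    return 1;
#else
    return 0;
#endif
}
```

The MXCSR is per-thread state, so a function like this would need to be called in each thread that requires gradual underflow.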

Refer to IA-32 Intel® Architecture Software Developer’s Manual Volume 1: Basic Architecture (http://www.intel.com/design/pentiumii/manuals/243190.htm) for more details about flush to zero or specific bit field settings.

Use the -ftz (Linux) or /Qftz (Windows) option to flush x87 floating-point values to zero. Specify the option when compiling the source that contains PROGRAM and any source where abrupt underflow is desired.

Note

Windows* Only: The /Qftz option is not supported for Intel® EM64T.

Itanium® compiler

The Itanium® compiler supports the -ftz (Linux) or /Qftz (Windows) option used to flush denormal results to zero when the application is in the gradual underflow mode. Use this option if the denormal values are not critical to application behavior. The default status of the option is OFF. By default, the compiler lets results gradually underflow.

Use the -ftz (Linux) or /Qftz (Windows) option on the source containing PROGRAM; the option turns on the Flush-to-Zero (FTZ) mode for the process started by PROGRAM. The initial thread, and any threads subsequently created by that process, will operate in FTZ mode.

By default, the -O3 (Linux) or /O3 (Windows) option enables FTZ; in contrast, the -O2 (Linux) or /O2 (Windows) option disables FTZ. Alternatively, you can use -no-ftz (Linux) or /Qftz- (Windows) to disable flushing denormal results to zero.

For detailed optimization information related to microarchitectural optimization and cycle accounting, refer to Introduction to Microarchitectural Optimization for Itanium® 2 Processors Reference Manual, also known as the “Software Optimization book”, document number 251464-001, located at http://www.intel.com/software/products/vtune/techtopic/software_optimization.pdf.

Inexact Floating-point Comparisons

Some floating-point applications exhibit extremely poor performance because they never terminate. In many cases, the applications do not terminate because exact floating-point comparisons were made against a given value.
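The sketch below illustrates the problem and one common remedy, comparing within a relative tolerance instead of testing for exact equality. The function names, the tolerance parameter, and the step-counting loop are assumptions made for this example; an appropriate tolerance depends on the application.

```c
/* Returns 1 if a and b differ by at most rel_tol times the larger
   magnitude. With rel_tol = 0.0 this reduces to an exact comparison. */
int nearly_equal(double a, double b, double rel_tol)
{
    double diff  = a > b ? a - b : b - a;
    double mag_a = a < 0.0 ? -a : a;
    double mag_b = b < 0.0 ? -b : b;
    double scale = mag_a > mag_b ? mag_a : mag_b;
    return diff <= rel_tol * scale;
}

/* Adds step to an accumulator starting at 0.0 until it matches target.
   Returns the number of additions, or -1 if target is not matched
   within max_steps additions (where an unguarded loop would spin
   forever). */
int steps_to_reach(double target, double step, double rel_tol,
                   int max_steps)
{
    double x = 0.0;
    for (int n = 0; n <= max_steps; ++n) {
        if (nearly_equal(x, target, rel_tol))
            return n;
        x += step;
    }
    return -1;
}
```

With a step of 0.1 and a target of 1.0, ten additions in double precision give a sum slightly below 1.0, so an exact comparison (rel_tol of 0.0) never matches, while a small tolerance such as 1e-9 matches after ten steps.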