Using Tuning Tools and Strategies

Identifying and Analyzing Hotspots

To identify opportunities for optimization, start with the time-based and event-based sampling features of the VTune™ Performance Analyzer.

Time-based sampling identifies the sections of code that use the most processor time. Event-based sampling identifies microarchitecture bottlenecks such as cache misses and mispredicted branches. Clock ticks and instruction count are good counters to start with when identifying specific functions worth tuning.

Once you have identified specific functions through sampling, use call-graph analysis to generate thread-specific reports. Call-graph analysis returns the following information about the functions:

You can also use the Counter monitor (equivalent to Microsoft* Perfmon*) to collect real-time performance data from more than 200 operating system counters, or you can create custom counters for specific environments and tasks.

You can also use Intel® Tuning Assistant and Intel® Thread Checker, which ship as part of the VTune™ Performance Tools. Intel® Tuning Assistant interprets the data collected by the VTune™ Performance Tools and generates application-specific tuning advice. Intel® Thread Checker helps you verify the correctness of an application's threading by identifying specific threading issues that should be addressed to improve performance.

See Using Intel Performance Analysis Tools for more information about these tools.

Using the Intel® Compilers for Tuning

The compilers provide advanced optimization features for Intel processors, which make them an efficient, cost-effective way to improve performance on Intel® architectures. The IA-32 and Intel® EM64T compilers support processor dispatch, which allows a single executable to run on current IA-32 Intel microarchitectures, using code optimized for them, and still run on legacy processors.

You might find the following options useful when tuning:

Note

Mac OS*: The -mtune option is not supported, and P is the only supported value for the -x and -ax options.

See Optimizations Option Summary and Optimizing for Specific Processors Overview to get more information about the options listed above.

Intel compilers support auto-parallelization and provide substantial support for OpenMP*, as described in the OpenMP Fortran version 2.5 specification.