Software Pipelining (SWP) Report (Linux* and Windows*)

The SWP report provides detailed information about loops that take advantage of the software pipelining available on Itanium®-based systems. The report also suggests reasons why loops were not pipelined.

The following command syntax examples demonstrate how to generate an SWP report for the Software Pipeliner (SWP) of the Itanium® Compiler Code Generator (ECG).

Platform

Syntax Examples

Linux*

ifort -c -opt-report -opt-report-phase ecg_swp swp.f90

Windows*

ifort /c /Qopt-report /Qopt-report-phase ecg_swp swp.f90

Here, -c (Linux) or /c (Windows) tells the compiler to stop after generating object code (no linking occurs); -opt-report (Linux) or /Qopt-report (Windows) invokes the report generator; and -opt-report-phase ecg_swp (Linux) or /Qopt-report-phase ecg_swp (Windows) specifies the phase for which to generate the report (here, ecg_swp).

Note

Linux* only: The space between the option and the phase is optional.

Typically, each loop that was software pipelined has a line in the report indicating that the compiler scheduled the loop for SWP. If the -O3 (Linux) or /O3 (Windows) option is specified, the SWP report also includes the summary of loop transformations performed by the loop optimizer.

You can compile the following example code to generate a sample SWP report. The resulting report is shown after the code.

Example

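! NUM is the matrix dimension (1024 in this sample). Fortran arrays are
! 1-based by default, so the loops run from 1 to NUM.
subroutine multiply_d(a,b,c,NUM)
  implicit none
  integer, intent(in) :: NUM
  real, intent(in)    :: a(NUM,NUM), b(NUM,NUM)
  real, intent(inout) :: c(NUM,NUM)
  integer :: i,j,k

  do i=1,NUM
    do j=1,NUM
      do k=1,NUM
        c(j,i) = c(j,i) + a(j,k) * b(k,i)
      end do
    end do
  end do
end subroutine multiply_d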

The following sample report shows the output of the ecg_swp phase for the example code above.

Sample SWP Report

Swp report for loop at line 10 in _Z10multiply_dPA1024_dS0_S0_ in file SWP report.f90
 Resource II   = 2
 Recurrence II = 2
 Minimum II    = 2
 Scheduled II  = 2

 Estimated GCS II   = 7

 Percent of Resource II needed by arithmetic ops     = 100%
 Percent of Resource II needed by memory ops         =  50%
 Percent of Resource II needed by floating point ops =  50%

 Number of stages in the software pipeline = 6

Reading the Reports

One quick way to determine whether specific loops were software pipelined is to search the report output for the phrase “Number of stages in the software pipeline”. If this phrase is present, software pipelining succeeded for the associated loop.

To interpret the SWP report, you need to understand its terminology and the related concepts. The following table describes the terms used in the SWP report.

Term

Definition

II

Initiation Interval (II). The number of cycles between the start of one loop iteration and the start of the next in the software-pipelined loop. The presence of the term II in an SWP report indicates that SWP succeeded for the loop in question.

If you also know the number of iterations, you can use II for a quick estimate of how many cycles the loop takes: the total cycle count is approximately N * Scheduled II + Number of stages, where N is the number of iterations of the loop. This is an approximation because it does not account for the ramp-up and ramp-down of the prolog and epilog of the software-pipelined loop; it considers only the kernel. As you modify your code, it is generally better to see Scheduled II go down, but the figure of merit is ultimately N * Scheduled II + Number of stages in the software pipeline.
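As a worked example of this approximation, the following Fortran sketch computes N * Scheduled II + Number of stages. The module and function names are hypothetical (they are not part of the compiler or its report). With the sample report's values (Scheduled II = 2, 6 stages) and an assumed N = 1024 iterations, the estimate is 1024 * 2 + 6 = 2054 cycles.

```fortran
! Hypothetical helper: approximate cycle count of a software-pipelined loop
! using the figure of merit N * Scheduled II + Number of stages.
module swp_estimate
  implicit none
contains
  pure integer function estimate_swp_cycles(n_iter, scheduled_ii, n_stages)
    integer, intent(in) :: n_iter        ! N, number of loop iterations
    integer, intent(in) :: scheduled_ii  ! "Scheduled II" from the report
    integer, intent(in) :: n_stages      ! "Number of stages in the software pipeline"
    estimate_swp_cycles = n_iter * scheduled_ii + n_stages
  end function estimate_swp_cycles
end module swp_estimate
```

For large N the N * Scheduled II term dominates, which is why reducing Scheduled II usually matters more than reducing the stage count.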

Resource II

Resource II is the Initiation Interval implied by the number of available functional units, that is, the II the loop would need if only the machine's resources limited the schedule.
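A common textbook way to bound Resource II is to take, for each class of functional unit, the ceiling of (operations per iteration) / (units available), and keep the largest value. The sketch below assumes that simplified model; the compiler's actual Itanium machine model (issue ports, bundling) is more detailed, and the operation and unit counts in the usage note are invented for illustration.

```fortran
! Simplified Resource II bound (assumption: one resource class per entry,
! every operation occupies one unit for one cycle).
module resource_ii_sketch
  implicit none
contains
  pure integer function resource_ii(ops_per_iter, units)
    integer, intent(in) :: ops_per_iter(:)  ! operations per iteration, per class
    integer, intent(in) :: units(:)         ! functional units available, per class
    integer :: r
    resource_ii = 1                         ! II can never be below one cycle
    do r = 1, size(ops_per_iter)
      ! Integer ceiling division (a + b - 1) / b for positive b
      resource_ii = max(resource_ii, (ops_per_iter(r) + units(r) - 1) / units(r))
    end do
  end function resource_ii
end module resource_ii_sketch
```

For example, resource_ii([3, 1], [2, 2]) returns 2: three memory operations on two memory units need two cycles, while one floating-point operation on two units needs only one.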

Recurrence II

Recurrence II is the Initiation Interval imposed by recurrence relationships in the loop. A recurrence relationship is a particular kind of data dependency, called a flow dependency, as in a[i] = a[i-1], where a[i] cannot be computed until a[i-1] is known. If Recurrence II is non-zero and there is no flow dependency in the code, this indicates either non-unit stride access or memory aliasing.
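To make the flow dependency concrete, the two hypothetical subroutines below differ only in whether an iteration reads a value written by the previous iteration. In with_recurrence, a(i) needs a(i-1), so iteration i cannot complete its add until iteration i-1 has produced its result; this loop-carried chain is what Recurrence II measures. In no_recurrence, the iterations are independent and can overlap freely in the pipeline.

```fortran
module recurrence_sketch
  implicit none
contains
  ! Flow (read-after-write) dependency carried across iterations:
  ! a(i) depends on a(i-1) computed in the previous iteration.
  subroutine with_recurrence(a, b, n)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real, intent(in)    :: b(n)
    integer :: i
    do i = 2, n
      a(i) = a(i-1) + b(i)
    end do
  end subroutine with_recurrence

  ! No loop-carried dependency: each a(i) uses only values from iteration i.
  subroutine no_recurrence(a, b, n)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real, intent(in)    :: b(n)
    integer :: i
    do i = 1, n
      a(i) = a(i) + b(i)
    end do
  end subroutine no_recurrence
end module recurrence_sketch
```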

See Helping the Compiler for more information.

Minimum II

Minimum II is the theoretical minimum Initiation Interval that could be achieved.
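Conventionally, the minimum II is bounded below by both the resource limit and the recurrence limit, so it is the larger of Resource II and Recurrence II. The sketch below assumes that definition (it is consistent with the sample report, where Resource II = 2, Recurrence II = 2, and Minimum II = 2); the module and function names are hypothetical.

```fortran
! Assumed definition: Minimum II = max(Resource II, Recurrence II).
module minimum_ii_sketch
  implicit none
contains
  pure integer function minimum_ii(resource_ii, recurrence_ii)
    integer, intent(in) :: resource_ii, recurrence_ii
    minimum_ii = max(resource_ii, recurrence_ii)
  end function minimum_ii
end module minimum_ii_sketch
```

In the sample report, Scheduled II equals this minimum, meaning the compiler's schedule reached the theoretical bound.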

Scheduled II

Scheduled II is the Initiation Interval the compiler actually achieved when scheduling the software-pipelined loop.

Number of stages

The number of stages in the software pipeline. For example, the report line Number of stages in the software pipeline = 3 indicates three stages of work, which would appear in the assembly code as a load, an FMA (fused multiply-add) instruction, and a store.
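A loop of the following shape is the kind that could produce a load, FMA, and store pattern: each iteration loads x(i) and y(i), performs one multiply-add, and stores y(i). This is an illustrative sketch with a hypothetical name; the stage count the compiler actually reports depends on instruction latencies and is not guaranteed to be 3.

```fortran
module saxpy_sketch
  implicit none
contains
  ! y = y + alpha*x: per iteration, loads of x(i) and y(i), one multiply-add,
  ! and one store of y(i). Iterations are independent, so they pipeline well.
  subroutine saxpy(n, alpha, x, y)
    integer, intent(in) :: n
    real, intent(in)    :: alpha, x(n)
    real, intent(inout) :: y(n)
    integer :: i
    do i = 1, n
      y(i) = y(i) + alpha * x(i)
    end do
  end subroutine saxpy
end module saxpy_sketch
```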

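Loop-carried memory dependence edges

Loop-carried memory dependence edges are dependences between memory references in different iterations of the loop, such as a WAR (write-after-read) dependence, that the compiler must respect when scheduling the loop.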

Loop-carried memory dependence edges can indicate problems with memory aliasing. See Helping the Compiler.
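The following sketch shows a loop where the compiler must assume a loop-carried memory dependence: with indirect indexing, it cannot prove that two iterations never update the same element of a. Intel Fortran's !DIR$ IVDEP directive (the Fortran counterpart of #pragma ivdep) tells the compiler to ignore such assumed dependences. The subroutine name is hypothetical, the directive is only correct if the caller guarantees that the entries of idx are distinct, and whether it actually enables SWP for a given loop should be confirmed in the report.

```fortran
module ivdep_sketch
  implicit none
contains
  ! Indirect update: a(idx(i)) in one iteration might alias a(idx(j)) in
  ! another, so the compiler assumes a loop-carried memory dependence.
  ! IVDEP asserts that no such dependence exists (idx values are distinct).
  subroutine scatter_add(a, b, idx, n)
    integer, intent(in) :: n
    real, intent(inout) :: a(:)
    real, intent(in)    :: b(n)
    integer, intent(in) :: idx(n)
    integer :: i
!DIR$ IVDEP
    do i = 1, n
      a(idx(i)) = a(idx(i)) + b(i)
    end do
  end subroutine scatter_add
end module ivdep_sketch
```

Compilers that do not recognize the directive treat the !DIR$ line as an ordinary comment.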

Using the Report to Resolve Issues

The most efficient way to solve problems is to analyze the loops that were not software pipelined and determine how to enable SWP for them.

If the compiler reports “Loop was not SWP because...”, see the following table for suggestions about how to mitigate the problem:

Message in Report

Suggested Action

acyclic global scheduler can achieve a better schedule: => loop not pipelined

The most likely cause is memory aliasing. For memory aliasing problems, see memory aliasing (restrict, #pragma ivdep).

This message might also indicate that the application is accessing memory with non-unit stride. Non-unit stride issues may be indicated by an artificially high Recurrence II: if you know there is no recurrence relationship in the loop (such as a[i] = a[i-1] + b[i]), then a high Recurrence II (greater than 0) is a sign that you are accessing memory with non-unit stride. Rearranging the code, for example with a loop interchange, might help mitigate this problem.
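In the multiply_d example above, Fortran stores arrays in column-major order, so with k as the innermost loop, a(j,k) is accessed with a stride of NUM elements. The sketch below interchanges the j and k loops so that the innermost loop runs over the first (contiguous) subscript of a and c. The subroutine name is hypothetical; both versions compute the same product, and whether the interchanged form pipelines better should be checked in the SWP report.

```fortran
module interchange_sketch
  implicit none
contains
  ! Same computation as multiply_d with the j and k loops interchanged.
  ! j (the first subscript) is now innermost, giving unit-stride access to
  ! c(j,i) and a(j,k); b(k,i) is invariant in the inner loop.
  subroutine multiply_interchanged(a, b, c, num)
    integer, intent(in) :: num
    real, intent(in)    :: a(num,num), b(num,num)
    real, intent(inout) :: c(num,num)
    integer :: i, j, k
    do i = 1, num
      do k = 1, num
        do j = 1, num
          c(j,i) = c(j,i) + a(j,k) * b(k,i)
        end do
      end do
    end do
  end subroutine multiply_interchanged
end module interchange_sketch
```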

Loop body has a function call

Indicates that inlining the function might help solve the problem.
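A minimal sketch of what inlining means for a loop body, assuming a small helper function that the compiler does not inline on its own (for example, because it is compiled separately). The second subroutine does the same work with the call replaced by the helper's body, so the loop no longer contains a call. All names are hypothetical.

```fortran
module inline_sketch
  implicit none
contains
  ! Small helper called once per iteration; if it is not inlined, the call
  ! in the loop body can prevent software pipelining.
  real function scale_add(x, y, s)
    real, intent(in) :: x, y, s
    scale_add = s * x + y
  end function scale_add

  subroutine with_call(x, y, s, n)
    integer, intent(in) :: n
    real, intent(in)    :: x(n), s
    real, intent(inout) :: y(n)
    integer :: i
    do i = 1, n
      y(i) = scale_add(x(i), y(i), s)
    end do
  end subroutine with_call

  ! Same loop with the helper's body written inline: no call remains.
  subroutine inlined(x, y, s, n)
    integer, intent(in) :: n
    real, intent(in)    :: x(n), s
    real, intent(inout) :: y(n)
    integer :: i
    do i = 1, n
      y(i) = s * x(i) + y(i)
    end do
  end subroutine inlined
end module inline_sketch
```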

Not enough static registers

Indicates that you should distribute the loop by separating it into two or more loops.
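Loop distribution splits one loop into several loops that each do part of the work, so each pipelined schedule has fewer live values competing for registers. The sketch below only illustrates the shape of the transformation: a two-statement loop is far too small to actually run out of registers, and the names are hypothetical. Distribution is legal here because neither statement reads a value the other writes.

```fortran
module distribution_sketch
  implicit none
contains
  ! One loop computing two independent results.
  subroutine fused(a, b, c, d, x, y, n)
    integer, intent(in) :: n
    real, intent(in)    :: a(n), b(n), c(n), d(n)
    real, intent(out)   :: x(n), y(n)
    integer :: i
    do i = 1, n
      x(i) = a(i) * b(i) + c(i)
      y(i) = c(i) * d(i) - a(i)
    end do
  end subroutine fused

  ! Distributed version: the same work split into two loops, each with
  ! fewer values live per iteration.
  subroutine distributed(a, b, c, d, x, y, n)
    integer, intent(in) :: n
    real, intent(in)    :: a(n), b(n), c(n), d(n)
    real, intent(out)   :: x(n), y(n)
    integer :: i
    do i = 1, n
      x(i) = a(i) * b(i) + c(i)
    end do
    do i = 1, n
      y(i) = c(i) * d(i) - a(i)
    end do
  end subroutine distributed
end module distribution_sketch
```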

Not enough rotating registers

Indicates that the loop-carried values, which are held in rotating registers, need more rotating registers than are available. Distribute the loop.

Loop too large

Indicates that you should distribute the loop.

Loop has a constant trip count < 4

Indicates that unrolling was insufficient. Attempt to fully unroll the loop. However, for small loops, full unrolling is not likely to affect performance significantly.
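Full unrolling of a loop with a constant trip count below 4 can be sketched as follows: the loop over three elements is replaced by straight-line code, so there is no loop left to pipeline and the compiler can schedule the operations directly. The function names are hypothetical, and the unrolled form keeps the same order of additions as the loop.

```fortran
module unroll_sketch
  implicit none
contains
  ! Constant trip count of 3: too short to benefit from software pipelining.
  real function dot3_loop(u, v)
    real, intent(in) :: u(3), v(3)
    integer :: k
    dot3_loop = 0.0
    do k = 1, 3
      dot3_loop = dot3_loop + u(k) * v(k)
    end do
  end function dot3_loop

  ! Fully unrolled by hand: straight-line code with no loop.
  real function dot3_unrolled(u, v)
    real, intent(in) :: u(3), v(3)
    dot3_unrolled = u(1) * v(1) + u(2) * v(2) + u(3) * v(3)
  end function dot3_unrolled
end module unroll_sketch
```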

Too much flow control

Indicates a complex loop structure. Attempt to simplify the loop.

The type of the loop index variable can greatly affect performance. In some cases, using loop index variables of type short or unsigned int can prevent software pipelining. If the report indicates performance problems in loops whose index variable is not of type int, and there are no other obvious causes, try changing the loop index variable to type int.
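The advice above is phrased in C terms. Assuming it carries over to Fortran, the analogous change is to use a default INTEGER loop index rather than a short integer kind (Fortran has no unsigned integer type). The sketch below only illustrates the change; it does not demonstrate that the short-kind version fails to pipeline, and the module, kind, and subroutine names are hypothetical.

```fortran
module index_kind_sketch
  implicit none
  ! Smallest kind holding at least 4 decimal digits; on typical compilers
  ! this is a 2-byte integer, analogous to C's short (assumption).
  integer, parameter :: short_kind = selected_int_kind(4)
contains
  ! Loop index of a short integer kind: the pattern the text warns about.
  subroutine double_short_index(x, y, n)
    integer(short_kind), intent(in) :: n
    real, intent(in)  :: x(n)
    real, intent(out) :: y(n)
    integer(short_kind) :: i
    do i = 1, n
      y(i) = 2.0 * x(i)
    end do
  end subroutine double_short_index

  ! Same loop with a default INTEGER index, the suggested alternative.
  subroutine double_default_index(x, y, n)
    integer, intent(in) :: n
    real, intent(in)  :: x(n)
    real, intent(out) :: y(n)
    integer :: i
    do i = 1, n
      y(i) = 2.0 * x(i)
    end do
  end subroutine double_default_index
end module index_kind_sketch
```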

See Optimizer Report Generation for more information about options you can use to generate reports.