Vectorization Report

The vectorization report can provide information on which loops take advantage of Streaming SIMD Extensions 2 (SSE2) and Streaming SIMD Extensions 3 (SSE3) vectorization and which do not. The vectorization report is available on IA-32 and Intel® EM64T systems.

The -vec-report (Linux*) or /Qvec-report (Windows*) option directs the compiler to generate vectorization reports with different levels of information. You can use this option to control the set of diagnostic messages included in the reports. To redirect the results to a text file, use a command similar to the following:

Platform	Command
Linux	ifort -xW -vec-report3 matrix1.f > report.txt
Windows	ifort /QxW /Qvec-report3 matrix1.f > report.txt

For more information about this option, see the following topic:

-vec-report compiler option

See Parallelism Overview for information on other vectorizer options.

The following example results illustrate the type of information generated by the vectorization report:

Example results
matmul.f(27) : (col. 9) remark: loop was not vectorized: not inner loop.
matmul.f(28) : (col. 11) remark: LOOP WAS VECTORIZED.
matmul.f(31) : (col. 9) remark: loop was not vectorized: not inner loop.
matmul.f(32) : (col. 11) remark: LOOP WAS VECTORIZED.
matmul.f(37) : (col. 10) remark: loop was not vectorized: not inner loop.
matmul.f(38) : (col. 12) remark: loop was not vectorized: not inner loop.
matmul.f(40) : (col. 14) remark: loop was not vectorized: vectorization possible but seems inefficient.
matmul.f(46) : (col. 10) remark: loop was not vectorized: not inner loop.
matmul.f(47) : (col. 12) remark: loop was not vectorized: contains unvectorizable statement at line 48.

If the compiler reports “Loop was not vectorized” because of the existence of vector dependence, then you need to do a vector dependence analysis of the loop.

If you are convinced that no legitimate vector dependence exists, then the above message indicates that the compiler was likely assuming the pointers or arrays in the loop were dependent, which is another way of saying that the pointers or arrays were aliased. Memory disambiguation techniques should be used to get the compiler to vectorize in these cases.

There are three major types of vector dependence: FLOW, ANTI, and OUTPUT.
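As a rough illustration of these three categories, the sketch below shows one loop of each kind. The subroutine names and array shapes are hypothetical, chosen only for this example, and which of these loops a given compiler version actually vectorizes may vary.

```fortran
! FLOW (read after write): iteration i reads the value that
! iteration i-1 has just written.
subroutine flow_dep(y, n)
  implicit none
  integer :: i, n
  real :: y(n)
  do i = 2, n
    y(i) = y(i-1) + 1.0
  end do
end subroutine flow_dep

! ANTI (write after read): iteration i reads y(i+1), which
! iteration i+1 overwrites afterwards.
subroutine anti_dep(y, n)
  implicit none
  integer :: i, n
  real :: y(n)
  do i = 1, n-1
    y(i) = y(i+1) + 1.0
  end do
end subroutine anti_dep

! OUTPUT (write after write): y(i+1) is written in iteration i
! and written again in iteration i+1.
subroutine output_dep(y, x, n)
  implicit none
  integer :: i, n
  real :: y(n), x(n)
  do i = 1, n-1
    y(i)   = x(i)
    y(i+1) = 2.0 * x(i)
  end do
end subroutine output_dep
```

In each case the order of the iterations matters for at least one array element, which is what the compiler has to prove or rule out before it vectorizes the loop.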

See Loop Independence to determine whether you have a valid vector dependence. The compiler report often asserts a vector dependence where none exists, because the compiler assumes memory aliasing. In these cases, check the code for dependencies; if there are none, inform the compiler using the methods described in memory aliasing, such as the IVDEP directive (restrict and pragma ivdep apply to C/C++ code).
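As a minimal sketch of the directive route, assuming the Intel Fortran spelling !DIR$ IVDEP: the hypothetical subroutine below reads the array at an offset k that the compiler cannot know at compile time. The directive asserts that no flow dependence exists, which holds only because this example assumes the caller passes k >= 0; a negative k would make the assertion false. Compilers that do not recognize the directive treat it as an ordinary comment.

```fortran
subroutine shift_add(a, n, k, s)
  implicit none
  integer :: i, n, k
  real :: a(n+k), s
  ! Assumption: k >= 0, so a(i+k) is always read before a later
  ! iteration overwrites it (no FLOW dependence).
!DIR$ IVDEP
  do i = 1, n
    a(i) = a(i+k) + s
  end do
end subroutine shift_add
```

The directive only removes assumed dependencies from consideration; it does not make a loop with a real dependence safe to vectorize.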

There are a number of situations where the vectorization report may indicate vector dependencies. The following situations are sometimes reported as vector dependencies: Non-Unit Stride, Low Trip Count, and Complex Subscript Expression.

Non-Unit Stride

The report might indicate that a loop could not be vectorized when memory is accessed in a non-unit-stride fashion, meaning that the loop accesses nonconsecutive memory locations. In such cases, see whether loop interchange helps and is practical. If not, you can sometimes force vectorization with the VECTOR ALWAYS directive; however, you should verify that performance actually improves.

See Understanding Runtime Performance for more information about non-unit stride conditions.
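Loop interchange can be sketched as follows. Fortran stores arrays in column-major order, so the first subscript should vary in the innermost loop to get unit-stride access. Both subroutine names are hypothetical, and whether the interchange pays off depends on the array sizes and the target processor.

```fortran
! Non-unit stride: the inner loop varies the second subscript,
! so consecutive iterations touch elements n apart in memory.
subroutine scale_strided(a, n, m, s)
  implicit none
  integer :: i, j, n, m
  real :: a(n,m), s
  do i = 1, n
    do j = 1, m
      a(i,j) = a(i,j) * s
    end do
  end do
end subroutine scale_strided

! Unit stride after interchange: the inner loop varies the first
! subscript, which is contiguous in column-major storage.
subroutine scale_unit(a, n, m, s)
  implicit none
  integer :: i, j, n, m
  real :: a(n,m), s
  do j = 1, m
    do i = 1, n
      a(i,j) = a(i,j) * s
    end do
  end do
end subroutine scale_unit
```

Both versions compute the same result; only the order in which memory is visited differs.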

Usage with Other Options

The vectorization reports are generated during the final compilation phase, which is when the executable is generated; therefore, there are certain option combinations you cannot use if you are attempting to generate a report. If you use the following option combinations, the compiler issues a warning and does not generate a report:

-c or -ipo or -x with -vec-report (Linux*) and /c or /Qipo or /Qx with /Qvec-report (Windows*)
-c or -ax with -vec-report (Linux) and /c or /Qax with /Qvec-report (Windows)

The following example commands can generate vectorization reports:

Platform	Command Examples
Linux	ifort -xK -vec-report3 file.f
Linux	ifort -xK -ipo -vec-report3 file.f
Windows	ifort /QxK /Qvec-report3 file.f
Windows	ifort /QxK /Qipo /Qvec-report3 file.f

Changing Code Based on Report Results

You might consider changing existing code to allow vectorization under the following conditions:

The vectorization report indicates that the program “contains unvectorizable statement at line XXX”.
The vectorization report states “vector dependence: proven FLOW dependence between ‘variable’ line XXX, and ‘variable’ line XXX” or “loop was not vectorized: existence of vector dependence”. Generally, these conditions indicate that true loop dependencies are stopping vectorization. In such cases, consider changing the loop algorithm.

For example, consider the two equivalent algorithms below, which produce identical output. The loop in foo does not vectorize because of the FLOW dependence, but the loop in bar does.

Example
subroutine foo(y)
  implicit none
  integer :: i
  real :: y(10)
  ! Each iteration reads the element written by the previous one.
  do i=2,10
    y(i) = y(i-1)+1
  end do
end subroutine foo

subroutine bar(y)
  implicit none
  integer :: i
  real :: y(10)
  ! Each iteration depends only on y(1), which the loop never writes.
  do i=2,10
    y(i) = y(1)+(i-1)
  end do
end subroutine bar

Unsupported loop structures may prevent vectorization. An example of an unsupported loop structure is a loop index variable that requires complex computation. Restructure the loop to remove function calls from the loop limits and to avoid other excessive computation in them.

Example
function func(n)
  implicit none
  integer :: func, n
  func = n*n-1
end function func

subroutine unsupported_loop_structure(y,n)
  implicit none
  integer :: i,n, func
  ! Bounds declared to match the indices 0..n*n-1 visited below.
  real :: y(0:n*n-1)
  ! The loop limit is a function call.
  do i=0,func(n)
    y(i) = y(i) * 2.0
  end do
end subroutine unsupported_loop_structure
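Following the advice above, one possible restructuring computes the limit once and stores it in a local variable before the loop. The subroutine name, the variable last, and the 0-based declaration of y are assumptions made for this sketch; check the vectorization report to see whether the change helps with your compiler version.

```fortran
subroutine supported_loop_structure(y, n)
  implicit none
  integer :: i, n, last
  ! Assumed bounds so that every index visited below is valid.
  real :: y(0:n*n-1)
  ! Compute the loop limit once, outside the loop header.
  last = n*n - 1
  do i = 0, last
    y(i) = y(i) * 2.0
  end do
end subroutine supported_loop_structure
```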

Non-unit stride access might cause the report to state “vectorization possible but seems inefficient”. Try to restructure the loop to access the data in a unit-stride manner (for example, apply loop interchange), or try the VECTOR ALWAYS directive.
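The directive route can be sketched as follows, assuming the Intel Fortran spelling !DIR$ VECTOR ALWAYS. The hypothetical subroutine below updates every second element, a stride-2 access that the compiler might otherwise judge inefficient to vectorize. The directive asks the compiler to skip that efficiency check; it does not guarantee a speedup, so measure the result.

```fortran
subroutine scale_every_other(a, n, s)
  implicit none
  integer :: i, n
  real :: a(n), s
  ! Request vectorization even if the compiler's heuristics
  ! consider the stride-2 access unprofitable.
!DIR$ VECTOR ALWAYS
  do i = 1, n, 2
    a(i) = a(i) * s
  end do
end subroutine scale_every_other
```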

Using mixed data types in the body of a loop might prevent vectorization. In the case of mixed data types, the vectorization report might state something similar to “loop was not vectorized: condition too complex”.

The following example code demonstrates a loop that cannot vectorize due to mixed data types within the loop. Here, withinborder is an integer while all other data in the loop are real. Changing the withinborder data type to real allows this loop to vectorize.

Example
subroutine howmany_close(x,y,n)
  implicit none
  ! withinborder is an integer; everything else in the loop is real.
  integer :: i,n,withinborder
  real :: x(n), y(n), dist
  withinborder=0
  ! Loop over the declared bounds of x and y.
  do i=1,n
    dist=sqrt(x(i)*x(i) + y(i)*y(i))
    if (dist<5) withinborder= withinborder+1
  end do
end subroutine howmany_close
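The type change described above can be sketched as follows. The subroutine name, the loop over 1 to n, and the extra output argument that returns the count are assumptions added for this example; the original routine keeps the counter local.

```fortran
subroutine howmany_close_real(x, y, n, withinborder)
  implicit none
  integer :: i, n
  real :: x(n), y(n), dist
  ! The counter is real, matching every other type in the loop.
  real :: withinborder
  withinborder = 0.0
  do i = 1, n
    dist = sqrt(x(i)*x(i) + y(i)*y(i))
    if (dist < 5.0) withinborder = withinborder + 1.0
  end do
end subroutine howmany_close_real
```

If an integer count is needed afterwards, the caller can convert the result outside the loop, for example with nint(withinborder).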