Loop Independence

Loop independence is important since loops that are independent can be parallelized. Independent loops can be parallelized in a number of ways, from the course-grained parallelism of OpenMP*, to fine-grained Instruction Level Parallelism (ILP) of vectorization and software pipelining.

Loops are considered independent when the computation of iteration Y of a loop can be done independently of the computation of iteration X. In other words, if iteration 1 of a loop can be computed and iteration 2 simultaneously could be computed without using any result from iteration 1, then the loops are independent.

Occasionally, you can determine if a loop is independent by comparing results from the output of the loop with results from the same loop written with a decrementing index counter.

For example, the loop shown in example 1 might be independent if the code in example 2 generates the same result .

Example
subroutine loop_independent_A (a,b,MAX) implicit none integer :: j, MAX, a(MAX), b(MAX) do j=0, MAX a(j) = b(j) end do end subroutine loop_independent_A subroutine loop_independent_B (a,b,MAX) implicit none integer :: j, MAX, a(MAX), b(MAX) do j=MAX, 0, -1 a(j) = b(j) end do end subroutine loop_independent_B

Example

subroutine loop_independent_A (a,b,MAX)

implicit none

integer :: j, MAX, a(MAX), b(MAX)

do j=0, MAX

a(j) = b(j)

end do

end subroutine loop_independent_A

subroutine loop_independent_B (a,b,MAX)

implicit none

integer :: j, MAX, a(MAX), b(MAX)

do j=MAX, 0, -1

a(j) = b(j)

end do

end subroutine loop_independent_B

When loops are dependent, improving loop performance becomes much more difficult. Loops can be dependent in several, general ways.

Flow Dependency
Anti Dependency
Output Dependency
Reductions

The following sections illustrate the different loop dependencies.

Flow Dependency - Read After Write

Cross-iteration flow dependence is created when variables are read then written in different iterations, as shown in the following example:

Example
subroutine flow_dependence (A,MAX) implicit none integer :: J,MAX real :: A(MAX) do j=1, MAX A(J)=A(J-1) end do end subroutine flow_dependence

Example

subroutine flow_dependence (A,MAX)

implicit none

integer :: J,MAX

real :: A(MAX)

do j=1, MAX

A(J)=A(J-1)

end do

end subroutine flow_dependence

The above example is equivalent to the following lines for the first few iterations:

Sample iterations
A[1]=A[0]; A[2]=A[1];

Sample iterations

A[1]=A[0];

A[2]=A[1];

Recurrence relations feed information forward from one iteration to the next.

Example
subroutine time_stepping_loops (a,b,MAX) implicit none integer :: J,MAX real :: a(MAX), b(MAX) do j=1, MAX a(j) = a(j-1) + b(j) end do end subroutine time_stepping_loops

Example

subroutine time_stepping_loops (a,b,MAX)

implicit none

integer :: J,MAX

real :: a(MAX), b(MAX)

do j=1, MAX

a(j) = a(j-1) + b(j)

end do

end subroutine time_stepping_loops

Most recurrences cannot be made fully parallel. Instead, look for a loop further out or further in to parallelize. You might be able to get more performance gains through unrolling.

Anti Dependency - Write After Read

Cross-iteration anti-dependence is created when variables are written then read in different iterations, as shown in the following example:

Example
subroutine anti_dependence (A,MAX) implicit none integer :: J,MAX real :: a(MAX), b(MAX) do J=1, MAX A(J)=A(J+1) end do end subroutine anti_dependence

Example

subroutine anti_dependence (A,MAX)

implicit none

integer :: J,MAX

real :: a(MAX), b(MAX)

do J=1, MAX

A(J)=A(J+1)

end do

end subroutine anti_dependence

The above example is equivalent to the following lines for the first few iterations:

Sample interations
A[1]=A[2]; A[2]=A[3];

Sample interations

A[1]=A[2];

A[2]=A[3];

Output Dependency - Write After Write

Cross-iteration output dependence is where variables are written then rewritten in a different iteration. The following example illustrates this type of dependency:

Example
subroutine output_dependence (A,B,C,MAX) implicit none integer :: J,MAX real :: a(MAX), b(MAX), c(MAX) do J=1, MAX A(J)=B(J) A(J+1)=C(J) end do end subroutine output_dependence

Example

subroutine output_dependence (A,B,C,MAX)

implicit none

integer :: J,MAX

real :: a(MAX), b(MAX), c(MAX)

do J=1, MAX

A(J)=B(J)

A(J+1)=C(J)

end do

end subroutine output_dependence

The above example is equivalent to the following lines for the first few iterations:

Sample interations
A[1]=B[1]; A[2]=C[1]; A[2]=B[2]; A[3]=C[2];

Sample interations

A[1]=B[1];

A[2]=C[1];

A[2]=B[2];

A[3]=C[2];

Reductions

The Intel® compiler can successfully vectorize or software pipeline (SWP) most loops containing reductions on simple math operators like multiplication (*), addition (+), subtraction (-), and division (/). Reductions collapse array data to scalar data by using associative operations:

Example
subroutine reduction (sum,c,MAX) implicit none integer :: j,MAX real :: sum, c(MAX) do J=0, MAX sum = sum + c(j) end do end subroutine reduction

Example

subroutine reduction (sum,c,MAX)

implicit none

integer :: j,MAX

real :: sum, c(MAX)

do J=0, MAX

sum = sum + c(j)

end do

end subroutine reduction

The compiler might occasionally misidentify a reduction and report flow-, anti-, output-dependencies or sometimes loop-carried memory-dependency-edges; in such cases, the compiler will not vectorize or SWP the loop.