Loop interchange need unit-stride constructs to be vectorized. Matrix multiplication is commonly written as shown in the following example:
Example: Typical Matrix Multiplication |
---|
subroutine matmul_slow(a, b, c) integer :: i, j, k real :: a(100,100), b(100,100), c(100,100) do i = 1, n do j = 1, n do k = 1, n c(i,j) = c(i,j) + a(i,k)*b(k,j); end do end do end do end subroutine matmul_slow |
The use of B(K,J)is not a stride-1 reference and therefore will not normally be vectorizable.
If the loops are interchanged, however, all the references will become stride-1 as shown in the following example.
Example: Matrix Multiplication with Stride-1 |
---|
subroutine matmul_fast(a, b, c) integer :: i, j, k real :: a(100,100), b(100,100), c(100,100) do j = 1, n do k = 1, n do i = 1, n c(i,j) = c(i,j) + a(i,k)*b(k,j) enddo enddo enddo end subroutine matmul_fast |
Interchanging is not always possible because of dependencies, which can lead to different results.