Loop Interchange and Subscripts: Matrix Multiply

Loop interchange need unit-stride constructs to be vectorized. Matrix multiplication is commonly written as shown in the following example:

Example: Typical Matrix Multiplication
subroutine matmul_slow(a, b, c) integer :: i, j, k real :: a(100,100), b(100,100), c(100,100) do i = 1, n do j = 1, n do k = 1, n c(i,j) = c(i,j) + a(i,k)*b(k,j); end do end do end do end subroutine matmul_slow

subroutine matmul_slow(a, b, c)

integer :: i, j, k

real :: a(100,100), b(100,100), c(100,100)

do i = 1, n

do j = 1, n

do k = 1, n

c(i,j) = c(i,j) + a(i,k)*b(k,j);

end do

end subroutine matmul_slow

The use of B(K,J)is not a stride-1 reference and therefore will not normally be vectorizable.

If the loops are interchanged, however, all the references will become stride-1 as shown in the following example.

Example: Matrix Multiplication with Stride-1
subroutine matmul_fast(a, b, c) integer :: i, j, k real :: a(100,100), b(100,100), c(100,100) do j = 1, n do k = 1, n do i = 1, n c(i,j) = c(i,j) + a(i,k)*b(k,j) enddo enddo enddo end subroutine matmul_fast

subroutine matmul_fast(a, b, c)

integer :: i, j, k

real :: a(100,100), b(100,100), c(100,100)

do j = 1, n

do k = 1, n

do i = 1, n

c(i,j) = c(i,j) + a(i,k)*b(k,j)

enddo

end subroutine matmul_fast

Interchanging is not always possible because of dependencies, which can lead to different results.