Loop Interchange and Subscripts: Matrix Multiply

Loop interchange need unit-stride constructs to be vectorized. Matrix multiplication is commonly written as shown in the following example:

Example: Typical Matrix Multiplication

subroutine matmul_slow(a, b, c)

  integer :: i, j, k

  real :: a(100,100), b(100,100), c(100,100)

  do i = 1, n

    do j = 1, n

      do k = 1, n

        c(i,j) = c(i,j) + a(i,k)*b(k,j);

      end do

    end do

  end do

end subroutine matmul_slow

The use of B(K,J)is not a stride-1 reference and therefore will not normally be vectorizable.

If the loops are interchanged, however, all the references will become stride-1 as shown in the following example.

Example: Matrix Multiplication with Stride-1

subroutine matmul_fast(a, b, c)

  integer :: i, j, k

  real :: a(100,100), b(100,100), c(100,100)

  do j = 1, n

    do k = 1, n

      do i = 1, n

        c(i,j) = c(i,j) + a(i,k)*b(k,j)

      enddo

    enddo

  enddo

end subroutine matmul_fast

Interchanging is not always possible because of dependencies, which can lead to different results.