The auto-parallelization feature of the Intel® compiler automatically translates serial portions of the input program into equivalent multithreaded code. The auto-parallelizer analyzes the dataflow of the loops in the application source code and generates multithreaded code for those loops that can be executed in parallel safely and efficiently.
This lets the application exploit the parallel architecture of symmetric multiprocessor (SMP) systems.
Automatic parallelization relieves the user from:
- Dealing with the details of finding loops that are good worksharing candidates
- Performing the dataflow analysis to verify correct parallel execution
- Partitioning the data for threaded code generation, as is needed in programming with OpenMP* directives
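To make the dataflow check concrete, the sketch below contrasts a loop whose iterations are independent with one that carries a dependence from each iteration to the next. The subroutine names `independent` and `carried` are hypothetical, chosen only for illustration; whether the compiler actually parallelizes a given loop also depends on its own profitability analysis.

```fortran
! Iterations are independent: each a(i) depends only on b(i) and c(i),
! so the iterations may execute in any order, or concurrently.
subroutine independent(a, b, c)
  integer, dimension(100) :: a, b, c
  integer :: i
  do i = 1, 100
    a(i) = b(i) + c(i)
  enddo
end subroutine independent

! Loop-carried dependence: iteration i reads a(i-1), which is written
! by iteration i-1. Splitting this loop across threads as written
! could change the result, so it is not a safe candidate.
subroutine carried(a, b)
  integer, dimension(100) :: a, b
  integer :: i
  do i = 2, 100
    a(i) = a(i-1) + b(i)
  enddo
end subroutine carried
```

The first loop is the kind of candidate the dataflow analysis is meant to accept, and the second is the kind it is meant to reject.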
The parallel run-time support provides the same run-time features as found in OpenMP, such as handling the details of loop iteration modification, thread scheduling, and synchronization.
OpenMP directives make it quick to convert a serial application into a parallel one, but the programmer must still explicitly identify the portions of the application code that contain parallelism and add the appropriate compiler directives.
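For comparison, here is a minimal sketch of the explicit OpenMP approach, in which the programmer marks a loop with a `parallel do` directive. The subroutine name `omp_ser` is hypothetical. The directive takes effect only when the code is compiled with OpenMP support enabled; otherwise it is treated as an ordinary comment and the loop runs serially.

```fortran
subroutine omp_ser(a, b, c)
  integer, dimension(100) :: a, b, c
  integer :: i
  ! The programmer, not the compiler, asserts that the iterations
  ! are independent and requests that they be shared among threads.
  !$omp parallel do
  do i = 1, 100
    a(i) = a(i) + b(i) * c(i)
  enddo
  !$omp end parallel do
end subroutine omp_ser
```

The loop index `i` is private to each thread by default in a `parallel do` construct, so no additional data-sharing clauses are needed in this simple case.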
Auto-parallelization, which is triggered by the -parallel (Linux*) or /Qparallel (Windows*) option, automatically identifies those loop structures that contain parallelism. During compilation, the compiler automatically attempts to deconstruct the code sequences into separate threads for parallel processing. No other effort by the programmer is needed.
Intel® Itanium®-based systems: Specifying these options implies -opt-mem-bandwidth1 (Linux) or /Qopt-mem-bandwidth1 (Windows).
Serial code can be divided so that the code can execute concurrently on multiple threads. For example, consider the following serial code example.
Example 1: Original Serial Code

    subroutine ser(a, b, c)
      integer, dimension(100) :: a, b, c
      do i=1,100
        a(i) = a(i) + b(i) * c(i)
      enddo
    end subroutine ser

The following example shows one way the loop iteration space from the previous example might be divided to execute on two threads.
Example 2: Transformed Parallel Code

    subroutine par(a, b, c)
      integer, dimension(100) :: a, b, c
      ! Thread 1
      do i=1,50
        a(i) = a(i) + b(i) * c(i)
      enddo
      ! Thread 2
      do i=51,100
        a(i) = a(i) + b(i) * c(i)
      enddo
    end subroutine par

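The two-way split in Example 2 generalizes to any thread count. The following sketch computes contiguous iteration bounds for each thread. The subroutine `chunk_bounds` and its arguments are hypothetical, and this static block partitioning is only one possible scheme, not necessarily the schedule the run-time library applies.

```fortran
! Compute the inclusive bounds [lo, hi] of the chunk assigned to thread
! tid (numbered from 0) when n iterations are split among nthreads
! threads. The first mod(n, nthreads) threads take one extra iteration.
subroutine chunk_bounds(n, nthreads, tid, lo, hi)
  integer, intent(in)  :: n, nthreads, tid
  integer, intent(out) :: lo, hi
  integer :: base, extra
  base  = n / nthreads          ! minimum iterations per thread
  extra = mod(n, nthreads)      ! leftover iterations to distribute
  lo = tid * base + min(tid, extra) + 1
  hi = lo + base - 1
  if (tid < extra) hi = hi + 1
end subroutine chunk_bounds
```

With `n = 100` and `nthreads = 2`, thread 0 receives iterations 1 through 50 and thread 1 receives 51 through 100, matching the split shown in Example 2.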