Topic: Co-Array vs. OpenMPI/ MPI vs. CUDA/OpenCL/OpenACC vs. DO CONCURRENT

Dear Fortran users and experts in SF,

I don't have much experience with parallelizing Fortran programs. I mostly work with old, well-tested Fortran 77/90-95 source code, but I have some questions about moving legacy code to modern Fortran at the lowest possible cost so that it can make use of a multicore CPU and a GPU.
Which route is recommended for bringing the Fortran source code to modern Fortran:
a.    Which of these should one use: coarrays / OpenMPI / MPI / CUDA / OpenCL / OpenACC / the DO CONCURRENT construct of Fortran 2008 / maybe I missed something ...
b.    Which libraries are available for free, and which of the options from point (a) are supported by the GNU Fortran compiler shipped with SF?
c.    Is there any difference between the official GNU Fortran description (https://gcc.gnu.org/wiki/GFortran) and the parallelization capabilities available under Windows 10?
d.    Which will run faster on a PC, and is it possible to combine these approaches with each other?
e.    Are there any examples of parallelization using OpenMP / OpenCL / coarrays, etc. in Simply Fortran?
I would be very grateful if someone would share their experience with these issues.

Best regards,
Andrey

Re: Co-Array vs. OpenMPI/ MPI vs. CUDA/OpenCL/OpenACC vs. DO CONCURRENT

Andrey,

In my opinion, OpenMP tends to be the easiest way to add parallel operations to existing Fortran code.  However, it still isn't as simple as throwing some OpenMP directives around loops.  There is still a significant amount of work necessary to identify exactly what can be parallelized.  Basically, you need to look for repetitive operations that don't depend on each other.

For example, consider an explicit time-stepping fluid mechanics simulation.  You could probably parallelize much of the update on each time step because the new state depends only on the state from the previous time step (hence the term "explicit").  Even in that simple case, you would need to make sure you're not updating the state arrays "in place," because other threads would also be reading the previous time step's state.  So a simple loop like this:

do i = 1, n_cells
   new_state(i) = update(old_state, i)    ! update reads only old_state
end do

could easily be parallelized if and only if:

  • The new state of cell i is stored in a different array than the old state of cell i

  • The update function is pure in that it does not have any side effects

Wrapping that loop in some OpenMP directives would make for simple parallelization on a multicore CPU.
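
To make that concrete, here is a minimal sketch under made-up assumptions: the names n_cells, old_state, new_state, and the little "averaging" update function are placeholders, not anything from a real code.

program explicit_step
   implicit none
   integer, parameter :: n_cells = 100000
   real    :: old_state(n_cells), new_state(n_cells)
   integer :: i

   old_state = 1.0

   ! Each iteration writes only new_state(i) and reads only old_state,
   ! so the iterations are independent and can run on different threads.
   !$omp parallel do
   do i = 1, n_cells
      new_state(i) = update(old_state, i)
   end do
   !$omp end parallel do

   print *, new_state(1), new_state(n_cells)

contains

   pure function update(state, i) result(val)
      ! Placeholder "pure" update: average a cell with its neighbours
      real, intent(in)    :: state(:)
      integer, intent(in) :: i
      real                :: val
      val = (state(max(i-1,1)) + state(i) + state(min(i+1,size(state)))) / 3.0
   end function update

end program explicit_step

With GNU Fortran, building with -fopenmp enables the directives; without it they are treated as comments and the loop simply runs serially, which is part of what makes OpenMP attractive for legacy code.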

In contrast, doing the same task on a simulation where implicit time stepping is used (the new state depends on both the old state and the new state) would be substantially harder to parallelize.  While absolutely possible, the parallelization step would most likely be at the matrix math level rather than simply wrapping an explicit loop. 
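
As a rough illustration of what "at the matrix math level" means, the kernel you would end up threading inside an iterative implicit solver is typically something like a matrix-vector product.  The dense matrix a and the vectors below are entirely made up:

subroutine matvec(a, x, y, n)
   ! Each row of the product is independent, so the outer loop can be threaded.
   implicit none
   integer, intent(in) :: n
   real, intent(in)    :: a(n, n), x(n)
   real, intent(out)   :: y(n)
   integer :: i, j

   !$omp parallel do private(j)
   do i = 1, n
      y(i) = 0.0
      do j = 1, n
         y(i) = y(i) + a(i, j) * x(j)
      end do
   end do
   !$omp end parallel do
end subroutine matvec

In practice you would more likely call a threaded BLAS routine or a sparse solver library than hand-write this, but the point stands: the parallelism lives inside the linear algebra rather than around the time loop.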

If you want to use MPI, your code would probably need a substantial rewrite to actually make MPI calls.  We do not currently ship an MPI implementation with Simply Fortran for Windows, though we do offer a compatible version of Microsoft MPI on the SF Package Manager site.  You'll need to do some reading on how MPI works and how to effectively use it in Fortran; it isn't a simple plug-and-play operation.
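
Just to show what "actually making MPI calls" looks like, here is the classic minimal example using the standard MPI Fortran interface (nothing Simply Fortran-specific, and obviously not your code):

program mpi_hello
   use mpi
   implicit none
   integer :: ierr, rank, nprocs

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

   print *, 'Hello from rank', rank, 'of', nprocs

   call MPI_Finalize(ierr)
end program mpi_hello

The real work starts after this: because each rank owns only its slice of the data, every place your algorithm needs a neighbour's values becomes an explicit MPI_Send/MPI_Recv pair or a collective call, which is where the substantial rewrite comes from.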

Coarrays are another option, but you'll need to redesign your code to use them.  Coarrays are declared differently from regular Fortran arrays, and, as with OpenMP, some thought needs to go into using them effectively.  Returning to the implicit time-stepping case, one might be tempted to just use a coarray, since coarrays let each image read other images' current states while computing the new time step.  However, there is non-trivial overhead in coarray data transfer between images because the images do not run in the same memory space.  An image can't simply "access" another image's memory to retrieve a value; the data has to be transferred.  We do provide coarray support on Windows 10 using a native library, but there is still a penalty due to that data transfer.  Coarrays probably require more research and understanding to use effectively than OpenMP, especially when adapting legacy code to the coarray paradigm.
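
As a very small sketch of the declaration and transfer point (the variable names are made up, and a real code would exchange whole array sections rather than one scalar):

program coarray_demo
   implicit none
   real    :: state[*]      ! one copy of state exists on every image
   integer :: me

   me    = this_image()
   state = real(me)

   sync all                 ! make sure every image has assigned its value

   if (me > 1) then
      ! Reading state[me - 1] pulls the value from another image; this is
      ! the data transfer overhead, not an ordinary memory access.
      print *, 'Image', me, 'sees neighbour value', state[me - 1]
   end if
end program coarray_demo

Even in this toy case, the coindexed read state[me - 1] is a communication, not a load from local memory.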

Simply Fortran on Windows does differ from the GNU Fortran description and the default distributions with regard to how OpenMP and coarrays work.  Our compiler ships with different OpenMP and coarray support libraries than other GNU Fortran compilers on Windows.  Our OpenMP library is "native" in that it does not require a Pthreads shim; it uses Windows threading directly.  Our coarray library is also completely native and does not rely on an MPI library like most other coarray implementations.  It will use as many images as requested, or default to the number of system cores, when executing.
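
If you want to confirm at run time how many images you actually received, the standard intrinsics are enough; nothing compiler-specific is required:

program image_count
   implicit none
   ! Only the first image reports, to avoid repeated output.
   if (this_image() == 1) then
      print *, 'Running with', num_images(), 'coarray images'
   end if
end program image_count

Run with the default settings, this should report a number of images matching your core count.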

We only ship a trivial OpenMP example with the base Simply Fortran installation.  You can probably look for some OpenMP and coarray examples online; they should "just work" with Simply Fortran.  I'll try to hunt some down for you, though, and post a follow-up.

Jeff Armstrong
Approximatrix, LLC

Re: Co-Array vs. OpenMPI/ MPI vs. CUDA/OpenCL/OpenACC vs. DO CONCURRENT

Dear Jeff,

Thank you for the detailed description and explanation.
I will try to test the technologies offered by SF using the simple examples.
First I will use the information from the book 'Modern Fortran in Practice' by Arjen Markus and other resources on coarrays and OpenMPI.

Best regards,
Andrey