Andrey,
In my opinion, OpenMP tends to be the easiest way to add parallel operations to existing Fortran code. However, it still isn't as simple as throwing some OpenMP directives around loops; a significant amount of work is necessary to identify exactly what can be parallelized. Basically, you need to look for repetitive operations that don't depend on each other.

For example, in an explicit time-stepping fluid mechanics simulation, you could probably parallelize much of the update on each time step because the new state depends only on the state at the previous time step (hence the term "explicit"). Even in that simple case, you would need to make sure you're not updating the state arrays "in place," because other threads would still be reading the previous time step's state. So something like this in pseudo-code:
for each cell i in all cells do
    new_state of cell i = update function( old state of all cells )
end do
could easily be parallelized because each iteration depends only on the old state, never on results computed by other iterations. Wrapping that loop in some OpenMP directives would make for simple parallelization on a multicore CPU.
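To make that concrete, here is a minimal sketch of the pseudo-code above as actual Fortran with an OpenMP directive. The array names and the update function are illustrative only, not from any real code; note that new_state and old_state are separate arrays, per the "in place" caveat above.

```fortran
program explicit_step
  implicit none
  integer, parameter :: n = 1000
  real :: old_state(n), new_state(n)
  integer :: i

  old_state = 1.0

  !$omp parallel do
  do i = 1, n
     ! Each iteration reads only old_state and writes only its
     ! own element of new_state, so the iterations are independent.
     new_state(i) = update(i, old_state)
  end do
  !$omp end parallel do

contains

  pure function update(i, state) result(val)
    integer, intent(in) :: i
    real, intent(in) :: state(:)
    real :: val
    ! Trivial stand-in for a real update: average of the cell
    ! and its immediate neighbors.
    val = (state(max(i-1,1)) + state(i) &
         + state(min(i+1,size(state)))) / 3.0
  end function update

end program explicit_step
```

With GNU Fortran (and Simply Fortran's compiler), OpenMP support is enabled with the -fopenmp flag; without it, the directives are treated as comments and the loop runs serially.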
In contrast, doing the same task on a simulation where implicit time stepping is used (the new state depends on both the old state and the new state) would be substantially harder to parallelize. While absolutely possible, the parallelization step would most likely be at the matrix math level rather than simply wrapping an explicit loop.
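As a rough illustration of what "parallelizing at the matrix math level" might mean: an implicit scheme typically solves a linear system each step, and the matrix-vector product at the heart of an iterative solver is itself a parallelizable loop. A hedged sketch, with hypothetical array names:

```fortran
program implicit_matvec
  implicit none
  integer, parameter :: n = 500
  real :: a(n,n), x(n), y(n)
  integer :: i, j

  a = 1.0e-3
  x = 1.0

  ! One matrix-vector product, y = A*x, as it might appear inside
  ! an iterative solver (e.g. conjugate gradient). Each row i is
  ! independent, so the outer loop parallelizes cleanly even
  ! though the time-stepping loop around the solver does not.
  !$omp parallel do private(j)
  do i = 1, n
     y(i) = 0.0
     do j = 1, n
        y(i) = y(i) + a(i,j) * x(j)
     end do
  end do
  !$omp end parallel do
end program implicit_matvec
```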
If you want to use MPI, your code would probably need a substantial rewrite to actually make MPI calls. We do not currently ship an MPI implementation with Simply Fortran for Windows, though we do offer a compatible version of Microsoft MPI on the SF Package Manager site. You'll need to do some reading on how MPI works and how to effectively use it in Fortran; it isn't a simple plug-and-play operation.
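If you're curious what the MPI programming model looks like from Fortran, a minimal skeleton is below. The MPI calls are the standard API; the domain-decomposition comment is just an indication of where the real rewrite effort would go.

```fortran
program mpi_skeleton
  use mpi
  implicit none
  integer :: ierr, rank, nprocs

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! In a real simulation, each process would own a slice of the
  ! domain, and boundary values would have to be exchanged
  ! explicitly (e.g. with MPI_Sendrecv) every time step.
  print '(a,i0,a,i0)', 'rank ', rank, ' of ', nprocs

  call MPI_Finalize(ierr)
end program mpi_skeleton
```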
Coarrays are another option, but you'll need to redesign your code to use them. Coarrays are declared differently than regular Fortran arrays, and, similar to OpenMP, some thought needs to be given to how to use them effectively. Returning to our earlier example, one might be tempted to use a coarray for an implicit time stepping scheme, since coarrays allow each image to access the other images' current states while updating to the new time step. However, there is non-trivial overhead in coarray data transfer between images because the images do not share a memory space. An image can't just "access" another image's memory to retrieve a value; the data has to be transferred. We do provide coarray support on Windows 10 using a native library, but there is a penalty to using it due to that data transfer. Coarrays probably require more research and understanding to use effectively than OpenMP, especially when trying to adapt legacy code to a coarray paradigm.
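For reference, here is a hedged sketch of the coarray syntax and the transfer cost described above (the array names are hypothetical):

```fortran
program coarray_sketch
  implicit none
  integer, parameter :: n = 100
  real :: state(n)[*]      ! one copy of state per image
  real :: neighbor_value
  integer :: me

  me = this_image()
  state = real(me)

  sync all                 ! make every image's state visible

  ! Reading state(n)[me-1] looks like an array reference, but it
  ! transfers data from another image's memory; it is not a
  ! simple local memory access.
  if (me > 1) then
     neighbor_value = state(n)[me-1]
  end if
end program coarray_sketch
```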
Simply Fortran on Windows does differ from standard GNU Fortran distributions with regard to how OpenMP and coarrays work. Our compiler ships with different OpenMP and coarray support libraries than other GNU Fortran compilers on Windows. Our OpenMP library is "native" in that it does not require a Pthreads shim; it uses Windows threading directly. Our coarray library is also completely native, and it does not rely on an MPI library like most other coarray implementations. It will use as many images as requested, or default to the number of system cores when executing.
We only ship a trivial OpenMP example with the base Simply Fortran installation. You can probably look for some OpenMP and coarray examples online; they should "just work" with Simply Fortran. I'll try to hunt some down for you, though, and post a follow-up.
Jeff Armstrong
Approximatrix, LLC