Topic: Call to system function

General question regarding the use of the system call as per the examples below:
            iresult=system(trim(ShellCommand))
            call system(trim(ShellCommand),iresult)

Is there any preference on whether one uses the first or second form of the system call? I am asking because I just discovered the second form after reading the GNU documentation on this feature and learning that it can be either a function or a subroutine call. I was using the first form extensively and kept running into memory segmentation faults and could not understand why (see previous forum threads by me). These memory faults would crop up occasionally, in different run instances and on different machines; they seemed to be completely random and rare.

After finding out about the second form above, I switched the code over to the call system form. The instance that was causing problems now doesn't exhibit the memory fault, but that really means nothing, as it is a new instance and a new problem with differing memory usage. I am just wondering if there is something known to be different between those two system call methods?

FYI, I am still using SF 2.41, build 2559 and GNU 7.2.0 FORTRAN.
Thanks,
Rod

Re: Call to system function

Rod,

Either should be fine, and I'm not sure why there is a crash on the first.  I would suggest, however, switching to EXECUTE_COMMAND_LINE, which has a form similar to the subroutine call you've shown. This call is part of the Fortran standard and would be portable across compilers.
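
As a minimal sketch of what the switch might look like, the program below runs a command through EXECUTE_COMMAND_LINE and checks both the command's exit status and whether the command could be started at all. The variable names ShellCommand and iresult are borrowed from the question; the echo command, cstat, and msg are placeholders chosen for illustration, not anything from the original code.

```fortran
program run_command
    implicit none
    character(len=256) :: ShellCommand   ! name borrowed from the question
    character(len=256) :: msg            ! hypothetical buffer for an error message
    integer :: iresult, cstat

    ShellCommand = 'echo hello'          ! placeholder command for illustration
    msg = ''
    iresult = 0

    ! wait=.true. blocks until the command finishes, as SYSTEM does.
    call execute_command_line(trim(ShellCommand), wait=.true., &
                              exitstat=iresult, cmdstat=cstat, cmdmsg=msg)

    if (cstat /= 0) then
        ! A nonzero cmdstat means the command itself could not be executed.
        print *, 'command could not be executed: ', trim(msg)
    else
        print *, 'command exit status: ', iresult
    end if
end program run_command
```

Supplying cmdstat keeps a failure to launch the command from terminating the program, which can make an intermittent problem easier to log.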

Jeff Armstrong
Approximatrix, LLC

Re: Call to system function

Thanks Jeff.

Will EXECUTE_COMMAND_LINE work with a FORTRAN90 compilation? I see this system call is a FORTRAN 2008 feature. I guess I can give it a try.

Thanks again.

Rod

Re: Call to system function

Rod,

You're already using a "GNU Extension," so the EXECUTE_COMMAND_LINE call would be more reliable if you ever want to try another compiler.  GNU Fortran will allow Fortran 2008 intrinsic procedures in basically any Fortran source code unless specifically instructed to conform to a different standard.

Jeff Armstrong
Approximatrix, LLC

5 (edited by grogley 2020-02-27 14:08:29)

Re: Call to system function

I tried the EXECUTE_COMMAND_LINE call in my code and it crashes too with the same memory segmentation fault. This problem has been going on for years now and I find no way to figure it out. Clearly as I have indicated before, it does not occur if I compile without the OpenMP extensions. However, this makes no sense to me since all the code is running single threaded when the system calls happen.

This problem also seems to only happen with certain parameter conditions. The simulation I am running when it crashes has 5001 particles but if I run it with fewer or more, it may not crash. I am just not getting what is happening.

Since we have rehashed this before, I don't expect to get any answers. Maybe an upgrade will help. Sigh.

Rod

PS. Just to follow up: I took the same simulation that crashes on one machine and ran it on my development system, this time with even more frequent calls to the subroutine that contains the EXECUTE_COMMAND_LINE calls, and it does not crash.

Re: Call to system function

Rod,

I was unaware that OpenMP was being used. The crash is almost certainly caused by an insufficient stack.  Try adding the following flag to the Linker box in Compiler Flags under Project Options:

-Wl,--stack,256000000

The above will cause the stack to be 256MB for each thread (so be aware of memory issues).  OpenMP changes how some allocation occurs, and stack space can be exhausted quickly.  Let me know if the above works.

Jeff Armstrong
Approximatrix, LLC

Re: Call to system function

Jeff,

I already have a stack allocation as per my option:

-Wl,--stack,150000000; not as much as your suggestion, but it seems to be more than adequate. For example, the offending task that would crash has a total memory footprint of only 5.1 MBytes as per Task Manager. I routinely run this with 10 times the number of particles without incident. I have one running with 50,000 particles and it only uses 14 MBytes, so I don't think this is a stack problem (note I changed my code to allocate the array sizes at run time to be the exact size needed). As I said before, you and I have had several exchanges about this in the past and this issue still comes back to bite me. Ugh.

That all said, I downloaded the SF load for R3 in trial mode, compiled my code and swapped in the new executable on the machine that was crashing. The new code breezed right through the crash point and is still running. I am cautiously optimistic that this will help to rid me of this scourge. I will run that code for a few days to be sure and if all is well, I have convinced my CFO (read the wife) that I need to buy a new license for the new version.

FYI, here is my SF 2.41 Makefile if you are interested:

#
# Automagically generated by Approximatrix Simply Fortran 2.41
#
FC="C:\Program Files (x86)\Simply Fortran 2\mingw-w64\bin\gfortran.exe"
CC="C:\Program Files (x86)\Simply Fortran 2\mingw-w64\bin\gcc.exe"
AR="C:\Program Files (x86)\Simply Fortran 2\mingw-w64\bin\ar.exe"
WRC="C:\Program Files (x86)\Simply Fortran 2\mingw-w64\bin\windres.exe"
RM=rm -f


OPTFLAGS= -O3 -fgraphite-identity -floop-interchange -floop-strip-mine -floop-block -floop-parallelize-all -mtune=native

SPECIALFLAGS=$(IDIR)

RCFLAGS=-O coff

PRJ_FFLAGS= -fopenmp

PRJ_CFLAGS=

PRJ_LFLAGS=-Wl,--stack,150000000 -lgomp

FFLAGS=$(SPECIALFLAGS) $(OPTFLAGS) $(PRJ_FFLAGS) -Jmodules

CFLAGS=$(SPECIALFLAGS) $(OPTFLAGS) $(PRJ_CFLAGS)

"build\riod.o": ".\riod.f90"
    @echo Compiling .\riod.f90
    @$(FC) -c -o "build\riod.o" $(FFLAGS) ".\riod.f90"

clean: .SYMBOLIC
    @echo Deleting build\riod.o and related files
    @$(RM) "build\riod.o"
    @echo Deleting default icon resource
    @$(RM) "build\sf_default_resource.res"
    @echo Deleting riod.exe
    @$(RM) "riod.exe"

"riod.exe":  "build\riod.o" "build\Riod-MP-F90.prj.target"
    @echo Generating riod.exe
    @$(FC) -o "riod.exe" -static -fopenmp "build\riod.o" $(LDIR) $(PRJ_LFLAGS)

all: "riod.exe" .SYMBOLIC

Re: Call to system function

Rod,

There is the possibility that the OpenMP library we had been providing was not properly sizing the stack on subsequent threads, making the linker flag useless with OpenMP.  I seem to remember having to fix that issue at some point, but I may be mistaken.  Let me know if it works with the latest version.
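
For context, the OpenMP specification defines the OMP_STACKSIZE environment variable for the stacks of threads created by the OpenMP runtime, separately from the initial thread's stack that the linker flag sets. The sketch below only reports what the program sees at startup; whether the customized OpenMP library shipped with a given Simply Fortran release honors OMP_STACKSIZE is an assumption that would need to be checked. Compile with -fopenmp.

```fortran
program report_omp_stack
    use omp_lib, only: omp_get_max_threads
    implicit none
    character(len=64) :: val
    integer :: length, stat

    ! status is 0 when the variable exists and fits in val,
    ! 1 when it does not exist, and -1 when val is too short.
    call get_environment_variable('OMP_STACKSIZE', val, length, stat)

    print *, 'max OpenMP threads: ', omp_get_max_threads()
    if (stat == 0 .and. length > 0) then
        print *, 'OMP_STACKSIZE = ', val(1:length)
    else
        print *, 'OMP_STACKSIZE is not set; the runtime default applies to worker threads'
    end if
end program report_omp_stack
```

Logging this at the start of each run would at least record which stack setting was in effect when a crash occurs.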

Jeff Armstrong
Approximatrix, LLC

Re: Call to system function

Hi Jeff,

I have done a fair amount of testing with the new version. I have not seen the specific problem reported earlier but it is too early to really tell. There is still another issue that has me perplexed that is probably related but I am not sure.

Specifically, I run my program in a command prompt that I start with a batch program, which opens the CMD.EXE window with the Start command. Here is, in essence, the command I start with:

start "N0x00ff_Program" /low  /affinity 0x00ff program.exe

However, sometimes, though obviously not always, a window started this way will close immediately after starting. I notice this most on my Windows 10 machine. If I open a static CMD prompt with the same affinity restrictions, the program will run just fine. I am not sure what could be causing this.

In any event, I will be upgrading my license to SF R3 shortly. Thanks for your support.

Rod

Re: Call to system function

Jeff,

I was just checking back here when I noticed that I never really followed up on this thread. I have been using SF R3 since then (4 months). I still have the crash issue, but it is different. For simulations running on Windows 7 Pro machines, the program is very stable, with no crashes. For Windows 10 Pro machines, I get random crashes for most simulation runs.

As I write this, I have three Win7Pro machines running seven simulation instances. They are completely stable and will run happily for weeks/months untouched.

I have two Win10Pro machines running three simulation instances. Two of those three instances will crash randomly after half an hour or several hours. Understand that a crash in this case means that the simulation instance just stops, the CMD prompt that it runs in halts, and the CPU threads it was using go idle. I have a workaround for this behavior, which is to stop the program every hour and restart it, but this is a real nuisance.

I really don't expect any help since we have gone over this many times; I just wanted to follow with the latest.

Thanks,

Rod

Re: Call to system function

Rod,

I'm concerned the error is in our OpenMP implementation, which has significant customizations.  Have you tried running a simulation with OpenMP disabled with Simply Fortran 3 on Windows 10?

Also, how many processors do the Windows 10 machines report?  I'm wondering if perhaps they have significantly more threads running in your simulation than the Windows 7 machines.

EDIT: Also, Simply Fortran 3.13 was recently released with an updated compiler.  You might try installing that as well.  I'm not hopeful that it will solve the crashing problem, though.

Jeff Armstrong
Approximatrix, LLC

12 (edited by grogley 2020-07-03 13:27:54)

Re: Call to system function

Jeff,

For short tests (a few hours) with no OpenMP, the program seems to run without issue. I have not tested this recently on R3 though. However, what runs for a few hours single-threaded takes only a few minutes with multi-threaded code. The multi-threaded code may run for days (or just a few minutes) without halting. Since it is so random, it is hard to know.

As far as processors on Windows 10, one machine has 4 threads available and the other has 16. For the 4-thread system, the simulation uses all of them. For the 16-thread system, I have two simulations running with 8 threads each. I use the affinity setting with the start command to segregate the threads between simulation runs. For example, in a batch file I use the following to start the simulation on the 16-thread system:

start  /low  /affinity 0xff00 riod.exe

This command will only allow the program to use the last 8 system threads.
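
To make the mask concrete, here is a small sketch that decodes the hexadecimal affinity value from the start command into logical processor numbers. It assumes the usual Windows convention that bit n of the mask corresponds to logical processor n, counted from 0; the program and variable names are illustrative only.

```fortran
program decode_affinity
    implicit none
    integer :: mask, cpu

    mask = int(z'FF00')   ! the affinity mask passed to the start command
    do cpu = 0, 15
        ! btest uses 0-based bit positions, matching the processor numbering
        if (btest(mask, cpu)) print '(a,i0)', 'logical processor ', cpu
    end do
end program decode_affinity
```

For 0xff00 this lists processors 8 through 15, while the complementary mask 0x00ff would select processors 0 through 7.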

I will try testing the simulation on the 4-thread system with no OpenMP for a while and see how that works.
EDIT: I forgot that the current simulation of my 4 thread system is not crashing, the previous one did it, ugh! I will try to create a new sim that will crash, then test without OpenMP. This could take a while...

Rod

Re: Call to system function

Jeff,

An update on what I have done in the last couple of days. After trying to test using running sims, I decided to make a new sim and test the code. The new sim did fail while using OpenMP. Then I ran that sim with OpenMP off. It ran for over 12 hours without failing, so I halted it. I did try running it again using the OpenMP extensions, but after more than four hours it would not crash. Frustrated, I stopped testing with this new sim.

I downloaded the SF 3.13 release and installed it.

I rethought using my one current sim that does fail and decided to update the code a bit since this is code that dates back to the beginning of March. I updated the code with some changes so that the old sim will run without issues and then recompiled with the new SF release. This older sim has now run for nearly 16 hours without halting using OpenMP extensions. This is not completely unexpected though as this sim was a long running sim that I was restarting hourly and it would only show remnant halted windows maybe once or twice a week; implying that it would halt within the hourly runtime window rarely.

The newly created code is encouraging but the code is "different" and I have seen simple changes to the code make one sim versus another stable or not. In addition, the changes I made have nothing to do with the code in the multi-threaded part of the code.

I will update later after more run time.

Rod

PS. As soon as I push submit on this post, the test sim will crash! LOL!.

Re: Call to system function

Rod,

How did the simulation work on 3.13?  I'll start poking around in our OpenMP library a bit more to see if I can identify any glaring problems, but I think I'll actually have to find a long-running, non-trivial code to hunt down and find the intermittent bug.  There must be a place in our library that's violating memory access if you're only seeing crashes with OpenMP, but everything seems fine without it.

Jeff Armstrong
Approximatrix, LLC

15 (edited by grogley 2020-07-06 20:34:39)

Re: Call to system function

Jeff,

The old sim that would regularly crash has now run for over two days with the new SF 3.13 recompile. I am still not ready to claim victory, but it is looking better. Only running many new instances of sims will give better confidence that the issue is not showing up. I have had sims crash early and then never do it again. I have had sims that will crash every several hours and ones that will never halt. That is why this is so frustrating.

That said, in my tiny little brain, I have my doubts that this is your OpenMP issue; I am truly a horrid programmer.

Rod

Re: Call to system function

Jeff,

Just a last update on this for now. The program that I was testing went for over 6 days without halting. I had to shut it down yesterday afternoon as we had power outages. So I am pretty sure that sim instance is "cured". I will not know if this issue can be put to bed going forward without many more sim instances to try against, and since this has always been a random issue, it will take months of additional instances before I know more.

In the last few days, I have started three new sims (with the new SF 3.13 compiles), two on Win7 platforms and one on Win10 and no bad behaviors yet.

Thanks again for all your support.
Rod