Topic: Code does weird stuff depending on compiler optimization level used.

I am using version 2.21 build 1837.

I am testing a different version of my simulation. Note that the original simulation has been running stably for over a year. The new version reduces the complexity of the original code somewhat and, at the same time, reduces its memory footprint without really changing what it does. Note too that the older code is compiled with the "Extreme" optimization option and with the "Aggressive Loop Optimization" box checked, so things are fully optimized. Note too that none of the testing below uses any debug level in the compile step.

Shortly after I started testing the new version, I noticed that at some point deep into the simulation things start to get weird, by which I mean I get nonphysical results. Over the last several weeks I have tried to figure out whether some mistake in the code causes this, but found nothing obvious; I stared at the code until I was cross-eyed. When all else failed, I wondered if the compiler options might have something to do with it. So I ran the code with no optimizations and it runs perfectly, with no anomalous behavior.

Then I decided to see which optimization level would cause it to happen. After days of testing, I found that the anomalous behavior happens when the "Extreme" option is used.

Let me get more specific about what this anomalous behavior is. My code tracks positions and velocities of objects. After running for a significant time, the first object in my array of objects has its Y-direction velocity drift in one direction and grow over time, accelerating the object away from where it should be. Only the first object shows this behavior; every other object behaves as the simulation physics dictates. Again, this does not happen in my older code.

As I have said before, I am not the world's best coder but I would like to understand what is happening here. Are there rookie mistakes in the code that can cause this during extreme optimization; what should I avoid?

Thanks in advance.
Rod

Re: Code does weird stuff depending on compiler optimization level used.

Rod,

The "Extreme" optimization setting basically passes a "-O3" flag to the underlying compiler, which enables a significant number of optimizations.  These optimizations, however, are not limited to using certain CPU features.  They instruct the compiler to take certain liberties about how to construct machine code based on your Fortran source code.  For example, according to our manual, the "-O3" optimization will also enable inline functions and some loop vectorization optimizations that are above and beyond what the "Common" (or "-O2") optimization level provides.  There is a pretty good chance that one of these optimizations is not considered "safe" (or there's just a subtle bug in the compiler), and the optimization is leading to some sort of malformed compiled code.  I can't say for sure. 

I don't think you're doing anything wrong, though.  If you've written Fortran code that is correct, optimization level shouldn't lead to it not working properly.  There's a small chance that a memory-related issue (like an array bounds overrun) is only manifested when the optimization level is that high, but I highly doubt that is the case.

I would suggest leaving the optimization level as "Common" and perhaps trying the "Aggressive Loop Optimizations."  The performance increase gained by using "Extreme" probably wouldn't have been more than a few percent.

Jeff Armstrong
Approximatrix, LLC

Re: Code does weird stuff depending on compiler optimization level used.

Thanks for the information on the optimization. After posting this question, I continued thinking about what was happening and thought that perhaps the code was somehow ignoring when I zeroed out array elements (although I am not sure why this wouldn't be observable right away as the code ran but only shows up after hours of running). For example, I have things that look like DVY(I)=0.0d0 before entering the calculation loops. I wondered if the compiler would see this and decide, hey I don't need to do this!

I made a global change in the code where I set zero=0.0d0 and made it common to wherever I reinitialize a real*8 variable to zero. I recompiled with the "Extreme" option, ran another test, and compared it to the previous test. This new code did not show the behavior seen before the changes I made.

I am now testing the same new code using "Extreme" together with "Aggressive Loop Optimizations" to see what will happen. Used together, these offer a significant decrease in execution time; I am not sure, but about 10% is my guess, which would make a big difference when running for months and years (I have one running for a year and a half; yes, I am nuts).

I'll update this post when the test shows its behavior.

Rod

Re: Code does weird stuff depending on compiler optimization level used.

Rod,

Interesting results! The compiler may have incorrectly "optimized out" setting the variables to zero. I'll have to do some digging to see if it's a known bug. I agree, though, that a 10% reduction in execution time is significant. Let us know if you find out more!

Jeff Armstrong
Approximatrix, LLC

5 (edited by JohnWasilewski 2015-03-30 18:18:03)

Re: Code does weird stuff depending on compiler optimization level used.

Rod,

"I have one running for a year and a half-yes I am nuts"

...please may I ask, was this by any chance the answer?


                                42

---
To members on the other side of the pond:
Is this as embedded in your mythology as it is here in Blighty?

J.

Re: Code does weird stuff depending on compiler optimization level used.

Rod, don't forget also that using optimization often changes the order of some floating-point operations, which in turn leads to different numerical results.

If your algorithm is numerically stable, your round-off errors should be small (though it is often difficult to be sure that a complex algorithm is numerically stable...) and your results should not be sensitive to the optimization level. Otherwise, round-off errors may be large and your results could be very sensitive to the optimization level.
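
To make the reordering point concrete, here is a minimal sketch. It is an illustration only, not taken from Rod's code, and the values are chosen to exaggerate the effect. Floating-point addition is not associative, so the same three numbers summed in two groupings can give different answers. With strict IEEE double-precision arithmetic I would expect the first grouping to give 1 and the second 0, but extended-precision registers or other compiler choices may change that. The volatile attribute is only there to discourage the compiler from evaluating the sums at compile time.

```fortran
program sum_order
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! volatile: ask the compiler to really load these values at run time
  real(dp), volatile :: a, b, c
  real(dp) :: left, right

  a =  1.0e16_dp
  b = -1.0e16_dp
  c =  1.0_dp

  ! Same three numbers, two groupings of the additions.
  left  = (a + b) + c   ! the large terms cancel first, then c is added
  right = a + (b + c)   ! c is added to a large term first and may be rounded away

  print *, '(a + b) + c = ', left
  print *, 'a + (b + c) = ', right
  print *, 'difference  = ', left - right
end program sum_order
```

An optimizer that is allowed to regroup sums (for example under fast-math style options) can effectively turn one of these expressions into the other, which is how a change of flags alone can change the numbers.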

Jeff, I don't agree with the following assertion: << There is a pretty good chance that one of these optimizations is not considered "safe" (or there's just a subtle bug in the compiler), and the optimization is leading to some sort of malformed compiled code.  I can't say for sure. >> OK, a compiler bug is possible (I have found many bugs in gfortran over the last 10 years), but statistically, a floating-point instability is much more likely, IMHO.

Regards,
Édouard

Re: Code does weird stuff depending on compiler optimization level used.

Rod, I hope you have tested your code in debug mode, using the '-fbounds-check' option to detect whether some array indices are out of range. Writing to a random location in memory often leads to strange behavior.
Additionally, it is good practice, before using optimization, to check your code with valgrind. valgrind is a well-known tool available under Unix (and Linux, too!) which can detect many obscure coding faults that are usually very difficult to find. valgrind also detects uninitialized variables and memory faults.
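
A small sketch of what the bounds check catches, assuming a gfortran-based toolchain. The program name bounds_demo and the array are made up for illustration (the array name dvy only echoes Rod's example). Note that the check is performed while the program runs, so a violation is reported in the program's own output at run time, not in the compiler's build messages. In current gfortran the option is spelled -fcheck=bounds; -fbounds-check is the older spelling of the same thing.

```fortran
! Hypothetical file name: bounds_demo.f90
! Possible build line (assumption: plain gfortran on the command line):
!   gfortran -g -fcheck=bounds bounds_demo.f90 -o bounds_demo
program bounds_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 5
  real(dp) :: dvy(n)
  integer :: i, last

  ! Run with no arguments for the correct loop; pass any argument to
  ! trigger the deliberate off-by-one write past the end of dvy.
  last = n
  if (command_argument_count() > 0) last = n + 1

  dvy = 0.0_dp
  do i = 1, last
     dvy(i) = real(i, dp)
  end do

  print *, dvy
end program bounds_demo
```

Built with the check enabled and run with an argument, the program should stop with a runtime error naming the array and the offending index. Built without the check, the same run may silently overwrite whatever happens to sit next to the array, which is the kind of fault that can show up only at some optimization levels.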

Re: Code does weird stuff depending on compiler optimization level used.

Édouard,

Good point, and I don't think I used the proper wording in my original reply. By not being "safe," I was thinking about "fast-math" optimizations (which actually aren't enabled with -O3, but anyway...), which reorder floating-point operations and start violating some IEEE math standards. If your code is approaching numerical instability, that would lead to what Rod was seeing. I used to see similar behavior quite a bit in a previous job when attempting to calculate optimal control algorithms relying on the discrete algebraic Riccati equation, which tended to easily approach numerical instability. Disabling certain optimizations would keep the algorithm a bit more robust. We're talking about the same thing, I think.

Based on Rod's findings so far, I'm guessing the compiler might be optimizing out his array initialization for some reason.  That behavior would be a bug.

And I agree with John, Rod will probably find the answer to be 42 once his program finally terminates in a few more years.

Jeff Armstrong
Approximatrix, LLC

Re: Code does weird stuff depending on compiler optimization level used.

So much discussion here, responding in order:

John,

42? If that's the answer, I should have asked you first and I wouldn't have wasted so much time, money and effort. Actually, I was thinking your comment referred to my comment about being "nuts" and that the problem was rooted there. LOL.

Edouard,

If optimization changes the order of FP operations, that would be extremely bad for my code. The code is built with my special madness, but I deem there is some method to the madness: I put stuff in where it is supposed to go, and I really don't want the compiler to decide that it knows my intent better than I do.

Thanks for the hint about the compiler option. I didn't know about that one. I did recompile with that option and saw no output that would suggest a problem. Do I need to enable some debug output level to see any of these issues, or will they just appear in the build status tab if found? Array bounds were the first thing I looked at when this problem appeared, but I could find no obvious issues. This version of the code is very simple in terms of arrays, nothing too exotic.

The latest test that I mentioned in my last post is about 12 hours into running with still no issues, but I am not quite at the point where I saw it happen before the code changes. I am going to let this one run for another 12 hours or so before even thinking about claiming "victory".

Thanks all,
Rod

Re: Code does weird stuff depending on compiler optimization level used.

I just wanted to update this thread with what I found during my testing.

The fully optimized code that I changed to use a variable for zero ran for about 24 hours, and I did not see the anomalous behavior that I saw before I changed the code. I tested further with that same new code with no optimization, ran it for about 37 hours, and compared its output along the way to that of the fully optimized code. There were significant differences in the data as time progressed, though, indicating that something was different in the execution of the two compiled versions. Actually, I had saved data from three runs, using exactly the same starting data with three different compile variants, and all three show differences after running for the same period of time.

This disturbs me because I was sure I had tested this behavior on my other code and didn't see any real differences after modest run times. I may have to look at that code and retest.

Rod

Re: Code does weird stuff depending on compiler optimization level used.

Rod,

What do you mean by significant differences?

How many significant digits are lost? If only a few digits are lost (e.g. 1 to 5), this is the normal behavior when the order of the FP operations changes. However, if many digits are lost (10 to 15, in double precision), this can be the consequence of a floating-point instability in the algorithm, and you should investigate your code (and perhaps rewrite some parts).
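
One rough way to put a number on "digits lost" is to count how many leading decimal digits two results share, via the relative difference. The sketch below is only an illustration: the function name matching_digits and the two sample values are made up, standing in for the same quantity written out by two differently compiled runs at the same time step.

```fortran
program digits_lost
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: x_ref, x_opt

  ! Made-up sample values: the same output quantity from an unoptimized
  ! run (x_ref) and from an optimized run (x_opt).
  x_ref = 1.234567890123456_dp
  x_opt = 1.234567890123999_dp

  print '(a, f6.1)', 'matching digits: ', matching_digits(x_ref, x_opt)

contains

  ! Approximate number of leading decimal digits on which a and b agree,
  ! clamped to the range 0 .. precision of double precision (15).
  pure function matching_digits(a, b) result(d)
    real(dp), intent(in) :: a, b
    real(dp) :: d, biggest

    if (a == b) then
       d = real(precision(a), dp)
    else
       biggest = max(abs(a), abs(b))
       d = -log10(abs(a - b) / biggest)
       d = max(0.0_dp, min(d, real(precision(a), dp)))
    end if
  end function matching_digits

end program digits_lost
```

For the sample values above this should come out at roughly 12 matching digits, i.e. about 3 digits lost out of 15. Applied to saved positions and velocities from two runs, a value that starts near 15 and falls steadily toward 0 as the simulation advances would be a sign of growing divergence rather than a one-off rounding difference.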

Ideally, you should use another compiler (for example, Intel ifort) to check the behavior. As the two compilers (GNU GCC and Intel ifort) are IEEE-754 standard compliant, a large difference between the results of the two compilers may reveal a bug in one of them; conversely, if the differences between the two compilers are very small, I think that the only possibility is that your code embeds an unstable algorithm.

Re: Code does weird stuff depending on compiler optimization level used.

Edouard,

My simulation is by its nature a chaotic system. I am not even close to being conversant in this topic, but you can correct me if I am wrong in the following understanding. If I start with the same initial conditions, any FP round-off errors will be identical from one run of the code to the next. In other words, if the code is doing the exact same thing every time, the FP errors will be reproducible each time the code runs with the same initial conditions. Thus in my mind, if the compiler makes no changes to the execution algorithms, then the output should be the same, chaotic or not. As such I would expect (naively of course) that any optimizations of the compiler would not change the overall execution of the code, just make it faster or more efficient. As for the stability of my code, I have tested the algorithm (not recently with GNU Fortran, but the algorithm hasn't changed) in controlled but stressful circumstances and have shown that the code will reproduce known physically correct behaviors for billions of iterations. I may have to revisit those tests too...

As to your question about the "significant differences" I see, there is no quantifiable measure that I have. I can monitor the behavior of objects over time and I see that distances, velocities and other measurables are obviously different between the various compiled versions of the code.

I do not have access to Intel's FORTRAN; the reason I am using SF is it is an affordable option for my hobby. I will continue to test my code for anomalous behaviors but at least that original weird behavior seems to be gone.

Thanks for your help.
Rod

Re: Code does weird stuff depending on compiler optimization level used.

Rod,

It is well known that a chaotic system is by itself very sensitive to initial conditions, and therefore also very sensitive to changes in FP execution order. My only advice is then: use only first-level optimization, not aggressive optimization...
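
A toy illustration of that sensitivity, using the logistic map as a stand-in for a chaotic system (this is an assumption for the sake of the example and has nothing to do with Rod's actual physics). The two update formulas below are algebraically identical and differ only in the order of the floating-point operations, which is the kind of change an optimizer may make. The parameter r = 3.9 and the starting value 0.3 are arbitrary choices.

```fortran
program chaos_order
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: r = 3.9_dp   ! arbitrary value in the chaotic regime
  real(dp) :: xa, xb
  integer :: step

  ! Identical initial conditions for both trajectories.
  xa = 0.3_dp
  xb = 0.3_dp

  do step = 1, 120
     xa = r * xa * (1.0_dp - xa)    ! form 1
     xb = r * xb - r * xb * xb      ! form 2: same algebra, different rounding
     if (mod(step, 10) == 0) then
        print '(i4, 2f20.15, es12.3)', step, xa, xb, abs(xa - xb)
     end if
  end do
end program chaos_order
```

I would expect the last column to start at or near rounding level and then grow until the two trajectories are unrelated, probably within a hundred iterations or so, though the exact step depends on the compiler and hardware. Each trajectory on its own is still perfectly reproducible from run to run, which matches Rod's expectation; it is only the change in operation order that makes them part company.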

I have never worked with chaotic systems, but I guess that the numerical values of the primary variables are not what matters. Instead, there should be some invariants, or some properties revealed only in the phase space, shouldn't there? (I have in mind the Lorenz attractor, for which x(t), y(t), z(t) seem completely random, but the solution drawn in the phase space (x,y,z) shows a beautiful figure consisting of two different orbital planes...)