1 (edited by grogley 2020-10-20 13:03:40)

Topic: OpenMP and AMD Ryzen CPU

I am trying to evaluate if the current AMD Ryzen offerings (Ryzen 9 or Threadripper) have limitations on their memory/NUMA architecture (not sure what the terms for this are) which will cause performance degradation when extending the number of OpenMP threads out to the maximum. So what do I mean by this?

For example, my current workstation CPU is an Intel i9-9900K processor, with 8 cores and 16 threads. When I run my simulation with all 16 threads, the performance is worse than if I use 8 threads. This suggests to me that there are memory access issues across the CPU chip that slow things down, but not if I use only half the threads. The same happens for a Xeon E5-2680 chip that has 10 cores and 20 threads. I have to segregate the CPU by setting the affinity for the simulation to use half the available number of threads.

So does anyone here have experience with the AMD Ryzen chips? Specifically, I am interested in the Ryzen 9 3900X chip using OpenMP and threading limitations but discussion of the other AMD chips might help.

I am leery of AMD because I have been burned by them in the past where their 8 core CPU (I think the FX-8350) could not use OpenMP because of memory and core pipelines or something like that (LOL!).

Thanks in advance,
Rod

Re: OpenMP and AMD Ryzen CPU

Rod,

While I don't currently have a Ryzen system here, I only hear good things about them.  The FX series from AMD used a much older architecture that I can imagine had plenty of bottlenecks and drawbacks.  However, Ryzen and Threadripper CPUs seem to be the darlings of the CPU world right now.  I have seen that the Intel offerings are basically keeping pace with these AMD chips, but the AMD equivalents are substantially cheaper.

The Intel "8 core and 16 threads" stuff can be misleading.  A lot of the multiple thread operations are only useful when your code is executing instructions where a portion of that core is busy for multiple clock cycles but another thread has some work that can be done in the meantime.  With a lot of the OpenMP stuff where every thread is running very similar code, you might lose some of the benefits of the "16 threads" claim.  That behavior won't necessarily change with a Ryzen either. 

Someone must have published some OpenMP benchmarks for the current batches of CPUs.  I would try seeking them out online.  They'd be drastically more useful than some of the more general benchmarks.

Jeff Armstrong
Approximatrix, LLC

Re: OpenMP and AMD Ryzen CPU

Thanks Jeff,

Yes, I have heard all the hoopla about the new AMD offerings and I lust after the Threadripper chips, but they are not in my budget. I have looked around the web for information on these chips and am encouraged, but most reviews are aimed at gamers, whose performance needs are not necessarily the same as mine. (My son is in the game industry and he tries to explain why games require so much CPU horsepower, but I am not sure I buy into that hype.) I look at CPU benchmarks (PassMark) for comparison purposes, but I am not exactly sure their multi-threaded tests are comparable to how I will use the chip. I was hoping someone here in this forum would have used these chips objectively (non-fanboy) and had some data to share.

I am certainly not an expert on CPU architecture; all I know is what I empirically observe on the CPUs I have to use. In the example I used above, while using all 16 threads for a single simulation with my i9 chip is not efficient, I can split the CPU in two and run two simultaneous simulations, segregating the threads between the two sims using the affinity modes in Windows. That is not a bad compromise. I also do this on two other systems with two physical CPU chips (two NUMA nodes) and split sims across affinity and nodes. One of these systems runs 4 sims simultaneously.
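The split described above comes down to giving each run its own affinity bitmask, where bit i stands for logical CPU i. Below is a minimal sketch of that arithmetic. It assumes contiguous blocks of logical CPU numbers make sensible groups (worth verifying on the actual machine, since numbering of hyperthread siblings varies), and `sim.exe` is only a placeholder name. The hex masks are in the form accepted by the Windows `start /affinity` command.

```python
def affinity_masks(n_logical, n_groups):
    """Split logical CPUs 0..n_logical-1 into contiguous groups and
    return one bitmask per group (bit i set = logical CPU i allowed)."""
    if n_groups <= 0 or n_logical % n_groups != 0:
        raise ValueError("n_logical must divide evenly into n_groups")
    size = n_logical // n_groups
    masks = []
    for g in range(n_groups):
        mask = 0
        for cpu in range(g * size, (g + 1) * size):
            mask |= 1 << cpu
        masks.append(mask)
    return masks


if __name__ == "__main__":
    # 16 logical CPUs split into two runs of 8, as on the i9-9900K.
    # "sim.exe" is a hypothetical executable name.
    for i, mask in enumerate(affinity_masks(16, 2)):
        print(f"run {i}: start /affinity {mask:X} sim.exe")
```

The same function covers the other layouts mentioned in this thread, for example `affinity_masks(24, 4)` for four runs of 6 logical CPUs each.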

Anyway, I will probably have to find other sources for more information. I am wary of the AMD forums as those tend to be biased.

Again, thanks,

Rod

Re: OpenMP and AMD Ryzen CPU

I thought I would update this thread for those who may be interested.

I built a new system with the AMD Ryzen 9 3900X chip. When testing this system, I discovered that of the 24 possible threads, my code can only utilize 6 threads efficiently. Any more than 6 threads and the simulation slows down. I suspect that there are 4 pipelines to the memory and thus memory fetches across those pipelines are inefficient (here I am speaking without knowing the exact architecture or the appropriate terminology; I am not an expert on this).
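For anyone wanting to repeat this kind of test, one way to find the thread count where a program stops scaling is to time the same run at several values of the standard `OMP_NUM_THREADS` environment variable. The sketch below is not the procedure used above. It assumes the program honors `OMP_NUM_THREADS` (that is, it does not hard-code its thread count), and the command shown is a do-nothing placeholder to be replaced with the real simulation.

```python
import os
import subprocess
import sys
import time


def time_thread_counts(cmd, thread_counts):
    """Run cmd once per thread count with OMP_NUM_THREADS set and
    return {thread_count: wall_seconds}."""
    results = {}
    for n in thread_counts:
        # Copy the current environment and override only the thread count.
        env = dict(os.environ, OMP_NUM_THREADS=str(n))
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True)
        results[n] = time.perf_counter() - start
    return results


def fastest(results):
    """Return the thread count with the smallest wall time."""
    return min(results, key=results.get)


if __name__ == "__main__":
    # Placeholder command: replace with the real simulation executable.
    cmd = [sys.executable, "-c", "pass"]
    timings = time_thread_counts(cmd, [1, 2, 4, 6, 8, 12])
    for n, secs in sorted(timings.items()):
        print(f"{n:2d} threads: {secs:.2f} s")
    print("fastest:", fastest(timings))
```

A single run per thread count is noisy; repeating each measurement and keeping the best or median time gives a more trustworthy curve.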

Bottom line, this chip was not what I had hoped for in terms of performance running my simulation; perhaps the Threadripper chips are what I really need. I am still able to use this system, as I have been running 4 simultaneous simulations on it quite well. Six threads on the 3900X do 100 iterations in 95 seconds, while 8 threads on my i9-9900K system do 100 iterations in 82 seconds, so the 3900X is a little slower on comparable simulations. However, I can do 4 simultaneous simulations on the 3900X and only two on the i9. Shrug...
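As a rough check on those numbers, the combined throughput of each machine can be estimated from the single-run timings. This is only a sketch under a strong assumption: that every simultaneous run keeps the same wall time as a run on its own, which was not measured here and will not hold exactly if the runs compete for memory bandwidth.

```python
def aggregate_rate(n_runs, iterations, seconds):
    """Iterations per second summed over n_runs concurrent runs,
    ASSUMING each run keeps its standalone wall time."""
    return n_runs * iterations / seconds


# Timings quoted in the post: 100 iterations per run.
ryzen = aggregate_rate(4, 100, 95.0)  # 3900X: 4 runs of 6 threads each
intel = aggregate_rate(2, 100, 82.0)  # i9-9900K: 2 runs of 8 threads each

print(f"3900X:    {ryzen:.2f} iterations/s")
print(f"i9-9900K: {intel:.2f} iterations/s")
print(f"ratio:    {ryzen / intel:.2f}")
```

Under that assumption the 3900X comes out at roughly 1.7 times the total throughput of the i9, even though each individual run is slower.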

Rod