Topic: coarrays
Will be coarrays for multiple images available?
Petr
For discussions of all Approximatrix products and related software
You are not logged in. Please login or register.
Approximatrix Forums → User Support → coarrays
Will be coarrays for multiple images available?
Petr
Our compiler relies on OpenCoarrays, which does not support multiple images on Windows using Microsoft's MPI implementation. We are working on a possible solution internally, but we do not have any timeline available as to when it might be ready.
I really enjoy a new possibility to use coarrays. Thanks much. Do you plan extend this feature for real distributed computing?
pm1a
Our Coarray library is a single-system implementation. It relies heavily on Windows Event and Mutex objects. To my knowledge, these can't scale across multiple systems. For scaling across multiple systems, one might be better off using OpenCoarrays with our product on macOS or Linux. However, with 16-core AMD chips that are available for consumer desktops at this point, our implementation should perform adequately.
I just installed version 3.7 build 3131 and get linker error message:
==============================================================================
Generating Makefile... Okay
==============================================================================
Compiling ..\spec\pmatools.f90
Generating vyvoj.exe
* Complete *
==============================================================================
Generating Makefile... Okay
==============================================================================
Generating target.exe
c:/program files (x86)/simply fortran 3/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib\libwca.a(wca_images.o):wca_images.c:(.text+0x61): undefined reference to `wca_get_STAT_STOPPED_IMAGE'
c:/program files (x86)/simply fortran 3/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib\libwca.a(wca_images.o):wca_images.c:(.text+0x93): undefined reference to `wca_get_STAT_FAILED_IMAGE'
c:/program files (x86)/simply fortran 3/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib\libwca.a(wca_images.o):wca_images.c:(.text+0x9a): undefined reference to `wca_get_STAT_OK'
c:/program files (x86)/simply fortran 3/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib\libwca.a(wca_images.o):wca_images.c:(.text+0x1b1): undefined reference to `wca_get_STAT_FAILED_IMAGE'
c:/program files (x86)/simply fortran 3/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib\libwca.a(wca_images.o):wca_images.c:(.text+0x813): undefined reference to `wca_get_STAT_FAILED_IMAGE'
c:/program files (x86)/simply fortran 3/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib\libwca.a(wca_images.o):wca_images.c:(.text+0x853): undefined reference to `wca_get_STAT_STOPPED_IMAGE'
c:/program files (x86)/simply fortran 3/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib\libwca.a(wca_images.o):wca_images.c:(.text+0x7aa): undefined reference to `wca_get_STAT_FAILED_IMAGE'
collect2.exe: error: ld returned 1 exit status
Error: Last command making (target.exe) returned a bad status
Error: Make execution terminated
* Failed *
Could you suggest some solution?
Than you Petr
Petr,
Looks like a new file in the Windows Coarray Library was not compiled properly. I should have a fix for this today.
EDIT: The actual problem is caused by a Fortran file in the library. Because Fortran is case-insensitive, the compiler defines the functions in the library as wca_get_stat_failed_image. The C routines, however, are not case-insensitive, and are looking for _wca_get_STAT_FAILED_IMAGE. It's an odd, little problem, and I'm not sure how it snuck through.
Petr,
Build 3133 should fix this issue. Let me know if you encounter any other problems.
Code compiles, but change behavior.
Petr,
Are you calling num_images by any chance? Could you include that particular line? I'm seeing some possible problems when asking for the number of failed images.
Yes I do. Source code follows
sorry, I'm not able send you a source code. I'm getting some strange error message
Warning! The following errors must be corrected before your message can be posted:
[i ]d within itself, this is not allowed
module typy
integer, parameter, public :: ikind = selected_int_kind(16)
integer, parameter, public :: rkind = selected_real_kind(25,99)
end module typy
module pocty
use typy
implicit none
!> typ pro stabilizovane scitani
type, public :: sumator
integer(kind=ikind), private :: depth = 0
real(kind=rkind), dimension(:), allocatable, private :: s
contains
procedure clear
procedure add
procedure sum
procedure setdepth
end type sumator
public :: work
public :: f
public :: integral
contains
subroutine clear(s)
implicit none
class(sumator), intent(in out) :: s
integer(kind=ikind) :: i
do i=1, s%depth
s%s(i) = 0
end do
end subroutine clear
subroutine setdepth(s,d)
implicit none
class(sumator), intent(in out) :: s
integer(kind=ikind), intent(in) :: d
integer(kind=ikind) :: i
!print *,"setdepth1"
if (allocated(s%s)) deallocate(s%s)
!print *,"setdepth2"
s%depth = d
allocate(s%s(1:d))
!print *,"setdepth3"
do i=1, s%depth
!print *,i
s%s(i) = 0
!print *," po",i
end do
!print *,"setdepth4"
end subroutine setdepth
function sum(s) result(y)
implicit none
class(sumator), intent(in) :: s
real(kind=rkind) :: y
integer(kind=ikind) :: i
y = 0
do i = s%depth, 1, -1
y = y + s%s(i)
end do
end function sum
subroutine add(s,v)
implicit none
class(sumator), intent(in out) :: s
real(kind=rkind), intent(in) :: v
integer(kind=ikind) :: i
s%s(s%depth) = s%s(s%depth) + v
do i = s%depth, 2, -1
if ( s%s(i) > s%s(i-1) ) then
s%s(i-1) = s%s(i-1) + s%s(i)
s%s(i) = 0
else
exit
end if
end do
end subroutine add
subroutine work(meze,res)
use typy
implicit none
real(kind=rkind), dimension(2), intent(in) :: meze
real(kind=rkind), intent(out) :: res
!print *, "work zacina", meze
res = integral(f,meze(1), meze(2), 1.0e-30_rkind)
!print *, "work konci"
end subroutine work
!> Vypocet pomoci Rombergovy metody
recursive function integral(f,a,b,tol) result(y)
use typy
implicit none
interface
function f(x) result(y)
use typy
implicit none
real(kind=rkind), intent(in) :: x
real(kind=rkind) :: y
end function f
end interface
integer, parameter :: rsize = 20
real(kind=rkind), intent(in) :: a,b,tol
real(kind=rkind) :: y, y1, x, wrk
real(kind=rkind) :: h
integer(kind=ikind) :: n, i, cnt
type(sumator) :: sy
real(kind=rkind), dimension(1:rsize) :: yy
integer :: size
!print *,"a"
h = b-a
!print *,"b"
y = (f(a)+f(b))/2
!print *,"c" nasledujici dela pro 32b problem
call sy%setdepth(20_ikind)
!print *,"c01"
call sy%clear
call sy%add(y)
!print *,"c1"
n = 1
size = 1
yy(1) = y*h
cnt = 1
!print *,"d"
do
n = 2*n
h = h/2
do i=1,n,2
x = a + i*h
y =f(x)
call sy%add(y)
end do
y = sy%sum() *h
! ted zaradime
wrk = 4
y1 = yy(1)
if (cnt < rsize) cnt = cnt + 1
do i = 1,cnt-1
yy(i) = y
y = yy(i) + (yy(i)-y1)/(wrk - 1)
wrk = 4*wrk
y1 = yy(i+1)
end do
yy(cnt) = y
if ( abs(y-y1) < 1.0e-30 ) exit
end do
!print *,"e"
y = yy(cnt)
end function integral
function f(x) result(y)
use typy
implicit none
real(kind=rkind), intent(in) :: x
real(kind=rkind) :: y
y = 4/(1+x*x)
end function f
end module pocty
program copi
use typy
use pocty
implicit none
integer :: myid
integer :: nimages
integer :: pocet_intervalu
real(kind=rkind), dimension(1:2), codimension[* ] :: lokmeze
real(kind=rkind), codimension[* ] :: lokresults
logical, codimension[* ] :: finished ! vyznamna je jen hodnota v jednicce
real(kind=rkind), dimension(:,:), allocatable :: meze
real(kind=rkind) :: mz, pi
integer(kind=ikind) :: i
integer(kind=ikind) :: done
integer, dimension(1:8) :: t1, t2
real(kind=rkind) :: eltime, wrksec, ll(2),lr
myid = this_image()
nimages = num_images()
finished = .false.
print *,"jsem ", myid, " z ", nimages
sync all
if (myid == 1) then
pi = 0
print *, "zaciname, mam celkem ", nimages, " vlaken"
print *, "Zadej pocet intervalu"
read *, pocet_intervalu
allocate(meze(1:pocet_intervalu,1:2))
! ted je naplnim
meze(1,1) = 0
meze(pocet_intervalu,2) = 1
do i = 1, pocet_intervalu-1
mz = real(i,rkind)/pocet_intervalu
meze(i,2) = mz
meze(i+1,1) = mz
end do
done = 0
finished = .false.
! zacnu merit cas
call date_and_time(values=t1)
print *, t1
end if
do
sync all
! pridelim praci
if (myid == 1) then
! print *, 0
do i = 1, nimages
print *, i
done = done + 1
if (done <= pocet_intervalu) then
lokmeze(1)[i ] = meze(done,1)
lokmeze(2)[i ] = meze(done,2)
! print *, lokmeze(:)[i ], done
else
! print *, done
lokmeze(1)[i ] = 0.0_rkind
lokmeze(2)[i ] = 0.0_rkind
finished = .true.
end if
end do
end if
!print *, "jeste ano", myid
sync all
!print *, lokmeze(1)[myid], lokmeze(2)[myid], lokresults[myid]
! udelam praci
ll(1) = lokmeze(1)[myid]
ll(2) = lokmeze(2)[myid]
!print *,"vlakno ",myid, " zacina"
call work(ll, lr)
!print *,"vlakno ",myid, " konci"
lokresults[myid] = lr
lokmeze(:)[myid] = 0.0_rkind
!print *, lokmeze(:)[myid], myid
! vezmu si vysledky
sync all ! everything should be sybchronized
!print *, myid, finished[1], lokresults[myid]
if ( myid == 1 ) then
do i = 1, nimages
pi = pi + lokresults[i ]
lokresults[i ]=0.0_rkind
end do
end if
! zkontroluji konec
sync all
if ( finished[1] ) exit
end do
sync all
if ( myid == 1 ) then
print *,"Pi=", pi
call date_and_time(values=t2)
print *, t2
eltime = (t2(8)-t1(8))/1000.0_rkind +t2(7)-t1(7) + 60.0_rkind*(t2(6)-t1(6)) + 3600.0_rkind*(t2(5)-t1(5))
eltime = eltime + 86400.0_rkind*(t2(3)-t1(3))
print "(a,f0.7)", "uplynuly cas=", eltime
end if
sync all
end program copi
finally I succeed, interesting is only main program.
It works OK with older version and with OpenCoarrays too.
Petr,
The forum software sees the text [ i ] as markup indicating italics, and it is complaining there is no closing italics tag.
Regardless, does this not work with version 3.7? I'm not sure I quite understand. You can also email the full code to support@approximatrix.com, and I'll be happy to see what's going wrong if the library is not working as expected. There may be a chance that the status messages, recently added, are returning incorrect values to internal calls.
Petr,
The internal image status indicators were experiencing an off-by-one error internally, causing some images to be reported as failed. We've fixed that issue, and we're currently hunting down a sync bug that is still present.
The previous library was working because every thread always reported a status of OK rather than waiting or failed. Trying to improve it apparently broke some things.
I have some other problem with coarrays. The folloving code produce error message, but with OpenCoarrays works OK.
Program:
module typy
integer, parameter :: rkind = selected_real_kind(16,99)
type, public :: LocProblem
end type LocProblem
type, public :: GlobProblem
real(rkind),dimension(:), allocatable :: u
real(rkind) :: h = 0
end type GlobProblem
end module typy
program ddc
use typy
implicit none
type(GlobProblem), codimension[ * ] :: GP
integer :: my_id
integer :: n_images
integer :: n
n_images = num_images()
my_id = this_image()
if (my_id == 1) then
print *,"n="
read *, n
allocate(GP%u(0:n))
GP%h = 1.0_rkind/n
GP%u = 0
GP%u(0) = 1
GP%u(n) = exp(1.0_rkind)
end if
sync all
print *, my_id, GP[1]%u(my_id-1)
sync all
print *,"koncim ", this_image() ," z ",num_images()
end program ddc
Win output:
Approximatrix Windows Coarray Library
Copyright 2016-2019 Approximatrix, LLC
=======================================
Library Build: Dec 10 2019
Images: 4
=======================================
n=
346
Windows Coarray ERROR ==> Unknown type in caf_get_by_ref: O
image=1
Windows Coarray ERROR ==> Unknown type in caf_get_by_ref: O
image=4
Windows Coarray ERROR ==> Unknown type in caf_get_by_ref: O
image=2
Windows Coarray ERROR ==> Unknown type in caf_get_by_ref: O
image=3
Open Coarrays output:
n=
1 1.00000000000000000000
2 0.00000000000000000000
3 0.00000000000000000000
4 0.00000000000000000000
5 0.00000000000000000000
koncim 4 z 5
koncim 1 z 5
koncim 2 z 5
koncim 3 z 5
koncim 5 z 5
It looks like some support for derived types may have broken. I'll see what's wrong this week.
Looking into the issue, it appears that the compiler documentation for the ABI for the internal "byref" functions for coarrays is incorrect in multiple places. Ours was designed based on the incorrect documentation. We're fixing our library now.
Thanks.
I wanted to provide a quick update. This bug actually revealed some problems with our implementation and dealing with allocatable coarrays.
The first problem encountered was an inconsistency in the data describing the size of arrays. Our implementation was not properly requesting remote array sizes from other images prior to requesting the data. With static coarrays, there is no problem since all images know the sizes of all coarrays. However, with allocatable coarrays, the images need to request sizes from each other, which they were not doing. This oversight led to requests that had incorrect memory addresses when making the request. It should be fixed now, and the test program you've provided no longer errors out or crashes. The test program still doesn't execute properly, however.
The second problem that is still present appears to be how we're handling memory allocation for allocated coarrays. I believe right now that we might be leaking memory quite significantly, and there's a chance that we're destroying data in the process of registering coarrays. We're trying to sort out this bug right now.
I upgraded to 3.8. I am sorry, I still have some problems with coarrays.
Can you post another problematic example and the output? Running your example code on 3.8, I get:
Approximatrix Windows Coarray Library
Copyright 2016-2019 Approximatrix, LLC
=======================================
Library Build: Jan 13 2020
Images: 4
=======================================
n=
100
1 1.00000000000000000000
2 0.00000000000000000000
3 0.00000000000000000000
4 0.00000000000000000000
koncim 1 z 4
koncim 4 z 4
koncim 3 z 4
koncim 2 z 4
There's a good chance that problems may exist with allocated coarrays (or plenty of other features) that we've overlooked.
It gets a bit longer, is it possible to send it to you by email?
Petr
Approximatrix Forums → User Support → coarrays
Powered by PunBB, supported by Informer Technologies, Inc.