Topic: coarrays

Will be coarrays for multiple images available?
  Petr

Re: coarrays

Our compiler relies on OpenCoarrays, which does not support multiple images on Windows using Microsoft's MPI implementation.  We are working on a possible solution internally, but we do not have any timeline available as to when it might be ready.

Jeff Armstrong
Approximatrix, LLC

Re: coarrays

I really enjoy a new possibility to use coarrays. Thanks much. Do you plan extend this feature for real distributed computing?
pm1a

Re: coarrays

Our Coarray library is a single-system implementation.  It relies heavily on Windows Event and Mutex objects. To my knowledge, these can't scale across multiple systems.  For scaling across multiple systems, one might be better off using OpenCoarrays with our product on macOS or Linux.  However, with 16-core AMD chips that are available for consumer desktops at this point, our implementation should perform adequately.

Jeff Armstrong
Approximatrix, LLC

Re: coarrays

I just installed version 3.7 build 3131 and get linker error message:
==============================================================================
Generating Makefile... Okay
==============================================================================
Compiling ..\spec\pmatools.f90
Generating vyvoj.exe

* Complete *
==============================================================================
Generating Makefile... Okay
==============================================================================
Generating target.exe
c:/program files (x86)/simply fortran 3/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib\libwca.a(wca_images.o):wca_images.c:(.text+0x61): undefined reference to `wca_get_STAT_STOPPED_IMAGE'
c:/program files (x86)/simply fortran 3/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib\libwca.a(wca_images.o):wca_images.c:(.text+0x93): undefined reference to `wca_get_STAT_FAILED_IMAGE'
c:/program files (x86)/simply fortran 3/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib\libwca.a(wca_images.o):wca_images.c:(.text+0x9a): undefined reference to `wca_get_STAT_OK'
c:/program files (x86)/simply fortran 3/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib\libwca.a(wca_images.o):wca_images.c:(.text+0x1b1): undefined reference to `wca_get_STAT_FAILED_IMAGE'
c:/program files (x86)/simply fortran 3/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib\libwca.a(wca_images.o):wca_images.c:(.text+0x813): undefined reference to `wca_get_STAT_FAILED_IMAGE'
c:/program files (x86)/simply fortran 3/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib\libwca.a(wca_images.o):wca_images.c:(.text+0x853): undefined reference to `wca_get_STAT_STOPPED_IMAGE'
c:/program files (x86)/simply fortran 3/mingw-w64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/../../../../x86_64-w64-mingw32/lib/../lib\libwca.a(wca_images.o):wca_images.c:(.text+0x7aa): undefined reference to `wca_get_STAT_FAILED_IMAGE'
collect2.exe: error: ld returned 1 exit status
Error: Last command making (target.exe) returned a bad status
Error: Make execution terminated

* Failed *

Could you suggest some solution?
Than you  Petr

Re: coarrays

Petr,

Looks like a new file in the Windows Coarray Library was not compiled properly.  I should have a fix for this today.

EDIT: The actual problem is caused by a Fortran file in the library.  Because Fortran is case-insensitive, the compiler defines the functions in the library as wca_get_stat_failed_image.  The C routines, however, are not case-insensitive, and are looking for _wca_get_STAT_FAILED_IMAGE.  It's an odd, little problem, and I'm not sure how it snuck through.

Jeff Armstrong
Approximatrix, LLC

Re: coarrays

Petr,

Build 3133 should fix this issue.  Let me know if you encounter any other problems.

Jeff Armstrong
Approximatrix, LLC

Re: coarrays

Code compiles, but change behavior.

Re: coarrays

Petr,

Are you calling num_images by any chance?  Could you include that particular line?  I'm seeing some possible problems when asking for the number of failed images.

Jeff Armstrong
Approximatrix, LLC

Re: coarrays

Yes I do. Source code follows

Re: coarrays

sorry, I'm not able send you  a source code. I'm getting some strange error message

Re: coarrays

Warning! The following errors must be corrected before your message can be posted:
[i ]d within itself, this is not allowed

Re: coarrays

module typy
  integer, parameter, public :: ikind = selected_int_kind(16)
  integer, parameter, public :: rkind = selected_real_kind(25,99)
end module typy

module pocty
  use typy
  implicit none

  !> typ pro stabilizovane scitani
  type, public :: sumator
    integer(kind=ikind), private :: depth = 0
    real(kind=rkind), dimension(:), allocatable, private :: s
  contains
    procedure clear
    procedure add
    procedure sum
    procedure setdepth
  end type sumator

  public :: work
  public :: f
  public :: integral
contains
  subroutine clear(s)
    implicit none
    class(sumator), intent(in out) :: s
    integer(kind=ikind) :: i
    do i=1, s%depth
      s%s(i) = 0
    end do
  end subroutine clear

  subroutine setdepth(s,d)
    implicit none
    class(sumator), intent(in out) :: s
    integer(kind=ikind), intent(in) :: d
    integer(kind=ikind) :: i
    !print *,"setdepth1"
    if  (allocated(s%s)) deallocate(s%s)
    !print *,"setdepth2"
    s%depth = d
    allocate(s%s(1:d))
    !print *,"setdepth3"
    do i=1, s%depth
        !print *,i
      s%s(i) = 0
      !print *," po",i
    end do
    !print *,"setdepth4"
  end subroutine setdepth

  function sum(s) result(y)
    implicit none
    class(sumator), intent(in) :: s
    real(kind=rkind) :: y
    integer(kind=ikind) :: i

    y = 0
    do i = s%depth, 1, -1
      y = y + s%s(i)
    end do
  end function sum

  subroutine add(s,v)
    implicit none
    class(sumator), intent(in out) :: s
    real(kind=rkind), intent(in)   :: v

    integer(kind=ikind) :: i

    s%s(s%depth) = s%s(s%depth) + v
    do i = s%depth, 2, -1
      if ( s%s(i) > s%s(i-1)  ) then
        s%s(i-1) = s%s(i-1) + s%s(i)
        s%s(i) = 0
      else
        exit
      end if
    end do
  end subroutine add

  subroutine work(meze,res)
    use typy
    implicit none
    real(kind=rkind), dimension(2), intent(in) :: meze
    real(kind=rkind), intent(out) :: res
    !print *, "work zacina", meze
    res = integral(f,meze(1), meze(2), 1.0e-30_rkind)
    !print *, "work konci"
  end subroutine work

  !> Vypocet pomoci Rombergovy metody
  recursive function integral(f,a,b,tol) result(y)
    use typy
    implicit none
    interface
      function f(x) result(y)
        use typy
        implicit none
        real(kind=rkind), intent(in)  :: x
        real(kind=rkind)              :: y
      end function f
    end interface
    integer, parameter :: rsize = 20
    real(kind=rkind), intent(in) :: a,b,tol
    real(kind=rkind) :: y, y1, x, wrk
    real(kind=rkind) :: h
    integer(kind=ikind) :: n, i, cnt
    type(sumator) :: sy
    real(kind=rkind), dimension(1:rsize) :: yy
    integer :: size

    !print *,"a"
    h = b-a
    !print *,"b"
    y = (f(a)+f(b))/2
    !print *,"c" nasledujici dela pro 32b problem
    call sy%setdepth(20_ikind)
    !print *,"c01"
    call sy%clear
    call sy%add(y)
    !print *,"c1"
    n = 1
    size = 1
    yy(1) = y*h
    cnt = 1
    !print *,"d"
    do
      n = 2*n
      h = h/2
      do i=1,n,2
        x = a + i*h
        y =f(x)
        call sy%add(y)
      end do
      y = sy%sum() *h
      ! ted zaradime
      wrk = 4
      y1 = yy(1)
      if (cnt  < rsize) cnt  = cnt  + 1
      do i = 1,cnt-1
        yy(i) = y
        y = yy(i) + (yy(i)-y1)/(wrk - 1)
        wrk = 4*wrk
        y1 = yy(i+1)
      end do
      yy(cnt) = y
      if ( abs(y-y1) < 1.0e-30 )  exit
    end do
    !print *,"e"
    y = yy(cnt)
  end function integral

  function f(x) result(y)
    use typy
    implicit none
    real(kind=rkind), intent(in)  :: x
    real(kind=rkind)              :: y
    y = 4/(1+x*x)
  end function f
end module pocty


program copi
  use typy
  use pocty
  implicit none
  integer :: myid
  integer :: nimages
  integer :: pocet_intervalu
  real(kind=rkind), dimension(1:2), codimension[* ] :: lokmeze
  real(kind=rkind), codimension[* ]                 :: lokresults
  logical, codimension[* ]                          :: finished ! vyznamna je jen hodnota v jednicce
  real(kind=rkind), dimension(:,:), allocatable    :: meze
  real(kind=rkind) :: mz, pi
  integer(kind=ikind) :: i
  integer(kind=ikind) :: done
  integer, dimension(1:8) :: t1, t2
  real(kind=rkind) :: eltime, wrksec, ll(2),lr


  myid = this_image()
  nimages = num_images()
  finished = .false.
  print *,"jsem ", myid, " z ", nimages
  sync all
  if (myid == 1) then
    pi = 0
    print *, "zaciname, mam celkem ", nimages, " vlaken"
    print *, "Zadej pocet intervalu"
    read *, pocet_intervalu
    allocate(meze(1:pocet_intervalu,1:2))
    ! ted je naplnim
    meze(1,1)               = 0
    meze(pocet_intervalu,2) = 1
    do i = 1, pocet_intervalu-1
      mz = real(i,rkind)/pocet_intervalu
      meze(i,2)   = mz
      meze(i+1,1) = mz
    end do
    done = 0
    finished = .false.
    ! zacnu merit cas
    call date_and_time(values=t1)
    print *, t1
  end if
  do
  sync all
  ! pridelim praci
  if (myid == 1) then
  !  print *, 0
    do i = 1, nimages
      print *, i
      done = done + 1
      if (done <= pocet_intervalu) then
        lokmeze(1)[i ] = meze(done,1)
        lokmeze(2)[i ] = meze(done,2)
  !      print *, lokmeze(:)[i ], done
      else
  !      print *, done
        lokmeze(1)[i ] = 0.0_rkind
        lokmeze(2)[i ] = 0.0_rkind
        finished = .true.
      end if
    end do
   end if
   !print *, "jeste ano", myid
   sync all
   !print *, lokmeze(1)[myid], lokmeze(2)[myid], lokresults[myid]
  ! udelam praci
  ll(1) = lokmeze(1)[myid]
  ll(2) = lokmeze(2)[myid]
  !print *,"vlakno ",myid, " zacina"
  call work(ll, lr)
  !print *,"vlakno ",myid, " konci"
  lokresults[myid] = lr
  lokmeze(:)[myid] = 0.0_rkind
  !print *, lokmeze(:)[myid], myid
  ! vezmu si vysledky
  sync all ! everything should be sybchronized
  !print *, myid, finished[1], lokresults[myid]
  if ( myid == 1 ) then
    do i = 1, nimages
      pi = pi + lokresults[i ]
      lokresults[i ]=0.0_rkind
    end do
  end if
  ! zkontroluji konec
  sync all
  if ( finished[1] ) exit

  end do
  sync all
  if ( myid == 1 ) then
    print *,"Pi=", pi
    call date_and_time(values=t2)
    print *, t2
    eltime = (t2(8)-t1(8))/1000.0_rkind +t2(7)-t1(7) + 60.0_rkind*(t2(6)-t1(6)) + 3600.0_rkind*(t2(5)-t1(5))
    eltime = eltime + 86400.0_rkind*(t2(3)-t1(3))
    print "(a,f0.7)", "uplynuly cas=", eltime
  end if
  sync all
end program copi

Re: coarrays

finally I succeed, interesting is only main program.

Re: coarrays

It works OK with older version and with OpenCoarrays too.

Re: coarrays

Petr,

The forum software sees the text [ i ] as markup indicating italics, and it is complaining there is no closing italics tag.

Regardless, does this not work with version 3.7?  I'm not sure I quite understand.  You can also email the full code to support@approximatrix.com, and I'll be happy to see what's going wrong if the library is not working as expected.  There may be a chance that the status messages, recently added, are returning incorrect values to internal calls.

Jeff Armstrong
Approximatrix, LLC

Re: coarrays

Petr,

The internal image status indicators were experiencing an off-by-one error internally, causing some images to be reported as failed.  We've fixed that issue, and we're currently hunting down a sync bug that is still present.

The previous library was working because every thread always reported a status of OK rather than waiting or failed. Trying to improve it apparently broke some things.

Jeff Armstrong
Approximatrix, LLC

Re: coarrays

I have some other problem with coarrays. The folloving code produce error message, but with OpenCoarrays works OK.

Program:

module typy
    integer, parameter :: rkind = selected_real_kind(16,99)

    type, public :: LocProblem
    end type LocProblem


    type, public :: GlobProblem
        real(rkind),dimension(:), allocatable :: u
        real(rkind) :: h = 0
    end type GlobProblem




end module typy





program ddc
    use typy
    implicit none
    type(GlobProblem), codimension[ * ] :: GP

    integer :: my_id
    integer :: n_images
    integer :: n

    n_images = num_images()
    my_id = this_image()
    if (my_id == 1) then
        print *,"n="
        read *, n
        allocate(GP%u(0:n))
        GP%h = 1.0_rkind/n
        GP%u = 0
        GP%u(0) = 1
        GP%u(n) = exp(1.0_rkind)
    end if
    sync all
    print *, my_id, GP[1]%u(my_id-1)

    sync all
print *,"koncim ", this_image() ," z ",num_images()
end program  ddc


Win output:

Approximatrix Windows Coarray Library
Copyright 2016-2019 Approximatrix, LLC
=======================================

Library Build: Dec 10 2019
Images:        4

=======================================

n=
346
Windows Coarray ERROR ==> Unknown type in caf_get_by_ref: O
                          image=1
Windows Coarray ERROR ==> Unknown type in caf_get_by_ref: O
                          image=4
Windows Coarray ERROR ==> Unknown type in caf_get_by_ref: O
                          image=2
Windows Coarray ERROR ==> Unknown type in caf_get_by_ref: O
                          image=3

Open Coarrays output:

n=
           1   1.00000000000000000000     
           2   0.00000000000000000000     
           3   0.00000000000000000000     
           4   0.00000000000000000000     
           5   0.00000000000000000000     
koncim            4  z            5
koncim            1  z            5
koncim            2  z            5
koncim            3  z            5
koncim            5  z            5

Re: coarrays

It looks like some support for derived types may have broken.  I'll see what's wrong this week.

Jeff Armstrong
Approximatrix, LLC

Re: coarrays

Looking into the issue, it appears that the compiler documentation for the ABI for the internal "byref" functions for coarrays is incorrect in multiple places.  Ours was designed based on the incorrect documentation.  We're fixing our library now.

Jeff Armstrong
Approximatrix, LLC

Re: coarrays

Thanks.

Re: coarrays

I wanted to provide a quick update.  This bug actually revealed some problems with our implementation and dealing with allocatable coarrays. 

The first problem encountered was an inconsistency in the data describing the size of arrays.  Our implementation was not properly requesting remote array sizes from other images prior to requesting the data.  With static coarrays, there is no problem since all images know the sizes of all coarrays.  However, with allocatable coarrays, the images need to request sizes from each other, which they were not doing. This oversight led to requests that had incorrect memory addresses when making the request.  It should be fixed now, and the test program you've provided no longer errors out or crashes.  The test program still doesn't execute properly, however.

The second problem that is still present appears to be how we're handling memory allocation for allocated coarrays.  I believe right now that we might be leaking memory quite significantly, and there's a chance that we're destroying data in the process of registering coarrays.  We're trying to sort out this bug right now.

Jeff Armstrong
Approximatrix, LLC

Re: coarrays

I upgraded to 3.8. I am sorry, I still have some problems with coarrays.

Re: coarrays

Can you post another problematic example and the output?  Running your example code on 3.8, I get:

 Approximatrix Windows Coarray Library
 Copyright 2016-2019 Approximatrix, LLC
 =======================================

 Library Build: Jan 13 2020
 Images:        4

 =======================================

 n=
100
           1   1.00000000000000000000      
           2   0.00000000000000000000      
           3   0.00000000000000000000      
           4   0.00000000000000000000      
 koncim            1  z            4
 koncim            4  z            4
 koncim            3  z            4
 koncim            2  z            4

There's a good chance that problems may exist with allocated coarrays (or plenty of other features) that we've overlooked.

Jeff Armstrong
Approximatrix, LLC

Re: coarrays

It gets a bit longer, is it possible to send it to you by email?
  Petr