From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 800443858D33; Mon, 23 Jan 2023 11:37:08 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 800443858D33
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1674473828;
	bh=xE9vZUNM+PcRwZ58L2nxyUVcz4eMIL1zMp2QHzNwveQ=;
	h=From:To:Subject:Date:From;
	b=h8QlOKWUAeAiJHbpgJhKXav8yFAAphbVoEM1ti/Wdh3rm6FfcMiARcp5dLEV5+Iua
	 U5njQeE45fZSiFTNmfwzPgSmL+xj6lOF7PbxNTqFGWub8w+560EvGrSJ/cTQ4pzJei
	 3GGkR4NVOxyyPgH6odvZYU8+uDR8f5qxg4b/wnUw=
From: "dewhurst@mpi-halle.mpg.de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libgomp/108494] New: Slow thread creation with nested loops in
 GFortran
Date: Mon, 23 Jan 2023 11:37:07 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: libgomp
X-Bugzilla-Version: unknown
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: dewhurst@mpi-halle.mpg.de
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter cc target_milestone
Message-ID: <bug-108494-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108494

            Bug ID: 108494
           Summary: Slow thread creation with nested loops in GFortran
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dewhurst@mpi-halle.mpg.de
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

This is an issue with very slow thread creation for nested loops in code
compiled with GFortran, although I suspect it may be due to the libgomp
library.

Here is a simple example of the problem:

program test
implicit none
integer l
!$OMP PARALLEL DO &
!$OMP NUM_THREADS(1)
do l=1,1000
  call foo
end do
!$OMP END PARALLEL DO
end program

subroutine foo
implicit none
integer, parameter :: l=200,m=100,n=10
! number of threads
integer, parameter :: nthd=10
integer i,j
! automatic arrays
real(8) a(n,l),b(n,m),x(m)
a(:,:)=2.d0
b(:,:)=3.d0
do i=1,l
!$OMP PARALLEL DO DEFAULT(SHARED) &
!$OMP NUM_THREADS(nthd)
  do j=1,m
    x(j)=dot_product(a(:,i),b(:,j))
  end do
!$OMP END PARALLEL DO
end do
end subroutine

The wall-clock time is about 0.5 seconds when compiled with Intel or PGI
Fortran. However, for GFortran compiled with

gfortran -O3 -fopenmp test.f90

and OMP_NESTED set to true, the wall-clock time is about 70 seconds, or about
140 times slower. (The ‘dot_product’ can be removed from the loop; all the
time is taken by thread creation.)
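
One way to check the thread-creation claim in isolation is to time inner
parallel regions that do almost no work. The following is only a sketch and
not part of the original test case: the program name 'time_nested', the
variables 'nouter', 'ninner', 'tid' and 'nteam', and the reduced outer loop
count are all assumptions made for illustration. It relies on OMP_NESTED
being set to true in the environment, as above.

```fortran
! Hypothetical timing harness (not from the original report).
! Run with OMP_NESTED set to true, otherwise the inner teams have one thread.
program time_nested
use omp_lib
implicit none
! illustrative loop counts; nouter is smaller than in the example above
integer, parameter :: nouter=100,ninner=200
! number of threads requested for each inner region
integer, parameter :: nthd=10
integer l,i,tid,nteam
real(8) t0,t1
nteam=0
t0=omp_get_wtime()
!$OMP PARALLEL DO PRIVATE(i) &
!$OMP NUM_THREADS(1)
do l=1,nouter
  do i=1,ninner
! nearly empty inner region: the cost measured is mostly team start-up
!$OMP PARALLEL PRIVATE(tid) &
!$OMP NUM_THREADS(nthd)
    tid=omp_get_thread_num()
! record the inner team size to confirm that nesting is active
    if (tid == 0) nteam=omp_get_num_threads()
!$OMP END PARALLEL
  end do
end do
!$OMP END PARALLEL DO
t1=omp_get_wtime()
write(*,'(A,I6)') 'threads in last inner team: ',nteam
write(*,'(A,G14.6)') 'total wall-clock time (s):  ',t1-t0
write(*,'(A,G14.6)') 'time per inner region (s):  ', &
 (t1-t0)/dble(nouter*ninner)
end program
```

If the reported inner team size is 1, nesting was not active and the timing
says nothing about the problem described here.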

This only affects nested loops; if the OMP directives are removed from the
loop in the program part in the code above then GFortran is as fast as the
other compilers. I’ve tried several different versions of GFortran (from
7.5.0 to 12.1.0) on different Linux machines and it’s slow on all of them.

It may be a problem with libgomp. If I replace the libgomp library with the
one provided with the NVIDIA compiler (on our machine this is in the directory
nvhpcsdk/22.11/Linux_x86_64/22.11/compilers/lib/libgomp.so.1) then it’s as
fast as the others.

This has been reproduced by others, including on Windows; see here:
https://fortran-lang.discourse.group/t/slow-thread-creation-with-nested-loops-in-gfortran/5062