From mboxrd@z Thu Jan 1 00:00:00 1970
From: "prop_design at protonmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug fortran/53957] Polyhedron 11 benchmark: MP_PROP_DESIGN twice as long as other compiler
Date: Wed, 29 Jul 2020 22:25:21 +0000
X-Bugzilla-Component: fortran
X-Bugzilla-Version: 4.8.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Status: NEW

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53957

--- Comment #25 from Anthony ---
(In reply to Anthony from comment #24)
> (In reply to rguenther@suse.de from comment #23)
> > On Sun, 28 Jun 2020, prop_design at protonmail dot com wrote:
> >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53957
> > >
> > > --- Comment #22 from Anthony ---
> > > (In reply to Thomas Koenig from comment #21)
> > > > Another question: Is there anything left to be done with the
> > > > vectorizer, or could we remove that dependency?
> > >
> > > thanks for looking into this again for me. i'm surprised it worked the same on
> > > Linux, but knowing that, at least helps debug this issue some more. I'm not
> > > sure about the vectorizer question, maybe that question was intended for
> > > someone else. the runtimes seem good as is though. i doubt the
> > > auto-parallelization will add much speed. but it's an interesting feature that
> > > i've always hoped would work. i've never got it to work though. the only code
> > > that did actually implement something was Intel Fortran. it implemented one
> > > trivial loop, but it slowed the code down instead of speeding it up. the output
> > > from gfortran shows more loops it wants to run in parallel. they aren't
> > > important ones. but something would be better than nothing. if it slowed the
> > > code down, i would just not use it.
> >
> > GCC adds runtime checks for a minimal number of iterations before
> > dispatching to the parallelized code - I guess we simply never hit
> > the threshold. This is configurable via --param parloops-min-per-thread,
> > the default is 100, the default number of threads is determined the same
> > as for OpenMP so you can probably tune that via OMP_NUM_THREADS.
>
> thanks for that tip. i tried changing the parloops parameters but no luck.
> the only difference was the max thread use went from 2 to 3. core use was
> the same.
>
> i added the following and some variations of these:
>
> --param parloops-min-per-thread=2 (the default was 100 like you said)
> --param parloops-chunk-size=1 (the default was zero so i removed this
> parameter later)
> --param parloops-schedule=auto (tried all options except
> guided, the default is static)
>
> i was able to check that they were set via:
>
> --help=param -Q
>
> some other things i tried were adding -mthreads and removing -static. but so
> far no luck. i also tried using -mthreads instead of -pthread.
>
> i should make clear i'm testing PROP_DESIGN_MAPS, not MP_PROP_DESIGN.
> MP_PROP_DESIGN is ancient and the added benchmarking loops were messing with
> the ability of the optimizer to auto-parallelize (in the past at least).

I did more testing and the added options actually slow the code way down.
however, it is still only using one core. from what i can tell, if i set
OMP_PLACES it doesn't seem like it's working. i saw a thread from years ago
where someone had the same problem. i think OMP_PLACES might be working on
linux but not on windows. that's what the thread i found was saying. don't
really know. but i've exhausted all the possibilities at this point. the only
thing i know for sure is i can't get it to use anything more than one core.

--- Comment #27 from Anthony ---
so after trying a bunch of things, i think the final problem may be this. i get
the following result when i try to set thread affinity:

set GOMP_CPU_AFFINITY="0 1"

gives the following feedback at run time:

libgomp: Affinity not supported on this configuration

i have to close the command prompt window to stop the program. the program
doesn't run properly if i try to set thread affinity.

so this still makes me think it might work on linux and not windows 10, but i
have no way to test that.

the extra threads that auto-parallelization creates will only go to one core,
on my machine at least.
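[Editor's note: pulling the scattered flags and environment variables of this thread together, the build and run sequence under discussion can be sketched roughly as below. This is a hypothetical reconstruction: the source file name, thread count, and CPU list are placeholders, not values taken from the actual benchmark setup.]

```shell
# Hypothetical build: gfortran auto-parallelization with the parloops
# tuning parameters tried in comment #25 (source file name is a placeholder).
gfortran -O3 -pthread -ftree-parallelize-loops=4 \
  --param parloops-min-per-thread=2 \
  --param parloops-schedule=auto \
  prop_design_maps.f90 -o prop_design_maps

# Runtime settings: the thread count is taken from the OpenMP environment.
# GOMP_CPU_AFFINITY is the setting that fails on this Windows toolchain
# with "libgomp: Affinity not supported on this configuration"; it is
# expected to work with Linux builds of libgomp.
export OMP_NUM_THREADS=4
export GOMP_CPU_AFFINITY="0 1 2 3"
./prop_design_maps
```

The `--param` values can be confirmed the same way as in comment #25, by passing `--help=param -Q` to gfortran and checking the reported settings.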