From mboxrd@z Thu Jan 1 00:00:00 1970
From: "prop_design at protonmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug fortran/53957] Polyhedron 11 benchmark: MP_PROP_DESIGN twice as long as other compiler
Date: Wed, 29 Jul 2020 22:25:21 +0000
X-Bugzilla-Component: fortran
X-Bugzilla-Version: 4.8.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Status: NEW

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53957

--- Comment #25 from Anthony ---
(In reply to Anthony from comment #24)
> (In reply to rguenther@suse.de from comment #23)
> > On Sun, 28 Jun 2020, prop_design at protonmail dot com wrote:
> >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53957
> > >
> > > --- Comment #22 from Anthony ---
> > > (In reply to Thomas Koenig from comment #21)
> > > > Another question: Is there anything left to be done with the
> > > > vectorizer, or could we remove that dependency?
> > >
> > > thanks for looking into this again for me. i'm surprised it worked the same on
> > > Linux, but knowing that, at least helps debug this issue some more. I'm not
> > > sure about the vectorizer question, maybe that question was intended for
> > > someone else. the runtimes seem good as is though. i doubt the
> > > auto-parallelization will add much speed. but it's an interesting feature that
> > > i've always hoped would work. i've never got it to work though. the only code
> > > that did actually implement something was Intel Fortran. it implemented one
> > > trivial loop, but it slowed the code down instead of speeding it up. the output
> > > from gfortran shows more loops it wants to run in parallel. they aren't
> > > important ones. but something would be better than nothing. if it slowed the
> > > code down, i would just not use it.
> >
> > GCC adds runtime checks for a minimal number of iterations before
> > dispatching to the parallelized code - I guess we simply never hit
> > the threshold. This is configurable via --param parloops-min-per-thread,
> > the default is 100, the default number of threads is determined the same
> > as for OpenMP so you can probably tune that via OMP_NUM_THREADS.
>
> thanks for that tip. i tried changing the parloops parameters but no luck.
> the only difference was the max thread use went from 2 to 3. core use was
> the same.
>
> i added the following and some variations of these:
>
> --param parloops-min-per-thread=2 (the default was 100 like you said)
> --param parloops-chunk-size=1 (the default was zero so i removed this
> parameter later)
> --param parloops-schedule=auto (tried all options except
> guided, the default is static)
>
> i was able to check that they were set via:
>
> --help=param -Q
>
> some other things i tried were adding -mthreads and removing -static. but so
> far no luck. i also tried using -mthreads instead of -pthread.
>
> i should make clear i'm testing PROP_DESIGN_MAPS, not MP_PROP_DESIGN.
> MP_PROP_DESIGN is ancient and the added benchmarking loops were messing with
> the ability of the optimizer to auto-parallelize (in the past at least).

I did more testing and the added options actually slow the code way down.
however, it is still only using one core. from what i can tell, if i set
OMP_PLACES it doesn't seem like it's working. i saw a thread from years ago
where someone had the same problem. i think OMP_PLACES might be working on
linux but not on windows. that's what the thread i found was saying. don't
really know. but i've exhausted all the possibilities at this point. the only
thing i know for sure is i can't get it to use anything more than one core.

--- Comment #27 from Anthony ---
so after trying a bunch of things, i think the final problem may be this. i get
the following result when i try to set thread affinity:

set GOMP_CPU_AFFINITY="0 1"

gives the following feedback at run time:

libgomp: Affinity not supported on this configuration

i have to close the command prompt window to stop the program. the program
doesn't run properly if i try to set thread affinity.

so this still makes me think it might work on linux and not windows 10, but i
have no way to test that.

the extra threads that auto-parallelization creates will only go to one core,
on my machine at least.
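[Editor's note: pulling the scattered flags and environment variables of this thread together, the build and run sequence under discussion can be sketched roughly as below. This is a hypothetical reconstruction: the source file name, thread count, and CPU list are placeholders, not values taken from the actual benchmark setup.]

```shell
# Hypothetical build: gfortran auto-parallelization with the parloops
# tuning parameters tried in comment #25 (source file name is a placeholder).
gfortran -O3 -pthread -ftree-parallelize-loops=4 \
  --param parloops-min-per-thread=2 \
  --param parloops-schedule=auto \
  prop_design_maps.f90 -o prop_design_maps

# Runtime settings: the thread count is taken from the OpenMP environment.
# GOMP_CPU_AFFINITY is the setting that fails on this Windows toolchain
# with "libgomp: Affinity not supported on this configuration"; it is
# expected to work with Linux builds of libgomp.
export OMP_NUM_THREADS=4
export GOMP_CPU_AFFINITY="0 1 2 3"
./prop_design_maps
```

The `--param` values can be confirmed the same way as in comment #25, by passing `--help=param -Q` to gfortran and checking the reported settings.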