public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/48329] New: Program takes twice as long *without* -fopenmp than with 1 OpenMP thread
@ 2011-03-29 10:02 burnus at gcc dot gnu.org
2011-03-29 11:30 ` [Bug tree-optimization/48329] Missed vectorization of reduction due to PRE rguenth at gcc dot gnu.org
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: burnus at gcc dot gnu.org @ 2011-03-29 10:02 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48329
Summary: Program takes twice as long *without* -fopenmp than with 1 OpenMP thread
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Keywords: missed-optimization, openmp
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: burnus@gcc.gnu.org
Program taken from http://openmp.org/forum/viewtopic.php?f=3&t=1123
System: Intel Xeon X5570 @ 2.93GHz, SUSE SLES 11 (x86_64) [glibc-2.11.1]
Without OpenMP:
  gfortran -O3 -ffast-math test2.f90
  time ./a.out -> 14.44user 0.00system 0:14.46elapsed 99%CPU
With OpenMP and OMP_NUM_THREADS=1:
  gfortran -fopenmp -O3 -ffast-math test2.f90
  time ./a.out -> 7.22user 0.00system 0:07.23elapsed 99%CPU
Using gfortran 4.3.4, I also get the 7 s result without -fopenmp; ditto with
ifort 11.1. With OpenMP, GCC 4.6 and ifort have the same run time
[modulo noise], both for 1 and for 2 threads.
program calcpi
  USE omp_lib
  implicit none
  double precision:: h,x,sum,pi
  integer:: n,i
  double precision:: f
  f(x) = 4.0/(1.0+x**2)
  n = 2100000000
  h= 1.0 / dble(n)
  sum = 0.0
!$OMP PARALLEL DO DEFAULT(NONE) &
!$OMP SHARED(n,h) PRIVATE(x) &
!$OMP REDUCTION(+:sum)
  DO i=1, n
    x = h * (dble(i)-0.5)
    sum = sum + f(x)
  END DO
!$OMP END PARALLEL DO
  pi = h * sum
  write(*,*) 'Pi=',pi
end program calcpi
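For readers who want to check the arithmetic: the loop above is the midpoint rule applied to the integral of 4/(1+x^2) over [0, 1], which equals pi. The following is a minimal sketch of the same computation in Python, not the reporter's code. The function name midpoint_pi is hypothetical, and the n used here is far smaller than the n = 2100000000 in the report so that it runs quickly.

```python
import math


def midpoint_pi(n):
    """Approximate pi with the midpoint rule on 4/(1+x^2) over [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(1, n + 1):
        x = h * (i - 0.5)             # midpoint of the i-th subinterval
        total += 4.0 / (1.0 + x * x)  # sum reduction into one accumulator
    return h * total


if __name__ == "__main__":
    # Small n chosen only for a quick run (assumption, not from the report).
    print(midpoint_pi(100000), math.pi)
```

The accumulation into total is the reduction that the vectorizer is expected to recognize in the compiled Fortran loop.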
* [Bug tree-optimization/48329] Missed vectorization of reduction due to PRE
From: rguenth at gcc dot gnu.org @ 2011-03-29 11:30 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48329
Richard Guenther <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
           Keywords|openmp                      |
   Last reconfirmed|                            |2011.03.29 10:31:56
          Component|middle-end                  |tree-optimization
                 CC|                            |rguenth at gcc dot gnu.org
     Ever Confirmed|0                           |1
            Summary|Program takes twice as long |Missed vectorization of
                   |*without* -fopenmp than     |reduction due to PRE
                   |with 1 OpenMP thread        |
--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-03-29 10:31:56 UTC ---
We vectorize the reduction if the loop body is outlined into its own
function, as happens with -fopenmp. I suppose something confuses the
vectorizer in the non-OMP path. Yep, it's PRE, so try -fno-tree-pre:
<bb 3>:
  # i_1 = PHI <1(2), i_22(4)>
  # sum_2 = PHI <0.0(2), sum_20(4)>
  # prephitmp.9_50 = PHI <5.66893424036281234980410020432668056299176519904892395524e-20(2), D.1586_48(4)>
  # ivtmp.12_10 = PHI <2100000000(2), ivtmp.12_11(4)>
  D.1574_17 = prephitmp.9_50 + 1.0e+0;
  D.1575_18 = ((D.1574_17));
  D.1576_19 = 4.0e+0 / D.1575_18;
  sum_20 = D.1576_19 + sum_2;
  ivtmp.12_11 = ivtmp.12_10 - 1;
  if (ivtmp.12_11 == 0)
    goto <bb 5>;
  else
    goto <bb 4>;

<bb 4>:
  i_22 = i_1 + 1;
  pretmp.8_44 = (real(kind=8)) i_22;
  pretmp.8_45 = pretmp.8_44 - 5.0e-1;
  pretmp.8_46 = ((pretmp.8_45));
  pretmp.8_47 = pretmp.8_46 * 4.76190476190476200439314681013558416822206709184683859348e-10;
  D.1586_48 = __builtin_pow (pretmp.8_47, 2.0e+0);
  goto <bb 3>;
This loop is not detected as a reduction. The non-empty latch block (bb 4)
is at least one of the reasons, though probably not the only one.
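To make the shape of the dump easier to follow, here is a minimal Python sketch of the two loop forms. It illustrates only the dataflow and says nothing about vectorization or performance. The function names are hypothetical; x_sq stands in for prephitmp.9_50 and remaining for ivtmp.12 in the dump.

```python
def sum_original(n):
    """Loop as written in the source: everything is computed in the body."""
    h = 1.0 / n
    total = 0.0
    for i in range(1, n + 1):
        x = h * (i - 0.5)
        total += 4.0 / (1.0 + x ** 2)
    return total


def sum_after_pre(n):
    """Loop shape after PRE: x**2 for the next iteration is computed in the latch."""
    h = 1.0 / n
    total = 0.0
    # Value for the first iteration, computed ahead of the loop
    # (the constant PHI argument coming from bb 2 in the dump).
    x_sq = (h * (1 - 0.5)) ** 2
    i = 1
    remaining = n
    while True:
        # bb 3: loop header and body, consumes the carried value
        total += 4.0 / (1.0 + x_sq)
        remaining -= 1
        if remaining == 0:
            break
        # bb 4: latch, no longer empty; prepares x**2 for the next iteration
        i += 1
        x_sq = (h * (i - 0.5)) ** 2
    return total


if __name__ == "__main__":
    print(sum_original(1000), sum_after_pre(1000))
```

Both functions perform the same floating-point operations in the same order, so they return the same sum. The difference is that the second form carries x_sq across iterations in addition to total, which matches the extra PHI node and the non-empty latch in the dump.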
* [Bug tree-optimization/48329] Missed vectorization of reduction due to PRE
From: dominiq at lps dot ens.fr @ 2014-04-30 9:34 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48329
--- Comment #2 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
This seems to have been fixed during the 4.7 revisions: I see the problem with
4.6.4, but not with 4.7.3 or higher.
* [Bug tree-optimization/48329] Missed vectorization of reduction due to PRE
From: rguenth at gcc dot gnu.org @ 2014-04-30 11:35 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48329
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |4.7.0
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Indeed.
* [Bug tree-optimization/48329] Missed vectorization of reduction due to PRE
From: rguenth at gcc dot gnu.org @ 2014-04-30 11:44 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48329
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Wed Apr 30 11:43:41 2014
New Revision: 209930
URL: http://gcc.gnu.org/viewcvs?rev=209930&root=gcc&view=rev
Log:
2014-04-30 Richard Biener <rguenther@suse.de>
PR tree-optimization/48329
* gfortran.dg/vect/pr48329.f90: New testcase.
Added:
trunk/gcc/testsuite/gfortran.dg/vect/pr48329.f90
Modified:
trunk/gcc/testsuite/ChangeLog