[Bug middle-end/48329] New: Program takes twice as long *without* -fopenmp than with 1 OpenMP thread

* [Bug middle-end/48329] New: Program takes twice as long *without* -fopenmp than with 1 OpenMP thread
@ 2011-03-29 10:02 burnus at gcc dot gnu.org
  2011-03-29 11:30 ` [Bug tree-optimization/48329] Missed vectorization of reduction due to PRE rguenth at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: burnus at gcc dot gnu.org @ 2011-03-29 10:02 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48329

           Summary: Program takes twice as long *without* -fopenmp than
                    with 1 OpenMP thread
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization, openmp
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: burnus@gcc.gnu.org


Program taken from http://openmp.org/forum/viewtopic.php?f=3&t=1123

System: Intel Xeon X5570  @ 2.93GHz, SUSE SLES 11 (x86_64) [glibc-2.11.1]

No OpenMP:
  gfortran -O3 -ffast-math test2.f90
  time ./a.out ->  14.44user 0.00system 0:14.46elapsed 99%CPU

With OpenMP and OMP_NUM_THREADS=1
  gfortran -fopenmp -O3 -ffast-math test2.f90
  time ./a.out ->  7.22user 0.00system 0:07.23elapsed 99%CPU

With gfortran 4.3.4 I also get the 7s result without -fopenmp; ditto with
ifort 11.1. With OpenMP, GCC 4.6 and ifort have the same run time
[modulo noise], both for 1 thread and for 2 threads.



program calcpi
USE omp_lib
    implicit none
    double precision:: h,x,sum,pi
    integer:: n,i
    double precision:: f

   f(x) = 4.0/(1.0+x**2)

   n = 2100000000

   h= 1.0 / dble(n)
   sum = 0.0
!$OMP PARALLEL DO DEFAULT(NONE) &
!$OMP SHARED(n,h) PRIVATE(x) &
!$OMP REDUCTION(+:sum)
  DO i=1, n
     x = h * (dble(i)-0.5)
     sum = sum + f(x)
  END DO
!$OMP END PARALLEL DO
  pi = h * sum
  write(*,*) 'Pi=',pi

end program calcpi



* [Bug tree-optimization/48329] Missed vectorization of reduction due to PRE
  2011-03-29 10:02 [Bug middle-end/48329] New: Program takes twice as long *without* -fopenmp than with 1 OpenMP thread burnus at gcc dot gnu.org
@ 2011-03-29 11:30 ` rguenth at gcc dot gnu.org
  2014-04-30  9:34 ` dominiq at lps dot ens.fr
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-03-29 11:30 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48329

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
           Keywords|openmp                      |
   Last reconfirmed|                            |2011.03.29 10:31:56
          Component|middle-end                  |tree-optimization
                 CC|                            |rguenth at gcc dot gnu.org
     Ever Confirmed|0                           |1
            Summary|Program takes twice as long |Missed vectorization of
                   |*without* -fopenmp than     |reduction due to PRE
                   |with 1 OpenMP thread        |

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-03-29 10:31:56 UTC ---
We vectorize the reduction if the function is outlined.  I suppose something
confuses the vectorizer in the non-OMP path.  Yep, it's PRE, so try
-fno-tree-pre:

<bb 3>:
  # i_1 = PHI <1(2), i_22(4)>
  # sum_2 = PHI <0.0(2), sum_20(4)>
  # prephitmp.9_50 = PHI <5.66893424036281234980410020432668056299176519904892395524e-20(2), D.1586_48(4)>
  # ivtmp.12_10 = PHI <2100000000(2), ivtmp.12_11(4)>
  D.1574_17 = prephitmp.9_50 + 1.0e+0;
  D.1575_18 = ((D.1574_17));
  D.1576_19 = 4.0e+0 / D.1575_18;
  sum_20 = D.1576_19 + sum_2;
  ivtmp.12_11 = ivtmp.12_10 - 1;
  if (ivtmp.12_11 == 0)
    goto <bb 5>;
  else
    goto <bb 4>;

<bb 4>:
  i_22 = i_1 + 1;
  pretmp.8_44 = (real(kind=8)) i_22;
  pretmp.8_45 = pretmp.8_44 - 5.0e-1;
  pretmp.8_46 = ((pretmp.8_45));
  pretmp.8_47 = pretmp.8_46 * 4.76190476190476200439314681013558416822206709184683859348e-10;
  D.1586_48 = __builtin_pow (pretmp.8_47, 2.0e+0);
  goto <bb 3>;

is not detected as a reduction.  The non-empty latch block is probably not
the only reason, but it is at least one of them.
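
The two loop shapes can be sketched in C. This is a hand-written
approximation of the dump above, not compiler output: the function names
sum_plain and sum_pre_shape are made up for illustration, x * x stands in
for the __builtin_pow call, and the exit test uses i instead of the
count-down ivtmp. In the first function only i and sum are carried between
iterations; in the second, the square for the next iteration is computed in
the latch and carried across the back edge, in the role of prephitmp.9_50.

```c
#include <assert.h>

/* Straightforward form: x is recomputed from i inside the loop body, so
   the only values carried between iterations are i and sum. */
double sum_plain(long n)
{
    assert(n >= 1);
    const double h = 1.0 / (double)n;
    double sum = 0.0;

    for (long i = 1; i <= n; ++i) {
        double x = h * ((double)i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    return sum;
}

/* Shape resembling the dump after PRE: the square used by iteration i+1
   is computed in the latch and fed back into the loop header. */
double sum_pre_shape(long n)
{
    assert(n >= 1);
    const double h = 1.0 / (double)n;
    double sum = 0.0;
    long i = 1;
    double x = h * 0.5;
    double sq = x * x;  /* value for i = 1; about 5.67e-20 for n = 2100000000 */

    for (;;) {
        /* loop header and body (bb 3 in the dump) */
        sum += 4.0 / (1.0 + sq);
        if (i == n)
            break;

        /* latch (bb 4 in the dump): no longer empty */
        i = i + 1;
        x = h * ((double)i - 0.5);
        sq = x * x;
    }
    return sum;
}
```

Both functions return the same sum up to rounding; multiplying the result
by h gives the approximation of pi that the test program prints.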



* [Bug tree-optimization/48329] Missed vectorization of reduction due to PRE
  2011-03-29 10:02 [Bug middle-end/48329] New: Program takes twice as long *without* -fopenmp than with 1 OpenMP thread burnus at gcc dot gnu.org
  2011-03-29 11:30 ` [Bug tree-optimization/48329] Missed vectorization of reduction due to PRE rguenth at gcc dot gnu.org
@ 2014-04-30  9:34 ` dominiq at lps dot ens.fr
  2014-04-30 11:35 ` rguenth at gcc dot gnu.org
  2014-04-30 11:44 ` rguenth at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: dominiq at lps dot ens.fr @ 2014-04-30  9:34 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48329

--- Comment #2 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
This seems to have been fixed during the 4.7 revisions: I see the problem with
4.6.4, but not with 4.7.3 or higher.



* [Bug tree-optimization/48329] Missed vectorization of reduction due to PRE
  2011-03-29 10:02 [Bug middle-end/48329] New: Program takes twice as long *without* -fopenmp than with 1 OpenMP thread burnus at gcc dot gnu.org
  2011-03-29 11:30 ` [Bug tree-optimization/48329] Missed vectorization of reduction due to PRE rguenth at gcc dot gnu.org
  2014-04-30  9:34 ` dominiq at lps dot ens.fr
@ 2014-04-30 11:35 ` rguenth at gcc dot gnu.org
  2014-04-30 11:44 ` rguenth at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-04-30 11:35 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48329

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |4.7.0

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Indeed.



* [Bug tree-optimization/48329] Missed vectorization of reduction due to PRE
  2011-03-29 10:02 [Bug middle-end/48329] New: Program takes twice as long *without* -fopenmp than with 1 OpenMP thread burnus at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2014-04-30 11:35 ` rguenth at gcc dot gnu.org
@ 2014-04-30 11:44 ` rguenth at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-04-30 11:44 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48329

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Wed Apr 30 11:43:41 2014
New Revision: 209930

URL: http://gcc.gnu.org/viewcvs?rev=209930&root=gcc&view=rev
Log:
2014-04-30  Richard Biener  <rguenther@suse.de>

    PR tree-optimization/48329
    * gfortran.dg/vect/pr48329.f90: New testcase.

Added:
    trunk/gcc/testsuite/gfortran.dg/vect/pr48329.f90
Modified:
    trunk/gcc/testsuite/ChangeLog


