public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug fortran/31067]  New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
@ 2007-03-07 10:20 fxcoudert at gcc dot gnu dot org
  2007-03-07 11:28 ` [Bug fortran/31067] " tkoenig at gcc dot gnu dot org
                   ` (40 more replies)
  0 siblings, 41 replies; 47+ messages in thread
From: fxcoudert at gcc dot gnu dot org @ 2007-03-07 10:20 UTC (permalink / raw)
  To: gcc-bugs

[see http://www.polyhedron.co.uk/pb05/linux/f90bench_AMD.html for the original
polyhedron benchmark results, an explanation of what the benchmark is and the
source code]

Typical timings for the gas_dyn.f90 benchmark on my AMD64/linux system are:

* ifort -O3 -xW -ipo -static -V gas_dyn.f90 -o gas_dyn.intel
=> ./gas_dyn.intel  10.53s user 0.43s system 99% cpu 10.976 total

* gfortran -static -ftree-vectorize -march=opteron -ffast-math -funroll-loops
-O3 gas_dyn.f90 -o gas_dyn.gfortran
./gas_dyn.gfortran  15.92s user 0.05s system 99% cpu 15.969 total

Experimenting a bit with Intel options to understand why it is so fast, I found
that:
  * disabling inlining doesn't change the execution time
  * disabling vectorization drops it to the same execution time as gfortran
(roughly speaking)

Following an analysis by Tobias Burnus, and noting that 22.16% of the total
time is spent in the MINLOC library routine, I modified the source by replacing
a call to MINLOC by inline code:
--- gas_dyn.f90 2007-03-07 09:36:23.000000000 +0100
+++ gas_dyn.modified.f90        2007-03-07 10:44:14.000000000 +0100
@@ -234,12 +234,23 @@ end module ints
 !-----------------------------------------------
 !   L o c a l   V a r i a b l e s
 !-----------------------------------------------
-      INTEGER :: ISET(1)
-      REAL :: VSET, SSET
+      INTEGER :: ISET(1), I
+      REAL :: VSET, SSET, T
       REAL, DIMENSION (NODES) :: DTEMP
 !-----------------------------------------------
       DTEMP = DX/(ABS(VEL) + SOUND)
-      ISET = MINLOC (DTEMP)
+! FXC replace this:
+!      ISET = MINLOC (DTEMP)
+! by this:
+      ISET(1) = 0
+      T = HUGE(T)
+      DO I = 1, NODES
+        IF (DTEMP(I) < T) THEN
+          T = DTEMP(I)
+          ISET(1) = I
+        END IF
+      END DO
+! end of modification
       DT = DTEMP(ISET(1))
       VSET = VEL(ISET(1))
       SSET = SOUND(ISET(1))

this makes the code faster by 14%:
./gas_dyn.modified.gfortran  13.56s user 0.05s system 99% cpu 13.614 total

Maybe we should have MINLOC inlined when there's no mask, stride 1 and
one-dimensional?


PS: Other hot spots are:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
 29.13      4.18     4.18                             eos_ (gas_dyn.f90:410 @
413386)
 14.22      6.22     2.04                             chozdt_ (gas_dyn.f90:241
@ 4152b3)

Both lines are whole-array operations, corresponding to:
    CS(:NODES) = SQRT(CGAMMA*PRES(:NODES)/DENS(:NODES))
and
    DTEMP = DX/(ABS(VEL) + SOUND)

I filed PR31066, which is I think a small reproducer for the two lines above.


-- 
           Summary: MINLOC should sometimes be inlined (gas_dyn is sooooo
                    sloooow)
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: fortran
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: fxcoudert at gcc dot gnu dot org
 BugsThisDependsOn: 31066


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
@ 2007-03-07 11:28 ` tkoenig at gcc dot gnu dot org
  2007-03-07 12:18 ` fxcoudert at gcc dot gnu dot org
                   ` (39 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu dot org @ 2007-03-07 11:28 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from tkoenig at gcc dot gnu dot org  2007-03-07 11:27 -------
(In reply to comment #0)

> Maybe we should have MINLOC inlined when there's no mask, stride 1 and
> one-dimensional?

Definitely.  We do this for minval, and from glancing at
gfc_conv_intrinsic_minmaxval and gfc_conv_intrinsic_minmaxloc,
it should happen already.


-- 

tkoenig at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2007-03-07 11:27:52
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
  2007-03-07 11:28 ` [Bug fortran/31067] " tkoenig at gcc dot gnu dot org
@ 2007-03-07 12:18 ` fxcoudert at gcc dot gnu dot org
  2007-03-07 21:00 ` tkoenig at gcc dot gnu dot org
                   ` (38 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: fxcoudert at gcc dot gnu dot org @ 2007-03-07 12:18 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from fxcoudert at gcc dot gnu dot org  2007-03-07 12:18 -------
(In reply to comment #1)
> We do this for minval, and from glancing at
> gfc_conv_intrinsic_minmaxval and gfc_conv_intrinsic_minmaxloc,
> it should happen already.

No, because we never get into gfc_conv_intrinsic_minmaxloc. We translate the
expression directly into a function call by calling
gfc_conv_intrinsic_funcall() at the head of gfc_conv_intrinsic_function(),
instead of going down the list. I wonder how SUM and PRODUCT are inlined...


-- 

fxcoudert at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2007-03-07 11:27:52         |2007-03-07 12:18:10
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
  2007-03-07 11:28 ` [Bug fortran/31067] " tkoenig at gcc dot gnu dot org
  2007-03-07 12:18 ` fxcoudert at gcc dot gnu dot org
@ 2007-03-07 21:00 ` tkoenig at gcc dot gnu dot org
  2007-03-07 21:09 ` fxcoudert at gmail dot com
                   ` (37 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu dot org @ 2007-03-07 21:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from tkoenig at gcc dot gnu dot org  2007-03-07 21:00 -------
(In reply to comment #2)

> No, because we never get into gfc_conv_intrinsic_minmaxloc. We translate the
> expression directly into a function call by calling
> gfc_conv_intrinsic_funcall() at the head of gfc_conv_intrinsic_function(),
> instead of going down the list. I wonder how SUM and PRODUCT are inlined...

In  gfc_conv_intrinsic_function, expr->rank is 0 for minval
and 1 for minloc (which is bogus).  I wonder where this is
set...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (2 preceding siblings ...)
  2007-03-07 21:00 ` tkoenig at gcc dot gnu dot org
@ 2007-03-07 21:09 ` fxcoudert at gmail dot com
  2007-03-07 21:10 ` tkoenig at gcc dot gnu dot org
                   ` (36 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: fxcoudert at gmail dot com @ 2007-03-07 21:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from fxcoudert at gmail dot com  2007-03-07 21:09 -------
Subject: Re:  MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

> In  gfc_conv_intrinsic_function, expr->rank is 0 for minval
> and 1 for minloc (which is bogus).

It's not bogus. The MINLOC is an array of rank 1, that's correct.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (3 preceding siblings ...)
  2007-03-07 21:09 ` fxcoudert at gmail dot com
@ 2007-03-07 21:10 ` tkoenig at gcc dot gnu dot org
  2007-03-07 21:29 ` tkoenig at gcc dot gnu dot org
                   ` (35 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu dot org @ 2007-03-07 21:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from tkoenig at gcc dot gnu dot org  2007-03-07 21:09 -------
(In reply to comment #3)

> In  gfc_conv_intrinsic_function, expr->rank is 0 for minval
> and 1 for minloc (which is bogus).  I wonder where this is
> set...

To answer my own question:  This is set in gfc_resolve_minloc.
I'll try to give it a shot.


-- 

tkoenig at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tkoenig at gcc dot gnu dot
                   |                            |org
         AssignedTo|unassigned at gcc dot gnu   |tkoenig at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2007-03-07 12:18:10         |2007-03-07 21:09:53
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (4 preceding siblings ...)
  2007-03-07 21:10 ` tkoenig at gcc dot gnu dot org
@ 2007-03-07 21:29 ` tkoenig at gcc dot gnu dot org
  2007-03-08  5:50 ` fxcoudert at gcc dot gnu dot org
                   ` (34 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu dot org @ 2007-03-07 21:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from tkoenig at gcc dot gnu dot org  2007-03-07 21:29 -------
Created an attachment (id=13165)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13165&action=view)
Setting the correct rank in minloc

This makes minloc have rank 0, and allows for
inlining.  I guess we'll find out now wether the
inline code works :-)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (5 preceding siblings ...)
  2007-03-07 21:29 ` tkoenig at gcc dot gnu dot org
@ 2007-03-08  5:50 ` fxcoudert at gcc dot gnu dot org
  2007-03-10 12:34 ` tkoenig at gcc dot gnu dot org
                   ` (33 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: fxcoudert at gcc dot gnu dot org @ 2007-03-08  5:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from fxcoudert at gcc dot gnu dot org  2007-03-08 05:50 -------
(In reply to comment #6)
> This makes minloc have rank 0, and allows for
> inlining.

No, it's wrong. See F95 13.14.71: "Result Characteristics. The result is of
type default integer.  If DIM is absent, the result is an array of rank one and
of size equal to the rank of ARRAY; otherwise, the result is of rank n-1 and
shape ..."

Note in particular the example for case (i), on the next page: "The value of
MINLOC((/4,3,6,3/)) is [2]". If DIM is absent, the result is always an array of
rank 1. Only if DIM is present, and the ARRAY is of rank 1, then MINLOC is a
scalar: MINLOC((/4,3,6,3/),1) == 2


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (6 preceding siblings ...)
  2007-03-08  5:50 ` fxcoudert at gcc dot gnu dot org
@ 2007-03-10 12:34 ` tkoenig at gcc dot gnu dot org
  2007-03-11 19:43 ` tkoenig at gcc dot gnu dot org
                   ` (32 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu dot org @ 2007-03-10 12:34 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from tkoenig at gcc dot gnu dot org  2007-03-10 12:34 -------
(In reply to comment #7)
> (In reply to comment #6)
> > This makes minloc have rank 0, and allows for
> > inlining.
> 
> No, it's wrong. See F95 13.14.71: "Result Characteristics. The result is of
> type default integer.  If DIM is absent, the result is an array of rank one and
> of size equal to the rank of ARRAY; otherwise, the result is of rank n-1 and
> shape ..."

You're right.  Back to the drawing board.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (7 preceding siblings ...)
  2007-03-10 12:34 ` tkoenig at gcc dot gnu dot org
@ 2007-03-11 19:43 ` tkoenig at gcc dot gnu dot org
  2007-03-12 19:04 ` pault at gcc dot gnu dot org
                   ` (31 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu dot org @ 2007-03-11 19:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from tkoenig at gcc dot gnu dot org  2007-03-11 19:43 -------
I have looked at this some more.  Channging gfc_conv_intrinsic_function so that
we call gfc_conv_intrinsic_minmaxloc is easy enough:

@@ -3481,7 +3481,9 @@ gfc_conv_intrinsic_function (gfc_se * se

   name = &expr->value.function.name[2];

-  if (expr->rank > 0 && !expr->inline_noncopying_intrinsic)
+  if (expr->rank > 0 && !expr->inline_noncopying_intrinsic
+      && ! (expr->rank == 1 && (isym->generic_id ==  GFC_ISYM_MINLOC
+                               || isym->generic_id == GFC_ISYM_MAXLOC)))
     {
       lib = gfc_is_intrinsic_libcall (expr);
       if (lib != 0)

If we do that, we hit the  "if (se->ss)" contition on top of that function, and
we would have to handle scalarization of that one-trip loop.  I have currently
no idea how to go about that.  Simply removing the condition doesn't work :-)

As a workaround, one could always use "minloc(...,dim=1)", then
we get the inline version.

Unassigning myself.


-- 

tkoenig at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|tkoenig at gcc dot gnu dot  |unassigned at gcc dot gnu
                   |org                         |dot org
             Status|ASSIGNED                    |NEW


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (8 preceding siblings ...)
  2007-03-11 19:43 ` tkoenig at gcc dot gnu dot org
@ 2007-03-12 19:04 ` pault at gcc dot gnu dot org
  2007-03-13 20:12 ` tkoenig at gcc dot gnu dot org
                   ` (30 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: pault at gcc dot gnu dot org @ 2007-03-12 19:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from pault at gcc dot gnu dot org  2007-03-12 19:04 -------
(In reply to comment #9)

> 
> As a workaround, one could always use "minloc(...,dim=1)", then
> we get the inline version.
> 
Thomas, it's a bit kludgy, but why not add a constant expression = 1, if dim is
not present?

Cheers

Paul


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (9 preceding siblings ...)
  2007-03-12 19:04 ` pault at gcc dot gnu dot org
@ 2007-03-13 20:12 ` tkoenig at gcc dot gnu dot org
  2007-03-26 10:38 ` pault at gcc dot gnu dot org
                   ` (29 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu dot org @ 2007-03-13 20:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from tkoenig at gcc dot gnu dot org  2007-03-13 20:12 -------
(In reply to comment #10)

> Thomas, it's a bit kludgy, but why not add a constant expression = 1, if dim is
> not present?

Hi Paul,

unless I'm mistaken, this would also change the rank of the
function to 0, which FX explained is wrong.

Do you have any idea what I cold do to turn this into an array?
All the "interesting" gfc_conv_intrinsic_* functions have the
"if (se->ss)" statement on top.

Cheers,
      Thomas


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (10 preceding siblings ...)
  2007-03-13 20:12 ` tkoenig at gcc dot gnu dot org
@ 2007-03-26 10:38 ` pault at gcc dot gnu dot org
  2007-03-26 11:44 ` pault at gcc dot gnu dot org
                   ` (28 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: pault at gcc dot gnu dot org @ 2007-03-26 10:38 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from pault at gcc dot gnu dot org  2007-03-26 11:37 -------
(In reply to comment #11)
> (In reply to comment #10)

> Do you have any idea what I cold do to turn this into an array?
> All the "interesting" gfc_conv_intrinsic_* functions have the
> "if (se->ss)" statement on top.

I'll put my thinking cap on.  To first order, you have to follow the same route
as array_transfer..... I'll come back to you.

Paul


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (11 preceding siblings ...)
  2007-03-26 10:38 ` pault at gcc dot gnu dot org
@ 2007-03-26 11:44 ` pault at gcc dot gnu dot org
  2007-04-02 16:44 ` tkoenig at gcc dot gnu dot org
                   ` (27 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: pault at gcc dot gnu dot org @ 2007-03-26 11:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from pault at gcc dot gnu dot org  2007-03-26 12:43 -------
(In reply to comment #11)
> (In reply to comment #10)
Thomas,

It does not look too bad:

Look at the tail end of array_transfer -

  gfc_trans_create_temp_array (&se->pre, &se->post, se->loop,
                               info, mold_type, false, true, false);

  /* Cast the pointer to the result.  */
  tmp = gfc_conv_descriptor_data_get (info->descriptor);

Will produce a temporary array of the right dimension, together with its
descriptor.

For minmaxloc, mold_type will have to be replaced by the TREE_TYPE of the
result (gfc_array_index_type, I suppose?  or else, you will have to use
gfc_typenode_for_spec (&expr->ts);, I think). tmp will now be a pointer to your
result array.  This needs to appear fairly early on in minmaxloc, so that the
array can be set to zero to initialize it and so that the location can be
tranferred to it.  The standard checks will have to be made that ss->loop
exists etc.. - just check some of the other array valued in-line intrinsics.

Having done this, you will need to replace the line (~2049), folowing "remember
where we are", with a loop over the n dimensions (note, we do not have to
restrict ourselves to one dimension:) ).

Something like:

  /* Remember where we are.  */
  for (n = 0; n < loop.dimen; n++)
    {
      pos = build_fold_indirect_ref (gfc_conv_array_data info->descriptor));
      pos = gfc_build_array_ref (pos, build_int_cst (gfc_array_index_type, n))
      gfc_add_modify_expr (&ifblock, pos, loop.loopvar[n]);
    }
should bang the position into the result array, which is transferred at the end
with
  se->expr = info->descriptor;

Good luck

Cheers

Paul


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (12 preceding siblings ...)
  2007-03-26 11:44 ` pault at gcc dot gnu dot org
@ 2007-04-02 16:44 ` tkoenig at gcc dot gnu dot org
  2007-04-02 20:01 ` tkoenig at gcc dot gnu dot org
                   ` (26 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu dot org @ 2007-04-02 16:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from tkoenig at gcc dot gnu dot org  2007-04-02 17:44 -------
I'll give this another shot.

Maybe inlining isn't even necessary for good performance...


-- 

tkoenig at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |tkoenig at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2007-03-07 21:09:53         |2007-04-02 17:44:28
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (13 preceding siblings ...)
  2007-04-02 16:44 ` tkoenig at gcc dot gnu dot org
@ 2007-04-02 20:01 ` tkoenig at gcc dot gnu dot org
  2007-05-18 20:16 ` jb at gcc dot gnu dot org
                   ` (25 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu dot org @ 2007-04-02 20:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from tkoenig at gcc dot gnu dot org  2007-04-02 21:00 -------
The library version doesn't do too badly compared
to the inline version:

$ cat benchmark-inline.f90
program main
  implicit none
  integer, dimension(1) :: n
  real, allocatable :: a(:)
  integer :: i

  allocate (a(100))
  call random_number(a)
  do i=1, 10000000
    n = minloc(a, dim=1)
 end do
end program main
$ cat benchmark-library.f90
program main
  implicit none
  integer, dimension(1) :: n
  real, allocatable :: a(:)
  integer :: i

  allocate (a(100))
  call random_number(a)
  do i=1, 10000000
    n = minloc(a)
 end do
end program main
$ gfortran -O3 -static benchmark-inline.f90 && ./a.out && time ./a.out

real    0m6.941s
user    0m5.232s
sys     0m0.016s
$ gfortran -O3 -static benchmark-library.f90 && ./a.out && time ./a.out

real    0m5.720s
user    0m5.472s
sys     0m0.004s


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (14 preceding siblings ...)
  2007-04-02 20:01 ` tkoenig at gcc dot gnu dot org
@ 2007-05-18 20:16 ` jb at gcc dot gnu dot org
  2007-05-18 20:20 ` jb at gcc dot gnu dot org
                   ` (24 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: jb at gcc dot gnu dot org @ 2007-05-18 20:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from jb at gcc dot gnu dot org  2007-05-18 21:15 -------
The critical thing with inlining array intrinsics, IMHO is to give the
optimizer more data to work with allowing it to get rid of temp arrays, perform
loop fusion or fission etc. So with a trivial benchmark like #15, you don't see
any difference, except with potentially higher optimization for user code than
libgfortran. For example in gas_dyn/chozdt:

      REAL, DIMENSION (NODES) :: DTEMP
!-----------------------------------------------
! Profile for gfortran 4.3:
! CPU_CLK_UNHALTED L2_CACHE_MISS
! samp  %runtim  samp %tot
! 59887 22.4783  1484 10.9828   :      DTEMP = DX/(ABS(VEL) + SOUND)
! ifort 9.1 profile
! 40104 16.2034  1198  8.8166   :      DTEMP = DX/(ABS(VEL) + SOUND)

      DTEMP = DX/(ABS(VEL) + SOUND)
      ISET = MINLOC (DTEMP)
      DT = DTEMP(ISET(1))

If MINLOC were inlined, perhaps a sufficiently optimizer could convert this
into the equivalent

REAL :: DTEMPMIN
INTEGER :: i

DTEMPMIN = HUGE(1.0d0)
DO I = 1, NODES
    DT = DX(I)/(ABS(VEL(I)+SOUND(I))
    IF (DT < DTEMPMIN) THEN
        DTEMPMIN = DT
    END IF
END DO
DT = DTEMPMIN

i.e. avoid the temporary array entirely. Yes, I guess this is quite a lot to
ask.


-- 

jb at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jb at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (15 preceding siblings ...)
  2007-05-18 20:16 ` jb at gcc dot gnu dot org
@ 2007-05-18 20:20 ` jb at gcc dot gnu dot org
  2007-06-15 20:35 ` tkoenig at gcc dot gnu dot org
                   ` (23 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: jb at gcc dot gnu dot org @ 2007-05-18 20:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from jb at gcc dot gnu dot org  2007-05-18 21:20 -------
Or even better (duh):

REAL :: DTEMP

DT = HUGE(1.0d0)
DO I = 1, NODES
    DTEMP = DX(I)/(ABS(VEL(I)+SOUND(I))
    IF (DTEMP < DT) THEN
        DT = DTEMP
    END IF
END DO


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (16 preceding siblings ...)
  2007-05-18 20:20 ` jb at gcc dot gnu dot org
@ 2007-06-15 20:35 ` tkoenig at gcc dot gnu dot org
  2007-06-27 14:49 ` jb at gcc dot gnu dot org
                   ` (22 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu dot org @ 2007-06-15 20:35 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from tkoenig at gcc dot gnu dot org  2007-06-15 20:35 -------
Too little time right now.

Unassigning myself.


-- 

tkoenig at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|tkoenig at gcc dot gnu dot  |unassigned at gcc dot gnu
                   |org                         |dot org
             Status|ASSIGNED                    |NEW


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (17 preceding siblings ...)
  2007-06-15 20:35 ` tkoenig at gcc dot gnu dot org
@ 2007-06-27 14:49 ` jb at gcc dot gnu dot org
  2008-01-02 21:03 ` pault at gcc dot gnu dot org
                   ` (21 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: jb at gcc dot gnu dot org @ 2007-06-27 14:49 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #19 from jb at gcc dot gnu dot org  2007-06-27 14:49 -------
gfortran does inline most array intrinsics, but only if the result is a scalar.
For most array intrinsics this isn't that much of a problem since usually one
uses the variant that returns a scalar, but MINLOC is different in that usually
one wants to use the version that returns an array. If one implements this I
guess it would be straightforward to replicate the solution to many other array
intrinsics as well.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (18 preceding siblings ...)
  2007-06-27 14:49 ` jb at gcc dot gnu dot org
@ 2008-01-02 21:03 ` pault at gcc dot gnu dot org
  2008-01-02 21:07 ` dominiq at lps dot ens dot fr
                   ` (20 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: pault at gcc dot gnu dot org @ 2008-01-02 21:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #20 from pault at gcc dot gnu dot org  2008-01-02 19:30 -------
(In reply to comment #19)
> gfortran does inline most array intrinsics, but only if the result is a scalar.
> For most array intrinsics this isn't that much of a problem since usually one
> uses the variant that returns a scalar, but MINLOC is different in that usually
> one wants to use the version that returns an array. If one implements this I
> guess it would be straightforward to replicate the solution to many other array
> intrinsics as well.
> 
Janne,

In contemplating what to do with gfortran in the New Year, I have been mulling
over in-lining of array intrinsics; MATMUL is one distinctly possible one, as
are MINLOC and MAXLOC.

Cheers

Paul


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (19 preceding siblings ...)
  2008-01-02 21:03 ` pault at gcc dot gnu dot org
@ 2008-01-02 21:07 ` dominiq at lps dot ens dot fr
  2009-07-03 10:00 ` rguenth at gcc dot gnu dot org
                   ` (19 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: dominiq at lps dot ens dot fr @ 2008-01-02 21:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #21 from dominiq at lps dot ens dot fr  2008-01-02 20:27 -------
> MATMUL is one distinctly possible one

Paul,
If you are interested, I have a variant of induct.f90 in which I have replaced
three dot-products by the matrix-vector product for a total disaster on all
compilers I have tried (for gfortran from 66s to 196s).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (20 preceding siblings ...)
  2008-01-02 21:07 ` dominiq at lps dot ens dot fr
@ 2009-07-03 10:00 ` rguenth at gcc dot gnu dot org
  2009-07-03 12:19 ` rguenth at gcc dot gnu dot org
                   ` (18 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-07-03 10:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #22 from rguenth at gcc dot gnu dot org  2009-07-03 10:00 -------
One issue is that

     ISET = MINLOC (DTEMP)

will cause GCC to assume that DTEMP is clobbered.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (21 preceding siblings ...)
  2009-07-03 10:00 ` rguenth at gcc dot gnu dot org
@ 2009-07-03 12:19 ` rguenth at gcc dot gnu dot org
  2009-07-03 12:41 ` burnus at gcc dot gnu dot org
                   ` (17 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-07-03 12:19 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #23 from rguenth at gcc dot gnu dot org  2009-07-03 12:19 -------
We are not able to vectorize the loop in

program main
  implicit none
  integer, volatile, dimension(1) :: n
  real, allocatable :: a(:)
  integer :: i
  real :: t1, t2

  allocate (a(100))
  call random_number(a) ! negligible time
  !call cpu_time(t1)
  do i=1, 10000000
    n = minloc(a, dim=1)
  end do
  !call cpu_time(t2)
 print *, n !, t2-t1
end program main

because there are two reductions in that loop which I think the vectorizer
cannot handle:

<bb 7>:
  # pos.0_31 = PHI <pos.0_3(10), 0(6)>
  # limit.2_8 = PHI <limit.2_5(10),
3.4028234663852885981170418348451692544e+38(6)>
  # S.3_74 = PHI <S.3_33(10), pretmp.22_79(6)>
  D.1593_21 = S.3_74 + pretmp.22_77;
  limit.2_22 = (*D.1568_14)[D.1593_21];
  D.1595_23 = limit.2_22 < limit.2_8;
  D.1596_24 = pos.0_31 == 0;
  D.1597_27 = limit.2_8 == limit.2_22;
  D.1598_28 = D.1597_27 & D.1596_24;
  D.1599_29 = D.1595_23 | D.1598_28;
  pos.0_32 = S.3_74 + pretmp.28_90;
  pos.0_3 = [cond_expr] D.1599_29 ? pos.0_32 : pos.0_31;
  limit.2_5 = [cond_expr] D.1599_29 ? limit.2_22 : limit.2_8;
  S.3_33 = S.3_74 + 1;
  if (S.3_33 > pretmp.22_81)
    goto <bb 11>;
  else
    goto <bb 10>;

<bb 10>:
  goto <bb 7>;

we reduce over limit.2_5 and pos.0_3.  The intel compiler vectorizes
the function just fine.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at il dot ibm dot com,
                   |                            |rguenth at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (22 preceding siblings ...)
  2009-07-03 12:19 ` rguenth at gcc dot gnu dot org
@ 2009-07-03 12:41 ` burnus at gcc dot gnu dot org
  2009-07-03 12:57 ` rguenth at gcc dot gnu dot org
                   ` (16 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: burnus at gcc dot gnu dot org @ 2009-07-03 12:41 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #24 from burnus at gcc dot gnu dot org  2009-07-03 12:40 -------
> One issue is that
>      ISET = MINLOC (DTEMP)
> will cause GCC to assume that DTEMP is clobbered.

The problem is that while "MINLOC" is pure, we cannot use DECL_PURE_P as the
result is passed by reference:
  (void)  minloc(&isset, DTEMP)
                 ^^^^^^--- result


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (23 preceding siblings ...)
  2009-07-03 12:41 ` burnus at gcc dot gnu dot org
@ 2009-07-03 12:57 ` rguenth at gcc dot gnu dot org
  2009-07-03 13:07 ` burnus at gcc dot gnu dot org
                   ` (15 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-07-03 12:57 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #25 from rguenth at gcc dot gnu dot org  2009-07-03 12:57 -------
Btw, the inlined minloc

              D.1570 = a.dim[0].lbound;
              D.1571 = a.dim[0].ubound;
              pos.0 = 0;
              {
                integer(kind=8) S.3;

                 ({ S.3 = D.1570;
                while (1)
                  {
                     ({ if (S.3 > D.1571) goto L.3;
                    offset.1 = 1 - D.1570;
                    if ((*D.1568)[S.3 + D.1569] < limit.2 || pos.0 == 0 &&
(*D.1568)[S.3 + D.1569] == limit.2)
                      {
                         ({ limit.2 = (*D.1568)[S.3 + D.1569];
                        pos.0 = S.3 + offset.1; }) void
                      }
                    S.3 = S.3 + 1; }) void
                  }
                L.3:; }) void

has a superfluous check || (pos.0 == 0 && (*D.1568)[S.3 + D.1569] == limit.2)
at least for flag_finite_math_only.  If the array cannot contain Inf or NaN
then it either has all elements == FLT_MAX, so pos will stay zero, or at
least one is less than FLT_MAX in which case pos will be adjusted anyway.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (24 preceding siblings ...)
  2009-07-03 12:57 ` rguenth at gcc dot gnu dot org
@ 2009-07-03 13:07 ` burnus at gcc dot gnu dot org
  2009-07-05  6:48 ` irar at il dot ibm dot com
                   ` (14 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: burnus at gcc dot gnu dot org @ 2009-07-03 13:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #26 from burnus at gcc dot gnu dot org  2009-07-03 13:07 -------
> has a superfluous check || (pos.0 == 0 && (*D.1568)[S.3 + D.1569] == limit.2)
> at least for flag_finite_math_only.  If the array cannot contain Inf or NaN
> then it either has all elements == FLT_MAX, so pos will stay zero, or at
> least one is less than FLT_MAX in which case pos will be adjusted anyway.

I have not checked whether algorithm requires the check; NaN/Inf are possible,
but maybe the check is still not needed. And if it is, one could enclose it in
a if(!flag_finite_math_only) condition.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (25 preceding siblings ...)
  2009-07-03 13:07 ` burnus at gcc dot gnu dot org
@ 2009-07-05  6:48 ` irar at il dot ibm dot com
  2009-07-20 12:04 ` irar at il dot ibm dot com
                   ` (13 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: irar at il dot ibm dot com @ 2009-07-05  6:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #27 from irar at il dot ibm dot com  2009-07-05 06:48 -------
(In reply to comment #23)
> because there are two reductions in that loop which I think the vectorizer
> cannot handle:

Actually, the vectorizer can vectorize two reductions. I think, the problem is
in cond_expr in reduction:

>   pos.0_3 = [cond_expr] D.1599_29 ? pos.0_32 : pos.0_31;
>   limit.2_5 = [cond_expr] D.1599_29 ? limit.2_22 : limit.2_8;

I'll look into it.

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (26 preceding siblings ...)
  2009-07-05  6:48 ` irar at il dot ibm dot com
@ 2009-07-20 12:04 ` irar at il dot ibm dot com
  2009-07-24  7:57 ` jakub at gcc dot gnu dot org
                   ` (12 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: irar at il dot ibm dot com @ 2009-07-20 12:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #28 from irar at il dot ibm dot com  2009-07-20 12:03 -------
I've just committed a patch that adds support of cond_expr in reductions in
nested cycles (http://gcc.gnu.org/ml/gcc-patches/2009-07/msg01124.html). 

cond_expr cannot be vectorized in reduction of inner-most loop, because such
reduction changes the order of computation, and that cannot be done for
cond_expr.

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (27 preceding siblings ...)
  2009-07-20 12:04 ` irar at il dot ibm dot com
@ 2009-07-24  7:57 ` jakub at gcc dot gnu dot org
  2009-07-24  8:20 ` burnus at gcc dot gnu dot org
                   ` (11 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: jakub at gcc dot gnu dot org @ 2009-07-24  7:57 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #29 from jakub at gcc dot gnu dot org  2009-07-24 07:57 -------
Subject: Bug 31067

Author: jakub
Date: Fri Jul 24 07:57:13 2009
New Revision: 150041

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=150041
Log:
        PR fortran/40643
        PR fortran/31067
        * trans-intrinsic.c (gfc_conv_intrinsic_minmaxloc,
        gfc_conv_intrinsic_minmaxval): Handle Infinities and NaNs properly,
        optimize.
        * trans-array.c (gfc_trans_scalarized_loop_end): No longer static.
        * trans-array.h (gfc_trans_scalarized_loop_end): New prototype.

        * libgfortran.h (GFC_REAL_4_INFINITY, GFC_REAL_8_INFINITY,
        GFC_REAL_10_INFINITY, GFC_REAL_16_INFINITY, GFC_REAL_4_QUIET_NAN,
        GFC_REAL_8_QUIET_NAN, GFC_REAL_10_QUIET_NAN, GFC_REAL_16_QUIET_NAN):
        Define.
        * m4/iparm.m4 (atype_inf, atype_nan): Define.
        * m4/ifunction.m4: Formatting.
        * m4/iforeach.m4: Likewise.
        (START_FOREACH_FUNCTION): Initialize dest to all 1s, not all 0s.
        (START_FOREACH_BLOCK, FINISH_FOREACH_FUNCTION,
        FINISH_MASKED_FOREACH_FUNCTION): Run foreach block inside a loop
        until count[0] == extent[0].
        * m4/minval.m4: Formatting.  Handle NaNs and infinities.  Optimize.
        * m4/maxval.m4: Likewise.
        * m4/minloc0.m4: Likewise.
        * m4/maxloc0.m4: Likewise.
        * m4/minloc1.m4: Likewise.
        * m4/maxloc1.m4: Likewise.
        * generated/maxloc0_16_i16.c: Regenerated.
        * generated/maxloc0_16_i1.c: Likewise.
        * generated/maxloc0_16_i2.c: Likewise.
        * generated/maxloc0_16_i4.c: Likewise.
        * generated/maxloc0_16_i8.c: Likewise.
        * generated/maxloc0_16_r10.c: Likewise.
        * generated/maxloc0_16_r16.c: Likewise.
        * generated/maxloc0_16_r4.c: Likewise.
        * generated/maxloc0_16_r8.c: Likewise.
        * generated/maxloc0_4_i16.c: Likewise.
        * generated/maxloc0_4_i1.c: Likewise.
        * generated/maxloc0_4_i2.c: Likewise.
        * generated/maxloc0_4_i4.c: Likewise.
        * generated/maxloc0_4_i8.c: Likewise.
        * generated/maxloc0_4_r10.c: Likewise.
        * generated/maxloc0_4_r16.c: Likewise.
        * generated/maxloc0_4_r4.c: Likewise.
        * generated/maxloc0_4_r8.c: Likewise.
        * generated/maxloc0_8_i16.c: Likewise.
        * generated/maxloc0_8_i1.c: Likewise.
        * generated/maxloc0_8_i2.c: Likewise.
        * generated/maxloc0_8_i4.c: Likewise.
        * generated/maxloc0_8_i8.c: Likewise.
        * generated/maxloc0_8_r10.c: Likewise.
        * generated/maxloc0_8_r16.c: Likewise.
        * generated/maxloc0_8_r4.c: Likewise.
        * generated/maxloc0_8_r8.c: Likewise.
        * generated/maxloc1_16_i16.c: Likewise.
        * generated/maxloc1_16_i1.c: Likewise.
        * generated/maxloc1_16_i2.c: Likewise.
        * generated/maxloc1_16_i4.c: Likewise.
        * generated/maxloc1_16_i8.c: Likewise.
        * generated/maxloc1_16_r10.c: Likewise.
        * generated/maxloc1_16_r16.c: Likewise.
        * generated/maxloc1_16_r4.c: Likewise.
        * generated/maxloc1_16_r8.c: Likewise.
        * generated/maxloc1_4_i16.c: Likewise.
        * generated/maxloc1_4_i1.c: Likewise.
        * generated/maxloc1_4_i2.c: Likewise.
        * generated/maxloc1_4_i4.c: Likewise.
        * generated/maxloc1_4_i8.c: Likewise.
        * generated/maxloc1_4_r10.c: Likewise.
        * generated/maxloc1_4_r16.c: Likewise.
        * generated/maxloc1_4_r4.c: Likewise.
        * generated/maxloc1_4_r8.c: Likewise.
        * generated/maxloc1_8_i16.c: Likewise.
        * generated/maxloc1_8_i1.c: Likewise.
        * generated/maxloc1_8_i2.c: Likewise.
        * generated/maxloc1_8_i4.c: Likewise.
        * generated/maxloc1_8_i8.c: Likewise.
        * generated/maxloc1_8_r10.c: Likewise.
        * generated/maxloc1_8_r16.c: Likewise.
        * generated/maxloc1_8_r4.c: Likewise.
        * generated/maxloc1_8_r8.c: Likewise.
        * generated/maxval_i16.c: Likewise.
        * generated/maxval_i1.c: Likewise.
        * generated/maxval_i2.c: Likewise.
        * generated/maxval_i4.c: Likewise.
        * generated/maxval_i8.c: Likewise.
        * generated/maxval_r10.c: Likewise.
        * generated/maxval_r16.c: Likewise.
        * generated/maxval_r4.c: Likewise.
        * generated/maxval_r8.c: Likewise.
        * generated/minloc0_16_i16.c: Likewise.
        * generated/minloc0_16_i1.c: Likewise.
        * generated/minloc0_16_i2.c: Likewise.
        * generated/minloc0_16_i4.c: Likewise.
        * generated/minloc0_16_i8.c: Likewise.
        * generated/minloc0_16_r10.c: Likewise.
        * generated/minloc0_16_r16.c: Likewise.
        * generated/minloc0_16_r4.c: Likewise.
        * generated/minloc0_16_r8.c: Likewise.
        * generated/minloc0_4_i16.c: Likewise.
        * generated/minloc0_4_i1.c: Likewise.
        * generated/minloc0_4_i2.c: Likewise.
        * generated/minloc0_4_i4.c: Likewise.
        * generated/minloc0_4_i8.c: Likewise.
        * generated/minloc0_4_r10.c: Likewise.
        * generated/minloc0_4_r16.c: Likewise.
        * generated/minloc0_4_r4.c: Likewise.
        * generated/minloc0_4_r8.c: Likewise.
        * generated/minloc0_8_i16.c: Likewise.
        * generated/minloc0_8_i1.c: Likewise.
        * generated/minloc0_8_i2.c: Likewise.
        * generated/minloc0_8_i4.c: Likewise.
        * generated/minloc0_8_i8.c: Likewise.
        * generated/minloc0_8_r10.c: Likewise.
        * generated/minloc0_8_r16.c: Likewise.
        * generated/minloc0_8_r4.c: Likewise.
        * generated/minloc0_8_r8.c: Likewise.
        * generated/minloc1_16_i16.c: Likewise.
        * generated/minloc1_16_i1.c: Likewise.
        * generated/minloc1_16_i2.c: Likewise.
        * generated/minloc1_16_i4.c: Likewise.
        * generated/minloc1_16_i8.c: Likewise.
        * generated/minloc1_16_r10.c: Likewise.
        * generated/minloc1_16_r16.c: Likewise.
        * generated/minloc1_16_r4.c: Likewise.
        * generated/minloc1_16_r8.c: Likewise.
        * generated/minloc1_4_i16.c: Likewise.
        * generated/minloc1_4_i1.c: Likewise.
        * generated/minloc1_4_i2.c: Likewise.
        * generated/minloc1_4_i4.c: Likewise.
        * generated/minloc1_4_i8.c: Likewise.
        * generated/minloc1_4_r10.c: Likewise.
        * generated/minloc1_4_r16.c: Likewise.
        * generated/minloc1_4_r4.c: Likewise.
        * generated/minloc1_4_r8.c: Likewise.
        * generated/minloc1_8_i16.c: Likewise.
        * generated/minloc1_8_i1.c: Likewise.
        * generated/minloc1_8_i2.c: Likewise.
        * generated/minloc1_8_i4.c: Likewise.
        * generated/minloc1_8_i8.c: Likewise.
        * generated/minloc1_8_r10.c: Likewise.
        * generated/minloc1_8_r16.c: Likewise.
        * generated/minloc1_8_r4.c: Likewise.
        * generated/minloc1_8_r8.c: Likewise.
        * generated/minval_i16.c: Likewise.
        * generated/minval_i1.c: Likewise.
        * generated/minval_i2.c: Likewise.
        * generated/minval_i4.c: Likewise.
        * generated/minval_i8.c: Likewise.
        * generated/minval_r10.c: Likewise.
        * generated/minval_r16.c: Likewise.
        * generated/minval_r4.c: Likewise.
        * generated/minval_r8.c: Likewise.
        * generated/product_c10.c: Likewise.
        * generated/product_c16.c: Likewise.
        * generated/product_c4.c: Likewise.
        * generated/product_c8.c: Likewise.
        * generated/product_i16.c: Likewise.
        * generated/product_i1.c: Likewise.
        * generated/product_i2.c: Likewise.
        * generated/product_i4.c: Likewise.
        * generated/product_i8.c: Likewise.
        * generated/product_r10.c: Likewise.
        * generated/product_r16.c: Likewise.
        * generated/product_r4.c: Likewise.
        * generated/product_r8.c: Likewise.
        * generated/sum_c10.c: Likewise.
        * generated/sum_c16.c: Likewise.
        * generated/sum_c4.c: Likewise.
        * generated/sum_c8.c: Likewise.
        * generated/sum_i16.c: Likewise.
        * generated/sum_i1.c: Likewise.
        * generated/sum_i2.c: Likewise.
        * generated/sum_i4.c: Likewise.
        * generated/sum_i8.c: Likewise.
        * generated/sum_r10.c: Likewise.
        * generated/sum_r16.c: Likewise.
        * generated/sum_r4.c: Likewise.
        * generated/sum_r8.c: Likewise.

        * gfortran.dg/maxlocval_2.f90: New test.
        * gfortran.dg/maxlocval_3.f90: New test.
        * gfortran.dg/maxlocval_4.f90: New test.
        * gfortran.dg/minlocval_1.f90: New test.
        * gfortran.dg/minlocval_2.f90: New test.
        * gfortran.dg/minlocval_3.f90: New test.
        * gfortran.dg/minlocval_4.f90: New test.

Added:
    trunk/gcc/testsuite/gfortran.dg/maxlocval_2.f90
    trunk/gcc/testsuite/gfortran.dg/maxlocval_3.f90
    trunk/gcc/testsuite/gfortran.dg/maxlocval_4.f90
    trunk/gcc/testsuite/gfortran.dg/minlocval_1.f90
    trunk/gcc/testsuite/gfortran.dg/minlocval_2.f90
    trunk/gcc/testsuite/gfortran.dg/minlocval_3.f90
    trunk/gcc/testsuite/gfortran.dg/minlocval_4.f90
Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/trans-array.c
    trunk/gcc/fortran/trans-array.h
    trunk/gcc/fortran/trans-intrinsic.c
    trunk/gcc/testsuite/ChangeLog
    trunk/libgfortran/ChangeLog
    trunk/libgfortran/generated/maxloc0_16_i1.c
    trunk/libgfortran/generated/maxloc0_16_i16.c
    trunk/libgfortran/generated/maxloc0_16_i2.c
    trunk/libgfortran/generated/maxloc0_16_i4.c
    trunk/libgfortran/generated/maxloc0_16_i8.c
    trunk/libgfortran/generated/maxloc0_16_r10.c
    trunk/libgfortran/generated/maxloc0_16_r16.c
    trunk/libgfortran/generated/maxloc0_16_r4.c
    trunk/libgfortran/generated/maxloc0_16_r8.c
    trunk/libgfortran/generated/maxloc0_4_i1.c
    trunk/libgfortran/generated/maxloc0_4_i16.c
    trunk/libgfortran/generated/maxloc0_4_i2.c
    trunk/libgfortran/generated/maxloc0_4_i4.c
    trunk/libgfortran/generated/maxloc0_4_i8.c
    trunk/libgfortran/generated/maxloc0_4_r10.c
    trunk/libgfortran/generated/maxloc0_4_r16.c
    trunk/libgfortran/generated/maxloc0_4_r4.c
    trunk/libgfortran/generated/maxloc0_4_r8.c
    trunk/libgfortran/generated/maxloc0_8_i1.c
    trunk/libgfortran/generated/maxloc0_8_i16.c
    trunk/libgfortran/generated/maxloc0_8_i2.c
    trunk/libgfortran/generated/maxloc0_8_i4.c
    trunk/libgfortran/generated/maxloc0_8_i8.c
    trunk/libgfortran/generated/maxloc0_8_r10.c
    trunk/libgfortran/generated/maxloc0_8_r16.c
    trunk/libgfortran/generated/maxloc0_8_r4.c
    trunk/libgfortran/generated/maxloc0_8_r8.c
    trunk/libgfortran/generated/maxloc1_16_i1.c
    trunk/libgfortran/generated/maxloc1_16_i16.c
    trunk/libgfortran/generated/maxloc1_16_i2.c
    trunk/libgfortran/generated/maxloc1_16_i4.c
    trunk/libgfortran/generated/maxloc1_16_i8.c
    trunk/libgfortran/generated/maxloc1_16_r10.c
    trunk/libgfortran/generated/maxloc1_16_r16.c
    trunk/libgfortran/generated/maxloc1_16_r4.c
    trunk/libgfortran/generated/maxloc1_16_r8.c
    trunk/libgfortran/generated/maxloc1_4_i1.c
    trunk/libgfortran/generated/maxloc1_4_i16.c
    trunk/libgfortran/generated/maxloc1_4_i2.c
    trunk/libgfortran/generated/maxloc1_4_i4.c
    trunk/libgfortran/generated/maxloc1_4_i8.c
    trunk/libgfortran/generated/maxloc1_4_r10.c
    trunk/libgfortran/generated/maxloc1_4_r16.c
    trunk/libgfortran/generated/maxloc1_4_r4.c
    trunk/libgfortran/generated/maxloc1_4_r8.c
    trunk/libgfortran/generated/maxloc1_8_i1.c
    trunk/libgfortran/generated/maxloc1_8_i16.c
    trunk/libgfortran/generated/maxloc1_8_i2.c
    trunk/libgfortran/generated/maxloc1_8_i4.c
    trunk/libgfortran/generated/maxloc1_8_i8.c
    trunk/libgfortran/generated/maxloc1_8_r10.c
    trunk/libgfortran/generated/maxloc1_8_r16.c
    trunk/libgfortran/generated/maxloc1_8_r4.c
    trunk/libgfortran/generated/maxloc1_8_r8.c
    trunk/libgfortran/generated/maxval_i1.c
    trunk/libgfortran/generated/maxval_i16.c
    trunk/libgfortran/generated/maxval_i2.c
    trunk/libgfortran/generated/maxval_i4.c
    trunk/libgfortran/generated/maxval_i8.c
    trunk/libgfortran/generated/maxval_r10.c
    trunk/libgfortran/generated/maxval_r16.c
    trunk/libgfortran/generated/maxval_r4.c
    trunk/libgfortran/generated/maxval_r8.c
    trunk/libgfortran/generated/minloc0_16_i1.c
    trunk/libgfortran/generated/minloc0_16_i16.c
    trunk/libgfortran/generated/minloc0_16_i2.c
    trunk/libgfortran/generated/minloc0_16_i4.c
    trunk/libgfortran/generated/minloc0_16_i8.c
    trunk/libgfortran/generated/minloc0_16_r10.c
    trunk/libgfortran/generated/minloc0_16_r16.c
    trunk/libgfortran/generated/minloc0_16_r4.c
    trunk/libgfortran/generated/minloc0_16_r8.c
    trunk/libgfortran/generated/minloc0_4_i1.c
    trunk/libgfortran/generated/minloc0_4_i16.c
    trunk/libgfortran/generated/minloc0_4_i2.c
    trunk/libgfortran/generated/minloc0_4_i4.c
    trunk/libgfortran/generated/minloc0_4_i8.c
    trunk/libgfortran/generated/minloc0_4_r10.c
    trunk/libgfortran/generated/minloc0_4_r16.c
    trunk/libgfortran/generated/minloc0_4_r4.c
    trunk/libgfortran/generated/minloc0_4_r8.c
    trunk/libgfortran/generated/minloc0_8_i1.c
    trunk/libgfortran/generated/minloc0_8_i16.c
    trunk/libgfortran/generated/minloc0_8_i2.c
    trunk/libgfortran/generated/minloc0_8_i4.c
    trunk/libgfortran/generated/minloc0_8_i8.c
    trunk/libgfortran/generated/minloc0_8_r10.c
    trunk/libgfortran/generated/minloc0_8_r16.c
    trunk/libgfortran/generated/minloc0_8_r4.c
    trunk/libgfortran/generated/minloc0_8_r8.c
    trunk/libgfortran/generated/minloc1_16_i1.c
    trunk/libgfortran/generated/minloc1_16_i16.c
    trunk/libgfortran/generated/minloc1_16_i2.c
    trunk/libgfortran/generated/minloc1_16_i4.c
    trunk/libgfortran/generated/minloc1_16_i8.c
    trunk/libgfortran/generated/minloc1_16_r10.c
    trunk/libgfortran/generated/minloc1_16_r16.c
    trunk/libgfortran/generated/minloc1_16_r4.c
    trunk/libgfortran/generated/minloc1_16_r8.c
    trunk/libgfortran/generated/minloc1_4_i1.c
    trunk/libgfortran/generated/minloc1_4_i16.c
    trunk/libgfortran/generated/minloc1_4_i2.c
    trunk/libgfortran/generated/minloc1_4_i4.c
    trunk/libgfortran/generated/minloc1_4_i8.c
    trunk/libgfortran/generated/minloc1_4_r10.c
    trunk/libgfortran/generated/minloc1_4_r16.c
    trunk/libgfortran/generated/minloc1_4_r4.c
    trunk/libgfortran/generated/minloc1_4_r8.c
    trunk/libgfortran/generated/minloc1_8_i1.c
    trunk/libgfortran/generated/minloc1_8_i16.c
    trunk/libgfortran/generated/minloc1_8_i2.c
    trunk/libgfortran/generated/minloc1_8_i4.c
    trunk/libgfortran/generated/minloc1_8_i8.c
    trunk/libgfortran/generated/minloc1_8_r10.c
    trunk/libgfortran/generated/minloc1_8_r16.c
    trunk/libgfortran/generated/minloc1_8_r4.c
    trunk/libgfortran/generated/minloc1_8_r8.c
    trunk/libgfortran/generated/minval_i1.c
    trunk/libgfortran/generated/minval_i16.c
    trunk/libgfortran/generated/minval_i2.c
    trunk/libgfortran/generated/minval_i4.c
    trunk/libgfortran/generated/minval_i8.c
    trunk/libgfortran/generated/minval_r10.c
    trunk/libgfortran/generated/minval_r16.c
    trunk/libgfortran/generated/minval_r4.c
    trunk/libgfortran/generated/minval_r8.c
    trunk/libgfortran/generated/product_c10.c
    trunk/libgfortran/generated/product_c16.c
    trunk/libgfortran/generated/product_c4.c
    trunk/libgfortran/generated/product_c8.c
    trunk/libgfortran/generated/product_i1.c
    trunk/libgfortran/generated/product_i16.c
    trunk/libgfortran/generated/product_i2.c
    trunk/libgfortran/generated/product_i4.c
    trunk/libgfortran/generated/product_i8.c
    trunk/libgfortran/generated/product_r10.c
    trunk/libgfortran/generated/product_r16.c
    trunk/libgfortran/generated/product_r4.c
    trunk/libgfortran/generated/product_r8.c
    trunk/libgfortran/generated/sum_c10.c
    trunk/libgfortran/generated/sum_c16.c
    trunk/libgfortran/generated/sum_c4.c
    trunk/libgfortran/generated/sum_c8.c
    trunk/libgfortran/generated/sum_i1.c
    trunk/libgfortran/generated/sum_i16.c
    trunk/libgfortran/generated/sum_i2.c
    trunk/libgfortran/generated/sum_i4.c
    trunk/libgfortran/generated/sum_i8.c
    trunk/libgfortran/generated/sum_r10.c
    trunk/libgfortran/generated/sum_r16.c
    trunk/libgfortran/generated/sum_r4.c
    trunk/libgfortran/generated/sum_r8.c
    trunk/libgfortran/libgfortran.h
    trunk/libgfortran/m4/iforeach.m4
    trunk/libgfortran/m4/ifunction.m4
    trunk/libgfortran/m4/iparm.m4
    trunk/libgfortran/m4/maxloc0.m4
    trunk/libgfortran/m4/maxloc1.m4
    trunk/libgfortran/m4/maxval.m4
    trunk/libgfortran/m4/minloc0.m4
    trunk/libgfortran/m4/minloc1.m4
    trunk/libgfortran/m4/minval.m4


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (28 preceding siblings ...)
  2009-07-24  7:57 ` jakub at gcc dot gnu dot org
@ 2009-07-24  8:20 ` burnus at gcc dot gnu dot org
  2009-07-24  8:30 ` jakub at gcc dot gnu dot org
                   ` (10 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: burnus at gcc dot gnu dot org @ 2009-07-24  8:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #30 from burnus at gcc dot gnu dot org  2009-07-24 08:19 -------
Regarding the just committed inline version: It would be interesting to know
whether it is vectorizable (with/without -ffinite-math-only [i.e.
-ffast-math]).

Additionally, for size-1 result arrays, the function should be inlined except
for -O0 and -Os. That affects especially {min,max}loc as for {min,max}val this
looks like a very special case.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (29 preceding siblings ...)
  2009-07-24  8:20 ` burnus at gcc dot gnu dot org
@ 2009-07-24  8:30 ` jakub at gcc dot gnu dot org
  2009-07-26  7:48 ` irar at il dot ibm dot com
                   ` (9 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: jakub at gcc dot gnu dot org @ 2009-07-24  8:30 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #31 from jakub at gcc dot gnu dot org  2009-07-24 08:30 -------
Vectorization questions I'll defer to Ira.

For !optimize I even had a change to forcibly use the function call instead of
inline version.  But it didn't really work, as there are only array versions of
the library functions.  In case of minloc/maxloc we could just set up a fake
array descriptor for one element array and call the library functions (the
*loc0 ones).  But minval/maxval library functions only handle DIM=N cases, not
finding a minimum/maximum of the whole array.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (30 preceding siblings ...)
  2009-07-24  8:30 ` jakub at gcc dot gnu dot org
@ 2009-07-26  7:48 ` irar at il dot ibm dot com
  2009-07-26  9:50 ` burnus at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: irar at il dot ibm dot com @ 2009-07-26  7:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #32 from irar at il dot ibm dot com  2009-07-26 07:48 -------
(In reply to comment #30)
> Regarding the just committed inline version: It would be interesting to know
> whether it is vectorizable (with/without -ffinite-math-only [i.e.
> -ffast-math]).

It depends on where it is inlined. It has to be vectorized in outer loop (see
my previous comment), so it needs another loop around it.

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (31 preceding siblings ...)
  2009-07-26  7:48 ` irar at il dot ibm dot com
@ 2009-07-26  9:50 ` burnus at gcc dot gnu dot org
  2009-07-27  8:36 ` irar at il dot ibm dot com
                   ` (7 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: burnus at gcc dot gnu dot org @ 2009-07-26  9:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #33 from burnus at gcc dot gnu dot org  2009-07-26 09:50 -------
(In reply to comment #32)
> > Regarding the just committed inline version: It would be interesting to know
> > whether it is vectorizable (with/without -ffinite-math-only [i.e.
> > -ffast-math]).
> 
> It depends on where it is inlined. It has to be vectorized in outer loop (see
> my previous comment), so it needs another loop around it.

Using the example from comment 23 with
a) gfortran -O3 -ffast-math -march=native -ftree-vectorize
-ftree-vectorizer-verbose=5
b) ifort -O3 -xHost -diag-enable all

ifort shows: test.f90(12): (col. 9) remark: LOOP WAS VECTORIZED.
and needs 1.476s.

gfortran shows: test.f90:12: note: not vectorized: unsupported use in stmt.
and needs 2.272s. (By comparison. 4.4 needs 3.688s.)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (32 preceding siblings ...)
  2009-07-26  9:50 ` burnus at gcc dot gnu dot org
@ 2009-07-27  8:36 ` irar at il dot ibm dot com
  2009-07-27  9:18 ` burnus at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: irar at il dot ibm dot com @ 2009-07-27  8:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #34 from irar at il dot ibm dot com  2009-07-27 08:36 -------
(In reply to comment #33)
> Using the example from comment 23 with
...
> gfortran shows: test.f90:12: note: not vectorized: unsupported use in stmt.
> and needs 2.272s. (By comparison. 4.4 needs 3.688s.)

This is for the inner loop vectorization. For the outer loop we get:
tmp.f90:11: note: not vectorized: control flow in loop.
because of the if's. Maybe loop unswitching can help us. 
Vectorizable outer-loops look like this:

                        (pre-header)
                           |
                          header <---+
                           |         |
                          inner-loop |
                           |         |
                          tail ------+
                           |
                        (exit-bb)


Does ifort vectorize the exact same implemantion of minloc?

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (33 preceding siblings ...)
  2009-07-27  8:36 ` irar at il dot ibm dot com
@ 2009-07-27  9:18 ` burnus at gcc dot gnu dot org
  2009-07-27 11:03 ` jakub at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: burnus at gcc dot gnu dot org @ 2009-07-27  9:18 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #35 from burnus at gcc dot gnu dot org  2009-07-27 09:18 -------
(In reply to comment #34)
> Does ifort vectorize the exact same implemantion of minloc?

I tried to convert the minloc implementation into Fortran loops - and the
result is at
http://users.physik.fu-berlin.de/~tburnus/tmp/vect-PR31067/maxloc2.f90

$ ifort -O3 -xHost -diag-enable all maxloc2.f90
maxloc2.f90(25): (col. 5) remark: LOOP WAS VECTORIZED.
[timing: 0m1.384s]

$ gfortran -O3 -ffast-math -march=native -ftree-vectorize
-ftree-vectorizer-verbose=5 maxloc2.f90
maxloc2.f90:1: note: vectorized 0 loops in function.
[timing: 0m2.212s]

In case it helps: I put the ifort assembler output at:
http://users.physik.fu-berlin.de/~tburnus/tmp/vect-PR31067/maxloc2.s


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (34 preceding siblings ...)
  2009-07-27  9:18 ` burnus at gcc dot gnu dot org
@ 2009-07-27 11:03 ` jakub at gcc dot gnu dot org
  2009-07-27 11:10 ` jakub at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: jakub at gcc dot gnu dot org @ 2009-07-27 11:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #36 from jakub at gcc dot gnu dot org  2009-07-27 11:02 -------
Here is the loop in C and vectorized by hand as well:
#include <emmintrin.h>

float arr[1024];

unsigned int
foo (unsigned int end)
{
  unsigned int pos = 1;
  unsigned int i;
  float limit = __FLT_MAX__;
  for (i = 0; i < end; i++)
    if (arr[i] < limit)
      {
limit = arr[i];
pos = i + 1;
      }
  return pos;
}

unsigned int
bar (unsigned int end)
{
  __m128 pos = (__m128) _mm_set1_epi32 (1);
  __m128 limit = _mm_set1_ps (__FLT_MAX__);
  __m128i curi = _mm_set_epi32 (4, 3, 2, 1);
  __m128i inc = _mm_set1_epi32 (4);
  unsigned int i = 0;
  if (end > 4)
    {
      for (; i < end - 4; i += 4)
{
  __m128 val = _mm_loadu_ps (arr + i);
  __m128 mask = _mm_cmplt_ps (val, limit);
  limit = _mm_min_ps (limit, val);
  pos = _mm_andnot_ps (mask, pos);
  pos = _mm_or_ps (pos, _mm_and_ps (mask, (__m128) curi));
  curi = _mm_add_epi32 (curi, inc);
}
      /* Reduction.  */
      __m128 tmp1 = _mm_movehl_ps (limit, limit);
      __m128 tmp2 = _mm_movehl_ps (pos, pos);
      __m128 mask = _mm_cmplt_ps (tmp1, limit);
      limit = _mm_min_ps (tmp1, limit);
      tmp2 = _mm_and_ps (mask, tmp2);
      pos = _mm_or_ps (tmp2, _mm_andnot_ps (mask, pos));
      tmp1 = _mm_shuffle_ps (limit, limit, _MM_SHUFFLE (1, 1, 1, 1));
      tmp2 = _mm_shuffle_ps (pos, pos, _MM_SHUFFLE (1, 1, 1, 1));
      mask = _mm_cmplt_ps (tmp1, limit);
      limit = _mm_min_ps (tmp1, limit);
      tmp2 = _mm_and_ps (mask, tmp2);
      pos = _mm_or_ps (tmp2, _mm_andnot_ps (mask, pos));
    }
  float limit_ = _mm_cvtss_f32 (limit);
  unsigned int pos_ = (unsigned int) _mm_cvtsi128_si32 ((__m128i) pos);
  for (; i < end; i++)
    if (arr[i] < limit_)
      {
limit_ = arr[i];
pos_ = i + 1;
      }
  return pos_;
}

int
main (void)
{
  unsigned int k;
  arr[0] = -1;
  arr[2] = -3;
  arr[8] = -5;
  arr[9] = -6;
  if (foo (32) != bar (32))
    __builtin_abort ();
  for (k = 10; k < 32; k++)
    {
      arr[k] = -k;
      if (foo (32) != bar (32))
        __builtin_abort ();
    }
  return 0;
}

Don't know how hard would be to vectorize this in the vectorizer, but clearly
icc manages to handle that.
The loop is:
<bb 4>:
  # pos_22 = PHI <pos_1(7), 1(3)>
  # i_23 = PHI <i_15(7), 0(3)>
  # limit_24 = PHI <limit_4(7), 3.4028234663852885981170418348451692544e+38(3)>
  limit_11 = arr[i_23];
  D.2700_12 = limit_11 < limit_24;
  pos_1 = [cond_expr] D.2700_12 ? i_23 : pos_22;
  limit_4 = [cond_expr] D.2700_12 ? limit_11 : limit_24;
  i_15 = i_23 + 1;
  D.2703_9 = (long unsigned int) i_15;
  if (D.2703_9 < end_10(D))
    goto <bb 7>;
  else
    goto <bb 8>;

<bb 7>:
  goto <bb 4>;
before vectorization.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (35 preceding siblings ...)
  2009-07-27 11:03 ` jakub at gcc dot gnu dot org
@ 2009-07-27 11:10 ` jakub at gcc dot gnu dot org
  2009-07-27 12:44 ` irar at il dot ibm dot com
                   ` (3 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: jakub at gcc dot gnu dot org @ 2009-07-27 11:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #37 from jakub at gcc dot gnu dot org  2009-07-27 11:10 -------
Oh, and on 64-bit arches and float or 32-bit arches and double there is another
complication - the comparison has different mode size from the cond_expr for
pos.  For 32-bit pos and 64-bit double it could perhaps just do the computation
in 64-bit integers (vector of 2 (resp. 4 for avx)), for the other case it would
need to shuffle the max and compute the pos in 2 vectors, or e.g. the Fortran
FE could hand in the common case, emit a likely look for the case where array
size is smaller than 4GB, using 32-bit pos zero extended to 64-bits at the end.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (36 preceding siblings ...)
  2009-07-27 11:10 ` jakub at gcc dot gnu dot org
@ 2009-07-27 12:44 ` irar at il dot ibm dot com
  2009-07-27 13:15 ` burnus at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  40 siblings, 0 replies; 47+ messages in thread
From: irar at il dot ibm dot com @ 2009-07-27 12:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #38 from irar at il dot ibm dot com  2009-07-27 12:44 -------
I am not sure that that kind of computation can be generated automatically,
since in general the order of caclulation of cond_expr cannot be changed. 

However, the loop can be split:

  for (i = 0; i < end; i++)
    if (arr[i] < limit)
      limit = arr[i];

  for (i = 0; i < end; i++)
    if (arr[i] == limit)
      {
        pos = i + 1;
        break;
      }

making the first loop vectorizable (inner-most loop vectorization).

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (37 preceding siblings ...)
  2009-07-27 12:44 ` irar at il dot ibm dot com
@ 2009-07-27 13:15 ` burnus at gcc dot gnu dot org
  2009-07-27 14:51 ` jakub at gcc dot gnu dot org
  2009-07-28  8:12 ` irar at il dot ibm dot com
  40 siblings, 0 replies; 47+ messages in thread
From: burnus at gcc dot gnu dot org @ 2009-07-27 13:15 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #39 from burnus at gcc dot gnu dot org  2009-07-27 13:15 -------
(In reply to comment #38)
> However, the loop can be split: [..]
> making the first loop vectorizable (inner-most loop vectorization).

OK. I tried it with a Fortran program:
http://users.physik.fu-berlin.de/~tburnus/tmp/vect-PR31067/maxloc.f90
http://users.physik.fu-berlin.de/~tburnus/tmp/vect-PR31067/maxloc2.f90
http://users.physik.fu-berlin.de/~tburnus/tmp/vect-PR31067/maxloc3.f90

maxloc.f90 is the program from comment 34  (maxloc.s = intel assembler)
maxloc2.f90 models what gfortran does for maxloc (maxloc.s = intel assembler)
maxloc3.f90 models what has a split loop

The splitting plus vectorization makes the calculation 5% faster - 0m2.152s
(maxloc3) vs 0m2.260s (maxloc2). Still, that's 35% more than ifort needs.

For some reason, maxloc2 with -fno-tree-vectorize takes only
0m1.840s.(Identical to intel's result for maxloc2/maxloc3. While for the
original maxloc.f90, there is no performance effect, and for maxloc3
vectorization makes it faster.)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (38 preceding siblings ...)
  2009-07-27 13:15 ` burnus at gcc dot gnu dot org
@ 2009-07-27 14:51 ` jakub at gcc dot gnu dot org
  2009-07-28  8:12 ` irar at il dot ibm dot com
  40 siblings, 0 replies; 47+ messages in thread
From: jakub at gcc dot gnu dot org @ 2009-07-27 14:51 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #40 from jakub at gcc dot gnu dot org  2009-07-27 14:51 -------
If the cond_expr compute a minimum or maximum and the other cond_exprs compute
something based on the IV at the extremum, then I don't see why it couldn't be
vectorized by computing extremes of odd/even and corresponding values based on
the IV at those points, then merging them in the final step, and similarly for
bigger vectorization steps.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
  2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
                   ` (39 preceding siblings ...)
  2009-07-27 14:51 ` jakub at gcc dot gnu dot org
@ 2009-07-28  8:12 ` irar at il dot ibm dot com
  40 siblings, 0 replies; 47+ messages in thread
From: irar at il dot ibm dot com @ 2009-07-28  8:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #41 from irar at il dot ibm dot com  2009-07-28 08:12 -------
That requires pattern recognition. MIN/MAX_EXPR are recognized by the first
phiopt pass, so MIN/MAXLOC should be either also recognized there or in the
vectorizer. (The phiopt pass transforms if clause to MIN/MAX_EXPR. The
vectorizer gets COND_EXPR after if-conversion pass).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
       [not found] <bug-31067-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2011-07-28 20:58 ` jakub at gcc dot gnu.org
@ 2011-07-29  7:13 ` burnus at gcc dot gnu.org
  4 siblings, 0 replies; 47+ messages in thread
From: burnus at gcc dot gnu.org @ 2011-07-29  7:13 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067

--- Comment #46 from Tobias Burnus <burnus at gcc dot gnu.org> 2011-07-29 07:12:47 UTC ---
(In reply to comment #45)
[Commit to inline MINLOC/MAXLOC for a rank-1 array, which returns a
single-element rank-1 array.]


On my ~5 year old Athlon64 x2, I get with "-Ofast -march=native" (and with or
without -fwhole-program) a performance improvement of 3% (10.72s -> 10.41s).


The performance should further improve, if one fuses the loops (cf. comment 42,
but also comment 34 ff.) - and if one could move the memory allocation/freeing
of the automatic-array DTEMP out of the loop (after inlining). (Recall that
with -fstack-arrays/-Ofast, automatic arrays are allocated on the stack.)


As already mentioned indirectly in comment 0 (via PR31066): If one uses
reciprocal
approximation instructions and a Newton-Rhapson step (namely: -mrecip), the
performance improves a lot: 6.895s. By comparison, with ifort (11.1) -xHost -O3
the run time is 7.319s.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
       [not found] <bug-31067-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2011-07-28 16:03 ` jakub at gcc dot gnu.org
@ 2011-07-28 20:58 ` jakub at gcc dot gnu.org
  2011-07-29  7:13 ` burnus at gcc dot gnu.org
  4 siblings, 0 replies; 47+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-07-28 20:58 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067

--- Comment #45 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-07-28 20:56:57 UTC ---
Author: jakub
Date: Thu Jul 28 20:56:50 2011
New Revision: 176897

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=176897
Log:
    PR fortran/31067
    * frontend-passes.c (optimize_minmaxloc): New function.
    (optimize_expr): Call it.

    * gfortran.dg/maxloc_2.f90: New test.
    * gfortran.dg/maxloc_3.f90: New test.
    * gfortran.dg/minloc_1.f90: New test.
    * gfortran.dg/minloc_2.f90: New test.
    * gfortran.dg/minloc_3.f90: New test.
    * gfortran.dg/minmaxloc_7.f90: New test.

Added:
    trunk/gcc/testsuite/gfortran.dg/maxloc_2.f90
    trunk/gcc/testsuite/gfortran.dg/maxloc_3.f90
    trunk/gcc/testsuite/gfortran.dg/minloc_1.f90
    trunk/gcc/testsuite/gfortran.dg/minloc_2.f90
    trunk/gcc/testsuite/gfortran.dg/minloc_3.f90
    trunk/gcc/testsuite/gfortran.dg/minmaxloc_7.f90
Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/frontend-passes.c
    trunk/gcc/testsuite/ChangeLog


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
       [not found] <bug-31067-4@http.gcc.gnu.org/bugzilla/>
  2011-07-25 15:44 ` rguenth at gcc dot gnu.org
  2011-07-28 13:52 ` jakub at gcc dot gnu.org
@ 2011-07-28 16:03 ` jakub at gcc dot gnu.org
  2011-07-28 20:58 ` jakub at gcc dot gnu.org
  2011-07-29  7:13 ` burnus at gcc dot gnu.org
  4 siblings, 0 replies; 47+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-07-28 16:03 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #24856|0                           |1
        is obsolete|                            |

--- Comment #44 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-07-28 16:00:54 UTC ---
Created attachment 24858
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24858
gcc47-pr31067.patch

Based on IRC discussions, this patch instead replaces
MINLOC (rank1)
with
/( MINLOC (rank1, DIM=1) )/
so that it should work even with allocatable LHS that should be reallocated on
assignment etc.  As an bonus, even e.g.
if (any (minloc (rank1).ne.6))
etc. can be optimized.  So far tested just with dg.exp=m*.f90


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
       [not found] <bug-31067-4@http.gcc.gnu.org/bugzilla/>
  2011-07-25 15:44 ` rguenth at gcc dot gnu.org
@ 2011-07-28 13:52 ` jakub at gcc dot gnu.org
  2011-07-28 16:03 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 47+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-07-28 13:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067

--- Comment #43 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-07-28 13:51:32 UTC ---
Created attachment 24856
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24856
gcc47-pr31067.patch

Patch to optimize a = minloc (b) for rank 1 b into a = minloc (b, dim = 1),
according to the standard the latter function is supposed to return the first
element of the array returned by former function (which should return a rank 1,
 1 element array).  So, by instead initializing the (one element) array with
the scalar we can get it actually inlined.  Doing this during genericization
looked much harder to me.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
       [not found] <bug-31067-4@http.gcc.gnu.org/bugzilla/>
@ 2011-07-25 15:44 ` rguenth at gcc dot gnu.org
  2011-07-28 13:52 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-07-25 15:44 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067

--- Comment #42 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-07-25 15:39:54 UTC ---
With gas_dyn changed to use MINLOC (DTEMP, 1) we now inline the intrinsic
(but not with MINLOC (DTEMP), even though we know it'll be a single-element
array result ...).

We completely lack a way to fuse the loops though.  Inlining the intrinsic
gives a moderate 5% speedup.


^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2011-07-29  7:13 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-07 10:20 [Bug fortran/31067] New: MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow) fxcoudert at gcc dot gnu dot org
2007-03-07 11:28 ` [Bug fortran/31067] " tkoenig at gcc dot gnu dot org
2007-03-07 12:18 ` fxcoudert at gcc dot gnu dot org
2007-03-07 21:00 ` tkoenig at gcc dot gnu dot org
2007-03-07 21:09 ` fxcoudert at gmail dot com
2007-03-07 21:10 ` tkoenig at gcc dot gnu dot org
2007-03-07 21:29 ` tkoenig at gcc dot gnu dot org
2007-03-08  5:50 ` fxcoudert at gcc dot gnu dot org
2007-03-10 12:34 ` tkoenig at gcc dot gnu dot org
2007-03-11 19:43 ` tkoenig at gcc dot gnu dot org
2007-03-12 19:04 ` pault at gcc dot gnu dot org
2007-03-13 20:12 ` tkoenig at gcc dot gnu dot org
2007-03-26 10:38 ` pault at gcc dot gnu dot org
2007-03-26 11:44 ` pault at gcc dot gnu dot org
2007-04-02 16:44 ` tkoenig at gcc dot gnu dot org
2007-04-02 20:01 ` tkoenig at gcc dot gnu dot org
2007-05-18 20:16 ` jb at gcc dot gnu dot org
2007-05-18 20:20 ` jb at gcc dot gnu dot org
2007-06-15 20:35 ` tkoenig at gcc dot gnu dot org
2007-06-27 14:49 ` jb at gcc dot gnu dot org
2008-01-02 21:03 ` pault at gcc dot gnu dot org
2008-01-02 21:07 ` dominiq at lps dot ens dot fr
2009-07-03 10:00 ` rguenth at gcc dot gnu dot org
2009-07-03 12:19 ` rguenth at gcc dot gnu dot org
2009-07-03 12:41 ` burnus at gcc dot gnu dot org
2009-07-03 12:57 ` rguenth at gcc dot gnu dot org
2009-07-03 13:07 ` burnus at gcc dot gnu dot org
2009-07-05  6:48 ` irar at il dot ibm dot com
2009-07-20 12:04 ` irar at il dot ibm dot com
2009-07-24  7:57 ` jakub at gcc dot gnu dot org
2009-07-24  8:20 ` burnus at gcc dot gnu dot org
2009-07-24  8:30 ` jakub at gcc dot gnu dot org
2009-07-26  7:48 ` irar at il dot ibm dot com
2009-07-26  9:50 ` burnus at gcc dot gnu dot org
2009-07-27  8:36 ` irar at il dot ibm dot com
2009-07-27  9:18 ` burnus at gcc dot gnu dot org
2009-07-27 11:03 ` jakub at gcc dot gnu dot org
2009-07-27 11:10 ` jakub at gcc dot gnu dot org
2009-07-27 12:44 ` irar at il dot ibm dot com
2009-07-27 13:15 ` burnus at gcc dot gnu dot org
2009-07-27 14:51 ` jakub at gcc dot gnu dot org
2009-07-28  8:12 ` irar at il dot ibm dot com
     [not found] <bug-31067-4@http.gcc.gnu.org/bugzilla/>
2011-07-25 15:44 ` rguenth at gcc dot gnu.org
2011-07-28 13:52 ` jakub at gcc dot gnu.org
2011-07-28 16:03 ` jakub at gcc dot gnu.org
2011-07-28 20:58 ` jakub at gcc dot gnu.org
2011-07-29  7:13 ` burnus at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).