[Bug tree-optimization/50031] New: Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug tree-optimization/50031] New: Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc
@ 2011-08-09 17:34 meissner at gcc dot gnu.org
  2011-08-09 17:38 ` [Bug tree-optimization/50031] " meissner at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: meissner at gcc dot gnu.org @ 2011-08-09 17:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50031

           Summary: Sphinx3 has a 10% regression going from GCC 4.5 to GCC
                    4.6 on powerpc
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: meissner@gcc.gnu.org
              Host: powerpc64-linux power-linux
            Target: powerpc64-linux
             Build: powerpc64-linux

Created attachment 24964
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24964
Cut down example of the function with the regression.

The sphinx3 benchmark from Spec 2006, exhibits a 10% regression when comparing
a GCC 4.5 build targeting power7 with vectorization to GCC 4.6.  The GCC 4.7
trunk shows the same slowdown.

In doing various profiling runs, I have traced the slowdown to the
vector_gautbl_eval_logs3 function in vector.c.

The main part of the function is the inner loop:

{
  int32 i, r;
  float64 f;
  int32 end, veclen;
  float32 *m1, *m2, *v1, *v2;
  float64 dval1, dval2, diff1, diff2;

  f = log_to_logs3_factor();

  end = offset + count;
  veclen = gautbl->veclen;

  for (r = offset; r < end-1; r += 2) {
    m1 = gautbl->mean[r];
    m2 = gautbl->mean[r+1];
    v1 = gautbl->var[r];
    v2 = gautbl->var[r+1];
    dval1 = gautbl->lrd[r];
    dval2 = gautbl->lrd[r+1];

    /* start of the critical loop */
    for (i = 0; i < veclen; i++) {
      diff1 = x[i] - m1[i];
      dval1 -= diff1 * diff1 * v1[i];
      diff2 = x[i] - m2[i];
      dval2 -= diff2 * diff2 * v2[i];
    }
    /* end of the critical loop */

    if (dval1 < gautbl->distfloor)
      dval1 = gautbl->distfloor;
    if (dval2 < gautbl->distfloor)
      dval2 = gautbl->distfloor;

    score[r] = (int32)(f * dval1);
    score[r+1] = (int32)(f * dval2);
  }

  if (r < end) {
    m1 = gautbl->mean[r];
    v1 = gautbl->var[r];
    dval1 = gautbl->lrd[r];

    for (i = 0; i < veclen; i++) {
      diff1 = x[i] - m1[i];
      dval1 -= diff1 * diff1 * v1[i];
    }

    if (dval1 < gautbl->distfloor)
      dval1 = gautbl->distfloor;

    score[r] = (int32)(f * dval1);
  }
}

Now, notice that the loop is done by reading single precision floating point,
doing a subtraction, and then converting the result to double precision.  The
code tries to 'help' the compiler by unrolling the loop twice.

The code produced by GCC 4.5 with -O3 -ffast-math -mcpu=power7 is fairly
straight forward scalar evaluation of the inner loop:

.L4:
        lfsx 0,28,9
        lfsx 10,10,9
        lfsx 13,11,9
        lfsx 9,8,9
        fsubs 13,0,13
        fsubs 0,0,10
        lfsx 10,6,9
        addi 9,9,4
        cmpd 7,9,0
        xsmuldp 13,13,13
        xsmuldp 0,0,0
        xsnmsubadp 11,13,9
        xsnmsubadp 12,0,10
        bne 7,.L4

Now, with GCC 4.6, the compiler figures it can vectorize the loop:

.L5:
        add 25,24,27
        add 26,23,27
        rldicr 20,27,0,59
        rldicr 25,25,0,59
        rldicr 26,26,0,59
        lxvw4x 44,0,25
        lxvw4x 45,0,20
        add 25,22,27
        lxvw4x 42,0,26
        add 26,21,27
        rldicr 25,25,0,59
        rldicr 26,26,0,59
        addi 27,27,16
        vperm 11,11,13,2
        vperm 9,9,12,19
        vperm 7,7,10,17
        xvsubsp 9,43,41
        lxvw4x 41,0,26
        xvsubsp 10,43,39
        lxvw4x 43,0,25
        vperm 0,0,9,16
        vperm 1,1,11,18
        xxmrglw 36,32,32
        xxmrglw 37,9,9
        xxmrglw 38,10,10
        xxmrghw 9,9,9
        xvcvspdp 37,37
        xxmrglw 35,33,33
        xvcvspdp 38,38
        xxmrghw 10,10,10
        xvcvspdp 39,9
        xxmrghw 32,32,32
        xvcvspdp 35,35
        xxmrghw 33,33,33
        xvcvspdp 40,10
        xvcvspdp 36,36
        xvmuldp 37,37,37
        xvmuldp 38,38,38
        xvmuldp 9,39,39
        xvcvspdp 33,33
        xvcvspdp 0,32
        xvmuldp 10,40,40
        xvmuldp 6,37,35
        xvmuldp 7,38,36
        xxlor 32,41,41
        xxlor 39,42,42
        xxlor 41,44,44
        xvmaddadp 6,9,33
        xvmaddadp 7,10,0
        xxlor 33,43,43
        xxlor 43,45,45
        xvsubdp 4,4,6
        xvsubdp 5,5,7
        bdnz .L5

This loop demonstrates several problems, some are powerpc backend only and some
tree optimizers (that would need hooks from the backend):

1) When tree-vec-data-refs doesn't know the alignment of memory in a loop that
is vectorized, and the machine has a vec_realign_load_<type> pattern, the loop
that is generated always uses the unalgined load, even though it might be slow.
 On the power7, the realign code uses a vector load and the lvsr instruction to
create a permute mask, and then in the inner loop, after each load, use the
permute mask to do the unaligned loads.  Thus, in the loop, before doing the
conversions, we will be doing 4 vector loads, and 4 permutes.  The vector
conversion from 32-bit to 64-bit, involve two more permutes to split the V4SF
values into the appropriate registers before doing the float->double convert. 
Thus in the loop we will have 4 permutes for the 4 loads that are done, and 8
permutes for the conversions.  The power7 only has one permute functional unit,
and multiple permutes can slow things down.  The code has one segment with 3
back to back permutes, and another with 6 back to back permutes.

If vectorizer could clone the loop and on one side test to see if the pointers
are aligned, and if the pointers are aligned, do the aligned loads, and on the
other side, do unalgined loads it would help.  I experimented with an option to
disable the vec_realign_load_<type> pattern, and it helped this particular
benchmark, but hurt other benchmarks, because the code would do the vectorized
loop only if the pointers are aligned, and fell back to scalar loop if they
were unaligned.  I would think falling back to use vec_realign_<xxx> would be a
win.

2) In looking at the documentation, I discovered that vec_realign_<xxx> is not
documented in md.texi.

3) The powerpc backend doesn't realize it could use the Altivec memory
instruction to load memory (since the Altivec load implicitly ignores the
bottom bits of the address).

4) The code in tree-vec-stmts.c, tree-vec-slp.c, and tree-vec-loop.c that calls
the vectorization cost target hook, never pass in the actual type to the
argument vectype, or set the misalign argument to non-zero.  I would imagine
that vector systems might have different costs, depending on the type.  Maybe
the two arguments should be eliminated if we aren't going to pass useful
information.  In addition, there doesn't seem to be a cost of doing
vec_realign.  There is cost for unaligned loads (via movmisalign), but there
doesn't seem to be a cost for realignment.

I have patches that fix the immediate problem by disabling the float/int to
double vector conversion under switch control, which I will submit shortly.  In
general it restores the performance for sphinx3 on both GCC 4.6 and 4.7, and
improves the performance of a few other benchmarks, though in 4.6 it does have
one regression (tonto is a few percent slower on 32-bit).  However, I suspect
it really needs to be attacked at a higher level.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/50031] Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc
  2011-08-09 17:34 [Bug tree-optimization/50031] New: Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc meissner at gcc dot gnu.org
@ 2011-08-09 17:38 ` meissner at gcc dot gnu.org
  2011-08-10  6:36 ` irar at il dot ibm.com
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: meissner at gcc dot gnu.org @ 2011-08-09 17:38 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50031

--- Comment #1 from Michael Meissner <meissner at gcc dot gnu.org> 2011-08-09 17:38:23 UTC ---
Created attachment 24965
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24965
Patch to avoid the problem by optionally disabling vector 32-bit to 64-bit
conversions.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/50031] Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc
  2011-08-09 17:34 [Bug tree-optimization/50031] New: Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc meissner at gcc dot gnu.org
  2011-08-09 17:38 ` [Bug tree-optimization/50031] " meissner at gcc dot gnu.org
@ 2011-08-10  6:36 ` irar at il dot ibm.com
  2011-08-10  9:14 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: irar at il dot ibm.com @ 2011-08-10  6:36 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50031

Ira Rosen <irar at il dot ibm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at il dot ibm.com

--- Comment #2 from Ira Rosen <irar at il dot ibm.com> 2011-08-10 06:36:26 UTC ---
(In reply to comment #0)

> 
> 1) When tree-vec-data-refs doesn't know the alignment of memory in a loop that
> is vectorized, and the machine has a vec_realign_load_<type> pattern, the loop
> that is generated always uses the unalgined load, even though it might be slow.
>  On the power7, the realign code uses a vector load and the lvsr instruction to
> create a permute mask, and then in the inner loop, after each load, use the
> permute mask to do the unaligned loads.  Thus, in the loop, before doing the
> conversions, we will be doing 4 vector loads, and 4 permutes.  The vector
> conversion from 32-bit to 64-bit, involve two more permutes to split the V4SF
> values into the appropriate registers before doing the float->double convert. 
> Thus in the loop we will have 4 permutes for the 4 loads that are done, and 8
> permutes for the conversions.  The power7 only has one permute functional unit,
> and multiple permutes can slow things down.  The code has one segment with 3
> back to back permutes, and another with 6 back to back permutes.
> 
> If vectorizer could clone the loop and on one side test to see if the pointers
> are aligned, and if the pointers are aligned, do the aligned loads, and on the
> other side, do unalgined loads it would help.  I experimented with an option to
> disable the vec_realign_load_<type> pattern, and it helped this particular
> benchmark, but hurt other benchmarks, because the code would do the vectorized
> loop only if the pointers are aligned, and fell back to scalar loop if they
> were unaligned.  I would think falling back to use vec_realign_<xxx> would be a
> win.

Yes, this kind of versioning is a good idea. I have it implemented on
st/cli-be-vect branch, but it would be probably hard to extract this patch from
there. I'll take a look.

> 
> 2) In looking at the documentation, I discovered that vec_realign_<xxx> is not
> documented in md.texi.
> 
> 3) The powerpc backend doesn't realize it could use the Altivec memory
> instruction to load memory (since the Altivec load implicitly ignores the
> bottom bits of the address).
> 
> 4) The code in tree-vec-stmts.c, tree-vec-slp.c, and tree-vec-loop.c that calls
> the vectorization cost target hook, never pass in the actual type to the
> argument vectype, or set the misalign argument to non-zero.  I would imagine
> that vector systems might have different costs, depending on the type.  Maybe
> the two arguments should be eliminated if we aren't going to pass useful
> information.  In addition, there doesn't seem to be a cost of doing
> vec_realign.  There is cost for unaligned loads (via movmisalign), but there
> doesn't seem to be a cost for realignment.

We pass the type and the misalignment value in vect_get_store_cost and
vect_get_load_cost, the only places that they are relevant. It might be true
that the actual costs depend on the type, but the cost model is only an
evaluation and it is based on tuning, so I guess until now nobody thought that
it is useful. 

The type and the misalignment value are important for VSX in movmisalign case,
so the cost for a data access takes them into account in
rs6000_builtin_vectorization_cost.

The cost of realignment is calculated in vect_get_load_cost under 'case
dr_explicit_realign' and 'case dr_explicit_realign_optimized'. I noticed now
that it uses just vector_stmt type instead of vec_perm, so it should be fixed
like that:

Index: tree-vect-stmts.c
===================================================================
--- tree-vect-stmts.c   (revision 177586)
+++ tree-vect-stmts.c   (working copy)
@@ -1011,7 +1011,7 @@ vect_get_load_cost (struct data_referenc
     case dr_explicit_realign:
       {
         *inside_cost += ncopies * (2 * vect_get_stmt_cost (vector_load)
-           + vect_get_stmt_cost (vector_stmt));
+           + vect_get_stmt_cost (vec_perm));

         /* FIXME: If the misalignment remains fixed across the iterations of
            the containing loop, the following cost should be added to the
@@ -1042,7 +1042,7 @@ vect_get_load_cost (struct data_referenc
           }

         *inside_cost += ncopies * (vect_get_stmt_cost (vector_load)
-          + vect_get_stmt_cost (vector_stmt));
+          + vect_get_stmt_cost (vec_perm));
         break;
       }

but since these costs are equal (at least in rs6000) it will not change
anything unless the costs are changed.

Ira


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/50031] Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc
  2011-08-09 17:34 [Bug tree-optimization/50031] New: Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc meissner at gcc dot gnu.org
  2011-08-09 17:38 ` [Bug tree-optimization/50031] " meissner at gcc dot gnu.org
  2011-08-10  6:36 ` irar at il dot ibm.com
@ 2011-08-10  9:14 ` rguenth at gcc dot gnu.org
  2012-02-10 16:41 ` wschmidt at gcc dot gnu.org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-08-10  9:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50031

--- Comment #3 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-08-10 09:13:03 UTC ---
I think for more elaborate costs we should allow the backend to track costs
per loop by something like

  void *state = targetm.start_vect_loop_cost ();

pass that state pointer to each cost hook we call (making sure to avoid
"duplicate" calls)

and then finish off with

  final_cost = targetm.end_vect_loop_cost (state);

so the target could have for example a way to estimate resource allocation
(like in this case, 1000s permutes but only 10 real insns) and raise
costs accordingly.

A default implementation would simply have a unsigned int where it accumulates
costs (and we would maybe stop doing that in the vectorizer itself).  Thus,
really defer costs to the target (maybe even let it compute the cost for
the scalar variants by means of passing in the scalar gimple stmt).


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/50031] Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc
  2011-08-09 17:34 [Bug tree-optimization/50031] New: Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc meissner at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2011-08-10  9:14 ` rguenth at gcc dot gnu.org
@ 2012-02-10 16:41 ` wschmidt at gcc dot gnu.org
  2012-02-13 18:26 ` wschmidt at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: wschmidt at gcc dot gnu.org @ 2012-02-10 16:41 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50031

--- Comment #4 from William J. Schmidt <wschmidt at gcc dot gnu.org> 2012-02-10 16:38:44 UTC ---
Author: wschmidt
Date: Fri Feb 10 16:38:37 2012
New Revision: 184102

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=184102
Log:
2012-02-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
        Ira Rosen  <irar@il.ibm.com>

    PR tree-optimization/50031
    * targhooks.c (default_builtin_vectorization_cost): Handle
    vec_promote_demote.
    * target.h (enum vect_cost_for_stmt): Add vec_promote_demote.
    * tree-vect-loop.c (vect_get_single_scalar_iteraion_cost): Handle
    all types of reduction and pattern statements.
    (vect_estimate_min_profitable_iters): Likewise.
    * tree-vect-stmts.c (vect_model_promotion_demotion_cost): New function.
    (vect_get_load_cost): Use vec_perm for permutations; add dump logic
    for explicit realigns.
    (vectorizable_conversion): Call vect_model_promotion_demotion_cost.
    * config/spu/spu.c (spu_builtin_vectorization_cost): Handle
    vec_promote_demote.
    * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
    * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Update
    vec_perm for VSX and handle vec_promote_demote.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/rs6000/rs6000.c
    trunk/gcc/config/spu/spu.c
    trunk/gcc/target.h
    trunk/gcc/targhooks.c
    trunk/gcc/tree-vect-loop.c
    trunk/gcc/tree-vect-stmts.c


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/50031] Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc
  2011-08-09 17:34 [Bug tree-optimization/50031] New: Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc meissner at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2012-02-10 16:41 ` wschmidt at gcc dot gnu.org
@ 2012-02-13 18:26 ` wschmidt at gcc dot gnu.org
  2012-02-14 19:41 ` wschmidt at gcc dot gnu.org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: wschmidt at gcc dot gnu.org @ 2012-02-13 18:26 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50031

--- Comment #5 from William J. Schmidt <wschmidt at gcc dot gnu.org> 2012-02-13 18:25:36 UTC ---
Just a quick note about the patch that was committed last week.  For 4.6 and
4.7, this approach adjusts the cost model for vector permutes with TARGET_VSX,
making them appear more expensive across the board.  This works well in the
usual case where permutes are somewhat dense with respect to other vector
operations.  In rarer cases, this has drawbacks.  When, for example, a loop
contains only a single permute, this is overly conservative and loses otherwise
useful vectorization opportunities.  An example of this (thanks to Mike) is
cSimpleChannel::deliver(CMessage*, double) in 471.omnetpp.  In 4.8, we will
want to model things more carefully to handle both the dense and sparse permute
cases.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/50031] Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc
  2011-08-09 17:34 [Bug tree-optimization/50031] New: Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc meissner at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2012-02-13 18:26 ` wschmidt at gcc dot gnu.org
@ 2012-02-14 19:41 ` wschmidt at gcc dot gnu.org
  2012-03-02 14:52 ` wschmidt at gcc dot gnu.org
  2012-03-02 15:04 ` wschmidt at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: wschmidt at gcc dot gnu.org @ 2012-02-14 19:41 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50031

--- Comment #6 from William J. Schmidt <wschmidt at gcc dot gnu.org> 2012-02-14 19:40:23 UTC ---
Author: wschmidt
Date: Tue Feb 14 19:40:13 2012
New Revision: 184225

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=184225
Log:
2012-02-14  Bill Schmidt <wschmidt@linux.vnet.ibm.com>
        Ira Rosen <irar@il.ibm.com>

    PR tree-optimization/50031
    PR tree-optimization/50969
    * targhooks.c (default_builtin_vectorization_cost): Handle
    vec_promote_demote.
    * target.h (enum vect_cost_for_stmt): Add vec_promote_demote.
    * tree-vect-loop.c (vect_get_single_scalar_iteraion_cost): Handle
    all types of reduction and pattern statements.
    (vect_estimate_min_profitable_iters): Likewise.
    * tree-vect-stmts.c (vect_model_promotion_demotion_cost): New function.
    (vect_model_store_cost): Use vec_perm rather than vector_stmt for
    statement cost.
    (vect_model_load_cost): Likewise.
    (vect_get_load_cost): Likewise; add dump logic for explicit realigns.
    (vectorizable_type_demotion): Call vect_model_promotion_demotion_cost.
    (vectorizable_type_promotion): Likewise.
    * config/spu/spu.c (spu_builtin_vectorization_cost): Handle
    vec_promote_demote.
    * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
    * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Update
    vec_perm for VSX and handle vec_promote_demote.


Modified:
    branches/ibm/gcc-4_6-branch/gcc/ChangeLog.ibm
    branches/ibm/gcc-4_6-branch/gcc/config/i386/i386.c
    branches/ibm/gcc-4_6-branch/gcc/config/rs6000/rs6000.c
    branches/ibm/gcc-4_6-branch/gcc/config/spu/spu.c
    branches/ibm/gcc-4_6-branch/gcc/target.h
    branches/ibm/gcc-4_6-branch/gcc/targhooks.c
    branches/ibm/gcc-4_6-branch/gcc/tree-vect-loop.c
    branches/ibm/gcc-4_6-branch/gcc/tree-vect-stmts.c


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/50031] Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc
  2011-08-09 17:34 [Bug tree-optimization/50031] New: Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc meissner at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2012-02-14 19:41 ` wschmidt at gcc dot gnu.org
@ 2012-03-02 14:52 ` wschmidt at gcc dot gnu.org
  2012-03-02 15:04 ` wschmidt at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: wschmidt at gcc dot gnu.org @ 2012-03-02 14:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50031

--- Comment #7 from William J. Schmidt <wschmidt at gcc dot gnu.org> 2012-03-02 14:52:09 UTC ---
Author: wschmidt
Date: Fri Mar  2 14:51:58 2012
New Revision: 184787

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=184787
Log:
2012-03-02  Bill Schmidt <wschmidt@linux.vnet.ibm.com>
        Ira Rosen <irar@il.ibm.com>

    PR tree-optimization/50031
    PR tree-optimization/50969
    * targhooks.c (default_builtin_vectorization_cost): Handle
    vec_promote_demote.
    * target.h (enum vect_cost_for_stmt): Add vec_promote_demote.
    * tree-vect-loop.c (vect_get_single_scalar_iteraion_cost): Handle
    all types of reduction and pattern statements.
    (vect_estimate_min_profitable_iters): Likewise.
    * tree-vect-stmts.c (vect_model_promotion_demotion_cost): New function.
    (vect_model_store_cost): Use vec_perm rather than vector_stmt for
    statement cost.
    (vect_model_load_cost): Likewise.
    (vect_get_load_cost): Likewise; add dump logic for explicit realigns.
    (vectorizable_type_demotion): Call vect_model_promotion_demotion_cost.
    (vectorizable_type_promotion): Likewise.
    * config/spu/spu.c (spu_builtin_vectorization_cost): Handle
    vec_promote_demote.
    * config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
    * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Update
    vec_perm for VSX and handle vec_promote_demote.


Modified:
    branches/gcc-4_6-branch/gcc/ChangeLog
    branches/gcc-4_6-branch/gcc/config/i386/i386.c
    branches/gcc-4_6-branch/gcc/config/rs6000/rs6000.c
    branches/gcc-4_6-branch/gcc/config/spu/spu.c
    branches/gcc-4_6-branch/gcc/target.h
    branches/gcc-4_6-branch/gcc/targhooks.c
    branches/gcc-4_6-branch/gcc/tree-vect-loop.c
    branches/gcc-4_6-branch/gcc/tree-vect-stmts.c


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/50031] Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc
  2011-08-09 17:34 [Bug tree-optimization/50031] New: Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc meissner at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2012-03-02 14:52 ` wschmidt at gcc dot gnu.org
@ 2012-03-02 15:04 ` wschmidt at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: wschmidt at gcc dot gnu.org @ 2012-03-02 15:04 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50031

William J. Schmidt <wschmidt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED

--- Comment #8 from William J. Schmidt <wschmidt at gcc dot gnu.org> 2012-03-02 15:02:23 UTC ---
I'm going to mark this fixed at this point, but there are still a couple of
future work items arising from the original issue that should be considered:

1) Loop versioning for alignment.

2) Better cost modeling (on powerpc64, at least) for pipeline limitations on
vector permute and certain other instructions.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-03-02 15:04 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-09 17:34 [Bug tree-optimization/50031] New: Sphinx3 has a 10% regression going from GCC 4.5 to GCC 4.6 on powerpc meissner at gcc dot gnu.org
2011-08-09 17:38 ` [Bug tree-optimization/50031] " meissner at gcc dot gnu.org
2011-08-10  6:36 ` irar at il dot ibm.com
2011-08-10  9:14 ` rguenth at gcc dot gnu.org
2012-02-10 16:41 ` wschmidt at gcc dot gnu.org
2012-02-13 18:26 ` wschmidt at gcc dot gnu.org
2012-02-14 19:41 ` wschmidt at gcc dot gnu.org
2012-03-02 14:52 ` wschmidt at gcc dot gnu.org
2012-03-02 15:04 ` wschmidt at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).