public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/38401]  New: TreeSSA-PRE load after store misoptimization
@ 2008-12-04 14:46 sergeid at il dot ibm dot com
  2008-12-04 15:16 ` [Bug tree-optimization/38401] " rguenth at gcc dot gnu dot org
                   ` (24 more replies)
  0 siblings, 25 replies; 26+ messages in thread
From: sergeid at il dot ibm dot com @ 2008-12-04 14:46 UTC (permalink / raw)
  To: gcc-bugs

There is an obvious redundant LOAD in the following code (the line marked (*)):

void f (int n, int *cond, int *res)
{
    int i;
    *res = 0;
    for (i = 0; i < n; i++)
        if (*cond)
            *res ^= 234; /* (*) */
}

GCSE LAS (load after store) catches it at the RTL stage, but it should be caught
by PRE at the TreeSSA stage.
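
To make the redundancy concrete, here is a hand-written sketch of what the loop could look like once the load is removed. The name f_no_reload and the temporary tmp are illustrative only and do not correspond to any compiler output. The store stays inside the loop because cond and res may alias; only the reload of *res goes away, since its value is always the last value this function stored.

```c
/* Original form: *res is loaded again on every taken iteration. */
void f(int n, int *cond, int *res)
{
    int i;
    *res = 0;
    for (i = 0; i < n; i++)
        if (*cond)
            *res ^= 234; /* load + xor + store */
}

/* Hand-optimized sketch (hypothetical name): the value last stored to
   *res is kept in tmp, so no load of *res is needed in the loop. */
void f_no_reload(int n, int *cond, int *res)
{
    int i;
    int tmp = 0;
    *res = 0;
    for (i = 0; i < n; i++)
        if (*cond) {
            tmp ^= 234;  /* xor on the register copy */
            *res = tmp;  /* store kept: cond may alias res */
        }
}
```

Both versions should leave the same value in *res, including when cond and res point to the same object.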


-- 
           Summary: TreeSSA-PRE load after store misoptimization
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: sergeid at il dot ibm dot com
 GCC build triplet: powerpc
  GCC host triplet: powerpc
GCC target triplet: powerpc


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
@ 2008-12-04 15:16 ` rguenth at gcc dot gnu dot org
  2008-12-04 16:59 ` steven at gcc dot gnu dot org
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-12-04 15:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2008-12-04 15:14 -------
It works with -O3 (with partial-partial PRE enabled).  At least phi-translation
figures out that *res is zero on the incoming edge.

Un-leashing partial-PRE like with

Index: tree-ssa-pre.c
===================================================================
--- tree-ssa-pre.c      (revision 142431)
+++ tree-ssa-pre.c      (working copy)
@@ -3356,7 +3358,7 @@ do_partial_partial_insertion (basic_bloc
        {
          pre_expr *avail;
          unsigned int val;
-         bool by_all = true;
+         bool by_some = false;
          bool cant_insert = false;
          edge pred;
          basic_block bprime;
@@ -3404,11 +3406,13 @@ do_partial_partial_insertion (basic_bloc
                                                 vprime, NULL);
              if (edoubleprime == NULL)
                {
-                 by_all = false;
-                 break;
+                 avail[bprime->index] = eprime;
                }
              else
-               avail[bprime->index] = edoubleprime;
+               {
+                 avail[bprime->index] = edoubleprime;
+                 by_some = true;
+               }

            }

@@ -3416,7 +3420,7 @@ do_partial_partial_insertion (basic_bloc
             already existing along every predecessor, and
             it's defined by some predecessor, it is
             partially redundant.  */
-         if (!cant_insert && by_all && dbg_cnt (treepre_insert))
+         if (!cant_insert && by_some && dbg_cnt (treepre_insert))
            {
              pre_stats.pa_insert++;
              if (insert_into_preds_of_block (block, get_expression_id (expr),


fixes this.  But this may cause a lot of partial-PRE to happen.
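
The effect of the patch on the insertion decision can be sketched with a toy model. This is not the code in tree-ssa-pre.c: the function names and the avail array are hypothetical, and the cant_insert and dbg_cnt checks are left out. Each entry of avail says whether the value is already available at the end of one predecessor.

```c
#include <stdbool.h>
#include <stddef.h>

/* Before the patch: insert only if every predecessor already has
   the value (starts true, as by_all does in the original code). */
static bool insert_if_by_all(const bool *avail, size_t npreds)
{
    size_t i;
    for (i = 0; i < npreds; i++)
        if (!avail[i])
            return false;
    return true;
}

/* After the patch: insert as soon as at least one predecessor has
   the value; the others would receive a new computation. */
static bool insert_if_by_some(const bool *avail, size_t npreds)
{
    size_t i;
    for (i = 0; i < npreds; i++)
        if (avail[i])
            return true;
    return false;
}
```

With one available and one unavailable predecessor, the first predicate refuses and the second accepts, which is why the patched condition fires much more often.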


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dberlin at gcc dot gnu dot
                   |                            |org, rguenth at gcc dot gnu
                   |                            |dot org
           Severity|minor                       |enhancement
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization, TREE
   Last reconfirmed|0000-00-00 00:00:00         |2008-12-04 15:14:42
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
  2008-12-04 15:16 ` [Bug tree-optimization/38401] " rguenth at gcc dot gnu dot org
@ 2008-12-04 16:59 ` steven at gcc dot gnu dot org
  2008-12-04 17:10 ` steven at gcc dot gnu dot org
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-12-04 16:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from steven at gcc dot gnu dot org  2008-12-04 16:58 -------
If RTL pre can catch this, then so should tree-PRE without enabling
partial-partial PRE.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
  2008-12-04 15:16 ` [Bug tree-optimization/38401] " rguenth at gcc dot gnu dot org
  2008-12-04 16:59 ` steven at gcc dot gnu dot org
@ 2008-12-04 17:10 ` steven at gcc dot gnu dot org
  2008-12-04 17:16 ` dberlin at dberlin dot org
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-12-04 17:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from steven at gcc dot gnu dot org  2008-12-04 17:08 -------
I do not see RTL PRE catch this on ia64, with or without -fgcse-las.

Can you show, please, the RTL dumps before and after GCSE?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (2 preceding siblings ...)
  2008-12-04 17:10 ` steven at gcc dot gnu dot org
@ 2008-12-04 17:16 ` dberlin at dberlin dot org
  2008-12-04 17:29 ` steven at gcc dot gnu dot org
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: dberlin at dberlin dot org @ 2008-12-04 17:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from dberlin at gcc dot gnu dot org  2008-12-04 17:14 -------
Subject: Re:  TreeSSA-PRE load after store misoptimization

That would be incorrect.
Partial-partial (partial antic, partial avail) PRE is necessary to
catch all the cases LCM does (and RTL PRE is LCM based).
LCM includes partial-partial by default in its dataflow equations.
In fact, it's mostly a waste of time, which is why it's only on at O3+
(LCM spends 30% of its dataflow equations computing this, IIRC)


richi's code looks correct, i'm not sure why the by_all was in there
originally, since that would be partial antic, full avail, not partial
antic, partial avail.
My recollection is that getting the theoretical lifetime optimality
when doing partial antic, partial avail requires evaluating code
placement of phi nodes.


On Thu, Dec 4, 2008 at 11:58 AM, steven at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
>
>
> ------- Comment #2 from steven at gcc dot gnu dot org  2008-12-04 16:58 -------
> If RTL pre can catch this, then so should tree-PRE without enabling
> partial-partial PRE.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (3 preceding siblings ...)
  2008-12-04 17:16 ` dberlin at dberlin dot org
@ 2008-12-04 17:29 ` steven at gcc dot gnu dot org
  2008-12-04 17:37 ` dberlin at dberlin dot org
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-12-04 17:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from steven at gcc dot gnu dot org  2008-12-04 17:27 -------
by_all was there because you made it so on purpose. From tree-ssa-pre.c:

"   For the partial anticipation case, we only perform insertion if it
   is partially anticipated in some block, and fully available in all
   of the predecessors.
"


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (4 preceding siblings ...)
  2008-12-04 17:29 ` steven at gcc dot gnu dot org
@ 2008-12-04 17:37 ` dberlin at dberlin dot org
  2008-12-04 17:55 ` sergeid at il dot ibm dot com
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: dberlin at dberlin dot org @ 2008-12-04 17:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from dberlin at gcc dot gnu dot org  2008-12-04 17:35 -------
Subject: Re:  TreeSSA-PRE load after store misoptimization

Yes, I'm aware, but again, that is because my recollection is that doing
partial antic, partial avail with lifetime optimality requires code
placement that we don't do.

On Thu, Dec 4, 2008 at 12:27 PM, steven at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
>
>
> ------- Comment #5 from steven at gcc dot gnu dot org  2008-12-04 17:27 -------
> by_all was there because you made it so on purpose. From tree-ssa-pre.c:
>
> "   For the partial anticipation case, we only perform insertion if it
>   is partially anticipated in some block, and fully available in all
>   of the predecessors.
> "


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (5 preceding siblings ...)
  2008-12-04 17:37 ` dberlin at dberlin dot org
@ 2008-12-04 17:55 ` sergeid at il dot ibm dot com
  2008-12-04 18:17 ` steven at gcc dot gnu dot org
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: sergeid at il dot ibm dot com @ 2008-12-04 17:55 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from sergeid at il dot ibm dot com  2008-12-04 17:54 -------
Subject: Re:  TreeSSA-PRE load after store
 misoptimization

You're right, it worked for me only on powerpc. This is the RTL snippet
_before_ elimination (load + xor + store):

...
(insn 20 19 21 5 ../loop.c:10 (set (reg:SI 128)
        (mem:SI (reg/v/f:SI 123 [ res ]) [2 S4 A32])) 324 {*movsi_internal1} (nil))

(insn 21 20 22 5 ../loop.c:10 (set (reg:SI 129)
        (xor:SI (reg:SI 128)
            (const_int 234 [0xea]))) 139 {*boolsi3_internal1} (nil))

(insn 22 21 23 5 ../loop.c:10 (set (mem:SI (reg/v/f:SI 123 [ res ]) [2 S4 A32])
        (reg:SI 129)) 324 {*movsi_internal1} (nil))
...


And this is _after_ (xor + store only):
...
(insn 21 19 22 5 ../loop.c:10 (set (reg:SI 131)
        (xor:SI (reg:SI 131)
            (const_int 234 [0xea]))) 139 {*boolsi3_internal1} (expr_list:REG_DEAD (reg:SI 131)
        (nil)))

(insn 22 21 23 5 ../loop.c:10 (set (mem:SI (reg/v/f:SI 123 [ res ]) [2 S4 A32])
        (reg:SI 131)) 324 {*movsi_internal1} (expr_list:REG_DEAD (reg:SI 129)
        (nil)))
...

On x86 it produces complex set instructions:
...
(insn 18 17 19 5 ../loop.c:10 (parallel [
            (set (mem:SI (reg/v/f:DI 62 [ res ]) [2 S4 A32])
                (xor:SI (mem:SI (reg/v/f:DI 62 [ res ]) [2 S4 A32])
                    (const_int 234 [0xea])))
            (clobber (reg:CC 17 flags))
        ]) 417 {*xorsi_1} (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
...
and that's why (probably) GCSE can't optimize it.


PS. BTW, I _do_ compile it with "-O3" and tree PRE doesn't catch it.


"steven at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> wrote on
04/12/2008 19:08:57:

> I do not see RTL PRE catch this on ia64, with or without -fgcse-las.
>
> Can you show, please, the RTL dumps before and after GCSE?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (6 preceding siblings ...)
  2008-12-04 17:55 ` sergeid at il dot ibm dot com
@ 2008-12-04 18:17 ` steven at gcc dot gnu dot org
  2008-12-08 10:04 ` sergeid at il dot ibm dot com
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-12-04 18:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from steven at gcc dot gnu dot org  2008-12-04 18:16 -------
Created an attachment (id=16828)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16828&action=view)
.gcse1 dump of r142405 on ia64-linux

I still don't see why this is caught on powerpc by RTL PRE, but not on ia64
(note *ia64*, not x86).  I compile with -O3 -fgcse-las.  The compiler is
yesterday's trunk on ia64-unknown-linux-gnu.  The .gcse1 dump is attached.  Why
is it optimized for you on powerpc but not for me on ia64?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (7 preceding siblings ...)
  2008-12-04 18:17 ` steven at gcc dot gnu dot org
@ 2008-12-08 10:04 ` sergeid at il dot ibm dot com
  2008-12-08 10:09 ` sergeid at il dot ibm dot com
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: sergeid at il dot ibm dot com @ 2008-12-08 10:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from sergeid at il dot ibm dot com  2008-12-08 10:03 -------
Subject: Re:  TreeSSA-PRE load after store
 misoptimization

Can you post your gcc configuration options?
I've created and attached a little patch which adds some more information
to dump file. Can you apply it and send me the new .gcse1 dump? Then I'll
compare it with mine and maybe we'll find the reason.

"steven at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> wrote on
04/12/2008 20:16:05:

> I still don't see why this is caught on powerpc by RTL PRE, but not on
ia64
> (note *ia64*, not x86).  I compile with -O3 -fgcse-las.  The compiler is
> yesterday's trunk on ia64-unknown-linux-gnu.  The .gcse1 dump is
> attached.  Why
> is it optimized for you on powerpc but not for me on ia64?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (8 preceding siblings ...)
  2008-12-08 10:04 ` sergeid at il dot ibm dot com
@ 2008-12-08 10:09 ` sergeid at il dot ibm dot com
  2008-12-08 11:55 ` sergeid at il dot ibm dot com
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: sergeid at il dot ibm dot com @ 2008-12-08 10:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from sergeid at il dot ibm dot com  2008-12-08 10:08 -------
Subject: Re:  TreeSSA-PRE load after store
 misoptimization

Sorry, forgot to attach the patch. (See attached file:
gcse-las-counter.patch)


------- Comment #11 from sergeid at il dot ibm dot com  2008-12-08 10:08 -------
Created an attachment (id=16850)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16850&action=view)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (9 preceding siblings ...)
  2008-12-08 10:09 ` sergeid at il dot ibm dot com
@ 2008-12-08 11:55 ` sergeid at il dot ibm dot com
  2008-12-08 12:42 ` rguenther at suse dot de
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: sergeid at il dot ibm dot com @ 2008-12-08 11:55 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from sergeid at il dot ibm dot com  2008-12-08 11:53 -------
I have to mention that tree PRE still doesn't catch this LOAD with -O3.
Though the patch Richard posted does the job.

(In reply to comment #1)
> It works with -O3 (with partial-partial PRE enabled).  At least phi-translation
> figures out that *res is zero on the incoming edge.
> 
> Un-leashing partial-PRE like with


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (10 preceding siblings ...)
  2008-12-08 11:55 ` sergeid at il dot ibm dot com
@ 2008-12-08 12:42 ` rguenther at suse dot de
  2008-12-15  7:18 ` sergeid at il dot ibm dot com
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: rguenther at suse dot de @ 2008-12-08 12:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from rguenther at suse dot de  2008-12-08 12:40 -------
Subject: Re:  TreeSSA-PRE load after store
 misoptimization

On Mon, 8 Dec 2008, sergeid at il dot ibm dot com wrote:

> ------- Comment #12 from sergeid at il dot ibm dot com  2008-12-08 11:53 -------
> I have to mention that tree PRE still doesn't catch this LOAD with -O3.
> Though the patch Richard posted does the job.

Sorry if the comment wasn't clear - -O3 doesn't catch it, but this is
only because of the implementation decision fixed with that patch.

> (In reply to comment #1)
> > It works with -O3 (with partial-partial PRE enabled).  At least phi-translation
> > figures out that *res is zero on the incoming edge.
> > 
> > Un-leashing partial-PRE like with


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (11 preceding siblings ...)
  2008-12-08 12:42 ` rguenther at suse dot de
@ 2008-12-15  7:18 ` sergeid at il dot ibm dot com
  2008-12-15 17:39 ` steven at gcc dot gnu dot org
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: sergeid at il dot ibm dot com @ 2008-12-15  7:18 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from sergeid at il dot ibm dot com  2008-12-15 07:17 -------
Ok, since this case is the only one where RTL PRE (gcse-las) improves
performance and it can be dealt with at the TreeSSA level, would it be ok to
remove gcse-las from mainline and keep this PR open?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (12 preceding siblings ...)
  2008-12-15  7:18 ` sergeid at il dot ibm dot com
@ 2008-12-15 17:39 ` steven at gcc dot gnu dot org
  2008-12-21  7:46 ` [Bug tree-optimization/38401] TreeSSA-PRE load after store missed optimization sergeid at il dot ibm dot com
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: steven at gcc dot gnu dot org @ 2008-12-15 17:39 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from steven at gcc dot gnu dot org  2008-12-15 17:38 -------
Re. comment #14: Yes, I suppose so.  Why do you want to remove gcse-las from
mainline?  Not that I'm against it -- ideally RTL gcse.c would not work on
memory at all anymore -- but I wouldn't remove gcse-las until we catch in the
GIMPLE optimizers as much as possible of the things we still need gcse-las for.

It seems to me, btw, that it might be easier to teach GIMPLE loop invariant
code motion about this transformation.  Adding this in GIMPLE PRE might be a
little too expensive...?
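
One possible reading of the loop-invariant-motion idea is store motion: keep the value in a register for the whole loop and store it once at the end. The sketch below is an assumption about what that could look like, not something proposed in this PR. It is only valid when cond and res cannot alias, which is expressed here with restrict; the original test case has no such qualifier. The names f_ref and f_store_motion are hypothetical.

```c
/* Reference: the test case from the report. */
void f_ref(int n, int *cond, int *res)
{
    int i;
    *res = 0;
    for (i = 0; i < n; i++)
        if (*cond)
            *res ^= 234;
}

/* Store-motion sketch (hypothetical): legal only because restrict
   promises that *cond is not modified through res. */
void f_store_motion(int n, const int *cond, int *restrict res)
{
    int i;
    int tmp = 0;              /* value *res would hold */
    for (i = 0; i < n; i++)
        if (*cond)
            tmp ^= 234;
    *res = tmp;               /* single store after the loop */
}
```

For non-aliasing arguments both functions leave the same value in *res. Without the no-alias guarantee the store has to stay in the loop and only the load can be removed.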


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store missed optimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (13 preceding siblings ...)
  2008-12-15 17:39 ` steven at gcc dot gnu dot org
@ 2008-12-21  7:46 ` sergeid at il dot ibm dot com
  2008-12-29 23:09 ` rguenth at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: sergeid at il dot ibm dot com @ 2008-12-21  7:46 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from sergeid at il dot ibm dot com  2008-12-21 07:44 -------
(In reply to comment #15)
> Re. comment #14: Yes, I suppose so.  Why do you want to remove gcse-las from
> mainline?  Not that I'm against it -- ideally RTL gcse.c would not work on
> memory at all anymore -- but I wouldn't remove gcse-las until we catch in the
> GIMPLE optimizers as much as possible of the things we still need gcse-las for.

For the time being this is the only case I've found that is missed by
tree-PRE and caught by GCSE-LAS. As you pointed out, GCSE-LAS doesn't seem to
help much.

> It seems to me, btw, that it might be easier to teach GIMPLE loop invariant
> code motion about this transformation.  Adding this in GIMPLE PRE might be a
> little too expensive...?

That may be; I was just noting that such redundancies should be caught
somewhere at the GIMPLE stage.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store missed optimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (14 preceding siblings ...)
  2008-12-21  7:46 ` [Bug tree-optimization/38401] TreeSSA-PRE load after store missed optimization sergeid at il dot ibm dot com
@ 2008-12-29 23:09 ` rguenth at gcc dot gnu dot org
  2009-01-12 18:09 ` amylaar at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-12-29 23:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from rguenth at gcc dot gnu dot org  2008-12-29 22:17 -------
I think enabling partial PRE to do it is appropriate (inserting on at most
one edge).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store missed optimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (15 preceding siblings ...)
  2008-12-29 23:09 ` rguenth at gcc dot gnu dot org
@ 2009-01-12 18:09 ` amylaar at gcc dot gnu dot org
  2009-01-13  8:19 ` steven at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: amylaar at gcc dot gnu dot org @ 2009-01-12 18:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from amylaar at gcc dot gnu dot org  2009-01-12 18:09 -------
(In reply to comment #17)
> I think enabling partial PRE to do it is appropriate (inserting on at most
> one edge).

I think the abstraction with tree-ssa and cfglayout mode has gone too far.
We no longer have visibility of the costs of branches, or of opportunities
for conditional execution.
I suspect that the 35% speed regressions we see on EEMBC fbital at -O3 are
also to be blamed on overzealous tree-pre partial-partial redundancy eliminations.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store missed optimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (16 preceding siblings ...)
  2009-01-12 18:09 ` amylaar at gcc dot gnu dot org
@ 2009-01-13  8:19 ` steven at gcc dot gnu dot org
  2009-01-13 14:01 ` amylaar at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: steven at gcc dot gnu dot org @ 2009-01-13  8:19 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #19 from steven at gcc dot gnu dot org  2009-01-13 08:19 -------
Joern, nobody is forcing you to follow the crowd if you think the crowd is
going in the wrong direction.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store missed optimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (17 preceding siblings ...)
  2009-01-13  8:19 ` steven at gcc dot gnu dot org
@ 2009-01-13 14:01 ` amylaar at gcc dot gnu dot org
  2009-01-13 14:12 ` amylaar at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: amylaar at gcc dot gnu dot org @ 2009-01-13 14:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #20 from amylaar at gcc dot gnu dot org  2009-01-13 14:00 -------
(In reply to comment #19)
> Joern, nobody is forcing you to follow the crowd if you think the crowd is
> going in the wrong direction.

I have evidence that the direction is wrong.  I added a new option to disable
partial-partial pre while keeping the rest of -O3 and -ftree-pre enabled.
This got EEMBC bitmnp back to the level of 4.2.1 (unmodified 4.4.0 needs 2.55
times the amount of cycles).  fbital00 also improved, although it regained only
a little of the performance that it lost since 4.2.1 - cycle count is now down
6% against unmodified gcc 4.4.0.  Overall the disabling of partial-partial
is also beneficial for EEMBC; there are a few other benchmarks that improved
5 or 6 percent, and the worst regressions are one and two percent.

These are the changes in the geometric means of cycle counts by disabling
partial-partial redundancy elimination per EEMBC benchmark suite:

automotive: 5.73% improvement
consumer:   0.04% improvement
networking: 0.37% improvement
office:     1.39% worse
telecom:    0.00% worse
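
For readers who want to reproduce this kind of figure, here is a minimal sketch of how a percent change in the geometric mean of cycle counts can be computed. The function names are hypothetical and no actual EEMBC data is included. The n-th root is found by bisection only so that the example needs no math library.

```c
#include <stddef.h>

/* n-th root of x >= 0 (n > 0) by bisection on [0, max(1, x)]. */
static double nth_root(double x, size_t n)
{
    double lo = 0.0, hi = x > 1.0 ? x : 1.0;
    int iter;
    for (iter = 0; iter < 200; iter++) {
        double mid = 0.5 * (lo + hi);
        double p = 1.0;
        size_t k;
        for (k = 0; k < n; k++)
            p *= mid;
        if (p < x)
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}

/* Percent improvement of the geometric mean cycle count over n > 0
   benchmarks; positive means fewer cycles after the change. */
static double geomean_improvement_percent(const double *before,
                                          const double *after, size_t n)
{
    double prod = 1.0;
    size_t i;
    for (i = 0; i < n; i++)
        prod *= after[i] / before[i];   /* per-benchmark cycle ratio */
    return 100.0 * (1.0 - nth_root(prod, n));
}
```

Because the geometric mean multiplies ratios, a benchmark that halves and one that doubles cancel out exactly, which is one reason it is preferred over the arithmetic mean for suites like this.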


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store missed optimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (18 preceding siblings ...)
  2009-01-13 14:01 ` amylaar at gcc dot gnu dot org
@ 2009-01-13 14:12 ` amylaar at gcc dot gnu dot org
  2009-01-13 14:29 ` rguenther at suse dot de
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: amylaar at gcc dot gnu dot org @ 2009-01-13 14:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #21 from amylaar at gcc dot gnu dot org  2009-01-13 14:11 -------
(In reply to comment #20)
> office:     1.39% worse

Actually, this is the EEMBC version with bezier01, where the entire benchmark
gets optimized away and thus tiny changes in the cost of the set-up code make
noticeable differences.  Comparing the geometric means with bezier01 left out
gives:

automotive: 5.73% improvement
consumer:   0.04% improvement
networking: 0.37% improvement
office:     0.90% worse
telecom:    0.00% worse


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store missed optimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (19 preceding siblings ...)
  2009-01-13 14:12 ` amylaar at gcc dot gnu dot org
@ 2009-01-13 14:29 ` rguenther at suse dot de
  2009-01-13 14:59 ` amylaar at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: rguenther at suse dot de @ 2009-01-13 14:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #22 from rguenther at suse dot de  2009-01-13 14:29 -------
Subject: Re:  TreeSSA-PRE load after store missed
 optimization

On Tue, 13 Jan 2009, amylaar at gcc dot gnu dot org wrote:

> ------- Comment #21 from amylaar at gcc dot gnu dot org  2009-01-13 14:11 -------
> (In reply to comment #20)
> > office:     1.39% worse
> 
> Actually, this is the EEMBC version with bezier01, where the entire benchmark
> gets optimized away and thus tiny changes in the cost of the set-up code make
> noticeable differences.  Comparing the geometric means with bezier01 left out
> gives:
> 
> automotive: 5.73% improvement
> consumer:   0.04% improvement
> networking: 0.37% improvement
> office:     0.90% worse
> telecom:    0.00% worse

If you post a patch to add the option to enable/disable partial-PRE I will
happily review and approve it for 4.4.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Bug tree-optimization/38401] TreeSSA-PRE load after store missed optimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (20 preceding siblings ...)
  2009-01-13 14:29 ` rguenther at suse dot de
@ 2009-01-13 14:59 ` amylaar at gcc dot gnu dot org
  2009-01-18 21:34 ` steven at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 26+ messages in thread
From: amylaar at gcc dot gnu dot org @ 2009-01-13 14:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #23 from amylaar at gcc dot gnu dot org  2009-01-13 14:58 -------
(In reply to comment #22)
> If you post a patch to add the option to enable/disable partial-PRE I will
> happily review and approve it for 4.4.

I'd be happy to post the patch, but we (ARC) are still waiting for the
FSF acknowledgement that our copyright assignment has been filed.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401



* [Bug tree-optimization/38401] TreeSSA-PRE load after store missed optimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (21 preceding siblings ...)
  2009-01-13 14:59 ` amylaar at gcc dot gnu dot org
@ 2009-01-18 21:34 ` steven at gcc dot gnu dot org
  2009-02-02 20:02 ` amylaar at gcc dot gnu dot org
  2009-07-30 23:30 ` amylaar at gcc dot gnu dot org
  24 siblings, 0 replies; 26+ messages in thread
From: steven at gcc dot gnu dot org @ 2009-01-18 21:34 UTC (permalink / raw)
  To: gcc-bugs



-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |steven at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2008-12-04 15:14:42         |2009-01-18 21:34:08
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401



* [Bug tree-optimization/38401] TreeSSA-PRE load after store missed optimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (22 preceding siblings ...)
  2009-01-18 21:34 ` steven at gcc dot gnu dot org
@ 2009-02-02 20:02 ` amylaar at gcc dot gnu dot org
  2009-07-30 23:30 ` amylaar at gcc dot gnu dot org
  24 siblings, 0 replies; 26+ messages in thread
From: amylaar at gcc dot gnu dot org @ 2009-02-02 20:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #24 from amylaar at gcc dot gnu dot org  2009-02-02 20:02 -------
(In reply to comment #22)
> If you post a patch to add the option to enable/disable partial-PRE I will
> happily review and approve it for 4.4.

I experimented using Steven Bosscher's patch as a starting point and
augmenting the test in do_regular_insertion with a speed-based heuristic
to throttle the calls to insert_into_preds_of_block.  That was worse than
turning off partial-PRE altogether.  Then I added the heuristic also in
do_partial_insertion, which worked better.  Then I tried to remove the speed
heuristic from do_regular_insertion, and that change had only very tiny,
although overall beneficial, effects.
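
For context, here is a minimal source-level sketch of the redundancy this
bug is about, using the function from the original report.  The second
function, f_pre, and its variable tmp are hypothetical names; it is a
hand-written equivalent meant to illustrate what removing the load after
store amounts to, not actual GCC output and not the result of the patch
discussed here.

```c
/* Function from the original report: *res is reloaded on every
   iteration in which *cond is nonzero.  */
void f (int n, int *cond, int *res)
{
    int i;
    *res = 0;
    for (i = 0; i < n; i++)
        if (*cond)
            *res ^= 234; /* load of *res, then store */
}

/* Hypothetical hand-written equivalent (not compiler output): the last
   value stored to *res is kept in tmp, so the loop only stores.  */
void f_pre (int n, int *cond, int *res)
{
    int i;
    int tmp = 0;
    *res = 0;
    for (i = 0; i < n; i++)
        if (*cond)
        {
            tmp ^= 234;
            *res = tmp;
        }
}
```

The two versions also agree when cond and res point to the same int: the
initial store makes *cond zero, so neither loop ever stores again.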

To get meaningful results we had to modify the linking a bit to reduce
instruction cache effects: the most-needed libgcc functions were pulled out
early and placed next to the core benchmark objects.

Applying the heuristic only to partial-partial vs. not applying it at all gives:
automotive: 6.55389% faster
consumer:   0.00048% worse
networking: 0.03793% faster
office:     0.07269% worse
telecom:    0.00000% faster

Applying the heuristic only to partial-partial vs. applying it in general gives:
automotive: 0.00674% faster
consumer:   0.00076% worse
networking: 0.01746% faster
office:     0.00440% worse
telecom:    0.00002% worse

Unfortunately, there is still no word from the FSF on what they did with our
Copyright Assignment.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401



* [Bug tree-optimization/38401] TreeSSA-PRE load after store missed optimization
  2008-12-04 14:46 [Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization sergeid at il dot ibm dot com
                   ` (23 preceding siblings ...)
  2009-02-02 20:02 ` amylaar at gcc dot gnu dot org
@ 2009-07-30 23:30 ` amylaar at gcc dot gnu dot org
  24 siblings, 0 replies; 26+ messages in thread
From: amylaar at gcc dot gnu dot org @ 2009-07-30 23:30 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #25 from amylaar at gcc dot gnu dot org  2009-07-30 23:30 -------
(In reply to comment #24)
> Unfortunately, there is still no word from the FSF on what they did with our
> Copyright Assignment.

As already mentioned in PR 38785, I've posted the patch here:
http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00250.html

It is also integrated in the milepost-branch.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401


