[Bug tree-optimization/61034] New: Optimizing takes too many passes

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug tree-optimization/61034] New: Optimizing takes too many passes
@ 2014-05-02 10:52 glisse at gcc dot gnu.org
  2014-05-05 10:25 ` [Bug tree-optimization/61034] " rguenth at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2014-05-02 10:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61034

            Bug ID: 61034
           Summary: Optimizing takes too many passes
           Product: gcc
           Version: 4.10.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org

Hello,

when I compile the code below (-O3), I have plenty of calls to malloc left. If
I make the final expression simpler, they disappear. If I add some more
fre/copy_prop/dse/dce passes, they eventually get optimized, but the number of
passes required seems to increase with the size of the expression, which
obviously doesn't scale. Cycling through the passes seems like an obvious way
to work around this. Otherwise, I don't really know which pass to blame this
on. Maybe FRE could do better, it leaves some:

  _79 = _73; 
  _79->count = _81;
[things that don't alias, small if-then-else]
  _86 = _73;
  _87 = _86->count;

which PRE will handle later after other passes have cleaned it up a bit and
removed everything in the middle []. But it isn't at all obvious this is the
right place to improve things.

The code is a reference counted double. I was trying to optimize something more
complicated, but the simpler things have to come first...


#define assume(x) if(!(x))__builtin_unreachable()
inline void* operator new(__SIZE_TYPE__ n){ return __builtin_malloc(n); }
inline void operator delete(void *p) { __builtin_free(p); }
struct O {
  double num;
  int count;
};
struct I {
  O *o;
  I(double d = 0) : o (new O) { o->num = d; o->count = 1; }
  I(I const&i) { assume(i.o->count >= 1); o = i.o; ++o->count; }
  I& operator=(I const&i) { I(i).swap(*this); return *this; }
  ~I() { if (--o->count == 0) delete o; }
  void swap(I& i) { O *tmp = o; o = i.o; i.o = tmp; }
  I& operator*= (I const&i) {
    if (o->count > 1) *this = I(o->num);
    o->num *= i.o->num;
    return *this;
  }
  I& operator-= (I const&i) {
    if (o->count > 1) *this = I(o->num);
    o->num -= i.o->num;
    return *this;
  }
};
inline I operator* (I a, I const&b) { return a *= b; }
inline I operator- (I a, I const&b) { return a -= b; }
inline bool operator< (I const&a, I const&b) { return a.o->num < b.o->num; }

bool f(I a, I b, I c, I d) {
  return (a * d - b * c) * (a * b - c * d) < 42;
}


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug tree-optimization/61034] Optimizing takes too many passes
  2014-05-02 10:52 [Bug tree-optimization/61034] New: Optimizing takes too many passes glisse at gcc dot gnu.org
@ 2014-05-05 10:25 ` rguenth at gcc dot gnu.org
  2014-05-05 10:33 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-05-05 10:25 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61034

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Actually it's the conditional free that "clobbers" the value as _83 may
point to the same thing as _86 (so we don't CSE (possible) use-after-frees).

  if (_85 == 0)
    goto <bb 6>;
  else     
    goto <bb 7>;

  <bb 6>:  
  # .MEM_287 = VDEF <.MEM_286>
  # USE = nonlocal
  # CLB = nonlocal
  __builtin_free (_83);

  <bb 7>:  
  # .MEM_40 = PHI <.MEM_286(5), .MEM_287(6)>

and we don't use VN to improve that disambiguation result (we know that
_83 == _2 and that doesn't point to the malloc result in question).


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug tree-optimization/61034] Optimizing takes too many passes
  2014-05-02 10:52 [Bug tree-optimization/61034] New: Optimizing takes too many passes glisse at gcc dot gnu.org
  2014-05-05 10:25 ` [Bug tree-optimization/61034] " rguenth at gcc dot gnu.org
@ 2014-05-05 10:33 ` rguenth at gcc dot gnu.org
  2014-05-05 10:48 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-05-05 10:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61034

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
points-to cannot be improved here as it is flow-insensitive for memory
accesses and the pointers are copied:

  D.2327.o = _2;
  D.2327.o = _79;

thus they blend together.

So we have to improve alias walking by also making get_continuation_for_phi
(and callees) use the 'translate' callback and improve the 'translate'
callback in VN to handle calls (which it doesn't at the moment, and it's
also somewhat difficult to do that ... if done generically, we can of
course special-case builtins we can ignore for CSE/PRE purposes here).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug tree-optimization/61034] Optimizing takes too many passes
  2014-05-02 10:52 [Bug tree-optimization/61034] New: Optimizing takes too many passes glisse at gcc dot gnu.org
  2014-05-05 10:25 ` [Bug tree-optimization/61034] " rguenth at gcc dot gnu.org
  2014-05-05 10:33 ` rguenth at gcc dot gnu.org
@ 2014-05-05 10:48 ` rguenth at gcc dot gnu.org
  2014-05-05 11:56 ` glisse at gcc dot gnu.org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-05-05 10:48 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61034

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 32736
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32736&action=edit
hack

patch that does the job (but has wrong-code issues, so I need to add a new
hook to walk_non_aliased_vuses instead).

  _88 = 1;

you might want to check whether that improves things and point me to stuff
that is left (FRE related).


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug tree-optimization/61034] Optimizing takes too many passes
  2014-05-02 10:52 [Bug tree-optimization/61034] New: Optimizing takes too many passes glisse at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2014-05-05 10:48 ` rguenth at gcc dot gnu.org
@ 2014-05-05 11:56 ` glisse at gcc dot gnu.org
  2014-05-05 12:19 ` rguenther at suse dot de
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2014-05-05 11:56 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61034

--- Comment #5 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> you might want to check whether that improves things and point me to stuff
> that is left (FRE related).

The fre2 dump looks much nicer (the .optimized dump is a bit surprising, I
would need more time to look at it). It is not sufficient, but it clearly
helps.

In the fre2 dump, I still see (skipping many lines, I may have skipped some
essential ones):

  D.2325.o = _4;
  D.2325.o = _101;
  _105 = _4;
  __builtin_free (_105);
  _96 = D.2325.o;

(instead of _96 = _101)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug tree-optimization/61034] Optimizing takes too many passes
  2014-05-02 10:52 [Bug tree-optimization/61034] New: Optimizing takes too many passes glisse at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2014-05-05 11:56 ` glisse at gcc dot gnu.org
@ 2014-05-05 12:19 ` rguenther at suse dot de
  2014-05-05 13:41 ` glisse at gcc dot gnu.org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenther at suse dot de @ 2014-05-05 12:19 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61034

--- Comment #6 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 5 May 2014, glisse at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61034
> 
> --- Comment #5 from Marc Glisse <glisse at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #4)
> > you might want to check whether that improves things and point me to stuff
> > that is left (FRE related).
> 
> The fre2 dump looks much nicer (the .optimized dump is a bit surprising, I
> would need more time to look at it). It is not sufficient, but it clearly
> helps.
> 
> In the fre2 dump, I still see (skipping many lines, I may have skipped some
> essential ones):
> 
>   D.2325.o = _4;
>   D.2325.o = _101;
>   _105 = _4;
>   __builtin_free (_105);
>   _96 = D.2325.o;
> 
> (instead of _96 = _101)

that's a conditional assignment AFAICS


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug tree-optimization/61034] Optimizing takes too many passes
  2014-05-02 10:52 [Bug tree-optimization/61034] New: Optimizing takes too many passes glisse at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2014-05-05 12:19 ` rguenther at suse dot de
@ 2014-05-05 13:41 ` glisse at gcc dot gnu.org
  2014-05-07 11:08 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2014-05-05 13:41 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61034

--- Comment #7 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #6)
> that's a conditional assignment AFAICS

Ah, you are right of course. It shouldn't be conditional, but it will take a
VRP pass to notice that. If I schedule another FRE right after VRP1, things
optimize nicely, and after some cleanup by DOM+DSE, DCE2 can remove all
malloc+free. However, if I don't add this extra FRE pass, we somehow don't
manage. Note that in the PRE dump, with just your patch (no extra pass), I see:

  pretmp_92 = 1;
  _235 = pretmp_92;
  if (_235 == 0)

and these conditions seem to be what prevents us from finishing the job.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug tree-optimization/61034] Optimizing takes too many passes
  2014-05-02 10:52 [Bug tree-optimization/61034] New: Optimizing takes too many passes glisse at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2014-05-05 13:41 ` glisse at gcc dot gnu.org
@ 2014-05-07 11:08 ` rguenth at gcc dot gnu.org
  2014-05-07 11:50 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-05-07 11:08 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61034

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Marc Glisse from comment #7)
> (In reply to rguenther@suse.de from comment #6)
> > that's a conditional assignment AFAICS
> 
> Ah, you are right of course. It shouldn't be conditional, but it will take a
> VRP pass to notice that. If I schedule another FRE right after VRP1, things
> optimize nicely, and after some cleanup by DOM+DSE, DCE2 can remove all
> malloc+free. However, if I don't add this extra FRE pass, we somehow don't
> manage. Note that in the PRE dump, with just your patch (no extra pass), I
> see:
> 
>   pretmp_92 = 1;
>   _235 = pretmp_92;
>   if (_235 == 0)
> 
> and these conditions seem to be what prevents us from finishing the job.

Yeah.  Looks somewhat tricky, but I'll play with it.  Meanwhile testing
a proper patch for the first issue.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug tree-optimization/61034] Optimizing takes too many passes
  2014-05-02 10:52 [Bug tree-optimization/61034] New: Optimizing takes too many passes glisse at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2014-05-07 11:08 ` rguenth at gcc dot gnu.org
@ 2014-05-07 11:50 ` rguenth at gcc dot gnu.org
  2014-05-07 14:19 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-05-07 11:50 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61034

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #8)
> (In reply to Marc Glisse from comment #7)
> > (In reply to rguenther@suse.de from comment #6)
> > > that's a conditional assignment AFAICS
> > 
> > Ah, you are right of course. It shouldn't be conditional, but it will take a
> > VRP pass to notice that. If I schedule another FRE right after VRP1, things
> > optimize nicely, and after some cleanup by DOM+DSE, DCE2 can remove all
> > malloc+free. However, if I don't add this extra FRE pass, we somehow don't
> > manage. Note that in the PRE dump, with just your patch (no extra pass), I
> > see:
> > 
> >   pretmp_92 = 1;
> >   _235 = pretmp_92;
> >   if (_235 == 0)
> > 
> > and these conditions seem to be what prevents us from finishing the job.
> 
> Yeah.  Looks somewhat tricky, but I'll play with it.  Meanwhile testing
> a proper patch for the first issue.

Ok, so we have after PRE

<bb 39>:
_240 = 2;
_241 = 1;
MEM[(struct O *)_73].count = _241;
if (1 == 0)
  goto <bb 40>;
else
  goto <bb 53>;

<bb 53>:
goto <bb 41>;

<bb 40>:
__builtin_free (_73);
pretmp_48 = MEM[(struct O *)_73].count;

<bb 41>:
# prephitmp_35 = PHI <1(53), pretmp_48(40)>
D.2328 ={v} {CLOBBER};
D.2328 ={v} {CLOBBER};
_242 = prephitmp_35;
if (_242 == 1)
  goto <bb 42>;
else
  goto <bb 54>;

and only CFG-cleanup will simplify the PHI node by removing the unreachable
path and thus we end up with

_242 = 1;
if (_242 == 1)

of course the above shows another case where free() bogusly is thought
to clobber the load from count.  But in general as SCCVN/PRE is not
predicate aware (or optimistically treating edges as not executed) the
issue cannot always be avoided.  Applying copy-propagation during
elimination would make CFG-cleanup do the job though.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug tree-optimization/61034] Optimizing takes too many passes
  2014-05-02 10:52 [Bug tree-optimization/61034] New: Optimizing takes too many passes glisse at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2014-05-07 11:50 ` rguenth at gcc dot gnu.org
@ 2014-05-07 14:19 ` rguenth at gcc dot gnu.org
  2014-05-07 15:14 ` glisse at gcc dot gnu.org
  2014-05-08 10:00 ` rguenther at suse dot de
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-05-07 14:19 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61034

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Wed May  7 14:19:14 2014
New Revision: 210160

URL: http://gcc.gnu.org/viewcvs?rev=210160&root=gcc&view=rev
Log:
2014-05-07  Richard Biener  <rguenther@suse.de>

    PR tree-optimization/61034
    * tree-ssa-alias.c (call_may_clobber_ref_p_1): Export.
    (maybe_skip_until): Use translate to take into account
    lattices when trying to do disambiguations.
    (get_continuation_for_phi_1): Likewise.
    (get_continuation_for_phi): Adjust for added translate
    arguments.
    (walk_non_aliased_vuses): Likewise.
    * tree-ssa-alias.h (get_continuation_for_phi): Adjust
    prototype.
    (walk_non_aliased_vuses): Likewise.
    (call_may_clobber_ref_p_1): Declare.
    * tree-ssa-sccvn.c (vn_reference_lookup_3): Also
    disambiguate against calls.  Stop early if we are
    only supposed to disambiguate.
    * tree-ssa-pre.c (translate_vuse_through_block): Adjust.

    * g++.dg/tree-ssa/pr61034.C: New testcase.

Added:
    trunk/gcc/testsuite/g++.dg/tree-ssa/pr61034.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-alias.c
    trunk/gcc/tree-ssa-alias.h
    trunk/gcc/tree-ssa-pre.c
    trunk/gcc/tree-ssa-sccvn.c


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug tree-optimization/61034] Optimizing takes too many passes
  2014-05-02 10:52 [Bug tree-optimization/61034] New: Optimizing takes too many passes glisse at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2014-05-07 14:19 ` rguenth at gcc dot gnu.org
@ 2014-05-07 15:14 ` glisse at gcc dot gnu.org
  2014-05-08 10:00 ` rguenther at suse dot de
  10 siblings, 0 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2014-05-07 15:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61034

--- Comment #11 from Marc Glisse <glisse at gcc dot gnu.org> ---
The committed patch doesn't seem to optimize as well as the hack. fre2 (- is
the hack and + is the new trunk, unless I confused my directories):

-  _8 = _98;
+  _8 = MEM[(const struct I &)b_9(D)].o;

-  _224 = _201;
+  _224 = _18->count;

etc

I assume that's because call_may_clobber_ref_p_1 sometimes says that free
clobbers its argument while the hack was assuming it never does. "free" is
strange, somehow here we would want call_may_clobber_ref_p_1 to return false
(we can't be reading after a free, so we must have taken another path, this
free didn't run and didn't clobber anything) even when stmt_kills_ref_p_1 would
return true, but that would confuse other parts of gcc.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug tree-optimization/61034] Optimizing takes too many passes
  2014-05-02 10:52 [Bug tree-optimization/61034] New: Optimizing takes too many passes glisse at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2014-05-07 15:14 ` glisse at gcc dot gnu.org
@ 2014-05-08 10:00 ` rguenther at suse dot de
  10 siblings, 0 replies; 12+ messages in thread
From: rguenther at suse dot de @ 2014-05-08 10:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61034

--- Comment #12 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 7 May 2014, glisse at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61034
> 
> --- Comment #11 from Marc Glisse <glisse at gcc dot gnu.org> ---
> The committed patch doesn't seem to optimize as well as the hack. fre2 (- is
> the hack and + is the new trunk, unless I confused my directories):
> 
> -  _8 = _98;
> +  _8 = MEM[(const struct I &)b_9(D)].o;
> 
> -  _224 = _201;
> +  _224 = _18->count;
> 
> etc
> 
> I assume that's because call_may_clobber_ref_p_1 sometimes says that free
> clobbers its argument while the hack was assuming it never does. "free" is
> strange, somehow here we would want call_may_clobber_ref_p_1 to return false
> (we can't be reading after a free, so we must have taken another path, this
> free didn't run and didn't clobber anything) even when stmt_kills_ref_p_1 would
> return true, but that would confuse other parts of gcc.

Yeah, I know.  call_may_clobber_ref_p_1 has to guard against sinking
a load across it as well.

I'm working on improving value-numbering in other ways at the moment,
we can come back and special-case free and va_end builtins later
in SCCVN.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-05-08 10:00 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-02 10:52 [Bug tree-optimization/61034] New: Optimizing takes too many passes glisse at gcc dot gnu.org
2014-05-05 10:25 ` [Bug tree-optimization/61034] " rguenth at gcc dot gnu.org
2014-05-05 10:33 ` rguenth at gcc dot gnu.org
2014-05-05 10:48 ` rguenth at gcc dot gnu.org
2014-05-05 11:56 ` glisse at gcc dot gnu.org
2014-05-05 12:19 ` rguenther at suse dot de
2014-05-05 13:41 ` glisse at gcc dot gnu.org
2014-05-07 11:08 ` rguenth at gcc dot gnu.org
2014-05-07 11:50 ` rguenth at gcc dot gnu.org
2014-05-07 14:19 ` rguenth at gcc dot gnu.org
2014-05-07 15:14 ` glisse at gcc dot gnu.org
2014-05-08 10:00 ` rguenther at suse dot de

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).