From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
Date: Tue, 24 Mar 2015 14:56:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #13 from Richard Biener ---
(In reply to Jan Hubicka from comment #10)
> I can re-confirm the 16% compile time regression. I went through some
> comparisons.
>
> $ wc -l *.ssa
> 299231 tramp3d-v4.ii.015t.ssa
> $ wc -l ../5/*.ssa
> 331115 ../5/tramp3d-v4.ii.018t.ssa
>
> so, as a crude first comparison, we already have 10% more statements to
> start with.
> Now einline
>
> $ wc -l *.einline
> 692812 tramp3d-v4.ii.018t.einline
> $ wc -l ../5/*.einline
> 724090 ../5/tramp3d-v4.ii.026t.einline
>
> so after einline we seem to have 4% more statements; we do about the same
> number of inlinings:
>
> $ grep Inlining tramp3d-v4.ii.*einline | wc -l
> 28003
> $ grep Inlining ../5/tramp3d-v4.ii.*einline | wc -l
> 28685
>
> but at release_ssa we still have about 4% more.
>
> $ wc -l *release_ssa*
> 348378 tramp3d-v4.ii.036t.release_ssa
> $ wc -l ../5/*release_ssa*
> 365689 ../5/tramp3d-v4.ii.043t.release_ssa
>
> There is no difference in the number of functions in the ssa and
> release_ssa dumps. What makes the functions bigger in GCC 5?
>
> $ grep "^ .* = " *.release_ssa | wc -l
> 65028
> $ grep "^ .* = " ../5/*.release_ssa | wc -l
> 72636
>
> The number of statements is about the same.
>
> During the actual inlining GCC 4.9 reports:
> Unit growth for small function inlining: 88536->114049 (28%)
> and
> Unit growth for small function inlining: 87943->97699 (11%)
>
> The statement count difference seems to remain at 7% in the .optimized
> dumps. So perhaps the slowdown is not really caused that much by the IPA
> passes; we somehow manage to produce more code out of the C++ FE.
>
> I looked for interesting differences in the SSA dump. Here are a few:
>
> -;; Function int __gthread_active_p() (_ZL18__gthread_active_pv, funcdef_no=312, decl_uid=8436, symbol_order=127)
> +;; Function int __gthread_active_p() (_ZL18__gthread_active_pv, funcdef_no=312, decl_uid=8537, cgraph_uid=127, symbol_order=127)
>
> int __gthread_active_p() ()
> {
> - bool _1;
> - int _2;
> + static void * const __gthread_active_ptr = (void *) __gthrw_pthread_cancel;
> + void * __gthread_active_ptr.111_2;
> + bool _3;
> + int _4;
>
>   :
> - _1 = __gthrw_pthread_cancel != 0B;
> - _2 = (int) _1;
> - return _2;
> + __gthread_active_ptr.111_2 = __gthread_active_ptr;
> + _3 = __gthread_active_ptr.111_2 != 0B;
> + _4 = (int) _3;
> + return _4;
>
> }
>
> ... this looks like a header change, perhaps ...

Yep.
__gthrw_pthread_cancel is a function pointer (thus constant) while
__gthread_active_ptr is a global variable.

> ObserverEvent::~ObserverEvent() (struct ObserverEvent * const this)
> {
> - int _6;
> + int (*__vtbl_ptr_type) () * _2;
> + int _7;
>
>   :
> - this_3(D)->_vptr.ObserverEvent = &MEM[(void *)&_ZTV13ObserverEvent + 16B];
> - *this_3(D) ={v} {CLOBBER};
> - _6 = 0;
> - if (_6 != 0)
> + _2 = &_ZTV13ObserverEvent + 16;
> + this_4(D)->_vptr.ObserverEvent = _2;
> + MEM[(struct &)this_4(D)] ={v} {CLOBBER};
> + _7 = 0;
> + if (_7 != 0)
>
> ... extra temporary initializing the vtbl pointer. This is repeated many
> times ...

This is because of

2015-03-20  Richard Biener

	PR middle-end/64715
	...
	* gimplify.c (gimplify_expr): Remove premature folding of
	&X + CST to &MEM[&X, CST].

thus it is relatively recent. It will be fixed up by ccp1.

> -;; Function static Unique::Value_t Unique::get() (_ZN6Unique3getEv, funcdef_no=3030, decl_uid=51649, symbol_order=884)
> +;; Function static Unique::Value_t Unique::get() (_ZN6Unique3getEv, funcdef_no=3030, decl_uid=51730, cgraph_uid=883, symbol_order=884)
>
> static Unique::Value_t Unique::get() ()
> {
>   Value_t retval;
> - long int next_s.83_2;
> - long int next_s.84_3;
> - long int next_s.85_4;
> - Value_t _7;
> + long int next_s.83_3;
> + long int next_s.84_4;
> + long int next_s.85_5;
> + Value_t _9;
>
>   :
> - Pooma::DummyMutex::_ZN5Pooma10DummyMutex4lockEv.isra.26 ();
> - next_s.83_2 = next_s;
> - next_s.84_3 = next_s.83_2;
> - next_s.85_4 = next_s.84_3 + 1;
> - next_s = next_s.85_4;
> - retval_6 = next_s.84_3;
> - Pooma::DummyMutex::_ZN5Pooma10DummyMutex6unlockEv.isra.27 ();
> - _7 = retval_6;
> - return _7;
> + Pooma::DummyMutex::lock (&mutex_s);
> + next_s.83_3 = next_s;
> + next_s.84_4 = next_s.83_3;
> + next_s.85_5 = next_s.84_4 + 1;
> + next_s = next_s.85_5;
> + retval_7 = next_s.84_4;
> + Pooma::DummyMutex::unlock (&mutex_s);
> + _9 = retval_7;
> + return _9;
>
> }
>
> ... here we give up on ISRA....
I believe because of

2015-02-13  Ilya Enkovich

	PR tree-optimization/65002
	* tree-cfg.c (pass_data_fixup_cfg): Don't update SSA on start.
	* tree-sra.c (some_callers_have_no_vuse_p): New.
	(ipa_early_sra): Reject functions whose callers assume function is
	read only.

or related changes.

> and we have about twice as much EH:
>
> $ grep "resx " tramp3d-v4.ii.*\.ssa | wc -l
> 4816
> $ grep "resx " ../5/tramp3d-v4.ii.*\.ssa | wc -l
> 8671
>
> which, however, is optimized out by the time of release_ssa.

That's maybe because we emit more CLOBBERs initially (do we?)

> Another thing that we may consider cleaning up in the next stage1 is
> getting rid of dead stores:
>
> - MEM[(struct new_allocator *)&D.561702] ={v} {CLOBBER};
> - D.561702 ={v} {CLOBBER};
> - D.561702 ={v} {CLOBBER};
> - MEM[(struct new_allocator *)_2] ={v} {CLOBBER};
> - MEM[(struct allocator *)_2] ={v} {CLOBBER};
> - MEM[(struct _Alloc_hider *)_2] ={v} {CLOBBER};
> - MEM[(struct basic_string *)_2] ={v} {CLOBBER};
> - *_2 ={v} {CLOBBER};
> - *this_1(D) ={v} {CLOBBER};
> + MEM[(struct &)&D.570046] ={v} {CLOBBER};
> + MEM[(struct &)&D.570046] ={v} {CLOBBER};
> + D.570046 ={v} {CLOBBER};
> + MEM[(struct &)_2] ={v} {CLOBBER};
> + MEM[(struct &)_2] ={v} {CLOBBER};
> + MEM[(struct &)_2] ={v} {CLOBBER};
> + MEM[(struct &)_2] ={v} {CLOBBER};
> + MEM[(struct &)_2] ={v} {CLOBBER};
> + MEM[(struct &)this_1(D)] ={v} {CLOBBER};
>
> Clobbers are dangerously common. There are 18K clobbers out of 65K
> assignments in the release_ssa dump, which makes them 29% of all the code.
> The number of clobbers only seems to go down in the
> tramp3d-v4.ii.166t.ehcleanup dump and we still get a lot of redundancies:

Yeah, well ... :/  I've already taught DCE to get rid of the really
useless ones...

> >   :
> >
> > D.581063 ={v} {CLOBBER};
> > D.581063 ={v} {CLOBBER};
> > D.164155 ={v} {CLOBBER};
> > D.164155 ={v} {CLOBBER};
> > operator delete [] (begbuf_18);
>
> Why are those not considered dead stores and DCEed out earlier?

dead clobbers, you mean?
Well, they are only "dead" if there are no uses/defs of their LHS dominating them.
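For what it's worth, that distinction can be sketched in the dump notation used above (hypothetical decls, not taken from tramp3d):

```
  ;; Useless clobber: no use or def of D.1 dominates it, so there is
  ;; no lifetime to end -- this is the kind DCE can already drop.
  D.1 ={v} {CLOBBER};

  ;; Not useless: the store to D.2 dominates the clobber, so the
  ;; clobber carries real lifetime information (e.g. for stack slot
  ;; sharing) and cannot simply be treated as a dead store.
  D.2.x = 0;
  D.2 ={v} {CLOBBER};
```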