From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug ipa/65076] [5 Regression] 16% tramp3d-v4.cpp compile time regression
Date: Tue, 24 Mar 2015 14:56:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65076

--- Comment #13 from Richard Biener ---
(In reply to Jan Hubicka from comment #10)
> I can re-confirm the 16% compile time regression. I went through some
> comparisons.
>
> $ wc -l *.ssa
> 299231 tramp3d-v4.ii.015t.ssa
> $ wc -l ../5/*.ssa
> 331115 ../5/tramp3d-v4.ii.018t.ssa
>
> so, as a crude first comparison, we already have 10% more statements to
> start with.
> Now einline
>
> $ wc -l *.einline
> 692812 tramp3d-v4.ii.018t.einline
> $ wc -l ../5/*.einline
> 724090 ../5/tramp3d-v4.ii.026t.einline
>
> so after einline we seem to have 4% more statements; we do about the same
> number of inlinings:
>
> $ grep Inlining tramp3d-v4.ii.*einline | wc -l
> 28003
> $ grep Inlining ../5/tramp3d-v4.ii.*einline | wc -l
> 28685
>
> but at release_ssa we still have about 4% more.
>
> $ wc -l *release_ssa*
> 348378 tramp3d-v4.ii.036t.release_ssa
> $ wc -l ../5/*release_ssa*
> 365689 ../5/tramp3d-v4.ii.043t.release_ssa
>
> There is no difference in the number of functions in the ssa and
> release_ssa dumps. What makes the functions bigger in GCC 5?
>
> $ grep "^ .* = " *.release_ssa | wc -l
> 65028
> $ grep "^ .* = " ../5/*.release_ssa | wc -l
> 72636
>
> The number of statements is about the same.
>
> During the actual inlining GCC 4.9 reports:
> Unit growth for small function inlining: 88536->114049 (28%)
> and
> Unit growth for small function inlining: 87943->97699 (11%)
>
> The statement count difference seems to remain at 7% in the .optimized
> dumps. So perhaps the slowdown is not really caused that much by the IPA
> passes; we somehow manage to produce more code out of the C++ FE.
>
> I looked for interesting differences in the SSA dump. Here are a few:
>
> -;; Function int __gthread_active_p() (_ZL18__gthread_active_pv, funcdef_no=312, decl_uid=8436, symbol_order=127)
> +;; Function int __gthread_active_p() (_ZL18__gthread_active_pv, funcdef_no=312, decl_uid=8537, cgraph_uid=127, symbol_order=127)
>
> int __gthread_active_p() ()
> {
> - bool _1;
> - int _2;
> + static void * const __gthread_active_ptr = (void *) __gthrw_pthread_cancel;
> + void * __gthread_active_ptr.111_2;
> + bool _3;
> + int _4;
>
>   :
> - _1 = __gthrw_pthread_cancel != 0B;
> - _2 = (int) _1;
> - return _2;
> + __gthread_active_ptr.111_2 = __gthread_active_ptr;
> + _3 = __gthread_active_ptr.111_2 != 0B;
> + _4 = (int) _3;
> + return _4;
>
> }
>
> ... this looks like a header change, perhaps ...

Yep.
__gthrw_pthread_cancel is a function pointer (thus constant) while
__gthread_active_ptr is a global variable.

> ObserverEvent::~ObserverEvent() (struct ObserverEvent * const this)
> {
> - int _6;
> + int (*__vtbl_ptr_type) () * _2;
> + int _7;
>
>   :
> - this_3(D)->_vptr.ObserverEvent = &MEM[(void *)&_ZTV13ObserverEvent + 16B];
> - *this_3(D) ={v} {CLOBBER};
> - _6 = 0;
> - if (_6 != 0)
> + _2 = &_ZTV13ObserverEvent + 16;
> + this_4(D)->_vptr.ObserverEvent = _2;
> + MEM[(struct &)this_4(D)] ={v} {CLOBBER};
> + _7 = 0;
> + if (_7 != 0)
>
> ... extra temporary initializing the vtbl pointer. This is repeated many
> times ...

This is because of

2015-03-20  Richard Biener

	PR middle-end/64715
	...
	* gimplify.c (gimplify_expr): Remove premature folding of
	&X + CST to &MEM[&X, CST].

thus it is relatively recent. It will be fixed up by ccp1.

> -;; Function static Unique::Value_t Unique::get() (_ZN6Unique3getEv, funcdef_no=3030, decl_uid=51649, symbol_order=884)
> +;; Function static Unique::Value_t Unique::get() (_ZN6Unique3getEv, funcdef_no=3030, decl_uid=51730, cgraph_uid=883, symbol_order=884)
>
> static Unique::Value_t Unique::get() ()
> {
>   Value_t retval;
> - long int next_s.83_2;
> - long int next_s.84_3;
> - long int next_s.85_4;
> - Value_t _7;
> + long int next_s.83_3;
> + long int next_s.84_4;
> + long int next_s.85_5;
> + Value_t _9;
>
>   :
> - Pooma::DummyMutex::_ZN5Pooma10DummyMutex4lockEv.isra.26 ();
> - next_s.83_2 = next_s;
> - next_s.84_3 = next_s.83_2;
> - next_s.85_4 = next_s.84_3 + 1;
> - next_s = next_s.85_4;
> - retval_6 = next_s.84_3;
> - Pooma::DummyMutex::_ZN5Pooma10DummyMutex6unlockEv.isra.27 ();
> - _7 = retval_6;
> - return _7;
> + Pooma::DummyMutex::lock (&mutex_s);
> + next_s.83_3 = next_s;
> + next_s.84_4 = next_s.83_3;
> + next_s.85_5 = next_s.84_4 + 1;
> + next_s = next_s.85_5;
> + retval_7 = next_s.84_4;
> + Pooma::DummyMutex::unlock (&mutex_s);
> + _9 = retval_7;
> + return _9;
>
> }
>
> ... here we give up on ISRA....
I believe because of

2015-02-13  Ilya Enkovich

	PR tree-optimization/65002
	* tree-cfg.c (pass_data_fixup_cfg): Don't update SSA on start.
	* tree-sra.c (some_callers_have_no_vuse_p): New.
	(ipa_early_sra): Reject functions whose callers assume function is
	read only.

or related changes.

> and we have about twice as much EH:
>
> $ grep "resx " tramp3d-v4.ii.*\.ssa | wc -l
> 4816
> $ grep "resx " ../5/tramp3d-v4.ii.*\.ssa | wc -l
> 8671
>
> which, however, is optimized out by the time of release_ssa.

That's maybe because we emit more CLOBBERs initially (do we?)

> Another thing that we may consider cleaning up in the next stage1 is
> getting rid of dead stores:
>
> - MEM[(struct new_allocator *)&D.561702] ={v} {CLOBBER};
> - D.561702 ={v} {CLOBBER};
> - D.561702 ={v} {CLOBBER};
> - MEM[(struct new_allocator *)_2] ={v} {CLOBBER};
> - MEM[(struct allocator *)_2] ={v} {CLOBBER};
> - MEM[(struct _Alloc_hider *)_2] ={v} {CLOBBER};
> - MEM[(struct basic_string *)_2] ={v} {CLOBBER};
> - *_2 ={v} {CLOBBER};
> - *this_1(D) ={v} {CLOBBER};
> + MEM[(struct &)&D.570046] ={v} {CLOBBER};
> + MEM[(struct &)&D.570046] ={v} {CLOBBER};
> + D.570046 ={v} {CLOBBER};
> + MEM[(struct &)_2] ={v} {CLOBBER};
> + MEM[(struct &)_2] ={v} {CLOBBER};
> + MEM[(struct &)_2] ={v} {CLOBBER};
> + MEM[(struct &)_2] ={v} {CLOBBER};
> + MEM[(struct &)_2] ={v} {CLOBBER};
> + MEM[(struct &)this_1(D)] ={v} {CLOBBER};
>
> Clobbers are dangerously common. There are 18K clobbers out of 65K
> assignments in the release_ssa dump, which makes them 29% of all the code.
> The number of clobbers only seems to go down in the
> tramp3d-v4.ii.166t.ehcleanup dump and we still get a lot of redundancies:

Yeah, well ... :/  I've already taught DCE to get rid of the really
useless ones...

> >   :
> >
> > D.581063 ={v} {CLOBBER};
> > D.581063 ={v} {CLOBBER};
> > D.164155 ={v} {CLOBBER};
> > D.164155 ={v} {CLOBBER};
> > operator delete [] (begbuf_18);
>
> Why are those not considered dead stores and DCEed out earlier?

dead clobbers, you mean?
Well, they are only "dead" if there are no uses/defs of their LHS dominating them.
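For what it's worth, that distinction can be sketched in the dump notation used above (hypothetical decls, not taken from tramp3d):

```
  ;; Useless clobber: no use or def of D.1 dominates it, so there is
  ;; no lifetime to end -- this is the kind DCE can already drop.
  D.1 ={v} {CLOBBER};

  ;; Not useless: the store to D.2 dominates the clobber, so the
  ;; clobber carries real lifetime information (e.g. for stack slot
  ;; sharing) and cannot simply be treated as a dead store.
  D.2.x = 0;
  D.2 ={v} {CLOBBER};
```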