From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 30958 invoked by alias); 5 Apr 2002 23:26:01 -0000 Mailing-List: contact gcc-prs-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Archive: List-Post: List-Help: Sender: gcc-prs-owner@gcc.gnu.org Received: (qmail 30934 invoked by uid 71); 5 Apr 2002 23:26:01 -0000 Date: Fri, 05 Apr 2002 15:26:00 -0000 Message-ID: <20020405232601.30933.qmail@sources.redhat.com> To: jason@gcc.gnu.org Cc: gcc-prs@gcc.gnu.org, From: Jason Merrill Subject: Re: c++/5504: Optimization breaks wei-ku-1 from blitz Reply-To: Jason Merrill X-SW-Source: 2002-04/txt/msg00393.txt.bz2 List-Id: The following reply was made to PR c++/5504; it has been noted by GNATS. From: Jason Merrill To: gcc-gnats@gcc.gnu.org Cc: Subject: Re: c++/5504: Optimization breaks wei-ku-1 from blitz Date: Sat, 06 Apr 2002 00:22:53 +0100 This seems to be a problem with combinatorial explosion of EH regions. The explosion itself is a necessary consequence of the use of expression templates, which produces a lot of little recursive functions inlined into one another, ideally optimizing out to very simple code. In this case, since IndexPlaceholder has a destructor, all of the expression template classes parameterized on it need to call it. As the expression (A = 11111 + ...) grows, the classes involved get more and more complicated. We used to handle cleanups in destructors by manually calling the base and member destructors from finish_function. My patch changed this to use normal cleanups for destroying bases and members, in order to get proper semantics if the destructor itself or one of the subobject destructors throw; previously we would have skipped the other subobject destructors. This change is necessary for correctness. But as a result, we now have a lot more EH regions than we used to. Counting calls to expand_eh_region_end_cleanup, I see: With one term, 67; two, 379; three, 1657; four, 6779. It's not entirely clear to me that this explosion is due to the nature of expression templates, but I think it is. And the vast majority of these aren't needed; the function ends up with 43 entries in the action table. For the four term expression, the maximal expression template class looks like _bz_ArrayExpr , blitz::_bz_ArrayExpr , blitz::IndexPlaceholder<0>, blitz::Multiply > >, blitz::Add > >, blitz::_bz_ArrayExpr , blitz::IndexPlaceholder<1>, blitz::Multiply > >, blitz::Add > >, blitz::_bz_ArrayExpr , blitz::IndexPlaceholder<2>, blitz::Multiply > >, blitz::Add > > which itself involves 15 destructor calls, for the IndexPlaceholder members and all the types that (perhaps indirectly) contain one. Each of these is expanded twice, once for the normal flow case, and once for the EH case. If fully inlined, destroying one temporary of this type adds 30 exception regions to the list. And such temporaries are created every time the expression object is passed by value, as it uniformly is. Unfortunately, the backend doesn't seem to be prepared to handle this number of EH regions; many of the data structures can only be searched linearly. For the four-term case, gprof shows that we're spending the vast majority of our time traversing them: % cumulative self self total time seconds seconds calls ms/call ms/call name 37.35 211.09 211.09 110832 1.90 2.32 maybe_remove_eh_handler 26.59 361.36 150.27 344409 0.44 0.44 in_expr_list_p 10.00 417.89 56.53 305399 0.19 0.19 next_nonnote_insn 6.59 455.11 37.22 28896 1.29 1.29 remove_exception_handler_label ... where the time in in_expr_list_p comes from the call in can_delete_label_p to see if a label is in exception_handler_labels. with -O0 -finline, it looks a bit different: 28.70 106.27 106.27 100111 1.06 1.30 maybe_remove_eh_handler 20.71 182.94 76.67 306819 0.25 0.25 in_expr_list_p 12.38 228.77 45.83 92573 0.50 0.50 pop_temp_slots 7.12 255.12 26.35 56221 0.47 0.47 free_temp_slots 5.03 273.73 18.61 28872 0.64 0.64 remove_exception_handler_label Here again we're running into linear traversals of lists that got larger than we ever expected, but this time it's the list of temp slots. I'm not sure what to do about this. We need to support this sort of code; this is a very common C++ programming idiom. Obvious ways to improve performance would be: Adjust the EH data structures so that we can do more efficient searches in them. Try to avoid creating EH regions that will just be deleted again. If nothing in the region can throw, we can discard it at expand_eh_region_end time. For 3.1, one option would be to go back to doing destructor cleanups only on the normal flow path; this would break EH semantics again for this case, but that would not be a regression and I don't think I've ever seen a bug report about it in past releases. Thoughts? Jason