From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 64833 invoked by alias); 19 Oct 2018 13:57:30 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 64402 invoked by uid 89); 19 Oct 2018 13:57:29 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS autolearn=ham version=3.3.2 spammy=price, !head X-HELO: mx1.suse.de Received: from mx2.suse.de (HELO mx1.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 19 Oct 2018 13:57:28 +0000 Received: from relay1.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id EDAEBAD70; Fri, 19 Oct 2018 13:57:25 +0000 (UTC) Date: Fri, 19 Oct 2018 14:32:00 -0000 From: Richard Biener To: Richard Sandiford cc: gcc-patches@gcc.gnu.org, stevenb.gcc@gmail.com Subject: Re: [PATCH] Add splay-tree "view" for bitmap In-Reply-To: Message-ID: References: <877eif8ih9.fsf@arm.com> <87woqf55k3.fsf@arm.com> <74926395-0C13-4048-B9F0-62F44408D7B1@suse.de> <87sh125ro1.fsf@arm.com> User-Agent: Alpine 2.20 (LSU 67 2015-01-07) MIME-Version: 1.0 Content-Type: multipart/mixed; BOUNDARY="-1609908220-92830863-1539957445=:4374" X-SW-Source: 2018-10/txt/msg01198.txt.bz2 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. ---1609908220-92830863-1539957445=:4374 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8BIT Content-length: 5203 On Fri, 19 Oct 2018, Richard Biener wrote: > On Fri, 19 Oct 2018, Richard Sandiford wrote: > > > Richard Biener writes: > > > On October 18, 2018 11:05:32 PM GMT+02:00, Richard Sandiford > > > wrote: > > >>Richard Biener writes: > > >>> On Thu, 18 Oct 2018, Richard Sandiford wrote: > > >>>> What's the performance like for more reasonable testcases? E.g. is > > >>there > > >>>> a measurable change either way in --enable-checking=release for some > > >>gcc > > >>>> .iis at -g or -O2 -g? > > >>> > > >>> I did a quick check on my set of cc1files (still .i, .ii ones tend to > > >>> be unusable quite quickly...). Most of them compile too quickly > > >>> to make any difference appear other than noise. Multi-second > > >>differences > > >>> like for PR63155 should be the exception but our O(n) bitmap > > >>> implementation really makes some parts of GCC quadratic where it > > >>> doesn't appear so. > > >>> > > >>> Is there a reason you expect it to be ever slower? > > >> > > >>During recent stage3s I've tried to look at profiles of cc1plus > > >>to see whether there was something easy we could do to improve > > >>compile times. And bitmap operations always showed up near the > > >>top of the profile. There were always some pathological queries > > >>in which the linear search really hurt, but whenever I tried "simple" > > >>ways to avoid the obvious cases, they made those queries quicker > > >>but slowed things down overall. It seemed that adding any extra logic > > >>to the queries hurted because even a small slowdown in common lookups > > >>overwhelmed a big saving in less common lookups. > > > > > > Yeah. I also noticed some 'obvious' shortcomings in the heuristics... > > > I guess in the end well predicted branches in the out of line code are > > > important... > > > > > >> > > >>But there again I was looking to speed up more "normal" cases, not ones > > >>like the PR. > > >> > > >>FWIW I've tried it on a local x86_64 box and it slows down current > > >>optabs.ii at -O2 -g by ~0.35% (reproducable). I see similar slowdowns > > >>for the other files I tried. But that's hardly a huge amount, and > > >>probably a price worth paying for the speed-up in the PR. > > > > > > Just to make sure what to reproduce - this is with checking disabled? > > > And with or without the hunks actually making use of the splay tree > > > path? > > > > Yeah, with an --enable-checking=release stage3: > > > > ./cc1plus optabs.ii -o /dev/null -O2 -g > > > > using the optabs.ii from the unpatched --enable-checking=release build. > > > > It was the whole patch vs. without the patch. > > OK, so there are almost no hits from the SSA propagator or out-of-SSA > but only from "unchanged" paths: > > - 2.90% 2.90% 23 cc1plus cc1plus [.] > bitmap_set_b▒ > - bitmap_set_bit > ▒ > + 0.79% df_lr_bb_local_compute > ▒ > + 0.38% insert_updated_phi_nodes_for > ▒ > + 0.27% sched_analyze_reg > ▒ > + 0.23% walk_aliased_vdefs_1 > ▒ > + 0.13% get_continuation_for_phi > ▒ > + 0.13% add_control_edge > ▒ > + 0.13% df_md_bb_local_compute_process_def > ▒ > + 0.13% mark_for_renaming > ▒ > + 0.13% (anonymous namespace)::pass_rtl_cprop::execute > ▒ > + 0.13% compute_dominance_frontiers > ▒ > + 0.12% df_simulate_initialize_backwards > > it's also interesting that most branch misses (for bitmap_set_bit) > are from > > bool res = (ptr->bits[word_num] & bit_val) == 0; > if (res) > ptr->bits[word_num] |= bit_val; > return res; > > I'm not sure how "bad" a mispredicted branch is here. I guess > if it is predicted to be taken (skip the write) then it's bad > but if it is predicted the other way around it should be a matter > of not retiring the store... But I am not a CPU guy. I guess > unconditionally updating the memory wouldn't be so bad after all > and it might also help combine in using architecture specific > optimizations like using some CC flags of the OR operation > to get at the comparison result. Can you benchmark a difference > for this? So I can also see mispredicts on if (!head->tree_form) element = bitmap_list_find_element (head, indx, usage); else element = bitmap_tree_find_element (head, indx); which I could significantly reduce via re-ordering (thus w/o any branch history this currently gets predicted to the tree variant). That suggests to instead of the above runtime checks do bitmap_tree_bit_p / bitmap_tree_set_bit routines (or do tricks with C++ and overloading via a tbitmap type). Would you be more happy with that approach? Thanks, Richard. ---1609908220-92830863-1539957445=:4374--