From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 33D273858003 for ; Wed, 1 Dec 2021 16:18:33 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 33D273858003 Received: from mail-oi1-f197.google.com (mail-oi1-f197.google.com [209.85.167.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-538-COUWy4ndOT2QbhxTPw6hKw-1; Wed, 01 Dec 2021 11:18:32 -0500 X-MC-Unique: COUWy4ndOT2QbhxTPw6hKw-1 Received: by mail-oi1-f197.google.com with SMTP id r15-20020acaa80f000000b002bcc50ca40dso16588343oie.5 for ; Wed, 01 Dec 2021 08:18:31 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=7MTrETb95qjn0SCV6ljWx2Uq5rvxE3abUa5zAXrYxJs=; b=eMQu9ecFlKCuel4WZ6dEJ35GuOg8TS+zZ1vcQDCrwfZvPFtyp8NTqNBpvu0ZpGw3Mr jjv5uGLuV5jwkOvJ76rTCYaFI/4hBeNoA76VS/HWFcYo1cOZAYw1lYs98U4a4080dXxt Ou5p/kkqz0ELC9pnvq18DK8r2qPxPbcLLnna5TmEfwnnraByHjxGLlKPLx4dByWlTiWl DMGHWlLF0XimO+02y5OWCp+D4s/D4m2UA6mwcSh6GI4FukiyY+J9UIyLdE22I9KRXwlr siNuHo/u1EGuU7Q6U6jhDRwk/to3qgaSbpbVDe96STwB7joJ3cn9XLXsjJ3j0ekE2wsK p9gw== X-Gm-Message-State: AOAM533pyTsmJJ+Q9lsX70mKtN0IMhArBtIktKQiqeOsk8WlS+3tvCVJ kaf8j/YvAFbQPeDIuqE9CWxdCrodqqAmSS+ua8wHsPYmlVOXNo5Q+cJUQrkrw4h+UtbqsQAb8pe gDsTJmkODw3BpLaYe3pPRcS2UC8D6TzJVeQ== X-Received: by 2002:aca:2207:: with SMTP id b7mr7155157oic.24.1638375511002; Wed, 01 Dec 2021 08:18:31 -0800 (PST) X-Google-Smtp-Source: ABdhPJxl8ds73Fo3YNd/8oABsdfPj2zlFhsWNmH14oZh7ClhV7RFpcnq1/0Xgy6tUGWDuo1UGyxZlNq6ghLywVoWZu8= X-Received: by 2002:aca:2207:: with SMTP id b7mr7155130oic.24.1638375510698; Wed, 01 Dec 2021 08:18:30 -0800 (PST) MIME-Version: 1.0 References: <20211127083206.2450093-1-aldyh@redhat.com> In-Reply-To: <20211127083206.2450093-1-aldyh@redhat.com> From: Aldy Hernandez Date: Wed, 1 Dec 2021 17:18:20 +0100 Message-ID: Subject: Re: [PATCH] path solver: Minimize exported ranges to subsequent blocks. To: GCC patches X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Dec 2021 16:18:35 -0000 I'm going to hold off pushing this until I hear from the global maintainers and/or release managers. This patch fixes a pathological performance case, but it may also be handled by Andrew's fixes in this area. It's up to y'all to decide if it's something we want in this release. An alternative would be to keep track of any pathological issues going forward, and see if this fixes the issue. I'm going to be taking leave any day now :). Aldy On Sat, Nov 27, 2021 at 9:32 AM Aldy Hernandez wrote: > > Currently we have a set of interesting names in the path that we solve > for. However, solving *all* of them on exit from each block is > unnecessary, even if an edge range is available. We can reduce the > queried edges by only solving SSA names from the current block > that will be needed in subsequent blocks. > > With this patch we avoid the explosion in PR103254, without the need for > Andrew's patch (which is still independently needed). > > In my testbed we loose 6 threads in the non-resolving threading passes, > but the overall thread count is the same, as we pick them up later. We > could probably try harder to get them, but it's probably not worth the > effort. > > Benchmarking this patch shows no difference in overall compilation speed > and a tiny slowdown (< 0.80%) for some of the threading passes. > Benchmarked for both -O2, -O3, and -O2 --param ranger-logical-depth=10. > > I'm unsure what the review requirements are in stage3, so... > > OK for trunk? > > p.s. I saw some cleanups that could be done (intersecting more bitmaps, > putting the rest of the bitmaps in the new obstack, etc etc), but I > avoided doing them to minimize the churn in stage3. > > Regstrapped on x86-64 & ppc64le Linux. > > PR tree-optimization/103254 > > gcc/ChangeLog: > > * gimple-range-path.cc (path_range_query::path_range_query): > Initialize obstack. > (path_range_query::~path_range_query): Release obstack. > (path_range_query::needs_on_entry_for_block): New. > (path_range_query::compute_needs_on_entry_set): New. > (path_range_query::compute_ranges_in_block): Only calculate > outgoing edges for m_needs_on_entry set. > (path_range_query::compute_ranges): Call > compute_needs_on_entry_set. > * gimple-range-path.h: Add needs_on_entry_for_block, > compute_needs_on_entry_set, m_bitmaps, and m_needs_on_entry. > > gcc/testsuite/ChangeLog: > > * gcc.dg/pr103254.c: Add -fdump-tree-vrp2-details. > * gcc.dg/pr103254-2.c: New test. > --- > gcc/gimple-range-path.cc | 67 +++++++++++++++++++++++++++++-- > gcc/gimple-range-path.h | 4 ++ > gcc/testsuite/gcc.dg/pr103254-2.c | 29 +++++++++++++ > gcc/testsuite/gcc.dg/pr103254.c | 6 ++- > 4 files changed, 102 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/pr103254-2.c > > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc > index b9c71226c1c..32226df05b5 100644 > --- a/gcc/gimple-range-path.cc > +++ b/gcc/gimple-range-path.cc > @@ -48,6 +48,7 @@ path_range_query::path_range_query (bool resolve, gimple_ranger *ranger) > m_ranger = ranger; > > m_oracle = new path_oracle (m_ranger->oracle ()); > + bitmap_obstack_initialize (&m_bitmaps); > } > > path_range_query::~path_range_query () > @@ -57,6 +58,7 @@ path_range_query::~path_range_query () > delete m_ranger; > BITMAP_FREE (m_has_cache_entry); > delete m_cache; > + bitmap_obstack_release (&m_bitmaps); > } > > // Return TRUE if NAME is in the import bitmap. > @@ -401,6 +403,62 @@ path_range_query::compute_ranges_in_phis (basic_block bb) > } > } > > +// Return the set of SSA names that would be useful to have resolved > +// on entry to this block. These are the names that will ultimately > +// help resolve the final conditional on the path. > + > +void > +path_range_query::needs_on_entry_for_block (bitmap on_entry, unsigned bbn) > +{ > + gori_compute &g = m_ranger->gori (); > + basic_block next = m_path[bbn]; > + bitmap_copy (on_entry, g.imports (next)); > + > + // Any incoming PHI args are needed on entry, but are not in the > + // g.imports() above, so add them manually. > + for (auto iter = gsi_start_phis (next); !gsi_end_p (iter); gsi_next (&iter)) > + { > + gphi *phi = iter.phi (); > + tree result = gimple_phi_result (phi); > + > + if (bitmap_bit_p (m_imports, SSA_NAME_VERSION (result))) > + for (size_t i = 0; i < gimple_phi_num_args (phi); ++i) > + { > + tree arg = gimple_phi_arg (phi, i)->def; > + > + if (TREE_CODE (arg) == SSA_NAME > + && m_path.contains (gimple_phi_arg_edge (phi, i)->src)) > + bitmap_set_bit (on_entry, SSA_NAME_VERSION (arg)); > + } > + } > + bitmap_and_into (on_entry, m_imports); > +} > + > +// Compute the set of needed on-entry names for all path blocks. > + > +void > +path_range_query::compute_needs_on_entry_set () > +{ > + unsigned len = m_path.length (); > + > + if (len > m_needs_on_entry.length ()) > + m_needs_on_entry.safe_grow_cleared (len); > + > + // Iterate backwards through the path, excluding the entry block > + // since we'll be asking the ranger for incoming values into the path. > + for (unsigned i = 0; i + 1 < len; ++i) > + { > + if (!m_needs_on_entry[i]) > + m_needs_on_entry[i] = BITMAP_ALLOC (&m_bitmaps); > + > + needs_on_entry_for_block (m_needs_on_entry[i], i); > + > + // Include the on-entry names for the subsequent blocks. > + if (i > 0) > + bitmap_ior_into (m_needs_on_entry[i], m_needs_on_entry[i - 1]); > + } > +} > + > // Compute ranges defined in the current block, or exported to the > // next block. > > @@ -441,11 +499,12 @@ path_range_query::compute_ranges_in_block (basic_block bb) > // Solve imports that are exported to the next block. > basic_block next = next_bb (); > edge e = find_edge (bb, next); > - EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi) > + const_bitmap candidates = m_needs_on_entry[m_pos - 1]; > + gori_compute &g = m_ranger->gori (); > + bitmap exports = g.exports (bb); > + EXECUTE_IF_SET_IN_BITMAP (candidates, 0, i, bi) > { > tree name = ssa_name (i); > - gori_compute &g = m_ranger->gori (); > - bitmap exports = g.exports (bb); > > if (bitmap_bit_p (exports, i)) > { > @@ -602,6 +661,8 @@ path_range_query::compute_ranges (const vec &path, > else > compute_imports (m_imports, exit_bb ()); > > + compute_needs_on_entry_set (); > + > if (m_resolve) > get_path_oracle ()->reset_path (); > > diff --git a/gcc/gimple-range-path.h b/gcc/gimple-range-path.h > index 57a9ae9bdcd..f5601377209 100644 > --- a/gcc/gimple-range-path.h > +++ b/gcc/gimple-range-path.h > @@ -61,6 +61,8 @@ private: > void compute_ranges_in_phis (basic_block bb); > void adjust_for_non_null_uses (basic_block bb); > void ssa_range_in_phi (irange &r, gphi *phi); > + void needs_on_entry_for_block (bitmap interesting, unsigned bbn); > + void compute_needs_on_entry_set (); > void compute_outgoing_relations (basic_block bb, basic_block next); > void compute_phi_relations (basic_block bb, basic_block prev); > void maybe_register_phi_relation (gphi *, tree arg); > @@ -93,6 +95,8 @@ private: > auto_bitmap m_imports; > gimple_ranger *m_ranger; > non_null_ref m_non_null; > + bitmap_obstack m_bitmaps; > + auto_vec m_needs_on_entry; > > // Current path position. > unsigned m_pos; > diff --git a/gcc/testsuite/gcc.dg/pr103254-2.c b/gcc/testsuite/gcc.dg/pr103254-2.c > new file mode 100644 > index 00000000000..1956d90a563 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/pr103254-2.c > @@ -0,0 +1,29 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 --param ranger-logical-depth=99" } */ > +/* { dg-timeout 10 } */ > + > +/* This is the same test as pr103254.c, but making sure the threader > + succeeds regardless of the depth limit, since it should avoid > + asking for unnecessary edges. */ > + > +short int i; > + > +void > +foo (void) > +{ > + for (i = 1; i < 2; i += 4) > + { > + int j; > + > + for (j = 0; j < 5; j += 4) > + { > + int k; > + > + for (k = 0; k < 68; k += 4) > + { > + i &= j; > + j &= i; > + } > + } > + } > +} > diff --git a/gcc/testsuite/gcc.dg/pr103254.c b/gcc/testsuite/gcc.dg/pr103254.c > index 62d2415f216..9d6828a6771 100644 > --- a/gcc/testsuite/gcc.dg/pr103254.c > +++ b/gcc/testsuite/gcc.dg/pr103254.c > @@ -1,7 +1,11 @@ > /* { dg-do compile } */ > -/* { dg-options "-O3" } */ > +/* { dg-options "-O3 -fdump-tree-vrp2-details" } */ > /* { dg-timeout 10 } */ > > +/* The vrp2 dump above is needed to test that ranger can dump all the > + known ranges in the IL and not blow up, which would happen for > + --param ranger-logical-depth=99. */ > + > short int i; > > void > -- > 2.31.1 >