From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id E86BB3858401 for ; Wed, 10 Aug 2022 16:08:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E86BB3858401 Received: from mail-oi1-f199.google.com (mail-oi1-f199.google.com [209.85.167.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-226-LnqwbGaAPLSXpPSsODA09w-1; Wed, 10 Aug 2022 12:08:28 -0400 X-MC-Unique: LnqwbGaAPLSXpPSsODA09w-1 Received: by mail-oi1-f199.google.com with SMTP id k8-20020a0568080e8800b0033b08e0490bso6258945oil.19 for ; Wed, 10 Aug 2022 09:08:26 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc; bh=2qmsvdfAWn2f8B6Gd0Si8unO6NPP1RBsID9g4PJo6LE=; b=MWTgJSHLz3LELTAatsGI4fUnOuH7+AaGGGWBxlqLZofdH1jMGtTWbEKRNX0ej/XL0B XwlD1zAmha719qy2qlG3gFu63X1PkNTv4PIUpwwgSj9O2/1ylupSqan7k/r6fmXgp6Qo 60IYoYZx/Y6bp901S/GHTdPb7VHYL3WCZtHqumGo8O2CsSUqh9opi7GHpQ+y8F+poGKJ TTRqNPciHx1WtVG7ssxpnOfv2zR+nh3XUAJCKawY7L8TVZgSL/BUEACYxBYWR7ljdvdh w3jlXSkCyktDAkdG8NF+vtA0M1h8jFpE2v4i+XVdRc4saLO6x7iVOfkA3FT+LId2oiSx RHcw== X-Gm-Message-State: ACgBeo3S0sorwZ2lzoWm+w3a8RIMquN79+S+EhTI5AIRWv0MhbQNb9mm Bi5Boqi4unzreRXa+KkRWFdnU7uu9E0ngCPa4rMA445hSS20qruZQ1/RiVms2aBf6afobvQ489p b/rDrtWKzG9gRRxKxAD1fOXYgB8sZXJ/iAQ== X-Received: by 2002:a05:6870:b617:b0:10d:f7ce:50df with SMTP id cm23-20020a056870b61700b0010df7ce50dfmr1794777oab.36.1660147705827; Wed, 10 Aug 2022 09:08:25 -0700 (PDT) X-Google-Smtp-Source: AA6agR4fssn+9j9qaSAvQKRbNUL3nKI0R9hFItmtntwrd+c7sAWlh395pDQhBUAWmkExpVhdlVCVdQq8BmE9TOZec38= X-Received: by 2002:a05:6870:b617:b0:10d:f7ce:50df with SMTP id cm23-20020a056870b61700b0010df7ce50dfmr1794760oab.36.1660147705516; Wed, 10 Aug 2022 09:08:25 -0700 (PDT) MIME-Version: 1.0 References: <37255.122080909012202281@us-mta-359.us.mimecast.lan> <55ed3acb-ac0c-d7b7-0e9b-58591cf992a2@redhat.com> In-Reply-To: From: Aldy Hernandez Date: Wed, 10 Aug 2022 18:08:12 +0200 Message-ID: Subject: Re: [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading To: Richard Biener Cc: Andrew MacLeod , gcc-patches X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-5.7 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Aug 2022 16:08:34 -0000 On Wed, Aug 10, 2022 at 12:46 PM Richard Biener wrote: > > On Wed, 10 Aug 2022, Richard Biener wrote: > > > On Tue, 9 Aug 2022, Andrew MacLeod wrote: > > > > > > > > On 8/9/22 09:01, Richard Biener wrote: > > > > This revisits how we compute imports later used for the ranger path > > > > query during backwards threading. The compute_imports function > > > > of the path solver ends up pulling the SSA def chain of regular > > > > stmts without limit and since it starts with just the gori imports > > > > of the path exit it misses some interesting names to translate > > > > during path discovery. In fact with a still empty path this > > > > compute_imports function looks like not the correct tool. > > > > > > I don't really know how this works in practice. Aldys off this week, so he > > > can comment when he returns. > > > > > > The original premise was along the line of recognizing that only changes to a > > > GORI import name to a block can affect the branch at the end of the block. > > > ie, if the path doesn't change any import to block A, then the branch at the > > > end of block A will not change either. Likewise, if it does change an > > > import, then we look at whether the branch can be threaded. Beyond that > > > basic premise, I dont know what all it does. > > > > Yep, I also think that's the idea. > > [...] > > > Anyway, the important result of the change is that the imports > > set is vastly smaller since it is now constrained to the > > actual path. Plus it now contains the local defs in blocks > > of the path that do not dominate the exit block which means > > it might get more threading opportunities. Which reminds me > > to re-do the cc1files experiment for this change. > > Results are a bit unconclusive. We have > ethread 14044 -> 14058, first threadfull 36986 -> 37174, > first thread 8060 -> 8049, dom 21723 -> 21822, second thread > (after loop opts) 6242 -> 5582, second dom 8522 -> 8765, > second threadfull 3072 -> 2998 which makes an overall drop > in the number of threads from 98649 to 98448. The threading dance between all passes is a bit fickle as you can see. Particularly, DOM is a red herring. If it starts firing up, as it looks like from your numbers, it's usually because we regressed in the backward threaders and DOM is picking up the slack. Though a slightly increase in DOM threading can be because we opened up more opportunities for DOM because of better threading in the backward threaders. Sometimes it helps to run without DOM (or disable DOM threading) to compare apples with apples...being careful of course, that you're not dropping a bunch of threads in thread2 for instance, and picking them up in threadfull2 or whatever. Ughhh...I really hate that we have 20 million threading passes. > > I've isolated one testcase we threaded before but no longer after > this change and that is > > void foo (int nest, int print_nest) > { > _Bool t0 = nest != 0; > _Bool t1 = nest == print_nest; > _Bool t2 = t0 & t1; > if (t2) > __builtin_puts ("x"); > nest++; > if (nest > 2) > __builtin_abort (); > if (print_nest == nest) > __builtin_puts ("y"); > } > > where we are able to thread from if (t2) to if (print_nest == nest) > resolving that to false when t2 is true using the nest == print_nest > relation. Now, the reason is because of the imports added by > > // Exported booleans along the path, may help conditionals. > if (m_resolve) > for (i = 0; i < m_path.length (); ++i) > { > basic_block bb = m_path[i]; > tree name; > FOR_EACH_GORI_EXPORT_NAME (gori, bb, name) > if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE) > bitmap_set_bit (imports, SSA_NAME_VERSION (name)); > } > > _BUT_ - it turns out that the m_path local to the solver is still > the one from the last back_threader::find_taken_edge_switch > calling m_solver->compute_ranges (path, m_imports), so for a possibly > completely unrelated path. Truncating the path also on the solver > side before calling compute_imports makes the testcase fail to thread. > I'll note the PHI handling also looks at the stale m_path. Not good. Thanks for noticing. > > I see the solver itself adds relations from edges on the path so > the cruical item here seems to be to add imports for the path > entry conditional, but those would likely be GORI imports for that > block? Unfortunately that fails to add t[012], the GORI exports > seem to cover all that's needed but then exports might be too much > here? At least that's what the code in compute_imports effectively > does (also for the entry block). But I do wonder why compute_ranges > does not add relations computed by the entry conditional ... > That's probably because of > > void > path_range_query::compute_outgoing_relations (basic_block bb, basic_block > next) > { > gimple *stmt = last_stmt (bb); > > if (stmt > && gimple_code (stmt) == GIMPLE_COND > && (import_p (gimple_cond_lhs (stmt)) > || import_p (gimple_cond_rhs (stmt)))) > > where for a condition like above the names in the imports are not > in the condition itself. The gori.outgoing_edge_range_p compute > of the import ranges is appearantly not affected by this > restriction and picks up nest as ~[0, 0] on the respective path > even though, while 'nest' is in imports, the temporaries are not. > That would support the argument to drop the import_p checks in > path_range_query::compute_outgoing_relations. Perhaps Andrew can better answer your questions on GORI imports / exports here and elsewhere. > > I'm not sure it's worth fixing incrementally though, it's fallout > of r12-5157-gbfa04d0ec958eb that in some way did the reverse of > my patch. Interestingly hybrid_jt_simplifier::compute_ranges_from_state > still computes imports itself before calling compute_ranges. Hmmm, compute_ranges_from_state(), which is the interface from the forward threader into the path solver probably shouldn't be rolling its own. It looks like a poor man's calculation of imports. It's just the GORI imports to the final conditional and the SSA operands to the final conditional (in case GORI didn't have it in the list of imports??). This may have been there originally because the calculation of imports was in the backward threader, and the forward threader was doing its own thing before calling the path solver. ISTM, both the forward and backward threader should use path_range_query::compute_imports. In practice, it probably didn't matter because the forward threader couldn't feed the path solver long or complex enough paths for it to make a difference in the thread count. Aldy > > For comparing thread numbers fixing the current state incrementally > would be nice I guess, I will test something but not further pursue it > if not successful. > > Richard.