From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 8E74E3861018 for ; Tue, 3 Nov 2020 13:49:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 8E74E3861018 Received: from pps.filterd (m0098393.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 0A3DnMhR082485; Tue, 3 Nov 2020 08:49:40 -0500 Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 34k86crhc4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 03 Nov 2020 08:49:40 -0500 Received: from m0098393.ppops.net (m0098393.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.36/8.16.0.36) with SMTP id 0A3DnRHp083125; Tue, 3 Nov 2020 08:49:40 -0500 Received: from ppma03ams.nl.ibm.com (62.31.33a9.ip4.static.sl-reverse.com [169.51.49.98]) by mx0a-001b2d01.pphosted.com with ESMTP id 34k86crhan-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 03 Nov 2020 08:49:39 -0500 Received: from pps.filterd (ppma03ams.nl.ibm.com [127.0.0.1]) by ppma03ams.nl.ibm.com (8.16.0.42/8.16.0.42) with SMTP id 0A3DlQbj024988; Tue, 3 Nov 2020 13:49:37 GMT Received: from b06cxnps3074.portsmouth.uk.ibm.com (d06relay09.portsmouth.uk.ibm.com [9.149.109.194]) by ppma03ams.nl.ibm.com with ESMTP id 34hm6hahhf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 03 Nov 2020 13:49:37 +0000 Received: from d06av22.portsmouth.uk.ibm.com (d06av22.portsmouth.uk.ibm.com [9.149.105.58]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 0A3DnZiU57475386 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 3 Nov 2020 13:49:35 GMT Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 30FA24C046; Tue, 3 Nov 2020 13:49:35 +0000 (GMT) Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id CABD94C050; Tue, 3 Nov 2020 13:49:34 +0000 (GMT) Received: from sig-9-145-1-210.uk.ibm.com (unknown [9.145.1.210]) by d06av22.portsmouth.uk.ibm.com (Postfix) with ESMTP; Tue, 3 Nov 2020 13:49:34 +0000 (GMT) Message-ID: <8517a4a8ddf315fbc8bdd01dc9a690797e8fbee4.camel@linux.ibm.com> Subject: Re: Incremental updating of SSA_NAMEs that are passed to b_c_p From: Ilya Leoshkevich To: Richard Biener Cc: GCC Development , Jeff Law , Marc Glisse , Jakub Jelinek Date: Tue, 03 Nov 2020 14:49:34 +0100 In-Reply-To: References: <4ce44dda24d28279156422c10d09d64a575f1f8c.camel@linux.ibm.com> <132ef4118088160843af2556cb64e499f1aae3e3.camel@linux.ibm.com> Content-Type: text/plain; charset="UTF-8" User-Agent: Evolution 3.34.4 (3.34.4-1.fc31) MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.312, 18.0.737 definitions=2020-11-03_08:2020-11-03, 2020-11-03 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 clxscore=1015 suspectscore=0 impostorscore=0 phishscore=0 priorityscore=1501 malwarescore=0 adultscore=0 mlxscore=0 lowpriorityscore=0 spamscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2009150000 definitions=main-2011030089 X-Spam-Status: No, score=-3.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Nov 2020 13:49:43 -0000 On Fri, 2020-10-30 at 09:22 +0100, Richard Biener wrote: > On Thu, Oct 29, 2020 at 6:20 PM Ilya Leoshkevich > wrote: > > On Wed, 2020-10-28 at 12:18 +0100, Richard Biener wrote: > > > On Tue, Oct 27, 2020 at 7:36 PM Ilya Leoshkevich via Gcc > > > wrote: > > > > Hi, > > > > > > > > I'd like to revive the old discussion regarding the interaction > > > > of > > > > jump threading and b_c_p causing the latter to incorrectly > > > > return 1 > > > > in > > > > certain cases: > > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html > > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549288.html > > > > > > > > The conclusion was that this happening during threading is just > > > > a > > > > symptom of a deeper problem: SSA_NAMEs that are passed to b_c_p > > > > should > > > > not be registered for incremental updating. > > > > > > > > I performed a little experiment and added an assertion to > > > > create_new_def_for: > > > > > > > > --- a/gcc/tree-into-ssa.c > > > > +++ b/gcc/tree-into-ssa.c > > > > @@ -2996,6 +3014,8 @@ create_new_def_for (tree old_name, gimple > > > > *stmt, > > > > def_operand_p def) > > > > { > > > > tree new_name; > > > > > > > > + gcc_checking_assert (!used_by_bcp_p (old_name)); > > > > + > > > > timevar_push (TV_TREE_SSA_INCREMENTAL); > > > > > > > > if (!update_ssa_initialized_fn) > > > > > > > > This has of course fired when performing basic block > > > > duplication > > > > during > > > > threading, which can be fixed by avoiding duplication of basic > > > > blocks > > > > wi > > > > th b_c_p calls: > > > > > > > > --- a/gcc/tree-cfg.c > > > > +++ b/gcc/tree-cfg.c > > > > @@ -6224,7 +6224,8 @@ gimple_can_duplicate_bb_p > > > > (const_basic_block > > > > bb) > > > > || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT) > > > > || gimple_call_internal_p (g, > > > > IFN_GOMP_SIMT_VOTE_ANY) > > > > || gimple_call_internal_p (g, > > > > IFN_GOMP_SIMT_XCHG_BFLY) > > > > - || gimple_call_internal_p (g, > > > > IFN_GOMP_SIMT_XCHG_IDX))) > > > > + || gimple_call_internal_p (g, > > > > IFN_GOMP_SIMT_XCHG_IDX) > > > > + || gimple_call_builtin_p (g, > > > > BUILT_IN_CONSTANT_P))) > > > > return false; > > > > } > > > > > > > > The second occurrence is a bit more interesting: > > > > > > > > gimple * > > > > vrp_insert::build_assert_expr_for (tree cond, tree v) > > > > { > > > > ... > > > > a = build2 (ASSERT_EXPR, TREE_TYPE (v), v, cond); > > > > assertion = gimple_build_assign (NULL_TREE, a); > > > > ... > > > > tree new_def = create_new_def_for (v, assertion, NULL); > > > > > > > > The fix is also simple though: > > > > > > > > --- a/gcc/tree-vrp.c > > > > +++ b/gcc/tree-vrp.c > > > > @@ -3101,6 +3101,9 @@ vrp_insert::process_assert_insertions_for > > > > (tree > > > > name, assert_locus *loc) > > > > if (loc->expr == loc->val) > > > > return false; > > > > > > > > + if (used_by_bcp_p (name)) > > > > + return false; > > > > + > > > > cond = build2 (loc->comp_code, boolean_type_node, loc->expr, > > > > loc- > > > > > val); > > > > assert_stmt = build_assert_expr_for (cond, name); > > > > if (loc->e) > > > > > > > > My original testcase did not trigger anything else. I'm > > > > planning > > > > to > > > > check how this affects the testsuite, but before going further > > > > I > > > > would > > > > like to ask: is this the right direction now? To me it looks a > > > > little-bit more heavy-handed than the original approach, but > > > > maybe > > > > it's > > > > worth it. > > > > > > Disabling threading looks reasonable but I'd rather not disallow > > > BB > > > duplication > > > or renaming. For VRP I guess we want to instead change > > > register_edge_assert_for* to not register assertions for the > > > result > > > of > > > __builtin_constant_p rather than just not allowing VRP to process > > > it > > > (there are other consumers still). > > > > If I understood Jeff correctly, we should disable incremental > > updates > > for absolutely all b_c_p arguments, so affecting as many consumers > > as > > possible must actually be a good thing when this approach is > > considered? > > > > That said, regtest has revealed one more place where this is > > happening: > > rewrite_into_loop_closed_ssa_1 -> ... -> add_exit_phi -> > > create_new_def_for. The reduced code is: > > > > int a; > > > > void > > b (void) > > { > > while (a) > > __builtin_constant_p (a) ?: 0; > > if (a == 8) > > while (1) > > ; > > } > > > > I guess we are not allowed to cancel this transformation, becase we > > really need LCSSA for later passes? This now contradicts the > > "prohibit > > all updates" idea.. > > Yes. I believe this looks at the issue from a wrong angle. SSA > rewrite > is just renaming and that's OK. > > > If you think that disabling threading is reasonable, could you > > please > > have another look at the original patches? > > > > v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html > > v2: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549211.html > > > > Can anything like this go in after all? > > For v2 I'm not sure why we would allow b_c_p late? That said, both > approaches are reasonable - disallow threading or force bcp to > evaluate to zero (guess that relies on the non-threaded variant to > be known to be _not_ constant, otherwise we create another > inconsistecy?). So for sure v1 looks "safer" at the possible expense > of less (early) optimization. Of course backward threading is not > the only source of jump threading (but it's the only truly early > one). According to the doc, returning 0 conservatively should not create inconsistencies: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html A return of 0 does not indicate that the value is not a constant, but merely that GCC cannot prove it is a constant with the specified value of the -O option. v2 was suggested by Jakub as a possible optimization here: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549078.html However, I don't see any differences between v1 and v2 neither in SPEC nor in vmlinux builds on x86_64, so I think the additional complexity is not worth it. There are a few differences between master and v1/v2 in x86_64 vmlinux builds though. For example: https://elixir.bootlin.com/linux/v5.10-rc2/source/kernel/tracepoint.c#L56 Here master performs threading for the overflow case: https://elixir.bootlin.com/linux/v5.10-rc2/source/include/linux/overflow.h#L297 where the allocated size ends up being SIZE_MAX. Threading goes as far as this b_c_p in kmalloc: https://elixir.bootlin.com/linux/v5.10-rc2/source/include/linux/slab.h#L538 and ends up emitting jo to kmalloc_large call after mul that calculates the size. I haven't analyzed all the cases, but the asm diff (including context) is just 2k lines, which isn't a lot, given that vmlinux objdump is 6m lines. Best regards, Ilya