From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <iii@linux.ibm.com>
Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com
 [148.163.156.1])
 by sourceware.org (Postfix) with ESMTPS id 8E74E3861018
 for <gcc@gcc.gnu.org>; Tue,  3 Nov 2020 13:49:41 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 8E74E3861018
Received: from pps.filterd (m0098393.ppops.net [127.0.0.1])
 by mx0a-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id
 0A3DnMhR082485; Tue, 3 Nov 2020 08:49:40 -0500
Received: from pps.reinject (localhost [127.0.0.1])
 by mx0a-001b2d01.pphosted.com with ESMTP id 34k86crhc4-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Tue, 03 Nov 2020 08:49:40 -0500
Received: from m0098393.ppops.net (m0098393.ppops.net [127.0.0.1])
 by pps.reinject (8.16.0.36/8.16.0.36) with SMTP id 0A3DnRHp083125;
 Tue, 3 Nov 2020 08:49:40 -0500
Received: from ppma03ams.nl.ibm.com (62.31.33a9.ip4.static.sl-reverse.com
 [169.51.49.98])
 by mx0a-001b2d01.pphosted.com with ESMTP id 34k86crhan-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Tue, 03 Nov 2020 08:49:39 -0500
Received: from pps.filterd (ppma03ams.nl.ibm.com [127.0.0.1])
 by ppma03ams.nl.ibm.com (8.16.0.42/8.16.0.42) with SMTP id 0A3DlQbj024988;
 Tue, 3 Nov 2020 13:49:37 GMT
Received: from b06cxnps3074.portsmouth.uk.ibm.com
 (d06relay09.portsmouth.uk.ibm.com [9.149.109.194])
 by ppma03ams.nl.ibm.com with ESMTP id 34hm6hahhf-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Tue, 03 Nov 2020 13:49:37 +0000
Received: from d06av22.portsmouth.uk.ibm.com (d06av22.portsmouth.uk.ibm.com
 [9.149.105.58])
 by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id
 0A3DnZiU57475386
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Tue, 3 Nov 2020 13:49:35 GMT
Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 30FA24C046;
 Tue,  3 Nov 2020 13:49:35 +0000 (GMT)
Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id CABD94C050;
 Tue,  3 Nov 2020 13:49:34 +0000 (GMT)
Received: from sig-9-145-1-210.uk.ibm.com (unknown [9.145.1.210])
 by d06av22.portsmouth.uk.ibm.com (Postfix) with ESMTP;
 Tue,  3 Nov 2020 13:49:34 +0000 (GMT)
Message-ID: <8517a4a8ddf315fbc8bdd01dc9a690797e8fbee4.camel@linux.ibm.com>
Subject: Re: Incremental updating of SSA_NAMEs that are passed to b_c_p
From: Ilya Leoshkevich <iii@linux.ibm.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: GCC Development <gcc@gcc.gnu.org>, Jeff Law <law@redhat.com>, Marc
 Glisse <marc.glisse@inria.fr>, Jakub Jelinek <jakub@redhat.com>
Date: Tue, 03 Nov 2020 14:49:34 +0100
In-Reply-To: <CAFiYyc0Zow4KRZG8R0Cj7UyZqh6nnCY3QxpnwASxxXkF46KDCQ@mail.gmail.com>
References: <4ce44dda24d28279156422c10d09d64a575f1f8c.camel@linux.ibm.com>
 <CAFiYyc1=DGGYo1JJxTBpx3CkoVd4s53UqQN6gkwMM3DJWFTiwQ@mail.gmail.com>
 <132ef4118088160843af2556cb64e499f1aae3e3.camel@linux.ibm.com>
 <CAFiYyc0Zow4KRZG8R0Cj7UyZqh6nnCY3QxpnwASxxXkF46KDCQ@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
User-Agent: Evolution 3.34.4 (3.34.4-1.fc31) 
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-TM-AS-GCONF: 00
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.312, 18.0.737
 definitions=2020-11-03_08:2020-11-03,
 2020-11-03 signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 bulkscore=0 clxscore=1015
 suspectscore=0 impostorscore=0 phishscore=0 priorityscore=1501
 malwarescore=0 adultscore=0 mlxscore=0 lowpriorityscore=0 spamscore=0
 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.12.0-2009150000 definitions=main-2011030089
X-Spam-Status: No, score=-3.1 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_EF, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc mailing list <gcc.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <mailto:gcc-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 03 Nov 2020 13:49:43 -0000

On Fri, 2020-10-30 at 09:22 +0100, Richard Biener wrote:
> On Thu, Oct 29, 2020 at 6:20 PM Ilya Leoshkevich <iii@linux.ibm.com>
> wrote:
> > On Wed, 2020-10-28 at 12:18 +0100, Richard Biener wrote:
> > > On Tue, Oct 27, 2020 at 7:36 PM Ilya Leoshkevich via Gcc
> > > <gcc@gcc.gnu.org> wrote:
> > > > Hi,
> > > > 
> > > > I'd like to revive the old discussion regarding the interaction
> > > > of
> > > > jump threading and b_c_p causing the latter to incorrectly
> > > > return 1
> > > > in
> > > > certain cases:
> > > > 
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549288.html
> > > > 
> > > > The conclusion was that this happening during threading is just
> > > > a
> > > > symptom of a deeper problem: SSA_NAMEs that are passed to b_c_p
> > > > should
> > > > not be registered for incremental updating.
> > > > 
> > > > I performed a little experiment and added an assertion to
> > > > create_new_def_for:
> > > > 
> > > > --- a/gcc/tree-into-ssa.c
> > > > +++ b/gcc/tree-into-ssa.c
> > > > @@ -2996,6 +3014,8 @@ create_new_def_for (tree old_name, gimple
> > > > *stmt,
> > > > def_operand_p def)
> > > >  {
> > > >    tree new_name;
> > > > 
> > > > +  gcc_checking_assert (!used_by_bcp_p (old_name));
> > > > +
> > > >    timevar_push (TV_TREE_SSA_INCREMENTAL);
> > > > 
> > > >    if (!update_ssa_initialized_fn)
> > > > 
> > > > This has of course fired when performing basic block
> > > > duplication
> > > > during
> > > > threading, which can be fixed by avoiding duplication of basic
> > > > blocks
> > > > wi
> > > > th b_c_p calls:
> > > > 
> > > > --- a/gcc/tree-cfg.c
> > > > +++ b/gcc/tree-cfg.c
> > > > @@ -6224,7 +6224,8 @@ gimple_can_duplicate_bb_p
> > > > (const_basic_block
> > > > bb)
> > > >               || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
> > > >               || gimple_call_internal_p (g,
> > > > IFN_GOMP_SIMT_VOTE_ANY)
> > > >               || gimple_call_internal_p (g,
> > > > IFN_GOMP_SIMT_XCHG_BFLY)
> > > > -             || gimple_call_internal_p (g,
> > > > IFN_GOMP_SIMT_XCHG_IDX)))
> > > > +             || gimple_call_internal_p (g,
> > > > IFN_GOMP_SIMT_XCHG_IDX)
> > > > +             || gimple_call_builtin_p (g,
> > > > BUILT_IN_CONSTANT_P)))
> > > >         return false;
> > > >      }
> > > > 
> > > > The second occurrence is a bit more interesting:
> > > > 
> > > > gimple *
> > > > vrp_insert::build_assert_expr_for (tree cond, tree v)
> > > > {
> > > >   ...
> > > >   a = build2 (ASSERT_EXPR, TREE_TYPE (v), v, cond);
> > > >   assertion = gimple_build_assign (NULL_TREE, a);
> > > >   ...
> > > >   tree new_def = create_new_def_for (v, assertion, NULL);
> > > > 
> > > > The fix is also simple though:
> > > > 
> > > > --- a/gcc/tree-vrp.c
> > > > +++ b/gcc/tree-vrp.c
> > > > @@ -3101,6 +3101,9 @@ vrp_insert::process_assert_insertions_for
> > > > (tree
> > > > name, assert_locus *loc)
> > > >    if (loc->expr == loc->val)
> > > >      return false;
> > > > 
> > > > +  if (used_by_bcp_p (name))
> > > > +    return false;
> > > > +
> > > >    cond = build2 (loc->comp_code, boolean_type_node, loc->expr,
> > > > loc-
> > > > > val);
> > > >    assert_stmt = build_assert_expr_for (cond, name);
> > > >    if (loc->e)
> > > > 
> > > > My original testcase did not trigger anything else.  I'm
> > > > planning
> > > > to
> > > > check how this affects the testsuite, but before going further
> > > > I
> > > > would
> > > > like to ask: is this the right direction now?  To me it looks a
> > > > little-bit more heavy-handed than the original approach, but
> > > > maybe
> > > > it's
> > > > worth it.
> > > 
> > > Disabling threading looks reasonable but I'd rather not disallow
> > > BB
> > > duplication
> > > or renaming.  For VRP I guess we want to instead change
> > > register_edge_assert_for* to not register assertions for the
> > > result
> > > of
> > > __builtin_constant_p rather than just not allowing VRP to process
> > > it
> > > (there are other consumers still).
> > 
> > If I understood Jeff correctly, we should disable incremental
> > updates
> > for absolutely all b_c_p arguments, so affecting as many consumers
> > as
> > possible must actually be a good thing when this approach is
> > considered?
> > 
> > That said, regtest has revealed one more place where this is
> > happening:
> > rewrite_into_loop_closed_ssa_1 -> ... -> add_exit_phi ->
> > create_new_def_for.  The reduced code is:
> > 
> > int a;
> > 
> > void
> > b (void)
> > {
> >   while (a)
> >     __builtin_constant_p (a) ?: 0;
> >   if (a == 8)
> >     while (1)
> >       ;
> > }
> > 
> > I guess we are not allowed to cancel this transformation, becase we
> > really need LCSSA for later passes?  This now contradicts the
> > "prohibit
> > all updates" idea..
> 
> Yes.  I believe this looks at the issue from a wrong angle.  SSA
> rewrite
> is just renaming and that's OK.
> 
> > If you think that disabling threading is reasonable, could you
> > please
> > have another look at the original patches?
> > 
> > v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
> > v2: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549211.html
> > 
> > Can anything like this go in after all?
> 
> For v2 I'm not sure why we would allow b_c_p late?  That said, both
> approaches are reasonable - disallow threading or force bcp to
> evaluate to zero (guess that relies on the non-threaded variant to
> be known to be _not_ constant, otherwise we create another
> inconsistecy?).  So for sure v1 looks "safer" at the possible expense
> of less (early) optimization.  Of course backward threading is not
> the only source of jump threading (but it's the only truly early
> one).

According to the doc, returning 0 conservatively should not create
inconsistencies:

https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

A return of 0 does not indicate that the value is not a constant, but
merely that GCC cannot prove it is a constant with the specified value
of the -O option.


v2 was suggested by Jakub as a possible optimization here:

https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549078.html

However, I don't see any differences between v1 and v2 neither in SPEC
nor in vmlinux builds on x86_64, so I think the additional complexity
is not worth it.


There are a few differences between master and v1/v2 in x86_64 vmlinux
builds though. For example:

https://elixir.bootlin.com/linux/v5.10-rc2/source/kernel/tracepoint.c#L56

Here master performs threading for the overflow case:

https://elixir.bootlin.com/linux/v5.10-rc2/source/include/linux/overflow.h#L297

where the allocated size ends up being SIZE_MAX. Threading goes as far
as this b_c_p in kmalloc:

https://elixir.bootlin.com/linux/v5.10-rc2/source/include/linux/slab.h#L538

and ends up emitting jo to kmalloc_large call after mul that
calculates the size.

I haven't analyzed all the cases, but the asm diff (including context)
is just 2k lines, which isn't a lot, given that vmlinux objdump is 6m
lines.

Best regards,
Ilya