From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rguenther@suse.de>
Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28])
 by sourceware.org (Postfix) with ESMTPS id 5842D3858C27
 for <gcc-patches@gcc.gnu.org>; Mon, 25 Oct 2021 09:48:28 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5842D3858C27
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
 by smtp-out1.suse.de (Postfix) with ESMTP id 263FE2191E;
 Mon, 25 Oct 2021 09:48:27 +0000 (UTC)
Received: from murzim.suse.de (murzim.suse.de [10.160.4.192])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by relay2.suse.de (Postfix) with ESMTPS id F2BABA3B84;
 Mon, 25 Oct 2021 09:48:26 +0000 (UTC)
Date: Mon, 25 Oct 2021 11:48:26 +0200 (CEST)
From: Richard Biener <rguenther@suse.de>
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
cc: Richard Biener <richard.guenther@gmail.com>, 
 gcc Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [match.pd] PR83750 - CSE erf/erfc pair
In-Reply-To: <CAAgBjM=G=2+ugdwK34jBvDVnZt-5WyqTT0ZDvz574BkK-eVQyQ@mail.gmail.com>
Message-ID: <rr3orrnq-6286-nqop-724s-1rpn8spo4375@fhfr.qr>
References: <CAAgBjMkadc11j998xBSOpNB8ES29KWN8=Td-dVf54U26rqabVg@mail.gmail.com>
 <7nno61so-8qor-n16r-444p-527o2qp169r5@fhfr.qr>
 <CAAgBjMmzw9-YNdkQMUt=TsYE_czNPHJUK_201BexVieTDkAbyw@mail.gmail.com>
 <92r13pn-97s9-r18r-6p8-96o3rqqn9or1@fhfr.qr>
 <CAAgBjMmZaPWWPcCXhum+fCv2hhsyGJbn+NJGpPcYGSZEGrCSdw@mail.gmail.com>
 <41ssp55p-1sp2-r9qs-n4po-poq797or90@fhfr.qr>
 <CAAgBjMmTVrEa29o8LrdU8jz_TZ2NF8OScRDCq3d9A5Hims3aGQ@mail.gmail.com>
 <CAFiYyc0i=z=reQ5mQhupY6JfN2Do5=LHCe-iqoczdto62q_Cig@mail.gmail.com>
 <CAAgBjMkdrPvysUru8-xZvz_TMiHNDxrfS25AYAW6OJfQwUbyOg@mail.gmail.com>
 <7069701q-n4q5-sr4-5824-53r8133q19s4@fhfr.qr>
 <CAAgBjMmpQ1bNcnp67BK4p45ai_o+gOdpE7Li9cw+Kzio7ap0RQ@mail.gmail.com>
 <q9n014pr-2784-s076-4n30-26ss8s9q39o5@fhfr.qr>
 <CAAgBjM=e5adKRfRv12xhTR8pWPAi7RMkt-KgtFD=7oiEAV_BFg@mail.gmail.com>
 <723qs717-9873-35n6-rss3-oq661s38ss1@fhfr.qr>
 <CAAgBjM=G=2+ugdwK34jBvDVnZt-5WyqTT0ZDvz574BkK-eVQyQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-5.0 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Mon, 25 Oct 2021 09:48:31 -0000

On Fri, 22 Oct 2021, Prathamesh Kulkarni wrote:

> On Fri, 22 Oct 2021 at 14:56, Richard Biener <rguenther@suse.de> wrote:
> >
> > On Fri, 22 Oct 2021, Prathamesh Kulkarni wrote:
> >
> > > On Wed, 20 Oct 2021 at 18:21, Richard Biener <rguenther@suse.de> wrote:
> > > >
> > > > On Wed, 20 Oct 2021, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Tue, 19 Oct 2021 at 16:55, Richard Biener <rguenther@suse.de> wrote:
> > > > > >
> > > > > > On Tue, 19 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > >
> > > > > > > On Tue, 19 Oct 2021 at 13:02, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Oct 19, 2021 at 9:03 AM Prathamesh Kulkarni via Gcc-patches
> > > > > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 18 Oct 2021 at 17:23, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > >
> > > > > > > > > > > On Mon, 18 Oct 2021 at 17:10, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, 18 Oct 2021 at 16:18, Richard Biener <rguenther@suse.de> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > > > > > > > > > > > > > to match.pd:
> > > > > > > > > > > > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > > > > > > > > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This works to remove call to erfc for the following test:
> > > > > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > > > > {
> > > > > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >   double t1 = __builtin_erf (x);
> > > > > > > > > > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > > > > > > > > > >   return g(t1, t2);
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > with .optimized dump shows:
> > > > > > > > > > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > > > > > > > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > However, for the following test:
> > > > > > > > > > > > > > > double f(double x)
> > > > > > > > > > > > > > > {
> > > > > > > > > > > > > > >   double g(double, double);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > > > > > > > > > >   return t1;
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > > > > > > > > > > > > > erf(x) to erfc(x) again
> > > > > > > > > > > > > > > post canonicalization.
> > > > > > > > > > > > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > > > > > > > > > > > > > but then it tries to
> > > > > > > > > > > > > > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > > > > > > > > > > > > > with erfc(x) transformed to
> > > > > > > > > > > > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > > > > > > > > > > > > Could you suggest how to proceed ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I applied your patch manually and it does the intended
> > > > > > > > > > > > > > simplifications so I wonder what I am missing?
> > > > > > > > > > > > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> > > > > > > > > > > > > no erf(x) in the source ?
> > > > > > > > > > > >
> > > > > > > > > > > > I do think it's reasonable to expect erfc to be available when erf
> > > > > > > > > > > > is and vice versa but note both are C99 specified functions (either
> > > > > > > > > > > > requires -lm).
> > > > > > > > > > > OK, thanks. Would it be OK to commit the patch after bootstrap+test ?
> > > > > > > > > >
> > > > > > > > > > Yes, but I'm confused because you say the patch doesn't work for you?
> > > > > > > > > The patch works for me to CSE erf/erfc pair.
> > > > > > > > > However when there's only erfc in the source, it canonicalizes erfc(x)
> > > > > > > > > to 1 - erf(x) but later fails to uncanonicalize 1 - erf(x) back to
> > > > > > > > > erfc(x)
> > > > > > > > > with -O3 -funsafe-math-optimizations.
> > > > > > > > >
> > > > > > > > > For,
> > > > > > > > > t1 = __builtin_erfc(x),
> > > > > > > > >
> > > > > > > > > .optimized dump shows:
> > > > > > > > >   _2 = __builtin_erf (x_1(D));
> > > > > > > > >   t1_3 = 1.0e+0 - _2;
> > > > > > > > >
> > > > > > > > > and for,
> > > > > > > > > double t1 = x + __builtin_erfc(x);
> > > > > > > > >
> > > > > > > > > .optimized dump shows:
> > > > > > > > >   _3 = __builtin_erf (x_2(D));
> > > > > > > > >   _7 = x_2(D) + 1.0e+0;
> > > > > > > > >   t1_4 = _7 - _3;
> > > > > > > > >
> > > > > > > > > I assume in both cases, we want erfc in the code-gen instead ?
> > > > > > > > > I think the reason uncaonicalization fails is because the pattern 1 -
> > > > > > > > > erf(x) to erfc(x)
> > > > > > > > > gets applied, but then it fails in resimplifying erfc(x), and we end
> > > > > > > > > up with 1 - erf(x) in code-gen.
> > > > > > > > >
> > > > > > > > > From gimple-match.c, it hits the simplification:
> > > > > > > > >
> > > > > > > > >                                 gimple_seq *lseq = seq;
> > > > > > > > >                                 if (__builtin_expect (!dbg_cnt
> > > > > > > > > (match), 0)) goto next_after_fail1172;
> > > > > > > > >                                 if (__builtin_expect (dump_file &&
> > > > > > > > > (dump_flags & TDF_FOLDING), 0)) fprintf (dump_file, "Applying pattern
> > > > > > > > > %s:%d, %s:%d\n", "match.pd", 6162, __FILE__, __LINE__);
> > > > > > > > >                                 {
> > > > > > > > >                                   res_op->set_op (CFN_BUILT_IN_ERFC, type, 1);
> > > > > > > > >                                   res_op->ops[0] = captures[0];
> > > > > > > > >                                   res_op->resimplify (lseq, valueize);
> > > > > > > > >                                   return true;
> > > > > > > > >                                 }
> > > > > > > > >
> > > > > > > > > But res_op->resimplify returns false, and doesn't end up adding to lseq.
> > > > > > > >
> > > > > > > > There's nothing to add to lseq since there's also nothing to resimplify.
> > > > > > > > The only thing that could happen is that the replacement is not done
> > > > > > > > because replace_stmt_with_simplification via maybe_push_res_to_seq
> > > > > > > > doesn't pass the builtin_decl_implicit test:
> > > > > > > >
> > > > > > > >           /* Find the function we want to call.  */
> > > > > > > >           tree decl = builtin_decl_implicit (as_builtin_fn (fn));
> > > > > > > >           if (!decl)
> > > > > > > >             return NULL;
> > > > > > > >
> > > > > > > > btw, it did work for me since the call was present before and gimplification
> > > > > > > > should then mark the function eligible for implicit generation.
> > > > > > > >
> > > > > > > > > As you suggest, should we instead handle this in fre to transform
> > > > > > > > > erfc(x) to 1 - erf(x), only when
> > > > > > > > > there's a matching erf(x) in the source ?
> > > > > > > >
> > > > > > > > Note that's strictly less powerful and we'd have to handle erf(x) -> 1 +erfc (x)
> > > > > > > > to handle CSE in
> > > > > > > >
> > > > > > > >   tem = erfc (x);
> > > > > > > >   tem2 = erf (x);
> > > > > > > >
> > > > > > > > So no, I think the canonicalization is fine unless there's a compelling reason
> > > > > > > > for having both erfc and erf.
> > > > > > > >
> > > > > > > > Can you debug why the reverse transform doesn't work for you?
> > > > > > > It seems the issue was that erfc wasn't getting marked with const
> > > > > > > attribute, and failed the following test in
> > > > > > > maybe_push_res_to_seq:
> > > > > > >           /* We can't and should not emit calls to non-const functions.  */
> > > > > > >           if (!(flags_from_decl_or_type (decl) & ECF_CONST))
> > > > > > >             return NULL;
> > > > > > >
> > > > > > > Passing -fno-math-errno seems to work for the reverse transform:
> > > > > > >
> > > > > > > double f(double x)
> > > > > > > {
> > > > > > >   double g(double, double);
> > > > > > >
> > > > > > >   double t1 = __builtin_erfc (x);
> > > > > > >   return t1;
> > > > > > > }
> > > > > > >
> > > > > > > Compiling with -O3 -funsafe-math-optimizations -fno-math-errno:
> > > > > > >
> > > > > > > vrp2 dump shows:
> > > > > > > Folding statement: _2 = __builtin_erf (x_1(D));
> > > > > > > Not folded
> > > > > > > Folding statement: t1_3 = 1.0e+0 - _2;
> > > > > > > Applying pattern match.pd:6162, gimple-match.c:68450
> > > > > > > gimple_simplified to t1_3 = __builtin_erfc (x_1(D));
> > > > > > > Folded into: t1_3 = __builtin_erfc (x_1(D));
> > > > > > >
> > > > > > > and .optimized dump shows:
> > > > > > > double f (double x)
> > > > > > > {
> > > > > > >   double t1;
> > > > > > >
> > > > > > >   <bb 2> [local count: 1073741824]:
> > > > > > >   t1_3 = __builtin_erfc (x_1(D)); [tail call]
> > > > > > >   return t1_3;
> > > > > > > }
> > > > > > >
> > > > > > > Unfortunately, for the test-case involving erf/erfc pair, the reverse
> > > > > > > transform seems to undo the CSE:
> > > > > > >
> > > > > > > double f(double x)
> > > > > > > {
> > > > > > >   double g(double, double);
> > > > > > >
> > > > > > >   double t1 = __builtin_erf (x);
> > > > > > >   double t2 = __builtin_erfc (x);
> > > > > > >   return g(t1, t2);
> > > > > > > }
> > > > > > >
> > > > > > > gimplification turns erfc to 1 - erf:
> > > > > > > Applying pattern match.pd:6158, gimple-match.c:44479
> > > > > > > gimple_simplified to D.1988 = __builtin_erf (x);
> > > > > > > t2 = 1.0e+0 - D.1988;
> > > > > > >
> > > > > > >   t1 = __builtin_erf (x);
> > > > > > >   D.1988 = __builtin_erf (x);
> > > > > > >   t2 = 1.0e+0 - D.1988;
> > > > > > >   D.1987 = g (t1, t2);
> > > > > > >
> > > > > > > fre1 does the CSE:
> > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > >   t2_4 = 1.0e+0 - t1_2;
> > > > > > >   _7 = g (t1_2, t2_4);
> > > > > > >
> > > > > > > and forwprop4 again converts 1 - erf(x) to erfc(x), "undoing" the CSE:
> > > > > > > Applying pattern match.pd:6162, gimple-match.c:68450
> > > > > > > gimple_simplified to t2_3 = __builtin_erfc (x_1(D));
> > > > > > >
> > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > >   t2_3 = __builtin_erfc (x_1(D));
> > > > > > >   _6 = g (t1_2, t2_3);
> > > > > > >
> > > > > > > and .optimized dump shows:
> > > > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > > > >   t2_3 = __builtin_erfc (x_1(D));
> > > > > > >   _6 = g (t1_2, t2_3); [tail call]
> > > > > >
> > > > > > You probably want an explicit && single_use () check on the
> > > > > > 1 - erf() -> erfc transform.
> > > > > single_use worked, thanks!
> > > > > As you pointed out, we reassociate x + erfc(x) to (x + 1) - erf(x) and
> > > > > don't uncanonicalize back.
> > > > > I added another pattern to reassociate (x + 1) - erf(x) to x + (1 -
> > > > > erf(x)), after which,
> > > > > it gets uncanonicalized back to x + erfc(x) in .optimized dump.
> > > >
> > > > I think that's not the way to "fix" this case, instead if we'd care
> > > > we should change reassoc to either associate a -erf() operand and
> > > > a 1 next to each other or recognize it there.  But I think that
> > > > can be done as followup.
> > > >
> > > > > Does the attached patch look OK after bootstrap+test ?
> > > >
> > > > It's OK with removing the extra (x+1) - erf pattern and the
> > > > associated testcase.
> > > Thanks. It regressed builtin-nonneg-1.c because for following case:
> > >
> > > void test(double d)
> > > {
> > >   if (signbit (__builtin_erfc (d)))
> > >     link_failure_erfc ();
> > > }
> > >
> > > before patch signbit (erfc (d)) was transformed to 0, since erfc
> > > returns a non-negative result.
> > > However after patch, erfc (d), transformed to 1 - erf(d), and in
> > > optimized dump we get:
> > >
> > >   <bb 2> [local count: 1073741824]:
> > >   _3 = __builtin_erf (d_2(D));
> > >   if (_3 > 1.0e+0)
> > >     goto <bb 3>; [41.48%]
> > >   else
> > >     goto <bb 4>; [58.52%]
> > >
> > >   <bb 3> [local count: 445388112]:
> > >   link_failure_erfc (); [tail call]
> > >
> > >   <bb 4> [local count: 1073741824]:
> > >   return;
> > >
> > > The attached patch adds transform erf(x) > 1 --> 0 which removes the
> > > redundant test and call to link_failure_erfc.
> > > I assume the transform is OK since erf's range is [-1, 1] ?
> >
> > + /* Simplify erf(x) > 1 to 0.  */
> > + (simplify
> > +  (gt (ERF @0) real_onep)
> > +  { boolean_false_node; }))
> >
> > most definitely this is too special - it doesn't work for erf(x) > 1.1
> > for example.  Ideally we'd have value-ranges on floats and that would
> > deal with such cases.  Note that erf can return NaN (for NaN input),
> > in which case gt would raise an exception but unge would not.
> > So the issue is that folding the gt to false would eventually remove
> > an exception.  Conditionalizing this on HONOR_NANS (@0) would work
> > around this.  And of course use sth like
> >
> >  (gt (ERF @0) REAL_CST@1)
> >  (if (!HONOR_NANS (@0)
> >       && real_compare (GE_EXPR, @1, dconst1))
> >   { boolean_false_node; })
> >
> > that said, I'm not convinced this is the very best idea - maybe
> > we should revisit the way do CSE erf and erfc (was there a
> > real-world usecase for this btw?)
> Hmm indeed you're right, it's rather specialised to catch erf > 0
> condition in patch.
> I initially thought of adding this to
> fold_using_range::range_of_builtin_call, but since
> vrp doesn't propagate floats yet, I added it as a transform.
> No I didn't have any particular real world use case for the transform.
> How do you suggest to proceed ?

Put it on hold, I don't see a nice place to do this that wouldn't
be a very special case for the non-real-world testcase.  The VRP
folks say they'll work on float stuff for GCC 13 so maybe we can
revisit then.

Richard.