From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id C37AA385B51E for ; Wed, 10 May 2023 15:27:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C37AA385B51E Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E2AA01063; Wed, 10 May 2023 08:28:07 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id CE5623F67D; Wed, 10 May 2023 08:27:22 -0700 (PDT) From: Richard Sandiford To: Richard Biener Mail-Followup-To: Richard Biener ,Oluwatamilore Adebayo , "gcc-patches\@gcc.gnu.org" , richard.sandiford@arm.com Cc: Oluwatamilore Adebayo , "gcc-patches\@gcc.gnu.org" Subject: Re: [PATCH] vect: Missed opportunity to use [SU]ABD References: Date: Wed, 10 May 2023 16:27:21 +0100 In-Reply-To: (Richard Biener's message of "Wed, 10 May 2023 11:51:08 +0200") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-29.8 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Richard Biener writes: > On Wed, May 10, 2023 at 11:49=E2=80=AFAM Richard Biener > wrote: >> >> On Wed, May 10, 2023 at 11:01=E2=80=AFAM Richard Sandiford >> wrote: >> > >> > Oluwatamilore Adebayo writes: >> > > From 0b5f469171c340ef61a48a31877d495bb77bd35f Mon Sep 17 00:00:00 20= 01 >> > > From: oluade01 >> > > Date: Fri, 14 Apr 2023 10:24:43 +0100 >> > > Subject: [PATCH 1/4] Missed opportunity to use [SU]ABD >> > > >> > > This adds a recognition pattern for the non-widening >> > > absolute difference (ABD). >> > > >> > > gcc/ChangeLog: >> > > >> > > * doc/md.texi (sabd, uabd): Document them. >> > > * internal-fn.def (ABD): Use new optab. >> > > * optabs.def (sabd_optab, uabd_optab): New optabs, >> > > * tree-vect-patterns.cc (vect_recog_absolute_difference): >> > > Recognize the following idiom abs (a - b). >> > > (vect_recog_sad_pattern): Refactor to use >> > > vect_recog_absolute_difference. >> > > (vect_recog_abd_pattern): Use patterns found by >> > > vect_recog_absolute_difference to build a new ABD >> > > internal call. >> > > --- >> > > gcc/doc/md.texi | 10 ++ >> > > gcc/internal-fn.def | 3 + >> > > gcc/optabs.def | 2 + >> > > gcc/tree-vect-patterns.cc | 250 +++++++++++++++++++++++++++++++++--= --- >> > > 4 files changed, 234 insertions(+), 31 deletions(-) >> > > >> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi >> > > index 07bf8bdebffb2e523f25a41f2b57e43c0276b745..0ad546c63a8deebb4b6d= b894f437d1e21f0245a8 100644 >> > > --- a/gcc/doc/md.texi >> > > +++ b/gcc/doc/md.texi >> > > @@ -5778,6 +5778,16 @@ Other shift and rotate instructions, analogou= s to the >> > > Vector shift and rotate instructions that take vectors as operand 2 >> > > instead of a scalar type. >> > > >> > > +@cindex @code{uabd@var{m}} instruction pattern >> > > +@cindex @code{sabd@var{m}} instruction pattern >> > > +@item @samp{uabd@var{m}}, @samp{sabd@var{m}} >> > > +Signed and unsigned absolute difference instructions. These >> > > +instructions find the difference between operands 1 and 2 >> > > +then return the absolute value. A C code equivalent would be: >> > > +@smallexample >> > > +op0 =3D abs (op0 - op1) >> > >> > op0 =3D abs (op1 - op2) >> > >> > But that isn't the correct calculation for unsigned (where abs doesn't >> > really work). It also doesn't handle some cases correctly for signed. >> > >> > I think it's more: >> > >> > op0 =3D op1 > op2 ? (unsigned type) op1 - op2 : (unsigned type) op2 = - op1 >> > >> > or (conceptually) max minus min. >> > >> > E.g. for 16-bit values, the absolute difference between signed 0x7fff >> > and signed -0x8000 is 0xffff (reinterpreted as -1 if you cast back >> > to signed). But, ignoring undefined behaviour: >> > >> > 0x7fff - 0x8000 =3D -1 >> > abs(-1) =3D 1 >> > >> > which gives the wrong answer. >> > >> > We might still be able to fold C abs(a - b) to abd for signed a and b >> > by relying on undefined behaviour (TYPE_OVERFLOW_UNDEFINED). But we >> > can't do it for -fwrapv. >> > >> > Richi knows better than me what would be appropriate here. >> >> The question is what does the hardware do? For the widening [us]sad it's >> obvious since the difference is computed in a wider signed mode and the >> absolute value always fits. >> >> So what does it actually do, esp. when the difference yields 0x8000? > > A "sensible" definition would be that it works like the widening [us]sad > and applies truncation to the result (modulo-reducing when the result > isn't always unsigned). Yeah. Like Tami says, this is what the instruction does. I think all three definitions are equivalent: the extend/operate/truncate one, the ?: one above, and the "max - min" one. Probably just personal preference as to which seems more natural. Reading the patch again, it does check TYPE_OVERFLOW_WRAPS, so -fwrapv might be handled correctly after all. Sorry for missing it first time. On the patch: > +/* Look for the following pattern > + X =3D x[i] > + Y =3D y[i] > + DIFF =3D X - Y > + DAD =3D ABS_EXPR > + */ > +static bool > +vect_recog_absolute_difference (vec_info *vinfo, gassign *abs_stmt, > + tree *half_type, bool reject_unsigned, > + vect_unpromoted_value unprom[2], > + tree diff_oprnds[2]) It would be good to document what the parameters mean (except VINFO, which is obvious). > + /* Peel off conversions from the ABS input. This can involve sign > + changes (e.g. from an unsigned subtraction to a signed ABS input) > + or signed promotion, but it can't include unsigned promotion. > + (Note that ABS of an unsigned promotion should have been folded > + away before now anyway.) */ > + vect_unpromoted_value unprom_diff; > + abs_oprnd =3D vect_look_through_possible_promotion (vinfo, abs_oprnd, > + &unprom_diff); > + if (!abs_oprnd) > + return false; > + if (TYPE_PRECISION (unprom_diff.type) !=3D TYPE_PRECISION (abs_type) > + && TYPE_UNSIGNED (unprom_diff.type)) > + if (!reject_unsigned) > + return false; I think this should instead be: if (TYPE_PRECISION (unprom_diff.type) !=3D TYPE_PRECISION (abs_type) && TYPE_UNSIGNED (unprom_diff.type) && TYPE_UNSIGNED (abs_type)) return false; The point is that if any promotion (from "A" to "B" say) happens on the input to the ABS, it has to be signed rather than unsigned promotion, so that the extra bits in B get filled with the sign of A. Without that, the ABS will always be a no-op (which is why it should have been removed by now). > + bool assigned_oprnds =3D false; > + gassign *diff =3D dyn_cast (STMT_VINFO_STMT (diff_stmt_vin= fo)); > + if (diff_oprnds && diff && gimple_assign_rhs_code (diff) =3D=3D MINUS_= EXPR) > + { > + assigned_oprnds =3D true; > + diff_oprnds[0] =3D gimple_assign_rhs1 (diff); > + diff_oprnds[1] =3D gimple_assign_rhs2 (diff); > + } > + > + /* FORNOW. Can continue analyzing the def-use chain when this stmt in= a phi > + inside the loop (in case we are analyzing an outer-loop). */ > + if (vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, > + WIDEN_MINUS_EXPR, > + false, 2, unprom, half_type)) > + { > + if (diff_oprnds && !assigned_oprnds) > + { > + diff_oprnds[0] =3D unprom[0].op; > + diff_oprnds[1] =3D unprom[1].op; > + } > + } > + else if (!assigned_oprnds) > + { > + return false; > + } > + else > + { > + *half_type =3D NULL_TREE; > + } > + > + return true; > +} I think the code would be easier to follow if it used vect_widened_op_tree first, and only considered the unextended case on failure. Minor formatting nit: GCC style is to indent braces by two spaces further than an "if": if (...) { ... } Thanks, Richard