From: Richard Biener
Date: Wed, 10 May 2023 11:51:08 +0200
Subject: Re: [PATCH] vect: Missed opportunity to use [SU]ABD
To: Oluwatamilore Adebayo, "gcc-patches@gcc.gnu.org", "richard.guenther@gmail.com", richard.sandiford@arm.com

On Wed, May 10, 2023 at 11:49 AM Richard Biener wrote:
>
> On Wed, May 10, 2023 at 11:01 AM Richard Sandiford
> wrote:
> >
> > Oluwatamilore Adebayo writes:
> > > From 0b5f469171c340ef61a48a31877d495bb77bd35f Mon Sep 17 00:00:00 2001
> > > From: oluade01
> > > Date: Fri, 14 Apr 2023 10:24:43 +0100
> > > Subject: [PATCH 1/4] Missed opportunity to use [SU]ABD
> > >
> > > This adds a recognition pattern for the non-widening
> > > absolute difference (ABD).
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* doc/md.texi (sabd, uabd): Document them.
> > > 	* internal-fn.def (ABD): Use new optab.
> > > 	* optabs.def (sabd_optab, uabd_optab): New optabs,
> > > 	* tree-vect-patterns.cc (vect_recog_absolute_difference):
> > > 	Recognize the following idiom abs (a - b).
> > > 	(vect_recog_sad_pattern): Refactor to use
> > > 	vect_recog_absolute_difference.
> > > 	(vect_recog_abd_pattern): Use patterns found by
> > > 	vect_recog_absolute_difference to build a new ABD
> > > 	internal call.
> > > ---
> > >  gcc/doc/md.texi           |  10 ++
> > >  gcc/internal-fn.def       |   3 +
> > >  gcc/optabs.def            |   2 +
> > >  gcc/tree-vect-patterns.cc | 250 +++++++++++++++++++++++++++++++++-----
> > >  4 files changed, 234 insertions(+), 31 deletions(-)
> > >
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > index 07bf8bdebffb2e523f25a41f2b57e43c0276b745..0ad546c63a8deebb4b6db894f437d1e21f0245a8 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -5778,6 +5778,16 @@ Other shift and rotate instructions, analogous to the
> > >  Vector shift and rotate instructions that take vectors as operand 2
> > >  instead of a scalar type.
> > >
> > > +@cindex @code{uabd@var{m}} instruction pattern
> > > +@cindex @code{sabd@var{m}} instruction pattern
> > > +@item @samp{uabd@var{m}}, @samp{sabd@var{m}}
> > > +Signed and unsigned absolute difference instructions.  These
> > > +instructions find the difference between operands 1 and 2
> > > +then return the absolute value.  A C code equivalent would be:
> > > +@smallexample
> > > +op0 = abs (op0 - op1)
> >
> > op0 = abs (op1 - op2)
> >
> > But that isn't the correct calculation for unsigned (where abs doesn't
> > really work).  It also doesn't handle some cases correctly for signed.
> >
> > I think it's more:
> >
> >   op0 = op1 > op2 ? (unsigned type) op1 - op2 : (unsigned type) op2 - op1
> >
> > or (conceptually) max minus min.
> >
> > E.g. for 16-bit values, the absolute difference between signed 0x7fff
> > and signed -0x8000 is 0xffff (reinterpreted as -1 if you cast back
> > to signed).  But, ignoring undefined behaviour:
> >
> >   0x7fff - 0x8000 = -1
> >   abs(-1) = 1
> >
> > which gives the wrong answer.
> >
> > We might still be able to fold C abs(a - b) to abd for signed a and b
> > by relying on undefined behaviour (TYPE_OVERFLOW_UNDEFINED).  But we
> > can't do it for -fwrapv.
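A minimal C sketch of the 16-bit example above, assuming two's-complement int16_t operands. abd_s16 computes the absolute difference as "max minus min" with the subtraction done in the unsigned type, while naive_abs_diff_s16 models abs (a - b) where the 16-bit subtraction wraps modulo 2^16 (the -fwrapv case). Both function names are hypothetical and are not part of the patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* "Max minus min": compare as signed, subtract modulo 2^16 as unsigned.
   The result always fits in uint16_t.  */
static uint16_t
abd_s16 (int16_t a, int16_t b)
{
  uint16_t ua = (uint16_t) a, ub = (uint16_t) b;
  return a > b ? (uint16_t) (ua - ub) : (uint16_t) (ub - ua);
}

/* Naive abs (a - b) where the 16-bit subtraction wraps modulo 2^16.  */
static int
naive_abs_diff_s16 (int16_t a, int16_t b)
{
  uint16_t wrapped = (uint16_t) ((uint16_t) a - (uint16_t) b);
  /* Reinterpret the wrapped 16-bit pattern as a signed value.  */
  int diff = wrapped >= 0x8000 ? (int) wrapped - 0x10000 : (int) wrapped;
  return abs (diff);
}
```

For a = 0x7fff and b = -0x8000, abd_s16 returns 0xffff while naive_abs_diff_s16 returns 1, which is the mismatch described in the review. For small differences such as (3, 5) both return 2.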
> >
> > Richi knows better than me what would be appropriate here.
>
> The question is what does the hardware do?  For the widening [us]sad it's
> obvious since the difference is computed in a wider signed mode and the
> absolute value always fits.
>
> So what does it actually do, esp. when the difference yields 0x8000?

A "sensible" definition would be that it works like the widening [us]sad
and applies truncation to the result (modulo-reducing when the result
isn't always unsigned).

Richard.

> Richard.
>
> >
> > Thanks,
> > Richard
> >
> > > +@end smallexample
> > > +
> > >  @cindex @code{avg@var{m}3_floor} instruction pattern
> > >  @cindex @code{uavg@var{m}3_floor} instruction pattern
> > >  @item @samp{avg@var{m}3_floor}
> > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > > index 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..0f1724ecf37a31c231572edf90b5577e2d82f468 100644
> > > --- a/gcc/internal-fn.def
> > > +++ b/gcc/internal-fn.def
> > > @@ -167,6 +167,9 @@ DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
> > >  DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
> > >  DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
> > >
> > > +DEF_INTERNAL_SIGNED_OPTAB_FN (ABD, ECF_CONST | ECF_NOTHROW, first,
> > > +                           sabd, uabd, binary)
> > > +
> > >  DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first,
> > >                             savg_floor, uavg_floor, binary)
> > >  DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
> > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > index 695f5911b300c9ca5737de9be809fa01aabe5e01..29bc92281a2175f898634cbe6af63c18021e5268 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -359,6 +359,8 @@ OPTAB_D (mask_fold_left_plus_optab, "mask_fold_left_plus_$a")
> > >  OPTAB_D (extract_last_optab, "extract_last_$a")
> > >  OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a")
> > >
> > > +OPTAB_D (uabd_optab, "uabd$a3")
> > > +OPTAB_D (sabd_optab, "sabd$a3")
> > >  OPTAB_D (savg_floor_optab, "avg$a3_floor")
> > >  OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
> > >  OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
> > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > index a49b09539776c0056e77f99b10365d0a8747fbc5..91e1f9d4b610275dd833ec56dc77f76367ee7886 100644
> > > --- a/gcc/tree-vect-patterns.cc
> > > +++ b/gcc/tree-vect-patterns.cc
> > > @@ -770,6 +770,89 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
> > >      }
> > >  }
> > >
> > > +/* Look for the following pattern
> > > +     X = x[i]
> > > +     Y = y[i]
> > > +     DIFF = X - Y
> > > +     DAD = ABS_EXPR<DIFF>
> > > + */
> > > +static bool
> > > +vect_recog_absolute_difference (vec_info *vinfo, gassign *abs_stmt,
> > > +                             tree *half_type, bool reject_unsigned,
> > > +                             vect_unpromoted_value unprom[2],
> > > +                             tree diff_oprnds[2])
> > > +{
> > > +  if (!abs_stmt)
> > > +    return false;
> > > +
> > > +  /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
> > > +     inside the loop (in case we are analyzing an outer-loop).  */
> > > +  enum tree_code code = gimple_assign_rhs_code (abs_stmt);
> > > +  if (code != ABS_EXPR && code != ABSU_EXPR)
> > > +    return false;
> > > +
> > > +  tree abs_oprnd = gimple_assign_rhs1 (abs_stmt);
> > > +  tree abs_type = TREE_TYPE (abs_oprnd);
> > > +  if (!abs_oprnd)
> > > +    return false;
> > > +  if (reject_unsigned && TYPE_UNSIGNED (abs_type))
> > > +    return false;
> > > +  if (!ANY_INTEGRAL_TYPE_P (abs_type) || TYPE_OVERFLOW_WRAPS (abs_type))
> > > +    return false;
> > > +
> > > +  /* Peel off conversions from the ABS input.  This can involve sign
> > > +     changes (e.g. from an unsigned subtraction to a signed ABS input)
> > > +     or signed promotion, but it can't include unsigned promotion.
> > > +     (Note that ABS of an unsigned promotion should have been folded
> > > +     away before now anyway.)  */
> > > +  vect_unpromoted_value unprom_diff;
> > > +  abs_oprnd = vect_look_through_possible_promotion (vinfo, abs_oprnd,
> > > +                                                 &unprom_diff);
> > > +  if (!abs_oprnd)
> > > +    return false;
> > > +  if (TYPE_PRECISION (unprom_diff.type) != TYPE_PRECISION (abs_type)
> > > +      && TYPE_UNSIGNED (unprom_diff.type))
> > > +    if (!reject_unsigned)
> > > +      return false;
> > > +
> > > +  /* We then detect if the operand of abs_expr is defined by a minus_expr.  */
> > > +  stmt_vec_info diff_stmt_vinfo = vect_get_internal_def (vinfo, abs_oprnd);
> > > +  if (!diff_stmt_vinfo)
> > > +    return false;
> > > +
> > > +  bool assigned_oprnds = false;
> > > +  gassign *diff = dyn_cast <gassign *> (STMT_VINFO_STMT (diff_stmt_vinfo));
> > > +  if (diff_oprnds && diff && gimple_assign_rhs_code (diff) == MINUS_EXPR)
> > > +    {
> > > +      assigned_oprnds = true;
> > > +      diff_oprnds[0] = gimple_assign_rhs1 (diff);
> > > +      diff_oprnds[1] = gimple_assign_rhs2 (diff);
> > > +    }
> > > +
> > > +  /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
> > > +     inside the loop (in case we are analyzing an outer-loop).  */
> > > +  if (vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR,
> > > +                         WIDEN_MINUS_EXPR,
> > > +                         false, 2, unprom, half_type))
> > > +    {
> > > +      if (diff_oprnds && !assigned_oprnds)
> > > +        {
> > > +          diff_oprnds[0] = unprom[0].op;
> > > +          diff_oprnds[1] = unprom[1].op;
> > > +        }
> > > +    }
> > > +  else if (!assigned_oprnds)
> > > +    {
> > > +      return false;
> > > +    }
> > > +  else
> > > +    {
> > > +      *half_type = NULL_TREE;
> > > +    }
> > > +
> > > +  return true;
> > > +}
> > > +
> > >  /* Convert UNPROM to TYPE and return the result, adding new statements
> > >     to STMT_INFO's pattern definition statements if no better way is
> > >     available.  VECTYPE is the vector form of TYPE.
> > > @@ -1308,40 +1391,13 @@ vect_recog_sad_pattern (vec_info *vinfo,
> > >    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
> > >       inside the loop (in case we are analyzing an outer-loop).  */
> > >    gassign *abs_stmt = dyn_cast <gassign *> (abs_stmt_vinfo->stmt);
> > > -  if (!abs_stmt
> > > -      || (gimple_assign_rhs_code (abs_stmt) != ABS_EXPR
> > > -          && gimple_assign_rhs_code (abs_stmt) != ABSU_EXPR))
> > > -    return NULL;
> > > -
> > > -  tree abs_oprnd = gimple_assign_rhs1 (abs_stmt);
> > > -  tree abs_type = TREE_TYPE (abs_oprnd);
> > > -  if (TYPE_UNSIGNED (abs_type))
> > > -    return NULL;
> > > -
> > > -  /* Peel off conversions from the ABS input.  This can involve sign
> > > -     changes (e.g. from an unsigned subtraction to a signed ABS input)
> > > -     or signed promotion, but it can't include unsigned promotion.
> > > -     (Note that ABS of an unsigned promotion should have been folded
> > > -     away before now anyway.)  */
> > > -  vect_unpromoted_value unprom_diff;
> > > -  abs_oprnd = vect_look_through_possible_promotion (vinfo, abs_oprnd,
> > > -                                                 &unprom_diff);
> > > -  if (!abs_oprnd)
> > > -    return NULL;
> > > -  if (TYPE_PRECISION (unprom_diff.type) != TYPE_PRECISION (abs_type)
> > > -      && TYPE_UNSIGNED (unprom_diff.type))
> > > -    return NULL;
> > >
> > > -  /* We then detect if the operand of abs_expr is defined by a minus_expr.  */
> > > -  stmt_vec_info diff_stmt_vinfo = vect_get_internal_def (vinfo, abs_oprnd);
> > > -  if (!diff_stmt_vinfo)
> > > +  vect_unpromoted_value unprom[2];
> > > +  if (!vect_recog_absolute_difference (vinfo, abs_stmt, &half_type,
> > > +                                    true, unprom, NULL))
> > >      return NULL;
> > >
> > > -  /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
> > > -     inside the loop (in case we are analyzing an outer-loop).  */
> > > -  vect_unpromoted_value unprom[2];
> > > -  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, WIDEN_MINUS_EXPR,
> > > -                          false, 2, unprom, &half_type))
> > > +  if (!half_type)
> > >      return NULL;
> > >
> > >    vect_pattern_detected ("vect_recog_sad_pattern", last_stmt);
> > > @@ -1363,6 +1419,137 @@ vect_recog_sad_pattern (vec_info *vinfo,
> > >    return pattern_stmt;
> > >  }
> > >
> > > +/* Function vect_recog_abd_pattern
> > > +
> > > +   Try to find the following ABsolute Difference (ABD) pattern:
> > > +
> > > +     VTYPE x, y, out;
> > > +     type diff;
> > > +   loop i in range:
> > > +     S1 diff = x[i] - y[i]
> > > +     S2 out[i] = ABS_EXPR <diff>;
> > > +
> > > +   where 'type' is a integer and 'VTYPE' is a vector of integers
> > > +   the same size as 'type'
> > > +
> > > +   Input:
> > > +
> > > +   * STMT_VINFO: The stmt from which the pattern search begins
> > > +
> > > +   Output:
> > > +
> > > +   * TYPE_out: The type of the output of this pattern
> > > +
> > > +   * Return value: A new stmt that will be used to replace the sequence of
> > > +     stmts that constitute the pattern; either SABD or UABD:
> > > +     SABD_EXPR
> > > +     UABD_EXPR
> > > +
> > > +     UABD expressions are used when the input types are
> > > +     narrower than the output types or the output type is narrower
> > > +     than 32 bits
> > > + */
> > > +
> > > +static gimple *
> > > +vect_recog_abd_pattern (vec_info *vinfo,
> > > +                     stmt_vec_info stmt_vinfo, tree *type_out)
> > > +{
> > > +  /* Look for the following patterns
> > > +     X = x[i]
> > > +     Y = y[i]
> > > +     DIFF = X - Y
> > > +     DAD = ABS_EXPR<DIFF>
> > > +     out[i] = DAD
> > > +
> > > +     In which
> > > +      - X, Y, DIFF, DAD all have the same type
> > > +      - x, y, out are all vectors of the same type
> > > +  */
> > > +  gassign *last_stmt = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> > > +  if (!last_stmt)
> > > +    return NULL;
> > > +
> > > +  tree out_type = TREE_TYPE (gimple_assign_lhs (last_stmt));
> > > +
> > > +  gassign *abs_stmt = last_stmt;
> > > +  if (gimple_assign_cast_p (last_stmt))
> > > +    {
> > > +      tree last_rhs = gimple_assign_rhs1 (last_stmt);
> > > +      if (!SSA_VAR_P (last_rhs))
> > > +        return NULL;
> > > +
> > > +      abs_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (last_rhs));
> > > +      if (!abs_stmt)
> > > +        return NULL;
> > > +    }
> > > +
> > > +  vect_unpromoted_value unprom[2];
> > > +  tree diff_oprnds[2];
> > > +  tree half_type;
> > > +  if (!vect_recog_absolute_difference (vinfo, abs_stmt, &half_type,
> > > +                                    false, unprom, diff_oprnds))
> > > +    return NULL;
> > > +
> > > +#define SAME_TYPE(A, B) (TYPE_PRECISION (A) == TYPE_PRECISION (B))
> > > +
> > > +  tree abd_oprnds[2];
> > > +  if (half_type)
> > > +    {
> > > +      if (!SAME_TYPE (unprom[0].type, unprom[1].type))
> > > +        return NULL;
> > > +
> > > +      tree diff_type = TREE_TYPE (diff_oprnds[0]);
> > > +      if (TYPE_PRECISION (out_type) != TYPE_PRECISION (diff_type))
> > > +        {
> > > +          vect_convert_inputs (vinfo, stmt_vinfo, 2, abd_oprnds, half_type, unprom,
> > > +                               get_vectype_for_scalar_type (vinfo, half_type));
> > > +        }
> > > +      else
> > > +        {
> > > +          abd_oprnds[0] = diff_oprnds[0];
> > > +          abd_oprnds[1] = diff_oprnds[1];
> > > +        }
> > > +    }
> > > +  else
> > > +    {
> > > +      if (unprom[0].op && unprom[1].op
> > > +          && (!SAME_TYPE (unprom[0].type, unprom[1].type)
> > > +              || !SAME_TYPE (unprom[0].type, out_type)))
> > > +        return NULL;
> > > +
> > > +      unprom[0].op = diff_oprnds[0];
> > > +      unprom[1].op = diff_oprnds[1];
> > > +      tree signed_out = signed_type_for (out_type);
> > > +      tree signed_out_vectype = get_vectype_for_scalar_type (vinfo, signed_out);
> > > +      vect_convert_inputs (vinfo, stmt_vinfo, 2, abd_oprnds,
> > > +                           signed_out, unprom, signed_out_vectype);
> > > +
> > > +      if (!SAME_TYPE (TREE_TYPE (diff_oprnds[0]), TREE_TYPE (abd_oprnds[0])))
> > > +        return NULL;
> > > +    }
> > > +
> > > +  if (!SAME_TYPE (TREE_TYPE (abd_oprnds[0]), TREE_TYPE (abd_oprnds[1]))
> > > +      || !SAME_TYPE (TREE_TYPE (abd_oprnds[0]), out_type))
> > > +    return NULL;
> > > +
> > > +  vect_pattern_detected ("vect_recog_abd_pattern", last_stmt);
> > > +
> > > +  tree vectype = get_vectype_for_scalar_type (vinfo, out_type);
> > > +  if (!vectype
> > > +      || !direct_internal_fn_supported_p (IFN_ABD, vectype,
> > > +                                          OPTIMIZE_FOR_SPEED))
> > > +    return NULL;
> > > +
> > > +  *type_out = STMT_VINFO_VECTYPE (stmt_vinfo);
> > > +
> > > +  tree var = vect_recog_temp_ssa_var (out_type, NULL);
> > > +  gcall *abd_stmt = gimple_build_call_internal (IFN_ABD, 2,
> > > +                                                abd_oprnds[0], abd_oprnds[1]);
> > > +  gimple_call_set_lhs (abd_stmt, var);
> > > +  gimple_set_location (abd_stmt, gimple_location (last_stmt));
> > > +  return abd_stmt;
> > > +}
> > > +
> > >  /* Recognize an operation that performs ORIG_CODE on widened inputs,
> > >     so that it can be treated as though it had the form:
> > >
> > > @@ -6439,6 +6626,7 @@ struct vect_recog_func
> > >  static vect_recog_func vect_vect_recog_func_ptrs[] = {
> > >    { vect_recog_bitfield_ref_pattern, "bitfield_ref" },
> > >    { vect_recog_bit_insert_pattern, "bit_insert" },
> > > +  { vect_recog_abd_pattern, "abd" },
> > >    { vect_recog_over_widening_pattern, "over_widening" },
> > >    /* Must come after over_widening, which narrows the shift as much as
> > >       possible beforehand.  */
> > > --
> > > 2.25.1
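The "sensible" definition suggested in the reply above (widen as the widening [us]sad does, take the absolute difference, which always fits, then truncate) can be compared against the max-minus-min formulation from the review. The C sketch below does this exhaustively for 8-bit inputs, assuming the truncated result is read as unsigned; 8 bits is chosen only so that every input pair can be enumerated cheaply, and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Widen to int, take the exact absolute difference, truncate mod 2^8.  */
static uint8_t
abd_widen_truncate_s8 (int8_t a, int8_t b)
{
  return (uint8_t) abs ((int) a - (int) b);
}

/* Max minus min, with the subtraction done in the unsigned 8-bit type.  */
static uint8_t
abd_max_minus_min_s8 (int8_t a, int8_t b)
{
  int8_t hi = a > b ? a : b;
  int8_t lo = a > b ? b : a;
  return (uint8_t) ((uint8_t) hi - (uint8_t) lo);
}

/* The same two definitions for unsigned inputs.  */
static uint8_t
abd_widen_truncate_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) abs ((int) a - (int) b);
}

static uint8_t
abd_max_minus_min_u8 (uint8_t a, uint8_t b)
{
  return a > b ? (uint8_t) (a - b) : (uint8_t) (b - a);
}

/* Compare both definitions over every signed and unsigned 8-bit pair.  */
static bool
abd_definitions_agree_8bit (void)
{
  for (int a = -128; a <= 127; a++)
    for (int b = -128; b <= 127; b++)
      if (abd_widen_truncate_s8 ((int8_t) a, (int8_t) b)
          != abd_max_minus_min_s8 ((int8_t) a, (int8_t) b))
        return false;
  for (int a = 0; a <= 255; a++)
    for (int b = 0; b <= 255; b++)
      if (abd_widen_truncate_u8 ((uint8_t) a, (uint8_t) b)
          != abd_max_minus_min_u8 ((uint8_t) a, (uint8_t) b))
        return false;
  return true;
}
```

abd_definitions_agree_8bit () returns true, and abd_widen_truncate_s8 (-128, 0) yields 0x80, the 8-bit analogue of the 0x8000 case raised in the thread (it reads back as -128 if reinterpreted as signed).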