public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina <Tamar.Christina@arm.com>
Cc: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>,
	 nd <nd@arm.com>,  "rguenther\@suse.de" <rguenther@suse.de>,
	 "jlaw\@ventanamicro.com" <jlaw@ventanamicro.com>
Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]
Date: Mon, 27 Feb 2023 12:11:34 +0000	[thread overview]
Message-ID: <mptfsarux3d.fsf@arm.com> (raw)
In-Reply-To: <VI1PR08MB53255CE1AC5F885D2660E3CFFFAF9@VI1PR08MB5325.eurprd08.prod.outlook.com> (Tamar Christina's message of "Mon, 27 Feb 2023 11:09:35 +0000")

Tamar Christina <Tamar.Christina@arm.com> writes:
> Hi,
>
>> > I avoided open coding it with add and shift because it creates a 4
>> > instructions (and shifts which are typically slow) dependency chain
>> > instead of a load and multiply.  This change, unless the target is
>> > known to optimize it further is unlikely to be beneficial.  And by the
>> > time we get to costing the only alternative is to undo the existing pattern and
>> so you lose the general shift optimization.
>> >
>> > So it seemed unwise to open code as shifts, given the codegen out of
>> > the vectorizer would be degenerate for most targets or one needs the
>> > more complicated route of costing during pattern matching already.
>> 
>> Hmm, OK.  That seems like a cost-model thing though, rather than something
>> that should be exposed through optabs.  And I imagine the open-coded
>> version would still be better than nothing on targets without highpart multiply.
>> 
>> So how about replacing the hook with one that simply asks whether division
>> through highpart multiplication is preferred over the add/shift sequence?
>> (Unfortunately it's not going to be possible to work that out from existing
>> information.)
>
> So this doesn't work for SVE.  For SVE the multiplication widening pass introduces
> FMAs at gimple level.  So in the cases where the operation is fed from a widening
> multiplication we end up generating FMA.  If that was it I could have matched FMA.
>
> But it also pushes the multiplication in the second operand because it no longer has
> a mul to share the results with.
>
> In any case, the gimple code is transformed into
>
> vect__3.8_122 = .MASK_LOAD (_29, 8B, loop_mask_121);
> vect_patt_57.9_123 = (vector([8,8]) unsigned short) vect__3.8_122;
> vect_patt_64.11_127 = .FMA (vect_patt_57.9_123, vect_cst__124, { 257, ... });
> vect_patt_65.12_128 = vect_patt_64.11_127 >> 8;
> vect_patt_66.13_129 = .FMA (vect_patt_57.9_123, vect_cst__124, vect_patt_65.12_128);
> vect_patt_62.14_130 = vect_patt_66.13_129 >> 8;
> vect_patt_68.15_131 = (vector([8,8]) unsigned charD.21) vect_patt_62.14_130;
>
> This transformation is much worse than the original code, it extended the dependency
> chain with another expensive instruction. I can try to correct this in RTL by matching
> FMA and shift and splitting into MUL + ADDHNB and hope CSE takes care of the extra mul.
>
> But this seems like a hack, and it's basically undoing the earlier transformation.  It seems to
> me that the open coding is a bad idea.

Could you post the patch that gives this result?  I'll have a poke around.

Thanks,
Richard

> Do you still want it Richard?
>
> Thanks,
> Tamar
>> 
>> Thanks,
>> Richard
>> 
>> >
>> >>
>> >> Some comments in addition to Richard's:
>> >>
>> >> Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> >> > Hi All,
>> >> >
>> >> > As discussed in the ticket, this replaces the approach for
>> >> > optimizing the div by bitmask operation from a hook into optabs
>> >> > implemented through add_highpart.
>> >> >
>> >> > In order to be able to use this we need to check whether the
>> >> > current precision has enough bits to do the operation without any
>> >> > of the additions
>> >> overflowing.
>> >> >
>> >> > We use range information to determine this and only do the
>> >> > operation if we're sure am overflow won't occur.
>> >> >
>> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and <on-going>
>> >> issues.
>> >> >
>> >> > Ok for master?
>> >> >
>> >> > Thanks,
>> >> > Tamar
>> >> >
>> >> > gcc/ChangeLog:
>> >> >
>> >> > 	PR target/108583
>> >> > 	* doc/tm.texi (TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST):
>> >> Remove.
>> >> > 	* doc/tm.texi.in: Likewise.
>> >> > 	* explow.cc (round_push, align_dynamic_address): Revert previous
>> >> patch.
>> >> > 	* expmed.cc (expand_divmod): Likewise.
>> >> > 	* expmed.h (expand_divmod): Likewise.
>> >> > 	* expr.cc (force_operand, expand_expr_divmod): Likewise.
>> >> > 	* optabs.cc (expand_doubleword_mod,
>> >> expand_doubleword_divmod): Likewise.
>> >> > 	* internal-fn.def (ADDH): New.
>> >> > 	* optabs.def (sadd_highpart_optab, uadd_highpart_optab): New.
>> >> > 	* doc/md.texi: Document them.
>> >> > 	* doc/rtl.texi: Likewise.
>> >> > 	* target.def (can_special_div_by_const): Remove.
>> >> > 	* target.h: Remove tree-core.h include
>> >> > 	* targhooks.cc (default_can_special_div_by_const): Remove.
>> >> > 	* targhooks.h (default_can_special_div_by_const): Remove.
>> >> > 	* tree-vect-generic.cc (expand_vector_operation): Remove hook.
>> >> > 	* tree-vect-patterns.cc (vect_recog_divmod_pattern): Remove hook
>> >> and
>> >> > 	implement new obtab recognition based on range.
>> >> > 	* tree-vect-stmts.cc (vectorizable_operation): Remove hook.
>> >> >
>> >> > gcc/testsuite/ChangeLog:
>> >> >
>> >> > 	PR target/108583
>> >> > 	* gcc.dg/vect/vect-div-bitmask-4.c: New test.
>> >> > 	* gcc.dg/vect/vect-div-bitmask-5.c: New test.
>> >> >
>> >> > --- inline copy of patch --
>> >> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
>> >> >
>> >>
>> 7235d34c4b30949febfa10d5a626ac9358281cfa..02004c4b0f4d88dffe980f
>> 74080
>> >> 3
>> >> > 8595e21af35d 100644
>> >> > --- a/gcc/doc/md.texi
>> >> > +++ b/gcc/doc/md.texi
>> >> > @@ -5668,6 +5668,18 @@ represented in RTL using a
>> >> @code{smul_highpart} RTX expression.
>> >> >  Similar, but the multiplication is unsigned.  This may be
>> >> > represented in RTL using an @code{umul_highpart} RTX expression.
>> >> >
>> >> > +@cindex @code{sadd@var{m}3_highpart} instruction pattern @item
>> >> > +@samp{smul@var{m}3_highpart}
>> >>
>> >> sadd
>> >>
>> >> > +Perform a signed addition of operands 1 and 2, which have mode
>> >> > +@var{m}, and store the most significant half of the product in operand
>> 0.
>> >> > +The least significant half of the product is discarded.  This may
>> >> > +be represented in RTL using a @code{sadd_highpart} RTX expression.
>> >> > +
>> >> > +@cindex @code{uadd@var{m}3_highpart} instruction pattern @item
>> >> > +@samp{uadd@var{m}3_highpart} Similar, but the addition is unsigned.
>> >> > +This may be represented in RTL using an @code{uadd_highpart} RTX
>> >> > +expression.
>> >> > +
>> >> >  @cindex @code{madd@var{m}@var{n}4} instruction pattern  @item
>> >> > @samp{madd@var{m}@var{n}4}  Multiply operands 1 and 2, sign-
>> extend
>> >> > them to mode @var{n}, add diff --git a/gcc/doc/rtl.texi
>> >> > b/gcc/doc/rtl.texi index
>> >> >
>> >>
>> d1380e1eb3ba6b2853686f41f2bf937bfcbed1fe..63a7ef6e566eeea4f14c00
>> 343
>> >> d17
>> >> > 1940ec4222f3 100644
>> >> > --- a/gcc/doc/rtl.texi
>> >> > +++ b/gcc/doc/rtl.texi
>> >> > @@ -2535,6 +2535,17 @@ out in machine mode @var{m}.
>> >> > @code{smul_highpart} returns the high part  of a signed
>> >> > multiplication, @code{umul_highpart} returns the high part  of an
>> >> > unsigned
>> >> multiplication.
>> >> >
>> >> > +@findex sadd_highpart
>> >> > +@findex uadd_highpart
>> >> > +@cindex high-part addition
>> >> > +@cindex addition high part
>> >> > +@item (sadd_highpart:@var{m} @var{x} @var{y}) @itemx
>> >> > +(uadd_highpart:@var{m} @var{x} @var{y}) Represents the high-part
>> >> > +addition of @var{x} and @var{y} carried out in machine mode @var{m}.
>> >> > +@code{sadd_highpart} returns the high part of a signed addition,
>> >> > +@code{uadd_highpart} returns the high part of an unsigned addition.
>> >>
>> >> The patch doesn't add these RTL codes though.
>> >>
>> >> > +
>> >> >  @findex fma
>> >> >  @cindex fused multiply-add
>> >> >  @item (fma:@var{m} @var{x} @var{y} @var{z}) diff --git
>> >> > a/gcc/doc/tm.texi b/gcc/doc/tm.texi index
>> >> >
>> >>
>> c6c891972d1e58cd163b259ba96a599d62326865..3ab2031a336b8758d57
>> 914840
>> >> 17e
>> >> > 6b0d62ab077e 100644
>> >> > --- a/gcc/doc/tm.texi
>> >> > +++ b/gcc/doc/tm.texi
>> >> > @@ -6137,22 +6137,6 @@ instruction pattern.  There is no need for
>> >> > the hook to handle these two  implementation approaches itself.
>> >> >  @end deftypefn
>> >> >
>> >> > -@deftypefn {Target Hook} bool
>> >> > TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST (enum
>> >> @var{tree_code}, tree
>> >> > @var{vectype}, wide_int @var{constant}, rtx *@var{output}, rtx
>> >> > @var{in0}, rtx @var{in1}) -This hook is used to test whether the
>> >> > target has a special method of -division of vectors of type
>> >> > @var{vectype}
>> >> using the value @var{constant}, -and producing a vector of type
>> >> @var{vectype}.  The division -will then not be decomposed by the
>> >> vectorizer and kept as a div.
>> >> > -
>> >> > -When the hook is being used to test whether the target supports a
>> >> > special -divide, @var{in0}, @var{in1}, and @var{output} are all null.
>> >> > When the hook -is being used to emit a division, @var{in0} and
>> >> > @var{in1} are the source -vectors of type @var{vecttype} and
>> >> > @var{output} is the destination vector of -type @var{vectype}.
>> >> > -
>> >> > -Return true if the operation is possible, emitting instructions
>> >> > for it -if rtxes are provided and updating @var{output}.
>> >> > -@end deftypefn
>> >> > -
>> >> >  @deftypefn {Target Hook} tree
>> >> > TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION (unsigned
>> >> @var{code},
>> >> > tree @var{vec_type_out}, tree @var{vec_type_in})  This hook should
>> >> > return the decl of a function that implements the  vectorized
>> >> > variant of the function with the @code{combined_fn} code diff --git
>> >> > a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index
>> >> >
>> >>
>> 613b2534149415f442163d599503efaf423b673b..8790f4e44b98b51ad5d
>> 1efec0
>> >> a3a
>> >> > bccd1c293c7b 100644
>> >> > --- a/gcc/doc/tm.texi.in
>> >> > +++ b/gcc/doc/tm.texi.in
>> >> > @@ -4173,8 +4173,6 @@ address;  but often a machine-dependent
>> >> strategy can generate better code.
>> >> >
>> >> >  @hook TARGET_VECTORIZE_VEC_PERM_CONST
>> >> >
>> >> > -@hook TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST
>> >> > -
>> >> >  @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
>> >> >
>> >> >  @hook TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION
>> >> > diff --git a/gcc/explow.cc b/gcc/explow.cc index
>> >> >
>> >>
>> 83439b32abe1b9aa4b7983eb629804f97486acbd..be9195b33323ee5597fc
>> 212f0
>> >> bef
>> >> > a016eea4573c 100644
>> >> > --- a/gcc/explow.cc
>> >> > +++ b/gcc/explow.cc
>> >> > @@ -1037,7 +1037,7 @@ round_push (rtx size)
>> >> >       TRUNC_DIV_EXPR.  */
>> >> >    size = expand_binop (Pmode, add_optab, size, alignm1_rtx,
>> >> >  		       NULL_RTX, 1, OPTAB_LIB_WIDEN);
>> >> > -  size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL,
>> >> > size, align_rtx,
>> >> > +  size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, size, align_rtx,
>> >> >  			NULL_RTX, 1);
>> >> >    size = expand_mult (Pmode, size, align_rtx, NULL_RTX, 1);
>> >> >
>> >> > @@ -1203,7 +1203,7 @@ align_dynamic_address (rtx target, unsigned
>> >> required_align)
>> >> >  			 gen_int_mode (required_align / BITS_PER_UNIT - 1,
>> >> >  				       Pmode),
>> >> >  			 NULL_RTX, 1, OPTAB_LIB_WIDEN);
>> >> > -  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL,
>> >> > target,
>> >> > +  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
>> >> >  			  gen_int_mode (required_align / BITS_PER_UNIT,
>> >> >  					Pmode),
>> >> >  			  NULL_RTX, 1);
>> >> > diff --git a/gcc/expmed.h b/gcc/expmed.h index
>> >> >
>> >>
>> 0419e2dac85850889ce0bee59515e31a80c582de..4dfe635c22ee49f2dba4c
>> 5364
>> >> 094
>> >> > 1628068f3901 100644
>> >> > --- a/gcc/expmed.h
>> >> > +++ b/gcc/expmed.h
>> >> > @@ -710,9 +710,8 @@ extern rtx expand_shift (enum tree_code,
>> >> > machine_mode, rtx, poly_int64, rtx,  extern rtx maybe_expand_shift
>> >> (enum tree_code, machine_mode, rtx, int, rtx,
>> >> >  			       int);
>> >> >  #ifdef GCC_OPTABS_H
>> >> > -extern rtx expand_divmod (int, enum tree_code, machine_mode, tree,
>> >> tree,
>> >> > -			  rtx, rtx, rtx, int,
>> >> > -			  enum optab_methods = OPTAB_LIB_WIDEN);
>> >> > +extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx,
>> >> rtx,
>> >> > +			  rtx, int, enum optab_methods =
>> >> OPTAB_LIB_WIDEN);
>> >> >  #endif
>> >> >  #endif
>> >> >
>> >> > diff --git a/gcc/expmed.cc b/gcc/expmed.cc index
>> >> >
>> >>
>> 917360199ca56157cf3c3693b65e93cd9d8ed244..1553ea8e31eb6433025a
>> b18a3
>> >> a59
>> >> > c169d3b7692f 100644
>> >> > --- a/gcc/expmed.cc
>> >> > +++ b/gcc/expmed.cc
>> >> > @@ -4222,8 +4222,8 @@ expand_sdiv_pow2 (scalar_int_mode mode,
>> rtx
>> >> op0,
>> >> > HOST_WIDE_INT d)
>> >> >
>> >> >  rtx
>> >> >  expand_divmod (int rem_flag, enum tree_code code, machine_mode
>> >> mode,
>> >> > -	       tree treeop0, tree treeop1, rtx op0, rtx op1, rtx target,
>> >> > -	       int unsignedp, enum optab_methods methods)
>> >> > +	       rtx op0, rtx op1, rtx target, int unsignedp,
>> >> > +	       enum optab_methods methods)
>> >> >  {
>> >> >    machine_mode compute_mode;
>> >> >    rtx tquotient;
>> >> > @@ -4375,17 +4375,6 @@ expand_divmod (int rem_flag, enum
>> tree_code
>> >> > code, machine_mode mode,
>> >> >
>> >> >    last_div_const = ! rem_flag && op1_is_constant ? INTVAL (op1) :
>> >> > 0;
>> >> >
>> >> > -  /* Check if the target has specific expansions for the division.
>> >> > */
>> >> > -  tree cst;
>> >> > -  if (treeop0
>> >> > -      && treeop1
>> >> > -      && (cst = uniform_integer_cst_p (treeop1))
>> >> > -      && targetm.vectorize.can_special_div_by_const (code, TREE_TYPE
>> >> (treeop0),
>> >> > -						     wi::to_wide (cst),
>> >> > -						     &target, op0, op1))
>> >> > -    return target;
>> >> > -
>> >> > -
>> >> >    /* Now convert to the best mode to use.  */
>> >> >    if (compute_mode != mode)
>> >> >      {
>> >> > @@ -4629,8 +4618,8 @@ expand_divmod (int rem_flag, enum
>> tree_code
>> >> code, machine_mode mode,
>> >> >  			    || (optab_handler (sdivmod_optab, int_mode)
>> >> >  				!= CODE_FOR_nothing)))
>> >> >  		      quotient = expand_divmod (0, TRUNC_DIV_EXPR,
>> >> > -						int_mode, treeop0, treeop1,
>> >> > -						op0, gen_int_mode (abs_d,
>> >> > +						int_mode, op0,
>> >> > +						gen_int_mode (abs_d,
>> >> >  							      int_mode),
>> >> >  						NULL_RTX, 0);
>> >> >  		    else
>> >> > @@ -4819,8 +4808,8 @@ expand_divmod (int rem_flag, enum
>> tree_code
>> >> code, machine_mode mode,
>> >> >  				      size - 1, NULL_RTX, 0);
>> >> >  		t3 = force_operand (gen_rtx_MINUS (int_mode, t1, nsign),
>> >> >  				    NULL_RTX);
>> >> > -		t4 = expand_divmod (0, TRUNC_DIV_EXPR, int_mode,
>> >> treeop0,
>> >> > -				    treeop1, t3, op1, NULL_RTX, 0);
>> >> > +		t4 = expand_divmod (0, TRUNC_DIV_EXPR, int_mode, t3,
>> >> op1,
>> >> > +				    NULL_RTX, 0);
>> >> >  		if (t4)
>> >> >  		  {
>> >> >  		    rtx t5;
>> >> > diff --git a/gcc/expr.cc b/gcc/expr.cc index
>> >> >
>> >>
>> 15be1c8db999103bb9e5fa33daa44ae06de5ace8..78d35297e75521633907
>> 8d5b
>> >> 2280
>> >> > c6e277f26d72 100644
>> >> > --- a/gcc/expr.cc
>> >> > +++ b/gcc/expr.cc
>> >> > @@ -8207,17 +8207,16 @@ force_operand (rtx value, rtx target)
>> >> >  	    return expand_divmod (0,
>> >> >  				  FLOAT_MODE_P (GET_MODE (value))
>> >> >  				  ? RDIV_EXPR : TRUNC_DIV_EXPR,
>> >> > -				  GET_MODE (value), NULL, NULL, op1, op2,
>> >> > -				  target, 0);
>> >> > +				  GET_MODE (value), op1, op2, target, 0);
>> >> >  	case MOD:
>> >> > -	  return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value),
>> >> NULL, NULL,
>> >> > -				op1, op2, target, 0);
>> >> > +	  return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value),
>> >> op1, op2,
>> >> > +				target, 0);
>> >> >  	case UDIV:
>> >> > -	  return expand_divmod (0, TRUNC_DIV_EXPR, GET_MODE (value),
>> >> NULL, NULL,
>> >> > -				op1, op2, target, 1);
>> >> > +	  return expand_divmod (0, TRUNC_DIV_EXPR, GET_MODE (value),
>> >> op1, op2,
>> >> > +				target, 1);
>> >> >  	case UMOD:
>> >> > -	  return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value),
>> >> NULL, NULL,
>> >> > -				op1, op2, target, 1);
>> >> > +	  return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value),
>> >> op1, op2,
>> >> > +				target, 1);
>> >> >  	case ASHIFTRT:
>> >> >  	  return expand_simple_binop (GET_MODE (value), code, op1, op2,
>> >> >  				      target, 0, OPTAB_LIB_WIDEN); @@ -
>> >> 9170,13 +9169,11 @@
>> >> > expand_expr_divmod (tree_code code, machine_mode mode, tree
>> >> treeop0,
>> >> >        bool speed_p = optimize_insn_for_speed_p ();
>> >> >        do_pending_stack_adjust ();
>> >> >        start_sequence ();
>> >> > -      rtx uns_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1,
>> >> > -				   op0, op1, target, 1);
>> >> > +      rtx uns_ret = expand_divmod (mod_p, code, mode, op0, op1,
>> >> > + target, 1);
>> >> >        rtx_insn *uns_insns = get_insns ();
>> >> >        end_sequence ();
>> >> >        start_sequence ();
>> >> > -      rtx sgn_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1,
>> >> > -				   op0, op1, target, 0);
>> >> > +      rtx sgn_ret = expand_divmod (mod_p, code, mode, op0, op1,
>> >> > + target, 0);
>> >> >        rtx_insn *sgn_insns = get_insns ();
>> >> >        end_sequence ();
>> >> >        unsigned uns_cost = seq_cost (uns_insns, speed_p); @@
>> >> > -9198,8
>> >> > +9195,7 @@ expand_expr_divmod (tree_code code, machine_mode
>> >> mode, tree treeop0,
>> >> >        emit_insn (sgn_insns);
>> >> >        return sgn_ret;
>> >> >      }
>> >> > -  return expand_divmod (mod_p, code, mode, treeop0, treeop1,
>> >> > -			op0, op1, target, unsignedp);
>> >> > +  return expand_divmod (mod_p, code, mode, op0, op1, target,
>> >> > + unsignedp);
>> >> >  }
>> >> >
>> >> >  rtx
>> >> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index
>> >> >
>> >>
>> 22b4a2d92967076c658965afcaeaf39b449a8caf..2796d3669a0806538052
>> 584f5a
>> >> 3b
>> >> > 8a734baa800f 100644
>> >> > --- a/gcc/internal-fn.def
>> >> > +++ b/gcc/internal-fn.def
>> >> > @@ -174,6 +174,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL,
>> >> ECF_CONST
>> >> > | ECF_NOTHROW, first,
>> >> >
>> >> >  DEF_INTERNAL_SIGNED_OPTAB_FN (MULH, ECF_CONST |
>> >> ECF_NOTHROW, first,
>> >> >  			      smul_highpart, umul_highpart, binary)
>> >> > +DEF_INTERNAL_SIGNED_OPTAB_FN (ADDH, ECF_CONST |
>> >> ECF_NOTHROW, first,
>> >> > +			      sadd_highpart, uadd_highpart, binary)
>> >> >  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST |
>> >> ECF_NOTHROW, first,
>> >> >  			      smulhs, umulhs, binary)
>> >> >  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST |
>> >> ECF_NOTHROW, first,
>> >> > diff --git a/gcc/optabs.cc b/gcc/optabs.cc index
>> >> >
>> >>
>> cf22bfec3f5513f56d22c866231edbf322ff6945..474ccbd7915b4f144cebe03
>> 69a6
>> >> e
>> >> > 77082c1e617b 100644
>> >> > --- a/gcc/optabs.cc
>> >> > +++ b/gcc/optabs.cc
>> >> > @@ -1106,9 +1106,8 @@ expand_doubleword_mod (machine_mode
>> >> mode, rtx op0, rtx op1, bool unsignedp)
>> >> >  		return NULL_RTX;
>> >> >  	    }
>> >> >  	}
>> >> > -      rtx remainder = expand_divmod (1, TRUNC_MOD_EXPR, word_mode,
>> >> NULL, NULL,
>> >> > -				     sum, gen_int_mode (INTVAL (op1),
>> >> > -							word_mode),
>> >> > +      rtx remainder = expand_divmod (1, TRUNC_MOD_EXPR,
>> word_mode,
>> >> sum,
>> >> > +				     gen_int_mode (INTVAL (op1),
>> >> word_mode),
>> >> >  				     NULL_RTX, 1, OPTAB_DIRECT);
>> >> >        if (remainder == NULL_RTX)
>> >> >  	return NULL_RTX;
>> >> > @@ -1211,8 +1210,8 @@ expand_doubleword_divmod (machine_mode
>> >> mode, rtx
>> >> > op0, rtx op1, rtx *rem,
>> >> >
>> >> >    if (op11 != const1_rtx)
>> >> >      {
>> >> > -      rtx rem2 = expand_divmod (1, TRUNC_MOD_EXPR, mode, NULL,
>> NULL,
>> >> quot1,
>> >> > -				op11, NULL_RTX, unsignedp,
>> >> OPTAB_DIRECT);
>> >> > +      rtx rem2 = expand_divmod (1, TRUNC_MOD_EXPR, mode, quot1,
>> >> op11,
>> >> > +				NULL_RTX, unsignedp, OPTAB_DIRECT);
>> >> >        if (rem2 == NULL_RTX)
>> >> >  	return NULL_RTX;
>> >> >
>> >> > @@ -1226,8 +1225,8 @@ expand_doubleword_divmod (machine_mode
>> >> mode, rtx op0, rtx op1, rtx *rem,
>> >> >        if (rem2 == NULL_RTX)
>> >> >  	return NULL_RTX;
>> >> >
>> >> > -      rtx quot2 = expand_divmod (0, TRUNC_DIV_EXPR, mode, NULL,
>> NULL,
>> >> quot1,
>> >> > -				 op11, NULL_RTX, unsignedp,
>> >> OPTAB_DIRECT);
>> >> > +      rtx quot2 = expand_divmod (0, TRUNC_DIV_EXPR, mode, quot1,
>> op11,
>> >> > +				 NULL_RTX, unsignedp, OPTAB_DIRECT);
>> >> >        if (quot2 == NULL_RTX)
>> >> >  	return NULL_RTX;
>> >> >
>> >> > diff --git a/gcc/optabs.def b/gcc/optabs.def index
>> >> >
>> >>
>> 695f5911b300c9ca5737de9be809fa01aabe5e01..77a152ec2d1949deca2c2
>> d7a5
>> >> ccb
>> >> > f6147947351a 100644
>> >> > --- a/gcc/optabs.def
>> >> > +++ b/gcc/optabs.def
>> >> > @@ -265,6 +265,8 @@ OPTAB_D (spaceship_optab, "spaceship$a3")
>> >> >
>> >> >  OPTAB_D (smul_highpart_optab, "smul$a3_highpart")  OPTAB_D
>> >> > (umul_highpart_optab, "umul$a3_highpart")
>> >> > +OPTAB_D (sadd_highpart_optab, "sadd$a3_highpart") OPTAB_D
>> >> > +(uadd_highpart_optab, "uadd$a3_highpart")
>> >> >
>> >> >  OPTAB_D (cmpmem_optab, "cmpmem$a")  OPTAB_D (cmpstr_optab,
>> >> > "cmpstr$a") diff --git a/gcc/target.def b/gcc/target.def index
>> >> >
>> >>
>> db8af0cbe81624513f114fc9bbd8be61d855f409..e0a5c7adbd962f5d08ed0
>> 8d1d
>> >> 81a
>> >> > fa2c2baa64a5 100644
>> >> > --- a/gcc/target.def
>> >> > +++ b/gcc/target.def
>> >> > @@ -1905,25 +1905,6 @@ implementation approaches itself.",
>> >> >  	const vec_perm_indices &sel),
>> >> >   NULL)
>> >> >
>> >> > -DEFHOOK
>> >> > -(can_special_div_by_const,
>> >> > - "This hook is used to test whether the target has a special
>> >> > method of\n\ -division of vectors of type @var{vectype} using the
>> >> > value @var{constant},\n\ -and producing a vector of type
>> >> > @var{vectype}.  The division\n\ -will then not be decomposed by the
>> >> > vectorizer and kept as a div.\n\ -\n\ -When the hook is being used
>> >> > to test whether the target supports a special\n\ -divide,
>> >> > @var{in0}, @var{in1}, and @var{output} are all null.  When the
>> >> > hook\n\ -is being used to emit a division, @var{in0} and @var{in1}
>> >> > are the source\n\ -vectors of type @var{vecttype} and @var{output}
>> >> > is the destination vector of\n\ -type @var{vectype}.\n\ -\n\
>> >> > -Return true if the operation is possible, emitting instructions
>> >> > for it\n\ -if rtxes are provided and updating @var{output}.",
>> >> > - bool, (enum tree_code, tree vectype, wide_int constant, rtx *output,
>> >> > -	rtx in0, rtx in1),
>> >> > - default_can_special_div_by_const)
>> >> > -
>> >> >  /* Return true if the target supports misaligned store/load of a
>> >> >     specific factor denoted in the third parameter.  The last parameter
>> >> >     is true if the access is defined in a packed struct.  */ diff
>> >> > --git a/gcc/target.h b/gcc/target.h index
>> >> >
>> >>
>> 03fd03a52075b4836159035ec14078c0aebdd7e9..93691882757232c514fc
>> a82b9
>> >> 9f9
>> >> > 13158c2d47b1 100644
>> >> > --- a/gcc/target.h
>> >> > +++ b/gcc/target.h
>> >> > @@ -51,7 +51,6 @@
>> >> >  #include "insn-codes.h"
>> >> >  #include "tm.h"
>> >> >  #include "hard-reg-set.h"
>> >> > -#include "tree-core.h"
>> >> >
>> >> >  #if CHECKING_P
>> >> >
>> >> > diff --git a/gcc/targhooks.h b/gcc/targhooks.h index
>> >> >
>> >>
>> a1df260f5483dc84f18d8f12c5202484a32d5bb7..a6a4809ca91baa5d7fad2
>> 24454
>> >> 93
>> >> > 17a31390f0c2 100644
>> >> > --- a/gcc/targhooks.h
>> >> > +++ b/gcc/targhooks.h
>> >> > @@ -209,8 +209,6 @@ extern void default_addr_space_diagnose_usage
>> >> > (addr_space_t, location_t);  extern rtx default_addr_space_convert
>> >> > (rtx, tree, tree);  extern unsigned int
>> >> > default_case_values_threshold (void);  extern bool
>> >> > default_have_conditional_execution (void); -extern bool
>> >> > default_can_special_div_by_const (enum tree_code, tree,
>> >> wide_int,
>> >> > -					      rtx *, rtx, rtx);
>> >> >
>> >> >  extern bool default_libc_has_function (enum function_class, tree);
>> >> > extern bool default_libc_has_fast_function (int fcode); diff --git
>> >> > a/gcc/targhooks.cc b/gcc/targhooks.cc index
>> >> >
>> >>
>> fe0116521feaf32187e7bc113bf93b1805852c79..211525720a620d6f533e2
>> da91e
>> >> 03
>> >> > 877337a931e7 100644
>> >> > --- a/gcc/targhooks.cc
>> >> > +++ b/gcc/targhooks.cc
>> >> > @@ -1840,14 +1840,6 @@ default_have_conditional_execution (void)
>> >> >    return HAVE_conditional_execution;  }
>> >> >
>> >> > -/* Default that no division by constant operations are special.
>> >> > */ -bool -default_can_special_div_by_const (enum tree_code, tree,
>> >> > wide_int, rtx *, rtx,
>> >> > -				  rtx)
>> >> > -{
>> >> > -  return false;
>> >> > -}
>> >> > -
>> >> >  /* By default we assume that c99 functions are present at the runtime,
>> >> >     but sincos is not.  */
>> >> >  bool
>> >> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
>> >> > b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
>> >> > new file mode 100644
>> >> > index
>> >> >
>> >>
>> 0000000000000000000000000000000000000000..c81f8946922250234b
>> f759e0a0
>> >> a0
>> >> > 4ea8c1f73e3c
>> >> > --- /dev/null
>> >> > +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
>> >> > @@ -0,0 +1,25 @@
>> >> > +/* { dg-require-effective-target vect_int } */
>> >> > +
>> >> > +#include <stdint.h>
>> >> > +#include "tree-vect.h"
>> >> > +
>> >> > +typedef unsigned __attribute__((__vector_size__ (16))) V;
>> >> > +
>> >> > +static __attribute__((__noinline__)) __attribute__((__noclone__))
>> >> > +V foo (V v, unsigned short i) {
>> >> > +  v /= i;
>> >> > +  return v;
>> >> > +}
>> >> > +
>> >> > +int
>> >> > +main (void)
>> >> > +{
>> >> > +  V v = foo ((V) { 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff
>> >> > +}, 0xffff);
>> >> > +  for (unsigned i = 0; i < sizeof (v) / sizeof (v[0]); i++)
>> >> > +    if (v[i] != 0x00010001)
>> >> > +      __builtin_abort ();
>> >> > +  return 0;
>> >> > +}
>> >> > +
>> >> > +/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern:
>> >> > +detected" "vect" { target aarch64*-*-* } } } */
>> >> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c
>> >> > b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c
>> >> > new file mode 100644
>> >> > index
>> >> >
>> >>
>> 0000000000000000000000000000000000000000..b4eb1a4dacba481e63
>> 06b4991
>> >> 4d2
>> >> > a29b933de625
>> >> > --- /dev/null
>> >> > +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-5.c
>> >> > @@ -0,0 +1,58 @@
>> >> > +/* { dg-require-effective-target vect_int } */
>> >> > +
>> >> > +#include <stdint.h>
>> >> > +#include <stdio.h>
>> >> > +#include "tree-vect.h"
>> >> > +
>> >> > +#define N 50
>> >> > +#define TYPE uint8_t
>> >> > +
>> >> > +#ifndef DEBUG
>> >> > +#define DEBUG 0
>> >> > +#endif
>> >> > +
>> >> > +#define BASE ((TYPE) -1 < 0 ? -126 : 4)
>> >> > +
>> >> > +
>> >> > +__attribute__((noipa, noinline, optimize("O1"))) void fun1(TYPE*
>> >> > +restrict pixel, TYPE level, int n) {
>> >> > +  for (int i = 0; i < n; i+=1)
>> >> > +    pixel[i] = (pixel[i] + level) / 0xff; }
>> >> > +
>> >> > +__attribute__((noipa, noinline, optimize("O3"))) void fun2(TYPE*
>> >> > +restrict pixel, TYPE level, int n) {
>> >> > +  for (int i = 0; i < n; i+=1)
>> >> > +    pixel[i] = (pixel[i] + level) / 0xff; }
>> >> > +
>> >> > +int main ()
>> >> > +{
>> >> > +  TYPE a[N];
>> >> > +  TYPE b[N];
>> >> > +
>> >> > +  for (int i = 0; i < N; ++i)
>> >> > +    {
>> >> > +      a[i] = BASE + i * 13;
>> >> > +      b[i] = BASE + i * 13;
>> >> > +      if (DEBUG)
>> >> > +        printf ("%d: 0x%x\n", i, a[i]);
>> >> > +    }
>> >> > +
>> >> > +  fun1 (a, N / 2, N);
>> >> > +  fun2 (b, N / 2, N);
>> >> > +
>> >> > +  for (int i = 0; i < N; ++i)
>> >> > +    {
>> >> > +      if (DEBUG)
>> >> > +        printf ("%d = 0x%x == 0x%x\n", i, a[i], b[i]);
>> >> > +
>> >> > +      if (a[i] != b[i])
>> >> > +        __builtin_abort ();
>> >> > +    }
>> >> > +  return 0;
>> >> > +}
>> >> > +
>> >> > +/* { dg-final { scan-tree-dump "divmod pattern recognized" "vect"
>> >> > +{ target aarch64*-*-* } } } */
>> >> > diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
>> >> > index
>> >> >
>> >>
>> 166a248f4b9512d4c6fc8d760b458b7a467f7790..519a824ec727d4d4f28c
>> 14077d
>> >> c3
>> >> > e970bed75ef6 100644
>> >> > --- a/gcc/tree-vect-generic.cc
>> >> > +++ b/gcc/tree-vect-generic.cc
>> >> > @@ -1237,17 +1237,6 @@ expand_vector_operation
>> >> (gimple_stmt_iterator *gsi, tree type, tree compute_type
>> >> >  	  tree rhs2 = gimple_assign_rhs2 (assign);
>> >> >  	  tree ret;
>> >> >
>> >> > -	  /* Check if the target was going to handle it through the special
>> >> > -	     division callback hook.  */
>> >> > -	  tree cst = uniform_integer_cst_p (rhs2);
>> >> > -	  if (cst &&
>> >> > -	      targetm.vectorize.can_special_div_by_const (code, type,
>> >> > -							  wi::to_wide (cst),
>> >> > -							  NULL,
>> >> > -							  NULL_RTX,
>> >> NULL_RTX))
>> >> > -	    return NULL_TREE;
>> >> > -
>> >> > -
>> >> >  	  if (!optimize
>> >> >  	      || !VECTOR_INTEGER_TYPE_P (type)
>> >> >  	      || TREE_CODE (rhs2) != VECTOR_CST diff --git
>> >> > a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index
>> >> >
>> >>
>> 6934aebc69f231af24668f0a1c3d140e97f55487..e39d7e6b362ef44eb2fc46
>> 7f33
>> >> 69
>> >> > de2afea139d6 100644
>> >> > --- a/gcc/tree-vect-patterns.cc
>> >> > +++ b/gcc/tree-vect-patterns.cc
>> >> > @@ -3914,12 +3914,82 @@ vect_recog_divmod_pattern (vec_info
>> *vinfo,
>> >> >        return pattern_stmt;
>> >> >      }
>> >> >    else if ((cst = uniform_integer_cst_p (oprnd1))
>> >> > -	   && targetm.vectorize.can_special_div_by_const (rhs_code,
>> >> vectype,
>> >> > -							  wi::to_wide (cst),
>> >> > -							  NULL, NULL_RTX,
>> >> > -							  NULL_RTX))
>> >> > +	   && TYPE_UNSIGNED (itype)
>> >> > +	   && rhs_code == TRUNC_DIV_EXPR
>> >> > +	   && vectype
>> >> > +	   && direct_internal_fn_supported_p (IFN_ADDH, vectype,
>> >> > +					      OPTIMIZE_FOR_SPEED))
>> >> >      {
>> >> > -      return NULL;
>> >> > +      /* div optimizations using narrowings
>> >> > +       we can do the division e.g. shorts by 255 faster by calculating it as
>> >> > +       (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
>> >> > +       double the precision of x.
>> >> > +
>> >> > +       If we imagine a short as being composed of two blocks of bytes then
>> >> > +       adding 257 or 0b0000_0001_0000_0001 to the number is
>> equivalent to
>> >> > +       adding 1 to each sub component:
>> >> > +
>> >> > +	    short value of 16-bits
>> >> > +       ┌──────────────┬────────────────┐
>> >> > +       │              │                │
>> >> > +       └──────────────┴────────────────┘
>> >> > +	 8-bit part1 ▲  8-bit part2   ▲
>> >> > +		     │                │
>> >> > +		     │                │
>> >> > +		    +1               +1
>> >> > +
>> >> > +       after the first addition, we have to shift right by 8, and narrow the
>> >> > +       results back to a byte.  Remember that the addition must be done in
>> >> > +       double the precision of the input.  However if we know that
>> >> > + the
>> >> addition
>> >> > +       `x + 257` does not overflow then we can do the operation in
>> >> > + the
>> >> current
>> >> > +       precision.  In which case we don't need the pack and unpacks.  */
>> >> > +      auto wcst = wi::to_wide (cst);
>> >> > +      int pow = wi::exact_log2 (wcst + 1);
>> >> > +      if (pow == (int) (element_precision (vectype) / 2))
>> >> > +	{
>> >> > +	  wide_int min,max;
>> >> > +	  /* If we're in a pattern we need to find the orginal definition.  */
>> >> > +	  tree op0 = oprnd0;
>> >> > +	  gimple *stmt = SSA_NAME_DEF_STMT (oprnd0);
>> >> > +	  stmt_vec_info stmt_info = vinfo->lookup_stmt (stmt);
>> >> > +	  if (is_pattern_stmt_p (stmt_info))
>> >> > +	    {
>> >> > +	      auto orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
>> >> > +	      if (is_gimple_assign (STMT_VINFO_STMT (orig_stmt)))
>> >> > +		op0 = gimple_assign_lhs (STMT_VINFO_STMT (orig_stmt));
>> >> > +	    }
>> >>
>> >> If this is generally safe (I'm skipping thinking about it in the
>> >> interests of a quick review :-)), then I think it should be done in
>> >> vect_get_range_info instead.  Using gimple_get_lhs would be more
>> >> general than handling just assignments.
>> >>
>> >> > +
>> >> > +	  /* Check that no overflow will occur.  If we don't have range
>> >> > +	     information we can't perform the optimization.  */
>> >> > +	  if (vect_get_range_info (op0, &min, &max))
>> >> > +	    {
>> >> > +	      wide_int one = wi::to_wide (build_one_cst (itype));
>> >> > +	      wide_int adder = wi::add (one, wi::lshift (one, pow));
>> >> > +	      wi::overflow_type ovf;
>> >> > +	      /* We need adder and max in the same precision.  */
>> >> > +	      wide_int zadder
>> >> > +		= wide_int_storage::from (adder, wi::get_precision (max),
>> >> > +					  UNSIGNED);
>> >> > +	      wi::add (max, zadder, UNSIGNED, &ovf);
>> >>
>> >> Could you explain this a bit more?  When do we have mismatched
>> >> precisions?
>> >
>> > C promotion rules will promote e.g.
>> >
>> > void fun2(uint8_t* restrict pixel, uint8_t level, int n) {
>> >   for (int i = 0; i < n; i+=1)
>> >     pixel[i] = (pixel[i] + level) / 0xff; }
>> >
>> > And have the addition be done as a 32 bit integer.  The vectorizer
>> > will demote this down to a short, but range information is not stored
>> > for patterns.  So In the above the range will correctly be 0x1fe but
>> > the precision will be that of the original expression, so 32.  This
>> > will be a mismatch with itype which is derived from the size the vectorizer
>> will perform the operation in.
>> >
>> > Thanks,
>> > Tamar
>> >
>> >>
>> >> Thanks,
>> >> Richard
>> >>
>> >> > +	      if (ovf == wi::OVF_NONE)
>> >> > +		{
>> >> > +		  *type_out = vectype;
>> >> > +		  tree tadder = wide_int_to_tree (itype, adder);
>> >> > +		  gcall *patt1
>> >> > +		    = gimple_build_call_internal (IFN_ADDH, 2, oprnd0,
>> >> tadder);
>> >> > +		  tree lhs = vect_recog_temp_ssa_var (itype, NULL);
>> >> > +		  gimple_call_set_lhs (patt1, lhs);
>> >> > +		  append_pattern_def_seq (vinfo, stmt_vinfo, patt1,
>> >> vectype);
>> >> > +
>> >> > +		  pattern_stmt
>> >> > +		    = gimple_build_call_internal (IFN_ADDH, 2, oprnd0, lhs);
>> >> > +		  lhs = vect_recog_temp_ssa_var (itype, NULL);
>> >> > +		  gimple_call_set_lhs (pattern_stmt, lhs);
>> >> > +
>> >> > +		  return pattern_stmt;
>> >> > +		}
>> >> > +	    }
>> >> > +	}
>> >> >      }
>> >> >
>> >> >    if (prec > HOST_BITS_PER_WIDE_INT diff --git
>> >> > a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index
>> >> >
>> >>
>> eb4ca1f184e374d177eb43d5eb93acf6e6a8fde9..3a0fb5ad898ad42c3867f0
>> b95
>> >> 64f
>> >> > c4e066e50081 100644
>> >> > --- a/gcc/tree-vect-stmts.cc
>> >> > +++ b/gcc/tree-vect-stmts.cc
>> >> > @@ -6263,15 +6263,6 @@ vectorizable_operation (vec_info *vinfo,
>> >> >  	}
>> >> >        target_support_p = (optab_handler (optab, vec_mode)
>> >> >  			  != CODE_FOR_nothing);
>> >> > -      tree cst;
>> >> > -      if (!target_support_p
>> >> > -	  && op1
>> >> > -	  && (cst = uniform_integer_cst_p (op1)))
>> >> > -	target_support_p
>> >> > -	  = targetm.vectorize.can_special_div_by_const (code, vectype,
>> >> > -							wi::to_wide (cst),
>> >> > -							NULL, NULL_RTX,
>> >> > -							NULL_RTX);
>> >> >      }
>> >> >
>> >> >    bool using_emulated_vectors_p = vect_emulated_vector_p
>> >> > (vectype);

  reply	other threads:[~2023-02-27 12:11 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-09 17:16 Tamar Christina
2023-02-09 17:22 ` [PATCH 2/2]AArch64 Update div-bitmask to implement new optab instead of target hook [PR108583] Tamar Christina
2023-02-10 10:35   ` Tamar Christina
2023-02-10 14:10   ` Richard Sandiford
2023-02-10 10:34 ` [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583] Tamar Christina
2023-02-10 13:13 ` Richard Biener
2023-02-10 13:36 ` Richard Sandiford
2023-02-10 13:52   ` Richard Biener
2023-02-10 14:13   ` Tamar Christina
2023-02-10 14:30     ` Richard Sandiford
2023-02-10 14:54       ` Tamar Christina
2023-02-27 11:09       ` Tamar Christina
2023-02-27 12:11         ` Richard Sandiford [this message]
2023-02-27 12:14           ` Tamar Christina
2023-02-27 21:33             ` Richard Sandiford
2023-02-27 22:10               ` Tamar Christina
2023-02-28 11:08                 ` Richard Sandiford
2023-02-28 11:12                   ` Tamar Christina
2023-02-28 12:03                     ` Richard Sandiford
2023-03-01 11:30                       ` Richard Biener
2023-02-10 15:56     ` Richard Sandiford
2023-02-10 16:09       ` Tamar Christina
2023-02-10 16:25         ` Richard Sandiford
2023-02-10 16:33           ` Tamar Christina
2023-02-10 16:57             ` Richard Sandiford
2023-02-10 17:01               ` Richard Sandiford
2023-02-10 17:14               ` Tamar Christina
2023-02-10 18:12                 ` Richard Sandiford
2023-02-10 18:34                   ` Richard Biener
2023-02-10 20:58                     ` Andrew MacLeod
2023-02-13  9:54                       ` Tamar Christina
2023-02-15 12:51                         ` Tamar Christina
2023-02-15 16:05                           ` Andrew MacLeod
2023-02-15 17:13                             ` Tamar Christina
2023-02-15 17:50                               ` Andrew MacLeod
2023-02-15 18:42                                 ` Andrew MacLeod
2023-02-22 12:51                                   ` Tamar Christina
2023-02-22 16:41                                   ` Andrew MacLeod
2023-02-22 18:03                                     ` Tamar Christina
2023-02-22 18:33                                       ` Andrew MacLeod
2023-02-23  8:36                                         ` Tamar Christina
2023-02-23 16:39                                           ` Andrew MacLeod
2023-02-23 16:56                                             ` Tamar Christina
2023-03-01 16:57                                             ` Andrew Carlotti
2023-03-01 18:16                                               ` Tamar Christina
2023-02-22 13:06                                 ` Tamar Christina
2023-02-22 15:19                                   ` Andrew MacLeod

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mptfsarux3d.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=Tamar.Christina@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jlaw@ventanamicro.com \
    --cc=nd@arm.com \
    --cc=rguenther@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).