From: Richard Sandiford
To: Tamar Christina
Cc: gcc-patches@gcc.gnu.org, nd@arm.com, Richard.Earnshaw@arm.com,
    Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com
Subject: Re: [PATCH 2/2]AArch64 Update div-bitmask to implement new optab instead of target hook [PR108583]
Date: Fri, 10 Feb 2023 14:10:16 +0000
In-Reply-To: (Tamar Christina's message of "Thu, 9 Feb 2023 17:22:09 +0000")

I was asking in the 1/2 review whether we need the optab, but that
decision doesn't affect the other patterns, so:

Tamar Christina writes:
> Hi All,
>
> This replaces the custom division hook with just an implementation through
> add_highpart.  For NEON we implement the add highpart (addition + extraction
> of the upper highpart of the register in the same precision) as ADD + LSR.
>
> This representation allows us to easily optimize the sequence using existing
> sequences.
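(As a rough scalar sketch of the operation being described -- the helper
name and the choice of 16-bit elements are illustrative, not taken from
the patch:)

  #include <stdint.h>

  /* Scalar model of "add highpart" on a 16-bit element: add the two
     inputs modulo 2^16, then keep the upper half of the sum while
     staying in 16-bit precision -- i.e. an ADD followed by a logical
     shift right by half the element size.  */
  static inline uint16_t
  add_highpart_u16 (uint16_t a, uint16_t b)
  {
    return (uint16_t) (a + b) >> 8;
  }
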
> This gets us a pretty decent sequence using SRA:
>
>         umull   v1.8h, v0.8b, v3.8b
>         umull2  v0.8h, v0.16b, v3.16b
>         add     v5.8h, v1.8h, v2.8h
>         add     v4.8h, v0.8h, v2.8h
>         usra    v1.8h, v5.8h, 8
>         usra    v0.8h, v4.8h, 8
>         uzp2    v1.16b, v1.16b, v0.16b
>
> To get the most optimal sequence, however, we match (a + ((b + c) >> n)) where n
> is half the precision of the mode of the operation into addhn + uaddw, which is
> a generally good optimization on its own and gets us back to:
>
> .L4:
>         ldr     q0, [x3]
>         umull   v1.8h, v0.8b, v5.8b
>         umull2  v0.8h, v0.16b, v5.16b
>         addhn   v3.8b, v1.8h, v4.8h
>         addhn   v2.8b, v0.8h, v4.8h
>         uaddw   v1.8h, v1.8h, v3.8b
>         uaddw   v0.8h, v0.8h, v2.8b
>         uzp2    v1.16b, v1.16b, v0.16b
>         str     q1, [x3], 16
>         cmp     x3, x4
>         bne     .L4
>
> For SVE2 we optimize the initial sequence to the same ADD + LSR, which gets us:
>
> .L3:
>         ld1b    z0.h, p0/z, [x0, x3]
>         mul     z0.h, p1/m, z0.h, z2.h
>         add     z1.h, z0.h, z3.h
>         usra    z0.h, z1.h, #8
>         lsr     z0.h, z0.h, #8
>         st1b    z0.h, p0, [x0, x3]
>         inch    x3
>         whilelo p0.h, w3, w2
>         b.any   .L3
> .L1:
>         ret
>
> and to get the most optimal sequence I match (a + b) >> n (same constraint on n)
> to addhnb, which gets us to:
>
> .L3:
>         ld1b    z0.h, p0/z, [x0, x3]
>         mul     z0.h, p1/m, z0.h, z2.h
>         addhnb  z1.b, z0.h, z3.h
>         addhnb  z0.b, z0.h, z1.h
>         st1b    z0.h, p0, [x0, x3]
>         inch    x3
>         whilelo p0.h, w3, w2
>         b.any   .L3
>
> There are multiple RTL representations possible for these optimizations.  I did
> not represent them using a zero_extend because we seem very inconsistent in this
> in the backend.  Since they are unspecs we won't match them from vector ops
> anyway.  I figured maintainers would prefer this, but my maintainer ouija board
> is still out for repairs :)

I agree this is the best approach as things stand.  Personally, I'd like
to have some way for the target to define simplification rules based on
unspecs, so that unspecs act more like target-specific rtl codes.  But I
know others disagree, and it wouldn't really apply to this case anyway.

> There are no new tests as new correctness tests were added to the mid-end and
> the existing codegen tests for this already exist.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	PR target/108583
> 	* config/aarch64/aarch64-simd.md (@aarch64_bitmask_udiv3): Remove.
> 	(add3_highpart, *bitmask_shift_plus): New.
> 	* config/aarch64/aarch64-sve2.md (add3_highpart,
> 	*bitmask_shift_plus): New.
> 	(@aarch64_bitmask_udiv3): Remove.
> 	* config/aarch64/aarch64.cc
> 	(aarch64_vectorize_can_special_div_by_constant): Removed.
> 	* config/aarch64/iterators.md (UNSPEC_SADD_HIGHPART,
> 	UNSPEC_UADD_HIGHPART, ADD_HIGHPART): New.
>
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 7f212bf37cd2c120dceb7efa733c9fa76226f029..26871a56d1fdb134f0ad9d828ce68a8df0272c53 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -4867,62 +4867,48 @@ (define_expand "aarch64_hn2"
>  }
> )
>
> -;; div optimizations using narrowings
> -;; we can do the division e.g. shorts by 255 faster by calculating it as
> -;; (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
> -;; double the precision of x.
> -;;
> -;; If we imagine a short as being composed of two blocks of bytes then
> -;; adding 257 or 0b0000_0001_0000_0001 to the number is equivalent to
> -;; adding 1 to each sub component:
> -;;
> -;;      short value of 16-bits
> -;; ┌──────────────┬────────────────┐
> -;; │              │                │
> -;; └──────────────┴────────────────┘
> -;;   8-bit part1 ▲  8-bit part2   ▲
> -;;               │                │
> -;;               │                │
> -;;              +1               +1
> -;;
> -;; after the first addition, we have to shift right by 8, and narrow the
> -;; results back to a byte.  Remember that the addition must be done in
> -;; double the precision of the input.  Since 8 is half the size of a short
> -;; we can use a narrowing halfing instruction in AArch64, addhn which also
> -;; does the addition in a wider precision and narrows back to a byte.  The
> -;; shift itself is implicit in the operation as it writes back only the top
> -;; half of the result. i.e. bits 2*esize-1:esize.
> -;;
> -;; Since we have narrowed the result of the first part back to a byte, for
> -;; the second addition we can use a widening addition, uaddw.
> -;;
> -;; For the final shift, since it's unsigned arithmetic we emit an ushr by 8.
> -;;
> -;; The shift is later optimized by combine to a uzp2 with movi #0.
> -(define_expand "@aarch64_bitmask_udiv3"
> -  [(match_operand:VQN 0 "register_operand")
> -   (match_operand:VQN 1 "register_operand")
> -   (match_operand:VQN 2 "immediate_operand")]
> +;; Implement add_highpart as ADD + RSHIFT, we have various optimizations for
> +;; narrowing represented as shifts and so this representation will allow us to
> +;; further optimize this should the result require narrowing.  The alternative
> +;; representation of ADDHN + UXTL is less efficient and harder to further
> +;; optimize.
> +(define_expand "add3_highpart"
> +  [(set (match_operand:VQN 0 "register_operand")
> +	(unspec:VQN [(match_operand:VQN 1 "register_operand")
> +		     (match_operand:VQN 2 "register_operand")]
> +		    ADD_HIGHPART))]
> +  "TARGET_SIMD"
> +{
> +  rtx result = gen_reg_rtx (mode);
> +  int shift_amount = GET_MODE_UNIT_BITSIZE (mode) / 2;
> +  rtx shift_vector = aarch64_simd_gen_const_vector_dup (mode,
> +							 shift_amount);
> +  emit_insn (gen_add3 (result, operands[1], operands[2]));
> +  emit_insn (gen_aarch64_simd_lshr (operands[0], result, shift_vector));
> +  DONE;
> +})
> +
> +;; Optimize ((a + b) >> n) + c where n is half the bitsize of the vector
> +(define_insn_and_split "*bitmask_shift_plus"
> +  [(set (match_operand:VQN 0 "register_operand" "=w")
> +	(plus:VQN
> +	  (lshiftrt:VQN
> +	    (plus:VQN (match_operand:VQN 1 "register_operand" "w")
> +		      (match_operand:VQN 2 "register_operand" "w"))
> +	    (match_operand:VQN 3 "aarch64_simd_shift_imm_vec_exact_top" "Dr"))
> +	  (match_operand:VQN 4 "register_operand" "w")))]
>    "TARGET_SIMD"
> +  "#"
> +  "&& !reload_completed"

This is an ICE trap, since "#" forces a split while "!reload_completed"
prevents one after reload.
I think the theoretically correct way would be to use operand 0 as a
temporary when reload_completed, which in turn means making it an
earlyclobber.

However, IIUC, this pattern would only be formed from combining three
distinct patterns.  Is that right?  If so, we should be able to handle
it as a plain define_split, with no define_insn.  That should make
things simpler, so would be worth trying before the changes I mentioned
above.

> +  [(const_int 0)]
> {
> -  unsigned HOST_WIDE_INT size
> -    = (1ULL << GET_MODE_UNIT_BITSIZE (mode)) - 1;
> -  rtx elt = unwrap_const_vec_duplicate (operands[2]);
> -  if (!CONST_INT_P (elt) || UINTVAL (elt) != size)
> -    FAIL;
> -
> -  rtx addend = gen_reg_rtx (mode);
> -  rtx val = aarch64_simd_gen_const_vector_dup (mode, 1);
> -  emit_move_insn (addend, lowpart_subreg (mode, val, mode));
> -  rtx tmp1 = gen_reg_rtx (mode);
> -  rtx tmp2 = gen_reg_rtx (mode);
> -  emit_insn (gen_aarch64_addhn (tmp1, operands[1], addend));
> -  unsigned bitsize = GET_MODE_UNIT_BITSIZE (mode);
> -  rtx shift_vector = aarch64_simd_gen_const_vector_dup (mode, bitsize);
> -  emit_insn (gen_aarch64_uaddw (tmp2, operands[1], tmp1));
> -  emit_insn (gen_aarch64_simd_lshr (operands[0], tmp2, shift_vector));
> +  rtx tmp = gen_reg_rtx (mode);
> +  emit_insn (gen_aarch64_addhn (tmp, operands[1], operands[2]));
> +  emit_insn (gen_aarch64_uaddw (operands[0], operands[4], tmp));
>    DONE;
> -})
> +}
> +  [(set_attr "type" "neon_add_halve")]

I think we should leave this out, since it's a multi-instruction pattern.

> +)
>
> ;; pmul.
>
> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
> index 40c0728a7e6f00c395c360ce7625bc2e4a018809..ad01c1ddf9257cec951ed0c16558a3c4d856813b 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -2317,39 +2317,51 @@ (define_insn "@aarch64_sve_"
> ;; ---- [INT] Misc optab implementations
> ;; -------------------------------------------------------------------------
> ;; Includes:
> -;; - aarch64_bitmask_udiv
> +;; - add_highpart
> ;; -------------------------------------------------------------------------
>
> -;; div optimizations using narrowings
> -;; we can do the division e.g. shorts by 255 faster by calculating it as
> -;; (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
> -;; double the precision of x.
> -;;
> -;; See aarch64-simd.md for bigger explanation.
> -(define_expand "@aarch64_bitmask_udiv3"
> -  [(match_operand:SVE_FULL_HSDI 0 "register_operand")
> -   (match_operand:SVE_FULL_HSDI 1 "register_operand")
> -   (match_operand:SVE_FULL_HSDI 2 "immediate_operand")]
> +;; Implement add_highpart as ADD + RSHIFT, we have various optimizations for
> +;; narrowing represented as shifts and so this representation will allow us to
> +;; further optimize this should the result require narrowing.  The alternative
> +;; representation of ADDHN + UXTL is less efficient and harder to further
> +;; optimize.
> +(define_expand "add3_highpart"
> +  [(set (match_operand:SVE_FULL_HSDI 0 "register_operand")
> +	(unspec:SVE_FULL_HSDI
> +	  [(match_operand:SVE_FULL_HSDI 1 "register_operand")
> +	   (match_operand:SVE_FULL_HSDI 2 "register_operand")]
> +	  ADD_HIGHPART))]
>    "TARGET_SVE2"
> {
> -  unsigned HOST_WIDE_INT size
> -    = (1ULL << GET_MODE_UNIT_BITSIZE (mode)) - 1;
> -  rtx elt = unwrap_const_vec_duplicate (operands[2]);
> -  if (!CONST_INT_P (elt) || UINTVAL (elt) != size)
> -    FAIL;
> +  rtx result = gen_reg_rtx (mode);
> +  int shift_amount = GET_MODE_UNIT_BITSIZE (mode) / 2;
> +  rtx shift_vector = aarch64_simd_gen_const_vector_dup (mode,
> +							 shift_amount);
> +  emit_insn (gen_add3 (result, operands[1], operands[2]));
> +  emit_insn (gen_vlshr3 (operands[0], result, shift_vector));
> +  DONE;
> +})
>
> -  rtx addend = gen_reg_rtx (mode);
> +;; Optimize ((a + b) >> n) where n is half the bitsize of the vector
> +(define_insn_and_split "*bitmask_shift_plus"
> +  [(set (match_operand:SVE_FULL_HSDI 0 "register_operand" "=w")
> +	(unspec:SVE_FULL_HSDI [
> +	  (match_operand: 1 "register_operand" "Upl")

Looks like this can be:

  (match_operand: 1)

since the predicate isn't used.

> +	  (lshiftrt:SVE_FULL_HSDI
> +	    (plus:SVE_FULL_HSDI
> +	      (match_operand:SVE_FULL_HSDI 2 "register_operand" "w")
> +	      (match_operand:SVE_FULL_HSDI 3 "register_operand" "w"))
> +	    (match_operand:SVE_FULL_HSDI 4 "aarch64_simd_shift_imm_vec_exact_top" "Dr"))
> +	] UNSPEC_PRED_X))]

Very minor nit, but the formatting used in the file follows the style in
the earlier pattern above, with [ immediately before ( and ] immediately
after ).  Not that that's inherently better or anything, it's just a
consistency thing.

> +  "TARGET_SVE2"
> +  "#"
> +  "&& !reload_completed"
> +  [(const_int 0)]
> +{
>    rtx tmp1 = gen_reg_rtx (mode);
> -  rtx tmp2 = gen_reg_rtx (mode);
> -  rtx val = aarch64_simd_gen_const_vector_dup (mode, 1);
> -  emit_move_insn (addend, lowpart_subreg (mode, val, mode));
> -  emit_insn (gen_aarch64_sve (UNSPEC_ADDHNB, mode, tmp1, operands[1],
> -			      addend));
> -  emit_insn (gen_aarch64_sve (UNSPEC_ADDHNB, mode, tmp2, operands[1],
> -			      lowpart_subreg (mode, tmp1,
> -					      mode)));
> +  emit_insn (gen_aarch64_sve (UNSPEC_ADDHNB, mode, tmp1, operands[2], operands[3]));
>    emit_move_insn (operands[0],
> -		  lowpart_subreg (mode, tmp2, mode));
> +		  lowpart_subreg (mode, tmp1, mode));
>    DONE;
> })

Since this is a single instruction, I'm not sure it's worth splitting it.
Perhaps there would be CSE opportunities from having a single form, but
it seems unlikely.  And doing the unsplit form is nice and safe.

But yeah, generating the patterns this way seems like a good approach.
It might even help optimise open-coded versions of the same trick.

Thanks,
Richard

>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index e6f47cbbb0d04a6f33b9a741ebb614cabd0204b9..8a04feb29e6bfb423a09dde2cd64853e69d0e1ba 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -24363,46 +24363,6 @@ aarch64_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
>
>    return ret;
> }
> -
> -/* Implement TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST.  */
> -
> -bool
> -aarch64_vectorize_can_special_div_by_constant (enum tree_code code,
> -					       tree vectype, wide_int cst,
> -					       rtx *output, rtx in0, rtx in1)
> -{
> -  if (code != TRUNC_DIV_EXPR
> -      || !TYPE_UNSIGNED (vectype))
> -    return false;
> -
> -  machine_mode mode = TYPE_MODE (vectype);
> -  unsigned int flags = aarch64_classify_vector_mode (mode);
> -  if ((flags & VEC_ANY_SVE) && !TARGET_SVE2)
> -    return false;
> -
> -  int pow = wi::exact_log2 (cst + 1);
> -  auto insn_code = maybe_code_for_aarch64_bitmask_udiv3 (TYPE_MODE (vectype));
> -  /* SVE actually has a div operator, we may have gotten here through
> -     that route.  */
> -  if (pow != (int) (element_precision (vectype) / 2)
> -      || insn_code == CODE_FOR_nothing)
> -    return false;
> -
> -  /* We can use the optimized pattern.  */
> -  if (in0 == NULL_RTX && in1 == NULL_RTX)
> -    return true;
> -
> -  gcc_assert (output);
> -
> -  expand_operand ops[3];
> -  create_output_operand (&ops[0], *output, mode);
> -  create_input_operand (&ops[1], in0, mode);
> -  create_fixed_operand (&ops[2], in1);
> -  expand_insn (insn_code, 3, ops);
> -  *output = ops[0].value;
> -  return true;
> -}
> -
> /* Generate a byte permute mask for a register of mode MODE,
>    which has NUNITS units.  */
>
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 6cbc97cc82c06a68259bdf4dec8a0eab230081e5..ae627ae56cbd1e8b882e596dba974e74ef396e0e 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -750,6 +750,8 @@ (define_c_enum "unspec"
>     UNSPEC_REVH		; Used in aarch64-sve.md.
>     UNSPEC_REVW		; Used in aarch64-sve.md.
>     UNSPEC_REVBHW	; Used in aarch64-sve.md.
> +    UNSPEC_SADD_HIGHPART ; Used in aarch64-sve.md.
> +    UNSPEC_UADD_HIGHPART ; Used in aarch64-sve.md.
>     UNSPEC_SMUL_HIGHPART ; Used in aarch64-sve.md.
>     UNSPEC_UMUL_HIGHPART ; Used in aarch64-sve.md.
>     UNSPEC_FMLA		; Used in aarch64-sve.md.
> @@ -2704,6 +2706,7 @@ (define_int_iterator UNPACK [UNSPEC_UNPACKSHI UNSPEC_UNPACKUHI
>
> (define_int_iterator UNPACK_UNSIGNED [UNSPEC_UNPACKULO UNSPEC_UNPACKUHI])
>
> +(define_int_iterator ADD_HIGHPART [UNSPEC_SADD_HIGHPART UNSPEC_UADD_HIGHPART])
> (define_int_iterator MUL_HIGHPART [UNSPEC_SMUL_HIGHPART UNSPEC_UMUL_HIGHPART])
>
> (define_int_iterator CLAST [UNSPEC_CLASTA UNSPEC_CLASTB])
> @@ -3342,6 +3345,8 @@ (define_int_attr su [(UNSPEC_SADDV "s")
> 		     (UNSPEC_UNPACKUHI "u")
> 		     (UNSPEC_UNPACKSLO "s")
> 		     (UNSPEC_UNPACKULO "u")
> +		     (UNSPEC_SADD_HIGHPART "s")
> +		     (UNSPEC_UADD_HIGHPART "u")
> 		     (UNSPEC_SMUL_HIGHPART "s")
> 		     (UNSPEC_UMUL_HIGHPART "u")
> 		     (UNSPEC_COND_FCVTZS "s")
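
(For reference, the identity behind the comments removed above -- that for
a 16-bit x, computing (x + ((x + 257) >> 8)) >> 8 in double the precision
gives x / 255 -- can be checked exhaustively with a small standalone
program; a sketch, not part of the patch:)

  #include <stdint.h>
  #include <stdio.h>

  /* Brute-force check: for every 16-bit value x, the add/shift sequence
     evaluated in 32-bit arithmetic must agree with the exact division
     x / 255.  */
  int
  main (void)
  {
    for (uint32_t x = 0; x <= UINT16_MAX; x++)
      {
        uint32_t fast = (x + ((x + 257) >> 8)) >> 8;
        if (fast != x / 255)
          {
            printf ("mismatch at %u: %u != %u\n", x, fast, x / 255);
            return 1;
          }
      }
    printf ("identity holds for all 16-bit inputs\n");
    return 0;
  }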