From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=F6qA=BA=arm.com=richard.sandiford@sourceware.org>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id 1994F3858C54
	for <gcc-patches@gcc.gnu.org>; Thu, 11 May 2023 17:36:27 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1994F3858C54
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 478F5113E;
	Thu, 11 May 2023 10:37:11 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 452F73F67D;
	Thu, 11 May 2023 10:36:26 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: "Roger Sayle" <roger@nextmovesoftware.com>
Mail-Followup-To: "Roger Sayle" <roger@nextmovesoftware.com>,"'GCC Patches'" <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Cc: "'GCC Patches'" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] Add RTX codes for BITREVERSE and COPYSIGN.
References: <00aa01d98039$4c7668e0$e5633aa0$@nextmovesoftware.com>
Date: Thu, 11 May 2023 18:36:25 +0100
In-Reply-To: <00aa01d98039$4c7668e0$e5633aa0$@nextmovesoftware.com> (Roger
	Sayle's message of "Sat, 6 May 2023 17:39:15 +0100")
Message-ID: <mpt1qjmg486.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-29.5 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

"Roger Sayle" <roger@nextmovesoftware.com> writes:
> An analysis of backend UNSPECs reveals that two of the most common UNSPECs
> across target backends are for copysign and bit reversal.  This patch
> adds RTX codes for these expressions to allow their representation to
> be standardized, and them to optimized by the middle-end RTL optimizers.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-32} with
> no new failures.  Ok for mainline?
>
>
> 2023-05-06  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * doc/rtl.texi (bitreverse, copysign): Document new RTX codes.
>         * rtl.def (BITREVERSE, COPYSIGN): Define new RTX codes.
>         * simplify-rtx.cc (simplify_unary_operation_1): Optimize
>         NOT (BITREVERSE x) as BITREVERSE (NOT x).
>         Optimize POPCOUNT (BITREVERSE x) as POPCOUNT x.
>         Optimize PARITY (BITREVERSE x) as PARITY x.
>         Optimize BITREVERSE (BITREVERSE x) as x.
>         (simplify_const_unary_operation) <case BITREVERSE>: Evaluate
>         BITREVERSE of a constant integer at compile-time.
>         (simplify_binary_operation_1) <case COPYSIGN>:  Optimize
>         COPY_SIGN (x, x) as x.  Optimize COPYSIGN (x, C) as ABS x
>         or NEG (ABS x) for constant C.  Optimize COPYSIGN (ABS x, y)
>         and COPYSIGN (NEG x, y) as COPYSIGN (x, y).  Optimize
>         COPYSIGN (x, ABS y) as ABS x.
>         Optimize COPYSIGN (COPYSIGN (x, y), z) as COPYSIGN (x, z).
>         Optimize COPYSIGN (x, COPYSIGN (y, z)) as COPYSIGN (x, z).
>         (simplify_const_binary_operation): Evaluate COPYSIGN of constant
>         arguments at compile-time.
>         * wide-int.cc (wide_int_storage::bitreverse): Provide a
>         wide_int implementation, based upon bswap implementation.
>         * wide-int.g (wide_int_storage::bitreverse): Prototype here.
>
>
> Thanks in advance,
> Roger
> --
>
> diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
> index 1de2494..76aeafb 100644
> --- a/gcc/doc/rtl.texi
> +++ b/gcc/doc/rtl.texi
> @@ -2742,6 +2742,17 @@ integer of mode @var{m}.  The mode of @var{x} must be @var{m} or
>  Represents the value @var{x} with the order of bytes reversed, carried out
>  in mode @var{m}, which must be a fixed-point machine mode.
>  The mode of @var{x} must be @var{m} or @code{VOIDmode}.
> +
> +@findex bitreverse
> +@item (bitreverse:@var{m} @var{x})
> +Represents the value @var{x} with the order of bits reversed, carried out
> +in mode @var{m}, which must be a fixed-point machine mode.
> +The mode of @var{x} must be @var{m} or @code{VOIDmode}.
> +
> +@findex copysign
> +@item (copysign:@var{m} @var{x} @var{y})
> +Represents the value @var{x} with the sign of @var{y}.
> +Both @var{x} and @var{y} must have floating point machine mode @var{m}.
>  @end table
>  
>  @node Comparisons
> diff --git a/gcc/rtl.def b/gcc/rtl.def
> index 6ddbce3..88e2b19 100644
> --- a/gcc/rtl.def
> +++ b/gcc/rtl.def
> @@ -664,6 +664,9 @@ DEF_RTL_EXPR(POPCOUNT, "popcount", "e", RTX_UNARY)
>  /* Population parity (number of 1 bits modulo 2).  */
>  DEF_RTL_EXPR(PARITY, "parity", "e", RTX_UNARY)
>  
> +/* Reverse bits.  */
> +DEF_RTL_EXPR(BITREVERSE, "bitreverse", "e", RTX_UNARY)
> +
>  /* Reference to a signed bit-field of specified size and position.
>     Operand 0 is the memory unit (usually SImode or QImode) which
>     contains the field's first bit.  Operand 1 is the width, in bits.
> @@ -753,6 +756,9 @@ DEF_RTL_EXPR(US_TRUNCATE, "us_truncate", "e", RTX_UNARY)
>  /* Floating point multiply/add combined instruction.  */
>  DEF_RTL_EXPR(FMA, "fma", "eee", RTX_TERNARY)
>  
> +/* Floating point copysign.  Operand 0 with the sign of operand 1.  */
> +DEF_RTL_EXPR(COPYSIGN, "copysign", "ee", RTX_BIN_ARITH)
> +
>  /* Information about the variable and its location.  */
>  DEF_RTL_EXPR(VAR_LOCATION, "var_location", "te", RTX_EXTRA)
>  
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index d4aeebc..26fa2b9 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -1040,10 +1040,10 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>  	}
>  
>        /* (not (bswap x)) -> (bswap (not x)).  */
> -      if (GET_CODE (op) == BSWAP)
> +      if (GET_CODE (op) == BSWAP || GET_CODE (op) == BITREVERSE)
>  	{
>  	  rtx x = simplify_gen_unary (NOT, mode, XEXP (op, 0), mode);
> -	  return simplify_gen_unary (BSWAP, mode, x, mode);
> +	  return simplify_gen_unary (GET_CODE (op), mode, x, mode);
>  	}
>        break;
>  
> @@ -1419,6 +1419,7 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>        switch (GET_CODE (op))
>  	{
>  	case BSWAP:
> +	case BITREVERSE:
>  	  /* (popcount (bswap <X>)) = (popcount <X>).  */
>  	  return simplify_gen_unary (POPCOUNT, mode, XEXP (op, 0),
>  				     GET_MODE (XEXP (op, 0)));
> @@ -1448,6 +1449,7 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>  	{
>  	case NOT:
>  	case BSWAP:
> +	case BITREVERSE:
>  	  return simplify_gen_unary (PARITY, mode, XEXP (op, 0),
>  				     GET_MODE (XEXP (op, 0)));
>  
> @@ -1481,6 +1483,12 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode,
>  	return XEXP (op, 0);
>        break;
>  
> +    case BITREVERSE:
> +      /* (bitreverse (bitreverse x)) -> x.  */
> +      if (GET_CODE (op) == BITREVERSE)
> +	return XEXP (op, 0);
> +      break;
> +
>      case FLOAT:
>        /* (float (sign_extend <X>)) = (float <X>).  */
>        if (GET_CODE (op) == SIGN_EXTEND)
> @@ -2114,6 +2122,10 @@ simplify_const_unary_operation (enum rtx_code code, machine_mode mode,
>  	  result = wide_int (op0).bswap ();
>  	  break;
>  
> +	case BITREVERSE:
> +	  result = wide_int (op0).bitreverse ();
> +	  break;
> +
>  	case TRUNCATE:
>  	case ZERO_EXTEND:
>  	  result = wide_int::from (op0, width, UNSIGNED);
> @@ -4355,6 +4367,31 @@ simplify_ashift:
>  	return op0;
>        return 0;
>  
> +    case COPYSIGN:
> +      if (rtx_equal_p (trueop0, trueop1) && ! side_effects_p (op0))
> +	return op0;
> +      if (CONST_DOUBLE_AS_FLOAT_P (trueop1))
> +	{
> +	  REAL_VALUE_TYPE f1;
> +	  real_convert (&f1, mode, CONST_DOUBLE_REAL_VALUE (trueop1));
> +	  rtx tmp = simplify_gen_unary (ABS, mode, op0, mode);
> +	  if (REAL_VALUE_NEGATIVE (f1))
> +	    tmp = simplify_gen_unary (NEG, mode, op0, mode);
> +	  return tmp;
> +	}
> +      if (GET_CODE (op0) == NEG || GET_CODE (op0) == ABS)
> +	return simplify_gen_binary (COPYSIGN, mode, XEXP (op0, 0), op1);
> +      if (GET_CODE (op1) == ABS
> +	  && ! side_effects_p (op1))
> +	return simplify_gen_unary (ABS, mode, op0, mode);
> +      if (GET_CODE (op0) == COPYSIGN
> +	  && ! side_effects_p (XEXP (op0, 1)))
> +	return simplify_gen_binary (COPYSIGN, mode, XEXP (op0, 0), op1);
> +      if (GET_CODE (op1) == COPYSIGN
> +	  && ! side_effects_p (XEXP (op1, 0)))
> +	return simplify_gen_binary (COPYSIGN, mode, op0, XEXP (op1, 1));
> +      return 0;
> +
>      case VEC_SERIES:
>        if (op1 == CONST0_RTX (GET_MODE_INNER (mode)))
>  	return gen_vec_duplicate (mode, op0);
> @@ -4995,6 +5032,14 @@ simplify_const_binary_operation (enum rtx_code code, machine_mode mode,
>  	   real_from_target (&r, tmp0, mode);
>  	   return const_double_from_real_value (r, mode);
>  	}
> +      else if (code == COPYSIGN)
> +	{
> +	  REAL_VALUE_TYPE f0, f1;
> +	  real_convert (&f0, mode, CONST_DOUBLE_REAL_VALUE (op0));
> +	  real_convert (&f1, mode, CONST_DOUBLE_REAL_VALUE (op1));
> +	  real_copysign (&f0, &f1);
> +	  return const_double_from_real_value (f0, mode);
> +	}
>        else
>  	{
>  	  REAL_VALUE_TYPE f0, f1, value, result;
> diff --git a/gcc/wide-int.cc b/gcc/wide-int.cc
> index 8c81757..280348d 100644
> --- a/gcc/wide-int.cc
> +++ b/gcc/wide-int.cc
> @@ -770,6 +770,80 @@ wide_int_storage::bswap () const
>    return result;
>  }
>  
> +/* bitreverse THIS.  */
> +wide_int
> +wide_int_storage::bitreverse () const
> +{
> +  static const unsigned char bitreverse_byte_table[256] = {
> +    0x00, 0x80, 0x40, 0xc0, 0x20, 0xa0, 0x60, 0xe0,
> +    0x10, 0x90, 0x50, 0xd0, 0x30, 0xb0, 0x70, 0xf0,
> +    0x08, 0x88, 0x48, 0xc8, 0x28, 0xa8, 0x68, 0xe8,
> +    0x18, 0x98, 0x58, 0xd8, 0x38, 0xb8, 0x78, 0xf8,
> +    0x04, 0x84, 0x44, 0xc4, 0x24, 0xa4, 0x64, 0xe4,
> +    0x14, 0x94, 0x54, 0xd4, 0x34, 0xb4, 0x74, 0xf4,
> +    0x0c, 0x8c, 0x4c, 0xcc, 0x2c, 0xac, 0x6c, 0xec,
> +    0x1c, 0x9c, 0x5c, 0xdc, 0x3c, 0xbc, 0x7c, 0xfc,
> +    0x02, 0x82, 0x42, 0xc2, 0x22, 0xa2, 0x62, 0xe2,
> +    0x12, 0x92, 0x52, 0xd2, 0x32, 0xb2, 0x72, 0xf2,
> +    0x0a, 0x8a, 0x4a, 0xca, 0x2a, 0xaa, 0x6a, 0xea,
> +    0x1a, 0x9a, 0x5a, 0xda, 0x3a, 0xba, 0x7a, 0xfa,
> +    0x06, 0x86, 0x46, 0xc6, 0x26, 0xa6, 0x66, 0xe6,
> +    0x16, 0x96, 0x56, 0xd6, 0x36, 0xb6, 0x76, 0xf6,
> +    0x0e, 0x8e, 0x4e, 0xce, 0x2e, 0xae, 0x6e, 0xee,
> +    0x1e, 0x9e, 0x5e, 0xde, 0x3e, 0xbe, 0x7e, 0xfe,
> +    0x01, 0x81, 0x41, 0xc1, 0x21, 0xa1, 0x61, 0xe1,
> +    0x11, 0x91, 0x51, 0xd1, 0x31, 0xb1, 0x71, 0xf1,
> +    0x09, 0x89, 0x49, 0xc9, 0x29, 0xa9, 0x69, 0xe9,
> +    0x19, 0x99, 0x59, 0xd9, 0x39, 0xb9, 0x79, 0xf9,
> +    0x05, 0x85, 0x45, 0xc5, 0x25, 0xa5, 0x65, 0xe5,
> +    0x15, 0x95, 0x55, 0xd5, 0x35, 0xb5, 0x75, 0xf5,
> +    0x0d, 0x8d, 0x4d, 0xcd, 0x2d, 0xad, 0x6d, 0xed,
> +    0x1d, 0x9d, 0x5d, 0xdd, 0x3d, 0xbd, 0x7d, 0xfd,
> +    0x03, 0x83, 0x43, 0xc3, 0x23, 0xa3, 0x63, 0xe3,
> +    0x13, 0x93, 0x53, 0xd3, 0x33, 0xb3, 0x73, 0xf3,
> +    0x0b, 0x8b, 0x4b, 0xcb, 0x2b, 0xab, 0x6b, 0xeb,
> +    0x1b, 0x9b, 0x5b, 0xdb, 0x3b, 0xbb, 0x7b, 0xfb,
> +    0x07, 0x87, 0x47, 0xc7, 0x27, 0xa7, 0x67, 0xe7,
> +    0x17, 0x97, 0x57, 0xd7, 0x37, 0xb7, 0x77, 0xf7,
> +    0x0f, 0x8f, 0x4f, 0xcf, 0x2f, 0xaf, 0x6f, 0xef,
> +    0x1f, 0x9f, 0x5f, 0xdf, 0x3f, 0xbf, 0x7f, 0xff
> +  };
> +
> +  wide_int result = wide_int::create (precision);
> +  unsigned int i, s;
> +  unsigned int len = BLOCKS_NEEDED (precision);
> +  unsigned int xlen = get_len ();
> +  const HOST_WIDE_INT *xval = get_val ();
> +  HOST_WIDE_INT *val = result.write_val ();
> +
> +  /* This is not a well defined operation if the precision is not a
> +     multiple of 8.  */
> +  gcc_assert ((precision & 0x7) == 0);

I don't see why though.  If we used a naive loop instead of the LUT,
I think we could handle all precisions naturally.  And I wouldn't
have expected this operation to be common enough that the optimisation
is important.

> +
> +  for (i = 0; i < len; i++)
> +    val[i] = 0;
> +
> +  /* Only swap the bits that are not the padding.  */
> +  for (s = 0; s < precision; s += 8)
> +    {
> +      unsigned int d = precision - s - 8;
> +      unsigned HOST_WIDE_INT byte;
> +
> +      unsigned int block = s / HOST_BITS_PER_WIDE_INT;
> +      unsigned int offset = s & (HOST_BITS_PER_WIDE_INT - 1);
> +
> +      byte = (safe_uhwi (xval, xlen, block) >> offset) & 0xff;
> +
> +      block = d / HOST_BITS_PER_WIDE_INT;
> +      offset = d & (HOST_BITS_PER_WIDE_INT - 1);
> +
> +      val[block] |= bitreverse_byte_table[byte << offset];
> +    }
> +
> +  result.set_len (canonize (val, len, precision));
> +  return result;
> +}
> +
>  /* Fill VAL with a mask where the lower WIDTH bits are ones and the bits
>     above that up to PREC are zeros.  The result is inverted if NEGATE
>     is true.  Return the number of blocks in VAL.  */
> diff --git a/gcc/wide-int.h b/gcc/wide-int.h
> index 6be343c..e78d83c 100644
> --- a/gcc/wide-int.h
> +++ b/gcc/wide-int.h
> @@ -1089,6 +1089,7 @@ public:
>  
>    /* FIXME: target-dependent, so should disappear.  */
>    wide_int bswap () const;
> +  wide_int bitreverse () const;

Would be worth adding a blank line here, since the FIXME doesn't apply
to the new routine.

But I think it would be better to define:

   wide_int wi::bitreverse (const wide_int_ref &);

in a similar way as other operations like wi::insert.  I can't remember
why bswap is different, but generally, the only member functions of
the wide_int_storage classes should be accessors.

Otherwise it looks good to me FWIW, but I'm a bit nervous about having
so much new code without any test coverage.

Thanks,
Richard