From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 1994F3858C54 for ; Thu, 11 May 2023 17:36:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1994F3858C54 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 478F5113E; Thu, 11 May 2023 10:37:11 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 452F73F67D; Thu, 11 May 2023 10:36:26 -0700 (PDT) From: Richard Sandiford To: "Roger Sayle" Mail-Followup-To: "Roger Sayle" ,"'GCC Patches'" , richard.sandiford@arm.com Cc: "'GCC Patches'" Subject: Re: [PATCH] Add RTX codes for BITREVERSE and COPYSIGN. References: <00aa01d98039$4c7668e0$e5633aa0$@nextmovesoftware.com> Date: Thu, 11 May 2023 18:36:25 +0100 In-Reply-To: <00aa01d98039$4c7668e0$e5633aa0$@nextmovesoftware.com> (Roger Sayle's message of "Sat, 6 May 2023 17:39:15 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-29.5 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: "Roger Sayle" writes: > An analysis of backend UNSPECs reveals that two of the most common UNSPECs > across target backends are for copysign and bit reversal. This patch > adds RTX codes for these expressions to allow their representation to > be standardized, and them to optimized by the middle-end RTL optimizers. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-32} with > no new failures. Ok for mainline? > > > 2023-05-06 Roger Sayle > > gcc/ChangeLog > * doc/rtl.texi (bitreverse, copysign): Document new RTX codes. > * rtl.def (BITREVERSE, COPYSIGN): Define new RTX codes. > * simplify-rtx.cc (simplify_unary_operation_1): Optimize > NOT (BITREVERSE x) as BITREVERSE (NOT x). > Optimize POPCOUNT (BITREVERSE x) as POPCOUNT x. > Optimize PARITY (BITREVERSE x) as PARITY x. > Optimize BITREVERSE (BITREVERSE x) as x. > (simplify_const_unary_operation) : Evaluate > BITREVERSE of a constant integer at compile-time. > (simplify_binary_operation_1) : Optimize > COPY_SIGN (x, x) as x. Optimize COPYSIGN (x, C) as ABS x > or NEG (ABS x) for constant C. Optimize COPYSIGN (ABS x, y) > and COPYSIGN (NEG x, y) as COPYSIGN (x, y). Optimize > COPYSIGN (x, ABS y) as ABS x. > Optimize COPYSIGN (COPYSIGN (x, y), z) as COPYSIGN (x, z). > Optimize COPYSIGN (x, COPYSIGN (y, z)) as COPYSIGN (x, z). > (simplify_const_binary_operation): Evaluate COPYSIGN of constant > arguments at compile-time. > * wide-int.cc (wide_int_storage::bitreverse): Provide a > wide_int implementation, based upon bswap implementation. > * wide-int.g (wide_int_storage::bitreverse): Prototype here. > > > Thanks in advance, > Roger > -- > > diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi > index 1de2494..76aeafb 100644 > --- a/gcc/doc/rtl.texi > +++ b/gcc/doc/rtl.texi > @@ -2742,6 +2742,17 @@ integer of mode @var{m}. The mode of @var{x} must be @var{m} or > Represents the value @var{x} with the order of bytes reversed, carried out > in mode @var{m}, which must be a fixed-point machine mode. > The mode of @var{x} must be @var{m} or @code{VOIDmode}. > + > +@findex bitreverse > +@item (bitreverse:@var{m} @var{x}) > +Represents the value @var{x} with the order of bits reversed, carried out > +in mode @var{m}, which must be a fixed-point machine mode. > +The mode of @var{x} must be @var{m} or @code{VOIDmode}. > + > +@findex copysign > +@item (copysign:@var{m} @var{x} @var{y}) > +Represents the value @var{x} with the sign of @var{y}. > +Both @var{x} and @var{y} must have floating point machine mode @var{m}. > @end table > > @node Comparisons > diff --git a/gcc/rtl.def b/gcc/rtl.def > index 6ddbce3..88e2b19 100644 > --- a/gcc/rtl.def > +++ b/gcc/rtl.def > @@ -664,6 +664,9 @@ DEF_RTL_EXPR(POPCOUNT, "popcount", "e", RTX_UNARY) > /* Population parity (number of 1 bits modulo 2). */ > DEF_RTL_EXPR(PARITY, "parity", "e", RTX_UNARY) > > +/* Reverse bits. */ > +DEF_RTL_EXPR(BITREVERSE, "bitreverse", "e", RTX_UNARY) > + > /* Reference to a signed bit-field of specified size and position. > Operand 0 is the memory unit (usually SImode or QImode) which > contains the field's first bit. Operand 1 is the width, in bits. > @@ -753,6 +756,9 @@ DEF_RTL_EXPR(US_TRUNCATE, "us_truncate", "e", RTX_UNARY) > /* Floating point multiply/add combined instruction. */ > DEF_RTL_EXPR(FMA, "fma", "eee", RTX_TERNARY) > > +/* Floating point copysign. Operand 0 with the sign of operand 1. */ > +DEF_RTL_EXPR(COPYSIGN, "copysign", "ee", RTX_BIN_ARITH) > + > /* Information about the variable and its location. */ > DEF_RTL_EXPR(VAR_LOCATION, "var_location", "te", RTX_EXTRA) > > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc > index d4aeebc..26fa2b9 100644 > --- a/gcc/simplify-rtx.cc > +++ b/gcc/simplify-rtx.cc > @@ -1040,10 +1040,10 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, > } > > /* (not (bswap x)) -> (bswap (not x)). */ > - if (GET_CODE (op) == BSWAP) > + if (GET_CODE (op) == BSWAP || GET_CODE (op) == BITREVERSE) > { > rtx x = simplify_gen_unary (NOT, mode, XEXP (op, 0), mode); > - return simplify_gen_unary (BSWAP, mode, x, mode); > + return simplify_gen_unary (GET_CODE (op), mode, x, mode); > } > break; > > @@ -1419,6 +1419,7 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, > switch (GET_CODE (op)) > { > case BSWAP: > + case BITREVERSE: > /* (popcount (bswap )) = (popcount ). */ > return simplify_gen_unary (POPCOUNT, mode, XEXP (op, 0), > GET_MODE (XEXP (op, 0))); > @@ -1448,6 +1449,7 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, > { > case NOT: > case BSWAP: > + case BITREVERSE: > return simplify_gen_unary (PARITY, mode, XEXP (op, 0), > GET_MODE (XEXP (op, 0))); > > @@ -1481,6 +1483,12 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, > return XEXP (op, 0); > break; > > + case BITREVERSE: > + /* (bitreverse (bitreverse x)) -> x. */ > + if (GET_CODE (op) == BITREVERSE) > + return XEXP (op, 0); > + break; > + > case FLOAT: > /* (float (sign_extend )) = (float ). */ > if (GET_CODE (op) == SIGN_EXTEND) > @@ -2114,6 +2122,10 @@ simplify_const_unary_operation (enum rtx_code code, machine_mode mode, > result = wide_int (op0).bswap (); > break; > > + case BITREVERSE: > + result = wide_int (op0).bitreverse (); > + break; > + > case TRUNCATE: > case ZERO_EXTEND: > result = wide_int::from (op0, width, UNSIGNED); > @@ -4355,6 +4367,31 @@ simplify_ashift: > return op0; > return 0; > > + case COPYSIGN: > + if (rtx_equal_p (trueop0, trueop1) && ! side_effects_p (op0)) > + return op0; > + if (CONST_DOUBLE_AS_FLOAT_P (trueop1)) > + { > + REAL_VALUE_TYPE f1; > + real_convert (&f1, mode, CONST_DOUBLE_REAL_VALUE (trueop1)); > + rtx tmp = simplify_gen_unary (ABS, mode, op0, mode); > + if (REAL_VALUE_NEGATIVE (f1)) > + tmp = simplify_gen_unary (NEG, mode, op0, mode); > + return tmp; > + } > + if (GET_CODE (op0) == NEG || GET_CODE (op0) == ABS) > + return simplify_gen_binary (COPYSIGN, mode, XEXP (op0, 0), op1); > + if (GET_CODE (op1) == ABS > + && ! side_effects_p (op1)) > + return simplify_gen_unary (ABS, mode, op0, mode); > + if (GET_CODE (op0) == COPYSIGN > + && ! side_effects_p (XEXP (op0, 1))) > + return simplify_gen_binary (COPYSIGN, mode, XEXP (op0, 0), op1); > + if (GET_CODE (op1) == COPYSIGN > + && ! side_effects_p (XEXP (op1, 0))) > + return simplify_gen_binary (COPYSIGN, mode, op0, XEXP (op1, 1)); > + return 0; > + > case VEC_SERIES: > if (op1 == CONST0_RTX (GET_MODE_INNER (mode))) > return gen_vec_duplicate (mode, op0); > @@ -4995,6 +5032,14 @@ simplify_const_binary_operation (enum rtx_code code, machine_mode mode, > real_from_target (&r, tmp0, mode); > return const_double_from_real_value (r, mode); > } > + else if (code == COPYSIGN) > + { > + REAL_VALUE_TYPE f0, f1; > + real_convert (&f0, mode, CONST_DOUBLE_REAL_VALUE (op0)); > + real_convert (&f1, mode, CONST_DOUBLE_REAL_VALUE (op1)); > + real_copysign (&f0, &f1); > + return const_double_from_real_value (f0, mode); > + } > else > { > REAL_VALUE_TYPE f0, f1, value, result; > diff --git a/gcc/wide-int.cc b/gcc/wide-int.cc > index 8c81757..280348d 100644 > --- a/gcc/wide-int.cc > +++ b/gcc/wide-int.cc > @@ -770,6 +770,80 @@ wide_int_storage::bswap () const > return result; > } > > +/* bitreverse THIS. */ > +wide_int > +wide_int_storage::bitreverse () const > +{ > + static const unsigned char bitreverse_byte_table[256] = { > + 0x00, 0x80, 0x40, 0xc0, 0x20, 0xa0, 0x60, 0xe0, > + 0x10, 0x90, 0x50, 0xd0, 0x30, 0xb0, 0x70, 0xf0, > + 0x08, 0x88, 0x48, 0xc8, 0x28, 0xa8, 0x68, 0xe8, > + 0x18, 0x98, 0x58, 0xd8, 0x38, 0xb8, 0x78, 0xf8, > + 0x04, 0x84, 0x44, 0xc4, 0x24, 0xa4, 0x64, 0xe4, > + 0x14, 0x94, 0x54, 0xd4, 0x34, 0xb4, 0x74, 0xf4, > + 0x0c, 0x8c, 0x4c, 0xcc, 0x2c, 0xac, 0x6c, 0xec, > + 0x1c, 0x9c, 0x5c, 0xdc, 0x3c, 0xbc, 0x7c, 0xfc, > + 0x02, 0x82, 0x42, 0xc2, 0x22, 0xa2, 0x62, 0xe2, > + 0x12, 0x92, 0x52, 0xd2, 0x32, 0xb2, 0x72, 0xf2, > + 0x0a, 0x8a, 0x4a, 0xca, 0x2a, 0xaa, 0x6a, 0xea, > + 0x1a, 0x9a, 0x5a, 0xda, 0x3a, 0xba, 0x7a, 0xfa, > + 0x06, 0x86, 0x46, 0xc6, 0x26, 0xa6, 0x66, 0xe6, > + 0x16, 0x96, 0x56, 0xd6, 0x36, 0xb6, 0x76, 0xf6, > + 0x0e, 0x8e, 0x4e, 0xce, 0x2e, 0xae, 0x6e, 0xee, > + 0x1e, 0x9e, 0x5e, 0xde, 0x3e, 0xbe, 0x7e, 0xfe, > + 0x01, 0x81, 0x41, 0xc1, 0x21, 0xa1, 0x61, 0xe1, > + 0x11, 0x91, 0x51, 0xd1, 0x31, 0xb1, 0x71, 0xf1, > + 0x09, 0x89, 0x49, 0xc9, 0x29, 0xa9, 0x69, 0xe9, > + 0x19, 0x99, 0x59, 0xd9, 0x39, 0xb9, 0x79, 0xf9, > + 0x05, 0x85, 0x45, 0xc5, 0x25, 0xa5, 0x65, 0xe5, > + 0x15, 0x95, 0x55, 0xd5, 0x35, 0xb5, 0x75, 0xf5, > + 0x0d, 0x8d, 0x4d, 0xcd, 0x2d, 0xad, 0x6d, 0xed, > + 0x1d, 0x9d, 0x5d, 0xdd, 0x3d, 0xbd, 0x7d, 0xfd, > + 0x03, 0x83, 0x43, 0xc3, 0x23, 0xa3, 0x63, 0xe3, > + 0x13, 0x93, 0x53, 0xd3, 0x33, 0xb3, 0x73, 0xf3, > + 0x0b, 0x8b, 0x4b, 0xcb, 0x2b, 0xab, 0x6b, 0xeb, > + 0x1b, 0x9b, 0x5b, 0xdb, 0x3b, 0xbb, 0x7b, 0xfb, > + 0x07, 0x87, 0x47, 0xc7, 0x27, 0xa7, 0x67, 0xe7, > + 0x17, 0x97, 0x57, 0xd7, 0x37, 0xb7, 0x77, 0xf7, > + 0x0f, 0x8f, 0x4f, 0xcf, 0x2f, 0xaf, 0x6f, 0xef, > + 0x1f, 0x9f, 0x5f, 0xdf, 0x3f, 0xbf, 0x7f, 0xff > + }; > + > + wide_int result = wide_int::create (precision); > + unsigned int i, s; > + unsigned int len = BLOCKS_NEEDED (precision); > + unsigned int xlen = get_len (); > + const HOST_WIDE_INT *xval = get_val (); > + HOST_WIDE_INT *val = result.write_val (); > + > + /* This is not a well defined operation if the precision is not a > + multiple of 8. */ > + gcc_assert ((precision & 0x7) == 0); I don't see why though. If we used a naive loop instead of the LUT, I think we could handle all precisions naturally. And I wouldn't have expected this operation to be common enough that the optimisation is important. > + > + for (i = 0; i < len; i++) > + val[i] = 0; > + > + /* Only swap the bits that are not the padding. */ > + for (s = 0; s < precision; s += 8) > + { > + unsigned int d = precision - s - 8; > + unsigned HOST_WIDE_INT byte; > + > + unsigned int block = s / HOST_BITS_PER_WIDE_INT; > + unsigned int offset = s & (HOST_BITS_PER_WIDE_INT - 1); > + > + byte = (safe_uhwi (xval, xlen, block) >> offset) & 0xff; > + > + block = d / HOST_BITS_PER_WIDE_INT; > + offset = d & (HOST_BITS_PER_WIDE_INT - 1); > + > + val[block] |= bitreverse_byte_table[byte << offset]; > + } > + > + result.set_len (canonize (val, len, precision)); > + return result; > +} > + > /* Fill VAL with a mask where the lower WIDTH bits are ones and the bits > above that up to PREC are zeros. The result is inverted if NEGATE > is true. Return the number of blocks in VAL. */ > diff --git a/gcc/wide-int.h b/gcc/wide-int.h > index 6be343c..e78d83c 100644 > --- a/gcc/wide-int.h > +++ b/gcc/wide-int.h > @@ -1089,6 +1089,7 @@ public: > > /* FIXME: target-dependent, so should disappear. */ > wide_int bswap () const; > + wide_int bitreverse () const; Would be worth adding a blank line here, since the FIXME doesn't apply to the new routine. But I think it would be better to define: wide_int wi::bitreverse (const wide_int_ref &); in a similar way as other operations like wi::insert. I can't remember why bswap is different, but generally, the only member functions of the wide_int_storage classes should be accessors. Otherwise it looks good to me FWIW, but I'm a bit nervous about having so much new code without any test coverage. Thanks, Richard