From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=F6qA=BA=arm.com=richard.sandiford@sourceware.org>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id 81E893858C2F
	for <gcc-patches@gcc.gnu.org>; Thu, 11 May 2023 18:02:01 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 81E893858C2F
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B4C8B113E;
	Thu, 11 May 2023 11:02:45 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id B27603F5A1;
	Thu, 11 May 2023 11:02:00 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: "Roger Sayle" <roger@nextmovesoftware.com>
Mail-Followup-To: "Roger Sayle" <roger@nextmovesoftware.com>,"'GCC Patches'" <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Cc: "'GCC Patches'" <gcc-patches@gcc.gnu.org>
Subject: Re: [libgcc PATCH] Add bit reversal functions __bitrev[qhsd]i2.
References: <00c401d9803f$dafe3c90$90fab5b0$@nextmovesoftware.com>
Date: Thu, 11 May 2023 19:01:59 +0100
In-Reply-To: <00c401d9803f$dafe3c90$90fab5b0$@nextmovesoftware.com> (Roger
	Sayle's message of "Sat, 6 May 2023 18:26:11 +0100")
Message-ID: <mptlehueoh4.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-29.5 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

"Roger Sayle" <roger@nextmovesoftware.com> writes:
> This patch proposes adding run-time library support for bit reversal,
> by adding a __bitrevsi2 function to libgcc.  Thoughts/opinions?
>
> I'm also tempted to add __popcount[qh]i2 and __parity[qh]i2 to libgcc,
> to allow the RTL optimizers to perform narrowing operations, but I'm
> curious to hear whether QImode and HImode support, though more efficient,
> is frowned by the libgcc maintainers/philosophy.

I don't think RTL optimisers should be in the business of generating new
libcalls.  Wouldn't it have to be done in gimple and/or during expand?

> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32} and
> on nvptx-none, with no new regressions.  Ok for mainline?
>
>
> 2023-05-06  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * doc/libgcc.texi (__bitrevqi2): Document bit reversal run-time
>         functions; __bitrevqi2, __bitrevhi2, __bitrevsi2 and __bitrevdi2.
>
> libgcc/ChangeLog
>         * Makfile.in (lib2funcs): Add __bitrev[qhsd]i2.
>         * libgcc-std.ver.in (GCC_14.0.0): Add __bitrev[qhsd]i2.
>         * libgcc2.c (__bitrevqi2): New function.
>         (__bitrevhi2): Likewise.
>         (__bitrevsi2): Likewise.
>         (__bitrevdi2): Likewise.
>         * libgcc2.h (__bitrevqi2): Prototype here.
>         (__bitrevhi2): Likewise.
>         (__bitrevsi2): Likewise.
>         (__bitrevdi2): Likewise.
>
> Thanks in advance,
> Roger
> --
>
> diff --git a/gcc/doc/libgcc.texi b/gcc/doc/libgcc.texi
> index 73aa803..7611347 100644
> --- a/gcc/doc/libgcc.texi
> +++ b/gcc/doc/libgcc.texi
> @@ -218,6 +218,13 @@ These functions return the number of bits set in @var{a}.
>  These functions return the @var{a} byteswapped.
>  @end deftypefn
>  
> +@deftypefn {Runtime Function} int8_t __bitrevqi2 (int8_t @var{a})
> +@deftypefnx {Runtime Function} int16_t __bitrevhi2 (int16_t @var{a})
> +@deftypefnx {Runtime Function} int32_t __bitrevsi2 (int32_t @var{a})
> +@deftypefnx {Runtime Function} int64_t __bitrevdi2 (int64_t @var{a})
> +These functions return the bit reversed @var{a}.
> +@end deftypefn
> +
>  @node Soft float library routines
>  @section Routines for floating point emulation
>  @cindex soft float library
> diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
> index 6c4dc79..67c54df 100644
> --- a/libgcc/Makefile.in
> +++ b/libgcc/Makefile.in
> @@ -446,7 +446,7 @@ lib2funcs = _muldi3 _negdi2 _lshrdi3 _ashldi3 _ashrdi3 _cmpdi2 _ucmpdi2	   \
>  	    _paritysi2 _paritydi2 _powisf2 _powidf2 _powixf2 _powitf2	   \
>  	    _mulhc3 _mulsc3 _muldc3 _mulxc3 _multc3 _divhc3 _divsc3	   \
>  	    _divdc3 _divxc3 _divtc3 _bswapsi2 _bswapdi2 _clrsbsi2	   \
> -	    _clrsbdi2
> +	    _clrsbdi2 _bitrevqi2 _bitrevhi2 _bitrevsi2 _bitrevdi2
>  
>  # The floating-point conversion routines that involve a single-word integer.
>  # XX stands for the integer mode.
> diff --git a/libgcc/libgcc-std.ver.in b/libgcc/libgcc-std.ver.in
> index c4f87a5..2198b0e 100644
> --- a/libgcc/libgcc-std.ver.in
> +++ b/libgcc/libgcc-std.ver.in
> @@ -1944,3 +1944,12 @@ GCC_7.0.0 {
>    __PFX__divmoddi4
>    __PFX__divmodti4
>  }
> +
> +%inherit GCC_14.0.0 GCC_7.0.0
> +GCC_14.0.0 {
> +  # bit reversal functions
> +  __PFX__bitrevqi2
> +  __PFX__bitrevhi2
> +  __PFX__bitrevsi2
> +  __PFX__bitrevdi2
> +}
> diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
> index e0017d1..2bef2a1 100644
> --- a/libgcc/libgcc2.c
> +++ b/libgcc/libgcc2.c
> @@ -488,6 +488,54 @@ __bswapdi2 (DItype u)
>  	  | (((u) & 0x00000000000000ffull) << 56));
>  }
>  #endif
> +
> +#ifdef L_bitrevqi2
> +QItype
> +__bitrevqi2 (QItype x)
> +{
> +  UQItype u = x;
> +  u = (((u) >> 1) & 0x55) | (((u) & 0x55) << 1);
> +  u = (((u) >> 2) & 0x33) | (((u) & 0x33) << 2);
> +  return ((u) >> 4) | ((u) << 4);
> +}
> +#endif
> +#ifdef L_bitrevhi2
> +HItype
> +__bitrevhi2 (HItype x)
> +{
> +  UHItype u = x;
> +  u = (((u) >> 1) & 0x5555) | (((u) & 0x5555) << 1);
> +  u = (((u) >> 2) & 0x3333) | (((u) & 0x3333) << 2);
> +  u = (((u) >> 4) & 0x0f0f) | (((u) & 0x0f0f) << 4);
> +  return ((u) >> 8) | ((u) << 8);
> +}
> +#endif
> +#ifdef L_bitrevsi2
> +SItype
> +__bitrevsi2 (SItype x)
> +{
> +  USItype u = x;
> +  u = (((u) >> 1) & 0x55555555) | (((u) & 0x55555555) << 1);
> +  u = (((u) >> 2) & 0x33333333) | (((u) & 0x33333333) << 2);
> +  u = (((u) >> 4) & 0x0f0f0f0f) | (((u) & 0x0f0f0f0f) << 4);
> +  return __bswapsi2 (u);

Would it be better to use __builtin_bswap32 here, so that targets
with bswap but not bitreverse still optimise the bswap part?
Same for the DI version.

Not sure how portable all this is, but the underlying assumptions
seem to be the same as for bswap.

Looks OK to me otherwise, but it should wait until something needs it
(and can test it).

Thanks,
Richard

> +}
> +#endif
> +#ifdef L_bitrevdi2
> +DItype
> +__bitrevdi2 (DItype x)
> +{
> +  UDItype u = x;
> +  u = (((u) >> 1) & 0x5555555555555555ll)
> +      | (((u) & 0x5555555555555555ll) << 1);
> +  u = (((u) >> 2) & 0x3333333333333333ll)
> +      | (((u) & 0x3333333333333333ll) << 2);
> +  u = (((u) >> 4) & 0x0f0f0f0f0f0f0f0fll)
> +      | (((u) & 0x0f0f0f0f0f0f0f0fll) << 4);
> +  return __bswapdi2 (u);
> +}
> +#endif
> +
>  #ifdef L_ffssi2
>  #undef int
>  int
> diff --git a/libgcc/libgcc2.h b/libgcc/libgcc2.h
> index 3ec9bbd..e1abc0d 100644
> --- a/libgcc/libgcc2.h
> +++ b/libgcc/libgcc2.h
> @@ -338,6 +338,10 @@ typedef int shift_count_type __attribute__((mode (__libgcc_shift_count__)));
>  #define __udiv_w_sdiv		__N(udiv_w_sdiv)
>  #define __clear_cache		__N(clear_cache)
>  #define __enable_execute_stack	__N(enable_execute_stack)
> +#define __bitrevqi2		__N(bitrevqi2)
> +#define __bitrevhi2		__N(bitrevhi2)
> +#define __bitrevsi2		__N(bitrevsi2)
> +#define __bitrevdi2		__N(bitrevdi2)
>  
>  #ifndef __powisf2
>  #define __powisf2		__N(powisf2)
> @@ -426,6 +430,15 @@ extern DWtype __subvDI3 (DWtype, DWtype);
>  extern DWtype __mulvDI3 (DWtype, DWtype);
>  extern DWtype __negvDI2 (DWtype);
>  
> +extern QItype __bitrevqi2 (QItype);
> +extern HItype __bitrevhi2 (HItype);
> +#if MIN_UNITS_PER_WORD > 1
> +extern SItype __bitrevsi2 (SItype);
> +#endif
> +#if __SIZEOF_LONG_LONG__ > 4
> +extern DItype __bitrevdi2 (DItype);
> +#endif
> +
>  #ifdef COMPAT_SIMODE_TRAPPING_ARITHMETIC
>  #define __absvsi2	__N(absvsi2)
>  #define __negvsi2	__N(negvsi2)