public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [committed] RISC-V: Expose bswapsi for TARGET_64BIT
@ 2023-09-05 21:41 Jeff Law
  0 siblings, 0 replies; only message in thread
From: Jeff Law @ 2023-09-05 21:41 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2030 bytes --]

Various bswapsi tests are failing for rv64.  More importantly, we're 
generating crappy code.

Let's take the first test from bswapsi-1.c as an example.

> typedef unsigned int uint32_t;
> 
> #define __const_swab32(x) ((uint32_t)(                                \
>         (((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) |            \
>         (((uint32_t)(x) & (uint32_t)0x0000ff00UL) <<  8) |            \
>         (((uint32_t)(x) & (uint32_t)0x00ff0000UL) >>  8) |            \
>         (((uint32_t)(x) & (uint32_t)0xff000000UL) >> 24)))
> 
> /* This byte swap implementation is used by the Linux kernel and the
>    GNU C library.  */
> 
> uint32_t
> swap32_a (uint32_t in)
> {
>   return __const_swab32 (in);
> }
> 
> 
> 


We generate this for rv64gc_zba_zbb_zbs:

>         srliw   a1,a0,24
>         slliw   a5,a0,24
>         slliw   a3,a0,8
>         li      a2,16711680
>         li      a4,65536
>         or      a5,a5,a1
>         and     a3,a3,a2
>         addi    a4,a4,-256
>         srliw   a0,a0,8
>         or      a5,a5,a3
>         and     a0,a0,a4
>         or      a0,a5,a0
>         ret
Urgh!

After this patch we generate:

>         rev8    a0,a0
>         srai    a0,a0,32
>         ret

Clearly better.


The stated rationale behind not exposing bswapsi2 for TARGET_64BIT is 
that the RTL expanders already know how to widen a bswap, which is 
definitely true.  But it's the case that failure to expose a bswapsi 
will cause the 32bit bswap optimizations in gimple store merging to not 
trigger.  Thus we get crappy code.

To fix this we expose bswapsi on TARGET_64BIT.  gimple-store-merging 
then detects the 32bit bswap idioms and generates suitable __builtin 
calls.  The expander will "FAIL" expansion for TARGET_64BIT which forces 
the generic expander code to synthesize the operation (we could 
synthesize in here, but that'd result in duplicate code).

Tested on rv64gc_zba_zbb_zbs, fixes all the bswapsi failures in the 
testsuite without any regressions.

Pushed to the trunk,

Jeff



[-- Attachment #2: P --]
[-- Type: text/plain, Size: 3344 bytes --]

commit fbc01748ba46eb26074388a8fb7b44d25a414a72
Author: Jeff Law <jlaw@ventanamicro.com>
Date:   Tue Sep 5 15:39:16 2023 -0600

    RISC-V: Expose bswapsi for TARGET_64BIT
    
    Various bswapsi tests are failing for rv64.  More importantly, we're generating
    crappy code.
    
    Let's take the first test from bswapsi-1.c as an example.
    
    > typedef unsigned int uint32_t;
    >
    > #define __const_swab32(x) ((uint32_t)(                                \
    >         (((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) |            \
    >         (((uint32_t)(x) & (uint32_t)0x0000ff00UL) <<  8) |            \
    >         (((uint32_t)(x) & (uint32_t)0x00ff0000UL) >>  8) |            \
    >         (((uint32_t)(x) & (uint32_t)0xff000000UL) >> 24)))
    >
    > /* This byte swap implementation is used by the Linux kernel and the
    >    GNU C library.  */
    >
    > uint32_t
    > swap32_a (uint32_t in)
    > {
    >   return __const_swab32 (in);
    > }
    >
    >
    >
    
    We generate this for rv64gc_zba_zbb_zbs:
    
    >         srliw   a1,a0,24
    >         slliw   a5,a0,24
    >         slliw   a3,a0,8
    >         li      a2,16711680
    >         li      a4,65536
    >         or      a5,a5,a1
    >         and     a3,a3,a2
    >         addi    a4,a4,-256
    >         srliw   a0,a0,8
    >         or      a5,a5,a3
    >         and     a0,a0,a4
    >         or      a0,a5,a0
    >         retUrgh!
    
    After this patch we generate:
    
    >         rev8    a0,a0
    >         srai    a0,a0,32
    >         ret
    Clearly better.
    
    The stated rationale behind not exposing bswapsi2 for TARGET_64BIT is that the
    RTL expanders already know how to widen a bswap, which is definitely true.  But
    it's the case that failure to expose a bswapsi will cause the 32bit bswap
    optimizations in gimple store merging to not trigger.  Thus we get crappy code.
    
    To fix this we expose bswapsi on TARGET_64BIT.  gimple-store-merging then
    detects the 32bit bswap idioms and generates suitable __builtin calls.  The
    expander will "FAIL" expansion for TARGET_64BIT which forces the generic
    expander code to synthesize the operation (we could synthesize in here, but
    that'd result in duplicate code).
    
    Tested on rv64gc_zba_zbb_zbs, fixes all the bswapsi failures in the testsuite
    without any regressions.
    
    gcc/
            * config/riscv/bitmanip.md (bswapsi2): Expose for TARGET_64BIT.

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 7b55528ee49..1544ef4e125 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -454,7 +454,16 @@ (define_expand "bswapdi2"
 (define_expand "bswapsi2"
   [(set (match_operand:SI 0 "register_operand")
 	(bswap:SI (match_operand:SI 1 "register_operand")))]
-  "(!TARGET_64BIT && (TARGET_ZBB || TARGET_ZBKB)) || TARGET_XTHEADBB")
+  "TARGET_ZBB || TARGET_ZBKB || TARGET_XTHEADBB"
+{
+  /* Expose bswapsi2 on TARGET_64BIT so that the gimple store
+     merging pass will create suitable bswap insns.  We can actually
+     just FAIL that case when generating RTL and let the generic code
+     handle it.  */
+  if (TARGET_64BIT && !TARGET_XTHEADBB)
+    FAIL;
+})
+
 
 (define_insn "*bswap<mode>2"
   [(set (match_operand:X 0 "register_operand" "=r")

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2023-09-05 21:41 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-05 21:41 [committed] RISC-V: Expose bswapsi for TARGET_64BIT Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).