public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization
@ 2023-07-18 14:32 jakub at gcc dot gnu.org
  2023-07-18 14:33 ` [Bug rtl-optimization/110717] " jakub at gcc dot gnu.org
                   ` (17 more replies)
  0 siblings, 18 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-07-18 14:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

            Bug ID: 110717
           Summary: Double-word sign-extension missed-optimization
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

While working on _BitInt, I've noticed that we don't emit very good code at
least on x86_64 -m64/-m32 -O2 for:
#ifdef __SIZEOF_INT128__
unsigned __int128
foo (unsigned __int128 x)
{
  x <<= 59;
  return ((__int128) x) >> 59;
}
#else
unsigned long long
foo (unsigned long long x)
{
  x <<= 27;
  return ((long long) x) >> 27;
}
#endif

The sign-extension from 69 resp. 37 bits could be limited solely to the upper
word,
but we uselessly shift the lower word with it as well:
        movq    %rdi, %rax
        movq    %rsi, %rdx
        shldq   $59, %rdi, %rdx
        salq    $59, %rax
        shrdq   $59, %rdx, %rax
        sarq    $59, %rdx
        ret
for -m64 and
        movl    4(%esp), %eax
        movl    8(%esp), %edx
        shldl   $27, %eax, %edx
        sall    $27, %eax
        shrdl   $27, %edx, %eax
        sarl    $27, %edx
        ret
for -m32.
LLVM emits even more horrible code for -m64, but
        movl    4(%esp), %eax
        movl    8(%esp), %edx
        shll    $27, %edx
        sarl    $27, %edx
        retl
for -m32, which looks to me like what we want.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
@ 2023-07-18 14:33 ` jakub at gcc dot gnu.org
  2023-07-18 14:54 ` jakub at gcc dot gnu.org
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-07-18 14:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at gcc dot gnu.org
             Target|                            |x86_64-linux
           Keywords|                            |missed-optimization

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Haven't tried other targets.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
  2023-07-18 14:33 ` [Bug rtl-optimization/110717] " jakub at gcc dot gnu.org
@ 2023-07-18 14:54 ` jakub at gcc dot gnu.org
  2023-07-19  6:38 ` rguenth at gcc dot gnu.org
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-07-18 14:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Improved testcase which shows similar behavior also with bitfields:

#ifdef __SIZEOF_INT128__
#define type __int128
#define N 59
#else
#define type long long
#define N 27
#endif

struct S { type a : sizeof (type) * __CHAR_BIT__ - N; };

unsigned type
foo (unsigned type x)
{
  x <<= N;
  return ((type) x) >> N;
}

unsigned type
bar (struct S *p)
{
  return p->a;
}

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
  2023-07-18 14:33 ` [Bug rtl-optimization/110717] " jakub at gcc dot gnu.org
  2023-07-18 14:54 ` jakub at gcc dot gnu.org
@ 2023-07-19  6:38 ` rguenth at gcc dot gnu.org
  2023-07-19  8:55 ` ubizjak at gmail dot com
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-19  6:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-07-19
     Ever confirmed|0                           |1

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-07-19  6:38 ` rguenth at gcc dot gnu.org
@ 2023-07-19  8:55 ` ubizjak at gmail dot com
  2023-07-19  9:00 ` jakub at gcc dot gnu.org
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: ubizjak at gmail dot com @ 2023-07-19  8:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |ubizjak at gmail dot com

--- Comment #4 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 55578
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55578&action=edit
Proposed patch

Patch in testing.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-07-19  8:55 ` ubizjak at gmail dot com
@ 2023-07-19  9:00 ` jakub at gcc dot gnu.org
  2023-07-19  9:30 ` ubizjak at gmail dot com
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-07-19  9:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Thanks.
Shouldn't
INTVAL (operands[2]) < <MODE_SIZE> * BITS_PER_UNIT
be
UINTVAL (operands[2]) < <MODE_SIZE> * BITS_PER_UNIT
just to make sure it doesn't trigger for negative?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2023-07-19  9:00 ` jakub at gcc dot gnu.org
@ 2023-07-19  9:30 ` ubizjak at gmail dot com
  2023-07-20 18:56 ` cvs-commit at gcc dot gnu.org
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: ubizjak at gmail dot com @ 2023-07-19  9:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0
                 CC|uros at gcc dot gnu.org            |

--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #5)
> Thanks.
> Shouldn't
> INTVAL (operands[2]) < <MODE_SIZE> * BITS_PER_UNIT
> be
> UINTVAL (operands[2]) < <MODE_SIZE> * BITS_PER_UNIT
> just to make sure it doesn't trigger for negative?

Ah, yes, I'll change it.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2023-07-19  9:30 ` ubizjak at gmail dot com
@ 2023-07-20 18:56 ` cvs-commit at gcc dot gnu.org
  2023-07-20 19:03 ` ubizjak at gmail dot com
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-20 18:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:b50a851eef4b70aabf28fa875d9b2a302d17b66a

commit r14-2684-gb50a851eef4b70aabf28fa875d9b2a302d17b66a
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Thu Jul 20 20:54:51 2023 +0200

    i386: Double-word sign-extension missed-optimization [PR110717]

    When sign-extending the value in a double-word register pair using shift
and
    ashiftrt sequence with the same count immediate value less than word width,
    there is no need to shift the lower word of the value. The sign-extension
    could be limited to the upper word, but we uselessly shift the lower word
    with it as well:
            movq    %rdi, %rax
            movq    %rsi, %rdx
            shldq   $59, %rdi, %rdx
            salq    $59, %rax
            shrdq   $59, %rdx, %rax
            sarq    $59, %rdx
            ret
    for -m64 and
            movl    4(%esp), %eax
            movl    8(%esp), %edx
            shldl   $27, %eax, %edx
            sall    $27, %eax
            shrdl   $27, %edx, %eax
            sarl    $27, %edx
            ret
    for -m32.

    The patch introduces a new post-reload splitter to provide the combined
    ASHIFTRT/SHIFT instruction pattern.  The instruction is split to a sequence
    of SAL and SAR insns with the same count immediate operand:
            movq    %rsi, %rdx
            movq    %rdi, %rax
            salq    $59, %rdx
            sarq    $59, %rdx
            ret

    Some complication is required to properly handle STV transform, where we
    emit a sequence with DImode PSLLQ and PSRAQ insns for 32-bit AVX512VL
    targets when profitable.

    The patch also fixes a small oversight and enables STV transform of SImode
    ASHIFTRT to PSRAD also for SSE2 targets.

            PR target/110717

    gcc/ChangeLog:

            * config/i386/i386-features.cc
            (general_scalar_chain::compute_convert_gain): Calculate gain
            for extend higpart case.
            (general_scalar_chain::convert_op): Handle
            ASHIFTRT/ASHIFT combined RTX.
            (general_scalar_to_vector_candidate_p): Enable ASHIFTRT for
            SImode for SSE2 targets.  Handle ASHIFTRT/ASHIFT combined RTX.
            * config/i386/i386.md (*extend<dwi>2_doubleword_highpart):
            New define_insn_and_split pattern.
            (*extendv2di2_highpart_stv): Ditto.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr110717.c: New test.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2023-07-20 18:56 ` cvs-commit at gcc dot gnu.org
@ 2023-07-20 19:03 ` ubizjak at gmail dot com
  2023-07-20 19:19 ` jakub at gcc dot gnu.org
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: ubizjak at gmail dot com @ 2023-07-20 19:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|ubizjak at gmail dot com           |unassigned at gcc dot gnu.org
             Status|ASSIGNED                    |NEW

--- Comment #8 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to CVS Commits from comment #7)
> The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

The patch implements transform for x86 targets only. Due to eventual STV
transformation, x86 targets handle double-word operations in its own way.

I'll left the target-independent implementation to someone else.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2023-07-20 19:03 ` ubizjak at gmail dot com
@ 2023-07-20 19:19 ` jakub at gcc dot gnu.org
  2023-07-20 19:21 ` jakub at gcc dot gnu.org
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-07-20 19:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |krebbel at gcc dot gnu.org,
                   |                            |law at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Wonder how many important targets provide double-word shift patterns vs. ones
which expand it through generic code.
aarch64 looks quite small:
foo:
        extr    x1, x1, x0, 5
        asr     x1, x1, 59
        ret
powerpc probably could be improved:
foo:
        srwi 9,4,5
        mr 10,9
        rlwimi 4,9,5,0,31-5
        rlwimi 10,3,27,0,31-27
        srawi 3,10,27
        blr
ditto s390x:
foo:
        lg      %r1,0(%r3)
        lg      %r3,8(%r3)
        srlg    %r5,%r3,5
        lghi    %r0,31
        sllg    %r1,%r1,59
        ogr     %r1,%r5
        ngr     %r3,%r0
        sllg    %r5,%r5,5
        srag    %r1,%r1,59
        ogr     %r3,%r5
        stg     %r3,8(%r2)
        stg     %r1,0(%r2)
        br      %r14
ditto riscv64:
foo:
        srli    a5,a0,5
        slli    a1,a1,59
        or      a1,a5,a1
        slli    a5,a1,5
        andi    a0,a0,31
        or      a0,a5,a0
        srai    a1,a1,59
        ret

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2023-07-20 19:19 ` jakub at gcc dot gnu.org
@ 2023-07-20 19:21 ` jakub at gcc dot gnu.org
  2023-07-21  9:06 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-07-20 19:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Though, grepping tmp-mddump.md files shows only x86 having ashlti3 and ashrti3
expanders.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2023-07-20 19:21 ` jakub at gcc dot gnu.org
@ 2023-07-21  9:06 ` jakub at gcc dot gnu.org
  2023-07-21 10:53 ` segher at gcc dot gnu.org
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-07-21  9:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
To handle this in generic code, I think expand_expr_real_2 woiuld need to
notice this case of << operand of arithmetic >> by same amount and tell that to
expand_variable_shift -> expand_shift_1 -> expand_binop somehow.
Wasn't there a proposal for SEXT_EXPR at some point?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2023-07-21  9:06 ` jakub at gcc dot gnu.org
@ 2023-07-21 10:53 ` segher at gcc dot gnu.org
  2023-07-21 16:28 ` segher at gcc dot gnu.org
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: segher at gcc dot gnu.org @ 2023-07-21 10:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

--- Comment #12 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #9)
> Wonder how many important targets provide double-word shift patterns vs.
> ones which expand it through generic code.

Very long ago rs6000 had special code for this.  That was sub-optimal in
other ways, and the generic code generated almost ideal code (sometimes an
extra data movement insn).

> powerpc probably could be improved:
> foo:
>         srwi 9,4,5
>         mr 10,9
>         rlwimi 4,9,5,0,31-5
>         rlwimi 10,3,27,0,31-27
>         srawi 3,10,27
>         blr

This is hugely worse than what we used to do, it seems?

GCC 8 did

        srdi 9,4,5
        rldimi 9,3,59,0
        rldimi 4,9,5,0
        sradi 3,9,59
        blr

GCC 9 started with the unnecessary move.

But we should get only one insert insn in any case!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2023-07-21 10:53 ` segher at gcc dot gnu.org
@ 2023-07-21 16:28 ` segher at gcc dot gnu.org
  2023-07-21 16:46 ` pinskia at gcc dot gnu.org
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: segher at gcc dot gnu.org @ 2023-07-21 16:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

--- Comment #13 from Segher Boessenkool <segher at gcc dot gnu.org> ---
So.  Before expand we have

  _6 = (__int128) x_3(D);
  x.0_1 = _6 << 59;
  _2 = x.0_1 >> 59;
  _4 = (__int128 unsigned) _2;
  return _4;

That should have been optimised better :-(

The RTL code it expands to sets the same pseudo multiple times.  Bad bad bad.
This hampers many optimisations.  Like:
(insn 6 3 7 2 (set (reg:DI 124)
        (lshiftrt:DI (reg:DI 129 [ x+8 ])
            (const_int 5 [0x5]))) "110717.c":6:11 299 {lshrdi3}
     (nil))
(insn 7 6 8 2 (set (reg:DI 132)
        (ashift:DI (reg:DI 128 [ x ])
            (const_int 59 [0x3b]))) "110717.c":6:11 289 {ashldi3}
     (nil))
(insn 8 7 9 2 (set (reg:DI 132)
        (ior:DI (reg:DI 124)
            (reg:DI 132))) "110717.c":6:11 233 {*booldi3}
     (nil))
(They are subregs right after expand, totally unreadable; this is after
subreg1, slightly more readable, but essentially the same code still).

The web pass eventually gets rid of the double set in this case.

Because the shift-left-then-right survives all the way to combine, it (being
the greedy bastard that it is) will use the combiner patterns rs6000 has for
multi-precision shifts, before it would notice the two (multiprecision!)
shifts together are largely a no-op, so you get stuck at a local optimum.
Pat for the course for combine :-/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2023-07-21 16:28 ` segher at gcc dot gnu.org
@ 2023-07-21 16:46 ` pinskia at gcc dot gnu.org
  2023-08-04 15:26 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-21 16:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

--- Comment #14 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #13)
> So.  Before expand we have
> 
>   _6 = (__int128) x_3(D);
>   x.0_1 = _6 << 59;
>   _2 = x.0_1 >> 59;


Jakub is trying to emulate this using shifts but this is better using bitfields
just to get the truncation:


#ifdef __SIZEOF_INT128__
#define type __int128
#else
#define type long long
#endif

struct f
{
#ifdef __SIZEOF_INT128__
  __int128 t:(128-59);
#else
  long long t:(64-27);
#endif
};

unsigned type
foo (unsigned type x)
{
  struct f t;
  t.t = x;
  return t.t;
}

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2023-07-21 16:46 ` pinskia at gcc dot gnu.org
@ 2023-08-04 15:26 ` cvs-commit at gcc dot gnu.org
  2023-10-30 16:18 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-08-04 15:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

--- Comment #15 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:c572f09a751cbd365e2285b30527de5ab9025972

commit r14-2998-gc572f09a751cbd365e2285b30527de5ab9025972
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Fri Aug 4 16:26:06 2023 +0100

    Specify signed/unsigned/dontcare in calls to extract_bit_field_1.

    This patch is inspired by Jakub's work on PR rtl-optimization/110717.
    The bitfield example described in comment #2, looks like:

    struct S { __int128 a : 69; };
    unsigned type bar (struct S *p) {
      return p->a;
    }

    which on x86_64 with -O2 currently generates:

    bar:    movzbl  8(%rdi), %ecx
            movq    (%rdi), %rax
            andl    $31, %ecx
            movq    %rcx, %rdx
            salq    $59, %rdx
            sarq    $59, %rdx
            ret

    The ANDL $31 is interesting... we first extract an unsigned 69-bit bitfield
    by masking/clearing the top bits of the most significant word, and then
    it gets sign-extended, by left shifting and arithmetic right shifting.
    Obviously, this bit-wise AND is redundant, for signed bit-fields, we don't
    require these bits to be cleared, if we're about to set them appropriately.

    This patch eliminates this redundancy in the middle-end, during RTL
    expansion, but extending the extract_bit_field APIs so that the integer
    UNSIGNEDP argument takes a special value; 0 indicates the field should
    be sign extended, 1 (any non-zero value) indicates the field should be
    zero extended, but -1 indicates a third option, that we don't care how
    or whether the field is extended.  By passing and checking this sentinel
    value at the appropriate places we avoid the useless bit masking (on
    all targets).

    For the test case above, with this patch we now generate:

    bar:    movzbl  8(%rdi), %ecx
            movq    (%rdi), %rax
            movq    %rcx, %rdx
            salq    $59, %rdx
            sarq    $59, %rdx
            ret

    2023-08-04  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            * expmed.cc (extract_bit_field_1): Document that an UNSIGNEDP
            value of -1 is equivalent to don't care.
            (extract_integral_bit_field): Indicate that we don't require
            the most significant word to be zero extended, if we're about
            to sign extend it.
            (extract_fixed_bit_field_1): Document that an UNSIGNEDP value
            of -1 is equivalent to don't care.  Don't clear the most
            significant bits with AND mask when UNSIGNEDP is -1.

    gcc/testsuite/ChangeLog
            * gcc.target/i386/pr110717-2.c: New test case.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2023-08-04 15:26 ` cvs-commit at gcc dot gnu.org
@ 2023-10-30 16:18 ` cvs-commit at gcc dot gnu.org
  2023-12-13 13:37 ` cvs-commit at gcc dot gnu.org
  2024-05-07  7:41 ` rguenth at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-10-30 16:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

--- Comment #16 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:31cc9824d1cd5e0f7fa145d0831a923479333cd6

commit r14-5013-g31cc9824d1cd5e0f7fa145d0831a923479333cd6
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Mon Oct 30 16:17:42 2023 +0000

    ARC: Improved ARC rtx_costs/insn_cost for SHIFTs and ROTATEs.

    This patch overhauls the ARC backend's insn_cost target hook, and makes
    some related improvements to rtx_costs, BRANCH_COST, etc.  The primary
    goal is to allow the backend to indicate that shifts and rotates are
    slow (discouraged) when the CPU doesn't have a barrel shifter. I should
    also acknowledge Richard Sandiford for inspiring the use of set_cost
    in this rewrite of arc_insn_cost; this implementation borrows heavily
    for the target hooks for AArch64 and ARM.

    The motivating example is derived from PR rtl-optimization/110717.

    struct S { int a : 5; };
    unsigned int foo (struct S *p) {
      return p->a;
    }

    With a barrel shifter, GCC -O2 generates the reasonable:

    foo:    ldb_s   r0,[r0]
            asl_s   r0,r0,27
            j_s.d   [blink]
            asr_s   r0,r0,27

    What's interesting is that during combine, the middle-end actually
    has two shifts by three bits, and a sign-extension from QI to SI.

    Trying 8, 9 -> 11:
        8: r158:SI=r157:QI#0<<0x3
          REG_DEAD r157:QI
        9: r159:SI=sign_extend(r158:SI#0)
          REG_DEAD r158:SI
       11: r155:SI=r159:SI>>0x3
          REG_DEAD r159:SI

    Whilst it's reasonable to simplify this to two shifts by 27 bits when
    the CPU has a barrel shifter, it's actually a significant pessimization
    when these shifts are implemented by loops.  This combination can be
    prevented if the backend provides accurate-ish estimates for insn_cost.

    Previously, without a barrel shifter, GCC -O2 -mcpu=em generates:

    foo:    ldb_s   r0,[r0]
            mov     lp_count,27
            lp      2f
            add     r0,r0,r0
            nop
    2:      # end single insn loop
            mov     lp_count,27
            lp      2f
            asr     r0,r0
            nop
    2:      # end single insn loop
            j_s     [blink]

    which contains two loops and requires about ~113 cycles to execute.
    With this patch to rtx_cost/insn_cost, GCC -O2 -mcpu=em generates:

    foo:    ldb_s   r0,[r0]
            mov_s   r2,0    ;3
            add3    r0,r2,r0
            sexb_s  r0,r0
            asr_s   r0,r0
            asr_s   r0,r0
            j_s.d   [blink]
            asr_s   r0,r0

    which requires only ~6 cycles, for the shorter shifts by 3 and sign
    extension.

    2023-10-30  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            * config/arc/arc.cc (arc_rtx_costs): Improve cost estimates.
            Provide reasonable values for SHIFTS and ROTATES by constant
            bit counts depending upon TARGET_BARREL_SHIFTER.
            (arc_insn_cost): Use insn attributes if the instruction is
            recognized.  Avoid calling get_attr_length for type "multi",
            i.e. define_insn_and_split patterns without explicit type.
            Fall-back to set_rtx_cost for single_set and pattern_cost
            otherwise.
            * config/arc/arc.h (COSTS_N_BYTES): Define helper macro.
            (BRANCH_COST): Improve/correct definition.
            (LOGICAL_OP_NON_SHORT_CIRCUIT): Preserve previous behavior.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2023-10-30 16:18 ` cvs-commit at gcc dot gnu.org
@ 2023-12-13 13:37 ` cvs-commit at gcc dot gnu.org
  2024-05-07  7:41 ` rguenth at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-12-13 13:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

--- Comment #17 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:ff8d0ce17fb585a29a83349acbc67b2dd3556629

commit r14-6495-gff8d0ce17fb585a29a83349acbc67b2dd3556629
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Wed Dec 13 13:36:44 2023 +0000

    ARC: Add *extvsi_n_0 define_insn_and_split for PR 110717.

    This patch improves the code generated for bitfield sign extensions on
    ARC cpus without a barrel shifter.

    Compiling the following test case:

    int foo(int x) { return (x<<27)>>27; }

    with -O2 -mcpu=em, generates two loops:

    foo:    mov     lp_count,27
            lp      2f
            add     r0,r0,r0
            nop
    2:      # end single insn loop
            mov     lp_count,27
            lp      2f
            asr     r0,r0
            nop
    2:      # end single insn loop
            j_s     [blink]

    and the closely related test case:

    struct S { int a : 5; };
    int bar (struct S *p) { return p->a; }

    generates the slightly better:

    bar:    ldb_s   r0,[r0]
            mov_s   r2,0    ;3
            add3    r0,r2,r0
            sexb_s  r0,r0
            asr_s   r0,r0
            asr_s   r0,r0
            j_s.d   [blink]
            asr_s   r0,r0

    which uses 6 instructions to perform this particular sign extension.
    It turns out that sign extensions can always be implemented using at
    most three instructions on ARC (without a barrel shifter) using the
    idiom ((x&mask)^msb)-msb [as described in section "2-5 Sign Extension"
    of Henry Warren's book "Hacker's Delight"].  Using this, the sign
    extensions above on ARC's EM both become:

            bmsk_s  r0,r0,4
            xor     r0,r0,16
            sub     r0,r0,16

    which takes about 3 cycles, compared to the ~112 cycles for the loops
    in foo.

    2023-12-13  Roger Sayle  <roger@nextmovesoftware.com>
                Jeff Law  <jlaw@ventanamicro.com>

    gcc/ChangeLog
            * config/arc/arc.md (*extvsi_n_0): New define_insn_and_split to
            implement SImode sign extract using a AND, XOR and MINUS sequence.

    gcc/testsuite/ChangeLog
            * gcc.target/arc/extvsi-1.c: New test case.
            * gcc.target/arc/extvsi-2.c: Likewise.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
  2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2023-12-13 13:37 ` cvs-commit at gcc dot gnu.org
@ 2024-05-07  7:41 ` rguenth at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-05-07  7:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|14.0                        |14.2

--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 14.1 is being released, retargeting bugs to GCC 14.2.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-05-07  7:41 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-18 14:32 [Bug rtl-optimization/110717] New: Double-word sign-extension missed-optimization jakub at gcc dot gnu.org
2023-07-18 14:33 ` [Bug rtl-optimization/110717] " jakub at gcc dot gnu.org
2023-07-18 14:54 ` jakub at gcc dot gnu.org
2023-07-19  6:38 ` rguenth at gcc dot gnu.org
2023-07-19  8:55 ` ubizjak at gmail dot com
2023-07-19  9:00 ` jakub at gcc dot gnu.org
2023-07-19  9:30 ` ubizjak at gmail dot com
2023-07-20 18:56 ` cvs-commit at gcc dot gnu.org
2023-07-20 19:03 ` ubizjak at gmail dot com
2023-07-20 19:19 ` jakub at gcc dot gnu.org
2023-07-20 19:21 ` jakub at gcc dot gnu.org
2023-07-21  9:06 ` jakub at gcc dot gnu.org
2023-07-21 10:53 ` segher at gcc dot gnu.org
2023-07-21 16:28 ` segher at gcc dot gnu.org
2023-07-21 16:46 ` pinskia at gcc dot gnu.org
2023-08-04 15:26 ` cvs-commit at gcc dot gnu.org
2023-10-30 16:18 ` cvs-commit at gcc dot gnu.org
2023-12-13 13:37 ` cvs-commit at gcc dot gnu.org
2024-05-07  7:41 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).