[Bug target/107627] New: [13] Regression int128_t shift generates extra xor/or.

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug target/107627] New: [13] Regression int128_t shift generates extra xor/or.
@ 2022-11-11  2:06 crazylht at gmail dot com
  2022-11-11  7:05 ` [Bug target/107627] " crazylht at gmail dot com
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: crazylht at gmail dot com @ 2022-11-11  2:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627

            Bug ID: 107627
           Summary: [13] Regression int128_t shift generates extra xor/or.
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---
            Target: x86_64-*-* i?86-*-*

The case is from PR106220.

#include<stdint.h>
static __inline
unsigned __int128 mk_u128(uint64_t hi, uint64_t lo)
{
  return ((unsigned __int128)hi << 64) | lo;
}

static __inline
uint64_t shrdq(uint64_t hi, uint64_t lo, unsigned rshift)
{
  return (uint64_t)(mk_u128(hi, lo) >> (rshift % 64));
}


void foo1to1(uint64_t *dst, const uint64_t* src, unsigned rshift)
{
  uint64_t r0 = shrdq(src[0], src[1], rshift);
  dst[0] = r0;
}


GCC trunk generates 

foo1to1:
        mov     rax, QWORD PTR [rsi]
        mov     r8, QWORD PTR [rsi+8]
        mov     ecx, edx
        xor     r9d, r9d
        mov     rdx, rax
        xor     eax, eax
        or      rax, r8
        or      rdx, r9
        shrd    rax, rdx, cl
        mov     QWORD PTR [rdi], rax

GCC12.2 generates

foo1to1:
        mov     rax, QWORD PTR [rsi+8]
        mov     r8, rdi
        mov     rdi, QWORD PTR [rsi]
        mov     ecx, edx
        shrd    rax, rdi, cl
        mov     QWORD PTR [r8], rax
        ret

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/107627] [13] Regression int128_t shift generates extra xor/or.
  2022-11-11  2:06 [Bug target/107627] New: [13] Regression int128_t shift generates extra xor/or crazylht at gmail dot com
@ 2022-11-11  7:05 ` crazylht at gmail dot com
  2022-11-11  7:09 ` [Bug target/107627] [13 Regression] " pinskia at gcc dot gnu.org
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: crazylht at gmail dot com @ 2022-11-11  7:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627

--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
Looks like caused by r13-1379-ge8a46e5cdab500

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/107627] [13 Regression] int128_t shift generates extra xor/or.
  2022-11-11  2:06 [Bug target/107627] New: [13] Regression int128_t shift generates extra xor/or crazylht at gmail dot com
  2022-11-11  7:05 ` [Bug target/107627] " crazylht at gmail dot com
@ 2022-11-11  7:09 ` pinskia at gcc dot gnu.org
  2022-11-11  9:41 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-11-11  7:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[13] Regression int128_t    |[13 Regression] int128_t
                   |shift generates extra       |shift generates extra
                   |xor/or.                     |xor/or.
           Keywords|                            |missed-optimization
   Target Milestone|---                         |13.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/107627] [13 Regression] int128_t shift generates extra xor/or.
  2022-11-11  2:06 [Bug target/107627] New: [13] Regression int128_t shift generates extra xor/or crazylht at gmail dot com
  2022-11-11  7:05 ` [Bug target/107627] " crazylht at gmail dot com
  2022-11-11  7:09 ` [Bug target/107627] [13 Regression] " pinskia at gcc dot gnu.org
@ 2022-11-11  9:41 ` rguenth at gcc dot gnu.org
  2022-11-21 10:34 ` marxin at gcc dot gnu.org
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-11-11  9:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-11-11
             Status|UNCONFIRMED                 |NEW

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/107627] [13 Regression] int128_t shift generates extra xor/or.
  2022-11-11  2:06 [Bug target/107627] New: [13] Regression int128_t shift generates extra xor/or crazylht at gmail dot com
                   ` (2 preceding siblings ...)
  2022-11-11  9:41 ` rguenth at gcc dot gnu.org
@ 2022-11-21 10:34 ` marxin at gcc dot gnu.org
  2022-11-22 12:52 ` jakub at gcc dot gnu.org
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: marxin at gcc dot gnu.org @ 2022-11-21 10:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #2 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #1)
> Looks like caused by r13-1379-ge8a46e5cdab500

Confirmed.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/107627] [13 Regression] int128_t shift generates extra xor/or.
  2022-11-11  2:06 [Bug target/107627] New: [13] Regression int128_t shift generates extra xor/or crazylht at gmail dot com
                   ` (3 preceding siblings ...)
  2022-11-21 10:34 ` marxin at gcc dot gnu.org
@ 2022-11-22 12:52 ` jakub at gcc dot gnu.org
  2022-11-25 23:40 ` roger at nextmovesoftware dot com
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-11-22 12:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |law at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org
           Priority|P3                          |P1

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Slightly cleaned up testcase:
static inline unsigned __int128
foo (unsigned long long x, unsigned long long y)
{
  return ((unsigned __int128) x << 64) | y;
}

static inline unsigned long long
bar (unsigned long long x, unsigned long long y, unsigned z)
{
  return foo (x, y) >> (z % 64);
}

void
baz (unsigned long long *x, const unsigned long long *y, unsigned z)
{
  x[0] = bar (y[0], y[1], z);
}

This is pretty serious regression.  Though for:
static inline unsigned long long
qux (unsigned int x, unsigned int y)
{
  return ((unsigned long long) x << 32) | y;
}

static inline unsigned int
corge (unsigned int x, unsigned int y, unsigned z)
{
  return qux (x, y) >> (z % 32);
}

void
garply (unsigned int *x, const unsigned int *y, unsigned z)
{
  x[0] = corge (y[0], y[1], z);
}
gcc has behaved that way with -O2 -m32 for quiet some time (regressed
with r11-6188-g0b76990a9d75d97b84014e37519086b81824c307 from:
        pushl   %ebx
        movl    12(%esp), %ebx
        movl    16(%esp), %ecx
        movl    (%ebx), %edx
        movl    4(%ebx), %eax
        shrdl   %edx, %eax
        movl    8(%esp), %edx
        movl    %eax, (%edx)
        popl    %ebx
        ret
to:
        pushl   %edi
        xorl    %edi, %edi
        pushl   %esi
        pushl   %ebx
        movl    20(%esp), %ebx
        movl    24(%esp), %ecx
        movl    4(%ebx), %esi
        movl    (%ebx), %edx
        movl    %esi, %eax
        orl     %edi, %edx
        shrdl   %edx, %eax
        movl    16(%esp), %edx
        movl    %eax, (%edx)
        popl    %ebx
        popl    %esi
        popl    %edi
        ret
).  And with -O2 -m32 -msse2 this regressed with
r6-3562-g006ba5047cea15ce6f29b0847009ae901b874d50 (addition of STV).
While the r11-6188 case seems unrelated and I should probably file it
as a separate PR against RTL SSA, the addition of STV for ia32 and
the r13-1379 change are very much similar, they turn the ior{di,ti}3 from
being split during expansion to after reload, and there is nothing that can
fix it up afterwards.  For -m64 on trunk we have:
(insn 8 5 23 2 (set (reg:DI 95 [ *y_3(D) ])
        (mem:DI (reg/v/f:DI 92 [ y ]) [1 *y_3(D)+0 S8 A64])) "pr107627.c":4:11
82 {*movdi_internal}
     (nil))
(insn 23 8 24 2 (clobber (reg:TI 96 [ *y_3(D) ])) "pr107627.c":4:33 -1
     (nil))
(insn 24 23 21 2 (set (reg:TI 96 [ *y_3(D) ])
        (const_int 0 [0])) "pr107627.c":4:33 -1
     (nil))
(insn 21 24 22 2 (set (subreg:DI (reg:TI 96 [ *y_3(D) ]) 8)
        (reg:DI 95 [ *y_3(D) ])) "pr107627.c":4:33 82 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 95 [ *y_3(D) ])
        (nil)))
(insn 22 21 12 2 (set (subreg:DI (reg:TI 96 [ *y_3(D) ]) 0)
        (const_int 0 [0])) "pr107627.c":4:33 82 {*movdi_internal}
     (nil))
(insn 12 22 25 2 (set (reg:DI 98 [ MEM[(const long long unsigned int *)y_3(D) +
8B] ])
        (mem:DI (plus:DI (reg/v/f:DI 92 [ y ])
                (const_int 8 [0x8])) [1 MEM[(const long long unsigned int
*)y_3(D) + 8B]+0 S8 A64])) "pr107627.c":4:40 82 {*movdi_internal}
     (expr_list:REG_DEAD (reg/v/f:DI 92 [ y ])
        (nil)))
(insn 25 12 26 2 (clobber (reg:TI 97 [ MEM[(const long long unsigned int
*)y_3(D) + 8B] ])) "pr107627.c":4:40 -1
     (nil))
(insn 26 25 13 2 (set (reg:TI 97 [ MEM[(const long long unsigned int *)y_3(D) +
8B] ])
        (const_int 0 [0])) "pr107627.c":4:40 -1
     (nil))
(insn 13 26 14 2 (set (subreg:DI (reg:TI 97 [ MEM[(const long long unsigned int
*)y_3(D) + 8B] ]) 0)
        (reg:DI 98 [ MEM[(const long long unsigned int *)y_3(D) + 8B] ]))
"pr107627.c":4:40 82 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 98 [ MEM[(const long long unsigned int
*)y_3(D) + 8B] ])
        (nil)))
(insn 14 13 15 2 (set (subreg:DI (reg:TI 97 [ MEM[(const long long unsigned int
*)y_3(D) + 8B] ]) 8)
        (const_int 0 [0])) "pr107627.c":4:40 82 {*movdi_internal}
     (nil))
(insn 15 14 16 2 (parallel [
            (set (reg:TI 99)
                (ior:TI (reg:TI 96 [ *y_3(D) ])
                    (reg:TI 97 [ MEM[(const long long unsigned int *)y_3(D) +
8B] ])))
            (clobber (reg:CC 17 flags))
        ]) "pr107627.c":4:40 571 {*iordi3_doubleword}
     (expr_list:REG_DEAD (reg:TI 97 [ MEM[(const long long unsigned int
*)y_3(D) + 8B] ])
        (expr_list:REG_DEAD (reg:TI 96 [ *y_3(D) ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))
before combine and the combiner punts on this altogether due to the setting of
parts of the pseudos with subregs,
so there is no hope in even recognizing the idiom of building a 2 words integer
from 2 word pieces as special.
Then after RA we have postreload CSE which perhaps could help but at that point
it isn't split yet,
and when we get to split2, where is nothing that would propagate the
const0_rtxs set to hard registers
into the ior instructions and figure out they are useless.

So, shall we do something in the STV pass and recognize these concats of two
parts into one larger register,
turn them into some define_insn_and_split which is then split after reload that
would allow for better
code generation, or do that in some generic pass somewhere before reload,
and/or try to forward propagate
those 0s after RA (after split2 I mean)?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/107627] [13 Regression] int128_t shift generates extra xor/or.
  2022-11-11  2:06 [Bug target/107627] New: [13] Regression int128_t shift generates extra xor/or crazylht at gmail dot com
                   ` (4 preceding siblings ...)
  2022-11-22 12:52 ` jakub at gcc dot gnu.org
@ 2022-11-25 23:40 ` roger at nextmovesoftware dot com
  2022-11-29 19:12 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: roger at nextmovesoftware dot com @ 2022-11-25 23:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #4 from Roger Sayle <roger at nextmovesoftware dot com> ---
I believe that tweaking i386.md's *concat<mode><dwi>3_? patterns, that call
i386-expand.cc's split_double_concat, should resolve/improve the issue. 
Specifically in this case *concatditi3_[1234].  Currently, these patterns
expect the source operands (specifically the zero_extend operand) to be
registers, hence combine reports:

Trying 10, 9 -> 11:
   10: r96:TI=zero_extend([r92:DI+0x8])
      REG_DEAD r92:DI
    9: {r95:TI=r94:TI<<0x40;clobber flags:CC;}
      REG_DEAD r94:TI
      REG_UNUSED flags:CC
   11: {r97:TI=r95:TI|r96:TI;clobber flags:CC;}
      REG_DEAD r96:TI
      REG_DEAD r95:TI
      REG_UNUSED flags:CC
...
Failed to match this instruction:
(set (reg:TI 97)
    (ior:TI (ashift:TI (reg:TI 94 [ *y_3(D) ])
            (const_int 64 [0x40]))
        (zero_extend:TI (mem:DI (plus:DI (reg/v/f:DI 92 [ yD.1994 ])
                    (const_int 8 [0x8])) [1 MEM[(const long long unsigned
intD.1
6 *)y_3(D) + 8B]+0 S8 A64]))))

If *concatditi3_1 allowed a memory_operand for operand 3, it would match.
The oversight is that the zero_extendditi2 pattern (in insn #10) accepts memory
operands.  I suspect changing the predicates for operands 0, 1 and 3 to be
nonimmediate_operand, but then providing constraints for each permissible
variation should work.

Using the <dwi> expansion of these splitters should also fix the (pre-existing)
-m32 code generation issue, as pointed out by Jakub in comment #3.

Perhaps:
(define_insn_and_split "*concat<mode><dwi>3_1"
  [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r,r")
        (any_or_plus:<DWI>
          (ashift:<DWI> (match_operand:<DWI> 1 "nonimmediate_operand" "r,o,r")
                        (match_operand:<DWI> 2 "const_int_operand"))
          (zero_extend:<DWI> (match_operand:DWIH 3 "nonimmediate_operand"
"r,r,o"))))]

I hope this helps.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/107627] [13 Regression] int128_t shift generates extra xor/or.
  2022-11-11  2:06 [Bug target/107627] New: [13] Regression int128_t shift generates extra xor/or crazylht at gmail dot com
                   ` (5 preceding siblings ...)
  2022-11-25 23:40 ` roger at nextmovesoftware dot com
@ 2022-11-29 19:12 ` jakub at gcc dot gnu.org
  2022-11-29 19:55 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-11-29 19:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Roger Sayle from comment #4)
> Perhaps:
> (define_insn_and_split "*concat<mode><dwi>3_1"
>   [(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r,r")
>         (any_or_plus:<DWI>
>           (ashift:<DWI> (match_operand:<DWI> 1 "nonimmediate_operand"
> "r,o,r")
>                         (match_operand:<DWI> 2 "const_int_operand"))
>           (zero_extend:<DWI> (match_operand:DWIH 3 "nonimmediate_operand"
> "r,r,o"))))]
> 
> I hope this helps.

You're probably right, but:
1) I don't see why operands[1] or operands[3] can't be both memory operands
2) I don't see why they should use o constraint, while for operands[0] we need
   it to be offsettable, because we need to refer to both of its halves, but
   for the others we just refer to those memories or their low parts (which on
   little endian comes first).  Though, maybe it isn't a good idea to narrow
   128-bit or 64-bit loads to just 64-bit or 32-bit loads of their low half,
   the former might trap while the latter might succeed; so maybe limit
   nonimmediate_operand on inputs to zero_extend operands?
On the other side, split_double_concat needs to take into account when the
destination
and/or source are MEMs and some registers are used in their addresses.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/107627] [13 Regression] int128_t shift generates extra xor/or.
  2022-11-11  2:06 [Bug target/107627] New: [13] Regression int128_t shift generates extra xor/or crazylht at gmail dot com
                   ` (6 preceding siblings ...)
  2022-11-29 19:12 ` jakub at gcc dot gnu.org
@ 2022-11-29 19:55 ` jakub at gcc dot gnu.org
  2022-11-30 13:03 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-11-29 19:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #5)
> On the other side, split_double_concat needs to take into account when the
> destination
> and/or source are MEMs and some registers are used in their addresses.

For the r <- m, m alternative I guess using &r <- m, m would be fine, but for
the case
where one input is a register and one memory &r <- m, r or &r <- r, m might be
too penalizing.  If both halves of the destination register are used in m
address, I guess there is always an option to load from the memory into one of
the registers first and in the case where the other input is the other half, do
the swap afterwards?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/107627] [13 Regression] int128_t shift generates extra xor/or.
  2022-11-11  2:06 [Bug target/107627] New: [13] Regression int128_t shift generates extra xor/or crazylht at gmail dot com
                   ` (7 preceding siblings ...)
  2022-11-29 19:55 ` jakub at gcc dot gnu.org
@ 2022-11-30 13:03 ` jakub at gcc dot gnu.org
  2022-12-01  8:35 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-11-30 13:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 53990
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53990&action=edit
gcc13-pr107627.patch

Untested fix.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/107627] [13 Regression] int128_t shift generates extra xor/or.
  2022-11-11  2:06 [Bug target/107627] New: [13] Regression int128_t shift generates extra xor/or crazylht at gmail dot com
                   ` (8 preceding siblings ...)
  2022-11-30 13:03 ` jakub at gcc dot gnu.org
@ 2022-12-01  8:35 ` cvs-commit at gcc dot gnu.org
  2022-12-01  8:36 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-12-01  8:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:2c089640279614e34b8712bfb318a9d4fc8ac8fe

commit r13-4435-g2c089640279614e34b8712bfb318a9d4fc8ac8fe
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Dec 1 09:29:23 2022 +0100

    i386: Improve *concat<mode><dwi>3_{1,2,3,4} patterns [PR107627]

    On the first testcase we've regressed since 12 at -O2:
    -       movq    8(%rsi), %rax
    -       movq    %rdi, %r8
    -       movq    (%rsi), %rdi
    +       movq    (%rsi), %rax
    +       movq    8(%rsi), %r8
            movl    %edx, %ecx
    -       shrdq   %rdi, %rax
    -       movq    %rax, (%r8)
    +       xorl    %r9d, %r9d
    +       movq    %rax, %rdx
    +       xorl    %eax, %eax
    +       orq     %r8, %rax
    +       orq     %r9, %rdx
    +       shrdq   %rdx, %rax
    +       movq    %rax, (%rdi)
    On the second testcase we've emitted such terrible code
    with the useless xors and ors for a long time.
    For PR91681 the *concat<mode><dwi>3_{1,2,3,4} patterns have been added
    but they allow just register inputs and register or memory offsettable
    output.
    The following patch fixes this by allowing also memory inputs on those
    patterns, because the pattern is then split to 0-2 emit_move_insns or
    one xchg and those can handle loads from memory too just fine.
    So that we don't narrow memory loads (source has 128-bit (or for ia32
    64-bit) load and we would make 64-bit (or for ia32 32-bit) load out of it),
    register_operand -> nonmemory_operand change is done only for operands
    in zero_extend arguments.  o <- m, m or o <- m, r or o <- r, m alternatives
    aren't used, we'd lack registers to perform the moves.  But what is
    in addition to the current ro <- r, r supported are r <- m, r and r <- r, m
    (in that case we just need to be careful about corner cases, see what
    emit_move_insn we'd call and if we wouldn't clobber registers used in m's
    address before loading - split_double_concat handles that now) and
    &r <- m, m (in that case I think the early clobber is the easiest
solution).

    The first testcase then on 12 -> patched trunk at -O2 changes:
    -       movq    8(%rsi), %rax
    -       movq    %rdi, %r8
    -       movq    (%rsi), %rdi
    +       movq    8(%rsi), %r9
    +       movq    (%rsi), %r10
            movl    %edx, %ecx
    -       shrdq   %rdi, %rax
    -       movq    %rax, (%r8)
    +       movq    %r9, %rax
    +       shrdq   %r10, %rax
    +       movq    %rax, (%rdi)
    so same amount of instructions and second testcase 12 -> patched trunk
    at -O2 -m32:
    -       pushl   %edi
    -       xorl    %edi, %edi
            pushl   %esi
    -       movl    16(%esp), %esi
    +       pushl   %ebx
    +       movl    16(%esp), %eax
            movl    20(%esp), %ecx
    -       movl    (%esi), %eax
    -       movl    4(%esi), %esi
    -       movl    %eax, %edx
    -       movl    $0, %eax
    -       orl     %edi, %edx
    -       orl     %esi, %eax
    -       shrdl   %edx, %eax
            movl    12(%esp), %edx
    +       movl    4(%eax), %ebx
    +       movl    (%eax), %esi
    +       movl    %ebx, %eax
    +       shrdl   %esi, %eax
            movl    %eax, (%edx)
    +       popl    %ebx
            popl    %esi
    -       popl    %edi

    BTW, I wonder if we couldn't add additional patterns which would catch
    the case where one of the operands is constant and how does this interact
    with the stv pass in 32-bit mode where I think stv is right after combine,
    so if we match these patterns, perhaps it would be nice to handle them
    in stv (unless they are handled there already).

    2022-12-01  Jakub Jelinek  <jakub@redhat.com>

            PR target/107627
            * config/i386/i386.md (*concat<mode><dwi>3_1,
*concat<mode><dwi>3_2):
            For operands which are zero_extend arguments allow memory if
            output operand is a register.
            (*concat<mode><dwi>3_3, *concat<mode><dwi>3_4): Likewise.  If
            both input operands are memory, use early clobber on output
operand.
            * config/i386/i386-expand.cc (split_double_concat): Deal with
corner
            cases where one input is memory and the other is not and the
address
            of the memory input uses a register we'd overwrite before loading
            the memory into a register.

            * gcc.target/i386/pr107627-1.c: New test.
            * gcc.target/i386/pr107627-2.c: New test.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/107627] [13 Regression] int128_t shift generates extra xor/or.
  2022-11-11  2:06 [Bug target/107627] New: [13] Regression int128_t shift generates extra xor/or crazylht at gmail dot com
                   ` (9 preceding siblings ...)
  2022-12-01  8:35 ` cvs-commit at gcc dot gnu.org
@ 2022-12-01  8:36 ` jakub at gcc dot gnu.org
  2022-12-07 14:31 ` jakub at gcc dot gnu.org
  2022-12-08 13:57 ` cvs-commit at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-12-01  8:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/107627] [13 Regression] int128_t shift generates extra xor/or.
  2022-11-11  2:06 [Bug target/107627] New: [13] Regression int128_t shift generates extra xor/or crazylht at gmail dot com
                   ` (10 preceding siblings ...)
  2022-12-01  8:36 ` jakub at gcc dot gnu.org
@ 2022-12-07 14:31 ` jakub at gcc dot gnu.org
  2022-12-08 13:57 ` cvs-commit at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-12-07 14:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 54034
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54034&action=edit
gcc13-pr107627-imm.patch

Untested patch to improve the case where one of the concat operand is constant.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/107627] [13 Regression] int128_t shift generates extra xor/or.
  2022-11-11  2:06 [Bug target/107627] New: [13] Regression int128_t shift generates extra xor/or crazylht at gmail dot com
                   ` (11 preceding siblings ...)
  2022-12-07 14:31 ` jakub at gcc dot gnu.org
@ 2022-12-08 13:57 ` cvs-commit at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-12-08 13:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107627

--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:1dc49df4eeaec311f19638861c64e90d7ec696e5

commit r13-4558-g1dc49df4eeaec311f19638861c64e90d7ec696e5
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Dec 8 14:55:46 2022 +0100

    i386: Add *concat<mode><dwi>3_{5,6,7} patterns [PR107627]

    On Thu, Dec 01, 2022 at 09:09:51AM +0100, Jakub Jelinek via Gcc-patches
wrote:
    > BTW, I wonder if we couldn't add additional patterns which would catch
    > the case where one of the operands is constant.

    The following patch does add those.
    The difference with the patch on the 2 testcases is:
     baz:
    -       movq    8(%rsi), %rax
    +       movq    8(%rsi), %rsi
    +       movq    %rdi, %r8
            movl    %edx, %ecx
    -       xorl    %r8d, %r8d
    -       xorl    %edx, %edx
    -       movabsq $-2401053089206453570, %r9
    -       orq     %r8, %rax
    -       orq     %r9, %rdx
    -       shrdq   %rdx, %rax
    -       movq    %rax, (%rdi)
    +       movabsq $-2401053089206453570, %rdi
    +       movq    %rsi, %rax
    +       shrdq   %rdi, %rax
    +       movq    %rax, (%r8)
     qux:
    -       movq    (%rsi), %rax
    +       movq    %rdi, %r8
    +       movq    (%rsi), %rdi
            movl    %edx, %ecx
    -       xorl    %r9d, %r9d
    -       movabsq $-2401053089206453570, %r8
    -       movq    %rax, %rdx
    -       xorl    %eax, %eax
    -       orq     %r8, %rax
    -       orq     %r9, %rdx
    -       shrdq   %rdx, %rax
    -       movq    %rax, (%rdi)
    +       movabsq $-2401053089206453570, %rsi
    +       movq    %rsi, %rax
    +       shrdq   %rdi, %rax
    +       movq    %rax, (%r8)
    and
     garply:
            pushl   %esi
    -       xorl    %edx, %edx
    +       movl    $-559038737, %esi
            pushl   %ebx
            movl    16(%esp), %eax
    -       orl     $-559038737, %edx
            movl    20(%esp), %ecx
    -       movl    4(%eax), %eax
    -       shrdl   %edx, %eax
            movl    12(%esp), %edx
    +       movl    4(%eax), %ebx
    +       movl    %ebx, %eax
    +       shrdl   %esi, %eax
     fred:
    ...
            movl    16(%esp), %eax
    +       movl    $-889275714, %ebx
            movl    20(%esp), %ecx
    -       movl    (%eax), %eax
    -       movl    %eax, %edx
    -       movl    $0, %eax
    -       orl     $-889275714, %eax
    -       shrdl   %edx, %eax
            movl    12(%esp), %edx
    +       movl    (%eax), %esi
    +       movl    %ebx, %eax
    +       shrdl   %esi, %eax

    2022-12-08  Jakub Jelinek  <jakub@redhat.com>

            PR target/107627
            * config/i386/i386.md (HALF, half): New mode attributes.
            (*concat<half><mode>3_5, *concat<mode><dwi>3_6,
            *concat<mode><dwi>3_7): New define_insn_and_split patterns.

            * gcc.target/i386/pr107627-3.c: New test.
            * gcc.target/i386/pr107627-4.c: New test.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-12-08 13:57 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-11  2:06 [Bug target/107627] New: [13] Regression int128_t shift generates extra xor/or crazylht at gmail dot com
2022-11-11  7:05 ` [Bug target/107627] " crazylht at gmail dot com
2022-11-11  7:09 ` [Bug target/107627] [13 Regression] " pinskia at gcc dot gnu.org
2022-11-11  9:41 ` rguenth at gcc dot gnu.org
2022-11-21 10:34 ` marxin at gcc dot gnu.org
2022-11-22 12:52 ` jakub at gcc dot gnu.org
2022-11-25 23:40 ` roger at nextmovesoftware dot com
2022-11-29 19:12 ` jakub at gcc dot gnu.org
2022-11-29 19:55 ` jakub at gcc dot gnu.org
2022-11-30 13:03 ` jakub at gcc dot gnu.org
2022-12-01  8:35 ` cvs-commit at gcc dot gnu.org
2022-12-01  8:36 ` jakub at gcc dot gnu.org
2022-12-07 14:31 ` jakub at gcc dot gnu.org
2022-12-08 13:57 ` cvs-commit at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).