public inbox for gcc-bugs@sourceware.org
* [Bug target/115487] New: -march=cascadelake causes spilling
@ 2024-06-14  9:31 rguenth at gcc dot gnu.org
  2024-06-14  9:42 ` [Bug target/115487] " rguenth at gcc dot gnu.org
  2024-06-14  9:59 ` rguenth at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-14  9:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115487

            Bug ID: 115487
           Summary: -march=cascadelake causes spilling
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

While investigating the report that gcc.target/i386/vect-strided-3.c FAILs
with -march=cascadelake, I ended up using -mno-sse4 instead of just -mno-avx
so that movhps is used rather than pextrq.

But then adding -m32 produces

        movq    %xmm3, 8(%esp)
        movd    12(%esp), %xmm1
        movd    8(%esp), %xmm0
..
        punpckldq       %xmm1, %xmm0

but only with -march=cascadelake, not with -march=x86-64.  So this confuses
the -march=cascadelake -m32 test result.  The same happens with -march=znver2,
but not with -mavx2 -mno-sse4.

diff --git a/gcc/testsuite/gcc.target/i386/vect-strided-3.c b/gcc/testsuite/gcc.target/i386/vect-strided-3.c
index b462701a0b2..f9c54a6f715 100644
--- a/gcc/testsuite/gcc.target/i386/vect-strided-3.c
+++ b/gcc/testsuite/gcc.target/i386/vect-strided-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -msse2 -mno-avx -fno-tree-slp-vectorize" } */
+/* { dg-options "-O2 -msse2 -mno-sse4 -fno-tree-slp-vectorize" } */

 void foo (int * __restrict a, int *b, int s)
 {


* [Bug target/115487] -march=cascadelake causes spilling
From: rguenth at gcc dot gnu.org @ 2024-06-14  9:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115487

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
It looks like it's STV2 doing

    28: r131:V4SI=[r102:SI]
    30: r132:V4SI=[r102:SI+0x10]
    33: r133:DI=vec_select(r131:V4SI#0,parallel)
-   34: [r103:SI]=r133:DI
+   92: r143:DI#0=vec_merge(vec_duplicate(r133:DI#0),const_vector,0x1)
+   93: r144:DI#0=vec_merge(vec_duplicate(r133:DI#4),const_vector,0x1)
+   94: r143:DI#0=vec_select(vec_concat(r143:DI#0,r144:DI#0),parallel)
+   89: r141:DI#0=vec_merge(vec_duplicate(r133:DI#0),const_vector,0x1)
+   90: r142:DI#0=vec_merge(vec_duplicate(r133:DI#4),const_vector,0x1)
+   91: r141:DI#0=vec_select(vec_concat(r141:DI#0,r142:DI#0),parallel)
+   34: [r103:SI]=r141:DI
    36: [r103:SI+0x8]=vec_select(r131:V4SI#0,parallel)
       REG_DEAD r131:V4SI
-   38: [r103:SI+0x10]=r133:DI
-      REG_DEAD r133:DI
+   38: [r103:SI+0x10]=r143:DI
+      REG_DEAD r143:DI

for some reason, which later causes spilling.

I'll also note that STV2 doesn't seem to recog any of the created insns.


* [Bug target/115487] -march=cascadelake causes spilling
From: rguenth at gcc dot gnu.org @ 2024-06-14  9:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115487

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Building chain #7...
  Adding insn 34 to chain #7
  r133 def in insn 33 isn't convertible
  Mark r133 def in insn 33 as requiring both modes in chain #7
Collected chain #7...
  insns: 34
  defs to convert: r133
Computing gain for chain #7...
  Instruction gain 8 for    34: [r103:SI]=r133:DI
  Instruction conversion gain: 8
  Registers conversion cost: 6
  Total gain: 2
Converting chain #7...
deferring rescan insn with uid = 89.
deferring rescan insn with uid = 90.
deferring rescan insn with uid = 91.
  Copied r133 to a vector register r141 for insn 33

The question is why we start the chain at

   33: r133:DI=vec_select(r131:V4SI#0,parallel)

rather than at

   28: r131:V4SI=[r102:SI]

or why we are working with SImode regs/vectors in the lowering.  The code
before STV2 looks perfectly OK, and the costs are most definitely off
here.
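
For reference, the arithmetic in the dump above can be sketched as follows.
This is only an illustration of how the printed numbers relate, not the
actual STV code: the helper names are hypothetical, and the rule that a
chain is converted whenever the total gain is positive is an assumption
inferred from the dump (gain 2, followed by "Converting chain #7").

```c
#include <stdbool.h>

/* Hypothetical helper, not GCC code: models the "Total gain" line of the
   dump as instruction conversion gain minus registers conversion cost.  */
static int
stv_total_gain (int insn_gain, int reg_conversion_cost)
{
  return insn_gain - reg_conversion_cost;
}

/* Hypothetical helper: assumes a chain is converted only when its total
   gain is positive, as the dump for chain #7 suggests.  */
static bool
stv_chain_is_profitable (int insn_gain, int reg_conversion_cost)
{
  return stv_total_gain (insn_gain, reg_conversion_cost) > 0;
}
```

With the values from chain #7 (instruction gain 8, registers conversion
cost 6) this gives a total gain of 2, so a single-insn chain is converted
even though the extra copies of r133 later lead to spilling.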

