[Bug rtl-optimization/48128] New: Excessive code generated for vectorized loop


* [Bug rtl-optimization/48128] New: Excessive code generated for vectorized loop
From: d.g.gorbachev at gmail dot com @ 2011-03-15  0:02 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48128

           Summary: Excessive code generated for vectorized loop
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: d.g.gorbachev@gmail.com
            Target: i686-*-*


Created attachment 23658
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23658
Testcase (compile with `-O3 -march=pentium3')



* [Bug rtl-optimization/48128] Excessive code generated for vectorized loop
From: pinskia at gcc dot gnu.org @ 2011-03-15  0:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48128

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> 2011-03-15 00:16:10 UTC ---
The problem is CSE: it is CSEing the address of baz, which confuses the register
allocator because there are not enough registers to work with on x86.



* [Bug rtl-optimization/48128] Excessive code generated for vectorized loop
From: d.g.gorbachev at gmail dot com @ 2011-03-15  2:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48128

--- Comment #2 from Dmitry Gorbachev <d.g.gorbachev at gmail dot com> 2011-03-15 00:57:30 UTC ---
When marking baz as static and compiling with -mno-sse, the result is even more
strange...



* [Bug rtl-optimization/48128] Excessive code generated for vectorized loop
From: rguenth at gcc dot gnu.org @ 2011-03-15 10:32 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48128

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|i686-*-*                    |i686-*-*, x86_64-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011.03.15 10:32:18
     Ever Confirmed|0                           |1

--- Comment #3 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-03-15 10:32:18 UTC ---
Confirmed.  The fun thing is that the tree level optimized code looks exactly
the same ...

On x86_64 we get

foo2:
.LFB1:
        .cfi_startproc
        movq    baz(%rip), %rdx
        movq    %rdx, -24(%rsp)
        movl    %edx, %eax
        movq    baz+8(%rip), %rdx
        movq    %rdx, -16(%rsp)
        movdqa  -24(%rsp), %xmm0
        movdqa  %xmm0, bar(%rip)
        movdqa  baz+16(%rip), %xmm0
        movdqa  %xmm0, bar+16(%rip)
        ret

so it spills everything to the stack here as well!?
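
The attachment itself is not reproduced in this thread. As a rough sketch only,
a loop of the following shape would be consistent with the assembly above: it
copies 32 bytes from baz to bar and returns the first element of baz. The names
foo2, bar and baz come from the listing; the element type, the array length and
the 32-byte alignment are assumptions inferred from the later RTL dump
(int[8], A256), not taken from the real testcase.

```c
/* Hypothetical reconstruction; the real attachment 23658 may differ. */
int bar[8] __attribute__((aligned(32)));
int baz[8] __attribute__((aligned(32)));

int foo2(void)
{
    int i;

    /* Eight int copies: a candidate for two 16-byte vector moves. */
    for (i = 0; i < 8; i++)
        bar[i] = baz[i];

    /* The scalar use of baz[0] is what ends up in %eax. */
    return baz[0];
}
```

Whether a given GCC release vectorizes this loop, or turns it into a memcpy,
depends on the version and flags discussed elsewhere in this thread.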



* [Bug rtl-optimization/48128] Excessive code generated for vectorized loop
From: d.g.gorbachev at gmail dot com @ 2014-06-19  5:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48128

--- Comment #4 from Dmitry Gorbachev <d.g.gorbachev at gmail dot com> ---
(In reply to comment #2)

> When marking baz as static and compiling with -mno-sse, the result is even
> more strange...

Still true for GCC 4.9.1 and 4.10.0.



* [Bug rtl-optimization/48128] Excessive code generated for vectorized loop
From: pinskia at gcc dot gnu.org @ 2021-08-27  5:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48128

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.8.0
             Status|NEW                         |RESOLVED
      Known to work|                            |4.8.0
           Keywords|                            |ra
         Resolution|---                         |FIXED

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The problem was this RTL:
(insn 7 2 25 2 (set (reg:V4SI 65 [ MEM[(int[8] *)&baz] ])
        (mem/c:V4SI (symbol_ref:SI ("baz") [flags 0x2]  <var_decl 0x7fc7df4971e0 baz>) [2 MEM[(int[8] *)&baz]+0 S16 A256])) /app/example.cpp:19 1080 {*movv4si_internal}
     (nil))

(insn 25 7 26 2 (set (reg:SI 72)
        (subreg:SI (reg:V4SI 65 [ MEM[(int[8] *)&baz] ]) 0)) /app/example.cpp:19 -1
     (nil))

Which was produced by dse.

In GCC 4.8 we produce the same, except that LRA produces:
(insn 25 7 26 2 (set (reg:SI 0 ax [72])
        (mem/c:SI (symbol_ref:SI ("baz") [flags 0x2]  <var_decl 0x7ffb6912d428 baz>) [2 MEM[(int[8] *)&baz]+0 S4 A256])) /app/example.cpp:19 89 {*movsi_internal}
     (nil))

So this got fixed with the new reload (LRA) :).
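
As a source-level analogy only (not what the compiler does internally), the two
forms of insn 25 correspond to two ways of obtaining the first element: taking
lane 0 out of a vector value that was already loaded, or reading the scalar
straight from memory again. The sketch below uses the GCC/Clang vector_size
extension; the type name v4si, both function names and the initial values of
baz are made up for illustration.

```c
#include <string.h>

/* GCC/Clang vector extension: four ints in one 16-byte value. */
typedef int v4si __attribute__((vector_size(16)));

/* Illustrative data; the values are arbitrary. */
int baz[8] __attribute__((aligned(32))) = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* Analogous to (subreg:SI (reg:V4SI 65) 0): lane 0 of the loaded vector. */
int first_from_vector(void)
{
    v4si v;
    memcpy(&v, baz, sizeof v);   /* load the first 16 bytes of baz */
    return v[0];
}

/* Analogous to the (mem/c:SI (symbol_ref "baz")) form: a plain scalar load. */
int first_from_memory(void)
{
    return baz[0];
}
```

Both functions return the same value; the difference discussed in this bug is
only in how the value reaches a general-purpose register.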


Note that -fno-tree-loop-distribute-patterns is needed; otherwise you get a
memcpy :).
With -mno-sse, the extra register push was gone in GCC 4.6.0.

GCC 8 also no longer vectorizes the code under the pentium3 cost model;
you need to add -fno-vect-cost-model.

Starting with GCC 9, GCC is able to produce this for the -msse2 case:
        movd    %xmm0, %eax

Anyway, the original issue is fixed.


With trunk at -O3 -m32 -msse, GCC produces:
foo2():
        movdqa  baz, %xmm7
        movd    %xmm7, %eax
        movaps  %xmm7, bar
        movdqa  baz+16, %xmm7
        movaps  %xmm7, bar+16
        ret

With -O3 -m32 -msse2:
foo2():
        movdqa  baz, %xmm7
        movd    %xmm7, %eax
        movaps  %xmm7, bar
        movdqa  baz+16, %xmm7
        movaps  %xmm7, bar+16
        ret

The problem is that -march=pentium3 causes a loop to be emitted for the memcpy
(tuning).
