public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/59501] New: Vector Gather with GCC 4.9 2013-12-08 Snapshot
From: freddie at witherden dot org @ 2013-12-13 22:56 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

            Bug ID: 59501
           Summary: Vector Gather with GCC 4.9 2013-12-08 Snapshot
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: freddie at witherden dot org

Compiling the following snippet with the 2013-12-08 snapshot of 4.9:

    typedef double v4d __attribute__((vector_size(32)));

    v4d gather(double *base, unsigned *offt)
    {
        v4d tmp = { base[offt[0]], base[offt[1]], base[offt[2]], base[offt[3]] };
        return tmp;
    }

with flags: -std=c++11 -Ofast -march=core-avx2 emits the following ASM:

0000000000000000 <_Z6gatherPdPj>:
   0:   8b 16                   mov    (%rsi),%edx
   2:   4c 8d 54 24 08          lea    0x8(%rsp),%r10
   7:   48 83 e4 e0             and    $0xffffffffffffffe0,%rsp
   b:   44 8b 46 08             mov    0x8(%rsi),%r8d
   f:   41 ff 72 f8             pushq  -0x8(%r10)
  13:   55                      push   %rbp
  14:   8b 46 04                mov    0x4(%rsi),%eax
  17:   48 89 e5                mov    %rsp,%rbp
  1a:   8b 4e 0c                mov    0xc(%rsi),%ecx
  1d:   41 52                   push   %r10
  1f:   41 5a                   pop    %r10
  21:   c4 a1 7b 10 14 c7       vmovsd (%rdi,%r8,8),%xmm2
  27:   c5 fb 10 1c d7          vmovsd (%rdi,%rdx,8),%xmm3
  2c:   c5 e9 16 0c cf          vmovhpd (%rdi,%rcx,8),%xmm2,%xmm1
  31:   5d                      pop    %rbp
  32:   c5 e1 16 04 c7          vmovhpd (%rdi,%rax,8),%xmm3,%xmm0
  37:   c4 e3 7d 18 c1 01       vinsertf128 $0x1,%xmm1,%ymm0,%ymm0
  3d:   49 8d 62 f8             lea    -0x8(%r10),%rsp
  41:   c3                      retq   

which appears to be a regression when compared with 4.8.2:

0000000000000000 <_Z6gatherPdPj>:
   0:   8b 16                   mov    (%rsi),%edx
   2:   44 8b 46 08             mov    0x8(%rsi),%r8d
   6:   8b 46 04                mov    0x4(%rsi),%eax
   9:   8b 4e 0c                mov    0xc(%rsi),%ecx
   c:   c5 fb 10 1c d7          vmovsd (%rdi,%rdx,8),%xmm3
  11:   c4 a1 7b 10 14 c7       vmovsd (%rdi,%r8,8),%xmm2
  17:   c5 e1 16 0c c7          vmovhpd (%rdi,%rax,8),%xmm3,%xmm1
  1c:   c5 e9 16 04 cf          vmovhpd (%rdi,%rcx,8),%xmm2,%xmm0
  21:   c4 e3 75 18 c0 01       vinsertf128 $0x1,%xmm0,%ymm1,%ymm0
  27:   c3                      retq
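Both listings build the vector with the same core pattern:
- two `vmovsd` loads fill lane 0 of two 128-bit registers;
- two `vmovhpd` loads fill lane 1;
- `vinsertf128 $0x1` joins the halves into one 256-bit register.

Only the surrounding prologue/epilogue differs. Below is a minimal sketch of the same lane-by-lane construction using SSE2 intrinsics. It assumes an x86-64 target, where SSE2 is part of the baseline ISA. The helper name `gather4_sse2` is hypothetical. The final `vinsertf128` step is replaced by storing the two halves to memory, so no AVX support is needed to run it.

```c
#include <emmintrin.h> /* SSE2: _mm_load_sd, _mm_loadh_pd, _mm_storeu_pd */

/* Hypothetical helper: computes out[i] = base[offt[i]] for i = 0..3 using
   the same load sequence as the 4.8.2 listing (x86-64 / SSE2 assumed). */
static void gather4_sse2(const double *base, const unsigned *offt, double out[4])
{
    /* vmovsd: base[offt[0]] into lane 0, lane 1 zeroed */
    __m128d lo = _mm_load_sd(&base[offt[0]]);
    /* vmovhpd: base[offt[1]] into lane 1, lane 0 kept */
    lo = _mm_loadh_pd(lo, &base[offt[1]]);

    /* Same pattern for the upper pair of elements. */
    __m128d hi = _mm_load_sd(&base[offt[2]]);
    hi = _mm_loadh_pd(hi, &base[offt[3]]);

    /* With AVX, vinsertf128 $1 would place `hi` in the upper 128 bits of a
       ymm register whose lower half is `lo`; here both halves are stored. */
    _mm_storeu_pd(&out[0], lo);
    _mm_storeu_pd(&out[2], hi);
}
```

In both listings the `unsigned` offsets are loaded with 32-bit `mov` instructions, which zero-extend into the full 64-bit register before the `,8` scaling in the addressing mode. That matches `base[offt[i]]` for unsigned indices.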



* [Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot
From: rguenth at gcc dot gnu.org @ 2013-12-19 13:31 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
      Known to work|                            |4.8.2
   Target Milestone|---                         |4.9.0
            Summary|Vector Gather with GCC 4.9  |[4.9 Regression] Vector
                   |2013-12-08 Snapshot         |Gather with GCC 4.9
                   |                            |2013-12-08 Snapshot



* [Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot
From: jakub at gcc dot gnu.org @ 2013-12-19 18:09 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-12-19
                 CC|                            |hjl at gcc dot gnu.org,
                   |                            |hubicka at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
This regressed with r203171.  Before that change, -maccumulate-outgoing-args
was true, but now it isn't.  The difference I see in the RTL dumps is a (dead)
load from the r10 register into a pseudo, present from expand up to the jump
pass; apart from different insn numbers the RTL is then pretty much the same
until pro_and_epilogue, which creates all the garbage.
The reason the load from r10 is created, and presumably also the reason for
the different pro_and_epilogue behavior, is ix86_get_drap_rtx:
  if (ix86_force_drap || !ACCUMULATE_OUTGOING_ARGS)
    crtl->need_drap = true;
But in the function in question, LRA has not spilled anything to the stack, the
stack actually isn't used at all, and the drap reg isn't live at the start of
the function either (that would be another reason to emit some setting of the
drap reg, but probably wouldn't require dynamically realigning the stack).
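The extra code in the 4.9 listing from the report is a DRAP-based dynamic stack realignment:
- `lea 0x8(%rsp),%r10` saves a pointer to the incoming stack arguments in r10;
- `and $0xffffffffffffffe0,%rsp` rounds rsp down to a 32-byte boundary;
- `lea -0x8(%r10),%rsp` in the epilogue restores the original rsp so `retq` finds the return address.

Below is a minimal sketch of just that address arithmetic. The names `align_down`, `drap_model` and `model_drap_frame` are hypothetical. The intermediate push/pop instructions are omitted, and this is not GCC's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of align (a power of two). For align == 32,
   ~(align - 1) == 0xffffffffffffffe0, the mask in the 4.9 prologue. */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}

/* Hypothetical record of the three address computations. */
struct drap_model {
    uint64_t drap;          /* r10 after: lea 0x8(%rsp),%r10 */
    uint64_t realigned_sp;  /* rsp after: and $0xffffffffffffffe0,%rsp */
    uint64_t restored_sp;   /* rsp after: lea -0x8(%r10),%rsp */
};

static struct drap_model model_drap_frame(uint64_t entry_sp)
{
    struct drap_model m;
    m.drap = entry_sp + 8;                      /* just above the return address */
    m.realigned_sp = align_down(entry_sp, 32);  /* 32-byte aligned for ymm spills */
    m.restored_sp = m.drap - 8;                 /* equals entry_sp again */
    return m;
}
```

None of this work is needed when, as here, nothing is spilled and the stack is never touched. That is the point of the comment above.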



* [Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot
From: hjl.tools at gmail dot com @ 2013-12-19 18:31 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Jakub Jelinek from comment #1)
>
>   if (ix86_force_drap || !ACCUMULATE_OUTGOING_ARGS)
>     crtl->need_drap = true;

They are needed for -m32.  Without them, we get:

FAIL: g++.dg/torture/stackalign/eh-fastcall-1.C  -Os -fpic execution test
FAIL: g++.dg/torture/stackalign/eh-global-1.C  -Os -fpic execution test
FAIL: g++.dg/torture/stackalign/eh-inline-1.C  -Os -fpic execution test
FAIL: g++.dg/torture/stackalign/eh-thiscall-1.C  -Os -fpic execution test



* [Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot
From: jakub at gcc dot gnu.org @ 2013-12-19 18:39 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #2)
> (In reply to Jakub Jelinek from comment #1)
> >
> >   if (ix86_force_drap || !ACCUMULATE_OUTGOING_ARGS)
> >     crtl->need_drap = true;
> 
> They are needed for -m32.  Otherwise, we got
> 
> FAIL: g++.dg/torture/stackalign/eh-fastcall-1.C  -Os -fpic execution test
> FAIL: g++.dg/torture/stackalign/eh-global-1.C  -Os -fpic execution test
> FAIL: g++.dg/torture/stackalign/eh-inline-1.C  -Os -fpic execution test
> FAIL: g++.dg/torture/stackalign/eh-thiscall-1.C  -Os -fpic execution test

I'm not saying that ix86_get_drap_rtx should be changed.
But perhaps:
  /* If the only reason for frame_pointer_needed is that we conservatively
     assumed stack realignment might be needed, but in the end nothing that
     needed the stack alignment had been spilled, clear frame_pointer_needed
     and say we don't need stack realignment.  */
  if (stack_realign
      && !crtl->need_drap
      && frame_pointer_needed
      && crtl->is_leaf
      && flag_omit_frame_pointer
      && crtl->sp_is_unchanging
      && !ix86_current_function_calls_tls_descriptor
      && !crtl->accesses_prior_frames
      && !cfun->calls_alloca
      && !crtl->calls_eh_return
      && !(flag_stack_check && STACK_CHECK_MOVING_SP)
      && !ix86_frame_pointer_required ()
      && get_frame_size () == 0
      && ix86_nsaved_sseregs () == 0
      && ix86_varargs_gpr_size + ix86_varargs_fpr_size == 0)
in ix86_finalize_stack_realign_flags could be tweaked so that it doesn't always
bail out when crtl->need_drap is set, because need_drap is now set for pretty
much all leaf functions.  I wonder if we can, e.g., ask DF whether the drap reg
is live at entry; if it isn't, presumably we can clear crtl->need_drap or ignore
it for this purpose.  Also, I wonder whether, even if we actually need the drap
register, we could for leaf functions just avoid the dynamic realignment and
simply let the prologue set the drap reg to the right value.



* [Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot
From: hjl.tools at gmail dot com @ 2013-12-19 18:43 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Jakub Jelinek from comment #3)
> 
> I'm not saying that ix86_get_drap_rtx should be changed.
> But perhaps:
>   /* If the only reason for frame_pointer_needed is that we conservatively
>      assumed stack realignment might be needed, but in the end nothing that
>      needed the stack alignment had been spilled, clear frame_pointer_needed
>      and say we don't need stack realignment.  */
>   if (stack_realign
>       && !crtl->need_drap
>       && frame_pointer_needed
>       && crtl->is_leaf
>       && flag_omit_frame_pointer
>       && crtl->sp_is_unchanging
>       && !ix86_current_function_calls_tls_descriptor
>       && !crtl->accesses_prior_frames
>       && !cfun->calls_alloca
>       && !crtl->calls_eh_return
>       && !(flag_stack_check && STACK_CHECK_MOVING_SP)
>       && !ix86_frame_pointer_required ()
>       && get_frame_size () == 0
>       && ix86_nsaved_sseregs () == 0
>       && ix86_varargs_gpr_size + ix86_varargs_fpr_size == 0)
> in ix86_finalize_stack_realign_flags could be tweaked, not to bail out
> always if we have !crtl->need_drap, because then it will be set pretty much
> for all leaf functions.  I wonder if we can e.g. ask DF whether the drap reg
> is live at entry, if it isn't live, supposedly we can clear crtl->need_drap
> or ignore it
> for this purpose?  Also, I wonder even if we actually need the drap register
> we can't for the leaf functions just avoid the dynamic realignment and
> simply let the prologue set the drap reg to the right value.

It sounds like a good idea.  BTW, I think we have very decent drap
coverage in the gcc testsuite, as long as both -m32 and -m64 are
tested.



* [Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot
From: jakub at gcc dot gnu.org @ 2013-12-30  8:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Author: jakub
Date: Mon Dec 30 08:53:10 2013
New Revision: 206243

URL: http://gcc.gnu.org/viewcvs?rev=206243&root=gcc&view=rev
Log:
    PR target/59501
    * config/i386/i386.c (ix86_save_reg): Don't return true for drap_reg
    if !crtl->stack_realign_needed.
    (ix86_finalize_stack_realign_flags): If drap_reg isn't live on entry
    and stack_realign_needed will be false, clear drap_reg and need_drap.
    Optimize leaf functions that don't need stack frame even if
    crtl->need_drap.

    * gcc.target/i386/pr59501-1.c: New test.
    * gcc.target/i386/pr59501-1a.c: New test.
    * gcc.target/i386/pr59501-2.c: New test.
    * gcc.target/i386/pr59501-2a.c: New test.
    * gcc.target/i386/pr59501-3.c: New test.
    * gcc.target/i386/pr59501-3a.c: New test.
    * gcc.target/i386/pr59501-4.c: New test.
    * gcc.target/i386/pr59501-4a.c: New test.
    * gcc.target/i386/pr59501-5.c: New test.
    * gcc.target/i386/pr59501-6.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr59501-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-1a.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-2.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-2a.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-3.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-3a.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-4.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-4a.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-5.c
    trunk/gcc/testsuite/gcc.target/i386/pr59501-6.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/testsuite/ChangeLog
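The i386.c part of the ChangeLog entry above can be read as a change to a decision rule. Below is a toy model in C, assuming a hypothetical `fn_facts` struct whose fields loosely mirror the crtl flags quoted in comment #3. It is not GCC's data structure and it omits most of the real conditions. It only contrasts the old "any need_drap blocks the shortcut" rule with the behavior the log describes.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the state tested in
   ix86_finalize_stack_realign_flags. */
struct fn_facts {
    bool is_leaf;
    bool sp_is_unchanging;
    long frame_size;          /* cf. get_frame_size () */
    int  nsaved_sseregs;      /* cf. ix86_nsaved_sseregs () */
    bool need_drap;           /* e.g. set when !ACCUMULATE_OUTGOING_ARGS */
    bool drap_live_on_entry;  /* the fix asks dataflow (DF) for this */
};

/* Leaf function that needs no stack frame (simplified). */
static bool leaf_without_frame(const struct fn_facts *f)
{
    return f->is_leaf && f->sp_is_unchanging
        && f->frame_size == 0 && f->nsaved_sseregs == 0;
}

/* Before r206243: need_drap alone blocked skipping the realignment. */
static bool skip_realign_before(const struct fn_facts *f)
{
    return !f->need_drap && leaf_without_frame(f);
}

/* After r206243 (per the log): such leaf functions are optimized even if
   need_drap is set. */
static bool skip_realign_after(const struct fn_facts *f)
{
    return leaf_without_frame(f);
}

/* After r206243 (per the log): the drap reg is kept only if it is live on
   entry or stack realignment is still needed. */
static bool keep_drap_after(const struct fn_facts *f, bool realign_needed)
{
    return f->need_drap && (f->drap_live_on_entry || realign_needed);
}
```

The `gather` function from the report is the case the old rule mishandled: a leaf with no frame, need_drap set, and the drap reg dead on entry.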



* [Bug tree-optimization/59501] [4.9 Regression] Vector Gather with GCC 4.9 2013-12-08 Snapshot
From: jakub at gcc dot gnu.org @ 2013-12-30  9:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59501

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed.

