From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-472086-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 13902 invoked by alias); 3 Jan 2015 01:18:30 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 13870 invoked by uid 48); 3 Jan 2015 01:18:23 -0000
From: "zoltan at hidvegi dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/64477] New: x86 sse unnecessary GPR spill
Date: Sat, 03 Jan 2015 01:18:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 4.9.2
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: zoltan at hidvegi dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter
Message-ID: <bug-64477-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-01/txt/msg00080.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64477

            Bug ID: 64477
           Summary: x86 sse unnecessary GPR spill
           Product: gcc
           Version: 4.9.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zoltan at hidvegi dot com

typedef signed char v16si __attribute__ ((vector_size (16)));
v16si ary(signed char a)
{
    return v16si{a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a};
}

Compiled with g++-4.9 -m64 -O2 -fomit-frame-pointer -Wall -I$HOME/dev/common
-mssse3 -std=gnu++11 -S xmm_test.C

I get

        pxor    %xmm1, %xmm1
        movd    %edi, %xmm0
        movl    %edi, -12(%rsp)
        pshufb  %xmm1, %xmm0
        ret

Note the unnecessary spill of %edi to the stack (the movl to -12(%rsp)). This
does not happen with gcc-4.8, so you may consider it a regression. I suspect
the compiler first plans to move the value from the GPR to the xmm register via
the stack and later optimizes that into a direct GPR-to-xmm move, but the stack
store is left behind.

With -march=corei7-avx and a vector of four 32-bit ints, gcc-4.9 stores the
value to the stack and then uses vbroadcastss, whereas gcc-4.8 uses movd
followed by pshufd  $0, %xmm1, %xmm0. Again the gcc-4.8 code seems better to
me. However, even gcc-4.8 goes through the stack in that case with
-mtune=generic.
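
The source for the four-int case is not included above. A minimal sketch of
what it could look like, assuming the same GNU vector extension as in the first
test case (the names v4si and splat4 are hypothetical and chosen only for
illustration):

```cpp
// Hypothetical reproducer for the 4 x int case; v4si and splat4 are
// illustrative names, not taken from the original test file.
typedef int v4si __attribute__ ((vector_size (16)));

// Broadcast one scalar into all four 32-bit lanes of the vector.
v4si splat4(int a)
{
    return v4si{a, a, a, a};
}
```

Compiling this with the command line above, using -march=corei7-avx in place of
-mssse3, should be a reasonable starting point for comparing the code generated
by the two compiler versions.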