[Bug rtl-optimization/65078] New: [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2

public inbox for gcc-bugs@sourceware.org

* [Bug rtl-optimization/65078] New: [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2
@ 2015-02-16 14:17 ysrumyan at gmail dot com
  2015-02-16 14:19 ` [Bug rtl-optimization/65078] " ysrumyan at gmail dot com
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: ysrumyan at gmail dot com @ 2015-02-16 14:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078

            Bug ID: 65078
           Summary: [5.0 Regression] 4.9 and 5.0 generate more spill-fill
                    in comparison with 4.8.2
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ysrumyan at gmail dot com

Using the attached simple test case, extracted from a codec, we found that the
4.8.2 compiler generates more compact binaries than the 4.9 and 5.0 compilers
(options "-O3 -msse2 -m32" were used). For example:

grep -c "%esp" t1.4.8.2.s                                                       
25
grep -c "%esp" t1.trunk.s                                                       
75



* [Bug rtl-optimization/65078] [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2
  2015-02-16 14:17 [Bug rtl-optimization/65078] New: [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2 ysrumyan at gmail dot com
@ 2015-02-16 14:19 ` ysrumyan at gmail dot com
  2015-02-16 14:38 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: ysrumyan at gmail dot com @ 2015-02-16 14:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078

--- Comment #1 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
Created attachment 34782
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34782&action=edit
test-case to reproduce

Options -m32 -msse2 -O3 must be used.



* [Bug rtl-optimization/65078] [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2
  2015-02-16 14:17 [Bug rtl-optimization/65078] New: [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2 ysrumyan at gmail dot com
  2015-02-16 14:19 ` [Bug rtl-optimization/65078] " ysrumyan at gmail dot com
@ 2015-02-16 14:38 ` rguenth at gcc dot gnu.org
  2015-02-16 14:40 ` ubizjak at gmail dot com
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-16 14:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-02-16
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.

4.8 has

  _62 = MEM[(__m64 * {ref-all})dest_284];
  _63 = VIEW_CONVERT_EXPR<long long int>(_62);
  _64 = {_63, 0};
  _65 = VIEW_CONVERT_EXPR<vector(16) char>(_64);
  _66 = __builtin_ia32_punpcklbw128 (_65, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0 });
  _67 = VIEW_CONVERT_EXPR<__m128i>(_66);
  _68 = VIEW_CONVERT_EXPR<vector(8) short int>(_67);
  _70 = __builtin_ia32_paddw128 (pretmp_327, _68);
  _71 = __builtin_ia32_packuswb128 (_70, _70);
  _72 = VIEW_CONVERT_EXPR<__m128i>(_71);
  _73 = __builtin_ia32_vec_ext_v2di (_72, 0);
  MEM[(long long int *)dest_284] = _73;

while 5

  _79 = MEM[(__m64 * {ref-all})dest_268];
  _78 = VIEW_CONVERT_EXPR<long long int>(_79);
  _77 = {_78, 0};
  _74 = VIEW_CONVERT_EXPR<vector(16) char>(_77);
  _73 = __builtin_ia32_punpcklbw128 (_74, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0 });
  _69 = VIEW_CONVERT_EXPR<vector(8) short unsigned int>(_73);
  _68 = _69 + pretmp_312;
  _67 = VIEW_CONVERT_EXPR<vector(8) short int>(_68);
  _64 = __builtin_ia32_packuswb128 (_67, _67);
  _63 = VIEW_CONVERT_EXPR<__m128i>(_64);
  _62 = BIT_FIELD_REF <_63, 64, 0>;
  MEM[(long long int *)dest_268] = _62;

so some intrinsics are no longer builtins.  But the real difference is
the following weird store sequence

        packuswb        %xmm1, %xmm2
        movaps  %xmm2, (%esp)
        movl    (%esp), %esi
        movl    4(%esp), %edi
        movl    %esi, (%eax)
        movl    %edi, 4(%eax)

compared to just

        packuswb        %xmm1, %xmm1
        movq    %xmm1, (%edx)
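
For orientation, the GIMPLE above corresponds to a load / widen / add / pack / store sequence on eight pixels. The attached test case is not reproduced in this thread, so the following is only a minimal sketch of that shape; `add_residual_8` and `check_add_residual_8` are hypothetical names, and the real codec loop may differ.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the attached test case): adds eight 16-bit
   residuals to the eight unsigned bytes at dest, saturating to 0..255.  */
static void
add_residual_8 (uint8_t *dest, __m128i residual)
{
  __m128i zero = _mm_setzero_si128 ();
  /* 64-bit load of the low half.  */
  __m128i pix = _mm_loadl_epi64 ((const __m128i *) dest);
  /* punpcklbw: widen the bytes to 16-bit lanes.  */
  pix = _mm_unpacklo_epi8 (pix, zero);
  /* paddw: add the residuals.  */
  pix = _mm_add_epi16 (pix, residual);
  /* packuswb: narrow back to bytes with unsigned saturation.  */
  pix = _mm_packus_epi16 (pix, pix);
  /* 64-bit store of the low half; this is the store discussed above.  */
  _mm_storel_epi64 ((__m128i *) dest, pix);
}

/* Self-check on one row of pixels; returns 0 on success.  */
int
check_add_residual_8 (void)
{
  uint8_t px[8] = { 0, 10, 100, 200, 250, 255, 1, 128 };
  const uint8_t want[8] = { 10, 20, 110, 210, 255, 255, 11, 138 };

  add_residual_8 (px, _mm_set1_epi16 (10));
  assert (memcmp (px, want, sizeof want) == 0);
  return 0;
}
```

Compiling a function like this with -O3 -msse2 -m32 and looking at the final store is one way to compare the two sequences shown above.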



* [Bug rtl-optimization/65078] [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2
  2015-02-16 14:17 [Bug rtl-optimization/65078] New: [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2 ysrumyan at gmail dot com
  2015-02-16 14:19 ` [Bug rtl-optimization/65078] " ysrumyan at gmail dot com
  2015-02-16 14:38 ` rguenth at gcc dot gnu.org
@ 2015-02-16 14:40 ` ubizjak at gmail dot com
  2015-02-16 14:44 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: ubizjak at gmail dot com @ 2015-02-16 14:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078

--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
Similar to PR21182 ?

As suggested in the above PR, does "-fschedule-insns -fsched-pressure" make any
difference?


* [Bug rtl-optimization/65078] [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2
From: rguenth at gcc dot gnu.org @ 2015-02-16 14:43 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra
             Target|                            |i?86-*-*
                 CC|                            |vmakarov at gcc dot gnu.org

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Seems LRA does a very bad job here for some reason.



* [Bug rtl-optimization/65078] [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2
  2015-02-16 14:17 [Bug rtl-optimization/65078] New: [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2 ysrumyan at gmail dot com
                   ` (3 preceding siblings ...)
  2015-02-16 14:44 ` rguenth at gcc dot gnu.org
@ 2015-02-16 14:44 ` jakub at gcc dot gnu.org
  2015-02-16 14:50 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-02-16 14:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Seems this has started with r216247, and indeed, compiling the testcase with
-std=gnu89 even with latest trunk results in those 25 %esp references, while
using -std=gnu11 even with r190000 results in 69 %esp references.



* [Bug rtl-optimization/65078] [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2
  2015-02-16 14:17 [Bug rtl-optimization/65078] New: [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2 ysrumyan at gmail dot com
                   ` (2 preceding siblings ...)
  2015-02-16 14:40 ` ubizjak at gmail dot com
@ 2015-02-16 14:44 ` rguenth at gcc dot gnu.org
  2015-02-16 14:44 ` jakub at gcc dot gnu.org
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-16 14:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #3)
> Similar to PR21182 ?
> 
> As suggested in the above PR, does "-fschedule-insns -fsched-pressure" make
> any difference?

No.


* [Bug ipa/64812] [4.9 regression] x86 LibreOffice Build failure: undefined reference to acquire
From: trippels at gcc dot gnu.org @ 2015-02-16 14:47 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64812

Markus Trippelsdorf <trippels at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|i?86-*-*                    |
             Status|WAITING                     |NEW
                 CC|                            |hubicka at gcc dot gnu.org
          Component|regression                  |ipa
      Known to fail|                            |5.0

--- Comment #7 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
markus@x4 tmp % cat fmgridif.ii
template <class T> class A
{
  T *p;

public:
  A (T *p1) : p (p1) { p->acquire (); }
};

class B
{
public:
  virtual void acquire ();
};
class D : public B
{
};
class F : B
{
  int mrContext;
};
class WindowListenerMultiplexer : F, public D
{
  void
  acquire ()
  {
    acquire ();
  }
};
class C
{
  void createPeer () throw ();
  WindowListenerMultiplexer maWindowListeners;
};
class FmXGridPeer
{
public:
  void addWindowListener (A<D>);
} a;
void
C::createPeer () throw ()
{
  a.addWindowListener (&maWindowListeners);
}

markus@x4 tmp % g++ -Os -c fmgridif.ii && nm --demangle fmgridif.o | grep
WindowListenerMultiplexer
                 U non-virtual thunk to WindowListenerMultiplexer::acquire()

markus@x4 tmp % g++ -O2 -c fmgridif.ii && nm --demangle fmgridif.o | grep
WindowListenerMultiplexer
0000000000000000 W WindowListenerMultiplexer::acquire()
0000000000000010 W non-virtual thunk to WindowListenerMultiplexer::acquire()

Not sure if the devirtualization is valid. Honza?



* [Bug rtl-optimization/65078] [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2
  2015-02-16 14:17 [Bug rtl-optimization/65078] New: [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2 ysrumyan at gmail dot com
                   ` (4 preceding siblings ...)
  2015-02-16 14:44 ` jakub at gcc dot gnu.org
@ 2015-02-16 14:50 ` jakub at gcc dot gnu.org
  2015-02-18 17:15 ` [Bug rtl-optimization/65078] [5 " law at redhat dot com
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-02-16 14:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Ah, if I hack up the preprocessed source, so that some functions like atoi,
gnu_dev_major etc. are __attribute__((__gnu_inline__)), I get 25 %esp
references both with latest trunk and 4.8.  So, I really can't reproduce...
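
As background on why the attribute matters: under gnu89 rules (or with `__gnu_inline__` in any mode) the body of an `extern inline` function is used only for inlining and no out-of-line copy is emitted from it, whereas under C99/gnu11 rules an `extern inline` definition is an ordinary external definition. A minimal sketch, with a hypothetical function `twice` standing in for header functions such as atoi:

```c
#include <assert.h>

/* Header-style definition: with gnu_inline this body is only a candidate
   for inlining; the compiler does not emit a standalone copy from it.  */
extern inline __attribute__ ((__gnu_inline__)) int
twice (int x)
{
  return 2 * x;
}

/* Out-of-line definition, which would normally live in a library.  GCC
   accepts it after the gnu_inline definition above; calls that are not
   inlined end up here.  */
int
twice (int x)
{
  return 2 * x;
}

/* Self-check; returns 0 on success.  */
int
check_twice (void)
{
  assert (twice (21) == 42);
  return 0;
}
```

How this changes the generated code for the test case is not spelled out in the thread; the sketch only illustrates the semantics of the attribute.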



* [Bug rtl-optimization/65078] [5 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2
  2015-02-16 14:17 [Bug rtl-optimization/65078] New: [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2 ysrumyan at gmail dot com
                   ` (5 preceding siblings ...)
  2015-02-16 14:50 ` jakub at gcc dot gnu.org
@ 2015-02-18 17:15 ` law at redhat dot com
  2015-03-17 10:33 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: law at redhat dot com @ 2015-02-18 17:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078

Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |5.0



* [Bug rtl-optimization/65078] [5 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2
  2015-02-16 14:17 [Bug rtl-optimization/65078] New: [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2 ysrumyan at gmail dot com
                   ` (6 preceding siblings ...)
  2015-02-18 17:15 ` [Bug rtl-optimization/65078] [5 " law at redhat dot com
@ 2015-03-17 10:33 ` jakub at gcc dot gnu.org
  2015-03-17 11:05 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-03-17 10:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |glisse at gcc dot gnu.org

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Ah, I have managed to reproduce it, but only if it is preprocessed with each
compiler separately.  It seems it is the _mm_storel_epi64 change from r217608
that matters here: if I use the pre-r217608 content of that inline function, I
get 25 %esp references with all compilers I've tried, while with the r217608 or
later _mm_storel_epi64 I get 75 %esp references even with 4.8.
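
The two flavours of the store can be sketched roughly as follows. This is an approximation inferred from the GIMPLE dumps in this thread (builtin call versus BIT_FIELD_REF), not a copy of emmintrin.h; `storel_builtin`, `storel_subscript` and the `v2di` typedef are names made up for the illustration, and the builtin is specific to x86 targets with SSE2.

```c
#include <assert.h>

typedef long long v2di __attribute__ ((vector_size (16)));

/* Older style: extract element 0 through the x86 target builtin, which
   shows up as __builtin_ia32_vec_ext_v2di in the dumps.  */
static void
storel_builtin (long long *p, v2di b)
{
  *p = __builtin_ia32_vec_ext_v2di (b, 0);
}

/* Newer style (assumed): generic vector subscript, which the middle end
   represents as BIT_FIELD_REF <b, 64, 0>.  */
static void
storel_subscript (long long *p, v2di b)
{
  *p = b[0];
}

/* Both forms store the same value; only the generated code differs.
   Returns 0 on success.  */
int
check_storel (void)
{
  v2di v = { 0x1122334455667788LL, -1 };
  long long a = 0, b = 0;

  storel_builtin (&a, v);
  storel_subscript (&b, v);
  assert (a == 0x1122334455667788LL);
  assert (a == b);
  return 0;
}
```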



* [Bug rtl-optimization/65078] [5 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2
  2015-02-16 14:17 [Bug rtl-optimization/65078] New: [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2 ysrumyan at gmail dot com
                   ` (7 preceding siblings ...)
  2015-03-17 10:33 ` jakub at gcc dot gnu.org
@ 2015-03-17 11:05 ` jakub at gcc dot gnu.org
  2015-03-17 12:22 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-03-17 11:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, in *.optimized the changes are just 16 times a difference like:
-  _62 = __builtin_ia32_vec_ext_v2di (_63, 0);
+  _62 = BIT_FIELD_REF <_63, 64, 0>;
And during expansion, the difference is:
-;; _62 = __builtin_ia32_vec_ext_v2di (_63, 0);
-
-(insn 42 41 43 (set (reg:V2DI 329)
-        (subreg:V2DI (reg:V16QI 138 [ D.4823 ]) 0)) ./include/emmintrin.h:722 -1
-     (nil))
-
-(insn 43 42 44 (set (reg:DI 330)
-        (vec_select:DI (reg:V2DI 329)
-            (parallel [
-                    (const_int 0 [0])
-                ]))) ./include/emmintrin.h:722 -1
-     (nil))
-
-(insn 44 43 0 (set (reg:DI 136 [ D.4825 ])
-        (reg:DI 330)) ./include/emmintrin.h:722 -1
-      (nil))
-
-;; MEM[(long long int *)dest_268] = _62;
-
-(insn 45 44 0 (set (mem:DI (reg/v/f:SI 317 [ dest ]) [3 MEM[(long long int *)dest_268]+0 S8 A64])
-        (reg:DI 136 [ D.4825 ])) ./include/emmintrin.h:722 -1
-      (nil))
+;; MEM[(long long int *)dest_268] = _62;
+ 
+(insn 42 41 43 (set (reg:TI 329)
+        (subreg:TI (reg:V16QI 138 [ D.4825 ]) 0)) ./include/emmintrin.h:722 -1
+      (nil))
+(insn 43 42 0 (set (mem:DI (reg/v/f:SI 317 [ dest ]) [3 MEM[(long long int *)dest_268]+0 S8 A64])
+        (subreg:DI (reg:TI 329) 0)) ./include/emmintrin.h:722 -1
+      (nil))

With the new storel_epi64 we get the following before RA:
(insn 43 40 44 3 (set (mem:DI (reg/v/f:SI 317 [ dest ]) [3 MEM[(long long int *)dest_268]+0 S8 A64])
        (subreg:DI (reg:V16QI 328) 0)) ./include/emmintrin.h:722 89 {*movdi_internal}
     (expr_list:REG_DEAD (reg:V16QI 328)
        (nil)))
Not surprisingly, the RA reloads this by storing the V16QI pseudo 328 to the
stack and loading back a DImode value.  With the old intrinsic, before RA we
have:
(insn 45 43 46 3 (set (mem:DI (reg/v/f:SI 317 [ dest ]) [3 MEM[(long long int *)dest_268]+0 S8 A64])
        (vec_select:DI (subreg:V2DI (reg:V16QI 328) 0)
            (parallel [
                    (const_int 0 [0])
                ]))) ./include/emmintrin.h:722 3660 {*vec_extractv2di_0_sse}
     (expr_list:REG_DEAD (reg:V16QI 328)
        (nil)))
and nothing needs to be spilled.  Now the question is whether we can somehow
tell the RA (secondary reload) that, to get a DImode lowpart subreg (and SImode
too?) out of a vector register, it can use the *vec_extractv2di_0_sse
instruction.  Or should we add a !TARGET_64BIT pattern for storing a DImode
lowpart subreg of a vector register (any mode there?) into memory?  Or ensure
that the BIT_FIELD_REF is expanded as the builtin used to be?



* [Bug rtl-optimization/65078] [5 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2
  2015-02-16 14:17 [Bug rtl-optimization/65078] New: [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2 ysrumyan at gmail dot com
                   ` (8 preceding siblings ...)
  2015-03-17 11:05 ` jakub at gcc dot gnu.org
@ 2015-03-17 12:22 ` jakub at gcc dot gnu.org
  2015-03-17 15:13 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-03-17 12:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
During the expansion, we don't try vec_extract because we are trying to extract
low DImode (64bits) out of a V16QImode pseudo, which is not really vector
element extraction, and the middle end doesn't know that on this target it is
beneficial to just subreg the V16QImode pseudo to identically sized vector with
different sized elements (V2DImode in this case).

So, in order to handle this at the expansion level, we probably would need to
add some new optab like vec_extract that would be not just about the source
mode, but also target mode (conversion optab?), or some target hook or macro
that would instruct the middle-end to also try to subreg the vector mode to
identically sized other vector mode before trying vec_extract.

Immediately after the vec_extract check, we already convert the V16QImode to
TImode and force_reg it, so that is the last spot that can do something about
it during expansion.

To fix this up before reload, we have the option of either !reload_completed
splitter or some combiner pattern(s).

Here is a short testcase that shows (hopefully) optimal or near-optimal output
for f5-f8 and really bad code for f1-f4, both with -O2 -m64 and with
-O2 -msse2 -m32.

typedef unsigned char V __attribute__((vector_size (16)));
typedef unsigned long long W __attribute__((vector_size (16)));
typedef unsigned int T __attribute__((vector_size (16)));

void
f1 (unsigned long long *x, V y)
{
  *x = ((W)y)[0];
}

unsigned long long
f2 (V y)
{
  return ((W)y)[0];
}

void
f3 (unsigned int *x, V y)
{
  *x = ((T)y)[0];
}

unsigned int
f4 (V y)
{
  return ((T)y)[0];
}

void
f5 (unsigned long long *x, W y)
{
  *x = ((W)y)[0];
}

unsigned long long
f6 (W y)
{
  return ((W)y)[0];
}

void
f7 (unsigned int *x, T y)
{
  *x = ((T)y)[0];
}

unsigned int
f8 (T y)
{
  return ((T)y)[0];
}
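
The cast-based and direct variants in the testcase compute the same values; only the vector type seen by the expander differs. A small sketch that checks this, using the same typedefs but hypothetical function names (`low64_via_cast` and friends are not part of the testcase):

```c
#include <assert.h>

typedef unsigned char V __attribute__((vector_size (16)));
typedef unsigned long long W __attribute__((vector_size (16)));
typedef unsigned int T __attribute__((vector_size (16)));

/* Same shape as f2: reinterpret a byte vector, then take element 0.  */
static unsigned long long
low64_via_cast (V y)
{
  return ((W)y)[0];
}

/* Same shape as f6: element 0 of a vector that already has 64-bit lanes.  */
static unsigned long long
low64_direct (W y)
{
  return y[0];
}

/* Same shape as f4.  */
static unsigned int
low32_via_cast (V y)
{
  return ((T)y)[0];
}

/* Same shape as f8.  */
static unsigned int
low32_direct (T y)
{
  return y[0];
}

/* Returns 0 when both pairs agree on the same 16 bytes.  */
int
check_same_low_part (void)
{
  V bytes = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };

  assert (low64_via_cast (bytes) == low64_direct ((W) bytes));
  assert (low32_via_cast (bytes) == low32_direct ((T) bytes));
  return 0;
}
```

Since the results are identical, any difference between f1-f4 and f5-f8 in the assembly is purely a code generation matter.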


* [Bug rtl-optimization/65078] [5 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2
  2015-02-16 14:17 [Bug rtl-optimization/65078] New: [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2 ysrumyan at gmail dot com
                   ` (9 preceding siblings ...)
  2015-03-17 12:22 ` jakub at gcc dot gnu.org
@ 2015-03-17 15:13 ` jakub at gcc dot gnu.org
  2015-03-18 10:59 ` jakub at gcc dot gnu.org
  2015-03-18 11:12 ` jakub at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-03-17 15:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 35044
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35044&action=edit
gcc5-pr65078.patch

Untested fix using a pre-reload splitter of mov[sd]i if the source is SI/DImode
lowpart subreg of 16/32/64 byte vector register.



* [Bug rtl-optimization/65078] [5 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2
  2015-02-16 14:17 [Bug rtl-optimization/65078] New: [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2 ysrumyan at gmail dot com
                   ` (10 preceding siblings ...)
  2015-03-17 15:13 ` jakub at gcc dot gnu.org
@ 2015-03-18 10:59 ` jakub at gcc dot gnu.org
  2015-03-18 11:12 ` jakub at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-03-18 10:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Author: jakub
Date: Wed Mar 18 10:58:32 2015
New Revision: 221485

URL: https://gcc.gnu.org/viewcvs?rev=221485&root=gcc&view=rev
Log:
    PR target/65078
    * config/i386/sse.md (movsi/movdi -> vec_extract_*_0 splitter): New.

    * gcc.target/i386/pr65078-1.c: New test.
    * gcc.target/i386/pr65078-2.c: New test.
    * gcc.target/i386/pr65078-3.c: New test.
    * gcc.target/i386/pr65078-4.c: New test.
    * gcc.target/i386/pr65078-5.c: New test.
    * gcc.target/i386/pr65078-6.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr65078-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr65078-2.c
    trunk/gcc/testsuite/gcc.target/i386/pr65078-3.c
    trunk/gcc/testsuite/gcc.target/i386/pr65078-4.c
    trunk/gcc/testsuite/gcc.target/i386/pr65078-5.c
    trunk/gcc/testsuite/gcc.target/i386/pr65078-6.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog



* [Bug rtl-optimization/65078] [5 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2
  2015-02-16 14:17 [Bug rtl-optimization/65078] New: [5.0 Regression] 4.9 and 5.0 generate more spill-fill in comparison with 4.8.2 ysrumyan at gmail dot com
                   ` (11 preceding siblings ...)
  2015-03-18 10:59 ` jakub at gcc dot gnu.org
@ 2015-03-18 11:12 ` jakub at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-03-18 11:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #13 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Should be fixed now.



