public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/59835] New: [4.9 Regression] gcc.target/i386/sse-23.c/gcc.target/i386/sse-24.c timeout
@ 2014-01-15 22:01 hjl.tools at gmail dot com
  2014-01-16 16:04 ` [Bug rtl-optimization/59835] [4.9 Regression] gcc.target/i386/sse-2[34].c timeout jakub at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: hjl.tools at gmail dot com @ 2014-01-15 22:01 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59835

            Bug ID: 59835
           Summary: [4.9 Regression]
                    gcc.target/i386/sse-23.c/gcc.target/i386/sse-24.c
                    timeout
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hjl.tools at gmail dot com

On Linux/x86-64, revision 206638 gave

WARNING: program timed out.
FAIL: gcc.target/i386/sse-23.c (test for excess errors)
WARNING: program timed out.
FAIL: gcc.target/i386/sse-24.c (test for excess errors)

cc1 takes a huge amount of memory. Revision 206630 is OK.



* [Bug rtl-optimization/59835] [4.9 Regression] gcc.target/i386/sse-2[34].c timeout
  2014-01-15 22:01 [Bug rtl-optimization/59835] New: [4.9 Regression] gcc.target/i386/sse-23.c/gcc.target/i386/sse-24.c timeout hjl.tools at gmail dot com
@ 2014-01-16 16:04 ` jakub at gcc dot gnu.org
  2014-01-16 16:48 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: jakub at gcc dot gnu.org @ 2014-01-16 16:04 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59835

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
This boils down to -O2 -mavx512f
typedef int V __attribute__ ((vector_size (8)));

V
foo (char x)
{
  return (V) __builtin_ia32_vec_init_v8qi (x, x, x, x, x, x, x, x);
}

I think, except that for some weird reason this shorter testcase gives an
ICE:
pr59835.c: In function ‘foo’:
pr59835.c:7:1: internal compiler error: Max. number of generated reload insns per insn is achieved (90)
whereas with the larger one LRA just keeps iterating forever.
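
The testcase replicates one byte into all eight byte lanes of an 8-byte
vector. The following is a minimal C model of that value, assuming the splat
is built by repeated doubling. That expansion strategy is an assumption made
for illustration and is not stated in this thread, and the helper names are
hypothetical. The first step, (x << 8) | x, has the same shift-and-OR shape
as the kunpckhi pattern discussed later in the thread.

```c
#include <stdint.h>

/* Hypothetical model, not GCC code: each helper duplicates its input
   into the upper half of a value twice as wide. */
static uint16_t dup_byte(uint8_t x)
{
    /* Same shape as (ior:HI (ashift:HI x 8) (zero_extend:HI x)). */
    return (uint16_t)((x << 8) | x);
}

static uint32_t dup_half(uint16_t h)
{
    return ((uint32_t)h << 16) | h;
}

static uint64_t dup_word(uint32_t w)
{
    return ((uint64_t)w << 32) | w;
}

/* Bit pattern of an 8-byte vector whose lanes all hold the byte x. */
static uint64_t splat8(uint8_t x)
{
    return dup_word(dup_half(dup_byte(x)));
}
```

Only the bit pattern is modelled here; the signedness of the char argument
in the testcase does not affect it.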

From: jakub at gcc dot gnu.org @ 2014-01-16 16:08 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59835

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Ah, with -O2 -mavx512f -march=k8 it actually hangs.  So the short testcase is
enough to reproduce it.



* [Bug rtl-optimization/59835] [4.9 Regression] gcc.target/i386/sse-2[34].c timeout
  2014-01-15 22:01 [Bug rtl-optimization/59835] New: [4.9 Regression] gcc.target/i386/sse-23.c/gcc.target/i386/sse-24.c timeout hjl.tools at gmail dot com
  2014-01-16 16:04 ` [Bug rtl-optimization/59835] [4.9 Regression] gcc.target/i386/sse-2[34].c timeout jakub at gcc dot gnu.org
@ 2014-01-16 16:48 ` jakub at gcc dot gnu.org
  2014-01-16 17:16 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: jakub at gcc dot gnu.org @ 2014-01-16 16:48 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59835

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kyukhin at gcc dot gnu.org,
                   |                            |rth at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Regardless of whether there is an LRA bug or not, I'd say that
*k<logic><mode>, maybe kortestzhi and kortestchi, and definitely kunpckhi
look problematic to me.
As the testcase shows, kunpckhi can very well be matched by the combiner,
but the pattern doesn't have any GPR constraints.  If, as in this testcase,
the result isn't used as the mask of any AVX512F vector insn, and it isn't
the case that, say, the input is loaded from memory and the result immediately
stored into memory, then reloading it into mask registers and back can't be
cheap.  Can't the kunpckhi constraints be "=Yk,Q", "Yk,Q", "Yk,0", emitting
just "mov{b}\t{%1, %h0|%h0, %1}" in the GPR case (it could of course stay
limited to TARGET_AVX512F, as it is now)?
As for kortest[cz]hi, I don't know whether the combiner can actually match
them.  And for *k<logic><mode>, my issue with the pattern is that it doesn't
have (clobber CC), so in theory it could be matched pre-RA by something and
would then force RA to choose the mask registers over something perhaps
cheaper.
I wonder whether the pattern can't be limited to reload_completed, with
perhaps a splitter that post-reload splits the any_logic SWI12 operation with
(clobber CC) into the non-(clobber CC) variant if the operands are Yk.


* [Bug rtl-optimization/59835] [4.9 Regression] gcc.target/i386/sse-2[34].c timeout
  2014-01-15 22:01 [Bug rtl-optimization/59835] New: [4.9 Regression] gcc.target/i386/sse-23.c/gcc.target/i386/sse-24.c timeout hjl.tools at gmail dot com
  2014-01-16 16:04 ` [Bug rtl-optimization/59835] [4.9 Regression] gcc.target/i386/sse-2[34].c timeout jakub at gcc dot gnu.org
  2014-01-16 16:48 ` jakub at gcc dot gnu.org
@ 2014-01-16 17:16 ` jakub at gcc dot gnu.org
  2014-01-16 19:04 ` vmakarov at gcc dot gnu.org
  2014-01-16 19:33 ` ubizjak at gmail dot com
  4 siblings, 0 replies; 6+ messages in thread
From: jakub at gcc dot gnu.org @ 2014-01-16 17:16 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59835

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Untested patch for kunpckhi:
2014-01-16  Jakub Jelinek  <jakub@redhat.com>

    * config/i386/i386.md (kunpckhi): Add GPR alternative.

--- gcc/config/i386/i386.md.jj    2014-01-09 21:07:23.000000000 +0100
+++ gcc/config/i386/i386.md    2014-01-16 17:53:54.983352747 +0100
@@ -8486,14 +8486,16 @@ (define_insn "kortestchi"
    (set_attr "prefix" "vex")])

 (define_insn "kunpckhi"
-  [(set (match_operand:HI 0 "register_operand" "=Yk")
+  [(set (match_operand:HI 0 "register_operand" "=Yk,Q")
     (ior:HI
       (ashift:HI
-        (match_operand:HI 1 "register_operand" "Yk")
+        (match_operand:HI 1 "register_operand" "Yk,Q")
         (const_int 8))
-      (zero_extend:HI (match_operand:QI 2 "register_operand" "Yk"))))]
+      (zero_extend:HI (match_operand:QI 2 "register_operand" "Yk,0"))))]
   "TARGET_AVX512F"
-  "kunpckbw\t{%2, %1, %0|%0, %1, %2}"
+  "@
+   kunpckbw\t{%2, %1, %0|%0, %1, %2}
+   mov{b}\t{%b1, %h0|%h0, %b1}"
   [(set_attr "mode" "HI")
    (set_attr "type" "msklog")
    (set_attr "prefix" "vex")])

Of course, no real performance testing has been performed, perhaps there should
be one ? or more for the =Q, Q, 0 alternative.  Without any ?, we don't ICE or
endlessly consume memory anymore, with one ? we do again.

With -O2 -march=k8 -mavx512f the patch changes (from before r206638 to trunk +
patch):
-    kmovw    %edi, %k1
-    kunpckbw    %k1, %k1, %k0
-    kmovw    %k0, -8(%rsp)
-    movd    -8(%rsp), %mm0
+    movl    %edi, %eax
+    movb    %al, %ah
+    movd    %eax, %mm0
Dunno of course how that compares performance wise, but at least it is shorter.
For -O2 -mavx512f:
-    kmovw    %edi, %k1
-    kunpckbw    %k1, %k1, %k0
-    kmovw    %k0, -8(%rsp)
+    movl    %edi, %eax
+    movl    %edi, %edx
+    movb    %al, %dh
+    movw    %dx, -8(%rsp)
so in this case perhaps using mask registers is better, as we store the result
into memory anyway.
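
To make the two alternatives in the patch above easier to compare, here is a
small C sketch of what each one computes. It is only a model of the RTL in
the pattern, (ior:HI (ashift:HI op1 8) (zero_extend:HI op2)), and of the
proposed movb into the high byte register; the function names are
hypothetical and nothing here is GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Mask-register alternative: the value described by the kunpckhi RTL,
   i.e. op1 shifted left by 8 in HImode, ORed with zero-extended op2. */
static uint16_t kunpck_model(uint16_t op1, uint8_t op2)
{
    return (uint16_t)((uint16_t)(op1 << 8) | op2);
}

/* GPR alternative: operand 2 is tied to operand 0 (constraint "0"), so it
   already sits in the low byte of the destination; the movb then copies
   the low byte of operand 1 into the destination's high byte. */
static uint16_t gpr_model(uint16_t op1, uint8_t op2)
{
    uint16_t dest = op2;                      /* low byte = op2 */
    dest = (uint16_t)((dest & 0x00FFu) | ((op1 & 0x00FFu) << 8));
    return dest;
}

/* Exhaustively check that both models agree for every input pair. */
static void check_alternatives_agree(void)
{
    for (unsigned op1 = 0; op1 <= 0xFFFFu; op1++)
        for (unsigned op2 = 0; op2 <= 0xFFu; op2++)
            assert(kunpck_model((uint16_t)op1, (uint8_t)op2) ==
                   gpr_model((uint16_t)op1, (uint8_t)op2));
}
```

Calling check_alternatives_agree() walks all 2^24 input pairs. The model says
nothing about cost, which is the open question raised in this comment.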



* [Bug rtl-optimization/59835] [4.9 Regression] gcc.target/i386/sse-2[34].c timeout
  2014-01-15 22:01 [Bug rtl-optimization/59835] New: [4.9 Regression] gcc.target/i386/sse-23.c/gcc.target/i386/sse-24.c timeout hjl.tools at gmail dot com
                   ` (2 preceding siblings ...)
  2014-01-16 17:16 ` jakub at gcc dot gnu.org
@ 2014-01-16 19:04 ` vmakarov at gcc dot gnu.org
  2014-01-16 19:33 ` ubizjak at gmail dot com
  4 siblings, 0 replies; 6+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2014-01-16 19:04 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59835

--- Comment #6 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
Author: vmakarov
Date: Thu Jan 16 19:04:08 2014
New Revision: 206676

URL: http://gcc.gnu.org/viewcvs?rev=206676&root=gcc&view=rev
Log:
2014-01-16  Vladimir Makarov  <vmakarov@redhat.com>

    PR rtl-optimization/59835
    * ira.c (ira_init_register_move_cost): Increase cost for
    impossible modes.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira.c



* [Bug rtl-optimization/59835] [4.9 Regression] gcc.target/i386/sse-2[34].c timeout
  2014-01-15 22:01 [Bug rtl-optimization/59835] New: [4.9 Regression] gcc.target/i386/sse-23.c/gcc.target/i386/sse-24.c timeout hjl.tools at gmail dot com
                   ` (3 preceding siblings ...)
  2014-01-16 19:04 ` vmakarov at gcc dot gnu.org
@ 2014-01-16 19:33 ` ubizjak at gmail dot com
  4 siblings, 0 replies; 6+ messages in thread
From: ubizjak at gmail dot com @ 2014-01-16 19:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59835

--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #5)

> Of course, no real performance testing has been performed, perhaps there
> should be one ? or more for the =Q, Q, 0 alternative.  Without any ?, we
> don't ICE or endlessly consume memory anymore, with one ? we do again.

IMO, the handling of mask registers can be improved overall. However, we have
to start somewhere, and for 4.9 the implementation is "good enough" to move
things forward, on the condition that nothing regresses, of course.

Perfection will be reached incrementally in later revisions. Your patch goes
in the right direction, but it probably requires various cost function
improvements and some fine-tuning of constraints. I believe that the compiler
should be able to choose the correct instructions on its own, without
crippling the move pattern with UNSPEC.

From: jakub at gcc dot gnu.org @ 2014-01-16 19:42 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59835

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed with Vladimir's patch.



end of thread, other threads:[~2014-01-16 19:33 UTC | newest]
