* [Bug rtl-optimization/59835] [4.9 Regression] gcc.target/i386/sse-2[34].c timeout
2014-01-15 22:01 [Bug rtl-optimization/59835] New: [4.9 Regression] gcc.target/i386/sse-23.c/gcc.target/i386/sse-24.c timeout hjl.tools at gmail dot com
@ 2014-01-16 16:04 ` jakub at gcc dot gnu.org
2014-01-16 16:48 ` jakub at gcc dot gnu.org
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: jakub at gcc dot gnu.org @ 2014-01-16 16:04 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59835
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I think this boils down to the following testcase, compiled with
-O2 -mavx512f:
typedef int V __attribute__ ((vector_size (8)));
V
foo (char x)
{
  return (V) __builtin_ia32_vec_init_v8qi (x, x, x, x, x, x, x, x);
}
except that, for some odd reason, this shorter testcase ends in an ICE:
pr59835.c: In function ‘foo’:
pr59835.c:7:1: internal compiler error: Max. number of generated reload insns
per insn is achieved (90)
whereas with the larger one LRA just keeps iterating forever.
* [Bug rtl-optimization/59835] [4.9 Regression] gcc.target/i386/sse-2[34].c timeout
From: jakub at gcc dot gnu.org @ 2014-01-16 16:08 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59835
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Ah, with -O2 -mavx512f -march=k8 it actually hangs. So the short testcase is
enough to reproduce it.
* [Bug rtl-optimization/59835] [4.9 Regression] gcc.target/i386/sse-2[34].c timeout
From: jakub at gcc dot gnu.org @ 2014-01-16 16:48 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59835
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |kyukhin at gcc dot gnu.org,
| |rth at gcc dot gnu.org,
| |uros at gcc dot gnu.org
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Regardless of whether there is an LRA bug or not, I'd say that
*k<logic><mode>, maybe kortestzhi and kortestchi, and definitely kunpckhi look
problematic to me.
As the testcase shows, kunpckhi can very well be matched by the combiner, but
the pattern doesn't have any GPR constraints. If, as in this testcase, the
result isn't used as the mask of any AVX512F vector insn, and it isn't, say, a
case of the input being loaded from memory and the result immediately stored
into memory, I'd say reloading it into mask registers and back can't be cheap.
Can't the kunpckhi constraints be "=Yk,Q", "Yk,Q", "Yk,0", with the GPR
alternative just emitting "mov{b}\t{%1, %h0|%h0, %1}" (this could of course
still be limited to TARGET_AVX512F, as it is now)?
As for kortest[cz]hi, I don't know whether the combiner can actually match
them. And for *k<logic><mode>, my issue with that pattern is that it doesn't
have (clobber CC), so in theory it could be matched pre-RA by something and
would then force RA to choose the mask registers over something perhaps
cheaper.
I wonder if the pattern can't be limited to reload_completed, and perhaps there
can be a splitter that will split, post-reload, the any_logic SWI12 operation
with (clobber CC) into the non-(clobber CC) variant if the operands are Yk.
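To make the kunpckhi discussion concrete, here is a minimal C sketch of what
the pattern's RTL computes, (ior:HI (ashift:HI op1 8) (zero_extend:HI op2)).
The function names are hypothetical and exist only for illustration; they are
not part of GCC. The second helper shows the shape that presumably arises in
the testcase, where the same byte x fills both halves.

```c
#include <stdint.h>

/* Model of the kunpckhi RTL:
   (ior:HI (ashift:HI op1 (const_int 8)) (zero_extend:HI op2:QI)).
   Illustrative only; not GCC code.  */
uint16_t
kunpckhi_model (uint16_t op1, uint8_t op2)
{
  /* The shift happens in HImode, so only the low byte of op1 survives.  */
  uint16_t high = (uint16_t) (op1 << 8);
  return (uint16_t) (high | op2);
}

/* Hypothetical helper: duplicate one byte into both halves of a 16-bit
   value, as happens when the same x is used for adjacent vector elements.  */
uint16_t
dup_byte (uint8_t x)
{
  return kunpckhi_model (x, x);
}
```

Nothing in this computation needs a mask register, which is why a GPR
alternative for the pattern is plausible.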
* [Bug rtl-optimization/59835] [4.9 Regression] gcc.target/i386/sse-2[34].c timeout
From: jakub at gcc dot gnu.org @ 2014-01-16 17:16 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59835
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Untested patch for kunpckhi:
2014-01-16  Jakub Jelinek  <jakub@redhat.com>
        * config/i386/i386.md (kunpckhi): Add GPR alternative.
--- gcc/config/i386/i386.md.jj 2014-01-09 21:07:23.000000000 +0100
+++ gcc/config/i386/i386.md 2014-01-16 17:53:54.983352747 +0100
@@ -8486,14 +8486,16 @@ (define_insn "kortestchi"
    (set_attr "prefix" "vex")])
 (define_insn "kunpckhi"
-  [(set (match_operand:HI 0 "register_operand" "=Yk")
+  [(set (match_operand:HI 0 "register_operand" "=Yk,Q")
         (ior:HI
           (ashift:HI
-            (match_operand:HI 1 "register_operand" "Yk")
+            (match_operand:HI 1 "register_operand" "Yk,Q")
             (const_int 8))
-          (zero_extend:HI (match_operand:QI 2 "register_operand" "Yk"))))]
+          (zero_extend:HI (match_operand:QI 2 "register_operand" "Yk,0"))))]
   "TARGET_AVX512F"
-  "kunpckbw\t{%2, %1, %0|%0, %1, %2}"
+  "@
+   kunpckbw\t{%2, %1, %0|%0, %1, %2}
+   mov{b}\t{%b1, %h0|%h0, %b1}"
   [(set_attr "mode" "HI")
    (set_attr "type" "msklog")
    (set_attr "prefix" "vex")])
Of course, no real performance testing has been performed, perhaps there should
be one ? or more for the =Q, Q, 0 alternative. Without any ?, we don't ICE or
endlessly consume memory anymore, with one ? we do again.
With -O2 -march=k8 -mavx512f the patch changes (from before r206638 to trunk +
patch):
- kmovw %edi, %k1
- kunpckbw %k1, %k1, %k0
- kmovw %k0, -8(%rsp)
- movd -8(%rsp), %mm0
+ movl %edi, %eax
+ movb %al, %ah
+ movd %eax, %mm0
Dunno of course how that compares performance wise, but at least it is shorter.
For -O2 -mavx512f:
- kmovw %edi, %k1
- kunpckbw %k1, %k1, %k0
- kmovw %k0, -8(%rsp)
+ movl %edi, %eax
+ movl %edi, %edx
+ movb %al, %dh
+ movw %dx, -8(%rsp)
so in this case perhaps using mask registers is better, as we store the result
into memory anyway.
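As a sanity check on the GPR alternative, the following C sketch models the
single "mov{b} %b1, %h0" instruction and compares it against the pattern's
RTL semantics. It assumes that operand 2 already sits in the low byte of
operand 0 (the "0" constraint) and that %h0 names the high byte of the same
Q register. All function names are hypothetical, not GCC code.

```c
#include <stdint.h>

/* Reference semantics of the kunpckhi RTL:
   (ior:HI (ashift:HI op1 8) (zero_extend:HI op2)).  */
uint16_t
kunpckhi_rtl (uint16_t op1, uint8_t op2)
{
  return (uint16_t) ((uint16_t) (op1 << 8) | op2);
}

/* Model of the GPR alternative "mov{b} %b1, %h0".
   Assumption: operand 2 is tied to operand 0, so op0 starts out holding
   op2 in its low byte; the move only overwrites the high byte.  */
uint16_t
kunpckhi_gpr (uint16_t op1, uint8_t op2)
{
  uint16_t op0 = op2;                    /* operand 2 lives in %b0 */
  uint8_t low_of_op1 = (uint8_t) op1;    /* %b1 */
  op0 = (uint16_t) ((op0 & 0x00ff) | (uint16_t) (low_of_op1 << 8));
  return op0;
}

/* Exhaustively compare both models; returns 1 when they always agree.  */
int
models_agree (void)
{
  for (uint32_t a = 0; a <= 0xffff; a++)
    for (uint32_t b = 0; b <= 0xff; b++)
      if (kunpckhi_rtl ((uint16_t) a, (uint8_t) b)
          != kunpckhi_gpr ((uint16_t) a, (uint8_t) b))
        return 0;
  return 1;
}
```

The sketch only checks that the two alternatives compute the same value; it
says nothing about which one is cheaper on real hardware.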
* [Bug rtl-optimization/59835] [4.9 Regression] gcc.target/i386/sse-2[34].c timeout
From: vmakarov at gcc dot gnu.org @ 2014-01-16 19:04 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59835
--- Comment #6 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
Author: vmakarov
Date: Thu Jan 16 19:04:08 2014
New Revision: 206676
URL: http://gcc.gnu.org/viewcvs?rev=206676&root=gcc&view=rev
Log:
2014-01-16 Vladimir Makarov <vmakarov@redhat.com>
PR rtl-optimization/59835
* ira.c (ira_init_register_move_cost): Increase cost for
impossible modes.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/ira.c
* [Bug rtl-optimization/59835] [4.9 Regression] gcc.target/i386/sse-2[34].c timeout
From: ubizjak at gmail dot com @ 2014-01-16 19:33 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59835
--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #5)
> Of course, no real performance testing has been performed, perhaps there
> should be one ? or more for the =Q, Q, 0 alternative. Without any ?, we
> don't ICE or endlessly consume memory anymore, with one ? we do again.
IMO, handling of mask registers can be improved overall. However, we have to
start somewhere, and for 4.9 the implementation is "good enough" to move things
forward. On the condition that nothing regresses, of course.
Perfection will be reached incrementally in later revisions. Your patch is a
step in the right direction, but it probably requires various cost function
improvements and some fine tuning of constraints. I believe that the compiler
should be able to choose the correct instructions on its own, without crippling
the move pattern with UNSPEC.
* [Bug rtl-optimization/59835] [4.9 Regression] gcc.target/i386/sse-2[34].c timeout
From: jakub at gcc dot gnu.org @ 2014-01-16 19:42 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59835
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |FIXED
--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed with Vladimir's patch.