* [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
From: Uros Bizjak @ 2013-01-06 15:48 UTC
To: gcc-patches; +Cc: Vladimir Yakovlev, Kumar, Venkataramanan
[-- Attachment #1: Type: text/plain, Size: 815 bytes --]
Hello!
The attached patch fixes the runtime comparison failure of 454.calculix
caused by wrong movement of vzeroupper in the jump2 pass. It turns out that
the can_move_insns_across function does not special-case
unspec_volatiles, so vzeroupper is allowed to pass various 256-bit AVX
instructions.
The patch rejects moves of unspec_volatile insns in can_move_insns_across.
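To make the failure mode concrete, here is a toy model in plain C. It is only a sketch and not the instruction sequence from the PR: the ymm_model type and the model_* helpers are hypothetical names that mimic a 256-bit register as four double lanes. It shows why a vzeroupper that drifts above an instruction reading the upper 128 bits changes the result.

```c
#include <stddef.h>

/* Toy model of a 256-bit AVX register: four double lanes.
   Lanes 0-1 stand for the low 128 bits, lanes 2-3 for the upper 128 bits.
   All names here are illustrative and do not exist in GCC.  */
typedef struct { double lane[4]; } ymm_model;

/* Models the architectural effect of vzeroupper on one register:
   the upper 128 bits are cleared, the low 128 bits are preserved.  */
static void
model_vzeroupper (ymm_model *r)
{
  r->lane[2] = 0.0;
  r->lane[3] = 0.0;
}

/* Models a 256-bit reduction that reads all four lanes.  */
static double
model_reduce (const ymm_model *r)
{
  double sum = 0.0;
  for (size_t i = 0; i < 4; i++)
    sum += r->lane[i];
  return sum;
}

/* Intended order: the reduction runs first, vzeroupper afterwards.  */
static double
reduce_then_zero (ymm_model acc)
{
  double sum = model_reduce (&acc);
  model_vzeroupper (&acc);
  return sum;
}

/* Order after a bad move: vzeroupper runs first, so the contribution
   of the upper lanes is lost before the reduction reads them.  */
static double
zero_then_reduce (ymm_model acc)
{
  model_vzeroupper (&acc);
  return model_reduce (&acc);
}
```

With lanes {1, 2, 3, 4} the first ordering sums all four lanes, while the second only sees the two low lanes, which is the kind of silent miscompare the benchmark reports.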
2013-01-06 Uros Bizjak <ubizjak@gmail.com>
PR rtl-optimization/55845
* df-problems.c (can_move_insns_across): Stop scanning at
unspec_volatile source instruction.
2013-01-06 Uros Bizjak <ubizjak@gmail.com>
Vladimir Yakovlev <vladimir.b.yakovlev@intel.com>
PR rtl-optimization/55845
* gcc.target/i386/pr55845.c: New test.
Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} AVX target.
OK for mainline and 4.7 branch?
Uros.
[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 1393 bytes --]
Index: df-problems.c
===================================================================
--- df-problems.c (revision 194945)
+++ df-problems.c (working copy)
@@ -3916,6 +3916,10 @@ can_move_insns_across (rtx from, rtx to, rtx acros
break;
if (NONDEBUG_INSN_P (insn))
{
+ /* Do not move unspec_volatile insns. */
+ if (GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE)
+ break;
+
if (may_trap_or_fault_p (PATTERN (insn))
&& (trapping_insns_in_across || other_branch_live != NULL))
break;
Index: testsuite/gcc.target/i386/pr55845.c
===================================================================
--- testsuite/gcc.target/i386/pr55845.c (revision 0)
+++ testsuite/gcc.target/i386/pr55845.c (working copy)
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx } */
+/* { dg-options "-O3 -ffast-math -fschedule-insns -mavx -mvzeroupper" } */
+
+#include "avx-check.h"
+
+#define N 100
+
+double
+__attribute__((noinline))
+foo (int size, double y[], double x[])
+{
+ double sum = 0.0;
+ int i;
+ for (i = 0, sum = 0.; i < size; i++)
+ sum += y[i] * x[i];
+ return (sum);
+}
+
+static void
+__attribute__ ((noinline))
+avx_test ()
+{
+ double x[N];
+ double y[N];
+ double s;
+ int i;
+
+ for (i = 0; i < N; i++)
+ {
+ x[i] = i;
+ y[i] = i;
+ }
+
+ s = foo (N, y, x);
+
+ if (s != 328350.0)
+ abort ();
+}
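The constant 328350.0 checked by the testcase is the sum of squares 0^2 + 1^2 + ... + 99^2, because x[i] = y[i] = i and N is 100. The following standalone sketch (not part of the patch; the toy_* names are made up for illustration) recomputes it both with a plain loop and with the closed form (n-1)n(2n-1)/6.

```c
#include <assert.h>

#define TOY_MAX_N 100

/* Same computation as foo() in the testcase: a plain dot product.  */
static double
toy_dot (int size, const double *y, const double *x)
{
  double sum = 0.0;
  for (int i = 0; i < size; i++)
    sum += y[i] * x[i];
  return sum;
}

/* Rebuilds the testcase inputs (x[i] = y[i] = i) for the first n
   elements and returns their dot product.  */
static double
toy_expected (int n)
{
  double x[TOY_MAX_N], y[TOY_MAX_N];
  assert (n >= 0 && n <= TOY_MAX_N);
  for (int i = 0; i < n; i++)
    x[i] = y[i] = i;
  return toy_dot (n, y, x);
}

/* Closed form for 0^2 + 1^2 + ... + (n-1)^2.  */
static long
toy_sum_of_squares_below (long n)
{
  return (n - 1) * n * (2 * n - 1) / 6;
}
```

All intermediate values are small integers, so the double result is exact when the code is compiled without reordering or vectorization bugs; that is what makes an exact != comparison usable in the testcase.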
* Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
From: Jakub Jelinek @ 2013-01-06 16:23 UTC
To: Uros Bizjak, Paolo Bonzini, Richard Henderson
Cc: gcc-patches, Vladimir Yakovlev, Kumar, Venkataramanan
On Sun, Jan 06, 2013 at 04:48:03PM +0100, Uros Bizjak wrote:
> --- df-problems.c (revision 194945)
> +++ df-problems.c (working copy)
> @@ -3916,6 +3916,10 @@ can_move_insns_across (rtx from, rtx to, rtx acros
> break;
> if (NONDEBUG_INSN_P (insn))
> {
> + /* Do not move unspec_volatile insns. */
> + if (GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE)
> + break;
> +
Shouldn't UNSPEC_VOLATILE be handled similarly in the across_from ..
across_to loop? Both UNSPEC_VOLATILE and volatile asm are handled there
just with
trapping_insns_in_across |= may_trap_p (PATTERN (insn));
but your new change doesn't prevent moving just trapping insns across
UNSPEC_VOLATILE, but any insns whatsoever. So supposedly for UNSPEC_VOLATILE
the first loop should just return false; (or fail = 1; ?).
For asm volatile I guess the code is fine as is, it must always describe
what exactly it modifies, so supposedly non-trapping insns can be moved
across asm volatile.
> if (may_trap_or_fault_p (PATTERN (insn))
> && (trapping_insns_in_across || other_branch_live != NULL))
> break;
You could do the check only for may_trap_or_fault_p, all UNSPEC_VOLATILE
may trap.
BTW, can't UNSPEC_VOLATILE be embedded deeply in the pattern?
So volatile_insn_p (insn) && asm_noperands (PATTERN (insn)) == -1?
But perhaps you want to treat that way only UNSPEC_VOLATILE directly in the
pattern and all other UNSPEC_VOLATILE insns must describe in detail what
exactly they are changing? This really needs to be better documented.
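The difference between the two checks discussed here can be sketched with a toy expression tree. This is an illustration only: struct toy_rtx and both helper functions are hypothetical and merely imitate the shape of GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE versus a recursive walk in the spirit of volatile_insn_p; the real function also deals with volatile asms, which the model leaves out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for an RTL expression: a code and up to two operands.
   None of these names exist in GCC.  */
enum toy_code { TOY_REG, TOY_SET, TOY_UNSPEC_VOLATILE };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op[2];
};

/* Mirrors the first patch: only the outermost code of the pattern
   is inspected.  */
static bool
toy_toplevel_volatile_p (const struct toy_rtx *pat)
{
  return pat->code == TOY_UNSPEC_VOLATILE;
}

/* Walks the whole pattern, so an unspec_volatile nested inside a SET
   is found as well.  */
static bool
toy_nested_volatile_p (const struct toy_rtx *x)
{
  if (x == NULL)
    return false;
  if (x->code == TOY_UNSPEC_VOLATILE)
    return true;
  for (size_t i = 0; i < 2; i++)
    if (toy_nested_volatile_p (x->op[i]))
      return true;
  return false;
}
```

For a pattern shaped like (set (reg) (unspec_volatile ...)), the toplevel check answers "not volatile" while the recursive walk answers "volatile", which is exactly the case the question above is about.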
Jakub
* Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
From: Eric Botcazou @ 2013-01-06 16:44 UTC
To: Jakub Jelinek
Cc: gcc-patches, Uros Bizjak, Paolo Bonzini, Richard Henderson,
Vladimir Yakovlev, Kumar, Venkataramanan
> BTW, can't UNSPEC_VOLATILE be embedded deeply in the pattern?
> So volatile_insn_p (insn) && asm_noperands (PATTERN (insn)) == -1?
> But perhaps you want to treat that way only UNSPEC_VOLATILE directly in the
> pattern and all other UNSPEC_VOLATILE insns must describe in detail what
> exactly they are changing? This really needs to be better documented.
Yes, I think that we should document that UNSPEC_Vs are full optimization
barriers, so that the existing blockage insns of all ports really are blockages.
That's already what is implemented and seems non-controversial (unlike the
volatile asms). Something like:
Index: rtl.def
===================================================================
--- rtl.def (revision 194946)
+++ rtl.def (working copy)
@@ -213,7 +213,9 @@ DEF_RTL_EXPR(ASM_OPERANDS, "asm_operands
*/
DEF_RTL_EXPR(UNSPEC, "unspec", "Ei", RTX_EXTRA)
-/* Similar, but a volatile operation and one which may trap. */
+/* Similar, but a volatile operation and one which may trap. Moreover, it's a
+ full optimization barrier, i.e. no instructions may be moved and no register
+ (hard or pseudo) or memory equivalences may be used across it. */
DEF_RTL_EXPR(UNSPEC_VOLATILE, "unspec_volatile", "Ei", RTX_EXTRA)
/* Vector of addresses, stored as full words. */
I'd also propose that blockage insns always be UNSPEC_Vs (that's already the
case in practice, but the manual also lists volatile asms).
And I'm somewhat dubious about the distinction between toplevel and embedded
UNSPEC_Vs in a pattern; IMO, that shouldn't make any difference.
--
Eric Botcazou
* Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
From: Uros Bizjak @ 2013-01-07 16:52 UTC
To: Jakub Jelinek
Cc: Paolo Bonzini, Richard Henderson, gcc-patches, Vladimir Yakovlev,
Kumar, Venkataramanan, Eric Botcazou
On Sun, Jan 6, 2013 at 5:22 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> --- df-problems.c (revision 194945)
>> +++ df-problems.c (working copy)
>> @@ -3916,6 +3916,10 @@ can_move_insns_across (rtx from, rtx to, rtx acros
>> break;
>> if (NONDEBUG_INSN_P (insn))
>> {
>> + /* Do not move unspec_volatile insns. */
>> + if (GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE)
>> + break;
>> +
>
> Shouldn't UNSPEC_VOLATILE be handled similarly in the across_from ..
> across_to loop? Both UNSPEC_VOLATILE and volatile asm are handled there
> just with
> trapping_insns_in_across |= may_trap_p (PATTERN (insn));
> but your new change doesn't prevent moving just trapping insns across
> UNSPEC_VOLATILE, but any insns whatsoever. So supposedly for UNSPEC_VOLATILE
> the first loop should just return false; (or fail = 1; ?).
> For asm volatile I guess the code is fine as is, it must always describe
> what exactly it modifies, so supposedly non-trapping insns can be moved
> across asm volatile.
>
>> if (may_trap_or_fault_p (PATTERN (insn))
>> && (trapping_insns_in_across || other_branch_live != NULL))
>> break;
>
> You could do the check only for may_trap_or_fault_p, all UNSPEC_VOLATILE
> may trap.
>
> BTW, can't UNSPEC_VOLATILE be embedded deeply in the pattern?
> So volatile_insn_p (insn) && asm_noperands (PATTERN (insn)) == -1?
> But perhaps you want to treat that way only UNSPEC_VOLATILE directly in the
> pattern and all other UNSPEC_VOLATILE insns must describe in detail what
> exactly they are changing? This really needs to be better documented.
TBH, I'm not familiar enough with the RTL infrastructure to
answer these questions. While I could spend some time on this problem,
and probably waste quite a bit of reviewers' time, the problem is not as
trivial as I had hoped, so I would kindly ask someone with a better
understanding of this part of the compiler for the proper solution.
Uros.
* Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
From: Jakub Jelinek @ 2013-01-07 23:26 UTC
To: Uros Bizjak, Richard Henderson
Cc: Paolo Bonzini, gcc-patches, Vladimir Yakovlev, Kumar,
Venkataramanan, Eric Botcazou
On Mon, Jan 07, 2013 at 05:52:23PM +0100, Uros Bizjak wrote:
> TBH, I'm not familiar enough with the RTL infrastructure to
> answer these questions. While I could spend some time on this problem,
> and probably waste quite a bit of reviewers' time, the problem is not as
> trivial as I had hoped, so I would kindly ask someone with a better
> understanding of this part of the compiler for the proper solution.
After discussion with rth on IRC, this modified patch just uses
volatile_insn_p, making all UNSPEC_VOLATILE (wherever in insn) and asm
volatile into a complete scheduling barrier for optimizations that use this
function.
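The resulting behaviour can be sketched with a small model. It is an assumption-laden simplification, not the GCC implementation: struct toy_insn and toy_movable_prefix are made-up names, and the model ignores other_branch_live as well as all register and memory dependence checks that the real function performs. It only shows the two volatile-related decisions: give up entirely if the across range contains a volatile insn, and stop scanning the source range at the first volatile insn.

```c
#include <stdbool.h>
#include <stddef.h>

/* Per-insn properties the model cares about.  A volatile insn is also
   expected to have may_trap set, mirroring the remark in this thread
   that all UNSPEC_VOLATILEs may trap.  */
struct toy_insn
{
  bool is_volatile;
  bool may_trap;
};

/* Returns how many leading insns of from[0 .. n_from) may be moved
   across across[0 .. n_across); 0 means nothing can be moved.  */
static size_t
toy_movable_prefix (const struct toy_insn *from, size_t n_from,
                    const struct toy_insn *across, size_t n_across)
{
  bool trapping_insns_in_across = false;

  /* First loop: a volatile insn in the across range is a full barrier.  */
  for (size_t i = 0; i < n_across; i++)
    {
      if (across[i].is_volatile)
        return 0;
      trapping_insns_in_across |= across[i].may_trap;
    }

  /* Second loop: stop at the first source insn that must stay put.  */
  size_t n = 0;
  for (; n < n_from; n++)
    if (from[n].may_trap
        && (trapping_insns_in_across || from[n].is_volatile))
      break;
  return n;
}
```

In this model a vzeroupper-like insn in the source range limits the movable prefix to the insns before it, and one in the across range prevents any movement at all.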
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
2013-01-08 Jakub Jelinek <jakub@redhat.com>
Uros Bizjak <ubizjak@gmail.com>
PR rtl-optimization/55845
* df-problems.c (can_move_insns_across): Stop scanning at
volatile_insn_p source instruction or give up if
across_from .. across_to range contains any volatile_insn_p
instructions.
2013-01-08 Uros Bizjak <ubizjak@gmail.com>
Vladimir Yakovlev <vladimir.b.yakovlev@intel.com>
PR rtl-optimization/55845
* gcc.target/i386/pr55845.c: New test.
--- gcc/df-problems.c.jj 2012-11-19 14:41:26.181898964 +0100
+++ gcc/df-problems.c 2013-01-07 18:38:33.064974313 +0100
@@ -3858,6 +3858,8 @@ can_move_insns_across (rtx from, rtx to,
}
if (NONDEBUG_INSN_P (insn))
{
+ if (volatile_insn_p (PATTERN (insn)))
+ return false;
memrefs_in_across |= for_each_rtx (&PATTERN (insn), find_memory,
NULL);
note_stores (PATTERN (insn), find_memory_stores,
@@ -3917,7 +3919,9 @@ can_move_insns_across (rtx from, rtx to,
if (NONDEBUG_INSN_P (insn))
{
if (may_trap_or_fault_p (PATTERN (insn))
- && (trapping_insns_in_across || other_branch_live != NULL))
+ && (trapping_insns_in_across
+ || other_branch_live != NULL
+ || volatile_insn_p (PATTERN (insn))))
break;
/* We cannot move memory stores past each other, or move memory
--- gcc/testsuite/gcc.target/i386/pr55845.c.jj 2013-01-07 18:30:19.168801389 +0100
+++ gcc/testsuite/gcc.target/i386/pr55845.c 2013-01-07 18:30:19.168801389 +0100
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx } */
+/* { dg-options "-O3 -ffast-math -fschedule-insns -mavx -mvzeroupper" } */
+
+#include "avx-check.h"
+
+#define N 100
+
+double
+__attribute__((noinline))
+foo (int size, double y[], double x[])
+{
+ double sum = 0.0;
+ int i;
+ for (i = 0, sum = 0.; i < size; i++)
+ sum += y[i] * x[i];
+ return (sum);
+}
+
+static void
+__attribute__ ((noinline))
+avx_test ()
+{
+ double x[N];
+ double y[N];
+ double s;
+ int i;
+
+ for (i = 0; i < N; i++)
+ {
+ x[i] = i;
+ y[i] = i;
+ }
+
+ s = foo (N, y, x);
+
+ if (s != 328350.0)
+ abort ();
+}
Jakub
* Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
From: Uros Bizjak @ 2013-01-08 7:10 UTC
To: Jakub Jelinek
Cc: Richard Henderson, Paolo Bonzini, gcc-patches, Vladimir Yakovlev,
Kumar, Venkataramanan, Eric Botcazou
On Tue, Jan 8, 2013 at 12:26 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Jan 07, 2013 at 05:52:23PM +0100, Uros Bizjak wrote:
>> TBH, I'm not familiar enough with the RTL infrastructure to
>> answer these questions. While I could spend some time on this problem,
>> and probably waste quite a bit of reviewers' time, the problem is not as
>> trivial as I had hoped, so I would kindly ask someone with a better
>> understanding of this part of the compiler for the proper solution.
>
> After discussion with rth on IRC, this modified patch just uses
> volatile_insn_p, making all UNSPEC_VOLATILE (wherever in insn) and asm
> volatile into a complete scheduling barrier for optimizations that use this
> function.
Thanks!
Just two little nits in the testcase:
> +foo (int size, double y[], double x[])
foo (int size, double *y, double *x)
> + return (sum);
return sum;
Uros.
* Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
From: Richard Henderson @ 2013-01-08 17:55 UTC
To: Jakub Jelinek
Cc: Uros Bizjak, Paolo Bonzini, gcc-patches, Vladimir Yakovlev,
Kumar, Venkataramanan, Eric Botcazou
On 01/07/2013 03:26 PM, Jakub Jelinek wrote:
> 2012-01-08 Jakub Jelinek <jakub@redhat.com>
> Uros Bizjak <ubizjak@gmail.com>
>
> PR rtl-optimization/55845
> * df-problems.c (can_move_insns_across): Stop scanning at
> volatile_insn_p source instruction or give up if
> across_from .. across_to range contains any volatile_insn_p
> instructions.
Ok.
r~