public inbox for gcc-patches@gcc.gnu.org
* [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
@ 2013-01-06 15:48 Uros Bizjak
  2013-01-06 16:23 ` Jakub Jelinek
  0 siblings, 1 reply; 7+ messages in thread
From: Uros Bizjak @ 2013-01-06 15:48 UTC
  To: gcc-patches; +Cc: Vladimir Yakovlev, Kumar, Venkataramanan

[-- Attachment #1: Type: text/plain, Size: 815 bytes --]

Hello!

The attached patch fixes a runtime comparison failure of 454.calculix caused
by wrong movement of vzeroupper in the jump2 pass. It turns out that the
can_move_insns_across function does not special-case
unspec_volatiles, so vzeroupper is allowed to move past various 256-bit AVX
instructions.

The patch rejects moves of unspec_volatile insns in can_move_insns_across.

2013-01-06  Uros Bizjak  <ubizjak@gmail.com>

	PR rtl-optimization/55845
	* df-problems.c (can_move_insns_across): Stop scanning at
	unspec_volatile source instruction.

2013-01-06  Uros Bizjak  <ubizjak@gmail.com>
	    Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>

	PR rtl-optimization/55845
	* gcc.target/i386/pr55845.c: New test.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} AVX target.

OK for mainline and 4.7 branch?

Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 1393 bytes --]

Index: df-problems.c
===================================================================
--- df-problems.c	(revision 194945)
+++ df-problems.c	(working copy)
@@ -3916,6 +3916,10 @@ can_move_insns_across (rtx from, rtx to, rtx acros
 	break;
       if (NONDEBUG_INSN_P (insn))
 	{
+	  /* Do not move unspec_volatile insns.  */
+	  if (GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE)
+	    break;
+
 	  if (may_trap_or_fault_p (PATTERN (insn))
 	      && (trapping_insns_in_across || other_branch_live != NULL))
 	    break;
Index: testsuite/gcc.target/i386/pr55845.c
===================================================================
--- testsuite/gcc.target/i386/pr55845.c	(revision 0)
+++ testsuite/gcc.target/i386/pr55845.c	(working copy)
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx } */
+/* { dg-options "-O3 -ffast-math -fschedule-insns -mavx -mvzeroupper" } */
+
+#include "avx-check.h"
+
+#define N 100
+
+double
+__attribute__((noinline))
+foo (int size, double y[], double x[])
+{
+  double sum = 0.0;
+  int i;
+  for (i = 0, sum = 0.; i < size; i++)
+    sum += y[i] * x[i];
+  return (sum);
+}
+
+static void
+__attribute__ ((noinline))
+avx_test ()
+{
+  double x[N];
+  double y[N];
+  double s;
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      x[i] = i;
+      y[i] = i;
+    }
+
+  s = foo (N, y, x);
+
+  if (s != 328350.0)
+    abort ();
+}
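The expected value in the test is a sum of squares: with x[i] = y[i] = i, foo computes the sum of i*i for i = 0..99, and the closed form (n-1)n(2n-1)/6 gives 99*100*199/6 = 328350 for n = 100. Every product and every partial sum is an integer far below 2^53, so it is exactly representable as a double in any summation order; that is presumably why an exact != comparison is safe even with -ffast-math vectorization. A small standalone check, with hypothetical function names that are not part of the testsuite:

```c
/* Closed form of sum_{i=0}^{n-1} i*i, computed in integer arithmetic.  */
static double
sum_squares_closed (int n)
{
  return (double) ((long long) (n - 1) * n * (2 * n - 1) / 6);
}

/* The same dot product the test performs, with x[i] = y[i] = i.  */
static double
sum_squares_loop (int n)
{
  double sum = 0.0;
  for (int i = 0; i < n; i++)
    sum += (double) i * (double) i;   /* exact: small integers */
  return sum;
}
```

Both functions return exactly 328350.0 for n = 100, regardless of how the additions are reassociated.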


* Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
  2013-01-06 15:48 [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper Uros Bizjak
@ 2013-01-06 16:23 ` Jakub Jelinek
  2013-01-06 16:44   ` Eric Botcazou
  2013-01-07 16:52   ` Uros Bizjak
  0 siblings, 2 replies; 7+ messages in thread
From: Jakub Jelinek @ 2013-01-06 16:23 UTC
  To: Uros Bizjak, Paolo Bonzini, Richard Henderson
  Cc: gcc-patches, Vladimir Yakovlev, Kumar, Venkataramanan

On Sun, Jan 06, 2013 at 04:48:03PM +0100, Uros Bizjak wrote:
> --- df-problems.c	(revision 194945)
> +++ df-problems.c	(working copy)
> @@ -3916,6 +3916,10 @@ can_move_insns_across (rtx from, rtx to, rtx acros
>  	break;
>        if (NONDEBUG_INSN_P (insn))
>  	{
> +	  /* Do not move unspec_volatile insns.  */
> +	  if (GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE)
> +	    break;
> +

Shouldn't UNSPEC_VOLATILE be handled similarly in the across_from ..
across_to loop?  Both UNSPEC_VOLATILE and volatile asm are handled there
just with
	trapping_insns_in_across |= may_trap_p (PATTERN (insn));
but your new change doesn't prevent moving just trapping insns across
UNSPEC_VOLATILE, but any insns whatsoever.  So supposedly for UNSPEC_VOLATILE
the first loop should just return false; (or fail = 1; ?).
For asm volatile I guess the code is fine as is, it must always describe
what exactly it modifies, so supposedly non-trapping insns can be moved
across asm volatile.
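The split suggested here can be sketched as a toy C model of the across_from .. across_to scan. This is not GCC code: enum toy_kind and toy_scan_across are hypothetical stand-ins, and each insn is reduced to one classification. An UNSPEC_VOLATILE anywhere in the range makes the whole move fail, while a volatile asm, which the current code handles only through may_trap_p, merely sets the trapping flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of an insn in the "across" range.  */
enum toy_kind { TOY_PLAIN, TOY_TRAPPING, TOY_ASM_VOLATILE, TOY_UNSPEC_VOLATILE };

/* Returns false if nothing may be moved across the range; otherwise
   stores in *TRAPPING whether the range contains possibly-trapping insns.  */
static bool
toy_scan_across (const enum toy_kind *across, size_t n, bool *trapping)
{
  *trapping = false;
  for (size_t i = 0; i < n; i++)
    switch (across[i])
      {
      case TOY_UNSPEC_VOLATILE:
        return false;           /* full barrier: give up entirely */
      case TOY_ASM_VOLATILE:
      case TOY_TRAPPING:
        *trapping = true;       /* only restricts moving trapping insns */
        break;                  /* leaves the switch, not the loop */
      case TOY_PLAIN:
        break;
      }
  return true;
}
```

The design point is that a volatile asm must describe what it modifies, so it only needs to block trapping insns, whereas an UNSPEC_VOLATILE describes nothing and must block everything.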

>  	  if (may_trap_or_fault_p (PATTERN (insn))
>  	      && (trapping_insns_in_across || other_branch_live != NULL))
>  	    break;

You could do the check only when may_trap_or_fault_p is true, since all
UNSPEC_VOLATILEs may trap.

BTW, can't UNSPEC_VOLATILE be embedded deeply in the pattern?
So volatile_insn_p (insn) && asm_noperands (PATTERN (insn)) == -1?
But perhaps you want to treat that way only UNSPEC_VOLATILE directly in the
pattern and all other UNSPEC_VOLATILE insns must describe in detail what
exactly they are changing?  This really needs to be better documented.
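The difference between checking only the outermost code of the pattern and walking the whole pattern can be illustrated with a miniature, hypothetical expression tree (struct toy_rtx is not GCC's rtx). The recursive walk is in the spirit of volatile_insn_p, except that this toy models only UNSPEC_VOLATILE, not volatile asm; an UNSPEC_VOLATILE nested inside, say, a PARALLEL is missed by the top-level test.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of an RTL pattern.  */
enum toy_code { T_SET, T_PARALLEL, T_REG, T_UNSPEC, T_UNSPEC_VOLATILE };

struct toy_rtx
{
  enum toy_code code;
  size_t nops;                      /* number of sub-expressions used */
  const struct toy_rtx *ops[4];
};

/* The original patch's test: look only at the outermost code.  */
static bool
toplevel_unspec_volatile_p (const struct toy_rtx *pat)
{
  return pat->code == T_UNSPEC_VOLATILE;
}

/* Recursive walk: finds an UNSPEC_VOLATILE anywhere in the pattern.  */
static bool
contains_unspec_volatile_p (const struct toy_rtx *x)
{
  if (x->code == T_UNSPEC_VOLATILE)
    return true;
  for (size_t i = 0; i < x->nops; i++)
    if (contains_unspec_volatile_p (x->ops[i]))
      return true;
  return false;
}
```

For a pattern shaped like (parallel [(unspec_volatile ...) (set (reg) (reg))]), the first predicate returns false and the second returns true.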

	Jakub


* Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
  2013-01-06 16:23 ` Jakub Jelinek
@ 2013-01-06 16:44   ` Eric Botcazou
  2013-01-07 16:52   ` Uros Bizjak
  1 sibling, 0 replies; 7+ messages in thread
From: Eric Botcazou @ 2013-01-06 16:44 UTC
  To: Jakub Jelinek
  Cc: gcc-patches, Uros Bizjak, Paolo Bonzini, Richard Henderson,
	Vladimir Yakovlev, Kumar, Venkataramanan

> BTW, can't UNSPEC_VOLATILE be embedded deeply in the pattern?
> So volatile_insn_p (insn) && asm_noperands (PATTERN (insn)) == -1?
> But perhaps you want to treat that way only UNSPEC_VOLATILE directly in the
> pattern and all other UNSPEC_VOLATILE insns must describe in detail what
> exactly they are changing?  This really needs to be better documented.

Yes, I think that we should document that UNSPEC_Vs are full optimization
barriers, so that the existing blockage insns of all ports really are blockages.
That's already what is implemented and seems non-controversial (unlike the 
volatile asms).  Something like:

Index: rtl.def
===================================================================
--- rtl.def     (revision 194946)
+++ rtl.def     (working copy)
@@ -213,7 +213,9 @@ DEF_RTL_EXPR(ASM_OPERANDS, "asm_operands
    */
 DEF_RTL_EXPR(UNSPEC, "unspec", "Ei", RTX_EXTRA)
 
-/* Similar, but a volatile operation and one which may trap.  */
+/* Similar, but a volatile operation and one which may trap.  Moreover, it's a
+   full optimization barrier, i.e. no instructions may be moved and no register
+   (hard or pseudo) or memory equivalences may be used across it.  */
 DEF_RTL_EXPR(UNSPEC_VOLATILE, "unspec_volatile", "Ei", RTX_EXTRA)
 
 /* Vector of addresses, stored as full words.  */

I'd also propose that blockage insns always be UNSPEC_Vs (that's already the 
case in practice, but the manual also lists volatile asms).

And I'm somewhat dubious about the distinction between toplevel and embedded 
UNSPEC_Vs in a pattern; IMO, that shouldn't make any difference.

-- 
Eric Botcazou


* Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
  2013-01-06 16:23 ` Jakub Jelinek
  2013-01-06 16:44   ` Eric Botcazou
@ 2013-01-07 16:52   ` Uros Bizjak
  2013-01-07 23:26     ` Jakub Jelinek
  1 sibling, 1 reply; 7+ messages in thread
From: Uros Bizjak @ 2013-01-07 16:52 UTC
  To: Jakub Jelinek
  Cc: Paolo Bonzini, Richard Henderson, gcc-patches, Vladimir Yakovlev,
	Kumar, Venkataramanan, Eric Botcazou

On Sun, Jan 6, 2013 at 5:22 PM, Jakub Jelinek <jakub@redhat.com> wrote:

>> --- df-problems.c     (revision 194945)
>> +++ df-problems.c     (working copy)
>> @@ -3916,6 +3916,10 @@ can_move_insns_across (rtx from, rtx to, rtx acros
>>       break;
>>        if (NONDEBUG_INSN_P (insn))
>>       {
>> +       /* Do not move unspec_volatile insns.  */
>> +       if (GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE)
>> +         break;
>> +
>
> Shouldn't UNSPEC_VOLATILE be handled similarly in the across_from ..
> across_to loop?  Both UNSPEC_VOLATILE and volatile asm are handled there
> just with
>         trapping_insns_in_across |= may_trap_p (PATTERN (insn));
> but your new change doesn't prevent moving just trapping insns across
> UNSPEC_VOLATILE, but any insns whatsoever.  So supposedly for UNSPEC_VOLATILE
> the first loop should just return false; (or fail = 1; ?).
> For asm volatile I guess the code is fine as is, it must always describe
> what exactly it modifies, so supposedly non-trapping insns can be moved
> across asm volatile.
>
>>         if (may_trap_or_fault_p (PATTERN (insn))
>>             && (trapping_insns_in_across || other_branch_live != NULL))
>>           break;
>
> You could do the check only when may_trap_or_fault_p is true, since all
> UNSPEC_VOLATILEs may trap.
>
> BTW, can't UNSPEC_VOLATILE be embedded deeply in the pattern?
> So volatile_insn_p (insn) && asm_noperands (PATTERN (insn)) == -1?
> But perhaps you want to treat that way only UNSPEC_VOLATILE directly in the
> pattern and all other UNSPEC_VOLATILE insns must describe in detail what
> exactly they are changing?  This really needs to be better documented.

TBH, I'm not familiar enough with the RTL infrastructure to
answer these questions. While I could spend some time on this problem,
and probably waste quite a lot of reviewers' time, the problem is not as
trivial as I had hoped, so I would kindly ask someone with a better
understanding of this part of the compiler for the proper solution.

Uros.


* Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
  2013-01-07 16:52   ` Uros Bizjak
@ 2013-01-07 23:26     ` Jakub Jelinek
  2013-01-08  7:10       ` Uros Bizjak
  2013-01-08 17:55       ` Richard Henderson
  0 siblings, 2 replies; 7+ messages in thread
From: Jakub Jelinek @ 2013-01-07 23:26 UTC
  To: Uros Bizjak, Richard Henderson
  Cc: Paolo Bonzini, gcc-patches, Vladimir Yakovlev, Kumar,
	Venkataramanan, Eric Botcazou

On Mon, Jan 07, 2013 at 05:52:23PM +0100, Uros Bizjak wrote:
> TBH, I'm not familiar enough with the RTL infrastructure to
> answer these questions. While I could spend some time on this problem,
> and probably waste quite a lot of reviewers' time, the problem is not as
> trivial as I had hoped, so I would kindly ask someone with a better
> understanding of this part of the compiler for the proper solution.

After discussion with rth on IRC, this modified patch just uses
volatile_insn_p, making all UNSPEC_VOLATILE (wherever in insn) and asm
volatile into a complete scheduling barrier for optimizations that use this
function.
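A rough C model of the logic in this revised patch, assuming each insn is reduced to two hypothetical flags: volatile_p stands in for volatile_insn_p (an UNSPEC_VOLATILE anywhere, or a volatile asm) and may_trap for may_trap_p / may_trap_or_fault_p. toy_can_move is not GCC code: it returns -1 where the real function returns false, and otherwise the length of the movable prefix of the source range, loosely mirroring how the real second loop stops scanning. Because volatile insns are always treated as possibly trapping, the volatile test sits inside the trapping condition, as in the second hunk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-insn summary.  */
struct toy_insn_info
{
  bool volatile_p;   /* stand-in for volatile_insn_p */
  bool may_trap;     /* stand-in for may_trap_p / may_trap_or_fault_p */
};

static long
toy_can_move (const struct toy_insn_info *across, size_t n_across,
              const struct toy_insn_info *from, size_t n_from,
              bool other_branch_live)
{
  bool trapping_insns_in_across = false;

  /* First hunk: any volatile insn in the across range blocks everything.  */
  for (size_t i = 0; i < n_across; i++)
    {
      if (across[i].volatile_p)
        return -1;
      trapping_insns_in_across |= across[i].may_trap;
    }

  /* Second hunk: stop at a trapping insn if the across range traps,
     the other branch is live, or the insn itself is volatile.  */
  size_t k;
  for (k = 0; k < n_from; k++)
    if (from[k].may_trap
        && (trapping_insns_in_across || other_branch_live
            || from[k].volatile_p))
      break;
  return (long) k;
}
```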

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2013-01-08  Jakub Jelinek  <jakub@redhat.com>
	    Uros Bizjak  <ubizjak@gmail.com>

	PR rtl-optimization/55845
	* df-problems.c (can_move_insns_across): Stop scanning at
	volatile_insn_p source instruction or give up if
	across_from .. across_to range contains any volatile_insn_p
	instructions.

2013-01-08  Uros Bizjak  <ubizjak@gmail.com>
	    Vladimir Yakovlev  <vladimir.b.yakovlev@intel.com>

	PR rtl-optimization/55845
	* gcc.target/i386/pr55845.c: New test.

--- gcc/df-problems.c.jj	2012-11-19 14:41:26.181898964 +0100
+++ gcc/df-problems.c	2013-01-07 18:38:33.064974313 +0100
@@ -3858,6 +3858,8 @@ can_move_insns_across (rtx from, rtx to,
 	}
       if (NONDEBUG_INSN_P (insn))
 	{
+	  if (volatile_insn_p (PATTERN (insn)))
+	    return false;
 	  memrefs_in_across |= for_each_rtx (&PATTERN (insn), find_memory,
 					     NULL);
 	  note_stores (PATTERN (insn), find_memory_stores,
@@ -3917,7 +3919,9 @@ can_move_insns_across (rtx from, rtx to,
       if (NONDEBUG_INSN_P (insn))
 	{
 	  if (may_trap_or_fault_p (PATTERN (insn))
-	      && (trapping_insns_in_across || other_branch_live != NULL))
+	      && (trapping_insns_in_across
+		  || other_branch_live != NULL
+		  || volatile_insn_p (PATTERN (insn))))
 	    break;
 
 	  /* We cannot move memory stores past each other, or move memory
--- gcc/testsuite/gcc.target/i386/pr55845.c.jj	2013-01-07 18:30:19.168801389 +0100
+++ gcc/testsuite/gcc.target/i386/pr55845.c	2013-01-07 18:30:19.168801389 +0100
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx } */
+/* { dg-options "-O3 -ffast-math -fschedule-insns -mavx -mvzeroupper" } */
+
+#include "avx-check.h"
+
+#define N 100
+
+double
+__attribute__((noinline))
+foo (int size, double y[], double x[])
+{
+  double sum = 0.0;
+  int i;
+  for (i = 0, sum = 0.; i < size; i++)
+    sum += y[i] * x[i];
+  return (sum);
+}
+
+static void
+__attribute__ ((noinline))
+avx_test ()
+{
+  double x[N];
+  double y[N];
+  double s;
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      x[i] = i;
+      y[i] = i;
+    }
+
+  s = foo (N, y, x);
+
+  if (s != 328350.0)
+    abort ();
+}


	Jakub


* Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
  2013-01-07 23:26     ` Jakub Jelinek
@ 2013-01-08  7:10       ` Uros Bizjak
  2013-01-08 17:55       ` Richard Henderson
  1 sibling, 0 replies; 7+ messages in thread
From: Uros Bizjak @ 2013-01-08  7:10 UTC
  To: Jakub Jelinek
  Cc: Richard Henderson, Paolo Bonzini, gcc-patches, Vladimir Yakovlev,
	Kumar, Venkataramanan, Eric Botcazou

On Tue, Jan 8, 2013 at 12:26 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Jan 07, 2013 at 05:52:23PM +0100, Uros Bizjak wrote:
>> TBH, I'm not familiar enough with the RTL infrastructure to
>> answer these questions. While I could spend some time on this problem,
>> and probably waste quite a lot of reviewers' time, the problem is not as
>> trivial as I had hoped, so I would kindly ask someone with a better
>> understanding of this part of the compiler for the proper solution.
>
> After discussion with rth on IRC, this modified patch just uses
> volatile_insn_p, making all UNSPEC_VOLATILE (wherever in insn) and asm
> volatile into a complete scheduling barrier for optimizations that use this
> function.

Thanks!

Just two little nits in the testcase:

> +foo (int size, double y[], double x[])

foo (int size, double *y, double *x)

> +  return (sum);

return sum;
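The first nit is stylistic rather than functional: in C, a parameter declared as double y[] is adjusted to double *y, so both spellings give the function the same type. A minimal sketch (the name dot is hypothetical) that declares the function with array syntax and defines it with pointer syntax, which a conforming compiler accepts as compatible declarations:

```c
/* Declaration using array-parameter syntax...  */
double dot (int size, double y[], double x[]);

/* ...and a compatible definition using pointer syntax, since each
   "array of double" parameter is adjusted to "pointer to double".  */
double
dot (int size, double *y, double *x)
{
  double sum = 0.0;
  for (int i = 0; i < size; i++)
    sum += y[i] * x[i];
  return sum;
}
```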

Uros.


* Re: [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper
  2013-01-07 23:26     ` Jakub Jelinek
  2013-01-08  7:10       ` Uros Bizjak
@ 2013-01-08 17:55       ` Richard Henderson
  1 sibling, 0 replies; 7+ messages in thread
From: Richard Henderson @ 2013-01-08 17:55 UTC
  To: Jakub Jelinek
  Cc: Uros Bizjak, Paolo Bonzini, gcc-patches, Vladimir Yakovlev,
	Kumar, Venkataramanan, Eric Botcazou

On 01/07/2013 03:26 PM, Jakub Jelinek wrote:
> 2013-01-08  Jakub Jelinek  <jakub@redhat.com>
> 	    Uros Bizjak  <ubizjak@gmail.com>
> 
> 	PR rtl-optimization/55845
> 	* df-problems.c (can_move_insns_across): Stop scanning at
> 	volatile_insn_p source instruction or give up if
> 	across_from .. across_to range contains any volatile_insn_p
> 	instructions.

Ok.


r~


end of thread

Thread overview: 7+ messages
2013-01-06 15:48 [PATCH, dataflow]: Fix PR55845, 454.calculix miscompares on x86 AVX due to movement of vzeroupper Uros Bizjak
2013-01-06 16:23 ` Jakub Jelinek
2013-01-06 16:44   ` Eric Botcazou
2013-01-07 16:52   ` Uros Bizjak
2013-01-07 23:26     ` Jakub Jelinek
2013-01-08  7:10       ` Uros Bizjak
2013-01-08 17:55       ` Richard Henderson
