public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* PATCH: Properly check the end of basic block
@ 2010-11-17  8:34 H.J. Lu
  2010-11-17  9:24 ` Uros Bizjak
  0 siblings, 1 reply; 26+ messages in thread
From: H.J. Lu @ 2010-11-17  8:34 UTC (permalink / raw)
  To: gcc-patches; +Cc: Uros Bizjak

Hi,

We may have

insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))

We should check NEXT_INSN (insn) != NEXT_INSN (BB_END (bb)) in
move_or_delete_vzeroupper_2.  This patch does it.

OK for trunk?

Thanks.


H.J.
----
2010-11-16  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (move_or_delete_vzeroupper_2): Properly
	check the end of basic block.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e52f9b2..704a67d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -108,7 +108,7 @@ check_avx256_stores (rtx dest, const_rtx set, void *data)
 static void
 move_or_delete_vzeroupper_2 (basic_block bb, bool upper_128bits_set)
 {
-  rtx insn;
+  rtx insn, last;
   rtx vzeroupper_insn = NULL_RTX;
   rtx pat;
   int avx256;
@@ -118,9 +118,12 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool upper_128bits_set)
 	     bb->index, upper_128bits_set);
 
   insn = BB_HEAD (bb);
+  last = NEXT_INSN (BB_END (bb));
   while (insn != BB_END (bb))
     {
       insn = NEXT_INSN (insn);
+      if (insn == last)
+	break;
 
       if (!NONDEBUG_INSN_P (insn))
 	continue;

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-17  8:34 PATCH: Properly check the end of basic block H.J. Lu
@ 2010-11-17  9:24 ` Uros Bizjak
  2010-11-17 15:21   ` H.J. Lu
  2010-11-18 18:38   ` H.J. Lu
  0 siblings, 2 replies; 26+ messages in thread
From: Uros Bizjak @ 2010-11-17  9:24 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches

On Wed, Nov 17, 2010 at 7:44 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:

> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
>
> We should check NEXT_INSN (insn) != NEXT_INSN (BB_END (bb)) in
> move_or_delete_vzeroupper_2.  This patch does it.

Huh? The loop does simple linear scan of all insns in the bb, so it
can't miss BB_END. IIUC, in your case the bb does not have BB_END
(bb), but it has NEXT_INSN (BB_END (bb))?

Can you please provide a test case that illustrates this?

Uros.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-17  9:24 ` Uros Bizjak
@ 2010-11-17 15:21   ` H.J. Lu
  2010-11-17 20:26     ` Uros Bizjak
  2010-11-18 18:38   ` H.J. Lu
  1 sibling, 1 reply; 26+ messages in thread
From: H.J. Lu @ 2010-11-17 15:21 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1561 bytes --]

On Tue, Nov 16, 2010 at 11:51 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Wed, Nov 17, 2010 at 7:44 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>
>> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
>>
>> We should check NEXT_INSN (insn) != NEXT_INSN (BB_END (bb)) in
>> move_or_delete_vzeroupper_2.  This patch does it.
>
> Huh? The loop does simple linear scan of all insns in the bb, so it
> can't miss BB_END. IIUC, in your case the bb does not have BB_END
> (bb), but it has NEXT_INSN (BB_END (bb))?

It has BB_END, but it won't be visited by NEXT_INSN starting from
BB_HEAD. insn != NEXT_INSN (BB_END (bb)) is used to check the
end of the BB everywhere in gcc.

> Can you please provide a test case that illustrates this?
>

I am enclosing a work in progress.  We noticed that we are
missing a few vzerouppers at -O3 on SPEC CPU 2K/2006.
One isssue is we may have

foo:

       call bar <<<<< Missing vzeroupper

       256bit vectorized insn
       goto foo

We miss vzeroupper before call bar.  We don't have a small testcase.
But this patch fixes this case by inspection. We are checking other
cases.


-- 
H.J.

gcc/

2010-11-16  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (move_or_delete_vzeroupper_2): Properly
	check the end of basic block.
	(ix86_expand_epilogue): Also check flag_tree_vectorize when
	generating vzeroupper.
	(ix86_expand_call): Likewise.

gcc/testsuite/

2010-11-16  H.J. Lu  <hongjiu.lu@intel.com>

	* gcc.target/i386/avx-vzeroupper-20.c: New.

[-- Attachment #2: gcc-vzeroupper-O3-1.patch --]
[-- Type: text/plain, Size: 2281 bytes --]

gcc/

2010-11-16  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (move_or_delete_vzeroupper_2): Properly
	check the end of basic block.
	(ix86_expand_epilogue): Also check flag_tree_vectorize when
	generating vzeroupper.
	(ix86_expand_call): Likewise.

gcc/testsuite/

2010-11-16  H.J. Lu  <hongjiu.lu@intel.com>

	* gcc.target/i386/avx-vzeroupper-20.c: New.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 11820cf..704a67d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -108,7 +108,7 @@ check_avx256_stores (rtx dest, const_rtx set, void *data)
 static void
 move_or_delete_vzeroupper_2 (basic_block bb, bool upper_128bits_set)
 {
-  rtx insn;
+  rtx insn, last;
   rtx vzeroupper_insn = NULL_RTX;
   rtx pat;
   int avx256;
@@ -118,9 +118,12 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool upper_128bits_set)
 	     bb->index, upper_128bits_set);
 
   insn = BB_HEAD (bb);
+  last = NEXT_INSN (BB_END (bb));
   while (insn != BB_END (bb))
     {
       insn = NEXT_INSN (insn);
+      if (insn == last)
+	break;
 
       if (!NONDEBUG_INSN_P (insn))
 	continue;
@@ -10970,7 +10973,7 @@ ix86_expand_epilogue (int style)
 
   /* Emit vzeroupper if needed.  */
   if (TARGET_VZEROUPPER
-      && cfun->machine->use_avx256_p
+      && (cfun->machine->use_avx256_p || flag_tree_vectorize)
       && !cfun->machine->caller_return_avx256_p)
     {
       cfun->machine->use_vzeroupper_p = 1;
@@ -21661,7 +21664,8 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
     }
 
   /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration.  */
-  if (TARGET_VZEROUPPER && cfun->machine->use_avx256_p)
+  if (TARGET_VZEROUPPER
+      && (cfun->machine->use_avx256_p || flag_tree_vectorize))
     {
       rtx unspec;
       int avx256;
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
new file mode 100644
index 0000000..3301083
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx -mtune=generic -dp" } */
+
+extern void free (void *);
+void
+bar (void *ncstrp)
+{
+  if(ncstrp==((void *)0))
+    return;
+  free(ncstrp);
+}
+
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-17 15:21   ` H.J. Lu
@ 2010-11-17 20:26     ` Uros Bizjak
  2010-11-18  0:49       ` H.J. Lu
  0 siblings, 1 reply; 26+ messages in thread
From: Uros Bizjak @ 2010-11-17 20:26 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches

On Wed, Nov 17, 2010 at 2:33 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Nov 16, 2010 at 11:51 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Wed, Nov 17, 2010 at 7:44 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>
>>> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
>>>
>>> We should check NEXT_INSN (insn) != NEXT_INSN (BB_END (bb)) in
>>> move_or_delete_vzeroupper_2.  This patch does it.
>>
>> Huh? The loop does simple linear scan of all insns in the bb, so it
>> can't miss BB_END. IIUC, in your case the bb does not have BB_END
>> (bb), but it has NEXT_INSN (BB_END (bb))?
>
> It has BB_END, but it won't be visited by NEXT_INSN starting from
> BB_HEAD. insn != NEXT_INSN (BB_END (bb)) is used to check the
> end of the BB everywhere in gcc.
>
>> Can you please provide a test case that illustrates this?
>>
>
> I am enclosing a work in progress.  We noticed that we are
> missing a few vzerouppers at -O3 on SPEC CPU 2K/2006.
> One isssue is we may have
>
> foo:
>
>       call bar <<<<< Missing vzeroupper
>
>       256bit vectorized insn
>       goto foo
>
> We miss vzeroupper before call bar.  We don't have a small testcase.
> But this patch fixes this case by inspection. We are checking other
> cases.

@@ -118,9 +118,12 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool
upper_128bits_set)
 	     bb->index, upper_128bits_set);

   insn = BB_HEAD (bb);
+  last = NEXT_INSN (BB_END (bb));
   while (insn != BB_END (bb))
     {
       insn = NEXT_INSN (insn);
+      if (insn == last)
+	break;

       if (!NONDEBUG_INSN_P (insn))
 	continue;

The change above is not needed. The new check is never triggered - the
loop terminates when "insn == BB_END (bb)" at "while", so I fail to
see why additional termination for "NEXT_INSN (insn) == NEXT_INSN
(BB_END (bb))" is needed.

(The BB_HEAD (bb) is either a NOTE or CODE_LABEL so it can be skipped
with NEXT_INSN.)

@@ -10970,7 +10973,7 @@ ix86_expand_epilogue (int style)

   /* Emit vzeroupper if needed.  */
   if (TARGET_VZEROUPPER
-      && cfun->machine->use_avx256_p
+      && (cfun->machine->use_avx256_p || flag_tree_vectorize)
       && !cfun->machine->caller_return_avx256_p)
     {
       cfun->machine->use_vzeroupper_p = 1;
@@ -21661,7 +21664,8 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
     }

   /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration.  */
-  if (TARGET_VZEROUPPER && cfun->machine->use_avx256_p)
+  if (TARGET_VZEROUPPER
+      && (cfun->machine->use_avx256_p || flag_tree_vectorize))

Decorate *ALL* calls with CALL_NEEDS_VZEROUPPER with
-ftree-vectorize?! It looks that parts (or state machine) that set
...->use_avx256_p flag should be fixed.

     {
       rtx unspec;
       int avx256;
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
new file mode 100644
index 0000000..3301083
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx -mtune=generic -dp" } */
+
+extern void free (void *);
+void
+bar (void *ncstrp)
+{
+  if(ncstrp==((void *)0))
+    return;
+  free(ncstrp);
+}
+
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */

Hm, this testcase doesn't go together with the above change. There is
no vectorization involved, and the scan checks that vzeroupper is NOT
emitted.

Uros.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* PATCH: Properly check the end of basic block
  2010-11-17 20:26     ` Uros Bizjak
@ 2010-11-18  0:49       ` H.J. Lu
  2010-11-18  7:06         ` H.J. Lu
                           ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: H.J. Lu @ 2010-11-18  0:49 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

On Wed, Nov 17, 2010 at 11:23 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Wed, Nov 17, 2010 at 2:33 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Nov 16, 2010 at 11:51 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Wed, Nov 17, 2010 at 7:44 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>>
>>>> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
>>>>
>>>> We should check NEXT_INSN (insn) != NEXT_INSN (BB_END (bb)) in
>>>> move_or_delete_vzeroupper_2.  This patch does it.
>>>
>>> Huh? The loop does simple linear scan of all insns in the bb, so it
>>> can't miss BB_END. IIUC, in your case the bb does not have BB_END
>>> (bb), but it has NEXT_INSN (BB_END (bb))?
>>
>> It has BB_END, but it won't be visited by NEXT_INSN starting from
>> BB_HEAD. insn != NEXT_INSN (BB_END (bb)) is used to check the
>> end of the BB everywhere in gcc.
>>
>>> Can you please provide a test case that illustrates this?
>>>
>>
>> I am enclosing a work in progress.  We noticed that we are
>> missing a few vzerouppers at -O3 on SPEC CPU 2K/2006.
>> One isssue is we may have
>>
>> foo:
>>
>>       call bar <<<<< Missing vzeroupper
>>
>>       256bit vectorized insn
>>       goto foo
>>
>> We miss vzeroupper before call bar.  We don't have a small testcase.
>> But this patch fixes this case by inspection. We are checking other
>> cases.
>
> @@ -118,9 +118,12 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool
> upper_128bits_set)
>             bb->index, upper_128bits_set);
>
>   insn = BB_HEAD (bb);
> +  last = NEXT_INSN (BB_END (bb));
>   while (insn != BB_END (bb))
>     {
>       insn = NEXT_INSN (insn);
> +      if (insn == last)
> +       break;
>
>       if (!NONDEBUG_INSN_P (insn))
>        continue;
>
> The change above is not needed. The new check is never triggered - the
> loop terminates when "insn == BB_END (bb)" at "while", so I fail to
> see why additional termination for "NEXT_INSN (insn) == NEXT_INSN
> (BB_END (bb))" is needed.

Here is the patch for

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46519

We have 2 blocks pointing to each others. This patch first scans
all blocks without moving vzeroupper so that we can have accurate
information about upper 128bits at block entry.

> (The BB_HEAD (bb) is either a NOTE or CODE_LABEL so it can be skipped
> with NEXT_INSN.)

Please try gcc.target/i386/avx-vzeroupper-20.c.  It will
trigger this condition.

> @@ -10970,7 +10973,7 @@ ix86_expand_epilogue (int style)
>
>   /* Emit vzeroupper if needed.  */
>   if (TARGET_VZEROUPPER
> -      && cfun->machine->use_avx256_p
> +      && (cfun->machine->use_avx256_p || flag_tree_vectorize)
>       && !cfun->machine->caller_return_avx256_p)
>     {
>       cfun->machine->use_vzeroupper_p = 1;
> @@ -21661,7 +21664,8 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
>     }
>
>   /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration.  */
> -  if (TARGET_VZEROUPPER && cfun->machine->use_avx256_p)
> +  if (TARGET_VZEROUPPER
> +      && (cfun->machine->use_avx256_p || flag_tree_vectorize))
>
> Decorate *ALL* calls with CALL_NEEDS_VZEROUPPER with
> -ftree-vectorize?! It looks that parts (or state machine) that set
> ...->use_avx256_p flag should be fixed.

There are:

foo:

      call bar <<<<< Missing vzeroupper

      256bit vectorized insn
      goto foo

I couldn't find a hook to set use_avx256_p before RTL expansion
starts.

>     {
>       rtx unspec;
>       int avx256;
> diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
> b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
> new file mode 100644
> index 0000000..3301083
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx -mtune=generic -dp" } */
> +
> +extern void free (void *);
> +void
> +bar (void *ncstrp)
> +{
> +  if(ncstrp==((void *)0))
> +    return;
> +  free(ncstrp);
> +}
> +
> +/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
>
> Hm, this testcase doesn't go together with the above change. There is
> no vectorization involved, and the scan checks that vzeroupper is NOT
> emitted.
>

This testcase is for

insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))

-- 
H.J.
---
gcc/

2010-11-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* config/i386/i386.c (block_info_def): Add scaned and no_avx256.
	(move_or_delete_vzeroupper_2): Properly check the end of basic
	block.  Call note_stores only if no_avx256 is false.
	(scan_live_upper_128bits_2): New.
	(scan_live_upper_128bits_1): Likewise.
	(move_or_delete_vzeroupper): Call scan_live_upper_128bits_1 to
	scan predecessor blocks of all exit points.
	(ix86_expand_epilogue): Also check flag_tree_vectorize when
	generating vzeroupper.
	(ix86_expand_call): Likewise.

gcc/testsuite/

2010-11-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* gcc.target/i386/avx-vzeroupper-20.c: New.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-18  0:49       ` H.J. Lu
@ 2010-11-18  7:06         ` H.J. Lu
  2010-11-18  9:23         ` Uros Bizjak
  2010-11-18 10:47         ` Uros Bizjak
  2 siblings, 0 replies; 26+ messages in thread
From: H.J. Lu @ 2010-11-18  7:06 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 5437 bytes --]

On Wed, Nov 17, 2010 at 3:36 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Nov 17, 2010 at 11:23 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Wed, Nov 17, 2010 at 2:33 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Tue, Nov 16, 2010 at 11:51 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>> On Wed, Nov 17, 2010 at 7:44 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>>>
>>>>> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
>>>>>
>>>>> We should check NEXT_INSN (insn) != NEXT_INSN (BB_END (bb)) in
>>>>> move_or_delete_vzeroupper_2.  This patch does it.
>>>>
>>>> Huh? The loop does simple linear scan of all insns in the bb, so it
>>>> can't miss BB_END. IIUC, in your case the bb does not have BB_END
>>>> (bb), but it has NEXT_INSN (BB_END (bb))?
>>>
>>> It has BB_END, but it won't be visited by NEXT_INSN starting from
>>> BB_HEAD. insn != NEXT_INSN (BB_END (bb)) is used to check the
>>> end of the BB everywhere in gcc.
>>>
>>>> Can you please provide a test case that illustrates this?
>>>>
>>>
>>> I am enclosing a work in progress.  We noticed that we are
>>> missing a few vzerouppers at -O3 on SPEC CPU 2K/2006.
>>> One isssue is we may have
>>>
>>> foo:
>>>
>>>       call bar <<<<< Missing vzeroupper
>>>
>>>       256bit vectorized insn
>>>       goto foo
>>>
>>> We miss vzeroupper before call bar.  We don't have a small testcase.
>>> But this patch fixes this case by inspection. We are checking other
>>> cases.
>>
>> @@ -118,9 +118,12 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool
>> upper_128bits_set)
>>             bb->index, upper_128bits_set);
>>
>>   insn = BB_HEAD (bb);
>> +  last = NEXT_INSN (BB_END (bb));
>>   while (insn != BB_END (bb))
>>     {
>>       insn = NEXT_INSN (insn);
>> +      if (insn == last)
>> +       break;
>>
>>       if (!NONDEBUG_INSN_P (insn))
>>        continue;
>>
>> The change above is not needed. The new check is never triggered - the
>> loop terminates when "insn == BB_END (bb)" at "while", so I fail to
>> see why additional termination for "NEXT_INSN (insn) == NEXT_INSN
>> (BB_END (bb))" is needed.
>
> Here is the patch for
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46519
>
> We have 2 blocks pointing to each others. This patch first scans
> all blocks without moving vzeroupper so that we can have accurate
> information about upper 128bits at block entry.
>
>> (The BB_HEAD (bb) is either a NOTE or CODE_LABEL so it can be skipped
>> with NEXT_INSN.)
>
> Please try gcc.target/i386/avx-vzeroupper-20.c.  It will
> trigger this condition.
>
>> @@ -10970,7 +10973,7 @@ ix86_expand_epilogue (int style)
>>
>>   /* Emit vzeroupper if needed.  */
>>   if (TARGET_VZEROUPPER
>> -      && cfun->machine->use_avx256_p
>> +      && (cfun->machine->use_avx256_p || flag_tree_vectorize)
>>       && !cfun->machine->caller_return_avx256_p)
>>     {
>>       cfun->machine->use_vzeroupper_p = 1;
>> @@ -21661,7 +21664,8 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
>>     }
>>
>>   /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration.  */
>> -  if (TARGET_VZEROUPPER && cfun->machine->use_avx256_p)
>> +  if (TARGET_VZEROUPPER
>> +      && (cfun->machine->use_avx256_p || flag_tree_vectorize))
>>
>> Decorate *ALL* calls with CALL_NEEDS_VZEROUPPER with
>> -ftree-vectorize?! It looks that parts (or state machine) that set
>> ...->use_avx256_p flag should be fixed.
>
> There are:
>
> foo:
>
>      call bar <<<<< Missing vzeroupper
>
>      256bit vectorized insn
>      goto foo
>
> I couldn't find a hook to set use_avx256_p before RTL expansion
> starts.
>
>>     {
>>       rtx unspec;
>>       int avx256;
>> diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
>> b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
>> new file mode 100644
>> index 0000000..3301083
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3 -mavx -mtune=generic -dp" } */
>> +
>> +extern void free (void *);
>> +void
>> +bar (void *ncstrp)
>> +{
>> +  if(ncstrp==((void *)0))
>> +    return;
>> +  free(ncstrp);
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
>>
>> Hm, this testcase doesn't go together with the above change. There is
>> no vectorization involved, and the scan checks that vzeroupper is NOT
>> emitted.
>>
>
> This testcase is for
>
> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
>
> --
> H.J.
> ---
> gcc/
>
> 2010-11-17  H.J. Lu  <hongjiu.lu@intel.com>
>
>        PR target/46519
>        * config/i386/i386.c (block_info_def): Add scaned and no_avx256.
>        (move_or_delete_vzeroupper_2): Properly check the end of basic
>        block.  Call note_stores only if no_avx256 is false.
>        (scan_live_upper_128bits_2): New.
>        (scan_live_upper_128bits_1): Likewise.
>        (move_or_delete_vzeroupper): Call scan_live_upper_128bits_1 to
>        scan predecessor blocks of all exit points.
>        (ix86_expand_epilogue): Also check flag_tree_vectorize when
>        generating vzeroupper.
>        (ix86_expand_call): Likewise.
>
> gcc/testsuite/
>
> 2010-11-17  H.J. Lu  <hongjiu.lu@intel.com>
>
>        PR target/46519
>        * gcc.target/i386/avx-vzeroupper-20.c: New.
>



-- 
H.J.

[-- Attachment #2: gcc-pr46519-2.patch --]
[-- Type: text/plain, Size: 12434 bytes --]

gcc/

2010-11-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* config/i386/i386.c (block_info_def): Add scanned and no_avx256.
	(move_or_delete_vzeroupper_2): Properly check the end of basic
	block.  Call note_stores only if no_avx256 is false.
	(scan_live_upper_128bits_2): New.
	(scan_live_upper_128bits_1): Likewise.
	(move_or_delete_vzeroupper): Call scan_live_upper_128bits_1 to
	scan predecessor blocks of all exit points.
	(use_avx256_p): Removed.
	(init_cumulative_args): Don't set use_avx256_p.
	(ix86_function_arg): Likewise.
	(ix86_expand_move): Likewise.
	(ix86_expand_vector_move_misalign): Likewise.
	(ix86_local_alignment): Likewise.
	(ix86_minimum_alignment): Likewise.
	(ix86_expand_epilogue): Don't check use_avx256_p when generating
	vzeroupper.
	(ix86_expand_call): Likewise.

	* config/i386/i386.h (machine_function): Remove use_avx256_p.

gcc/testsuite/

2010-11-17  H.J. Lu  <hongjiu.lu@intel.com>

	PR target/46519
	* gcc.target/i386/avx-vzeroupper-10.c: Expect no avx_vzeroupper.
	* gcc.target/i386/avx-vzeroupper-11.c: Likewise.

	* gcc.target/i386/avx-vzeroupper-20.c: New.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 11820cf..4b450a3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -63,6 +63,10 @@ typedef struct block_info_def
   bool upper_128bits_set;
   /* TRUE if block has been processed.  */
   bool done;
+  /* TRUE if block has been scanned.  */
+  bool scanned;
+  /* TRUE if 256bit AVX register isn't referenced in block.  */
+  bool no_avx256;
 } *block_info;
 
 #define BLOCK_INFO(B)   ((block_info) (B)->aux)
@@ -108,19 +112,23 @@ check_avx256_stores (rtx dest, const_rtx set, void *data)
 static void
 move_or_delete_vzeroupper_2 (basic_block bb, bool upper_128bits_set)
 {
-  rtx insn;
+  rtx insn, last;
   rtx vzeroupper_insn = NULL_RTX;
   rtx pat;
   int avx256;
+  bool no_avx256 = BLOCK_INFO (bb)->no_avx256;
 
   if (dump_file)
     fprintf (dump_file, " BB [%i] entry: upper 128bits: %d\n",
 	     bb->index, upper_128bits_set);
 
   insn = BB_HEAD (bb);
+  last = NEXT_INSN (BB_END (bb));
   while (insn != BB_END (bb))
     {
       insn = NEXT_INSN (insn);
+      if (insn == last)
+	break;
 
       if (!NONDEBUG_INSN_P (insn))
 	continue;
@@ -176,7 +184,7 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool upper_128bits_set)
 		  vzeroupper_insn = NULL_RTX;
 		}
 	    }
-	  else if (!upper_128bits_set)
+	  else if (!upper_128bits_set && !no_avx256)
 	    note_stores (pat, check_avx256_stores, &upper_128bits_set);
 	  continue;
 	}
@@ -191,8 +199,8 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool upper_128bits_set)
 	     returns 256bit AVX register.  */
 	  upper_128bits_set = (avx256 == callee_return_avx256);
 
-	  /* Remove unnecessary vzeroupper since
-	     upper 128bits are cleared.  */
+	  /* Remove unnecessary vzeroupper since upper 128bits are
+	     cleared.  */
 	  if (dump_file)
 	    {
 	      fprintf (dump_file, "Delete redundant vzeroupper:\n");
@@ -207,8 +215,8 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool upper_128bits_set)
 	     returns 256bit AVX register.  */
 	  upper_128bits_set = (avx256 == callee_return_pass_avx256);
 
-	  /* Must remove vzeroupper since
-	     callee passes in 256bit AVX register.  */
+	  /* Must remove vzeroupper since callee passes in 256bit
+	     AVX register.  */
 	  if (dump_file)
 	    {
 	      fprintf (dump_file, "Delete callee pass vzeroupper:\n");
@@ -265,6 +273,109 @@ move_or_delete_vzeroupper_1 (basic_block block)
   move_or_delete_vzeroupper_2 (block, upper_128bits_set);
 }
 
+/* Helper function for scan_live_upper_128bits_1.  Scan BB to check
+   if the upper 128bits of any AVX registers is live at exit of BB.  */
+
+static void
+scan_live_upper_128bits_2 (basic_block bb, bool upper_128bits_set)
+{
+  rtx insn, pat;
+  int avx256;
+  bool no_avx256 = true;
+
+  if (dump_file)
+    fprintf (dump_file, " BB [%i] entry: upper 128bits: %d\n",
+	     bb->index, upper_128bits_set);
+
+  FOR_BB_INSNS (bb, insn)
+    if (NONJUMP_INSN_P (insn))
+      {
+	pat = PATTERN (insn);
+
+	/* Check insn for vzeroupper intrinsic.  */
+	if (GET_CODE (pat) == UNSPEC_VOLATILE
+	    && XINT (pat, 1) == UNSPECV_VZEROUPPER)
+	  {
+	    /* Process vzeroupper intrinsic.  */
+	    avx256 = INTVAL (XVECEXP (pat, 0, 0));
+	    if (!upper_128bits_set)
+	      {
+		/* Since the upper 128bits are cleared, callee must
+		   not pass 256bit AVX register.  We only need to check
+		   if callee returns 256bit AVX register.  */
+		upper_128bits_set = (avx256 == callee_return_avx256);
+	      }
+	    else if (avx256 == callee_return_pass_avx256
+		     || avx256 == callee_pass_avx256)
+	      {
+		/* Callee passes 256bit AVX register.  Check if callee
+		   returns 256bit AVX register.  */
+		upper_128bits_set = (avx256 == callee_return_pass_avx256);
+	      }
+	    else
+	      upper_128bits_set = false;
+	  }
+	else
+	  {
+	    /* Check insn for vzeroall intrinsic.  */
+	    if (GET_CODE (pat) == PARALLEL
+		&& GET_CODE (XVECEXP (pat, 0, 0)) == UNSPEC_VOLATILE
+		&& XINT (XVECEXP (pat, 0, 0), 1) == UNSPECV_VZEROALL)
+	      upper_128bits_set = false;
+	    else if (!upper_128bits_set)
+	      {
+		note_stores (pat, check_avx256_stores,
+			     &upper_128bits_set);
+		if (upper_128bits_set)
+		  no_avx256 = false;
+	      }
+	  }
+      }
+
+  BLOCK_INFO (bb)->upper_128bits_set = upper_128bits_set;
+  BLOCK_INFO (bb)->no_avx256 = no_avx256;
+
+  if (dump_file)
+    fprintf (dump_file, " BB [%i] exit: upper 128bits: %d\n",
+	     bb->index, upper_128bits_set);
+}
+
+/* Helper function for move_or_delete_vzeroupper.  Scan BLOCK and its
+   predecessor blocks recursively to check if the upper 128bits of any
+   AVX registers is live at exit of BLOCK.  */
+
+static void
+scan_live_upper_128bits_1 (basic_block block)
+{
+  edge e;
+  edge_iterator ei;
+  bool upper_128bits_set;
+
+  if (dump_file)
+    fprintf (dump_file, " Scan BB [%i]: status: %d\n",
+	     block->index, BLOCK_INFO (block)->scanned);
+
+  if (BLOCK_INFO (block)->scanned)
+    return;
+
+  BLOCK_INFO (block)->scanned = true;
+
+  upper_128bits_set = false;
+
+  /* Process all predecessor edges of this block.  */
+  FOR_EACH_EDGE (e, ei, block->preds)
+    {
+      if (e->src == block)
+	continue;
+      scan_live_upper_128bits_1 (e->src);
+      if (BLOCK_INFO (e->src)->upper_128bits_set)
+	upper_128bits_set = true;
+    }
+
+  /* Scan this block.  */
+  scan_live_upper_128bits_2 (block, upper_128bits_set);
+}
+
 /* Go through the instruction stream looking for vzeroupper.  Delete
    it if upper 128bit AVX registers are unused.  If it isn't deleted,
    move it to just before a jump insn.  */
@@ -287,8 +398,16 @@ move_or_delete_vzeroupper (void)
       move_or_delete_vzeroupper_2 (e->dest,
 				   cfun->machine->caller_pass_avx256_p);
       BLOCK_INFO (e->dest)->done = true;
+      BLOCK_INFO (e->dest)->scanned = true;
     }
 
+  /* Scan predecessor blocks of all exit points.  */
+  if (dump_file)
+    fprintf (dump_file, "Scan all exit points\n");
+
+  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
+    scan_live_upper_128bits_1 (e->src);
+
   /* Process predecessor blocks of all exit points.  */
   if (dump_file)
     fprintf (dump_file, "Process all exit points\n");
@@ -4062,17 +4181,6 @@ ix86_option_override_internal (bool main_args_p)
     }
 }
 
-/* Return TRUE if type TYPE and mode MODE use 256bit AVX modes.  */
-
-static bool
-use_avx256_p (enum machine_mode mode, const_tree type)
-{
-  return (VALID_AVX256_REG_MODE (mode)
-	  || (type
-	      && TREE_CODE (type) == VECTOR_TYPE
-	      && int_size_in_bytes (type) == 32));
-}
-
 /* Return TRUE if VAL is passed in register with 256bit AVX modes.  */
 
 static bool
@@ -5687,7 +5795,6 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* Argument info to initialize */
       if (function_pass_avx256_p (fnret_value))
 	{
 	  /* The return value of this function uses 256bit AVX modes.  */
-	  cfun->machine->use_avx256_p = true;
 	  if (caller)
 	    cfun->machine->callee_return_avx256_p = true;
 	  else
@@ -6956,7 +7063,6 @@ ix86_function_arg (CUMULATIVE_ARGS *cum, enum machine_mode omode,
   if (TARGET_VZEROUPPER && function_pass_avx256_p (arg))
     {
       /* This argument uses 256bit AVX modes.  */
-      cfun->machine->use_avx256_p = true;
       if (cum->caller)
 	cfun->machine->callee_pass_avx256_p = true;
       else
@@ -10970,7 +11076,6 @@ ix86_expand_epilogue (int style)
 
   /* Emit vzeroupper if needed.  */
   if (TARGET_VZEROUPPER
-      && cfun->machine->use_avx256_p
       && !cfun->machine->caller_return_avx256_p)
     {
       cfun->machine->use_vzeroupper_p = 1;
@@ -15130,9 +15235,6 @@ ix86_expand_move (enum machine_mode mode, rtx operands[])
   rtx op0, op1;
   enum tls_model model;
 
-  if (VALID_AVX256_REG_MODE (mode))
-    cfun->machine->use_avx256_p = true;
-
   op0 = operands[0];
   op1 = operands[1];
 
@@ -15277,9 +15379,6 @@ ix86_expand_vector_move (enum machine_mode mode, rtx operands[])
   rtx op0 = operands[0], op1 = operands[1];
   unsigned int align = GET_MODE_ALIGNMENT (mode);
 
-  if (VALID_AVX256_REG_MODE (mode))
-    cfun->machine->use_avx256_p = true;
-
   /* Force constants other than zero into memory.  We do not know how
      the instructions used to build constants modify the upper 64 bits
      of the register, once we have that information we may be able
@@ -15386,9 +15485,6 @@ ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[])
 {
   rtx op0, op1, m;
 
-  if (VALID_AVX256_REG_MODE (mode))
-    cfun->machine->use_avx256_p = true;
-
   op0 = operands[0];
   op1 = operands[1];
 
@@ -21661,7 +21757,7 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
     }
 
   /* Add UNSPEC_CALL_NEEDS_VZEROUPPER decoration.  */
-  if (TARGET_VZEROUPPER && cfun->machine->use_avx256_p)
+  if (TARGET_VZEROUPPER)
     {
       rtx unspec;
       int avx256;
@@ -22763,9 +22859,6 @@ ix86_local_alignment (tree exp, enum machine_mode mode,
       decl = NULL;
     }
 
-  if (use_avx256_p (mode, type))
-    cfun->machine->use_avx256_p = true;
-
   /* Don't do dynamic stack realignment for long long objects with
      -mpreferred-stack-boundary=2.  */
   if (!TARGET_64BIT
@@ -22872,9 +22965,6 @@ ix86_minimum_alignment (tree exp, enum machine_mode mode,
       decl = NULL;
     }
 
-  if (use_avx256_p (mode, type))
-    cfun->machine->use_avx256_p = true;
-
   if (TARGET_64BIT || align != 64 || ix86_preferred_stack_boundary >= 64)
     return align;
 
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 170ad50..f7c38e5 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2299,9 +2299,6 @@ struct GTY(()) machine_function {
   /* Nonzero if the current function uses vzeroupper.  */
   BOOL_BITFIELD use_vzeroupper_p : 1;
 
-  /* Nonzero if the current function uses 256bit AVX regisers.  */
-  BOOL_BITFIELD use_avx256_p : 1;
-
   /* Nonzero if caller passes 256bit AVX modes.  */
   BOOL_BITFIELD caller_pass_avx256_p : 1;
 
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
index 5007753..667bb17 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-10.c
@@ -14,4 +14,4 @@ foo ()
   _mm256_zeroupper ();
 }
 
-/* { dg-final { scan-assembler-times "avx_vzeroupper" 3 } } */
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
index 507f945..d98ceb9 100644
--- a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-11.c
@@ -16,4 +16,4 @@ foo ()
 }
 
 /* { dg-final { scan-assembler-times "\\*avx_vzeroall" 1 } } */
-/* { dg-final { scan-assembler-times "avx_vzeroupper" 3 } } */
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
new file mode 100644
index 0000000..3301083
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx-vzeroupper-20.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx -mtune=generic -dp" } */
+
+extern void free (void *);
+void
+bar (void *ncstrp)
+{
+  if(ncstrp==((void *)0))
+    return;
+  free(ncstrp);
+}
+
+/* { dg-final { scan-assembler-not "avx_vzeroupper" } } */

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-18  0:49       ` H.J. Lu
  2010-11-18  7:06         ` H.J. Lu
@ 2010-11-18  9:23         ` Uros Bizjak
  2010-11-18 10:47         ` Uros Bizjak
  2 siblings, 0 replies; 26+ messages in thread
From: Uros Bizjak @ 2010-11-18  9:23 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches

On Thu, Nov 18, 2010 at 12:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:

>> @@ -118,9 +118,12 @@ move_or_delete_vzeroupper_2 (basic_block bb, bool
>> upper_128bits_set)
>>             bb->index, upper_128bits_set);
>>
>>   insn = BB_HEAD (bb);
>> +  last = NEXT_INSN (BB_END (bb));
>>   while (insn != BB_END (bb))
>>     {
>>       insn = NEXT_INSN (insn);
>> +      if (insn == last)
>> +       break;
>>
>>       if (!NONDEBUG_INSN_P (insn))
>>        continue;
>>
>> The change above is not needed. The new check is never triggered - the
>> loop terminates when "insn == BB_END (bb)" at "while", so I fail to
>> see why additional termination for "NEXT_INSN (insn) == NEXT_INSN
>> (BB_END (bb))" is needed.
>
> Here is the patch for
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46519
>
> We have 2 blocks pointing to each others. This patch first scans
> all blocks without moving vzeroupper so that we can have accurate
> information about upper 128bits at block entry.
>
>> (The BB_HEAD (bb) is either a NOTE or CODE_LABEL so it can be skipped
>> with NEXT_INSN.)
>
> Please try gcc.target/i386/avx-vzeroupper-20.c.  It will
> trigger this condition.

It doesn't even trigger move_or_delete_vzeroupper.

OK, I have hacked gcc a bit to blindly trigger the function in order
to check the loop iself. For your testcase we process following insn
stream:

 2 bb 2  [10000]
 3 bb 3  [7836]
 4 bb 4  [2164]
    1 NOTE_INSN_DELETED
    4 NOTE_INSN_BASIC_BLOCK
   17 NOTE_INSN_PROLOGUE_END
    3 NOTE_INSN_FUNCTION_BEG
    6 flags:CCZ=cmp(di:DI,0)
    7 pc={(flags:CCZ==0)?L14:pc}
      REG_DEAD: flags:CCZ
      REG_BR_PROB: 0x874
    8 NOTE_INSN_BASIC_BLOCK
   20 NOTE_INSN_EPILOGUE_BEG
   10 call <...>
      REG_DEAD: di:DI
      REG_EH_REGION: 0
i  11: barrier
L14:
   15 NOTE_INSN_BASIC_BLOCK
   19 return
i  18: barrier
   16 NOTE_INSN_DELETED

And the loop processed:

Assembling functions:
 bar
======
    4 NOTE_INSN_BASIC_BLOCK
++++++++
    6 flags:CCZ=cmp(di:DI,0)
++++++++
    7 pc={(flags:CCZ==0)?L14:pc}
      REG_DEAD: flags:CCZ
      REG_BR_PROB: 0x874
======
    8 NOTE_INSN_BASIC_BLOCK
++++++++
   10 call <...>
      REG_DEAD: di:DI
      REG_EH_REGION: 0
======
L14:
++++++++
   23 {return;unspec{0;};}

Where "===" precedes instruction, detected as BB_START and "+++"
precedes instructions that passed !NONDEBUG_INSN_P filter.

Please, can you show me the exact insn that was missed in this
particular testcase. I have used "-O3 -mavx -mtune=generic" and I have
to short-circuit the check for uses_vzeroupper_p at the end of
ix86_reorg.

Uros.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-18  0:49       ` H.J. Lu
  2010-11-18  7:06         ` H.J. Lu
  2010-11-18  9:23         ` Uros Bizjak
@ 2010-11-18 10:47         ` Uros Bizjak
  2 siblings, 0 replies; 26+ messages in thread
From: Uros Bizjak @ 2010-11-18 10:47 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches

On Thu, Nov 18, 2010 at 12:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:

> Here is the patch for
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46519
>
> We have 2 blocks pointing to each others. This patch first scans
> all blocks without moving vzeroupper so that we can have accurate
> information about upper 128bits at block entry.

This introduces another insn scanning pass, almost the same as
existing vzeroupper pass (modulo CALL_INSN/JUMP_INSN handling).

So, if I understand correctly:
- The patch removes the detection if the function ever touches AVX registers.
- Due to this, all call_insn RTXes have to be decorated with
CALL_NEEDS_VZEROUPPER.
- A new pre-pass is required that scans all functions in order to
detect functions with live AVX registers at exit, and at the same time
marks the functions that *do not* use AVX registers.
- Existing pass then re-scans everything to again detect functions
with live AVX registers at exit and handles vzeroupper emission.

I don't think this approach is acceptable. Maybe a LCM infrastructure
can be used to handle this case?

Uros.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-17  9:24 ` Uros Bizjak
  2010-11-17 15:21   ` H.J. Lu
@ 2010-11-18 18:38   ` H.J. Lu
  2010-11-18 19:27     ` Uros Bizjak
  1 sibling, 1 reply; 26+ messages in thread
From: H.J. Lu @ 2010-11-18 18:38 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

On Tue, Nov 16, 2010 at 11:51 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Wed, Nov 17, 2010 at 7:44 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>
>> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
>>
>> We should check NEXT_INSN (insn) != NEXT_INSN (BB_END (bb)) in
>> move_or_delete_vzeroupper_2.  This patch does it.
>
> Huh? The loop does simple linear scan of all insns in the bb, so it
> can't miss BB_END. IIUC, in your case the bb does not have BB_END
> (bb), but it has NEXT_INSN (BB_END (bb))?
>
> Can you please provide a test case that illustrates this?
>

ix86_pad_returns forgot to update BB_END when it
replaces it with a new one. OK for trunk?

Thanks.

-- 
H.J.
---
2010-11-18  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (ix86_pad_returns): Update BB_END (bb).

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7eb4116..3cd066d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -29749,7 +29746,8 @@ ix86_pad_returns (void)
 	}
       if (replace)
 	{
-	  emit_jump_insn_before (gen_return_internal_long (), ret);
+	  BB_END (bb)
+	    = emit_jump_insn_before (gen_return_internal_long (), ret);
 	  delete_insn (ret);
 	}
     }

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-18 18:38   ` H.J. Lu
@ 2010-11-18 19:27     ` Uros Bizjak
  2010-11-18 19:51       ` H.J. Lu
  0 siblings, 1 reply; 26+ messages in thread
From: Uros Bizjak @ 2010-11-18 19:27 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches

On Thu, Nov 18, 2010 at 6:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
>>>
>>> We should check NEXT_INSN (insn) != NEXT_INSN (BB_END (bb)) in
>>> move_or_delete_vzeroupper_2.  This patch does it.
>>
>> Huh? The loop does simple linear scan of all insns in the bb, so it
>> can't miss BB_END. IIUC, in your case the bb does not have BB_END
>> (bb), but it has NEXT_INSN (BB_END (bb))?
>>
>> Can you please provide a test case that illustrates this?
>>
>
> ix86_pad_returns forgot to update BB_END when it
> replaces it with a new one. OK for trunk?

IMO,  you should just move the call to vzeroupper optimization to be
the first thing in ix86_reorg. This way, ix86_pad_short_function,
ix86_pad_returns and ix86_avoid_jump_mispredict will also count
emitted vzeroupper.

The only possible drawback of this approach would be different
position of nops w.r.t to vzeroupper in case of
ix86_pad_short_functions:

vzeroupper
nop
nop
nop
ret

Uros.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-18 19:27     ` Uros Bizjak
@ 2010-11-18 19:51       ` H.J. Lu
  2010-11-18 20:25         ` Uros Bizjak
  0 siblings, 1 reply; 26+ messages in thread
From: H.J. Lu @ 2010-11-18 19:51 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches

On Thu, Nov 18, 2010 at 10:38 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Thu, Nov 18, 2010 at 6:32 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>>> insn != BB_END (bb) && NEXT_INSN (insn) == NEXT_INSN (BB_END (bb))
>>>>
>>>> We should check NEXT_INSN (insn) != NEXT_INSN (BB_END (bb)) in
>>>> move_or_delete_vzeroupper_2.  This patch does it.
>>>
>>> Huh? The loop does simple linear scan of all insns in the bb, so it
>>> can't miss BB_END. IIUC, in your case the bb does not have BB_END
>>> (bb), but it has NEXT_INSN (BB_END (bb))?
>>>
>>> Can you please provide a test case that illustrates this?
>>>
>>
>> ix86_pad_returns forgot to update BB_END when it
>> replaces it with a new one. OK for trunk?
>
> IMO,  you should just move the call to vzeroupper optimization to be
> the first thing in ix86_reorg. This way, ix86_pad_short_function,
> ix86_pad_returns and ix86_avoid_jump_mispredict will also count
> emitted vzeroupper.

But it will leave bad BB_END in place.  Any uses of BB_END later
will still be screwed.

> The only possible drawback of this approach would be different
> position of nops w.r.t to vzeroupper in case of
> ix86_pad_short_functions:
>
> vzeroupper
> nop
> nop
> nop
> ret

That is one issue.


-- 
H.J.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-18 19:51       ` H.J. Lu
@ 2010-11-18 20:25         ` Uros Bizjak
  2010-11-18 20:44           ` H.J. Lu
  2010-11-18 22:30           ` Uros Bizjak
  0 siblings, 2 replies; 26+ messages in thread
From: Uros Bizjak @ 2010-11-18 20:25 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Jan Hubicka, Richard Henderson

On Thu, Nov 18, 2010 at 8:37 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

>>> ix86_pad_returns forgot to update BB_END when it
>>> replaces it with a new one. OK for trunk?
>>
>> IMO,  you should just move the call to vzeroupper optimization to be
>> the first thing in ix86_reorg. This way, ix86_pad_short_function,
>> ix86_pad_returns and ix86_avoid_jump_mispredict will also count
>> emitted vzeroupper.
>
> But it will leave bad BB_END in place.  Any uses of BB_END later
> will still be screwed.

I think that Jan or Richard (CC'd) can provide better answer on this issue.

Uros.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-18 20:25         ` Uros Bizjak
@ 2010-11-18 20:44           ` H.J. Lu
  2010-11-18 22:30           ` Uros Bizjak
  1 sibling, 0 replies; 26+ messages in thread
From: H.J. Lu @ 2010-11-18 20:44 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Jan Hubicka, Richard Henderson

On Thu, Nov 18, 2010 at 11:43 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Thu, Nov 18, 2010 at 8:37 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>>> ix86_pad_returns forgot to update BB_END when it
>>>> replaces it with a new one. OK for trunk?
>>>
>>> IMO,  you should just move the call to vzeroupper optimization to be
>>> the first thing in ix86_reorg. This way, ix86_pad_short_function,
>>> ix86_pad_returns and ix86_avoid_jump_mispredict will also count
>>> emitted vzeroupper.
>>
>> But it will leave bad BB_END in place.  Any uses of BB_END later
>> will still be screwed.
>
> I think that Jan or Richard (CC'd) can provide better answer on this issue.
>

I opened a bug:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46546



-- 
H.J.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-18 20:25         ` Uros Bizjak
  2010-11-18 20:44           ` H.J. Lu
@ 2010-11-18 22:30           ` Uros Bizjak
  2010-11-18 22:33             ` Andrew Pinski
                               ` (3 more replies)
  1 sibling, 4 replies; 26+ messages in thread
From: Uros Bizjak @ 2010-11-18 22:30 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches, Jan Hubicka, Richard Henderson

On Thu, Nov 18, 2010 at 8:43 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Thu, Nov 18, 2010 at 8:37 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>>>> ix86_pad_returns forgot to update BB_END when it
>>>> replaces it with a new one. OK for trunk?
>>>
>>> IMO,  you should just move the call to vzeroupper optimization to be
>>> the first thing in ix86_reorg. This way, ix86_pad_short_function,
>>> ix86_pad_returns and ix86_avoid_jump_mispredict will also count
>>> emitted vzeroupper.
>>
>> But it will leave bad BB_END in place.  Any uses of BB_END later
>> will still be screwed.
>
> I think that Jan or Richard (CC'd) can provide better answer on this issue.

Got it.

It is a pass ordering problem, we free CFG before machine reorg pass,
so BLOCK_FOR_INSN in machine reorg pass does not work anymore (it
returns 0).

remove_insn (called from delete_insn) can correctly fixup BB_END (see
emit-rtl.c, line 3883), but it needs data from BLOCK_FOR_INSN to
figure out BB_END of the bb of the insn it processes.

The fix is then trivial.

2010-11-18  Uros Bizjak  <ubizjak@gmail.com>

	PR middle-end/46546
	* passes.c (init_optimization_passes): Move machine_reorg pass before
	free_cfg pass.

Tested on x86_64-pc-linux-gnu {,-m32}. OK for mainline and release branches?

Uros.

Index: passes.c
===================================================================
--- passes.c	(revision 166920)
+++ passes.c	(working copy)
@@ -1051,8 +1051,8 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_compute_alignments);
 	  NEXT_PASS (pass_duplicate_computed_gotos);
 	  NEXT_PASS (pass_variable_tracking);
-	  NEXT_PASS (pass_free_cfg);
 	  NEXT_PASS (pass_machine_reorg);
+	  NEXT_PASS (pass_free_cfg);
 	  NEXT_PASS (pass_cleanup_barriers);
 	  NEXT_PASS (pass_delay_slots);
 	  NEXT_PASS (pass_split_for_shorten_branches);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-18 22:30           ` Uros Bizjak
@ 2010-11-18 22:33             ` Andrew Pinski
  2010-11-18 22:36             ` Jakub Jelinek
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: Andrew Pinski @ 2010-11-18 22:33 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: H.J. Lu, gcc-patches, Jan Hubicka, Richard Henderson

On Thu, Nov 18, 2010 at 2:15 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 2010-11-18  Uros Bizjak  <ubizjak@gmail.com>
>
>        PR middle-end/46546
>        * passes.c (init_optimization_passes): Move machine_reorg pass before
>        free_cfg pass.
>
> Tested on x86_64-pc-linux-gnu {,-m32}. OK for mainline and release branches?

I don't think this will work because machine_reorg on some target
recreate the BB's anyways.  I think x86 should do that.

-- Pinski

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-18 22:30           ` Uros Bizjak
  2010-11-18 22:33             ` Andrew Pinski
@ 2010-11-18 22:36             ` Jakub Jelinek
  2010-11-18 22:55               ` Uros Bizjak
  2010-11-18 22:50             ` Richard Henderson
  2010-11-19  9:56             ` Jan Hubicka
  3 siblings, 1 reply; 26+ messages in thread
From: Jakub Jelinek @ 2010-11-18 22:36 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: H.J. Lu, gcc-patches, Jan Hubicka, Richard Henderson

On Thu, Nov 18, 2010 at 11:15:47PM +0100, Uros Bizjak wrote:
> On Thu, Nov 18, 2010 at 8:43 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > On Thu, Nov 18, 2010 at 8:37 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> The fix is then trivial.

Not so much.

> 
> 2010-11-18  Uros Bizjak  <ubizjak@gmail.com>
> 
> 	PR middle-end/46546
> 	* passes.c (init_optimization_passes): Move machine_reorg pass before
> 	free_cfg pass.
> 
> Tested on x86_64-pc-linux-gnu {,-m32}. OK for mainline and release branches?

I'm afraid this is going to break various targets, such change can't be
taken lightly and testing just on 2 targets is definitely not sufficient.

Definitely not something that should be applied ever to release branches,
not sure if it is something that should be done in stage3 for 4.6.

Targets that need cfg in the reorg pass compute it themselves (e.g. ia64),
other targets could depend on that the CFG is gone.

Why does i?86 actually care about CFG in its reorg pass, unlike targets
that do scheduling etc. I don't see why it should care.

E.g. ia64's comments say:
static void
ia64_reorg (void)
{ 
  /* We are freeing block_for_insn in the toplev to keep compatibility
     with old MDEP_REORGS that are not CFG based.  Recompute it now.  */
  compute_bb_for_insn ();
...

I'd say that applying this patch after testing/converting all targets
would be nice thing to do for stage1.

	Jakub

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-18 22:30           ` Uros Bizjak
  2010-11-18 22:33             ` Andrew Pinski
  2010-11-18 22:36             ` Jakub Jelinek
@ 2010-11-18 22:50             ` Richard Henderson
  2010-11-18 23:48               ` Uros Bizjak
  2010-11-19  9:56             ` Jan Hubicka
  3 siblings, 1 reply; 26+ messages in thread
From: Richard Henderson @ 2010-11-18 22:50 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: H.J. Lu, gcc-patches, Jan Hubicka

On 11/18/2010 02:15 PM, Uros Bizjak wrote:
> Tested on x86_64-pc-linux-gnu {,-m32}. OK for mainline and release branches?

Definitely not, as mentioned elsewhere.

I just thought that I'd add that, for next stage1, it would be nice
to have a targetm boolean that says whether pass_machine_reorg needs
the cfg.  Because it does seem silly to free the cfg only to have
to immeidately re-compute it.

But that's not something to fix now.


r~

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-18 22:36             ` Jakub Jelinek
@ 2010-11-18 22:55               ` Uros Bizjak
  0 siblings, 0 replies; 26+ messages in thread
From: Uros Bizjak @ 2010-11-18 22:55 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, gcc-patches, Jan Hubicka, Richard Henderson

On Thu, Nov 18, 2010 at 11:24 PM, Jakub Jelinek <jakub@redhat.com> wrote:

>> 2010-11-18  Uros Bizjak  <ubizjak@gmail.com>
>>
>>       PR middle-end/46546
>>       * passes.c (init_optimization_passes): Move machine_reorg pass before
>>       free_cfg pass.
>>
>> Tested on x86_64-pc-linux-gnu {,-m32}. OK for mainline and release branches?
>
> I'm afraid this is going to break various targets, such change can't be
> taken lightly and testing just on 2 targets is definitely not sufficient.
>
> Definitely not something that should be applied ever to release branches,
> not sure if it is something that should be done in stage3 for 4.6.

Note taken.

> Targets that need cfg in the reorg pass compute it themselves (e.g. ia64),
> other targets could depend on that the CFG is gone.
>
> Why does i?86 actually care about CFG in its reorg pass, unlike targets
> that do scheduling etc. I don't see why it should care.

Just for the sole delete_insn of the insn at the BB_END in
ix86_pad_returns. We can in fact manually update BB_END, as H.J.
proposed.

Uros.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-18 22:50             ` Richard Henderson
@ 2010-11-18 23:48               ` Uros Bizjak
  2010-11-19  0:56                 ` Richard Henderson
  0 siblings, 1 reply; 26+ messages in thread
From: Uros Bizjak @ 2010-11-18 23:48 UTC (permalink / raw)
  To: Richard Henderson; +Cc: H.J. Lu, gcc-patches, Jan Hubicka

On Thu, Nov 18, 2010 at 11:32 PM, Richard Henderson <rth@redhat.com> wrote:
> On 11/18/2010 02:15 PM, Uros Bizjak wrote:
>> Tested on x86_64-pc-linux-gnu {,-m32}. OK for mainline and release branches?
>
> Definitely not, as mentioned elsewhere.
>
> I just thought that I'd add that, for next stage1, it would be nice
> to have a targetm boolean that says whether pass_machine_reorg needs
> the cfg.  Because it does seem silly to free the cfg only to have
> to immeidately re-compute it.
>
> But that's not something to fix now.

The approach, proposed by H.J also works. IMO, the patch is OK,
perhaps with a comment like:

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 166920)
+++ config/i386/i386.c	(working copy)
@@ -29640,7 +29640,11 @@ ix86_pad_returns (void)
 	}
       if (replace)
 	{
-	  emit_jump_insn_before (gen_return_internal_long (), ret);
+	  /* We have to update BB_END (bb) here - delete_insn will
+	     not do it automatically since CFG is not available in
+	     machine_reorg pass.  */
+	  BB_END (bb)
+	    = emit_jump_insn_before (gen_return_internal_long (), ret);
 	  delete_insn (ret);
 	}
     }

Uros.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-18 23:48               ` Uros Bizjak
@ 2010-11-19  0:56                 ` Richard Henderson
  2010-11-19  1:15                   ` Uros Bizjak
  0 siblings, 1 reply; 26+ messages in thread
From: Richard Henderson @ 2010-11-19  0:56 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: H.J. Lu, gcc-patches, Jan Hubicka

On 11/18/2010 02:50 PM, Uros Bizjak wrote:
> -	  emit_jump_insn_before (gen_return_internal_long (), ret);
> +	  /* We have to update BB_END (bb) here - delete_insn will
> +	     not do it automatically since CFG is not available in
> +	     machine_reorg pass.  */
> +	  BB_END (bb)
> +	    = emit_jump_insn_before (gen_return_internal_long (), ret);

While that by itself is fine, calling compute_bb_for_insn at the 
beginning of md_reorg will make sure that things stay up-to-date
for any other changes that are being made within that function.

I.e. compute_bb_for_insn seems safer overall.


r~

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-19  0:56                 ` Richard Henderson
@ 2010-11-19  1:15                   ` Uros Bizjak
  0 siblings, 0 replies; 26+ messages in thread
From: Uros Bizjak @ 2010-11-19  1:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: H.J. Lu, gcc-patches, Jan Hubicka

On Fri, Nov 19, 2010 at 12:09 AM, Richard Henderson <rth@redhat.com> wrote:
> On 11/18/2010 02:50 PM, Uros Bizjak wrote:
>> -       emit_jump_insn_before (gen_return_internal_long (), ret);
>> +       /* We have to update BB_END (bb) here - delete_insn will
>> +          not do it automatically since CFG is not available in
>> +          machine_reorg pass.  */
>> +       BB_END (bb)
>> +         = emit_jump_insn_before (gen_return_internal_long (), ret);
>
> While that by itself is fine, calling compute_bb_for_insn at the
> beginning of md_reorg will make sure that things stay up-to-date
> for any other changes that are being made within that function.
>
> I.e. compute_bb_for_insn seems safer overall.

Thanks,

attached patch adds x86 as a 10th member to an unhappy family of
targets that are crippled by a compatibility requirements...

2010-11-19  Uros Bizjak  <ubizjak@gmail.com>

	PR target/46546
	* config/i386/i386.c (ix86_reorg): Call compute_bb_for_insn.

Bootstrapped on x86_64-pc-linux-gnu {,32}, will commit to mainline
once regression test ends.

Uros.

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 166920)
+++ config/i386/i386.c	(working copy)
@@ -29765,6 +29765,10 @@ ix86_pad_short_function (void)
 static void
 ix86_reorg (void)
 {
+  /* We are freeing block_for_insn in the toplev to keep compatibility
+     with old MDEP_REORGS that are not CFG based.  Recompute it now.  */
+  compute_bb_for_insn ();
+
   if (optimize && optimize_function_for_speed_p (cfun))
     {
       if (TARGET_PAD_SHORT_FUNCTION)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-18 22:30           ` Uros Bizjak
                               ` (2 preceding siblings ...)
  2010-11-18 22:50             ` Richard Henderson
@ 2010-11-19  9:56             ` Jan Hubicka
  3 siblings, 0 replies; 26+ messages in thread
From: Jan Hubicka @ 2010-11-19  9:56 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: H.J. Lu, gcc-patches, Jan Hubicka, Richard Henderson

> On Thu, Nov 18, 2010 at 8:43 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > On Thu, Nov 18, 2010 at 8:37 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> >>>> ix86_pad_returns forgot to update BB_END when it
> >>>> replaces it with a new one. OK for trunk?
> >>>
> >>> IMO,  you should just move the call to vzeroupper optimization to be
> >>> the first thing in ix86_reorg. This way, ix86_pad_short_function,
> >>> ix86_pad_returns and ix86_avoid_jump_mispredict will also count
> >>> emitted vzeroupper.
> >>
> >> But it will leave bad BB_END in place.  Any uses of BB_END later
> >> will still be screwed.
> >
> > I think that Jan or Richard (CC'd) can provide better answer on this issue.
> 
> Got it.
> 
> It is a pass ordering problem, we free CFG before machine reorg pass,
> so BLOCK_FOR_INSN in machine reorg pass does not work anymore (it
> returns 0).

If the question was why machine reorg runs after cfg freeing, the reason is
that all the machine reorgs except for Itanium one (at least last time I looked)
were not aware of CFG.  I never felt like getting all of them updated as it is bit
tricky (they do ugly stuff like inserting constant pool right into code that has CFG
representation issues)

Honza

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-23 12:42 ` Richard Sandiford
@ 2010-11-23 13:28   ` Steven Bosscher
  0 siblings, 0 replies; 26+ messages in thread
From: Steven Bosscher @ 2010-11-23 13:28 UTC (permalink / raw)
  To: Steven Bosscher, Uros Bizjak, H.J. Lu, gcc-patches, Jan Hubicka,
	Richard Henderson, rdsandiford

On Tue, Nov 23, 2010 at 11:11 AM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
> Steven Bosscher <stevenb.gcc@gmail.com> writes:
>>> Index: passes.c
>>> ===================================================================
>>> --- passes.c (revision 166920)
>>> +++ passes.c (working copy)
>>> @@ -1051,8 +1051,8 @@ init_optimization_passes (void)
>>>        NEXT_PASS (pass_compute_alignments);
>>>        NEXT_PASS (pass_duplicate_computed_gotos);
>>>        NEXT_PASS (pass_variable_tracking);
>>> -      NEXT_PASS (pass_free_cfg);
>>>        NEXT_PASS (pass_machine_reorg);
>>> +      NEXT_PASS (pass_free_cfg);
>>>        NEXT_PASS (pass_cleanup_barriers);
>>>        NEXT_PASS (pass_delay_slots);
>>>        NEXT_PASS (pass_split_for_shorten_branches);
>>
>> This breaks at least all targets that run delay slot scheduling during
>> machine-reorg (MIPS), targets that layout constant pools (SH,ARM),
>> targets that recompute the CFG in their machine-reorg (blackfin, ia64,
>> MIPS (?!)).
>
> Just out of curiosity, why the (?!) ?  We need dataflow info when
> working around an r10k speculative execution errata and when relaxing
> PIC calls.  We free it again before doing DBR, like you say.

The (?!) because it's just funny that on the one hand MIPS
re-calculates the CFG for some subtargets, but cannot work if the CFG
is not released before machine_reorg for some other subtargets. This
means that just putting pass_free_cfg after machine_reorg in passes.c
is impossible (MIPS will break, and recalculating the CFG after
dbr_sched is also impossible) but free-ing the CFG before
machine_reorg (like it is now) causes extra work for MIPS.  Kind-of
schizofrenic port :-)

Ciao!
Steven

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-18 23:09 Steven Bosscher
  2010-11-18 23:37 ` Uros Bizjak
@ 2010-11-23 12:42 ` Richard Sandiford
  2010-11-23 13:28   ` Steven Bosscher
  1 sibling, 1 reply; 26+ messages in thread
From: Richard Sandiford @ 2010-11-23 12:42 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Uros Bizjak, H.J. Lu, gcc-patches, Jan Hubicka, Richard Henderson

Steven Bosscher <stevenb.gcc@gmail.com> writes:
>> Index: passes.c
>> ===================================================================
>> --- passes.c	(revision 166920)
>> +++ passes.c	(working copy)
>> @@ -1051,8 +1051,8 @@ init_optimization_passes (void)
>>  	  NEXT_PASS (pass_compute_alignments);
>>  	  NEXT_PASS (pass_duplicate_computed_gotos);
>>  	  NEXT_PASS (pass_variable_tracking);
>> -	  NEXT_PASS (pass_free_cfg);
>>  	  NEXT_PASS (pass_machine_reorg);
>> +	  NEXT_PASS (pass_free_cfg);
>>  	  NEXT_PASS (pass_cleanup_barriers);
>>  	  NEXT_PASS (pass_delay_slots);
>>  	  NEXT_PASS (pass_split_for_shorten_branches);
>
> This breaks at least all targets that run delay slot scheduling during
> machine-reorg (MIPS), targets that layout constant pools (SH,ARM),
> targets that recompute the CFG in their machine-reorg (blackfin, ia64,
> MIPS (?!)).

Just out of curiosity, why the (?!) ?  We need dataflow info when
working around an r10k speculative execution errata and when relaxing
PIC calls.  We free it again before doing DBR, like you say.

Richard

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
  2010-11-18 23:09 Steven Bosscher
@ 2010-11-18 23:37 ` Uros Bizjak
  2010-11-23 12:42 ` Richard Sandiford
  1 sibling, 0 replies; 26+ messages in thread
From: Uros Bizjak @ 2010-11-18 23:37 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: H.J. Lu, gcc-patches, Jan Hubicka, Richard Henderson

On Thu, Nov 18, 2010 at 11:36 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
>> Index: passes.c
>> ===================================================================
>> --- passes.c  (revision 166920)
>> +++ passes.c  (working copy)
>> @@ -1051,8 +1051,8 @@ init_optimization_passes (void)
>>         NEXT_PASS (pass_compute_alignments);
>>         NEXT_PASS (pass_duplicate_computed_gotos);
>>         NEXT_PASS (pass_variable_tracking);
>> -       NEXT_PASS (pass_free_cfg);
>>         NEXT_PASS (pass_machine_reorg);
>> +       NEXT_PASS (pass_free_cfg);
>>         NEXT_PASS (pass_cleanup_barriers);
>>         NEXT_PASS (pass_delay_slots);
>>         NEXT_PASS (pass_split_for_shorten_branches);
>
> This breaks at least all targets that run delay slot scheduling during
> machine-reorg (MIPS), targets that layout constant pools (SH,ARM),
> targets that recompute the CFG in their machine-reorg (blackfin, ia64,
> MIPS (?!)).
>
> But I'm impressed that an ix86 bootstrap survives with this patch.
> Have you verified that there are no changes in the generated code?

Bootstrap and regression test went both without problems/regressions.
I didn't investigate code changes.

Uros.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Properly check the end of basic block
@ 2010-11-18 23:09 Steven Bosscher
  2010-11-18 23:37 ` Uros Bizjak
  2010-11-23 12:42 ` Richard Sandiford
  0 siblings, 2 replies; 26+ messages in thread
From: Steven Bosscher @ 2010-11-18 23:09 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: H.J. Lu, gcc-patches, Jan Hubicka, Richard Henderson

> Index: passes.c
> ===================================================================
> --- passes.c	(revision 166920)
> +++ passes.c	(working copy)
> @@ -1051,8 +1051,8 @@ init_optimization_passes (void)
>  	  NEXT_PASS (pass_compute_alignments);
>  	  NEXT_PASS (pass_duplicate_computed_gotos);
>  	  NEXT_PASS (pass_variable_tracking);
> -	  NEXT_PASS (pass_free_cfg);
>  	  NEXT_PASS (pass_machine_reorg);
> +	  NEXT_PASS (pass_free_cfg);
>  	  NEXT_PASS (pass_cleanup_barriers);
>  	  NEXT_PASS (pass_delay_slots);
>  	  NEXT_PASS (pass_split_for_shorten_branches);

This breaks at least all targets that run delay slot scheduling during
machine-reorg (MIPS), targets that layout constant pools (SH,ARM),
targets that recompute the CFG in their machine-reorg (blackfin, ia64,
MIPS (?!)).

But I'm impressed that an ix86 bootstrap survives with this patch.
Have you verified that there are no changes in the generated code?

Ciao!
Steven

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2010-11-23 12:06 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-17  8:34 PATCH: Properly check the end of basic block H.J. Lu
2010-11-17  9:24 ` Uros Bizjak
2010-11-17 15:21   ` H.J. Lu
2010-11-17 20:26     ` Uros Bizjak
2010-11-18  0:49       ` H.J. Lu
2010-11-18  7:06         ` H.J. Lu
2010-11-18  9:23         ` Uros Bizjak
2010-11-18 10:47         ` Uros Bizjak
2010-11-18 18:38   ` H.J. Lu
2010-11-18 19:27     ` Uros Bizjak
2010-11-18 19:51       ` H.J. Lu
2010-11-18 20:25         ` Uros Bizjak
2010-11-18 20:44           ` H.J. Lu
2010-11-18 22:30           ` Uros Bizjak
2010-11-18 22:33             ` Andrew Pinski
2010-11-18 22:36             ` Jakub Jelinek
2010-11-18 22:55               ` Uros Bizjak
2010-11-18 22:50             ` Richard Henderson
2010-11-18 23:48               ` Uros Bizjak
2010-11-19  0:56                 ` Richard Henderson
2010-11-19  1:15                   ` Uros Bizjak
2010-11-19  9:56             ` Jan Hubicka
2010-11-18 23:09 Steven Bosscher
2010-11-18 23:37 ` Uros Bizjak
2010-11-23 12:42 ` Richard Sandiford
2010-11-23 13:28   ` Steven Bosscher

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).