public inbox for gcc-patches@gcc.gnu.org
From: "H.J. Lu" <hjl.tools@gmail.com>
To: Uros Bizjak <ubizjak@gmail.com>
Cc: gcc-patches@gcc.gnu.org, Richard Guenther <rguenther@suse.de>,
		Jakub Jelinek <jakub@redhat.com>,
	Mark Mitchell <mark@codesourcery.com>
Subject: Re: PATCH: PR target/46519: Missing vzeroupper
Date: Wed, 29 Dec 2010 16:23:00 -0000
Message-ID: <AANLkTi=0tNb1FX7XweRCUQBh6OUjNW7f6+vyO0YuYk=z@mail.gmail.com>
In-Reply-To: <AANLkTi=K3bPqru+joTk1fsjthVnSe0EMkoEF=tGr7SxF@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3675 bytes --]

On Wed, Dec 29, 2010 at 1:10 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sat, Dec 18, 2010 at 7:10 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Sat, Dec 18, 2010 at 9:48 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Fri, Dec 17, 2010 at 8:03 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>>
>>>> This patch fixes another missing vzeroupper.  OK for trunk?
>
>> I'd like to apply this patch instead. It removes
>> rescan_move_or_delete_vzeroupper and rewrites
>> move_or_delete_vzeroupper_1 to avoid the recursive call. It first scans
>> all basic blocks repeatedly until no basic block changes the state of
>> the upper 128bits of AVX registers to used at exit.  Then it rescans
>> all basic blocks whose upper 128bit state is still unknown.
>> OK for trunk?
>
> H.J. explained to me in a private mail why this patch is important.
> I think that the quote below explains it:
>
> <quote>
>> I'm not sure that the algorithm is correct (and I don't have enough
>> experience in this area), so I'd rather leave the review to someone
>> else. AFAICS, there can be 20 passes, and from comments, it is
>> questionable if this is enough.
>
> I tried several benchmarks which failed before my patch.  The most passes
> I saw was 2. I can change it to 2 and re-run SPEC CPU 2K/2006 to find
> out what the smallest number of passes should be.
>
>> I propose that you commit your previous (simple) patch, since IMO this
>
> My simple patch doesn't work on SPEC CPU 2K/2006. It isn't very
> useful for 4.6.
>
>> one is too invasive for this development stage. However, I still think
>
> The old algorithm is obviously incorrect. The new algorithm removes the
> recursive calls and is simpler/faster than the old one.  vzeroupper optimization
> is a very important new feature for AVX. The current implementation is
> incorrect.  I'd like to fix it before 4.6 is released.
>
>> that LCM infrastructure (see lcm.c) should be used to place
>> vzerouppers at optimum points.
>
> We will investigate LCM for 4.7.
> </quote>
>
> I think that for these reasons, the patch should be committed to
> SVN even in this development stage. Even if the algorithm is not
> optimal, the patch demonstrably produces substantially better code.
> This feature has no impact on generic code without the -mvzeroupper /
> -mavx switches, and since there are currently very few AVX users,
> it has negligible overall impact.
>
>> gcc/
>>
>> 2010-12-18  H.J. Lu  <hongjiu.lu@intel.com>
>>
>>        PR target/46519
>>        * config/i386/i386.c (block_info_def): Remove referenced, count
>>        and rescanned.
>>        (move_or_delete_vzeroupper_2): Updated.
>>        (move_or_delete_vzeroupper_1): Rewritten to avoid recursive call.
>>        (rescan_move_or_delete_vzeroupper): Removed.
>>        (move_or_delete_vzeroupper): Repeat processing all basic blocks
>>        until no basic block state is changed to used at exit.
>>
>> gcc/testsuite/
>>
>> 2010-12-18  H.J. Lu  <hongjiu.lu@intel.com>
>>
>>        PR target/46519
>>        * gfortran.dg/pr46519-2.f90: New.
>>
>
> The patch is OK, but please allow a day or two for the RMs (CC'd) to
> comment.

We will investigate LCM for 4.7.  In the meantime, here is a small patch
on top of the current one. If the upper 128bits are never changed in a basic
block, we can skip that block in the later passes.  OK for trunk together
with the current patch?
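
To make the short circuit concrete, here is a rough, self-contained sketch
of the idea.  It is a toy model and not the code in i386.c: toy_block,
toy_effect, toy_scan_block and the full_scans counter are hypothetical
names made up for illustration, and the real function walks RTL insns
rather than an array of effects.  Only the three state names come from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three states used by the patch.  */
enum upper_128bits_state { unknown = 0, unused, used };

/* Hypothetical per-insn effect on the upper 128bits.  */
enum toy_effect { toy_none, toy_clears_upper, toy_sets_upper };

/* Hypothetical stand-in for a basic block and its block_info.  */
struct toy_block
{
  const enum toy_effect *effects; /* Effects of the insns, in order.  */
  size_t n_effects;
  enum upper_128bits_state state; /* State at exit.  */
  bool unchanged;                 /* TRUE if the block never alters it.  */
  unsigned full_scans;            /* Number of full insn walks so far.  */
};

/* Compute the exit state of BB given STATE at entry.  */
static void
toy_scan_block (struct toy_block *bb, enum upper_128bits_state state)
{
  bool unchanged = true;
  size_t i;

  assert (bb != NULL);

  /* Short circuit: a block already known to leave the state alone
     just passes the entry state through to its exit.  */
  if (bb->unchanged)
    {
      bb->state = state;
      return;
    }

  bb->full_scans++;
  for (i = 0; i < bb->n_effects; i++)
    switch (bb->effects[i])
      {
      case toy_clears_upper:
        state = unused;
        unchanged = false;
        break;
      case toy_sets_upper:
        state = used;
        unchanged = false;
        break;
      case toy_none:
        break;
      }

  bb->state = state;
  bb->unchanged = unchanged;
}
```

In this model a block is walked in full at most once if none of its
insns touch the upper 128bits.  Every later pass only copies the entry
state to the exit state, which is where the saving comes from when
all basic blocks are scanned repeatedly.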

Thanks.


-- 
H.J.
---
2010-12-29  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (upper_128bits_state): Update comments.
	(block_info_def): Add unchanged.
	(move_or_delete_vzeroupper_2): Short circuit if upper 128bits
	are unchanged in the block.

[-- Attachment #2: gcc-pr46519-9.patch --]
[-- Type: text/plain, Size: 3719 bytes --]

2010-12-29  H.J. Lu  <hongjiu.lu@intel.com>

	* config/i386/i386.c (upper_128bits_state): Update comments.
	(block_info_def): Add unchanged.
	(move_or_delete_vzeroupper_2): Short circuit if upper 128bits
	are unchanged in the block.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 28b26ef..2d06c04 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -60,14 +60,17 @@ along with GCC; see the file COPYING3.  If not see
 enum upper_128bits_state
 {
   unknown = 0,		/* Unknown.  */
-  unused,		/* Not used or not referenced.  */
-  used			/* Used or referenced.  */
+  unused,		/* Not used.  */
+  used			/* Used.  */
 };
 
 typedef struct block_info_def
 {
-  /* State of the upper 128bits of any AVX registers at exit.  */
+  /* State of the upper 128bits of AVX registers at exit.  */
   enum upper_128bits_state state;
+  /* TRUE if state of the upper 128bits of AVX registers is unchanged
+     in this block.  */
+  bool unchanged;
   /* TRUE if block has been processed.  */
   bool processed;
 } *block_info;
@@ -110,8 +113,7 @@ check_avx256_stores (rtx dest, const_rtx set, void *data)
    in basic block BB.  Delete it if upper 128bit AVX registers are
    unused.  If it isn't deleted, move it to just before a jump insn.
    
-   UPPER_128BITS_LIVE is TRUE if the upper 128bits of any AVX registers
-   are live at entry.  */
+   STATE is state of the upper 128bits of AVX registers at entry.  */
 
 static void
 move_or_delete_vzeroupper_2 (basic_block bb,
@@ -121,11 +123,24 @@ move_or_delete_vzeroupper_2 (basic_block bb,
   rtx vzeroupper_insn = NULL_RTX;
   rtx pat;
   int avx256;
+  bool unchanged;
+
+  if (BLOCK_INFO (bb)->unchanged)
+    {
+      if (dump_file)
+	fprintf (dump_file, " [bb %i] unchanged: upper 128bits: %d\n",
+		 bb->index, state);
+
+      BLOCK_INFO (bb)->state = state;
+      return;
+    }
 
   if (dump_file)
     fprintf (dump_file, " [bb %i] entry: upper 128bits: %d\n",
 	     bb->index, state);
 
+  unchanged = true;
+
   /* BB_END changes when it is deleted.  */
   bb_end = BB_END (bb);
   insn = BB_HEAD (bb);
@@ -179,6 +194,7 @@ move_or_delete_vzeroupper_2 (basic_block bb,
 	      && XINT (XVECEXP (pat, 0, 0), 1) == UNSPECV_VZEROALL)
 	    {
 	      state = unused;
+	      unchanged = false;
 
 	      /* Delete pending vzeroupper insertion.  */
 	      if (vzeroupper_insn)
@@ -189,9 +205,9 @@ move_or_delete_vzeroupper_2 (basic_block bb,
 	    }
 	  else if (state != used)
 	    {
-	      /* No need to call note_stores if the upper 128bits of
-		 AVX registers are never referenced.  */
 	      note_stores (pat, check_avx256_stores, &state);
+	      if (state == used)
+		unchanged = false;
 	    }
 	  continue;
 	}
@@ -205,7 +221,10 @@ move_or_delete_vzeroupper_2 (basic_block bb,
 	     256bit AVX register.  We only need to check if callee
 	     returns 256bit AVX register.  */
 	  if (avx256 == callee_return_avx256)
-	    state = used;
+	    {
+	      state = used;
+	      unchanged = false;
+	    }
 
 	  /* Remove unnecessary vzeroupper since upper 128bits are
 	     cleared.  */
@@ -236,15 +255,20 @@ move_or_delete_vzeroupper_2 (basic_block bb,
 	      delete_insn (insn);
 	    }
 	  else
-	    vzeroupper_insn = insn;
+	    {
+	      vzeroupper_insn = insn;
+	      unchanged = false;
+	    }
 	}
     }
 
   BLOCK_INFO (bb)->state = state;
+  BLOCK_INFO (bb)->unchanged = unchanged;
 
   if (dump_file)
-    fprintf (dump_file, " [bb %i] exit: upper 128bits: %d\n",
-	     bb->index, state);
+    fprintf (dump_file, " [bb %i] exit: %s: upper 128bits: %d\n",
+	     bb->index, unchanged ? "unchanged" : "changed",
+	     state);
 }
 
 /* Helper function for move_or_delete_vzeroupper.  Process vzeroupper
