From: "H.J. Lu" <hjl.tools@gmail.com>
To: Uros Bizjak <ubizjak@gmail.com>
Cc: gcc-patches@gcc.gnu.org, Richard Guenther <rguenther@suse.de>,
Jakub Jelinek <jakub@redhat.com>,
Mark Mitchell <mark@codesourcery.com>
Subject: Re: PATCH: PR target/46519: Missing vzeroupper
Date: Wed, 29 Dec 2010 16:23:00 -0000
Message-ID: <AANLkTi=0tNb1FX7XweRCUQBh6OUjNW7f6+vyO0YuYk=z@mail.gmail.com>
In-Reply-To: <AANLkTi=K3bPqru+joTk1fsjthVnSe0EMkoEF=tGr7SxF@mail.gmail.com>
On Wed, Dec 29, 2010 at 1:10 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sat, Dec 18, 2010 at 7:10 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Sat, Dec 18, 2010 at 9:48 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Fri, Dec 17, 2010 at 8:03 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>>
>>>> This patch fixes another missing vzeroupper. OK for trunk?
>
>> I'd like to apply this patch instead. It removes rescan_move_or_delete_vzeroupper
>> and rewrites move_or_delete_vzeroupper_1 to avoid recursive calls. It first scans
>> all basic blocks repeatedly until no basic block changes the upper 128bits of AVX
>> to used at exit. Then it rescans all basic blocks with unknown upper 128bit state.
>> OK for trunk?
>
> H.J. explained the importance of this patch to me in a private mail.
> I think that the quote below explains it:
>
> <quote>
>> I'm not sure that the algorithm is correct (and I don't have enough
>> experience in this area), so I'd rather leave the review to someone
>> else. AFAICS, there can be 20 passes, and from comments, it is
>> questionable if this is enough.
>
> I tried several benchmarks which failed before my patch. The most passes
> I saw was 2. I can change the limit to 2 and re-run SPEC CPU 2K/2006 to find
> out what the smallest pass count should be.
>
>> I propose that you commit your previous (simple) patch, since IMO this
>
> My simple patch doesn't work on SPEC CPU 2K/2006. It isn't very
> useful for 4.6.
>
>> one is too invasive for this development stage. However, I still think
>
> The old algorithm is obviously incorrect. The new algorithm removes the
> recursive calls and is simpler/faster than the old one. vzeroupper optimization
> is a very important new feature for AVX. The current implementation is
> incorrect. I'd like to fix it before 4.6 is released.
>
>> that LCM infrastructure (see lcm.c) should be used to place
>> vzerouppers at optimum points.
>
> We will investigate LCM for 4.7.
> </quote>
>
> I think that due to these reasons, the patch should be committed to
> SVN even in this development stage. Even if the algorithm is not
> optimal, the patch demonstrably produces substantially better code.
> This feature has no impact on generic code without -mvzeroupper /
> -mavx switch, and since there are currently very few AVX users,
> negligible overall impact.
>
>> gcc/
>>
>> 2010-12-18 H.J. Lu <hongjiu.lu@intel.com>
>>
>> PR target/46519
>> * config/i386/i386.c (block_info_def): Remove referenced, count
>> and rescanned.
>> (move_or_delete_vzeroupper_2): Updated.
>> (move_or_delete_vzeroupper_1): Rewritten to avoid recursive call.
>> (rescan_move_or_delete_vzeroupper): Removed.
>> (move_or_delete_vzeroupper): Repeat processing all basic blocks
>> until no basic block state is changed to used at exit.
>>
>> gcc/testsuite/
>>
>> 2010-12-18 H.J. Lu <hongjiu.lu@intel.com>
>>
>> PR target/46519
>> * gfortran.dg/pr46519-2.f90: New.
>>
>
> The patch is OK, but please allow a day or two for RMs (CC'd) to
> eventually comment.

We will investigate LCM for 4.7. In the meantime, here is a small patch
on top of the current one. If a basic block never changes the state of the
upper 128bits, later passes can skip rescanning it and simply propagate the
entry state to its exit. OK for trunk together with the current patch?
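
To illustrate the short circuit, here is a minimal, self-contained sketch.
It is not the i386.c code: toy_block, toy_effect and toy_scan_block are
hypothetical stand-ins for basic_block, the insn walk and
move_or_delete_vzeroupper_2, and the scans counter exists only to make the
skipped rescan visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same three states as upper_128bits_state in the patch.  */
enum upper_128bits_state { unknown = 0, unused, used };

/* Hypothetical: the effect one insn has on the upper 128bits.  */
enum toy_effect { effect_none, effect_set_used, effect_clear };

/* Hypothetical stand-in for a basic block plus its block_info.  */
struct toy_block
{
  const enum toy_effect *effects; /* Per-insn effects, in order.  */
  size_t n_effects;
  enum upper_128bits_state state; /* State at exit.  */
  bool unchanged;                 /* TRUE if the last full scan left the
                                     entry state unchanged.  */
  unsigned scans;                 /* Full scans done (illustration only).  */
};

/* Scan BB with entry state STATE and record the exit state.  */
static void
toy_scan_block (struct toy_block *bb, enum upper_128bits_state state)
{
  bool unchanged = true;
  size_t i;

  assert (bb != NULL);

  /* Short circuit: an earlier scan found nothing in this block that
     changes the state, so the exit state equals the entry state.  */
  if (bb->unchanged)
    {
      bb->state = state;
      return;
    }

  bb->scans++;
  for (i = 0; i < bb->n_effects; i++)
    switch (bb->effects[i])
      {
      case effect_set_used:
        /* Only a transition to used counts as a change.  */
        if (state != used)
          {
            state = used;
            unchanged = false;
          }
        break;
      case effect_clear:
        /* Models vzeroall/vzeroupper clearing the upper 128bits.  */
        state = unused;
        unchanged = false;
        break;
      case effect_none:
        break;
      }

  bb->state = state;
  bb->unchanged = unchanged;
}
```

In this sketch a block whose insns all have effect_none is walked once.
Every later call only copies the entry state to the exit state.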
Thanks.
--
H.J.
---
2010-12-29 H.J. Lu <hongjiu.lu@intel.com>
* config/i386/i386.c (upper_128bits_state): Update comments.
(block_info_def): Add unchanged.
(move_or_delete_vzeroupper_2): Short circuit if upper 128bits
are unchanged in the block.
[-- Attachment #2: gcc-pr46519-9.patch --]
[-- Type: text/plain, Size: 3719 bytes --]
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 28b26ef..2d06c04 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -60,14 +60,17 @@ along with GCC; see the file COPYING3. If not see
enum upper_128bits_state
{
unknown = 0, /* Unknown. */
- unused, /* Not used or not referenced. */
- used /* Used or referenced. */
+ unused, /* Not used. */
+ used /* Used. */
};
typedef struct block_info_def
{
- /* State of the upper 128bits of any AVX registers at exit. */
+ /* State of the upper 128bits of AVX registers at exit. */
enum upper_128bits_state state;
+ /* TRUE if state of the upper 128bits of AVX registers is unchanged
+ in this block. */
+ bool unchanged;
/* TRUE if block has been processed. */
bool processed;
} *block_info;
@@ -110,8 +113,7 @@ check_avx256_stores (rtx dest, const_rtx set, void *data)
in basic block BB. Delete it if upper 128bit AVX registers are
unused. If it isn't deleted, move it to just before a jump insn.
- UPPER_128BITS_LIVE is TRUE if the upper 128bits of any AVX registers
- are live at entry. */
+ STATE is state of the upper 128bits of AVX registers at entry. */
static void
move_or_delete_vzeroupper_2 (basic_block bb,
@@ -121,11 +123,24 @@ move_or_delete_vzeroupper_2 (basic_block bb,
rtx vzeroupper_insn = NULL_RTX;
rtx pat;
int avx256;
+ bool unchanged;
+
+ if (BLOCK_INFO (bb)->unchanged)
+ {
+ if (dump_file)
+ fprintf (dump_file, " [bb %i] unchanged: upper 128bits: %d\n",
+ bb->index, state);
+
+ BLOCK_INFO (bb)->state = state;
+ return;
+ }
if (dump_file)
fprintf (dump_file, " [bb %i] entry: upper 128bits: %d\n",
bb->index, state);
+ unchanged = true;
+
/* BB_END changes when it is deleted. */
bb_end = BB_END (bb);
insn = BB_HEAD (bb);
@@ -179,6 +194,7 @@ move_or_delete_vzeroupper_2 (basic_block bb,
&& XINT (XVECEXP (pat, 0, 0), 1) == UNSPECV_VZEROALL)
{
state = unused;
+ unchanged = false;
/* Delete pending vzeroupper insertion. */
if (vzeroupper_insn)
@@ -189,9 +205,9 @@ move_or_delete_vzeroupper_2 (basic_block bb,
}
else if (state != used)
{
- /* No need to call note_stores if the upper 128bits of
- AVX registers are never referenced. */
note_stores (pat, check_avx256_stores, &state);
+ if (state == used)
+ unchanged = false;
}
continue;
}
@@ -205,7 +221,10 @@ move_or_delete_vzeroupper_2 (basic_block bb,
256bit AVX register. We only need to check if callee
returns 256bit AVX register. */
if (avx256 == callee_return_avx256)
- state = used;
+ {
+ state = used;
+ unchanged = false;
+ }
/* Remove unnecessary vzeroupper since upper 128bits are
cleared. */
@@ -236,15 +255,20 @@ move_or_delete_vzeroupper_2 (basic_block bb,
delete_insn (insn);
}
else
- vzeroupper_insn = insn;
+ {
+ vzeroupper_insn = insn;
+ unchanged = false;
+ }
}
}
BLOCK_INFO (bb)->state = state;
+ BLOCK_INFO (bb)->unchanged = unchanged;
if (dump_file)
- fprintf (dump_file, " [bb %i] exit: upper 128bits: %d\n",
- bb->index, state);
+ fprintf (dump_file, " [bb %i] exit: %s: upper 128bits: %d\n",
+ bb->index, unchanged ? "unchanged" : "changed",
+ state);
}
/* Helper function for move_or_delete_vzeroupper. Process vzeroupper