Re: PATCH: PR target/46519: Missing vzeroupper


From: Richard Guenther <richard.guenther@gmail.com>
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: Uros Bizjak <ubizjak@gmail.com>, gcc-patches@gcc.gnu.org
Subject: Re: PATCH: PR target/46519: Missing vzeroupper
Date: Sat, 20 Nov 2010 12:11:00 -0000
Message-ID: <AANLkTi=GBP7tLOpXB01K_HO6uQnby+YFo5TCp8zBnKGy@mail.gmail.com>
In-Reply-To: <AANLkTi=wuUYc0rkWpJ5hq_+NKz5cCMzyicwQ3omh2X=H@mail.gmail.com>

On Sat, Nov 20, 2010 at 12:31 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Nov 19, 2010 at 2:48 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Fri, Nov 19, 2010 at 10:30 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Thu, Nov 18, 2010 at 1:11 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>> On Thu, Nov 18, 2010 at 12:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>
>>>>> Here is the patch for
>>>>>
>>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46519
>>>>>
>>>>> We have 2 blocks pointing to each other. This patch first scans
>>>>> all blocks without moving vzeroupper so that we can have accurate
>>>>> information about upper 128bits at block entry.
>>>>
>>>> This introduces another insn scanning pass, almost the same as
>>>> existing vzeroupper pass (modulo CALL_INSN/JUMP_INSN handling).
>>>>
>>>> So, if I understand correctly:
>>>> - The patch removes the detection of whether the function ever
>>>> touches AVX registers.
>>>> - Due to this, all call_insn RTXes have to be decorated with
>>>> CALL_NEEDS_VZEROUPPER.
>>>> - A new pre-pass is required that scans all functions in order to
>>>> detect functions with live AVX registers at exit, and at the same time
>>>> marks the functions that *do not* use AVX registers.
>>>> - Existing pass then re-scans everything to again detect functions
>>>> with live AVX registers at exit and handles vzeroupper emission.
>>>>
>>>> I don't think this approach is acceptable. Maybe a LCM infrastructure
>>>> can be used to handle this case?
>>>>
>>>
>>> Here is the rewrite of the vzeroupper optimization pass.
>>> To avoid circular dependency, it has 2 passes.  It
>>> delays the circular dependency to the second pass
>>> and avoids rescanning as much as possible.
>>>
>>> I compared the bootstrap times with/without this patch
>>> on 64bit Sandy Bridge with multilib and --with-fpmath=avx.
>>> I enabled c,c++,fortran,java,lto,objc
>>>
>>> Without patch:
>>>
>>> 12378.70user 573.02system 41:54.21elapsed 515%CPU
>>>
>>> With patch:
>>>
>>> 12580.56user 578.07system 42:25.41elapsed 516%CPU
>>>
>>> The overhead is about 1.6%.
>>
>> That's quite a big overhead for something that doesn't use FP
>> math (and thus no AVX).
>
> AVX256 vector insns are independent of FP math.  They can be
> generated by the vectorizer as well as by loop unrolling.  We can
> limit it to -O2 or -O3 if overhead is a big concern.

Limiting it to -fexpensive-optimizations would be a good start.  Btw,
how is code size affected?  Does it make sense to disable it when
optimizing a function for size?  As it affects the performance of
callees, whether the caller is optimized for size or speed probably
isn't the best thing to check.

Richard.

> H.J.
> ---
>> Richard.
>>
>>>
>>> --
>>> H.J.
>>> ---
>>> gcc/
>>>
>>> 2010-11-19  H.J. Lu  <hongjiu.lu@intel.com>
>>>
>>>        PR target/46519
>>>        * config/i386/i386.c (upper_128bits_state): New.
>>>        (block_info_def): Remove upper_128bits_set and done.  Add state,
>>>        referenced, count, processed and rescanned.
>>>        (check_avx256_stores): Updated.
>>>        (move_or_delete_vzeroupper_2): Updated. Handle deleted BB_END.
>>>        Call note_stores only if needed.  Set referenced and count.
>>>        (move_or_delete_vzeroupper_1): Updated.  Set rescan_vzeroupper_p.
>>>        (rescan_move_or_delete_vzeroupper): New.
>>>        (move_or_delete_vzeroupper):  Process and rescan all basic
>>>        blocks instead of predecessor blocks of all exit points.
>>>        (use_avx256_p): Removed.
>>>        (init_cumulative_args): Don't set use_avx256_p.
>>>        (ix86_function_arg): Likewise.
>>>        (ix86_expand_move): Likewise.
>>>        (ix86_expand_vector_move_misalign): Likewise.
>>>        (ix86_local_alignment): Likewise.
>>>        (ix86_minimum_alignment): Likewise.
>>>        (ix86_expand_epilogue): Don't check use_avx256_p when generating
>>>        vzeroupper.
>>>        (ix86_expand_call): Likewise.
>>>
>>>        * config/i386/i386.h (machine_function): Remove use_vzeroupper_p
>>>        and use_avx256_p.  Add rescan_vzeroupper_p.
>>>
>>> gcc/testsuite/
>>>
>>> 2010-11-17  H.J. Lu  <hongjiu.lu@intel.com>
>>>
>>>        PR target/46519
>>>        * gcc.target/i386/avx-vzeroupper-10.c: Expect no avx_vzeroupper.
>>>        * gcc.target/i386/avx-vzeroupper-11.c: Likewise.
>>>
>>>        * gcc.target/i386/avx-vzeroupper-20.c: New.
>>>        * gcc.target/i386/avx-vzeroupper-21.c: Likewise.
>>>        * gcc.target/i386/avx-vzeroupper-22.c: Likewise.
>>>        * gcc.target/i386/avx-vzeroupper-23.c: Likewise.
>>>        * gcc.target/i386/avx-vzeroupper-24.c: Likewise.
>>>
>>
>
>
>
> --
> H.J.
>
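
The 1.6% overhead quoted above can be checked from the two time(1)
lines in the message. The following is a minimal sketch, not part of the
patch: the helper names parse_elapsed and overhead_percent are
hypothetical, and the only inputs are the figures posted in the thread.
It suggests that the 1.6% figure corresponds to user time, while the
wall-clock (elapsed) difference is somewhat smaller.

```python
# Sketch only: recompute the bootstrap overhead from the posted timings.
# parse_elapsed and overhead_percent are illustrative helpers, not GCC code.

def parse_elapsed(text):
    """Convert a 'minutes:seconds' string, as printed by time(1), to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


def overhead_percent(before, after):
    """Relative increase of 'after' over 'before', in percent."""
    return (after - before) / before * 100.0


# Figures copied from the message (seconds, except elapsed as mm:ss).
without_patch = {
    "user": 12378.70,
    "system": 573.02,
    "elapsed": parse_elapsed("41:54.21"),
}
with_patch = {
    "user": 12580.56,
    "system": 578.07,
    "elapsed": parse_elapsed("42:25.41"),
}

overheads = {
    key: overhead_percent(without_patch[key], with_patch[key])
    for key in without_patch
}

for key, value in overheads.items():
    print(f"{key:8s} {value:.2f}%")
```

Under these inputs the sketch reports roughly 1.63% for user time,
0.88% for system time and 1.24% for elapsed time.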

