Re: using scratchpads to enhance RTL-level if-conversion: revised patch

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Bernd Schmidt <bschmidt@redhat.com>
To: Jeff Law <law@redhat.com>, Abe <abe_skolnik@yahoo.com>
Cc: Sebastian Pop <sebpop@gmail.com>,
	Kyrill Tkachov <kyrylo.tkachov@arm.com>,
	       "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: using scratchpads to enhance RTL-level if-conversion: revised patch
Date: Wed, 14 Oct 2015 19:15:00 -0000	[thread overview]
Message-ID: <561EA9D4.2070101@redhat.com> (raw)
In-Reply-To: <561E9458.5090701@redhat.com>

On 10/14/2015 07:43 PM, Jeff Law wrote:
> Obviously some pessimization relative to current code is necessary to
> fix some of the problems WRT thread safety and avoiding things like
> introducing faults in code which did not previously fault.

Huh? This patch is purely an (attempt at) optimization, not something 
that fixes any problems.

> However, pessimization of safe code is, err, um, bad and needs to be
> avoided.

Here's an example:

                                   >         subq    $16, %rsp
[...]
                                   >         leaq    8(%rsp), %r8
                                   >         leaq    256(%rax), %rdx
     cmpq    256(%rax), %rcx       |         cmpq    256(%rax), %rsi
     jne    .L97                   <
     movq    $0, 256(%rax)         <
.L97:                             <
                                   >         movq    %rdx, %rax
                                   >         cmovne  %r8, %rax
                                   >         movq    $0, (%rax)
[...]
                                   >         addq    $16, %rsp

In the worst case that executes six more instructions, and always causes 
unnecessary stack frame bloat. This on x86 where AFAIK it's doubtful 
whether cmov is a win at all anyway. I think this shows the approach is 
just bad, even ignoring problems like that it could allocate multiple 
scratchpads when one would suffice, or allocate one and end up not using 
it because the transformation fails.

I can't test valgrind right now, it fails to run on my machine, but I 
guess it could adapt to allow stores slightly below the stack (maybe 
warning once)? It seems like a bit of an edge case to worry about, but 
if supporting it is critical and it can't be changed to adapt to new 
optimizations, then I think we're probably better off entirely without 
this scratchpad transformation.

Alternatively I can think of a few other possible approaches which 
wouldn't require this kind of bloat:
  * add support for allocating space in the stack redzone. That could be
    interesting for the register allocator as well. Would help only
    x86_64, but that's a large fraction of gcc's userbase.
  * add support for opportunistically finding unused alignment padding
    in the existing stack frame. Less likely to work but would produce
    better results when it does.
  * on embedded targets we probably don't have to worry about valgrind,
    so do the optimal (sp - x) thing there
  * allocate a single global as the dummy target. Might be more
    expensive to load the address on some targets though.
  * at least find a way to express costs for this transformation.
    Difficult since you don't yet necessarily know if the function is
    going to have a stack frame. Hence, IMO this approach is flawed.
    (You'll still want cost estimates even when not allocating stuff in
    the normal stack frame, because generated code will still execute
    between two and four extra instructions).


Bernd

next prev parent reply	other threads:[~2015-10-14 19:15 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-07 23:29 Abe
2015-10-08 13:09 ` Sebastian Pop
2015-10-08 13:20   ` Sebastian Pop
2015-10-08 13:26     ` Bernd Schmidt
2015-10-08 13:23 ` Bernd Schmidt
2015-10-13 19:34   ` Abe
2015-10-13 20:16     ` Bernd Schmidt
2015-10-14 17:43       ` Jeff Law
2015-10-14 19:15         ` Bernd Schmidt [this message]
2015-10-15  8:52           ` Richard Biener
2015-10-20  5:52           ` Jeff Law
2015-10-20  9:37             ` Richard Biener
2015-10-14  1:05   ` Richard Henderson
2015-10-14  1:11     ` Richard Henderson
2015-10-14  8:29     ` Eric Botcazou
2015-10-14 17:46       ` Jeff Law
2015-10-13 20:05 Abe
     [not found] <024301d11106$2379b5f0$6a6d21d0$@samsung.com>
2015-10-27 23:02 ` Abe
2015-10-30 14:09   ` Bernd Schmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=561EA9D4.2070101@redhat.com \
    --to=bschmidt@redhat.com \
    --cc=abe_skolnik@yahoo.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=kyrylo.tkachov@arm.com \
    --cc=law@redhat.com \
    --cc=sebpop@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).