From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-410597-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 26976 invoked by alias); 20 Oct 2015 05:43:29 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Subject: Re: using scratchpads to enhance RTL-level if-conversion: revised patch
To: Bernd Schmidt <bschmidt@redhat.com>, Abe <abe_skolnik@yahoo.com>
References: <5615AADE.4030306@yahoo.com> <56166E68.2040004@redhat.com> <561D5CC4.8030502@yahoo.com> <561D66AB.9090003@redhat.com> <561E9458.5090701@redhat.com> <561EA9D4.2070101@redhat.com>
Cc: Sebastian Pop <sebpop@gmail.com>, Kyrill Tkachov <kyrylo.tkachov@arm.com>,        "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
From: Jeff Law <law@redhat.com>
Message-ID: <5625D47B.7040004@redhat.com>
Date: Tue, 20 Oct 2015 05:52:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <561EA9D4.2070101@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2015-10/txt/msg01818.txt.bz2

On 10/14/2015 01:15 PM, Bernd Schmidt wrote:
> On 10/14/2015 07:43 PM, Jeff Law wrote:
>> Obviously some pessimization relative to current code is necessary to
>> fix some of the problems WRT thread safety and avoiding things like
>> introducing faults in code which did not previously fault.
>
> Huh? This patch is purely an (attempt at) optimization, not something
> that fixes any problems.
Then I must be mentally merging two things Abe has been working on.
He's certainly had an if-converter patch that was designed to avoid
introducing races in code that didn't previously have races.

Looking back through the archives, that appears to be the case. His
patches to avoid racing are for the tree-level if-converter, not the
RTL if-converter.
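
For readers following along, here is a minimal C sketch of the kind of
race a careless if-conversion can introduce. It is only an illustration
of the general concern, not code from Abe's patches; the function names
are hypothetical.

```c
/* Hypothetical illustration; these names do not come from any patch. */

/* Original code: *p is written only when cond is true. */
void store_if(int cond, int *p, int v)
{
    if (cond)
        *p = v;
}

/* Naive if-conversion: no branch, but *p is now written on every call.
   When cond is false the old value is stored back, which can race with
   another thread updating *p and can fault if *p is not writable. */
void store_if_naive(int cond, int *p, int v)
{
    *p = cond ? v : *p;
}
```

Single-threaded, the two functions leave the same value in *p; they
differ only in whether a store happens when cond is false, which is
exactly what matters for thread safety and for faults.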

Sigh, sorry for the confusion.  It's totally my fault.  Assuming Abe 
doesn't have a correctness case at all here, then I don't see any way 
for the code to go forward as-is since it's likely making things 
significantly worse.

>
> I can't test valgrind right now, it fails to run on my machine, but I
> guess it could adapt to allow stores slightly below the stack (maybe
> warning once)? It seems like a bit of an edge case to worry about, but
> if supporting it is critical and it can't be changed to adapt to new
> optimizations, then I think we're probably better off entirely without
> this scratchpad transformation.
>
> Alternatively I can think of a few other possible approaches which
> wouldn't require this kind of bloat:
>   * add support for allocating space in the stack redzone. That could be
>     interesting for the register allocator as well. Would help only
>     x86_64, but that's a large fraction of gcc's userbase.
>   * add support for opportunistically finding unused alignment padding
>     in the existing stack frame. Less likely to work but would produce
>     better results when it does.
>   * on embedded targets we probably don't have to worry about valgrind,
>     so do the optimal (sp - x) thing there
>   * allocate a single global as the dummy target. Might be more
>     expensive to load the address on some targets though.
>   * at least find a way to express costs for this transformation.
>     Difficult since you don't yet necessarily know if the function is
>     going to have a stack frame. Hence, IMO this approach is flawed.
>     (You'll still want cost estimates even when not allocating stuff in
>     the normal stack frame, because generated code will still execute
>     between two and four extra instructions).
One could argue these should all be on the table.  However, I really
dislike using the area beyond the current stack.  I realize it's
throw-away data, but it just seems like a bad idea to me -- even on
embedded targets that don't support valgrind.
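
As a rough source-level picture of the scratchpad idea under
discussion, the sketch below redirects the unwanted store to a dummy
location instead of skipping it. It assumes the "single global as the
dummy target" variant from the quoted list; the names scratchpad and
store_if_scratch are hypothetical, and the actual patch operates on
RTL, not on C source.

```c
/* Hypothetical dummy target; its contents are never read. */
static int scratchpad;

/* Branch-free conditional store: the store always executes, but when
   cond is false it lands in the scratchpad, so *p is left untouched. */
void store_if_scratch(int cond, int *p, int v)
{
    int *dst = cond ? p : &scratchpad;
    *dst = v;
}
```

The extra address selection is part of the cost Bernd mentions above.
Written as ordinary C, a global shared between threads would itself be
a data race, so this only shows the shape of the transformation, not
something to write by hand.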