From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-75876-listarch-libc-alpha=sources.redhat.com@sourceware.org>
Received: (qmail 10477 invoked by alias); 14 Dec 2016 16:58:54 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 10462 invoked by uid 89); 14 Dec 2016 16:58:53 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-5.0 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=
X-HELO: mx1.redhat.com
Subject: Re: [PATCH] explicit_bzero final
To: Zack Weinberg <zackw@panix.com>
References: <20161212230622.14045-1-zackw@panix.com>
 <96232743-345c-5126-e526-9a8aadc4fdb7@redhat.com>
 <CAKCAbMgRrFKsisTu1wMYtLnc44Zwt5-GPFpt8GPWu2oXv6JN=Q@mail.gmail.com>
 <2e51eedf-874e-cb81-79a6-0a0c667371db@redhat.com>
 <CAKCAbMgxfviybrcRtSO-=eEc0YQc-TfwP+fvtKqQsodBTr_isg@mail.gmail.com>
Cc: Florian Weimer <fweimer@redhat.com>,
        GNU C Library <libc-alpha@sourceware.org>,
        Joseph Myers <joseph@codesourcery.com>,
        Adhemerval Zanella <adhemerval.zanella@linaro.org>,
        Wilco Dijkstra <Wilco.Dijkstra@arm.com>
From: Jeff Law <law@redhat.com>
Message-ID: <ad37af4c-f3a5-00ec-b7aa-cee930a0f08b@redhat.com>
Date: Wed, 14 Dec 2016 16:58:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.5.1
MIME-Version: 1.0
In-Reply-To: <CAKCAbMgxfviybrcRtSO-=eEc0YQc-TfwP+fvtKqQsodBTr_isg@mail.gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-SW-Source: 2016-12/txt/msg00504.txt.bz2

On 12/13/2016 06:04 PM, Zack Weinberg wrote:
> On Tue, Dec 13, 2016 at 5:46 PM, Jeff Law <law@redhat.com> wrote:
>> On 12/13/2016 08:53 AM, Zack Weinberg wrote:
>>>
>>> It's probably good to loop in some GCC people, but off the top of my
>>> head, I think it should be sufficient to have it treat a call to
>>> '__glibc_read_memory' as triggering the same special behavior as a
>>> call to 'explicit_bzero'.  Of course that would mean
>>> __glibc_read_memory can't be used for other cases where we need a
>>> "those writes are not dead" optimization fence, so maybe I should
>>> rename it __explicit_bzero_fence or something like that.
>>
>> It's going to be hard for GCC to recognize the memset_chk+read_memory as an
>> explict_bzero.  It'll have to infer it from the code pattern, which is
>> inherently prone to misses.
>
> Maybe I don't understand what you have in mind, but I don't think you
> have to infer anything from the code pattern.  Just treat
> __explicit_bzero_fence the same as explicit_bzero itself.  The
> arguments are the same and it won't be used for anything else.
>
> I have no idea how hard this would be to implement in present-day GCC,
> but the way I imagine explicit_bzero support working is: there's an
> __attribute__ for variables (let's call it "sensitive") that triggers
> a forward taint analysis: every data object transitively data or
> control dependent on the "sensitive" variable also becomes
> "sensitive".  This has no effect on data objects whose lifetime
> extends beyond the function in which they were declared, but if they
> don't, they are erased when their lifetime ends.
The concept of tainting variables and tracking that property seems 
distinct from making sure that the compiler properly handles 
explicit_bzero.  And just to be clear, I see that property as having value.


>
> Then, the effect of the explicit_bzero and __explicit_bzero_fence
> builtins just is to tag the variable whose address they receive with
> the "sensitive" attribute; the explicit_bzero builtin also behaves as
> __builtin_memset(s, 0, n) (this is in case the sensitive variable is
> allocated on the heap or otherwise survives -- it still needs to be
> erased when the programmer said it should be erased).  This might need
> some sort of _backward_ data flow analysis to figure out which
> source-code variable corresponds to the SSA name received by the
> builtin, I dunno.
Interesting.  I hadn't thought about using these builtins to seed the 
dataflow analysis.


>
> There's probably some wacky corner cases I haven't thought of, but I
> don't see why in-glibc expansion of explicit_bzero to memset+fence
> should be a problem.
My worry is ensuring that the compiler doesn't ever remove these stores 
as dead.  And the easiest way I see to do that is to recognize the 
explicit_bzero explicitly.  Otherwise I have to walk the use->def chain 
to pair the fence with the bzero and then tag the bzero.  While that 
ought to work, it just seems easier to recognize the bzero from the 
start as special.


>
>> I'd also worry that this style definition may be prone to goofs.  The keys
>> being that if LTO were to see inside __glibc_read_memory and realize that
>> the reads are dead and removes the reads.
>
> Right now, LTO into glibc will have all kinds of other negative
> consequences (Joseph can explain in more detail) and we don't even try
> to support it.  I've carefully documented in the source code that, if
> we ever do try to support LTO, __glibc_read_memory (/
> __explicit_bzero_fence) must remain opaque to the compiler.
Understood, WRT LTO today.  I'm not worried about today, I'm worried 
about what happens in the future when the assumptions we're making today 
about what the compiler can and can not do may no longer hold.
Jeff
>
> zw
>