From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 10477 invoked by alias); 14 Dec 2016 16:58:54 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Received: (qmail 10462 invoked by uid 89); 14 Dec 2016 16:58:53 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-5.0 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mx1.redhat.com Subject: Re: [PATCH] explicit_bzero final To: Zack Weinberg References: <20161212230622.14045-1-zackw@panix.com> <96232743-345c-5126-e526-9a8aadc4fdb7@redhat.com> <2e51eedf-874e-cb81-79a6-0a0c667371db@redhat.com> Cc: Florian Weimer , GNU C Library , Joseph Myers , Adhemerval Zanella , Wilco Dijkstra From: Jeff Law Message-ID: Date: Wed, 14 Dec 2016 16:58:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-SW-Source: 2016-12/txt/msg00504.txt.bz2 On 12/13/2016 06:04 PM, Zack Weinberg wrote: > On Tue, Dec 13, 2016 at 5:46 PM, Jeff Law wrote: >> On 12/13/2016 08:53 AM, Zack Weinberg wrote: >>> >>> It's probably good to loop in some GCC people, but off the top of my >>> head, I think it should be sufficient to have it treat a call to >>> '__glibc_read_memory' as triggering the same special behavior as a >>> call to 'explicit_bzero'. Of course that would mean >>> __glibc_read_memory can't be used for other cases where we need a >>> "those writes are not dead" optimization fence, so maybe I should >>> rename it __explicit_bzero_fence or something like that. >> >> It's going to be hard for GCC to recognize the memset_chk+read_memory as an >> explict_bzero. It'll have to infer it from the code pattern, which is >> inherently prone to misses. > > Maybe I don't understand what you have in mind, but I don't think you > have to infer anything from the code pattern. Just treat > __explicit_bzero_fence the same as explicit_bzero itself. The > arguments are the same and it won't be used for anything else. > > I have no idea how hard this would be to implement in present-day GCC, > but the way I imagine explicit_bzero support working is: there's an > __attribute__ for variables (let's call it "sensitive") that triggers > a forward taint analysis: every data object transitively data or > control dependent on the "sensitive" variable also becomes > "sensitive". This has no effect on data objects whose lifetime > extends beyond the function in which they were declared, but if they > don't, they are erased when their lifetime ends. The concept of tainting variables and tracking that property seems distinct from making sure that the compiler properly handles explicit_bzero. And just to be clear, I see that property as having value. > > Then, the effect of the explicit_bzero and __explicit_bzero_fence > builtins just is to tag the variable whose address they receive with > the "sensitive" attribute; the explicit_bzero builtin also behaves as > __builtin_memset(s, 0, n) (this is in case the sensitive variable is > allocated on the heap or otherwise survives -- it still needs to be > erased when the programmer said it should be erased). This might need > some sort of _backward_ data flow analysis to figure out which > source-code variable corresponds to the SSA name received by the > builtin, I dunno. Interesting. I hadn't thought about using these builtins to seed the dataflow analysis. > > There's probably some wacky corner cases I haven't thought of, but I > don't see why in-glibc expansion of explicit_bzero to memset+fence > should be a problem. My worry is ensuring that the compiler doesn't ever remove these stores as dead. And the easiest way I see to do that is to recognize the explicit_bzero explicitly. Otherwise I have to walk the use->def chain to pair the fence with the bzero and then tag the bzero. While that ought to work, it just seems easier to recognize the bzero from the start as special. > >> I'd also worry that this style definition may be prone to goofs. The keys >> being that if LTO were to see inside __glibc_read_memory and realize that >> the reads are dead and removes the reads. > > Right now, LTO into glibc will have all kinds of other negative > consequences (Joseph can explain in more detail) and we don't even try > to support it. I've carefully documented in the source code that, if > we ever do try to support LTO, __glibc_read_memory (/ > __explicit_bzero_fence) must remain opaque to the compiler. Understood, WRT LTO today. I'm not worried about today, I'm worried about what happens in the future when the assumptions we're making today about what the compiler can and can not do may no longer hold. Jeff > > zw >