From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 86971 invoked by alias); 5 Feb 2016 18:24:43 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Received: (qmail 86952 invoked by uid 89); 5 Feb 2016 18:24:42 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=0.0 required=5.0 tests=AWL,BAYES_50,KAM_ASCII_DIVIDERS,SPF_PASS autolearn=no version=3.3.2 spammy=swapped, dispute, relates, facilitating X-HELO: contest.openwall.net From: Alexander Cherepanov Subject: Re: Undefined behavior in glibc To: Joseph Myers References: <27c31890079f41775175b94a4abedb0c.squirrel@server316.webhostingpad.com> <20160128225845.GE14840@vapier.lan> <20160128234356.GH14840@vapier.lan> <56AAB6CE.8060101@openwall.com> <20160129005816.GK14840@vapier.lan> <56AC16C8.4030202@openwall.com> <56B1F294.5020105@openwall.com> Cc: Dwight Guth , libc-alpha@sourceware.org X-Enigmail-Draft-Status: N1110 Message-ID: <56B4E8E3.5010308@openwall.com> Date: Fri, 05 Feb 2016 18:24:00 -0000 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-SW-Source: 2016-02/txt/msg00074.txt.bz2 On 2016-02-03 16:07, Joseph Myers wrote: >>> If something involves undefined behavior in GNU C, >> >> GNU C is ISO C (whatever it means:-) + GCC implementation-defined behavior as >> described in [1] + GCC extensions[2], right? > > Plus undocumented features The moment you say "undocumented" everything falls apart. One cannot be sure whether some observed behavior is an undocumented feature or something that will be broken by the next gcc release. > and a good understanding of aspects of how C > relates to the underlying machines (that is, you cannot assume e.g. that a > C addition corresponds to a machine add instruction, but you can assume > that memory is made of pages and that if it's not visible to the compiler > how memory was allocated, having particular access to any byte in a page > means having such access to all bytes in that page, subject to avoiding > data races on write). 1. This rules out all modern hardening/debugging solutions like Valgrind, ASan, Intel MPX, CHERI (don't know if the latter is relevant for glibc). 2. You seem to be concerned with crashes only but there are other problems too, e.g., info leaks (think Heartbleed). Saying that the code is free to access any accessible memory means that one cannot catch info leaks automatically. > Cf > > (but the non-ISO aspects of GNU C as used in low-level system software > such as glibc and the Linux kernel should be assumed to be much more > wide-ranging than the issues discussed there). Yes, I have seen it. I have even linked to your reply to this survey in my last email. >> 1) strlen and other string functions read memory by chunks which is aliasing >> violation; >> 2) strlen accesses the OOB memory. > > Both of those fall under deliberate use of separate compilation, i.e. it's > not visible to the compiler what the original effective type of the memory > was or its original size, so a whole page can be accessed provided the > types used don't involve aliasing violations within the source visible to > the compiler. I thought that this falls under "there is no plausible optimization [due to the info not available to a compiler]". Well, ok. Then let's look at strlen (&co.) from another angle. It starts accessing its parameter as chars and then continues accessing it as longs, like this: ... char *p = ...; *p++; *(long *)p; Is there a situation where this is valid in GNU C? For ISO C the question seems quite moot but I think Question 16 in DR 017 plus DR 260 say that it's not valid. Maybe there are no restrictions in GNU C on intra-object pointer arithmetic / accesses at all (apart from aliasing rules)? >> I don't see cleaning glibc for LTO in todo list in the wiki. Shouldn't it be >> there? > > The todo list basically has ideas that one person once thought might be > useful. In general, you can't assume that things there have consensus, > and nor do you need consensus to put something there. I think > facilitating LTO of the standard C library is a very hard whole-system > issue, not something that could be considered for glibc in isolation; > you'd probably need several different forms of source annotation that act > as barriers to particular forms of LTO, saying that calls to a function > can only be optimized with reference to its semantics and not with > reference to the contents of a particular implementation of it. Isn't noinline+noclone enough for this right now? >> Well, before I go and file the bugs, perhaps it's worth skimming through the >> list. It's mainly invalid pointer arithmetic. GNU C seems not to deviate from >> ISO C in this topic. Should the issues be grouped by type, by file, by subdir >> or in some other way? > > I think such cases should generally be considered bugs except for limited > cases such as string functions that deliberately work in terms of pages or > aligned units smaller than pages, and so may go slightly outside the > original object (before or after) but obviously cannot wrap around the > address space to affect the results of comparisons. Ok > The natural division is based on whether it might make sense to fix two > issues separately - so any issues in parts of glibc for which different > people might have expertise, or that differ in how clear it is that there > is a problem or how clear it is what the correct fix would be, should be > filed separately. Of course, it's even better if fixes (following the > contribution checklist) for such issues are submitted as well. Ok >> Capping pointers at the end of the address space >> ------------------------------------------------ >> >> Some functions in C11 have a parameter which limits a number of characters >> written. IMHO it's clear that C11 doesn't intend this parameter to be a real >> size of output buffer and, thus, it could validly be SIZE_MAX. But we have >> seen differing opinions in the recent thread about strlcpy. Perhaps you can >> clear the question. > > Given the dispute in the Austin Group over snprintf, a C11 DR is clearly > needed. Ok >> Structs at NULL >> --------------- >> >> Plausible optimization: this is unconditional UB, hence mark the code as >> unreachable and remove everything around. > > I think such offsetof-type code should be considered valid GNU C, although Ok > it would still be better to use offsetof. At least some are inside > "#ifndef offsetof" (some of these examples are code shared with gnulib or > gettext) and so obviously not bugs in glibc (given that is > included). Yeah, and I managed to include some that are used to get a size of a field instead of an offset and so are fine even in ISO C. Have to filter them out in the future. >> Pointers used after free (without dereferencing) >> ------------------------------------------------ >> >> https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/fileops.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l1063 >> (IIUC conditions "fp->_IO_read_base != NULL" and "!_IO_in_backup (fp)" should >> be swapped.) FTR: this one seems to be a false. fp->_IO_read_base could be accessed on line 1063 after a free on line 1020 but it's set inside _IO_setg on line 1025. >> https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/tst-obstack.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l25 >> https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/tst-malloc-backtrace.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l30 >> https://sourceware.org/git/?p=glibc.git;a=blob;f=io/pwd.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l41 > > I'd say such comparisons should be avoided, but that's more a cleanup than > a bug fix. BTW is there a formal method to tell test files from important files? I see .../tst-* and .../test-* files but io/pwd.c looks like a test too. -- Alexander Cherepanov