public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
@ 2016-01-28 20:20 Dwight Guth
  2016-01-28 21:19 ` Joseph Myers
  2016-02-01  9:44 ` [PATCH v2][BZ #17979][BZ #17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler Dwight Guth
  0 siblings, 2 replies; 34+ messages in thread
From: Dwight Guth @ 2016-01-28 20:20 UTC (permalink / raw)
  To: libc-alpha

If linking against Glibc with a compiler for which the __GNUC__ macro is not
defined, problems arise when including header files that use the __restrict
or __inline keyword and when including uchar.h.

Glibc strips __restrict from the prototypes of C library functions in this
case. This is incorrect if the compiler is a C99-compliant compiler, because
C99 includes the restrict keyword and uses it in the declaration of a number
of functions in the C library. This leads to undefined behavior because
the definitions of those functions were defined with the restrict keyword,
which makes their type signatures incompatible with their declaration,
a violation of C99 sec. 6.2.7 paragraph 2. The same thing occurs with the
__inline keyword, which, while not undefined behavior per-se, seems
undesirable
in cases where the compiler is C99-compliant and therefore includes the
inline keyword. Here we except the case where the compiler declares itself
to be C99-compliant from these checks in order to allow better C99 compliance
for compilers other than gcc which link against Glibc.

Glibc defines char16_t and char32_t in uchar.h as __CHAR16_TYPE__ and
__CHAR32_TYPE__ when the __GNUC__ macro is defined, but when linking against
Glibc with a different compiler, these types are not defined at all,
which is a violation of C11 sec. 7.28 paragraph 2, as well as a syntax
error because these types are used in the prototypes of functions declared
later in the file. According to this section of the standard, these types
must be defined in this header file and must be the same type as
uint_least16_t and uint_least32_t, which are defined in stdint.h as
"unsigned short int" and "unsigned int" respectively. Here we modify the
header so that if __GNUC__ is not defined, we still provide these typedefs,
but we default them manually to the same type as uint_least16_t and
uint_least32_t if __CHAR16_TYPE__ and __CHAR32_TYPE__ are not defined by
the compiler.

-----

I had trouble testing this patch because I ran into unrelated errors in the
test suite. If someone could help me figure out how to set up a test
environment that is likely to pass all the tests, I can try again, but I
don't
have the resources to struggle with all the errors that arise without knowing
their solutions. The patch should not affect any version of GCC, however.

-----

2016-01-28  Dwight Guth  <dwight.guth@runtimeverification.com>

	[BZ 17979]
	* wcsmbs/uchar.h (char16_t, char32_t): Define types if __GNUC__,
	__CHAR16_TYPE__, or __CHAR32_TYPE__ are not defined.

	[BZ 17721]
	* misc/sys/cdefs.h (__restrict, __inline): Define as keywords if
	__GNUC__ is not defined but __STDC_VERSION__ is at least C99.

-----

diff --git a/misc/sys/cdefs.h b/misc/sys/cdefs.h
index 7fd4154..af23ff7 100644
--- a/misc/sys/cdefs.h
+++ b/misc/sys/cdefs.h
@@ -69,8 +69,11 @@

 #else  /* Not GCC.  */

-# define __inline              /* No inline functions.  */
-
+# if defined __STDC_VERSION__ && __STDC_VERSION__ >= 199901L
+#  define __inline             inline
+# else
+#  define __inline             /* No inline functions.  */
+# endif
 # define __THROW
 # define __THROWNL
 # define __NTH(fct)    fct
@@ -360,7 +363,11 @@

 /* __restrict is known in EGCS 1.2 and above. */
 #if !__GNUC_PREREQ (2,92)
-# define __restrict    /* Ignore */
+# if !defined __GNUC__ && defined __STDC_VERSION__ && __STDC_VERSION__ >=
199901L
+#  define __restrict   restrict
+# else
+#  define __restrict   /* Ignore */
+# endif
 #endif

 /* ISO C99 also allows to declare arrays as non-overlapping.  The syntax is
diff --git a/wcsmbs/uchar.h b/wcsmbs/uchar.h
index ce92b25..1484e56 100644
--- a/wcsmbs/uchar.h
+++ b/wcsmbs/uchar.h
@@ -39,14 +39,16 @@ __END_NAMESPACE_C99
 #endif


-#if defined __GNUC__ && !defined __USE_ISOCXX11
+#if !defined __USE_ISOCXX11
 /* Define the 16-bit and 32-bit character types.  Use the information
    provided by the compiler.  */
 # if !defined __CHAR16_TYPE__ || !defined __CHAR32_TYPE__
 #  if defined __STDC_VERSION__ && __STDC_VERSION__ < 201000L
 #   error "<uchar.h> requires ISO C11 mode"
 #  else
-#   error "definitions of __CHAR16_TYPE__ and/or __CHAR32_TYPE__ missing"
+/* Same as uint_least16_t and uint_least32_t in stdint.h. */
+typedef unsigned short int __CHAR16_TYPE__;
+typedef unsigned int       __CHAR32_TYPE__;
 #  endif
 # endif
 typedef __CHAR16_TYPE__ char16_t;


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-28 20:20 [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler Dwight Guth
@ 2016-01-28 21:19 ` Joseph Myers
  2016-01-28 22:20   ` Dwight Guth
  2016-02-01  9:44 ` [PATCH v2][BZ #17979][BZ #17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler Dwight Guth
  1 sibling, 1 reply; 34+ messages in thread
From: Joseph Myers @ 2016-01-28 21:19 UTC (permalink / raw)
  To: Dwight Guth; +Cc: libc-alpha

On Thu, 28 Jan 2016, Dwight Guth wrote:

> Glibc strips __restrict from the prototypes of C library functions in this
> case. This is incorrect if the compiler is a C99-compliant compiler, because
> C99 includes the restrict keyword and uses it in the declaration of a number
> of functions in the C library. This leads to undefined behavior because
> the definitions of those functions were defined with the restrict keyword,
> which makes their type signatures incompatible with their declaration,
> a violation of C99 sec. 6.2.7 paragraph 2. The same thing occurs with the

This affects hardly any functions, if any at all, because qualifiers on 
function argument types are ignored for the purposes of compatibility 
(6.7.5.3#15).  For a function to be affected it would need a restrict 
qualifier that isn't directly on the argument type itself (type *restrict 
*param, for example).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-28 21:19 ` Joseph Myers
@ 2016-01-28 22:20   ` Dwight Guth
  2016-01-28 22:48     ` Joseph Myers
  0 siblings, 1 reply; 34+ messages in thread
From: Dwight Guth @ 2016-01-28 22:20 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Dwight Guth, libc-alpha

Thank you for drawing that to my attention. You are correct that this
means that my original issue that caused me to create this part of the
patch is in fact well defined. However, this still seems like an issue
of correctness to me. I can't seem to find any explicit paragraph in
the standard saying that, e.g. fprintf must be declared with the type:

int fprintf(FILE * restrict stream, const char * restrict format, ...);

but I assume that that is implied, otherwise we could give any of the
functions in the library any signature as long as they broadly
followed the requirements of their corresponding subclause, which
seems wrong to me.

On Thu, Jan 28, 2016 at 3:19 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> On Thu, 28 Jan 2016, Dwight Guth wrote:
>
>> Glibc strips __restrict from the prototypes of C library functions in this
>> case. This is incorrect if the compiler is a C99-compliant compiler, because
>> C99 includes the restrict keyword and uses it in the declaration of a number
>> of functions in the C library. This leads to undefined behavior because
>> the definitions of those functions were defined with the restrict keyword,
>> which makes their type signatures incompatible with their declaration,
>> a violation of C99 sec. 6.2.7 paragraph 2. The same thing occurs with the
>
> This affects hardly any functions, if any at all, because qualifiers on
> function argument types are ignored for the purposes of compatibility
> (6.7.5.3#15).  For a function to be affected it would need a restrict
> qualifier that isn't directly on the argument type itself (type *restrict
> *param, for example).
>
> --
> Joseph S. Myers
> joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-28 22:20   ` Dwight Guth
@ 2016-01-28 22:48     ` Joseph Myers
  2016-01-28 22:50       ` Dwight Guth
  0 siblings, 1 reply; 34+ messages in thread
From: Joseph Myers @ 2016-01-28 22:48 UTC (permalink / raw)
  To: Dwight Guth; +Cc: libc-alpha

On Thu, 28 Jan 2016, Dwight Guth wrote:

> Thank you for drawing that to my attention. You are correct that this
> means that my original issue that caused me to create this part of the
> patch is in fact well defined. However, this still seems like an issue
> of correctness to me. I can't seem to find any explicit paragraph in
> the standard saying that, e.g. fprintf must be declared with the type:
> 
> int fprintf(FILE * restrict stream, const char * restrict format, ...);
> 
> but I assume that that is implied, otherwise we could give any of the
> functions in the library any signature as long as they broadly
> followed the requirements of their corresponding subclause, which
> seems wrong to me.

It's impossible for a valid program to distinguish what qualifiers were 
used for parameters in a function declaration in a header, thus it's not 
even a meaningful question.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-28 22:48     ` Joseph Myers
@ 2016-01-28 22:50       ` Dwight Guth
  2016-01-28 22:52         ` Joseph Myers
  0 siblings, 1 reply; 34+ messages in thread
From: Dwight Guth @ 2016-01-28 22:50 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Dwight Guth, libc-alpha

Okay but if so, then why put the __restrict in the header files at all
if it doesn't really matter? And why put it there only if the compiler
is gcc?

On Thu, Jan 28, 2016 at 4:48 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> On Thu, 28 Jan 2016, Dwight Guth wrote:
>
>> Thank you for drawing that to my attention. You are correct that this
>> means that my original issue that caused me to create this part of the
>> patch is in fact well defined. However, this still seems like an issue
>> of correctness to me. I can't seem to find any explicit paragraph in
>> the standard saying that, e.g. fprintf must be declared with the type:
>>
>> int fprintf(FILE * restrict stream, const char * restrict format, ...);
>>
>> but I assume that that is implied, otherwise we could give any of the
>> functions in the library any signature as long as they broadly
>> followed the requirements of their corresponding subclause, which
>> seems wrong to me.
>
> It's impossible for a valid program to distinguish what qualifiers were
> used for parameters in a function declaration in a header, thus it's not
> even a meaningful question.
>
> --
> Joseph S. Myers
> joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-28 22:50       ` Dwight Guth
@ 2016-01-28 22:52         ` Joseph Myers
  2016-01-28 22:58           ` Mike Frysinger
  0 siblings, 1 reply; 34+ messages in thread
From: Joseph Myers @ 2016-01-28 22:52 UTC (permalink / raw)
  To: Dwight Guth; +Cc: libc-alpha

On Thu, 28 Jan 2016, Dwight Guth wrote:

> Okay but if so, then why put the __restrict in the header files at all
> if it doesn't really matter? And why put it there only if the compiler
> is gcc?

Effectively it serves as documentation of intent for people reading the 
headers (much like the argument names with __ prefixes).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-28 22:52         ` Joseph Myers
@ 2016-01-28 22:58           ` Mike Frysinger
  2016-01-28 23:08             ` Dwight Guth
  2016-01-28 23:16             ` Joseph Myers
  0 siblings, 2 replies; 34+ messages in thread
From: Mike Frysinger @ 2016-01-28 22:58 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Dwight Guth, libc-alpha

[-- Attachment #1: Type: text/plain, Size: 473 bytes --]

On 28 Jan 2016 22:52, Joseph Myers wrote:
> On Thu, 28 Jan 2016, Dwight Guth wrote:
> > Okay but if so, then why put the __restrict in the header files at all
> > if it doesn't really matter? And why put it there only if the compiler
> > is gcc?
> 
> Effectively it serves as documentation of intent for people reading the 
> headers (much like the argument names with __ prefixes).

wouldn't it also assist automated tools like linters/static analyzers ?
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-28 22:58           ` Mike Frysinger
@ 2016-01-28 23:08             ` Dwight Guth
  2016-01-28 23:20               ` Joseph Myers
  2016-01-28 23:16             ` Joseph Myers
  1 sibling, 1 reply; 34+ messages in thread
From: Dwight Guth @ 2016-01-28 23:08 UTC (permalink / raw)
  To: Joseph Myers, Dwight Guth, libc-alpha

Yes, actually. A dynamic analysis tool (like the one I am developing)
that uses the GCC preprocessor on this header file to try to decide
the types of functions in order to decide whether they are
restrict-qualified and therefore subject to the restrictions of
restrict-qualified pointers would be highly interested in this
information.

It also seems strange to me to be declaring functions in header files
with different types than are mandated by the standard just because it
won't matter in most cases... It also sounds to me like a highly
subtle bug waiting to happen. Can you guarantee that in the entire C
standard, there isn't anywhere in it where having the wrong qualifiers
on a function declaration might have some kind of impact on which
programs are well-defined or not, or on the behavior of a program? Can
you guarantee that no such cases will continue to exist in all future
revisions of the C standard? What happens when someone uses __restrict
someday in a situation where it does matter (e.g. if the next version
of C introduces a "char * restrict *" parameter somewhere), and
doesn't know anything about this discussion?

On Thu, Jan 28, 2016 at 4:58 PM, Mike Frysinger <vapier@gentoo.org> wrote:
> On 28 Jan 2016 22:52, Joseph Myers wrote:
>> On Thu, 28 Jan 2016, Dwight Guth wrote:
>> > Okay but if so, then why put the __restrict in the header files at all
>> > if it doesn't really matter? And why put it there only if the compiler
>> > is gcc?
>>
>> Effectively it serves as documentation of intent for people reading the
>> headers (much like the argument names with __ prefixes).
>
> wouldn't it also assist automated tools like linters/static analyzers ?
> -mike

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-28 22:58           ` Mike Frysinger
  2016-01-28 23:08             ` Dwight Guth
@ 2016-01-28 23:16             ` Joseph Myers
  2016-01-28 23:44               ` Mike Frysinger
  1 sibling, 1 reply; 34+ messages in thread
From: Joseph Myers @ 2016-01-28 23:16 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: Dwight Guth, libc-alpha

On Thu, 28 Jan 2016, Mike Frysinger wrote:

> On 28 Jan 2016 22:52, Joseph Myers wrote:
> > On Thu, 28 Jan 2016, Dwight Guth wrote:
> > > Okay but if so, then why put the __restrict in the header files at all
> > > if it doesn't really matter? And why put it there only if the compiler
> > > is gcc?
> > 
> > Effectively it serves as documentation of intent for people reading the 
> > headers (much like the argument names with __ prefixes).
> 
> wouldn't it also assist automated tools like linters/static analyzers ?

If they hardcode information about particular functions, the qualifiers in 
the headers are irrelevant.  If not, even having restrict qualifiers on 
the parameters in the function definitions is only useful when you look at 
the body of the definitions as well, unless you apply heuristics beyond 
what is supported by the standard.

Remember that if a function has two restricted pointer arguments (that are 
restricted in the definition), this does *not* mean that they don't alias 
- only that *if* a particular execution of the function modifies some 
elements pointed to by one of the pointers, those elements are not also 
accessed other than through that pointer.  (But it's completely valid to 
have two restricted pointers to the same array, one only used to access / 
modify odd-numbered elements of that array, and the other one only used to 
access / modify even-numbered elements of that array.  Now, static 
analyzers might reasonably consider that dubious usage that should be 
diagnosed.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-28 23:08             ` Dwight Guth
@ 2016-01-28 23:20               ` Joseph Myers
  2016-01-28 23:28                 ` Dwight Guth
  0 siblings, 1 reply; 34+ messages in thread
From: Joseph Myers @ 2016-01-28 23:20 UTC (permalink / raw)
  To: Dwight Guth; +Cc: libc-alpha

On Thu, 28 Jan 2016, Dwight Guth wrote:

> It also seems strange to me to be declaring functions in header files
> with different types than are mandated by the standard just because it
> won't matter in most cases... It also sounds to me like a highly

I didn't say that's the reason, simply pointed out that there is no 
conformance bug from missing the restrict qualifiers - the reason is more 
likely that no-one in practice tends to use the glibc headers with 
compilers not defining __GNUC__ (they use non-GNU compilers, but non-GNU 
compilers that define __GNUC__ to indicate support for GNU extensions) and 
so there has been very little interest in fixing hypothetical issues with 
such compilers (and where people have submitted such patches, they've 
drifted away without resolving issues from review and pinging as needed).  
Combined with: __restrict in the headers may well date back to before C99 
was released and so before the final __STDC_VERSION__ value was known.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-28 23:20               ` Joseph Myers
@ 2016-01-28 23:28                 ` Dwight Guth
  2016-01-28 23:33                   ` Joseph Myers
  0 siblings, 1 reply; 34+ messages in thread
From: Dwight Guth @ 2016-01-28 23:28 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Dwight Guth, libc-alpha

Yes, that's reasonable. What would you see as the correct resolution
to the issue?

On Thu, Jan 28, 2016 at 5:20 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> On Thu, 28 Jan 2016, Dwight Guth wrote:
>
>> It also seems strange to me to be declaring functions in header files
>> with different types than are mandated by the standard just because it
>> won't matter in most cases... It also sounds to me like a highly
>
> I didn't say that's the reason, simply pointed out that there is no
> conformance bug from missing the restrict qualifiers - the reason is more
> likely that no-one in practice tends to use the glibc headers with
> compilers not defining __GNUC__ (they use non-GNU compilers, but non-GNU
> compilers that define __GNUC__ to indicate support for GNU extensions) and
> so there has been very little interest in fixing hypothetical issues with
> such compilers (and where people have submitted such patches, they've
> drifted away without resolving issues from review and pinging as needed).
> Combined with: __restrict in the headers may well date back to before C99
> was released and so before the final __STDC_VERSION__ value was known.
>
> --
> Joseph S. Myers
> joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-28 23:28                 ` Dwight Guth
@ 2016-01-28 23:33                   ` Joseph Myers
  2016-01-28 23:47                     ` Dwight Guth
  0 siblings, 1 reply; 34+ messages in thread
From: Joseph Myers @ 2016-01-28 23:33 UTC (permalink / raw)
  To: Dwight Guth; +Cc: libc-alpha

On Thu, 28 Jan 2016, Dwight Guth wrote:

> Yes, that's reasonable. What would you see as the correct resolution
> to the issue?

I think defining to restrict based on __STDC_VERSION__ is reasonable 
(outside of the release freeze period).

uchar.h is trickier in that stdint.h is theoretically system-specific 
(it's in sysdeps), so there's an abstraction violation in copying its 
types into uchar.h (not in sysdeps).  I'd be happy for stdint.h and 
inttypes.h to move out of sysdeps, but any hardcoding should probably 
still be accompanied at least by a comment in stdint.h about where else 
the type information is embedded.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-28 23:16             ` Joseph Myers
@ 2016-01-28 23:44               ` Mike Frysinger
  2016-01-29  0:48                 ` Alexander Cherepanov
  0 siblings, 1 reply; 34+ messages in thread
From: Mike Frysinger @ 2016-01-28 23:44 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Dwight Guth, libc-alpha

[-- Attachment #1: Type: text/plain, Size: 2190 bytes --]

On 28 Jan 2016 23:16, Joseph Myers wrote:
> On Thu, 28 Jan 2016, Mike Frysinger wrote:
> > On 28 Jan 2016 22:52, Joseph Myers wrote:
> > > On Thu, 28 Jan 2016, Dwight Guth wrote:
> > > > Okay but if so, then why put the __restrict in the header files at all
> > > > if it doesn't really matter? And why put it there only if the compiler
> > > > is gcc?
> > > 
> > > Effectively it serves as documentation of intent for people reading the 
> > > headers (much like the argument names with __ prefixes).
> > 
> > wouldn't it also assist automated tools like linters/static analyzers ?
> 
> If they hardcode information about particular functions, the qualifiers in 
> the headers are irrelevant.  If not, even having restrict qualifiers on 
> the parameters in the function definitions is only useful when you look at 
> the body of the definitions as well, unless you apply heuristics beyond 
> what is supported by the standard.
> 
> Remember that if a function has two restricted pointer arguments (that are 
> restricted in the definition), this does *not* mean that they don't alias 
> - only that *if* a particular execution of the function modifies some 
> elements pointed to by one of the pointers, those elements are not also 
> accessed other than through that pointer.  (But it's completely valid to 
> have two restricted pointers to the same array, one only used to access / 
> modify odd-numbered elements of that array, and the other one only used to 
> access / modify even-numbered elements of that array.  Now, static 
> analyzers might reasonably consider that dubious usage that should be 
> diagnosed.)

i think having analyzers warn about that by default is a sane position.
i would expect that such usage is more commonly an unintended mistake
rather than the function actually has such esoteric behavior.  i don't
think any of the glibc functions are written in this manner and expect
the pointers to be distinct memory locations.

i don't see an issue with allowing the headers to use keywords that are
supposed to exist according to the standard, even if we don't know the
exact compiler that is in use (i.e. non-gcc).
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-28 23:33                   ` Joseph Myers
@ 2016-01-28 23:47                     ` Dwight Guth
  2016-01-28 23:52                       ` Joseph Myers
  0 siblings, 1 reply; 34+ messages in thread
From: Dwight Guth @ 2016-01-28 23:47 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Dwight Guth, libc-alpha

Would you see it as better to move stdint.h out of sysdeps, or to do
something similar to what's done with mbstate_t, NULL, size_t, etc by
defining a __need_uint_least16_t and __need_uint_least32_t and having
stdint.h define only a __uint_least16_t and __uint_least32_t type and
then exit, allowing those types to be used across an abstraction
boundary by uchar.h?

On Thu, Jan 28, 2016 at 5:33 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> On Thu, 28 Jan 2016, Dwight Guth wrote:
>
>> Yes, that's reasonable. What would you see as the correct resolution
>> to the issue?
>
> I think defining to restrict based on __STDC_VERSION__ is reasonable
> (outside of the release freeze period).
>
> uchar.h is trickier in that stdint.h is theoretically system-specific
> (it's in sysdeps), so there's an abstraction violation in copying its
> types into uchar.h (not in sysdeps).  I'd be happy for stdint.h and
> inttypes.h to move out of sysdeps, but any hardcoding should probably
> still be accompanied at least by a comment in stdint.h about where else
> the type information is embedded.
>
> --
> Joseph S. Myers
> joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-28 23:47                     ` Dwight Guth
@ 2016-01-28 23:52                       ` Joseph Myers
  2016-01-29  0:06                         ` Dwight Guth
  0 siblings, 1 reply; 34+ messages in thread
From: Joseph Myers @ 2016-01-28 23:52 UTC (permalink / raw)
  To: Dwight Guth; +Cc: libc-alpha

On Thu, 28 Jan 2016, Dwight Guth wrote:

> Would you see it as better to move stdint.h out of sysdeps, or to do
> something similar to what's done with mbstate_t, NULL, size_t, etc by
> defining a __need_uint_least16_t and __need_uint_least32_t and having
> stdint.h define only a __uint_least16_t and __uint_least32_t type and
> then exit, allowing those types to be used across an abstraction
> boundary by uchar.h?

I don't think we want to add more __need_* cases; if we split things out, 
it should be by adding a bits/stdint.h header that defines __*_t (and 
moving stdint.h out of sysdeps would still make sense).  Cf 
<https://sourceware.org/ml/libc-alpha/2012-08/msg00510.html>.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-28 23:52                       ` Joseph Myers
@ 2016-01-29  0:06                         ` Dwight Guth
  2016-01-29 15:34                           ` Joseph Myers
  0 siblings, 1 reply; 34+ messages in thread
From: Dwight Guth @ 2016-01-29  0:06 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Dwight Guth, libc-alpha

Alright, fair enough. So the solution you want to see is to move
stdint.h out of sysdeps, move the definition of uint_least16_t and
uint_least32_t to a definition of __*_t in bits/stdint.h, and then
define uint_least16_t, uint_least32_t, char16_t, and char32_t on the
basis of the __*_t types? Or is this overkill given that we are
assuming that stdint.h doesn't need to be system-specific? Which would
you prefer?

On Thu, Jan 28, 2016 at 5:51 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> On Thu, 28 Jan 2016, Dwight Guth wrote:
>
>> Would you see it as better to move stdint.h out of sysdeps, or to do
>> something similar to what's done with mbstate_t, NULL, size_t, etc by
>> defining a __need_uint_least16_t and __need_uint_least32_t and having
>> stdint.h define only a __uint_least16_t and __uint_least32_t type and
>> then exit, allowing those types to be used across an abstraction
>> boundary by uchar.h?
>
> I don't think we want to add more __need_* cases; if we split things out,
> it should be by adding a bits/stdint.h header that defines __*_t (and
> moving stdint.h out of sysdeps would still make sense).  Cf
> <https://sourceware.org/ml/libc-alpha/2012-08/msg00510.html>.
>
> --
> Joseph S. Myers
> joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-28 23:44               ` Mike Frysinger
@ 2016-01-29  0:48                 ` Alexander Cherepanov
  2016-01-29  0:58                   ` Mike Frysinger
  0 siblings, 1 reply; 34+ messages in thread
From: Alexander Cherepanov @ 2016-01-29  0:48 UTC (permalink / raw)
  To: Joseph Myers, Dwight Guth, libc-alpha

On 2016-01-29 02:43, Mike Frysinger wrote:
> On 28 Jan 2016 23:16, Joseph Myers wrote:
>> On Thu, 28 Jan 2016, Mike Frysinger wrote:
>>> On 28 Jan 2016 22:52, Joseph Myers wrote:
>>>> On Thu, 28 Jan 2016, Dwight Guth wrote:
>>>>> Okay but if so, then why put the __restrict in the header files at all
>>>>> if it doesn't really matter? And why put it there only if the compiler
>>>>> is gcc?
>>>>
>>>> Effectively it serves as documentation of intent for people reading the
>>>> headers (much like the argument names with __ prefixes).
>>>
>>> wouldn't it also assist automated tools like linters/static analyzers ?
>>
>> If they hardcode information about particular functions, the qualifiers in
>> the headers are irrelevant.  If not, even having restrict qualifiers on
>> the parameters in the function definitions is only useful when you look at
>> the body of the definitions as well, unless you apply heuristics beyond
>> what is supported by the standard.

I would add that for an analyzer to depend on a specific implementation 
is a bit risky. The meaning of the restrict qualifier on the parameters 
of a libc function depends on the description of the function in the 
standard, not on any implementation.

>> Remember that if a function has two restricted pointer arguments (that are
>> restricted in the definition), this does *not* mean that they don't alias
>> - only that *if* a particular execution of the function modifies some
>> elements pointed to by one of the pointers, those elements are not also
>> accessed other than through that pointer.  (But it's completely valid to
>> have two restricted pointers to the same array, one only used to access /
>> modify odd-numbered elements of that array, and the other one only used to
>> access / modify even-numbered elements of that array.  Now, static
>> analyzers might reasonably consider that dubious usage that should be
>> diagnosed.)
>
> i think having analyzers warn about that by default is a sane position.
> i would expect that such usage is more commonly an unintended mistake
> rather than the function actually has such esoteric behavior.  i don't
> think any of the glibc functions are written in this manner and expect
> the pointers to be distinct memory locations.

What about strncat? I think this:

   char s[10] = "abc";
   strncat(s, s, 2);

is fine according to C11.

-- 
Alexander Cherepanov

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-29  0:48                 ` Alexander Cherepanov
@ 2016-01-29  0:58                   ` Mike Frysinger
  2016-01-30  1:50                     ` Alexander Cherepanov
  0 siblings, 1 reply; 34+ messages in thread
From: Mike Frysinger @ 2016-01-29  0:58 UTC (permalink / raw)
  To: Alexander Cherepanov; +Cc: Joseph Myers, Dwight Guth, libc-alpha

[-- Attachment #1: Type: text/plain, Size: 2950 bytes --]

On 29 Jan 2016 03:48, Alexander Cherepanov wrote:
> On 2016-01-29 02:43, Mike Frysinger wrote:
> > On 28 Jan 2016 23:16, Joseph Myers wrote:
> >> On Thu, 28 Jan 2016, Mike Frysinger wrote:
> >>> On 28 Jan 2016 22:52, Joseph Myers wrote:
> >>>> On Thu, 28 Jan 2016, Dwight Guth wrote:
> >>>>> Okay but if so, then why put the __restrict in the header files at all
> >>>>> if it doesn't really matter? And why put it there only if the compiler
> >>>>> is gcc?
> >>>>
> >>>> Effectively it serves as documentation of intent for people reading the
> >>>> headers (much like the argument names with __ prefixes).
> >>>
> >>> wouldn't it also assist automated tools like linters/static analyzers ?
> >>
> >> If they hardcode information about particular functions, the qualifiers in
> >> the headers are irrelevant.  If not, even having restrict qualifiers on
> >> the parameters in the function definitions is only useful when you look at
> >> the body of the definitions as well, unless you apply heuristics beyond
> >> what is supported by the standard.
> 
> I would add that for an analyzer to depend on a specific implementation 
> is a bit risky. The meaning of the restrict qualifier on the parameters 
> of a libc function depends on the description of the function in the 
> standard, not on any implementation.

not all functions glibc uses restrict on are in any standard, unless you
are lumping "the GNU standard" in there.

> >> Remember that if a function has two restricted pointer arguments (that are
> >> restricted in the definition), this does *not* mean that they don't alias
> >> - only that *if* a particular execution of the function modifies some
> >> elements pointed to by one of the pointers, those elements are not also
> >> accessed other than through that pointer.  (But it's completely valid to
> >> have two restricted pointers to the same array, one only used to access /
> >> modify odd-numbered elements of that array, and the other one only used to
> >> access / modify even-numbered elements of that array.  Now, static
> >> analyzers might reasonably consider that dubious usage that should be
> >> diagnosed.)
> >
> > i think having analyzers warn about that by default is a sane position.
> > i would expect that such usage is more commonly an unintended mistake
> > rather than the function actually has such esoteric behavior.  i don't
> > think any of the glibc functions are written in this manner and expect
> > the pointers to be distinct memory locations.
> 
> What about strncat? I think this:
> 
>    char s[10] = "abc";
>    strncat(s, s, 2);
> 
> is fine according to C11.

linters/static analyzers aren't about doing standard validation.  if i
saw that snippet, i would assume an accidental bug, or pointless code
and should be deleted.  you don't run random hacks through linters, you
run code you want to send to production through them.
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-29  0:06                         ` Dwight Guth
@ 2016-01-29 15:34                           ` Joseph Myers
  0 siblings, 0 replies; 34+ messages in thread
From: Joseph Myers @ 2016-01-29 15:34 UTC (permalink / raw)
  To: Dwight Guth; +Cc: libc-alpha

On Thu, 28 Jan 2016, Dwight Guth wrote:

> Alright, fair enough. So the solution you want to see is to move
> stdint.h out of sysdeps, move the definition of uint_least16_t and
> uint_least32_t to a definition of __*_t in bits/stdint.h, and then
> define uint_least16_t, uint_least32_t, char16_t, and char32_t on the
> basis of the __*_t types? Or is this overkill given that we are
> assuming that stdint.h doesn't need to be system-specific? Which would
> you prefer?

That seems logically right, though it might be overkill.  The minimum is 
cross-references between the headers pointing out the other places where 
type information is embedded.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-29  0:58                   ` Mike Frysinger
@ 2016-01-30  1:50                     ` Alexander Cherepanov
  2016-01-31 16:13                       ` Joseph Myers
  0 siblings, 1 reply; 34+ messages in thread
From: Alexander Cherepanov @ 2016-01-30  1:50 UTC (permalink / raw)
  To: Joseph Myers, Dwight Guth, libc-alpha

On 2016-01-29 03:58, Mike Frysinger wrote:
>>>>> wouldn't it also assist automated tools like linters/static analyzers ?
>>>>
>>>> If they hardcode information about particular functions, the qualifiers in
>>>> the headers are irrelevant.  If not, even having restrict qualifiers on
>>>> the parameters in the function definitions is only useful when you look at
>>>> the body of the definitions as well, unless you apply heuristics beyond
>>>> what is supported by the standard.
>>
>> I would add that for an analyzer to depend on a specific implementation
>> is a bit risky. The meaning of the restrict qualifier on the parameters
>> of a libc function depends on the description of the function in the
>> standard, not on any implementation.
>
> not all functions glibc uses restrict on are in any standard, unless you
> are lumping "the GNU standard" in there.

It doesn't matter. My point is that you have to work from the 
description instead of implementation. The C standard seems to impose 
minimal restrictions so there would be no difference between the 
description and any conforming implementation in many cases. For 
contrast consider proposed strlcpy/strlcat -- the last revision of their 
description is intentionally more strict than their implementation. The 
whole point of this distinction is to enable tools to catch problematic 
uses.

>>>> Remember that if a function has two restricted pointer arguments (that are
>>>> restricted in the definition), this does *not* mean that they don't alias
>>>> - only that *if* a particular execution of the function modifies some
>>>> elements pointed to by one of the pointers, those elements are not also
>>>> accessed other than through that pointer.  (But it's completely valid to
>>>> have two restricted pointers to the same array, one only used to access /
>>>> modify odd-numbered elements of that array, and the other one only used to
>>>> access / modify even-numbered elements of that array.  Now, static
>>>> analyzers might reasonably consider that dubious usage that should be
>>>> diagnosed.)
>>>
>>> i think having analyzers warn about that by default is a sane position.
>>> i would expect that such usage is more commonly an unintended mistake
>>> rather than the function actually has such esoteric behavior.  i don't
>>> think any of the glibc functions are written in this manner and expect
>>> the pointers to be distinct memory locations.
>>
>> What about strncat? I think this:
>>
>>     char s[10] = "abc";
>>     strncat(s, s, 2);
>>
>> is fine according to C11.
>
> linters/static analyzers aren't about doing standard validation.  if i
> saw that snippet, i would assume an accidental bug, or pointless code
> and should be deleted.  you don't run random hacks through linters, you
> run code you want to send to production through them.

I guess opinions will differ.

1. Personally, I would appreciate the distinction between definite 
standard violations and doubtful/risky code.

2. The stylistic preferences differ greatly. You call it a hack, others 
will call it a clever trick (though a couple of ifs and assignments in 
between to make the example code non-pointless).

But we can consider something much more real: glibc code contains many 
cases of undefined behavior (according to C11). I consider it bugs 
unwanted in production (maybe with the exception of strlen) but it's not 
clear to me what is the POV of the project on this. E.g., are bug 
reports about UB are welcome?

-- 
Alexander Cherepanov

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-30  1:50                     ` Alexander Cherepanov
@ 2016-01-31 16:13                       ` Joseph Myers
  2016-02-03 12:29                         ` Undefined behavior in glibc -- was: " Alexander Cherepanov
  0 siblings, 1 reply; 34+ messages in thread
From: Joseph Myers @ 2016-01-31 16:13 UTC (permalink / raw)
  To: Alexander Cherepanov; +Cc: Dwight Guth, libc-alpha

On Sat, 30 Jan 2016, Alexander Cherepanov wrote:

> But we can consider something much more real: glibc code contains many cases
> of undefined behavior (according to C11). I consider it bugs unwanted in
> production (maybe with the exception of strlen) but it's not clear to me what
> is the POV of the project on this. E.g., are bug reports about UB are welcome?

It's an intrinsic part of implementing the C library that it involves 
doing things that are not defined in ISO C (the ISO C library cannot all 
be implemented in ISO C).

If something involves undefined behavior in GNU C, on systems with the 
properties that hold for all systems supported by glibc (e.g. 32-bit int, 
32-bit or 64-bit long and pointers), for inputs to library functions where 
the library function semantics don't involve undefined behavior, taking 
due account of both separate compilation and use of asm to reduce the 
information available to the compiler in certain places and so make things 
not undefined that would otherwise be undefined, then I'd consider bug 
reports appropriate.  Examples of this include buffer overruns and signed 
integer arithmetic overflow.  I think that applies even if there is no 
plausible optimization that could result in the signed integer arithmetic 
overflow causing problems.  Of course, patches are even better.

If something is fully defined in GNU C, but undefined, unspecified or 
implementation-defined in ISO C, I don't think bug reports are 
appropriate.  E.g., it's not a bug for glibc code to rely on signed shifts 
by a nonnegative amount that's less than the width of the type, including 
the aspects of those that are not defined by ISO C, or on the 
fully-defined nature of conversions of out-of-range integer values to 
signed integer types.  Some such cases in code shared with gnulib (or 
other external sources) may still be gnulib bugs where they aren't glibc 
bugs, since gnulib is meant to be much more portable.  And there may be 
cases where you could argue that eliminating such usages is a desirable 
*cleanup* in glibc - e.g., that the code is cleaner when it uses unsigned 
types for bitwise operations.  But in such cases there is not bug and so 
patches should be sent without any bug reports in Bugzilla (or more 
general groups of issues identified in the list of possible cleanups in 
the todo list on the wiki).

Where something depends on a property common to all systems supported by 
glibc, again it may be a useful *cleanup* to make this more explicit in 
the code, without there being any *bug* that's appropriate to report to 
Bugzilla.  E.g., if code is using int in a way that requires it to be 
exactly 32-bit, changing the code to use int32_t is a sensible cleanup; if 
it really does mean int but is hardcoding the value of INT_MAX, making it 
use INT_MAX is a sensible cleanup; if it's using long as a pointer-sized 
integer type, changing it to use intptr_t is a sensible cleanup.

Where glibc code relies on separate compilation to avoid undefined 
behavior, this is not a bug; use of LTO on glibc's own code is not 
supported.  For example, where functions with different C types but 
compatible ABIs are aliased, or a function is otherwise defined with a 
different type from that used when it is called.  Similarly, asm may be 
used to limit code movement, and that could e.g. mean that aliasing is not 
undefined behavior where it would otherwise be undefined.

Of course, if the semantics for a function say that certain inputs result 
in undefined behavior, it does not matter if C-level undefined behavior 
occurs within that function for those inputs.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2][BZ #17979][BZ #17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
@ 2016-02-01  9:44 ` Dwight Guth
  2016-02-10  4:16   ` Dwight Guth
  2016-02-10 17:21   ` Mike Frysinger
  0 siblings, 2 replies; 34+ messages in thread
From: Dwight Guth @ 2016-02-01  9:44 UTC (permalink / raw)
  To: libc-alpha

If linking against Glibc with a compiler for which the __GNUC__ macro is not
defined, problems arise when including header files that use the __restrict
or __inline keyword and when including uchar.h.

Glibc strips __restrict from the prototypes of C library functions in this
case. This is undesirable if the compiler is a C99-compliant compiler,
because
C99 includes the restrict keyword and uses it in the declaration of a number
of functions in the C library. While this is broadly correct, lack of this
information can have undesirable effects on analysis tools. The same thing
occurs with the __inline keyword, which is also defined in C99 but stripped
if the compiler is not GNU C. Here we except the case where the compiler
declares itself to be C99-compliant from these checks in order to allow
better
C99 compliance for non-GNU-C compilers which link against Glibc.

Glibc defines char16_t and char32_t in uchar.h as __CHAR16_TYPE__ and
__CHAR32_TYPE__ when the __GNUC__ macro is defined, but when linking against
Glibc with a different compiler, these types are not defined at all,
which is a violation of C11 sec. 7.28 paragraph 2, as well as a syntax
error because these types are used in the prototypes of functions declared
later in the file. According to this section of the standard, these types
must be defined in this header file and must be the same type as
uint_least16_t and uint_least32_t, which are defined in stdint.h as
"unsigned short int" and "unsigned int" respectively. Here we modify the
header so that if __GNUC__ is not defined, we still provide these typedefs,
but we obtain them from bits/stdint.h, where they are defined to be equal
to uint_least16_t and uint_least32_t, if __CHAR16_TYPE__ and __CHAR32_TYPE__
are not defined by the compiler.

---

I had trouble testing this patch because I ran into unrelated errors in the
test suite. No tests seem to fail on my machine after the patch that were
passing before, however. It seems to happen because it can't dynamically
link against libgcc_s, but I'm not 100% sure what the correct way to pass
that
to make check is. Can someone point me to where this documentation is?

---

Changed from previous version:

* Updated patch description based on comments on mailing list.
* Added a bits/stdint.h which defines __uint_least16_t and __uint_least32_t
  and which is included from stdint.h and uchar.h.

I didn't move stdint.h out of sysdeps because I wasn't sure how controversial
that would be, but I can do that in a third version of the patch if need be.
I was under impression that the issue was that uchar.h was outside sysdeps
and could not directly assume a particular version of stdint.h, but that
assuming that whatever particular version of bits/stdint.h you had would
define
a certain type, and including that header from outside sysdeps should be
fine.
If this is not the case, apologies, just let me know.

---

2016-01-28  Dwight Guth  <dwight.guth@runtimeverification.com>

	[BZ #17979]
	* wcsmbs/uchar.h (char16_t, char32_t): Define types if __GNUC__,
	__CHAR16_TYPE__, or __CHAR32_TYPE__ are not defined.
	* sysdeps/generic/bits/stdint.h: Created.
	* sysdeps/generic/stdint.h (uint_least16_t, uint_least32_t): Defined
	based on types from bits/stdint.h.

	[BZ #17721]
	* misc/sys/cdefs.h (__restrict, __inline): Define as keywords if
	__GNUC__ is not defined but __STDC_VERSION__ is at least C99.

---

diff --git a/misc/sys/cdefs.h b/misc/sys/cdefs.h
index 7fd4154..af23ff7 100644
--- a/misc/sys/cdefs.h
+++ b/misc/sys/cdefs.h
@@ -69,8 +69,11 @@

 #else	/* Not GCC.  */

-# define __inline		/* No inline functions.  */
-
+# if defined __STDC_VERSION__ && __STDC_VERSION__ >= 199901L
+#  define __inline		inline
+# else
+#  define __inline		/* No inline functions.  */
+# endif
 # define __THROW
 # define __THROWNL
 # define __NTH(fct)	fct
@@ -360,7 +363,11 @@

 /* __restrict is known in EGCS 1.2 and above. */
 #if !__GNUC_PREREQ (2,92)
-# define __restrict	/* Ignore */
+# if !defined __GNUC__ && defined __STDC_VERSION__ && __STDC_VERSION__ >=
199901L
+#  define __restrict	restrict
+# else
+#  define __restrict	/* Ignore */
+# endif
 #endif

 /* ISO C99 also allows to declare arrays as non-overlapping.  The syntax is
diff --git a/sysdeps/generic/bits/stdint.h b/sysdeps/generic/bits/stdint.h
new file mode 100644
index 0000000..ee63a2a
--- /dev/null
+++ b/sysdeps/generic/bits/stdint.h
@@ -0,0 +1,29 @@
+/* uint_least16_t and uint_least32_t definitions
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _BITS_STDINT_H
+#define _BITS_STDINT_H	1
+
+/* uint_least16_t and char16_t must be the same type, as must
+   uint_least32_t and char32_t. We define here a reserved identifier
+   for these types so that they will always remain the same type. */
+
+typedef unsigned short int	__uint_least16_t;
+typedef unsigned int		__uint_least32_t;
+
+#endif	/* bits/stdint.h */
diff --git a/sysdeps/generic/stdint.h b/sysdeps/generic/stdint.h
index 4427627..2f4ebd2 100644
--- a/sysdeps/generic/stdint.h
+++ b/sysdeps/generic/stdint.h
@@ -25,6 +25,7 @@
 #include <features.h>
 #include <bits/wchar.h>
 #include <bits/wordsize.h>
+#include <bits/stdint.h>

 /* Exact integral types.  */

@@ -74,8 +75,8 @@ typedef long long int		int_least64_t;

 /* Unsigned.  */
 typedef unsigned char		uint_least8_t;
-typedef unsigned short int	uint_least16_t;
-typedef unsigned int		uint_least32_t;
+typedef __uint_least16_t	uint_least16_t;
+typedef __uint_least32_t	uint_least32_t;
 #if __WORDSIZE == 64
 typedef unsigned long int	uint_least64_t;
 #else
diff --git a/wcsmbs/uchar.h b/wcsmbs/uchar.h
index ce92b25..875919a 100644
--- a/wcsmbs/uchar.h
+++ b/wcsmbs/uchar.h
@@ -30,6 +30,8 @@
 #define __need_mbstate_t
 #include <wchar.h>

+#include <bits/stdint.h>
+
 #ifndef __mbstate_t_defined
 __BEGIN_NAMESPACE_C99
 /* Public type.  */
@@ -39,14 +41,16 @@ __END_NAMESPACE_C99
 #endif


-#if defined __GNUC__ && !defined __USE_ISOCXX11
+#if !defined __USE_ISOCXX11
 /* Define the 16-bit and 32-bit character types.  Use the information
    provided by the compiler.  */
 # if !defined __CHAR16_TYPE__ || !defined __CHAR32_TYPE__
 #  if defined __STDC_VERSION__ && __STDC_VERSION__ < 201000L
 #   error "<uchar.h> requires ISO C11 mode"
 #  else
-#   error "definitions of __CHAR16_TYPE__ and/or __CHAR32_TYPE__ missing"
+/* Same as uint_least16_t and uint_least32_t in stdint.h. */
+typedef __uint_least16_t __CHAR16_TYPE__;
+typedef __uint_least32_t __CHAR32_TYPE__;
 #  endif
 # endif
 typedef __CHAR16_TYPE__ char16_t;

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Undefined behavior in glibc -- was: Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-01-31 16:13                       ` Joseph Myers
@ 2016-02-03 12:29                         ` Alexander Cherepanov
  2016-02-03 13:07                           ` Joseph Myers
  0 siblings, 1 reply; 34+ messages in thread
From: Alexander Cherepanov @ 2016-02-03 12:29 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Dwight Guth, libc-alpha

[-- Attachment #1: Type: text/plain, Size: 12233 bytes --]

Joseph,

thanks a lot for the detailed reply. My comments are below.

On 2016-01-31 19:12, Joseph Myers wrote:
> On Sat, 30 Jan 2016, Alexander Cherepanov wrote:
>
>> But we can consider something much more real: glibc code contains many cases
>> of undefined behavior (according to C11). I consider it bugs unwanted in
>> production (maybe with the exception of strlen) but it's not clear to me what
>> is the POV of the project on this. E.g., are bug reports about UB are welcome?
>
> It's an intrinsic part of implementing the C library that it involves
> doing things that are not defined in ISO C (the ISO C library cannot all
> be implemented in ISO C).

But those parts which cannot be implemented in ISO C would be 
implemented in asm? AIUI there is no much magic in compiling C parts of 
glibc so they should obey the common rules?

> If something involves undefined behavior in GNU C,

GNU C is ISO C (whatever it means:-) + GCC implementation-defined 
behavior as described in [1] + GCC extensions[2], right?

[1] https://gcc.gnu.org/onlinedocs/gcc/C-Implementation.html
[2] https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html

> on systems with the
> properties that hold for all systems supported by glibc (e.g. 32-bit int,
> 32-bit or 64-bit long and pointers), for inputs to library functions where
> the library function semantics don't involve undefined behavior, taking
> due account of both separate compilation and use of asm to reduce the
> information available to the compiler in certain places and so make things
> not undefined that would otherwise be undefined, then I'd consider bug
> reports appropriate.  Examples of this include buffer overruns and signed
> integer arithmetic overflow.  I think that applies even if there is no
> plausible optimization that could result in the signed integer arithmetic
> overflow causing problems.  Of course, patches are even better.

This looks reasonable but I guess there supposed to be some exceptions? 
E.g.:

1) strlen and other string functions read memory by chunks which is 
aliasing violation;
2) strlen accesses the OOB memory.

AIUI for 1) there is may_alias gcc attribute but it almost never used in 
glibc. There are no tools in GNU C to make 2) well-defined but the 
speed-up is remarkable (Rich Felker mentioned 3x difference in speed in 
musl strlen on his hw).

Documenting all the exceptions would be nice. Then, ASan is in glibc 
todo so a clean version of strlen has to be added.

> If something is fully defined in GNU C, but undefined, unspecified or
> implementation-defined in ISO C, I don't think bug reports are
> appropriate.  E.g., it's not a bug for glibc code to rely on signed shifts
> by a nonnegative amount that's less than the width of the type, including
> the aspects of those that are not defined by ISO C, or on the
> fully-defined nature of conversions of out-of-range integer values to
> signed integer types.  Some such cases in code shared with gnulib (or
> other external sources) may still be gnulib bugs where they aren't glibc
> bugs, since gnulib is meant to be much more portable.  And there may be
> cases where you could argue that eliminating such usages is a desirable
> *cleanup* in glibc - e.g., that the code is cleaner when it uses unsigned
> types for bitwise operations.  But in such cases there is not bug and so
> patches should be sent without any bug reports in Bugzilla (or more
> general groups of issues identified in the list of possible cleanups in
> the todo list on the wiki).

Ok

Making glibc compilable by clang would be nice too (and it's in the 
glibc todo list).

> Where something depends on a property common to all systems supported by
> glibc, again it may be a useful *cleanup* to make this more explicit in
> the code, without there being any *bug* that's appropriate to report to
> Bugzilla.  E.g., if code is using int in a way that requires it to be
> exactly 32-bit, changing the code to use int32_t is a sensible cleanup; if
> it really does mean int but is hardcoding the value of INT_MAX, making it
> use INT_MAX is a sensible cleanup; if it's using long as a pointer-sized
> integer type, changing it to use intptr_t is a sensible cleanup.

Ok

> Where glibc code relies on separate compilation to avoid undefined
> behavior, this is not a bug; use of LTO on glibc's own code is not
> supported.  For example, where functions with different C types but
> compatible ABIs are aliased, or a function is otherwise defined with a
> different type from that used when it is called.  Similarly, asm may be
> used to limit code movement, and that could e.g. mean that aliasing is not
> undefined behavior where it would otherwise be undefined.

Not sure what you mean. Is it specific to calling functions or it's 
supposed to cover the case of strlen too?

I don't see cleaning glibc for LTO in todo list in the wiki. Shouldn't 
it be there?

> Of course, if the semantics for a function say that certain inputs result
> in undefined behavior, it does not matter if C-level undefined behavior
> occurs within that function for those inputs.

Sure.

Well, before I go and file the bugs, perhaps it's worth skimming through 
the list. It's mainly invalid pointer arithmetic. GNU C seems not to 
deviate from ISO C in this topic. Should the issues be grouped by type, 
by file, by subdir or in some other way?

Pointer wrapping
----------------

Likely to be miscompiled in the future. Already reported:

https://sourceware.org/bugzilla/show_bug.cgi?id=19411
https://sourceware.org/bugzilla/show_bug.cgi?id=19412
https://sourceware.org/bugzilla/show_bug.cgi?id=19413

Pointers pointing past an array
-------------------------------

I guess there are a lot of these but searching for them is not easily 
automated so the list below is probably just the tip of the iceberg and 
maybe not even accurate.

Sometimes it could be proved that the amount of "overflow" is small and, 
thus, it should be safe in practice (given that an object usually could 
not be allocated near the end of the address space).

https://sourceware.org/git/?p=glibc.git;a=blob;f=resolv/ns_name.c;h=f355cf34443c46d091157ed06f4ef8487214bf10;hb=HEAD#l548
https://sourceware.org/git/?p=glibc.git;a=blob;f=nscd/nscd_getai.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l113
https://sourceware.org/git/?p=glibc.git;a=blob;f=nscd/nscd_getgr_r.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l240
https://sourceware.org/git/?p=glibc.git;a=blob;f=nscd/nscd_gethst_r.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l215
https://sourceware.org/git/?p=glibc.git;a=blob;f=nscd/nscd_gethst_r.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l370
https://sourceware.org/git/?p=glibc.git;a=blob;f=nscd/nscd_getpw_r.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l168
https://sourceware.org/git/?p=glibc.git;a=blob;f=nscd/nscd_getserv_r.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l139
https://sourceware.org/git/?p=glibc.git;a=blob;f=nscd/nscd_getserv_r.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l292
https://sourceware.org/git/?p=glibc.git;a=blob;f=nscd/nscd_initgroups.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l73

Pointers pointing before an array
---------------------------------

Should be safe in practice?

https://sourceware.org/git/?p=glibc.git;a=blob;f=wcsmbs/wcsncat.c;h=9b22764ca13b16a96b6f32636072203658950fdd;hb=HEAD#l39
https://sourceware.org/git/?p=glibc.git;a=blob;f=wcsmbs/wcsncpy.c;h=cd90024e4410cf06cc1083ec10d6f677f18f3e92;hb=HEAD#l32
https://sourceware.org/git/?p=glibc.git;a=blob;f=string/memcmp.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l131
https://sourceware.org/git/?p=glibc.git;a=blob;f=string/wordcopy.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l42
etc.

Capping pointers at the end of the address space
------------------------------------------------

Some functions in C11 have a parameter which limits a number of 
characters written. IMHO it's clear that C11 doesn't intend this 
parameter to be a real size of output buffer and, thus, it could validly 
be SIZE_MAX. But we have seen differing opinions in the recent thread 
about strlcpy. Perhaps you can clear the question.

Anyway, glibc inner parts use a pointer to the end of the buffer and 
upper-level functions have to check for pointer wrapping and to cap this 
pointer at the end of the address space.

1. Functions like snprintf cap the pointer in _IO_str_init_static_internal.

2. Functions like wcsrtombs have to cap __outbufend in struct 
__gconv_step_data.

The capping itself is not UB (if done right) but later pointer 
comparisons with the resulting pointer are UB. Probably safe in practice 
as it's similar to [7/15] in 
https://gcc.gnu.org/ml/gcc/2015-04/msg00325.html .

API change required? :-)

Then, there are less important cases:

https://sourceware.org/git/?p=glibc.git;a=blob;f=nscd/nscd_gethst_r.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l169
https://sourceware.org/git/?p=glibc.git;a=blob;f=nscd/nscd_getai.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l68
https://sourceware.org/git/?p=glibc.git;a=blob;f=nscd/nscd_getgr_r.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l105
https://sourceware.org/git/?p=glibc.git;a=blob;f=nscd/nscd_getpw_r.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l100
https://sourceware.org/git/?p=glibc.git;a=blob;f=nscd/nscd_getserv_r.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l113

and probably others.

Subtracting NULL from a pointer
-------------------------------

Plausible optimization: this is unconditional UB, hence mark the code as 
unreachable and remove everything around.

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/sigstack.c;h=b134d8e5b9655aec4afea961e1eb63f558be62c1;hb=HEAD#l45
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/posix/profil.c;h=23e601c6c2d5a677a778df65c9088f34ff1b7a04;hb=HEAD#l40
35 other cases are in the attached subtract-null.patch (generated).

Structs at NULL
---------------

Plausible optimization: this is unconditional UB, hence mark the code as 
unreachable and remove everything around.

https://sourceware.org/git/?p=glibc.git;a=blob;f=argp/argp-parse.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l81
https://sourceware.org/git/?p=glibc.git;a=blob;f=include/list.h;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l94
https://sourceware.org/git/?p=glibc.git;a=blob;f=intl/bindtextdom.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l44
https://sourceware.org/git/?p=glibc.git;a=blob;f=intl/dcigettext.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l120
https://sourceware.org/git/?p=glibc.git;a=blob;f=intl/dcigettext.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l125
https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl_db/db_info.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l76
https://sourceware.org/git/?p=glibc.git;a=blob;f=posix/glob.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l151
https://sourceware.org/git/?p=glibc.git;a=blob;f=resolv/nss_dns/dns-network.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l241
https://sourceware.org/git/?p=glibc.git;a=blob;f=stdio-common/reg-modifier.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l58
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/ia64/sys/ucontext.h;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l39

Pointers used after free (without dereferencing)
------------------------------------------------

https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/fileops.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l1063
(IIUC conditions "fp->_IO_read_base != NULL" and "!_IO_in_backup (fp)" 
should be swapped.)

https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/tst-obstack.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l25
https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/tst-malloc-backtrace.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l30
https://sourceware.org/git/?p=glibc.git;a=blob;f=io/pwd.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l41

Memory allocations without check
--------------------------------

Not UB but while we are at it...

AIUI the policy in glibc is to check results of malloc no matter how 
small the requested amount of memory is. Right?

So mallocs like in https://sourceware.org/bugzilla/show_bug.cgi?id=19416 
should be reported?

-- 
Alexander Cherepanov

[-- Attachment #2: subtract-null.patch --]
[-- Type: text/x-patch, Size: 10849 bytes --]

diff -u -p a/nss/nss_files/files-alias.c b/nss/nss_files/files-alias.c
--- a/nss/nss_files/files-alias.c
+++ b/nss/nss_files/files-alias.c
@@ -338,7 +338,7 @@ get_next_alias (FILE *stream, const char
 		      /* Adjust the pointer so it is aligned for
 			 storing pointers.  */
 		      first_unused += __alignof__ (char *) - 1;
-		      first_unused -= ((first_unused - (char *) 0)
+		      first_unused -= ((uintptr_t)first_unused
 				       % __alignof__ (char *));
 		      result->alias_members = (char **) first_unused;
 
diff -u -p a/nss/nss_files/files-hosts.c b/nss/nss_files/files-hosts.c
--- a/nss/nss_files/files-hosts.c
+++ b/nss/nss_files/files-hosts.c
@@ -211,7 +211,7 @@ _nss_files_gethostbyname3_r (const char
 		    }
 
 		  /* Make sure bufferend is aligned.  */
-		  assert ((bufferend - (char *) 0) % sizeof (char *) == 0);
+		  assert ((uintptr_t)bufferend % sizeof (char *) == 0);
 
 		  /* Now we can check whether the buffer is large enough.
 		     16 is the maximal size of the IP address.  */
@@ -264,7 +264,7 @@ _nss_files_gethostbyname3_r (const char
 
 		  /* Round up the buffer end address.  */
 		  bufferend += (sizeof (char *)
-				- ((bufferend - (char *) 0)
+				- ((uintptr_t)bufferend
 				   % sizeof (char *))) % sizeof (char *);
 
 		  /* Now the new address.  */
diff -u -p a/nss/nss_files/files-parse.c b/nss/nss_files/files-parse.c
--- a/nss/nss_files/files-parse.c
+++ b/nss/nss_files/files-parse.c
@@ -243,7 +243,7 @@ parse_list (char **linep, char *eol, cha
 
   /* Adjust the pointer so it is aligned for storing pointers.  */
   eol += __alignof__ (char *) - 1;
-  eol -= (eol - (char *) 0) % __alignof__ (char *);
+  eol -= (uintptr_t)eol % __alignof__ (char *);
   /* We will start the storage here for the vector of pointers.  */
   list = (char **) eol;
 
diff -u -p a/nis/nss_nisplus/nisplus-rpc.c b/nis/nss_nisplus/nisplus-rpc.c
--- a/nis/nss_nisplus/nisplus-rpc.c
+++ b/nis/nss_nisplus/nisplus-rpc.c
@@ -96,7 +96,7 @@ _nss_nisplus_parse_rpcent (nis_result *r
   /* Adjust the pointer so it is aligned for
      storing pointers.  */
   size_t adjust = ((__alignof__ (char *)
-		    - (first_unused - (char *) 0) % __alignof__ (char *))
+		    - (uintptr_t)first_unused % __alignof__ (char *))
 		   % __alignof__ (char *));
   if (room_left < adjust + sizeof (char *))
     goto no_more_room;
diff -u -p a/nis/nss_nisplus/nisplus-network.c b/nis/nss_nisplus/nisplus-network.c
--- a/nis/nss_nisplus/nisplus-network.c
+++ b/nis/nss_nisplus/nisplus-network.c
@@ -100,7 +100,7 @@ _nss_nisplus_parse_netent (nis_result *r
   /* Adjust the pointer so it is aligned for
      storing pointers.  */
   size_t adjust = ((__alignof__ (char *)
-		    - (first_unused - (char *) 0) % __alignof__ (char *))
+		    - (uintptr_t)first_unused % __alignof__ (char *))
 		   % __alignof__ (char *));
   if (room_left < adjust + sizeof (char *))
     goto no_more_room;
diff -u -p a/nis/nss_nisplus/nisplus-alias.c b/nis/nss_nisplus/nisplus-alias.c
--- a/nis/nss_nisplus/nisplus-alias.c
+++ b/nis/nss_nisplus/nisplus-alias.c
@@ -121,7 +121,7 @@ _nss_nisplus_parse_aliasent (nis_result
   /* Adjust the pointer so it is aligned for
      storing pointers.  */
   size_t adjust = ((__alignof__ (char *)
-		    - (first_unused - (char *) 0) % __alignof__ (char *))
+		    - (uintptr_t)first_unused % __alignof__ (char *))
 		   % __alignof__ (char *));
   if (room_left < adjust)
     goto no_more_room;
diff -u -p a/nis/nss_nisplus/nisplus-service.c b/nis/nss_nisplus/nisplus-service.c
--- a/nis/nss_nisplus/nisplus-service.c
+++ b/nis/nss_nisplus/nisplus-service.c
@@ -103,7 +103,7 @@ _nss_nisplus_parse_servent (nis_result *
   /* Adjust the pointer so it is aligned for
      storing pointers.  */
   size_t adjust = ((__alignof__ (char *)
-		    - (first_unused - (char *) 0) % __alignof__ (char *))
+		    - (uintptr_t)first_unused % __alignof__ (char *))
 		   % __alignof__ (char *));
   if (room_left < adjust + sizeof (char *))
     goto no_more_room;
diff -u -p a/nis/nss_nisplus/nisplus-hosts.c b/nis/nss_nisplus/nisplus-hosts.c
--- a/nis/nss_nisplus/nisplus-hosts.c
+++ b/nis/nss_nisplus/nisplus-hosts.c
@@ -142,7 +142,7 @@ _nss_nisplus_parse_hostent (nis_result *
   /* Adjust the pointer so it is aligned for
      storing pointers.  */
   size_t adjust = ((__alignof__ (char *)
-		    - (first_unused - (char *) 0) % __alignof__ (char *))
+		    - (uintptr_t)first_unused % __alignof__ (char *))
 		   % __alignof__ (char *));
   if (room_left < adjust + 3 * sizeof (char *))
     goto no_more_room;
diff -u -p a/nis/nss_nisplus/nisplus-parser.c b/nis/nss_nisplus/nisplus-parser.c
--- a/nis/nss_nisplus/nisplus-parser.c
+++ b/nis/nss_nisplus/nisplus-parser.c
@@ -218,7 +218,7 @@ _nss_nisplus_parse_grent (nis_result *re
   /* Adjust the pointer so it is aligned for
      storing pointers.  */
   size_t adjust = ((__alignof__ (char *)
-		    - (first_unused - (char *) 0) % __alignof__ (char *))
+		    - (uintptr_t)first_unused % __alignof__ (char *))
 		   % __alignof__ (char *));
   if (room_left < adjust)
     goto no_more_room;
diff -u -p a/nis/nss_nisplus/nisplus-proto.c b/nis/nss_nisplus/nisplus-proto.c
--- a/nis/nss_nisplus/nisplus-proto.c
+++ b/nis/nss_nisplus/nisplus-proto.c
@@ -97,7 +97,7 @@ _nss_nisplus_parse_protoent (nis_result
   /* Adjust the pointer so it is aligned for
      storing pointers.  */
   size_t adjust = ((__alignof__ (char *)
-		    - (first_unused - (char *) 0) % __alignof__ (char *))
+		    - (uintptr_t)first_unused % __alignof__ (char *))
 		   % __alignof__ (char *));
   if (room_left < adjust + sizeof (char *))
     goto no_more_room;
diff -u -p a/nis/nss_nis/nis-alias.c b/nis/nss_nis/nis-alias.c
--- a/nis/nss_nis/nis-alias.c
+++ b/nis/nss_nis/nis-alias.c
@@ -67,7 +67,7 @@ _nss_nis_parse_aliasent (const char *key
   /* Adjust the pointer so it is aligned for
      storing pointers.  */
   first_unused += __alignof__ (char *) - 1;
-  first_unused -= ((first_unused - (char *) 0) % __alignof__ (char *));
+  first_unused -= ((uintptr_t)first_unused % __alignof__ (char *));
   result->alias_members = (char **) first_unused;
 
   line = alias;
diff -u -p a/crypt/sha256-crypt.c b/crypt/sha256-crypt.c
--- a/crypt/sha256-crypt.c
+++ b/crypt/sha256-crypt.c
@@ -143,7 +143,7 @@ __sha256_crypt_r (const char *key, const
   salt_len = MIN (strcspn (salt, "$"), SALT_LEN_MAX);
   key_len = strlen (key);
 
-  if ((key - (char *) 0) % __alignof__ (uint32_t) != 0)
+  if ((uintptr_t)key % __alignof__ (uint32_t) != 0)
     {
       char *tmp;
 
@@ -158,20 +158,20 @@ __sha256_crypt_r (const char *key, const
 
       key = copied_key =
 	memcpy (tmp + __alignof__ (uint32_t)
-		- (tmp - (char *) 0) % __alignof__ (uint32_t),
+		- (uintptr_t)tmp % __alignof__ (uint32_t),
 		key, key_len);
-      assert ((key - (char *) 0) % __alignof__ (uint32_t) == 0);
+      assert ((uintptr_t)key % __alignof__ (uint32_t) == 0);
     }
 
-  if ((salt - (char *) 0) % __alignof__ (uint32_t) != 0)
+  if ((uintptr_t)salt % __alignof__ (uint32_t) != 0)
     {
       char *tmp = (char *) alloca (salt_len + __alignof__ (uint32_t));
       alloca_used += salt_len + __alignof__ (uint32_t);
       salt = copied_salt =
 	memcpy (tmp + __alignof__ (uint32_t)
-		- (tmp - (char *) 0) % __alignof__ (uint32_t),
+		- (uintptr_t)tmp % __alignof__ (uint32_t),
 		salt, salt_len);
-      assert ((salt - (char *) 0) % __alignof__ (uint32_t) == 0);
+      assert ((uintptr_t)salt % __alignof__ (uint32_t) == 0);
     }
 
 #ifdef USE_NSS
diff -u -p a/crypt/sha512-crypt.c b/crypt/sha512-crypt.c
--- a/crypt/sha512-crypt.c
+++ b/crypt/sha512-crypt.c
@@ -143,7 +143,7 @@ __sha512_crypt_r (const char *key, const
   salt_len = MIN (strcspn (salt, "$"), SALT_LEN_MAX);
   key_len = strlen (key);
 
-  if ((key - (char *) 0) % __alignof__ (uint64_t) != 0)
+  if ((uintptr_t)key % __alignof__ (uint64_t) != 0)
     {
       char *tmp;
 
@@ -158,19 +158,19 @@ __sha512_crypt_r (const char *key, const
 
       key = copied_key =
 	memcpy (tmp + __alignof__ (uint64_t)
-		- (tmp - (char *) 0) % __alignof__ (uint64_t),
+		- (uintptr_t)tmp % __alignof__ (uint64_t),
 		key, key_len);
-      assert ((key - (char *) 0) % __alignof__ (uint64_t) == 0);
+      assert ((uintptr_t)key % __alignof__ (uint64_t) == 0);
     }
 
-  if ((salt - (char *) 0) % __alignof__ (uint64_t) != 0)
+  if ((uintptr_t)salt % __alignof__ (uint64_t) != 0)
     {
       char *tmp = (char *) alloca (salt_len + __alignof__ (uint64_t));
       salt = copied_salt =
 	memcpy (tmp + __alignof__ (uint64_t)
-		- (tmp - (char *) 0) % __alignof__ (uint64_t),
+		- (uintptr_t)tmp % __alignof__ (uint64_t),
 		salt, salt_len);
-      assert ((salt - (char *) 0) % __alignof__ (uint64_t) == 0);
+      assert ((uintptr_t)salt % __alignof__ (uint64_t) == 0);
     }
 
 #ifdef USE_NSS
diff -u -p a/crypt/md5-crypt.c b/crypt/md5-crypt.c
--- a/crypt/md5-crypt.c
+++ b/crypt/md5-crypt.c
@@ -111,7 +111,7 @@ __md5_crypt_r (const char *key, const ch
   salt_len = MIN (strcspn (salt, "$"), 8);
   key_len = strlen (key);
 
-  if ((key - (char *) 0) % __alignof__ (md5_uint32) != 0)
+  if ((uintptr_t)key % __alignof__ (md5_uint32) != 0)
     {
       char *tmp;
 
@@ -126,19 +126,19 @@ __md5_crypt_r (const char *key, const ch
 
       key = copied_key =
 	memcpy (tmp + __alignof__ (md5_uint32)
-		- (tmp - (char *) 0) % __alignof__ (md5_uint32),
+		- (uintptr_t)tmp % __alignof__ (md5_uint32),
 		key, key_len);
-      assert ((key - (char *) 0) % __alignof__ (md5_uint32) == 0);
+      assert ((uintptr_t)key % __alignof__ (md5_uint32) == 0);
     }
 
-  if ((salt - (char *) 0) % __alignof__ (md5_uint32) != 0)
+  if ((uintptr_t)salt % __alignof__ (md5_uint32) != 0)
     {
       char *tmp = (char *) alloca (salt_len + __alignof__ (md5_uint32));
       salt = copied_salt =
 	memcpy (tmp + __alignof__ (md5_uint32)
-		- (tmp - (char *) 0) % __alignof__ (md5_uint32),
+		- (uintptr_t)tmp % __alignof__ (md5_uint32),
 		salt, salt_len);
-      assert ((salt - (char *) 0) % __alignof__ (md5_uint32) == 0);
+      assert ((uintptr_t)salt % __alignof__ (md5_uint32) == 0);
     }
 
 #ifdef USE_NSS
diff -u -p a/stdlib/msort.c b/stdlib/msort.c
--- a/stdlib/msort.c
+++ b/stdlib/msort.c
@@ -282,15 +282,15 @@ __qsort_r (void *b, size_t n, size_t s,
   else
     {
       if ((s & (sizeof (uint32_t) - 1)) == 0
-	  && ((char *) b - (char *) 0) % __alignof__ (uint32_t) == 0)
+	  && (uintptr_t)b % __alignof__ (uint32_t) == 0)
 	{
 	  if (s == sizeof (uint32_t))
 	    p.var = 0;
 	  else if (s == sizeof (uint64_t)
-		   && ((char *) b - (char *) 0) % __alignof__ (uint64_t) == 0)
+		   && (uintptr_t)b % __alignof__ (uint64_t) == 0)
 	    p.var = 1;
 	  else if ((s & (sizeof (unsigned long) - 1)) == 0
-		   && ((char *) b - (char *) 0)
+		   && (uintptr_t)b
 		      % __alignof__ (unsigned long) == 0)
 	    p.var = 2;
 	}

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Undefined behavior in glibc -- was: Re: [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-02-03 12:29                         ` Undefined behavior in glibc -- was: " Alexander Cherepanov
@ 2016-02-03 13:07                           ` Joseph Myers
  2016-02-05 18:24                             ` Undefined behavior in glibc Alexander Cherepanov
  0 siblings, 1 reply; 34+ messages in thread
From: Joseph Myers @ 2016-02-03 13:07 UTC (permalink / raw)
  To: Alexander Cherepanov; +Cc: Dwight Guth, libc-alpha

On Wed, 3 Feb 2016, Alexander Cherepanov wrote:

> But those parts which cannot be implemented in ISO C would be implemented in
> asm? AIUI there is no much magic in compiling C parts of glibc so they should

No, not necessarily.  Written in GNU C with a good understanding of what 
transformations are actually possible given the code visible to the 
compiler and the rules of GNU C.

> > If something involves undefined behavior in GNU C,
> 
> GNU C is ISO C (whatever it means:-) + GCC implementation-defined behavior as
> described in [1] + GCC extensions[2], right?

Plus undocumented features and a good understanding of aspects of how C 
relates to the underlying machines (that is, you cannot assume e.g. that a 
C addition corresponds to a machine add instruction, but you can assume 
that memory is made of pages and that if it's not visible to the compiler 
how memory was allocated, having particular access to any byte in a page 
means having such access to all bytes in that page, subject to avoiding 
data races on write).  Cf 
<https://www.cl.cam.ac.uk/~pes20/cerberus/notes50-2015-05-24-survey-discussion.html> 
(but the non-ISO aspects of GNU C as used in low-level system software 
such as glibc and the Linux kernel should be assumed to be much more 
wide-ranging than the issues discussed there).

> 1) strlen and other string functions read memory by chunks which is aliasing
> violation;
> 2) strlen accesses the OOB memory.

Both of those fall under deliberate use of separate compilation, i.e. it's 
not visible to the compiler what the original effective type of the memory 
was or its original size, so a whole page can be accessed provided the 
types used don't involve aliasing violations within the source visible to 
the compiler.

> > Where glibc code relies on separate compilation to avoid undefined
> > behavior, this is not a bug; use of LTO on glibc's own code is not
> > supported.  For example, where functions with different C types but
> > compatible ABIs are aliased, or a function is otherwise defined with a
> > different type from that used when it is called.  Similarly, asm may be
> > used to limit code movement, and that could e.g. mean that aliasing is not
> > undefined behavior where it would otherwise be undefined.
> 
> Not sure what you mean. Is it specific to calling functions or it's supposed
> to cover the case of strlen too?

It covers strlen.  strlen is well-defined in isolation, so can be presumed 
to be compiled to an ABI-conforming object file, and such an object file, 
if it would work for programs where it would be valid for that strlen 
implementation to be part of the program (under a different name), would 
also work for other programs that are valid with ISO C strlen.

> I don't see cleaning glibc for LTO in todo list in the wiki. Shouldn't it be
> there?

The todo list basically has ideas that one person once thought might be 
useful.  In general, you can't assume that things there have consensus, 
and nor do you need consensus to put something there.  I think 
facilitating LTO of the standard C library is a very hard whole-system 
issue, not something that could be considered for glibc in isolation; 
you'd probably need several different forms of source annotation that act 
as barriers to particular forms of LTO, saying that calls to a function 
can only be optimized with reference to its semantics and not with 
reference to the contents of a particular implementation of it.

> Well, before I go and file the bugs, perhaps it's worth skimming through the
> list. It's mainly invalid pointer arithmetic. GNU C seems not to deviate from
> ISO C in this topic. Should the issues be grouped by type, by file, by subdir
> or in some other way?

I think such cases should generally be considered bugs except for limited 
cases such as string functions that deliberately work in terms of pages or 
aligned units smaller than pages, and so may go slightly outside the 
original object (before or after) but obviously cannot wrap around the 
address space to affect the results of comparisons.

The natural division is based on whether it might make sense to fix two 
issues separately - so any issues in parts of glibc for which different 
people might have expertise, or that differ in how clear it is that there 
is a problem or how clear it is what the correct fix would be, should be 
filed separately.  Of course, it's even better if fixes (following the 
contribution checklist) for such issues are submitted as well.

> Capping pointers at the end of the address space
> ------------------------------------------------
> 
> Some functions in C11 have a parameter which limits a number of characters
> written. IMHO it's clear that C11 doesn't intend this parameter to be a real
> size of output buffer and, thus, it could validly be SIZE_MAX. But we have
> seen differing opinions in the recent thread about strlcpy. Perhaps you can
> clear the question.

Given the dispute in the Austin Group over snprintf, a C11 DR is clearly 
needed.

> Structs at NULL
> ---------------
> 
> Plausible optimization: this is unconditional UB, hence mark the code as
> unreachable and remove everything around.

I think such offsetof-type code should be considered valid GNU C, although 
it would still be better to use offsetof.  At least some are inside 
"#ifndef offsetof" (some of these examples are code shared with gnulib or 
gettext) and so obviously not bugs in glibc (given that <stddef.h> is 
included).

> Pointers used after free (without dereferencing)
> ------------------------------------------------
> 
> https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/fileops.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l1063
> (IIUC conditions "fp->_IO_read_base != NULL" and "!_IO_in_backup (fp)" should
> be swapped.)
> 
> https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/tst-obstack.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l25
> https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/tst-malloc-backtrace.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l30
> https://sourceware.org/git/?p=glibc.git;a=blob;f=io/pwd.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l41

I'd say such comparisons should be avoided, but that's more a cleanup than 
a bug fix.

> Memory allocations without check
> --------------------------------
> 
> Not UB but while we are at it...
> 
> AIUI the policy in glibc is to check results of malloc no matter how small the
> requested amount of memory is. Right?

Yes.

> So mallocs like in https://sourceware.org/bugzilla/show_bug.cgi?id=19416
> should be reported?

Yes.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Undefined behavior in glibc
  2016-02-03 13:07                           ` Joseph Myers
@ 2016-02-05 18:24                             ` Alexander Cherepanov
  2016-02-05 18:39                               ` Joseph Myers
  0 siblings, 1 reply; 34+ messages in thread
From: Alexander Cherepanov @ 2016-02-05 18:24 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Dwight Guth, libc-alpha

On 2016-02-03 16:07, Joseph Myers wrote:
>>> If something involves undefined behavior in GNU C,
>>
>> GNU C is ISO C (whatever it means:-) + GCC implementation-defined behavior as
>> described in [1] + GCC extensions[2], right?
>
> Plus undocumented features

The moment you say "undocumented" everything falls apart. One cannot be 
sure whether some observed behavior is an undocumented feature or 
something that will be broken by the next gcc release.

> and a good understanding of aspects of how C
> relates to the underlying machines (that is, you cannot assume e.g. that a
> C addition corresponds to a machine add instruction, but you can assume
> that memory is made of pages and that if it's not visible to the compiler
> how memory was allocated, having particular access to any byte in a page
> means having such access to all bytes in that page, subject to avoiding
> data races on write).

1. This rules out all modern hardening/debugging solutions like 
Valgrind, ASan, Intel MPX, CHERI (don't know if the latter is relevant 
for glibc).

2. You seem to be concerned with crashes only but there are other 
problems too, e.g., info leaks (think Heartbleed). Saying that the code 
is free to access any accessible memory means that one cannot catch info 
leaks automatically.

> Cf
> <https://www.cl.cam.ac.uk/~pes20/cerberus/notes50-2015-05-24-survey-discussion.html>
> (but the non-ISO aspects of GNU C as used in low-level system software
> such as glibc and the Linux kernel should be assumed to be much more
> wide-ranging than the issues discussed there).

Yes, I have seen it. I have even linked to your reply to this survey in 
my last email.

>> 1) strlen and other string functions read memory by chunks which is aliasing
>> violation;
>> 2) strlen accesses the OOB memory.
>
> Both of those fall under deliberate use of separate compilation, i.e. it's
> not visible to the compiler what the original effective type of the memory
> was or its original size, so a whole page can be accessed provided the
> types used don't involve aliasing violations within the source visible to
> the compiler.

I thought that this falls under "there is no plausible optimization [due 
to the info not available to a compiler]". Well, ok.

Then let's look at strlen (&co.) from another angle. It starts accessing 
its parameter as chars and then continues accessing it as longs, like this:

   ...
   char *p = ...;
   *p++;
   *(long *)p;

Is there a situation where this is valid in GNU C? For ISO C the 
question seems quite moot but I think Question 16 in DR 017 plus DR 260 
say that it's not valid. Maybe there are no restrictions in GNU C on 
intra-object pointer arithmetic / accesses at all (apart from aliasing 
rules)?

>> I don't see cleaning glibc for LTO in todo list in the wiki. Shouldn't it be
>> there?
>
> The todo list basically has ideas that one person once thought might be
> useful.  In general, you can't assume that things there have consensus,
> and nor do you need consensus to put something there.  I think
> facilitating LTO of the standard C library is a very hard whole-system
> issue, not something that could be considered for glibc in isolation;
> you'd probably need several different forms of source annotation that act
> as barriers to particular forms of LTO, saying that calls to a function
> can only be optimized with reference to its semantics and not with
> reference to the contents of a particular implementation of it.

Isn't noinline+noclone enough for this right now?

>> Well, before I go and file the bugs, perhaps it's worth skimming through the
>> list. It's mainly invalid pointer arithmetic. GNU C seems not to deviate from
>> ISO C in this topic. Should the issues be grouped by type, by file, by subdir
>> or in some other way?
>
> I think such cases should generally be considered bugs except for limited
> cases such as string functions that deliberately work in terms of pages or
> aligned units smaller than pages, and so may go slightly outside the
> original object (before or after) but obviously cannot wrap around the
> address space to affect the results of comparisons.

Ok

> The natural division is based on whether it might make sense to fix two
> issues separately - so any issues in parts of glibc for which different
> people might have expertise, or that differ in how clear it is that there
> is a problem or how clear it is what the correct fix would be, should be
> filed separately.  Of course, it's even better if fixes (following the
> contribution checklist) for such issues are submitted as well.

Ok

>> Capping pointers at the end of the address space
>> ------------------------------------------------
>>
>> Some functions in C11 have a parameter which limits a number of characters
>> written. IMHO it's clear that C11 doesn't intend this parameter to be a real
>> size of output buffer and, thus, it could validly be SIZE_MAX. But we have
>> seen differing opinions in the recent thread about strlcpy. Perhaps you can
>> clear the question.
>
> Given the dispute in the Austin Group over snprintf, a C11 DR is clearly
> needed.

Ok

>> Structs at NULL
>> ---------------
>>
>> Plausible optimization: this is unconditional UB, hence mark the code as
>> unreachable and remove everything around.
>
> I think such offsetof-type code should be considered valid GNU C, although

Ok

> it would still be better to use offsetof.  At least some are inside
> "#ifndef offsetof" (some of these examples are code shared with gnulib or
> gettext) and so obviously not bugs in glibc (given that <stddef.h> is
> included).

Yeah, and I managed to include some that are used to get a size of a 
field instead of an offset and so are fine even in ISO C. Have to filter 
them out in the future.

>> Pointers used after free (without dereferencing)
>> ------------------------------------------------
>>
>> https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/fileops.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l1063
>> (IIUC conditions "fp->_IO_read_base != NULL" and "!_IO_in_backup (fp)" should
>> be swapped.)

FTR: this one seems to be a false. fp->_IO_read_base could be accessed 
on line 1063 after a free on line 1020 but it's set inside _IO_setg on 
line 1025.

>> https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/tst-obstack.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l25
>> https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/tst-malloc-backtrace.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l30
>> https://sourceware.org/git/?p=glibc.git;a=blob;f=io/pwd.c;hb=5163b4b76f61e361f0f4bbe3b96732b12e5c9b1a#l41
>
> I'd say such comparisons should be avoided, but that's more a cleanup than
> a bug fix.

BTW is there a formal method to tell test files from important files? I 
see .../tst-* and .../test-* files but io/pwd.c looks like a test too.

-- 
Alexander Cherepanov

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Undefined behavior in glibc
  2016-02-05 18:24                             ` Undefined behavior in glibc Alexander Cherepanov
@ 2016-02-05 18:39                               ` Joseph Myers
  2016-02-05 20:03                                 ` Alexander Cherepanov
  0 siblings, 1 reply; 34+ messages in thread
From: Joseph Myers @ 2016-02-05 18:39 UTC (permalink / raw)
  To: Alexander Cherepanov; +Cc: Dwight Guth, libc-alpha

On Fri, 5 Feb 2016, Alexander Cherepanov wrote:

> Then let's look at strlen (&co.) from another angle. It starts accessing its
> parameter as chars and then continues accessing it as longs, like this:
> 
>   ...
>   char *p = ...;
>   *p++;
>   *(long *)p;
> 
> Is there a situation where this is valid in GNU C? For ISO C the question

If the underlying object has effective type an array of longs it seems 
valid to me.

> BTW is there a formal method to tell test files from important files? I see
> .../tst-* and .../test-* files but io/pwd.c looks like a test too.

You need to look at what files are listed in the relevant makefile 
variables.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Undefined behavior in glibc
  2016-02-05 18:39                               ` Joseph Myers
@ 2016-02-05 20:03                                 ` Alexander Cherepanov
  2016-02-05 22:35                                   ` Joseph Myers
  0 siblings, 1 reply; 34+ messages in thread
From: Alexander Cherepanov @ 2016-02-05 20:03 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Dwight Guth, libc-alpha

On 05.02.2016 21:38, Joseph Myers wrote:
> On Fri, 5 Feb 2016, Alexander Cherepanov wrote:
>
>> Then let's look at strlen (&co.) from another angle. It starts accessing its
>> parameter as chars and then continues accessing it as longs, like this:
>>
>>    ...
>>    char *p = ...;
>>    *p++;
>>    *(long *)p;
>>
>> Is there a situation where this is valid in GNU C? For ISO C the question
>
> If the underlying object has effective type an array of longs it seems
> valid to me.

Ok, then there are two cases here. If you take an address of an element 
of this array and convert it to char* then you cannot go outside of this 
element -- this is the essence of Q16 in DR 017.

If you take an address of the array itself then you can access any of 
its bytes but I don't think the standard permits you to go back from 
working with chars to working with longs. Roughly speaking, the 
structure of the object is forgotten. While you stay at the beginning of 
the object you can go back -- it's a general rule: you can convert 
unchanged pointers forth and back freely (modulo alignment). But if you 
move from the beginning then you lose this freedom. The standard doesn't 
describe going from an unrelated pointer to char to a pointer to an 
(sub)object.

-- 
Alexander Cherepanov

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Undefined behavior in glibc
  2016-02-05 20:03                                 ` Alexander Cherepanov
@ 2016-02-05 22:35                                   ` Joseph Myers
  2016-02-10  3:44                                     ` Alexander Cherepanov
  0 siblings, 1 reply; 34+ messages in thread
From: Joseph Myers @ 2016-02-05 22:35 UTC (permalink / raw)
  To: Alexander Cherepanov; +Cc: Dwight Guth, libc-alpha

On Fri, 5 Feb 2016, Alexander Cherepanov wrote:

> If you take an address of the array itself then you can access any of its
> bytes but I don't think the standard permits you to go back from working with
> chars to working with longs. Roughly speaking, the structure of the object is
> forgotten. While you stay at the beginning of the object you can go back --
> it's a general rule: you can convert unchanged pointers forth and back freely
> (modulo alignment). But if you move from the beginning then you lose this
> freedom. The standard doesn't describe going from an unrelated pointer to char
> to a pointer to an (sub)object.

I think going from the pointer to char back to a pointer to long is valid 
in GNU C and in common usage C, provided you never access the same memory 
with different non-character types (other than signed/unsigned variations) 
in ways that would require a union to do without conversions between 
pointer types.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Undefined behavior in glibc
  2016-02-05 22:35                                   ` Joseph Myers
@ 2016-02-10  3:44                                     ` Alexander Cherepanov
  2016-02-10 11:42                                       ` Szabolcs Nagy
  2016-02-10 12:59                                       ` Joseph Myers
  0 siblings, 2 replies; 34+ messages in thread
From: Alexander Cherepanov @ 2016-02-10  3:44 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Dwight Guth, libc-alpha

On 2016-02-06 01:35, Joseph Myers wrote:
> On Fri, 5 Feb 2016, Alexander Cherepanov wrote:
>
>> If you take an address of the array itself then you can access any of its
>> bytes but I don't think the standard permits you to go back from working with
>> chars to working with longs. Roughly speaking, the structure of the object is
>> forgotten. While you stay at the beginning of the object you can go back --
>> it's a general rule: you can convert unchanged pointers forth and back freely
>> (modulo alignment). But if you move from the beginning then you lose this
>> freedom. The standard doesn't describe going from an unrelated pointer to char
>> to a pointer to an (sub)object.
>
> I think going from the pointer to char back to a pointer to long is valid
> in GNU C and in common usage C, provided you never access the same memory
> with different non-character types (other than signed/unsigned variations)
> in ways that would require a union to do without conversions between
> pointer types.

Not sure this plays well with other principles. AIUI (from [1], [2], [2] 
etc.) you want to retain the freedom in gcc to optimize based on Q16 in 
DR 017. But going through char* seems to permit cheating. Suppose you 
have a 2d array and a pointer to char which points to the boundary 
between two rows. What you get after converting it to a pointer to the 
type of the elements -- a pointer pointing past the end of the previous 
row or a pointer to the first element of the next row? What are the 
limits for pointer arithmetic with this pointer -- one row, two rows, 
whole array?

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41630#c8
[2] https://gcc.gnu.org/ml/gcc/2001-10/msg00820.html
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54136#c5

I've got a question regarding separate compilation etc. What's the 
problem with explicit_bzero then? Just compile it separately and no 
barriers are required, right?

A couple of additions to my previous emails.

On 2016-02-03 15:29, Alexander Cherepanov wrote:
 > Pointer wrapping
 > ----------------
 >
 > Likely to be miscompiled in the future. Already reported:
 >
 > https://sourceware.org/bugzilla/show_bug.cgi?id=19411
 > https://sourceware.org/bugzilla/show_bug.cgi?id=19412
 > https://sourceware.org/bugzilla/show_bug.cgi?id=19413

I should have included in the list the bug report by Pascal Cuoq:

https://sourceware.org/bugzilla/show_bug.cgi?id=19391

It motivated me to look for other cases of checks for pointer wrapping 
in glibc. There are links to it in my PRs but I think it should 
explicitly mentioned in this thread too.

On 2016-02-05 21:24, Alexander Cherepanov wrote:
 > 1. This rules out all modern hardening/debugging solutions like
 > Valgrind, ASan, Intel MPX, CHERI (don't know if the latter is relevant
 > for glibc).

I guess everybody here is familiar with ASan as used for debugging but 
perhaps it's worth mentioning that there are at least two ongoing 
projects[1][2] working on full systems using ASan (and, for the second 
one, UBSan) for hardening in production. The specific merits are yet to 
be assessed but the interest is definitely there.

[4] 
https://blog.hboeck.de/archives/879-Safer-use-of-C-code-running-Gentoo-with-Address-Sanitizer.html
[5] 
http://balintreczey.hu/blog/proposing-amd64-hardened-architecture-for-debian/

-- 
Alexander Cherepanov

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2][BZ #17979][BZ #17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-02-01  9:44 ` [PATCH v2][BZ #17979][BZ #17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler Dwight Guth
@ 2016-02-10  4:16   ` Dwight Guth
  2016-02-10 12:59     ` Joseph Myers
  2016-02-10 17:21   ` Mike Frysinger
  1 sibling, 1 reply; 34+ messages in thread
From: Dwight Guth @ 2016-02-10  4:16 UTC (permalink / raw)
  To: Dwight Guth; +Cc: libc-alpha

Pinging this because I didn't get a response.

On Mon, February 1, 2016 3:44 am, Dwight Guth wrote:
> If linking against Glibc with a compiler for which the __GNUC__ macro is
> not defined, problems arise when including header files that use the
> __restrict
> or __inline keyword and when including uchar.h.
>
> Glibc strips __restrict from the prototypes of C library functions in
> this case. This is undesirable if the compiler is a C99-compliant
> compiler, because C99 includes the restrict keyword and uses it in the
> declaration of a number of functions in the C library. While this is
> broadly correct, lack of this information can have undesirable effects on
> analysis tools. The same thing occurs with the __inline keyword, which is
> also defined in C99 but stripped if the compiler is not GNU C. Here we
> except the case where the compiler declares itself to be C99-compliant
> from these checks in order to allow better C99 compliance for non-GNU-C
> compilers which link against Glibc.
>
> Glibc defines char16_t and char32_t in uchar.h as __CHAR16_TYPE__ and
> __CHAR32_TYPE__ when the __GNUC__ macro is defined, but when linking
> against Glibc with a different compiler, these types are not defined at
> all, which is a violation of C11 sec. 7.28 paragraph 2, as well as a
> syntax error because these types are used in the prototypes of functions
> declared later in the file. According to this section of the standard,
> these types must be defined in this header file and must be the same type
> as uint_least16_t and uint_least32_t, which are defined in stdint.h as
> "unsigned short int" and "unsigned int" respectively. Here we modify the
> header so that if __GNUC__ is not defined, we still provide these
> typedefs, but we obtain them from bits/stdint.h, where they are defined to
> be equal to uint_least16_t and uint_least32_t, if __CHAR16_TYPE__ and
> __CHAR32_TYPE__
> are not defined by the compiler.
>
> ---
>
>
> I had trouble testing this patch because I ran into unrelated errors in
> the test suite. No tests seem to fail on my machine after the patch that
> were passing before, however. It seems to happen because it can't
> dynamically link against libgcc_s, but I'm not 100% sure what the correct
> way to pass that to make check is. Can someone point me to where this
> documentation is?
>
> ---
>
>
> Changed from previous version:
>
>
> * Updated patch description based on comments on mailing list.
> * Added a bits/stdint.h which defines __uint_least16_t and
> __uint_least32_t
> and which is included from stdint.h and uchar.h.
>
> I didn't move stdint.h out of sysdeps because I wasn't sure how
> controversial that would be, but I can do that in a third version of the
> patch if need be. I was under impression that the issue was that uchar.h
> was outside sysdeps and could not directly assume a particular version of
> stdint.h, but that assuming that whatever particular version of
> bits/stdint.h you had would define a certain type, and including that
> header from outside sysdeps should be fine. If this is not the case,
> apologies, just let me know.
>
> ---
>
>
> 2016-01-28  Dwight Guth  <dwight.guth@runtimeverification.com>
>
>
> [BZ #17979]
> * wcsmbs/uchar.h (char16_t, char32_t): Define types if __GNUC__,
> __CHAR16_TYPE__, or __CHAR32_TYPE__ are not defined.
> * sysdeps/generic/bits/stdint.h: Created.
> * sysdeps/generic/stdint.h (uint_least16_t, uint_least32_t): Defined
> based on types from bits/stdint.h.
>
> [BZ #17721]
> * misc/sys/cdefs.h (__restrict, __inline): Define as keywords if
> __GNUC__ is not defined but __STDC_VERSION__ is at least C99.
>
>
> ---
>
>
> diff --git a/misc/sys/cdefs.h b/misc/sys/cdefs.h index 7fd4154..af23ff7
> 100644
> --- a/misc/sys/cdefs.h
> +++ b/misc/sys/cdefs.h
> @@ -69,8 +69,11 @@
>
>
> #else	/* Not GCC.  */
>
>
> -# define __inline		/* No inline functions.  */
> -
> +# if defined __STDC_VERSION__ && __STDC_VERSION__ >= 199901L
> +#  define __inline		inline
> +# else
> +#  define __inline		/* No inline functions.  */
> +# endif
> # define __THROW
> # define __THROWNL
> # define __NTH(fct)	fct
> @@ -360,7 +363,11 @@
>
>
> /* __restrict is known in EGCS 1.2 and above. */
> #if !__GNUC_PREREQ (2,92)
> -# define __restrict	/* Ignore */
> +# if !defined __GNUC__ && defined __STDC_VERSION__ && __STDC_VERSION__ >=
>  199901L
> +#  define __restrict	restrict
> +# else
> +#  define __restrict	/* Ignore */
> +# endif
> #endif
>
>
> /* ISO C99 also allows to declare arrays as non-overlapping.  The syntax
> is diff --git a/sysdeps/generic/bits/stdint.h
> b/sysdeps/generic/bits/stdint.h new file mode 100644 index 0000000..ee63a2a
>  --- /dev/null
> +++ b/sysdeps/generic/bits/stdint.h
> @@ -0,0 +1,29 @@
> +/* uint_least16_t and uint_least32_t definitions
> +   Copyright (C) 2016 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _BITS_STDINT_H
> +#define _BITS_STDINT_H	1
> +
> +/* uint_least16_t and char16_t must be the same type, as must
> +   uint_least32_t and char32_t. We define here a reserved identifier
> +   for these types so that they will always remain the same type. */
> +
> +typedef unsigned short int	__uint_least16_t;
> +typedef unsigned int		__uint_least32_t;
> +
> +#endif	/* bits/stdint.h */
> diff --git a/sysdeps/generic/stdint.h b/sysdeps/generic/stdint.h index
> 4427627..2f4ebd2 100644
> --- a/sysdeps/generic/stdint.h
> +++ b/sysdeps/generic/stdint.h
> @@ -25,6 +25,7 @@
> #include <features.h>
> #include <bits/wchar.h>
> #include <bits/wordsize.h>
> +#include <bits/stdint.h>
>
>
> /* Exact integral types.  */
>
>
> @@ -74,8 +75,8 @@ typedef long long int		int_least64_t;
>
>
> /* Unsigned.  */
> typedef unsigned char		uint_least8_t; -typedef unsigned short int
> uint_least16_t; -typedef unsigned int		uint_least32_t;
> +typedef __uint_least16_t	uint_least16_t;
> +typedef __uint_least32_t	uint_least32_t;
> #if __WORDSIZE == 64
> typedef unsigned long int	uint_least64_t; #else
> diff --git a/wcsmbs/uchar.h b/wcsmbs/uchar.h index ce92b25..875919a 100644
> --- a/wcsmbs/uchar.h
> +++ b/wcsmbs/uchar.h
> @@ -30,6 +30,8 @@
> #define __need_mbstate_t
> #include <wchar.h>
>
>
> +#include <bits/stdint.h>
> +
> #ifndef __mbstate_t_defined
> __BEGIN_NAMESPACE_C99
> /* Public type.  */
> @@ -39,14 +41,16 @@ __END_NAMESPACE_C99
> #endif
>
>
>
> -#if defined __GNUC__ && !defined __USE_ISOCXX11
> +#if !defined __USE_ISOCXX11
> /* Define the 16-bit and 32-bit character types.  Use the information
> provided by the compiler.  */ # if !defined __CHAR16_TYPE__ || !defined
> __CHAR32_TYPE__
> #  if defined __STDC_VERSION__ && __STDC_VERSION__ < 201000L
> #   error "<uchar.h> requires ISO C11 mode"
> #  else
> -#   error "definitions of __CHAR16_TYPE__ and/or __CHAR32_TYPE__ missing"
>  +/* Same as uint_least16_t and uint_least32_t in stdint.h. */
> +typedef __uint_least16_t __CHAR16_TYPE__;
> +typedef __uint_least32_t __CHAR32_TYPE__;
> #  endif
> # endif
> typedef __CHAR16_TYPE__ char16_t;
>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Undefined behavior in glibc
  2016-02-10  3:44                                     ` Alexander Cherepanov
@ 2016-02-10 11:42                                       ` Szabolcs Nagy
  2016-02-10 12:59                                       ` Joseph Myers
  1 sibling, 0 replies; 34+ messages in thread
From: Szabolcs Nagy @ 2016-02-10 11:42 UTC (permalink / raw)
  To: Alexander Cherepanov, Joseph Myers; +Cc: Dwight Guth, libc-alpha, nd

On 10/02/16 03:44, Alexander Cherepanov wrote:
> On 2016-02-05 21:24, Alexander Cherepanov wrote:
>> 1. This rules out all modern hardening/debugging solutions like
>> Valgrind, ASan, Intel MPX, CHERI (don't know if the latter is relevant
>> for glibc).
> 
> I guess everybody here is familiar with ASan as used for debugging but perhaps it's worth mentioning that there
> are at least two ongoing projects[1][2] working on full systems using ASan (and, for the second one, UBSan) for
> hardening in production. The specific merits are yet to be assessed but the interest is definitely there.
> 
> [4] https://blog.hboeck.de/archives/879-Safer-use-of-C-code-running-Gentoo-with-Address-Sanitizer.html
> [5] http://balintreczey.hu/blog/proposing-amd64-hardened-architecture-for-debian/
> 

well, address sanitizer was not designed for hardening, there
are ways to fix that, but if such projects start to misuse
asan then it will become harder to fix.

in the glibc context, i think there are only a handful of
cases when the libc has excuse to make assumptions beyond
iso c, and the rest should be clean c code.

but even if the code is fixed, hardening the libc sounds
problematic: on 32bit systems the address space is a scarce
resource and on 64bit systems the shadow map breaks reliable
systems with overcommit off.

(ubsan, i believe, is fine because it works without a runtime,
when it only traps and does not depend on unsafe environment.
unfortunately it has false positives in gcc:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63303

outside of the libc, most of the problems with the sanitizers
are due to their interactions with the c runtime and dynamic
linker through various unreliable hacks that can break at
any libc update and may provide new attack surfaces.)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2][BZ #17979][BZ #17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-02-10  4:16   ` Dwight Guth
@ 2016-02-10 12:59     ` Joseph Myers
  0 siblings, 0 replies; 34+ messages in thread
From: Joseph Myers @ 2016-02-10 12:59 UTC (permalink / raw)
  To: Dwight Guth; +Cc: libc-alpha

On Tue, 9 Feb 2016, Dwight Guth wrote:

> Pinging this because I didn't get a response.

We're still in freeze for 2.23.  I advise resubmitting, split into 
separate patches for each issue, once 2.23 is out.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Undefined behavior in glibc
  2016-02-10  3:44                                     ` Alexander Cherepanov
  2016-02-10 11:42                                       ` Szabolcs Nagy
@ 2016-02-10 12:59                                       ` Joseph Myers
  1 sibling, 0 replies; 34+ messages in thread
From: Joseph Myers @ 2016-02-10 12:59 UTC (permalink / raw)
  To: Alexander Cherepanov; +Cc: Dwight Guth, libc-alpha

On Wed, 10 Feb 2016, Alexander Cherepanov wrote:

> > I think going from the pointer to char back to a pointer to long is valid
> > in GNU C and in common usage C, provided you never access the same memory
> > with different non-character types (other than signed/unsigned variations)
> > in ways that would require a union to do without conversions between
> > pointer types.
> 
> Not sure this plays well with other principles. AIUI (from [1], [2], [2] etc.)
> you want to retain the freedom in gcc to optimize based on Q16 in DR 017. But
> going through char* seems to permit cheating. Suppose you have a 2d array and
> a pointer to char which points to the boundary between two rows. What you get
> after converting it to a pointer to the type of the elements -- a pointer
> pointing past the end of the previous row or a pointer to the first element of
> the next row? What are the limits for pointer arithmetic with this pointer --
> one row, two rows, whole array?

This is not relevant to the string functions since it's not visible to 
them if a subobject was used to contstruct the pointer passed in.

> I've got a question regarding separate compilation etc. What's the problem
> with explicit_bzero then?

It needs *consensus* on the appropriateness of the patch as it exists now, 
notwithstanding the various ways in which, absent new compiler features, 
other copies of the zeroed memory may still be present and accessible to 
the process if subsequently compromised.

> Just compile it separately and no barriers are
> required, right?

In my view, yes (the arguments about barriers are irrelevant).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2][BZ #17979][BZ #17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler.
  2016-02-01  9:44 ` [PATCH v2][BZ #17979][BZ #17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler Dwight Guth
  2016-02-10  4:16   ` Dwight Guth
@ 2016-02-10 17:21   ` Mike Frysinger
  1 sibling, 0 replies; 34+ messages in thread
From: Mike Frysinger @ 2016-02-10 17:21 UTC (permalink / raw)
  To: Dwight Guth; +Cc: libc-alpha

[-- Attachment #1: Type: text/plain, Size: 142 bytes --]

i think this should be split up into two patches ... one for restrict
and one for types.  when master re-opens, feel free to ping then.
-mike

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2016-02-10 17:21 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-28 20:20 [PATCH][BZ 17979][BZ 17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler Dwight Guth
2016-01-28 21:19 ` Joseph Myers
2016-01-28 22:20   ` Dwight Guth
2016-01-28 22:48     ` Joseph Myers
2016-01-28 22:50       ` Dwight Guth
2016-01-28 22:52         ` Joseph Myers
2016-01-28 22:58           ` Mike Frysinger
2016-01-28 23:08             ` Dwight Guth
2016-01-28 23:20               ` Joseph Myers
2016-01-28 23:28                 ` Dwight Guth
2016-01-28 23:33                   ` Joseph Myers
2016-01-28 23:47                     ` Dwight Guth
2016-01-28 23:52                       ` Joseph Myers
2016-01-29  0:06                         ` Dwight Guth
2016-01-29 15:34                           ` Joseph Myers
2016-01-28 23:16             ` Joseph Myers
2016-01-28 23:44               ` Mike Frysinger
2016-01-29  0:48                 ` Alexander Cherepanov
2016-01-29  0:58                   ` Mike Frysinger
2016-01-30  1:50                     ` Alexander Cherepanov
2016-01-31 16:13                       ` Joseph Myers
2016-02-03 12:29                         ` Undefined behavior in glibc -- was: " Alexander Cherepanov
2016-02-03 13:07                           ` Joseph Myers
2016-02-05 18:24                             ` Undefined behavior in glibc Alexander Cherepanov
2016-02-05 18:39                               ` Joseph Myers
2016-02-05 20:03                                 ` Alexander Cherepanov
2016-02-05 22:35                                   ` Joseph Myers
2016-02-10  3:44                                     ` Alexander Cherepanov
2016-02-10 11:42                                       ` Szabolcs Nagy
2016-02-10 12:59                                       ` Joseph Myers
2016-02-01  9:44 ` [PATCH v2][BZ #17979][BZ #17721] Fix issues with sys/cdefs.h and uchar.h when using non-gcc compiler Dwight Guth
2016-02-10  4:16   ` Dwight Guth
2016-02-10 12:59     ` Joseph Myers
2016-02-10 17:21   ` Mike Frysinger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).