Implement C11 annex K?

public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed

* Implement C11 annex K?
       [not found] <1407616492.31098.ezmlm@sourceware.org>
@ 2014-08-09 20:52 ` David A. Wheeler
  2014-08-10  7:52   ` Andreas Jaeger
  0 siblings, 1 reply; 79+ messages in thread
From: David A. Wheeler @ 2014-08-09 20:52 UTC (permalink / raw)
  To: libc-alpha

I may have someone who'd be willing to develop and submit at least a portion of ISO C11 annex K.  The glibc front page says that glibc "follows all relevant standards including ISO C11 and POSIX.1-2008".  Should I presume that such a submission would be considered, and accepted if high enough quality?  Or would annex K (strcpy_s and friends) be automatically rejected, even though it's in the standard?  If implementing the standard would be automatically rejected, then I don't want to waste this person's time.

I think annex K *should* be supported by glibc.

Thank you.

--- David A. Wheeler

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-09 20:52 ` Implement C11 annex K? David A. Wheeler
@ 2014-08-10  7:52   ` Andreas Jaeger
  2014-08-10 15:03     ` Adhemerval Zanella
  0 siblings, 1 reply; 79+ messages in thread
From: Andreas Jaeger @ 2014-08-10  7:52 UTC (permalink / raw)
  To: dwheeler, libc-alpha

On 08/09/2014 10:51 PM, David A. Wheeler wrote:
> I may have someone who'd be willing to develop and submit at least a portion of ISO C11 annex K.  The glibc front page says that glibc "follows all relevant standards including ISO C11 and POSIX.1-2008".  Should I presume that such a submission would be considered, and accepted if high enough quality?  Or would annex K (strcpy_s and friends) be automatically rejected, even though it's in the standard?  If implementing the standard would be automatically rejected, then I don't want to waste this person's time.
> 
> I think annex K *should* be supported by glibc.


Please see the archives of this mailing list for previous discussions
about these functions. This is something we rejected previously but I
don't recall the details,

Andreas
-- 
 Andreas Jaeger aj@{suse.com,opensuse.org} Twitter/Identica: jaegerandi
  SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 NÃ¼rnberg, Germany
   GF: Jeff Hawn,Jennifer Guild,Felix ImendÃ¶rffer,HRB16746 (AG NÃ¼rnberg)
    GPG fingerprint = 93A3 365E CE47 B889 DF7F  FED1 389A 563C C272 A126

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-10  7:52   ` Andreas Jaeger
@ 2014-08-10 15:03     ` Adhemerval Zanella
  2014-08-11 15:32       ` Joseph S. Myers
  2014-08-12  4:23       ` Rich Felker
  0 siblings, 2 replies; 79+ messages in thread
From: Adhemerval Zanella @ 2014-08-10 15:03 UTC (permalink / raw)
  To: libc-alpha

On 10-08-2014 04:52, Andreas Jaeger wrote:
> On 08/09/2014 10:51 PM, David A. Wheeler wrote:
>> I may have someone who'd be willing to develop and submit at least a portion of ISO C11 annex K.  The glibc front page says that glibc "follows all relevant standards including ISO C11 and POSIX.1-2008".  Should I presume that such a submission would be considered, and accepted if high enough quality?  Or would annex K (strcpy_s and friends) be automatically rejected, even though it's in the standard?  If implementing the standard would be automatically rejected, then I don't want to waste this person's time.
>>
>> I think annex K *should* be supported by glibc.
>
> Please see the archives of this mailing list for previous discussions
> about these functions. This is something we rejected previously but I
> don't recall the details,
>
> Andreas

Last iteration to add C11 annex K in GLIBC was from Ulrich Bayer [1] and it
got stalled not because GLIBC maintainers decided this is not worth of adding,
but because the proposer didn't send any more updates based on comments.

So you can take this initial approach as base for your work or if you are willing
start a new thread/implementation. However, keep in mind that since GLIBC is
community driven project, it will probably require a lot of iterations of
submission and revisions.

[1] https://sourceware.org/ml/libc-alpha/2013-06/msg00208.html

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-10 15:03     ` Adhemerval Zanella
@ 2014-08-11 15:32       ` Joseph S. Myers
  2014-08-11 15:52         ` Paul Eggert
  2014-08-11 15:56         ` David A. Wheeler
  2014-08-12  4:23       ` Rich Felker
  1 sibling, 2 replies; 79+ messages in thread
From: Joseph S. Myers @ 2014-08-11 15:32 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

On Sun, 10 Aug 2014, Adhemerval Zanella wrote:

> Last iteration to add C11 annex K in GLIBC was from Ulrich Bayer [1] and it
> got stalled not because GLIBC maintainers decided this is not worth of adding,
> but because the proposer didn't send any more updates based on comments.
> 
> So you can take this initial approach as base for your work or if you are willing
> start a new thread/implementation. However, keep in mind that since GLIBC is
> community driven project, it will probably require a lot of iterations of
> submission and revisions.

Indeed - and when different people in the community suggest different 
things about how to handle some issue, it's your job as submitter to drive 
things to consensus.  Also: I advise getting the copyright assignment in 
before sending any large patches; people may not want to look at large 
unassigned patches.  (Ulrich Bayer and SBA Research GmbH did get their 
assignments done, though I think it was a while after patches were first 
posted.)

I believe the basic approach was agreed in the previous threads (of which 
there were I think several spread over an extended period of time).

* The new functions would go in a separate library libc_s unless so 
closely tied to existing libc functions that this is problematic.  If it's 
natural to implement them in terms of functions not currently exported 
from libc, or exported from libc only with names not in the C11 reserved 
namespace, then additional reserved-namespace exports can be added to libc 
at the GLIBC_PRIVATE symbol version.

* Declarations in headers would be conditional on __STDC_WANT_LIB_EXT1__ 
(directly, not on an __USE_* macro, because what matters is the definition 
of __STDC_WANT_LIB_EXT1__ when affected headers are included, not when 
some unaffected C11 header is included, which is incompatible with the 
__USE_* approach).  Checks for consistency of __STDC_WANT_LIB_EXT1__ when 
different headers are included might be omitted given the WG14 mood for 
simpler __STDC_WANT_* macro semantics.

* The <stddef.h> and <stdint.h> changes should be implemented in GCC (with 
the appropriate conditionals in the headers), with glibc's <stdint.h> 
needing changing as well.

* Pay attention to various C11 DRs relating to Annex K.

* All normal conventions for patches adding new interfaces apply, 
including giving the new interfaces the symbol version (e.g. GLIBC_2.21) 
of the first glibc version with the interface and ensuring sufficient 
testsuite coverage.

* Some functions will need versions for _FILE_OFFSET_BITS=64 / 
_LARGEFILE64_SOURCE (e.g. tmpfile, fopen, freopen have such versions, so 
the _s variants of those functions should also have such versions).  In 
general, you'll need to understand what variants there are of any existing 
function getting an _s variant, so as to work out which of those variants 
need replicating for *_s.  (Another example is the variants of printf / 
scanf functions in libnldbl.a for -mlong-double-64 on some architectures.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-11 15:32       ` Joseph S. Myers
@ 2014-08-11 15:52         ` Paul Eggert
  2014-08-11 16:06           ` Joseph S. Myers
  2014-08-11 15:56         ` David A. Wheeler
  1 sibling, 1 reply; 79+ messages in thread
From: Paul Eggert @ 2014-08-11 15:52 UTC (permalink / raw)
  To: Joseph S. Myers, Adhemerval Zanella; +Cc: libc-alpha

Joseph S. Myers wrote:
> * Some functions will need versions for _FILE_OFFSET_BITS=64 /
> _LARGEFILE64_SOURCE (e.g. tmpfile, fopen, freopen have such versions, so
> the _s variants of those functions should also have such versions).

Also, don't provide _FILE_OFFSET_BITS=32 versions.  Just support 
_FILE_OFFSET_BITS=64, and make it a compile- or link-time error to try 
to use Annex K with _FILE_OFFSET_BITS=32.

Regardless of whether one thinks Annex K is a good idea, its stated goal 
is more-reliable software.  Requiring it to support _FILE_OFFSET_BITS=32 
(which is deliberately *unreliable*) would be counterproductive make-work.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-11 15:32       ` Joseph S. Myers
  2014-08-11 15:52         ` Paul Eggert
@ 2014-08-11 15:56         ` David A. Wheeler
  1 sibling, 0 replies; 79+ messages in thread
From: David A. Wheeler @ 2014-08-11 15:56 UTC (permalink / raw)
  To: joseph; +Cc: libc-alpha, azanella

On Mon, 11 Aug 2014 15:31:53 +0000, "Joseph S. Myers" <joseph@codesourcery.com> wrote:

Excellent points, thanks.
> Also: I advise getting the copyright assignment in 
> before sending any large patches; people may not want to look at large 
> unassigned patches.  (Ulrich Bayer and SBA Research GmbH did get their 
> assignments done, though I think it was a while after patches were first posted.)

Ulrich Bayer got a copyright assignment done?  That's excellent news; that creates
a useful potential place to start.

I will ask this individual to start small, e.g., start with just strcpy_s and strcat_s.
He needs to learn the requirements of this group, and it's easier
to review small patches anyway.

--- David A. Wheeler

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-11 15:52         ` Paul Eggert
@ 2014-08-11 16:06           ` Joseph S. Myers
  0 siblings, 0 replies; 79+ messages in thread
From: Joseph S. Myers @ 2014-08-11 16:06 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Adhemerval Zanella, libc-alpha

On Mon, 11 Aug 2014, Paul Eggert wrote:

> Joseph S. Myers wrote:
> > * Some functions will need versions for _FILE_OFFSET_BITS=64 /
> > _LARGEFILE64_SOURCE (e.g. tmpfile, fopen, freopen have such versions, so
> > the _s variants of those functions should also have such versions).
> 
> Also, don't provide _FILE_OFFSET_BITS=32 versions.  Just support
> _FILE_OFFSET_BITS=64, and make it a compile- or link-time error to try to use
> Annex K with _FILE_OFFSET_BITS=32.
> 
> Regardless of whether one thinks Annex K is a good idea, its stated goal is
> more-reliable software.  Requiring it to support _FILE_OFFSET_BITS=32 (which
> is deliberately *unreliable*) would be counterproductive make-work.

One goal is retrofitting large bodies of existing code with local changes 
(for which adding a requirement to use _FILE_OFFSET_BITS=64 seems 
unhelpful).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-10 15:03     ` Adhemerval Zanella
  2014-08-11 15:32       ` Joseph S. Myers
@ 2014-08-12  4:23       ` Rich Felker
       [not found]         ` <3565dfa0-060c-46b9-b08c-6edc4eaa1179@email.android.com>
  1 sibling, 1 reply; 79+ messages in thread
From: Rich Felker @ 2014-08-12  4:23 UTC (permalink / raw)
  To: libc-alpha

On Sun, Aug 10, 2014 at 12:03:28PM -0300, Adhemerval Zanella wrote:
> On 10-08-2014 04:52, Andreas Jaeger wrote:
> > On 08/09/2014 10:51 PM, David A. Wheeler wrote:
> >> I may have someone who'd be willing to develop and submit at
> >> least a portion of ISO C11 annex K. The glibc front page says
> >> that glibc "follows all relevant standards including ISO C11 and
> >> POSIX.1-2008". Should I presume that such a submission would be
> >> considered, and accepted if high enough quality? Or would annex K
> >> (strcpy_s and friends) be automatically rejected, even though
> >> it's in the standard? If implementing the standard would be
> >> automatically rejected, then I don't want to waste this person's
> >> time.
> >>
> >> I think annex K *should* be supported by glibc.
> >
> > Please see the archives of this mailing list for previous discussions
> > about these functions. This is something we rejected previously but I
> > don't recall the details,
> >
> > Andreas
> 
> Last iteration to add C11 annex K in GLIBC was from Ulrich Bayer [1] and it
> got stalled not because GLIBC maintainers decided this is not worth of adding,
> but because the proposer didn't send any more updates based on comments.
> 
> So you can take this initial approach as base for your work or if you are willing
> start a new thread/implementation. However, keep in mind that since GLIBC is
> community driven project, it will probably require a lot of iterations of
> submission and revisions.

My view is that Annex K was a big mistake -- the latest iteration of
the committee falling for the antics of a "sponsor" that has no
interest in actually implementing the standard and who has done
nothing but try to undermine the C language for the past 20+ years --
and the only good thing about it is that it's optional. As such, I
think support for it should be omitted unless/until there is serious
demand for it in the form of applications that cannot be built without
it. And that's not going to happen unless glibc implements it.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
       [not found]         ` <3565dfa0-060c-46b9-b08c-6edc4eaa1179@email.android.com>
@ 2014-08-12 21:00           ` Rich Felker
       [not found]             ` <d4ae8119-f629-4235-8981-dd2ccc220fea@email.android.com>
  0 siblings, 1 reply; 79+ messages in thread
From: Rich Felker @ 2014-08-12 21:00 UTC (permalink / raw)
  To: David A. Wheeler; +Cc: libc-alpha

On Tue, Aug 12, 2014 at 03:30:11PM -0400, David A. Wheeler wrote:
> It is a part of the C11 spec. It should be implemented.

It's an optional annex. Note that annex G is also not implemented, and
not likely to be, despite actually being useful...

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
       [not found]             ` <d4ae8119-f629-4235-8981-dd2ccc220fea@email.android.com>
@ 2014-08-12 22:08               ` Rich Felker
  2014-08-12 23:15                 ` David A. Wheeler
  0 siblings, 1 reply; 79+ messages in thread
From: Rich Felker @ 2014-08-12 22:08 UTC (permalink / raw)
  To: David A. Wheeler; +Cc: libc-alpha

On Tue, Aug 12, 2014 at 05:46:01PM -0400, David A. Wheeler wrote:
> Annex K is extremely useful. Many people want to prevent easy buffer
> overflows and other similar problems.
> 
> E.g., the strcpy function does not check buffer lengths at all,
> while strncpy is badly broken because it zeros out the entire
> destination. None of the standard functions provide a simple "copy a
> string, always terminate, don't go beyond the buffer, and do not do
> unnecessary copues". This is a basic omission in the library, one
> that keeps getting recreated. The current library is missing several
> really basic functionalities. Time to fix it.

Annex K has plenty of fundamental flaws that make it essentially
useless for this purpose. The main problem is the way "runtime
constraint handling" is done. I'll first quote the relevant text:

K.3.1.4 Runtime-constraint violations

"Implementations shall verify that the runtime-constraints for a
function are not violated by the program. If a runtime-constraint is
violated, the implementation shall call the currently registered
runtime-constraint handler (see set_constraint_handler_s in
<stdlib.h>). Multiple runtime-constraint violations in the same call
to a library function result in only one call to the
runtime-constraint handler. It is unspecified which one of the
multiple runtime-constraint violations cause the handler to be
called."

...

"The runtime-constraint handler might not return. If the handler does
return, the library function whose runtime-constraint was violated
shall return some indication of failure as given by the returns
section in the function's specification."

K.3.6.1.1 The set_constraint_handler_s function

"The implementation has a default constraint handler that is used if
no calls to the set_constraint_handler_s function have been made. The
behavior of the default handler is implementation-defined, and it may
cause the program to exit or abort."

The important things to take away from this are:

1. Since the default constraint handler is implementation-defined,
   portable code cannot rely on the default handler to catch dangerous
   conditions, nor can it rely on the default handler to return an
   error status to the caller in a way that the caller can handle.

2. The constraint handler is global state, so it cannot be set on a
   per-caller basis. This means, in my terminology, any code using the
   Annex K functions is non-library-safe: it's forced either to modify
   global state in a way that can't safely cooperate with a caller (or
   other library code, much less other threads) that uses the same
   state, or to risk crashing the calling program (via a potential
   default handler that does so).

Basically, this is a backwards, global-state-driven design from the
80s or early 90s that can't be used in modern library-safe,
thread-safe code.

As for your specific example, which I assume is strncpy_s, the
standard function which already solves this problem without the
additonal confusing and error-prone size arguments is snprintf:
snprintf(dest, destsize, "%s", source). And the nonstandard but widely
available function that's even more convenient is strlcpy. And the
name strncpy_s is horribly misleading since its behavior has nothing
to do with strncpy (which is for working with fixed-size, non-C-string
fields, not bounded copying).

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-12 22:08               ` Rich Felker
@ 2014-08-12 23:15                 ` David A. Wheeler
  2014-08-12 23:48                   ` dalias
  0 siblings, 1 reply; 79+ messages in thread
From: David A. Wheeler @ 2014-08-12 23:15 UTC (permalink / raw)
  To: dalias; +Cc: libc-alpha

On Tue, 12 Aug 2014 18:08:11 -0400, Rich Felker <dalias@libc.org> wrote:
> 1. Since the default constraint handler is implementation-defined,
>    portable code cannot rely on the default handler to catch dangerous
>    conditions, nor can it rely on the default handler to return an
>    error status to the caller in a way that the caller can handle.

Portable application code can just set the handler, once.
Portable *library* code just needs to check for error returns; if they don't return,
then the application author decided to do something else.  There's no state-saving necessary.

I think any libc implementation should by default just cause errors to be
returned, since it has no way of knowing exactly what to do with the application instead.
That also means that early library calls don't get caught in a bogus handler.
I agree that the committee should have specified the default, but at least it gives
implementers the chance to choose a reasonable default.

My guess is that most people will use error returns, and that the constraint handler's
primary use will be to support quick-and-dirty code.

> As for your specific example, which I assume is strncpy_s,

Yes.

> the standard function which already solves this problem without the
> additonal confusing and error-prone size arguments is snprintf:
> snprintf(dest, destsize, "%s", source).

snprintf is botched, too.  If the source is as least destsize in length,
then dest is not \0-terminated.  This is possible to detect, but off-by-one errors
leading to vulnerabilities are extremely easy to do.

There are also a lot of annoyances with using snprintf this way.
Why do I have to interpret a format string just to copy source to dest?
Complete botch, complete fail.  String copying is an EXTREMELY common
operation, it should be easy to do.

> And the nonstandard but widely
> available function that's even more convenient is strlcpy.

True.   strlcpy should be in glibc, too.  Lots of people use them.

My theory is that annex K is at least a formal spec, so it has an even *stronger*
argument for inclusion.  But if people would rather start with strlcpy/strlcat, that's fine.

--- David A. Wheeler

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-12 23:15                 ` David A. Wheeler
@ 2014-08-12 23:48                   ` dalias
  2014-08-13 19:23                     ` David A. Wheeler
  0 siblings, 1 reply; 79+ messages in thread
From: dalias @ 2014-08-12 23:48 UTC (permalink / raw)
  To: David A. Wheeler; +Cc: libc-alpha

On Tue, Aug 12, 2014 at 07:15:10PM -0400, David A. Wheeler wrote:
> On Tue, 12 Aug 2014 18:08:11 -0400, Rich Felker <dalias@libc.org> wrote:
> > 1. Since the default constraint handler is implementation-defined,
> >    portable code cannot rely on the default handler to catch dangerous
> >    conditions, nor can it rely on the default handler to return an
> >    error status to the caller in a way that the caller can handle.
> 
> Portable application code can just set the handler, once.

Not necessarily. There could already be library code and multiple
threads running before entering main due to global ctors.

> Portable *library* code just needs to check for error returns; if they don't return,
> then the application author decided to do something else.  There's no state-saving necessary.

Such library code will not be safe in an application that didn't set a
handler, though. Consider the case where application A used library X,
which uses library Y as an implementation detail, and where library Y
uses annex K functions. Application A has no idea that annex K is in
use, and wants nothing to do with annex K, so it's not setting a
constraint handler. Library Y expects that the application using it
setup the constraint handler, but the application is not the "user" of
library Y; library X is the user. And library X can't setup the
handler because it's a library. The best library X could do is
document that it's using library Y, and that library Y requires a
proper annex K constraint handler to be setup, as part of its public
interface contract. But this is ugly because use of library Y should
be an implementation detail (it may even be optional at build time,
and may even be more indirect, via other libs) not something public.

> I think any libc implementation should by default just cause errors to be
> returned, since it has no way of knowing exactly what to do with the application instead.

This is certainly the proper behavior from a library-safety
standpoint, but it's also the least secure, since failure to check
return values could allow bugs to propagate. I suspect the historical
(Windows) usage of these interfaces mostly expects the constraint
handler to abort the program like an assertion failure.

To me, this is the worst-designed aspect of annex K: depending on how
you setup the runtime constraint handler, the functions have very
different usages:

1. If the constraint handler allows failure to be reported, their
   usage is much like strlcpy: they're bounded and valid programs can
   use them in ways that "would overflow" by relying on them to catch
   the overflow and convert it into an error case that can be handled.

2. If the constraint handler aborts, their usage is like an ugly,
   explicit version of _FORTIFY_SOURCE: they're assertion-checking
   functions that convert would-be UB into a predictable crash. Valid
   programs ensure that this never happens, but the check is there as
   an added hardening/safety measure.

> > As for your specific example, which I assume is strncpy_s,
> 
> Yes.
> 
> > the standard function which already solves this problem without the
> > additonal confusing and error-prone size arguments is snprintf:
> > snprintf(dest, destsize, "%s", source).
> 
> snprintf is botched, too.  If the source is as least destsize in length,
> then dest is not \0-terminated.  This is possible to detect, but off-by-one errors
> leading to vulnerabilities are extremely easy to do.

No, this is false. snprintf always null-terminates except when n is
zero, in which case it returns the length of the result without
writing anything.

> There are also a lot of annoyances with using snprintf this way.
> Why do I have to interpret a format string just to copy source to dest?
> Complete botch, complete fail.  String copying is an EXTREMELY common
> operation, it should be easy to do.

Straight string copying into a pre-existing buffer is usually a bug.
snprintf allows it as a degenerate special-case of the non-buggy,
highly-clean-and-efficient usage pattern of constructing a string in a
single opertion rather than via "Schlemiel the painter". In cases
where you just want a copy of an input string to keep past the
lifetime of (or at least the limits of your contract to access) the
original object, strdup is the right tool.

> > And the nonstandard but widely
> > available function that's even more convenient is strlcpy.
> 
> True.   strlcpy should be in glibc, too.  Lots of people use them.
> 
> My theory is that annex K is at least a formal spec, so it has an even *stronger*
> argument for inclusion.  But if people would rather start with strlcpy/strlcat, that's fine.

I'd much rather see strlcpy/strlcat added. Even though I consider them
a broken idiom, they're at least easy to use and provide a formulatic
fix for buggy legacy code, whereas the annex K functions are complex
to use (multiple, often redundant, size arguments, which can easily be
transposed by the programmer) and plagued by the constraint handler
issues.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-12 23:48                   ` dalias
@ 2014-08-13 19:23                     ` David A. Wheeler
  2014-08-13 19:44                       ` Adhemerval Zanella
                                         ` (2 more replies)
  0 siblings, 3 replies; 79+ messages in thread
From: David A. Wheeler @ 2014-08-13 19:23 UTC (permalink / raw)
  To: dalias; +Cc: libc-alpha

On Tue, 12 Aug 2014 19:48:17 -0400, dalias <dalias@libc.org> wrote:
> I'd much rather see strlcpy/strlcat added.

I'd be happy to have someone work to add strlcpy/strlcat, and
I'd be happy to focus on doing that *first*.
After all, strlcpy/strlcat are short, well-understood, and easy-to-use.
I see no need to put strlcpy/strlcat in a separate lib_s file, since they're short.
Annex K has other capabilities I'd like to see, and is a formal standard, but
staging work is fine.  I suspect that strlcpy/strlcat could help in implementing strcpy_s
(I'd have to check to make sure).

So... would a high-quality strlcpy/strlcat patch be (FINALLY!) accepted?

> > Portable application code can just set the handler, once.
> 
> Not necessarily. There could already be library code and multiple
> threads running before entering main due to global ctors.

Sure, but if the default is "no constraint handler set", then a constraint
handler won't be triggered by surprise.

> ... Library Y expects that the application using it
> setup the constraint handler, but the application is not the "user" of
> library Y; library X is the user.

Hold it right there!  A library that *requires* that a constraint handler be set is
already in a state of sin, because it's perfectly legal to set up the constraint
handler to return.  Indeed, I think that should be the default.  Libraries have to
assume that the constraint handler allows returns, by spec.

I do understand that you have concerns about the constraint handlers.  Got it.
Would it be better if there were GNU extensions for the Annex K functions
that guaranteed to *not* call the constraint handler ("NAME_s_e")?
The "_s" functions could then be implemented by calling these other functions.
I haven't looked into how practical that is, but it's probably do-able.
That would at least give you a way to guarantee avoiding the constraint
handler, if desired, and it would implemented the functions as defined in the standard.

--- David A. Wheeler

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-13 19:23                     ` David A. Wheeler
@ 2014-08-13 19:44                       ` Adhemerval Zanella
  2014-08-13 19:45                         ` Adhemerval Zanella
  2014-08-13 20:49                         ` Rich Felker
  2014-08-13 20:41                       ` dalias
  2014-08-13 20:55                       ` Joseph S. Myers
  2 siblings, 2 replies; 79+ messages in thread
From: Adhemerval Zanella @ 2014-08-13 19:44 UTC (permalink / raw)
  To: libc-alpha

On 13-08-2014 16:23, David A. Wheeler wrote:
> On Tue, 12 Aug 2014 19:48:17 -0400, dalias <dalias@libc.org> wrote:
>> I'd much rather see strlcpy/strlcat added.
> I'd be happy to have someone work to add strlcpy/strlcat, and
> I'd be happy to focus on doing that *first*.
> After all, strlcpy/strlcat are short, well-understood, and easy-to-use.
> I see no need to put strlcpy/strlcat in a separate lib_s file, since they're short.
> Annex K has other capabilities I'd like to see, and is a formal standard, but
> staging work is fine.  I suspect that strlcpy/strlcat could help in implementing strcpy_s
> (I'd have to check to make sure).
>
> So... would a high-quality strlcpy/strlcat patch be (FINALLY!) accepted?

From time to time it arises and there is small FAQ about it [1].  Bottom line: it is not
consensus that the strlcpy/strlcat functions provides any more 'safety'.  You can check
glibc alpha history and there are number of cases saying that it does not in fact, this is
just and example [2]. 

I personally see compiler driver directives (like -D_FORTIFY_SOURCE or ASAN), runtime
checks (like valgrind), or even higher level languages constructs (like std::string) to
provide safety in a much more clean way. But I no expert in security...

[1] https://sourceware.org/glibc/wiki/FAQ#Why_no_strlcpy_.2BAC8_strlcat.3F
[2] https://plus.google.com/u/0/+DavidHerrmann/posts/DGMdzUG3nyj


>
> Hold it right there!  A library that *requires* that a constraint handler be set is
> already in a state of sin, because it's perfectly legal to set up the constraint
> handler to return.  Indeed, I think that should be the default.  Libraries have to
> assume that the constraint handler allows returns, by spec.
>
> I do understand that you have concerns about the constraint handlers.  Got it.
> Would it be better if there were GNU extensions for the Annex K functions
> that guaranteed to *not* call the constraint handler ("NAME_s_e")?

I don't see it a better way to do this.  The initial aim is to provide an optional
C11 annex, now the aim is to provide extension (that would be probably exported)
that might create even more non-portability.


> The "_s" functions could then be implemented by calling these other functions.
> I haven't looked into how practical that is, but it's probably do-able.
> That would at least give you a way to guarantee avoiding the constraint
> handler, if desired, and it would implemented the functions as defined in the standard.
>
> --- David A. Wheeler
>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-13 19:44                       ` Adhemerval Zanella
@ 2014-08-13 19:45                         ` Adhemerval Zanella
  2014-08-13 20:49                         ` Rich Felker
  1 sibling, 0 replies; 79+ messages in thread
From: Adhemerval Zanella @ 2014-08-13 19:45 UTC (permalink / raw)
  To: libc-alpha

On 13-08-2014 16:43, Adhemerval Zanella wrote:
> On 13-08-2014 16:23, David A. Wheeler wrote:
>> On Tue, 12 Aug 2014 19:48:17 -0400, dalias <dalias@libc.org> wrote:
>>> I'd much rather see strlcpy/strlcat added.
>> I'd be happy to have someone work to add strlcpy/strlcat, and
>> I'd be happy to focus on doing that *first*.
>> After all, strlcpy/strlcat are short, well-understood, and easy-to-use.
>> I see no need to put strlcpy/strlcat in a separate lib_s file, since they're short.
>> Annex K has other capabilities I'd like to see, and is a formal standard, but
>> staging work is fine.  I suspect that strlcpy/strlcat could help in implementing strcpy_s
>> (I'd have to check to make sure).
>>
>> So... would a high-quality strlcpy/strlcat patch be (FINALLY!) accepted?
> From time to time it arises and there is small FAQ about it [1].  Bottom line: it is not
> consensus that the strlcpy/strlcat functions provides any more 'safety'.  You can check
> glibc alpha history and there are number of cases saying that it does not in fact, this is
> just and example [2]. 
>
> I personally see compiler driver directives (like -D_FORTIFY_SOURCE or ASAN), runtime
> checks (like valgrind), or even higher level languages constructs (like std::string) to
> provide safety in a much more clean way. But I no expert in security...
>
> [1] https://sourceware.org/glibc/wiki/FAQ#Why_no_strlcpy_.2BAC8_strlcat.3F
> [2] https://plus.google.com/u/0/+DavidHerrmann/posts/DGMdzUG3nyj

Some more interesting comments about [2]: https://plus.google.com/111049168280159033135/posts/1amLbuhWbh5

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-13 19:23                     ` David A. Wheeler
  2014-08-13 19:44                       ` Adhemerval Zanella
@ 2014-08-13 20:41                       ` dalias
  2014-08-13 20:55                       ` Joseph S. Myers
  2 siblings, 0 replies; 79+ messages in thread
From: dalias @ 2014-08-13 20:41 UTC (permalink / raw)
  To: David A. Wheeler; +Cc: libc-alpha

On Wed, Aug 13, 2014 at 03:23:29PM -0400, David A. Wheeler wrote:
> On Tue, 12 Aug 2014 19:48:17 -0400, dalias <dalias@libc.org> wrote:
> > I'd much rather see strlcpy/strlcat added.
> 
> I'd be happy to have someone work to add strlcpy/strlcat, and
> I'd be happy to focus on doing that *first*.
> After all, strlcpy/strlcat are short, well-understood, and easy-to-use.
> I see no need to put strlcpy/strlcat in a separate lib_s file, since they're short.
> Annex K has other capabilities I'd like to see, and is a formal standard, but
> staging work is fine.  I suspect that strlcpy/strlcat could help in implementing strcpy_s
> (I'd have to check to make sure).
> 
> So... would a high-quality strlcpy/strlcat patch be (FINALLY!) accepted?

I hope so, but I have no control over that other than speaking up in
favor of it.

> > > Portable application code can just set the handler, once.
> > 
> > Not necessarily. There could already be library code and multiple
> > threads running before entering main due to global ctors.
> 
> Sure, but if the default is "no constraint handler set", then a constraint
> handler won't be triggered by surprise.

Yes, but portable code can't assume that.

> > ... Library Y expects that the application using it
> > setup the constraint handler, but the application is not the "user" of
> > library Y; library X is the user.
> 
> Hold it right there!  A library that *requires* that a constraint handler be set is
> already in a state of sin, because it's perfectly legal to set up the constraint
> handler to return.

You missed the point: that's the exact constraint handler it would
_want_. But since the default is implementation-defined, it can't rely
on that being there by default. Thus, it requires an explicit
do-nothing constraint handler to be set.

> I do understand that you have concerns about the constraint handlers.  Got it.
> Would it be better if there were GNU extensions for the Annex K functions
> that guaranteed to *not* call the constraint handler ("NAME_s_e")?

No, this would be yet another layer of utter crap functions on top of
a layer of utter crap functions. There are many 3 or 4 "useful"
functions out of the whole annex K (the main one being qsort_s) and
the rest is just utterly redundant and not useful for the stated
purpose (of making software more secure). And for some of them,
implementing them imposes ugliness and potential bugs on other core
functions unless you want completely redundant implementations -- for
instance, the scanf core needs additional code to support the annex K
scanf-family functions, specifically the ugly double-argument forms
for %s, %c, and %[, and these are then incompatible with the POSIX
i18n %N$ argument reordering.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-13 19:44                       ` Adhemerval Zanella
  2014-08-13 19:45                         ` Adhemerval Zanella
@ 2014-08-13 20:49                         ` Rich Felker
  1 sibling, 0 replies; 79+ messages in thread
From: Rich Felker @ 2014-08-13 20:49 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

On Wed, Aug 13, 2014 at 04:43:58PM -0300, Adhemerval Zanella wrote:
> On 13-08-2014 16:23, David A. Wheeler wrote:
> > On Tue, 12 Aug 2014 19:48:17 -0400, dalias <dalias@libc.org> wrote:
> >> I'd much rather see strlcpy/strlcat added.
> > I'd be happy to have someone work to add strlcpy/strlcat, and
> > I'd be happy to focus on doing that *first*.
> > After all, strlcpy/strlcat are short, well-understood, and easy-to-use.
> > I see no need to put strlcpy/strlcat in a separate lib_s file, since they're short.
> > Annex K has other capabilities I'd like to see, and is a formal standard, but
> > staging work is fine.  I suspect that strlcpy/strlcat could help in implementing strcpy_s
> > (I'd have to check to make sure).
> >
> > So... would a high-quality strlcpy/strlcat patch be (FINALLY!) accepted?
> 
> From time to time it arises and there is small FAQ about it [1].  Bottom line: it is not
> consensus that the strlcpy/strlcat functions provides any more 'safety'.  You can check
> glibc alpha history and there are number of cases saying that it does not in fact, this is
> just and example [2]. 

I agree that these functions don't much to fix the underlying broken
and often insecure idioms of copying and concatenating strings, and I
don't like them and never recommend using them. But I still am in
favor of adding them to glibc, because lots of programs provide their
own when the system lacks them, and at least 75% of the reinvented
versions I've seen have critical bugs. So my main motivation for
wanting strlcpy/strlcat in glibc is just to prevent the buggy
application-provided ones from getting selected at configure time.

> I personally see compiler driver directives (like -D_FORTIFY_SOURCE or ASAN), runtime
> checks (like valgrind), or even higher level languages constructs (like std::string) to
> provide safety in a much more clean way. But I no expert in security...

-D_FORTIFY_SOURCE indeed tends to do a much better job, but it's a
"best effort" thing -- it's dependent on the compiler's analysis to
know object sizes, and if the compiler fails to do the necessary
analysis (e.g. at -O0) it fails open. Still I'd trust the compiler
more to get the actual object sizes right than I would trust the
programmer to get the args to the annex K functions right.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-13 19:23                     ` David A. Wheeler
  2014-08-13 19:44                       ` Adhemerval Zanella
  2014-08-13 20:41                       ` dalias
@ 2014-08-13 20:55                       ` Joseph S. Myers
  2014-08-13 21:25                         ` Paul Eggert
  2014-08-13 22:20                         ` Time to add strlcpy/strlcat FINALLY David A. Wheeler
  2 siblings, 2 replies; 79+ messages in thread
From: Joseph S. Myers @ 2014-08-13 20:55 UTC (permalink / raw)
  To: David A. Wheeler; +Cc: dalias, libc-alpha

On Wed, 13 Aug 2014, David A. Wheeler wrote:

> So... would a high-quality strlcpy/strlcat patch be (FINALLY!) accepted?

I think it would be reasonable to consider.  High-quality of course means 
including: copyright assignment; documentation; thorough testcases; 
appropriate feature test macro conditionals on the function declarations; 
appropriate symbol versions and corresponding updates to the ABI test 
baselines.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-13 20:55                       ` Joseph S. Myers
@ 2014-08-13 21:25                         ` Paul Eggert
  2014-08-13 21:35                           ` Rich Felker
  2014-08-13 22:20                         ` Time to add strlcpy/strlcat FINALLY David A. Wheeler
  1 sibling, 1 reply; 79+ messages in thread
From: Paul Eggert @ 2014-08-13 21:25 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: libc-alpha

Joseph S. Myers wrote:
> I think it would be reasonable to consider.  High-quality of course means

I don't see how we could have a high-quality implementation of strlcpy. 
  Either it'd be compatible with OpenBSD and thus suffer from DoS 
problems with long sources, or it'd be incompatible and then why bother?

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-13 21:25                         ` Paul Eggert
@ 2014-08-13 21:35                           ` Rich Felker
  2014-08-13 22:46                             ` Tolga Dalman
  0 siblings, 1 reply; 79+ messages in thread
From: Rich Felker @ 2014-08-13 21:35 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Joseph S. Myers, libc-alpha

On Wed, Aug 13, 2014 at 02:25:45PM -0700, Paul Eggert wrote:
> Joseph S. Myers wrote:
> >I think it would be reasonable to consider.  High-quality of course means
> 
> I don't see how we could have a high-quality implementation of
> strlcpy.  Either it'd be compatible with OpenBSD and thus suffer
> from DoS problems with long sources, or it'd be incompatible and
> then why bother?

The snprintf interface has the same "DoS problems" and that's not
reason to exclude it. It just limits the usefulness (or at least
convenience, since you can always validate inputs separately with
strnlen) in some situations. I agree totally that strlcpy is a bad
API, and I don't recommend using it, but since apps are using it, it's
much better to have a fully correct version in glibc than a buggy
application-provided fallback -- and the latter is really common.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Time to add strlcpy/strlcat FINALLY
  2014-08-13 20:55                       ` Joseph S. Myers
  2014-08-13 21:25                         ` Paul Eggert
@ 2014-08-13 22:20                         ` David A. Wheeler
  2014-08-14  1:09                           ` Paul Eggert
  1 sibling, 1 reply; 79+ messages in thread
From: David A. Wheeler @ 2014-08-13 22:20 UTC (permalink / raw)
  To: libc-alpha, dalias

I said:
> > So... would a high-quality strlcpy/strlcat patch be (FINALLY!) accepted?

On Wed, 13 Aug 2014 20:55:49 +0000, "Joseph S. Myers" <joseph@codesourcery.com> wrote:
> I think it would be reasonable to consider.

That's a start.

A common REPEATED complaint about glibc is that it fails to support strlcpy/strlcat and other functions like them to counter buffer overflows.  The lack of such functions has been *repeatedly* raised as a problem since 2000 and at least through 2012.  Currently people have to keep re-implementing them, but these tend to be slower since they're not optimized.

The general consensus of people who have *studied* how to develop secure software generally recommend strlcpy/strlcat, annex K, or either when dealing with statically-sized buffers in C.  A few examples:
* OpenBSD developers, esp. Todd C. Miller and Theo de Raadt, who CREATED strlcpy/strlcat and use it. Usenix paper here: http://www.openbsd.org/papers/strlcpy-paper.ps.  Some glibc people may not like Theo personally, but that is *not* a reason to reject their work.
* Robert Seacord (SEI), who wrote the book "Secure Coding in C and C++". e.g., http://www.informit.com/articles/article.aspx?p=2036582&seqNum=4
* Michael Howard (Microsoft), who co-authored the book "Writing Secure Code".  The list of "banned functions" (http://msdn.microsoft.com/en-us/library/bb288454.aspx) that are no longer permitted in Microsoft code include both strcpy and strncpy, because programs with many of them tend to have buffer overflows.  They recommend annex K, but also note strlcpy/strlcat as an honorable mention and useful alternatives.
* David A. Wheeler (me); you can blame me for "Secure Programming for Linux and Unix HOWTO" (http://www.dwheeler.com/secure-programs).
* John Viega and Matt Messier, who wrote "Secure Programming Cookbook for C and C++: Recipes for Cryptography ...", recommend strlcpy/strlcat as one possible approach (they even provide a sample implementation when your system doesn't include one).

It's true that "strlcpy/strlcat" don't prevent all buffer overruns.  So what?  No one said they did.  But they do reduce the likelihood.  If you use strlcat/strlcpy or strcpy_s/strcat_s incorrectly, the likely result is silent data truncation.  Not good, but it's way better than "grant total control over machine to attacker" that strcpy and strncpy provide.  People often fail to compute the length correctly in strncpy, which is one reason why it's not an effective countermeasure for buffer overflows.

Please don't tell me "just switch to dynamic buffers".  In many environments these are forbidden, and in many others they're not worth it or are not worth switching to.  We need mechanisms that are easily used, and can in some cases be easily switched to.

Many other measures don't really work.  Managed languages are great, but sometimes you need C or are modifying an existing program in C.  The -D_FORTIFY_SOURCE option can be helpful, but it only works when the compiler knows the size of the destination buffer (often it cannot figure that out, but you know it and can tell the compiler).  Mechanisms like electric fence or Address sanitizer (ASan) are nice, but their overheads make them impractical in many cases (if overhead didn't matter, why are you using C?).

--- David A. Wheeler

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-13 21:35                           ` Rich Felker
@ 2014-08-13 22:46                             ` Tolga Dalman
  2014-08-13 23:59                               ` Russ Allbery
  0 siblings, 1 reply; 79+ messages in thread
From: Tolga Dalman @ 2014-08-13 22:46 UTC (permalink / raw)
  To: libc-alpha

On 08/13/2014 11:35 PM, Rich Felker wrote:
> I agree totally that strlcpy is a bad
> API, and I don't recommend using it, but since apps are using it, it's
> much better to have a fully correct version in glibc than a buggy
> application-provided fallback -- and the latter is really common.

Your argumentation appears inconsistent to me. strlcpy is a bad API by design
and provides no benefits except convenience for some applications (how many use
these functions, btw ?). If an application chooses to use a non-portable
and probably buggy function, well, then the application should be fixed. Not
glibc.

This discussion comes up every few years with no new arguments. IMHO glibc has
done well not to encourage the use of strlcpy/strlcat over better alternatives.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-13 22:46                             ` Tolga Dalman
@ 2014-08-13 23:59                               ` Russ Allbery
  2014-08-14  0:55                                 ` Joseph S. Myers
                                                   ` (2 more replies)
  0 siblings, 3 replies; 79+ messages in thread
From: Russ Allbery @ 2014-08-13 23:59 UTC (permalink / raw)
  To: libc-alpha

Tolga Dalman <tolga.dalman@googlemail.com> writes:

> Your argumentation appears inconsistent to me. strlcpy is a bad API by
> design and provides no benefits except convenience for some applications
> (how many use these functions, btw ?).  If an application chooses to use
> a non-portable and probably buggy function, well, then the application
> should be fixed. Not glibc.

I've heard this statement a lot, and I certainly have a lot of sympathy
with the argument that one should use dynamic strings.  And I do use
asprintf by preference.  But this argument is applied so universally,
which leaves me wondering if I'm missing something.

At the risk of drifting off into the weeds (and if you feel like this is
inappropriate for this mailing list, please feel free to only respond in
private or tell me that in private), how would you recommend writing this
function without using strlcpy and strlcat?

/*
 * Given a vector and a separator string, allocate and build a new string
 * composed of all the strings in the vector separated from each other by the
 * seperator string.  Caller is responsible for freeing.
 */
char *
vector_join(const struct vector *vector, const char *seperator)
{
    char *string;
    size_t i, size, seplen;

    /* If the vector is empty, this is trivial. */
    assert(vector != NULL);
    if (vector->count == 0)
        return xstrdup("");

    /* Determine the total size of the resulting string. */
    seplen = strlen(seperator);
    for (size = 0, i = 0; i < vector->count; i++)
        size += strlen(vector->strings[i]);
    size += (vector->count - 1) * seplen + 1;

    /* Allocate the memory and build up the string using strlcat. */
    string = xmalloc(size);
    strlcpy(string, vector->strings[0], size);
    for (i = 1; i < vector->count; i++) {
        strlcat(string, seperator, size);
        strlcat(string, vector->strings[i], size);
    }
    return string;
}

If the answer is "use strcpy and strcat because this code is provably
correct with them," I guess I understand your position, but if I somehow
messed up the length calculation logic, I would prefer to truncate the
string than to create a buffer overflow.  The truncation might still be a
security vulnerability; the buffer overflow is almost guaranteed to be
one.  Maybe this is just personal preference.

If you have some method that doesn't involve strcpy and strcat either, and
that you think is as readable as the above, I would love to hear what it
is.  That's a sincere question: I would be happy to be taught how to be a
better C programmer.

-- 
Russ Allbery (eagle@eyrie.org)              <http://www.eyrie.org/~eagle/>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-13 23:59                               ` Russ Allbery
@ 2014-08-14  0:55                                 ` Joseph S. Myers
  2014-08-14  1:01                                   ` Russ Allbery
  2014-08-14  2:25                                 ` Rich Felker
  2014-08-14  6:08                                 ` Paul Eggert
  2 siblings, 1 reply; 79+ messages in thread
From: Joseph S. Myers @ 2014-08-14  0:55 UTC (permalink / raw)
  To: Russ Allbery; +Cc: libc-alpha

On Wed, 13 Aug 2014, Russ Allbery wrote:

> If the answer is "use strcpy and strcat because this code is provably
> correct with them," I guess I understand your position, but if I somehow

stpcpy is better than strcpy and strcat here; repeated strcat, or strlcat 
as in your code, has quadratic overhead repeatedly looking for the end of 
the string being built up, whereas stpcpy returns the end pointer for use 
in the next stpcpy call.  However:

> messed up the length calculation logic, I would prefer to truncate the

It's not obvious what a safer version of the stpcpy code would look like, 
and the length calculation logic fails to allow for the sum of the lengths 
exceeding SIZE_MAX (e.g. from a long separator, or from several pointers 
to the same long string in the vector) (so resulting in buffer overflows 
from any version that doesn't check each write fits in the remaining 
space).  (This size calculation as written could feasibly overflow even on 
a 64-bit system; many such overflows are only realistic on 32-bit 
systems.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-14  0:55                                 ` Joseph S. Myers
@ 2014-08-14  1:01                                   ` Russ Allbery
  0 siblings, 0 replies; 79+ messages in thread
From: Russ Allbery @ 2014-08-14  1:01 UTC (permalink / raw)
  To: libc-alpha

"Joseph S. Myers" <joseph@codesourcery.com> writes:
> On Wed, 13 Aug 2014, Russ Allbery wrote:

>> If the answer is "use strcpy and strcat because this code is provably
>> correct with them," I guess I understand your position, but if I somehow

> stpcpy is better than strcpy and strcat here; repeated strcat, or strlcat 
> as in your code, has quadratic overhead repeatedly looking for the end of 
> the string being built up, whereas stpcpy returns the end pointer for use 
> in the next stpcpy call.

Arguably, yes, but the code is not performance-critical, and I find the
stpcpy code (slightly) harder to read, possibly just because it's less
familiar.  No one is ever going to notice that quadratic factor here.
Either way, though, I would prefer the belt and suspenders of the
additional check on the length calculation logic.

>> messed up the length calculation logic, I would prefer to truncate the

> It's not obvious what a safer version of the stpcpy code would look
> like, and the length calculation logic fails to allow for the sum of the
> lengths exceeding SIZE_MAX (e.g. from a long separator, or from several
> pointers to the same long string in the vector) (so resulting in buffer
> overflows from any version that doesn't check each write fits in the
> remaining space).  (This size calculation as written could feasibly
> overflow even on a 64-bit system; many such overflows are only realistic
> on 32-bit systems.)

Yes, thank you, that's a good point.  I need to fix that.  But I think
that's orthogonal to the strlcpy vs. strcpy/stpcpy debate here.  One has
to do the size calculation in roughly the same way with either approach,
and the overflow check applies to either approach, I believe.

-- 
Russ Allbery (eagle@eyrie.org)              <http://www.eyrie.org/~eagle/>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Time to add strlcpy/strlcat FINALLY
  2014-08-13 22:20                         ` Time to add strlcpy/strlcat FINALLY David A. Wheeler
@ 2014-08-14  1:09                           ` Paul Eggert
  2014-08-14  2:34                             ` Rich Felker
  2014-08-14  3:02                             ` William Park
  0 siblings, 2 replies; 79+ messages in thread
From: Paul Eggert @ 2014-08-14  1:09 UTC (permalink / raw)
  To: dwheeler, libc-alpha

David A. Wheeler wrote:
> The general consensus of people who have *studied*  how to develop secure software

Long ago, when I looked into the matter by examining the first few 
instances of strlcpy in OpenSSH (this was soon after they rewrote it to 
use strlcpy), the use of strlcpy did not fix any bugs and may have 
introduced one due to silent truncation.  This convinced me that strlcpy 
was not a good way to go for its intended application area.  And I'll 
bet my admittedly-brief study examined more empirical evidence than the 
cavalcade of experts you cited.

The argument is not strlcpy versus nothing.  It's strlcpy versus 
reasonable alternatives that take the same or less work.  These days the 
alternatives are better, so why refight this old battle?

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-13 23:59                               ` Russ Allbery
  2014-08-14  0:55                                 ` Joseph S. Myers
@ 2014-08-14  2:25                                 ` Rich Felker
  2014-08-14  5:25                                   ` Russ Allbery
  2014-08-14  6:08                                 ` Paul Eggert
  2 siblings, 1 reply; 79+ messages in thread
From: Rich Felker @ 2014-08-14  2:25 UTC (permalink / raw)
  To: libc-alpha

On Wed, Aug 13, 2014 at 04:59:46PM -0700, Russ Allbery wrote:
> At the risk of drifting off into the weeds (and if you feel like this is
> inappropriate for this mailing list, please feel free to only respond in
> private or tell me that in private), how would you recommend writing this
> function without using strlcpy and strlcat?
> 
> /*
>  * Given a vector and a separator string, allocate and build a new string
>  * composed of all the strings in the vector separated from each other by the
>  * seperator string.  Caller is responsible for freeing.
>  */
> char *
> vector_join(const struct vector *vector, const char *seperator)
> {
>     char *string;
>     size_t i, size, seplen;
> 
>     /* If the vector is empty, this is trivial. */
>     assert(vector != NULL);
>     if (vector->count == 0)
>         return xstrdup("");
> 
>     /* Determine the total size of the resulting string. */
>     seplen = strlen(seperator);
>     for (size = 0, i = 0; i < vector->count; i++)
>         size += strlen(vector->strings[i]);
>     size += (vector->count - 1) * seplen + 1;
> 
>     /* Allocate the memory and build up the string using strlcat. */
>     string = xmalloc(size);
>     strlcpy(string, vector->strings[0], size);
>     for (i = 1; i < vector->count; i++) {
>         strlcat(string, seperator, size);
>         strlcat(string, vector->strings[i], size);
>     }
>     return string;
> }
> 
> If the answer is "use strcpy and strcat because this code is provably
> correct with them," I guess I understand your position, but if I somehow

The answer is "use memcpy". While statements of absolutes are
generally wrong, I'd venture very close to saying you should never use
any functions except snprintf and memcpy for constructing strings, at
least not unless you're an expert and willing to spend a lot of extra
time checking over your code to make sure you didn't make careless
errors.

There are a few places where strcpy is safe and efficient (e.g. when
you know a bound on the length of a string because it's a tail of a
string of known length, and thereby know that it fits in the
destination buffer, but you don't have the original pointer to get the
exact length and use memcpy) but these are definitely rare cases
rather than the norm.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Time to add strlcpy/strlcat FINALLY
  2014-08-14  1:09                           ` Paul Eggert
@ 2014-08-14  2:34                             ` Rich Felker
  2014-08-14  3:02                             ` William Park
  1 sibling, 0 replies; 79+ messages in thread
From: Rich Felker @ 2014-08-14  2:34 UTC (permalink / raw)
  To: libc-alpha

On Wed, Aug 13, 2014 at 06:09:22PM -0700, Paul Eggert wrote:
> David A. Wheeler wrote:
> >The general consensus of people who have *studied*  how to develop secure software
> 
> Long ago, when I looked into the matter by examining the first few
> instances of strlcpy in OpenSSH (this was soon after they rewrote it
> to use strlcpy), the use of strlcpy did not fix any bugs and may
> have introduced one due to silent truncation.  This convinced me
> that strlcpy was not a good way to go for its intended application
> area.  And I'll bet my admittedly-brief study examined more
> empirical evidence than the cavalcade of experts you cited.
> 
> The argument is not strlcpy versus nothing.  It's strlcpy versus
> reasonable alternatives that take the same or less work.  These days
> the alternatives are better, so why refight this old battle?

What are the better alternatives? I'm not asking just to be
argumentative, but to get an idea of what you actually consider to be
an alternative to str[l]cpy/str[l]cat, and whether it's idiomatically
the same but fixed to be safe, or whether it requires switching to a
completely different C programming idiom that might have various pros
and cons.

My personal view is that strings should be constructed with memcpy or
snprintf (or equivalently asprintf, if you can deal with allocation
failures gracefully) as the rule, with deviations from the rule only
when they're well-justified. But I've seen plenty of programmers who
think you should always use a high-level string API, even in C; this
is another solution, but it definitely has its cons too. The one thing
I like about strlcpy/strlcat, despite that _I_ dislike the whole
copy/cat idiom, is that they allow people who are stuck on this idiom
(and unwilling to change) to make their code somewhat better at
checking for security-critical errors.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Time to add strlcpy/strlcat FINALLY
  2014-08-14  1:09                           ` Paul Eggert
  2014-08-14  2:34                             ` Rich Felker
@ 2014-08-14  3:02                             ` William Park
  2014-08-14 13:01                               ` Mike Frysinger
  1 sibling, 1 reply; 79+ messages in thread
From: William Park @ 2014-08-14  3:02 UTC (permalink / raw)
  To: libc-alpha

On Wed, Aug 13, 2014 at 06:09:22PM -0700, Paul Eggert wrote:
> David A. Wheeler wrote:
> >The general consensus of people who have *studied*  how to develop
> >secure software
> 
> Long ago, when I looked into the matter by examining the first few instances
> of strlcpy in OpenSSH (this was soon after they rewrote it to use strlcpy),
> the use of strlcpy did not fix any bugs and may have introduced one due to
> silent truncation.  This convinced me that strlcpy was not a good way to go
> for its intended application area.  And I'll bet my admittedly-brief study
> examined more empirical evidence than the cavalcade of experts you cited.
> 
> The argument is not strlcpy versus nothing.  It's strlcpy versus reasonable
> alternatives that take the same or less work.  These days the alternatives
> are better, so why refight this old battle?

What are the alternatives, though?  Everyone criticizes 'strcpy' but
it's included in glibc.  But, 'strlcpy' is excluded on the ground that
it's not sufficiently better.  Seems to me, someone has sensitive toes.

If it's included, then can you change the names to 'snstrcpy' or
something?  For ordinary users, names are more confusing than the
debate.
-- 
William

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-14  2:25                                 ` Rich Felker
@ 2014-08-14  5:25                                   ` Russ Allbery
  2014-08-14  5:46                                     ` Rich Felker
  0 siblings, 1 reply; 79+ messages in thread
From: Russ Allbery @ 2014-08-14  5:25 UTC (permalink / raw)
  To: libc-alpha

Rich Felker <dalias@libc.org> writes:
> On Wed, Aug 13, 2014 at 04:59:46PM -0700, Russ Allbery wrote:

>> If the answer is "use strcpy and strcat because this code is provably
>> correct with them," I guess I understand your position, but if I
>> somehow

> The answer is "use memcpy".

memcpy would make that code harder to read and understand, or at least it
did when I tried to write it that way.  Instead of:

    for (i = 1; i < vector->count; i++) {
        strlcat(string, seperator, size);
        strlcat(string, vector->strings[i], size);
    }

you get code more like:

    for (i = 1; i < vector->count; i++) {
        complen = strlen(vector->strings[i]);
        assert(size - offset >= seplen + complen);
        memcpy(string + offset, separator, seplen);
        offset += seplen;
        memcpy(string + offset, vector->strings[i], complen);
        offset += complen;
    }
    string[offset] = '\0';

which I, at least, find harder to read and harder to be sure is correct.
(Admittedly, the error checking is arguably better since this asserts if
the string size calculation was wrong, but you can change the strlcat
version to do that as well if you want without much change.)  I'm not
seeing any advantages over strlcpy and strlcat except for performance,
which doesn't matter here.  So this is my dubious face, although maybe I'm
missing some good way to express this algorithm using memcpy that isn't
painful to read and doesn't require checking even more pointer math.

In general, I try to write C code that does not do lots of pointer math,
since I've found that most memory overrun bugs in my own code happen
because of pointer math that I got wrong.

So far, I'm afraid y'all are reinforcing my opinion that strlcat and
strlcpy are a fairly good API for writing code that constructs strings
dynamically from a number of elements known only at runtime.  asprintf is
far better in places where it can be used easily (and I use it by
preference), but that sort of string construction is not one of them.
Well, at least unless you're willing to keep allocating more memory for
each part of the string assembly and recopying the data, and while
performance doesn't matter for this, I'm not sure that extends to doing a
malloc and free for each addition of another component to the string.

-- 
Russ Allbery (eagle@eyrie.org)              <http://www.eyrie.org/~eagle/>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-14  5:25                                   ` Russ Allbery
@ 2014-08-14  5:46                                     ` Rich Felker
  2014-08-14  6:15                                       ` Russ Allbery
  0 siblings, 1 reply; 79+ messages in thread
From: Rich Felker @ 2014-08-14  5:46 UTC (permalink / raw)
  To: Russ Allbery; +Cc: libc-alpha

On Wed, Aug 13, 2014 at 10:25:18PM -0700, Russ Allbery wrote:
> Rich Felker <dalias@libc.org> writes:
> > On Wed, Aug 13, 2014 at 04:59:46PM -0700, Russ Allbery wrote:
> 
> >> If the answer is "use strcpy and strcat because this code is provably
> >> correct with them," I guess I understand your position, but if I
> >> somehow
> 
> > The answer is "use memcpy".
> 
> memcpy would make that code harder to read and understand, or at least it
> did when I tried to write it that way.  Instead of:

Indeed, I misread. Since you're already computing the lengths to
allocate the right storage, but don't have storage to keep the
computed lengths, it's inefficient and ugly to compute them again.
This is one case where GNY stpcpy, or using snprintf as a substitute
for it, is probably the right solution.

Certainly strlcpy/strlcat are not needed here since the length is
known to be correct.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-13 23:59                               ` Russ Allbery
  2014-08-14  0:55                                 ` Joseph S. Myers
  2014-08-14  2:25                                 ` Rich Felker
@ 2014-08-14  6:08                                 ` Paul Eggert
  2014-08-15 14:25                                   ` David A. Wheeler
  2 siblings, 1 reply; 79+ messages in thread
From: Paul Eggert @ 2014-08-14  6:08 UTC (permalink / raw)
  To: Russ Allbery, libc-alpha

Russ Allbery wrote:
> how would you recommend writing this
> function without using strlcpy and strlcat?
>
> char *
> vector_join(const struct vector *vector, const char *seperator)
> {
>      char *string;
>      size_t i, size, seplen;
>
>      /* If the vector is empty, this is trivial. */
>      assert(vector != NULL);
>      if (vector->count == 0)
>          return xstrdup("");
>
>      /* Determine the total size of the resulting string. */
>      seplen = strlen(seperator);
>      for (size = 0, i = 0; i < vector->count; i++)
>          size += strlen(vector->strings[i]);
>      size += (vector->count - 1) * seplen + 1;
>
>      /* Allocate the memory and build up the string using strlcat. */
>      string = xmalloc(size);
>      strlcpy(string, vector->strings[0], size);
>      for (i = 1; i < vector->count; i++) {
>          strlcat(string, seperator, size);
>          strlcat(string, vector->strings[i], size);
>      }
>      return string;
> }

The following is certainly shorter and to my eyes considerably easier to 
follow, which appears to be the goal here (not efficiency, obviously, or 
we wouldn't be talking about strlcat).

  char *
  vector_join(const struct vector *vector, const char *sep)
  {
    char *string = xstrdup("");
    for (size_t i = 0; i < vector->count; i++) {
      char *t = xasprintf("%s%s%s", string, i ? sep : "", 
vector->strings[i]);
      free (string);
      string = t;
    }
    return string;
  }

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-14  5:46                                     ` Rich Felker
@ 2014-08-14  6:15                                       ` Russ Allbery
  2014-08-14  9:55                                         ` Florian Weimer
  2014-08-14 15:20                                         ` Paul Eggert
  0 siblings, 2 replies; 79+ messages in thread
From: Russ Allbery @ 2014-08-14  6:15 UTC (permalink / raw)
  To: libc-alpha

Rich Felker <dalias@libc.org> writes:

> Indeed, I misread. Since you're already computing the lengths to
> allocate the right storage, but don't have storage to keep the computed
> lengths, it's inefficient and ugly to compute them again.  This is one
> case where GNY stpcpy, or using snprintf as a substitute for it, is
> probably the right solution.

> Certainly strlcpy/strlcat are not needed here since the length is known
> to be correct.

The second part is what I've often suspected is behind the resistence to
strlcpy and strlcat.  Folks making that argument are confident enough in
their ability to write C code and length calculation that they don't feel
the need for any further safety measure.

Based on your contributions here, you're clearly a more expert C
programmer than I am, and I believe this probably works well for you.  I
do not have the same level of confidence in my own ability.  :)  I would
much prefer to add the additional check and take a small performance hit
for the sake of a bit more robustness in the face of my own mistakes.

Here's the loop with strlcpy/strlcat with assert checking, for comparing
apples to apples:

    strlcpy(string, vector->strings[0], size);
    for (i = 1; i < vector->count; i++) {
        nbytes = strlcat(string, seperator, size);
        assert(nbytes < size);
        nbytes = strlcat(string, vector->strings[i], size);
        assert(nbytes < size);
    }

(This can obviously be simplified if you have a version of assert that
always runs the code inside the assert.)

The snprintf code that I came up with is about as bad as the memcpy code:

    offset = snprintf(string, size, "%s", vector->strings[0]);
    for (i = 1; i < vector->count; i++) {
        nbytes = snprintf(string + offset, size - offset, "%s", separator);
        assert(nbytes < size - offset);
        offset += nbytes;
        nbytes = snprintf(string + offset, size - offset, "%s",
                          vector->strings[1]);
        assert(nbytes < size - offset);
        offset += nbytes;
    }

The stpcpy code I came up with is:

    end = stpcpy(string, vector->strings[0]);
    for (i = 1; i < vector->count; i++) {
        end = stpcpy(end, separator);
        end = stpcpy(end, vector->strings[1]);
    }

This is at least comparable in readability to the assert-less
strlcpy/strlcat original, and more readable than the version with asserts,
but has no additional safety check.

So... if you're an expert C programmer who is confident in your ability to
write the size calculation and always get it right, stpcpy is at least
arguably the most readable.  But for the rest of us, I'm still quite
unconvinced by the arguments against strlcpy/strlcat.

I should stress that this is real code.  I've been following this list for
years, and have a lot of respect for the people who post here.  So, when
people said repeatedly that strlcpy and strlcat are unnecessary and
shouldn't be used, I went to some effort to find other ways of writing all
the code that I had that used them.  And I was largely successful;
asprintf in particular is a better solution for most places I was using
them.  But I was left with a small set of cases of dynamic string
creation, like this one, and I could not convince myself that any of the
approaches other than strlcpy/strlcat were better.  And that's still where
I am.

I would really like glibc to include strlcpy and strlcat so that I can
stop including my own versions for use in this specific situation.  Or,
alternately, for someone to explain to me why I'm wrong in the form of
more readable and maintainable code that doesn't use strlcpy/strlcat.

-- 
Russ Allbery (eagle@eyrie.org)              <http://www.eyrie.org/~eagle/>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-14  6:15                                       ` Russ Allbery
@ 2014-08-14  9:55                                         ` Florian Weimer
  2014-08-14 10:02                                           ` Andreas Schwab
  2014-08-14 15:20                                         ` Paul Eggert
  1 sibling, 1 reply; 79+ messages in thread
From: Florian Weimer @ 2014-08-14  9:55 UTC (permalink / raw)
  To: libc-alpha

On 08/14/2014 08:15 AM, Russ Allbery wrote:

> The second part is what I've often suspected is behind the resistence to
> strlcpy and strlcat.  Folks making that argument are confident enough in
> their ability to write C code and length calculation that they don't feel
> the need for any further safety measure.

Here's a security bug which resulted from the incorrect use of strlcpy:

   <http://www.samba.org/samba/security/CVE-2014-3560>
   <https://git.samba.org/?p=samba.git;a=commitdiff;h=e6a848630d>

If glibc had provided a fortified version of strlcpy, Samba had used it, 
and nmbd had been compiled with -O3, this we would have been able to 
rule out code execution completely (but the crash would have remained, 
of course).

Instead, like many other projects, Samba rolls their own version of 
strlcpy, which doesn't know about __builtin_object_size and other GNU 
extensions.

-- 
Florian Weimer / Red Hat Product Security

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-14  9:55                                         ` Florian Weimer
@ 2014-08-14 10:02                                           ` Andreas Schwab
  2014-08-14 10:06                                             ` Florian Weimer
  0 siblings, 1 reply; 79+ messages in thread
From: Andreas Schwab @ 2014-08-14 10:02 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

Florian Weimer <fweimer@redhat.com> writes:

> Here's a security bug which resulted from the incorrect use of strlcpy:
>
>   <http://www.samba.org/samba/security/CVE-2014-3560>
>   <https://git.samba.org/?p=samba.git;a=commitdiff;h=e6a848630d>

This only proves that strlcpy isn't any better at preventing security
bugs.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-14 10:02                                           ` Andreas Schwab
@ 2014-08-14 10:06                                             ` Florian Weimer
  2014-08-14 10:13                                               ` Andreas Schwab
  0 siblings, 1 reply; 79+ messages in thread
From: Florian Weimer @ 2014-08-14 10:06 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: libc-alpha

On 08/14/2014 12:02 PM, Andreas Schwab wrote:
> Florian Weimer <fweimer@redhat.com> writes:
>
>> Here's a security bug which resulted from the incorrect use of strlcpy:
>>
>>    <http://www.samba.org/samba/security/CVE-2014-3560>
>>    <https://git.samba.org/?p=samba.git;a=commitdiff;h=e6a848630d>
>
> This only proves that strlcpy isn't any better at preventing security
> bugs.

It also shows that there is a real cost to not providing strlcpy in glibc.

-- 
Florian Weimer / Red Hat Product Security

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-14 10:06                                             ` Florian Weimer
@ 2014-08-14 10:13                                               ` Andreas Schwab
  2014-08-14 16:26                                                 ` Rich Felker
  0 siblings, 1 reply; 79+ messages in thread
From: Andreas Schwab @ 2014-08-14 10:13 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

Florian Weimer <fweimer@redhat.com> writes:

> On 08/14/2014 12:02 PM, Andreas Schwab wrote:
>> Florian Weimer <fweimer@redhat.com> writes:
>>
>>> Here's a security bug which resulted from the incorrect use of strlcpy:
>>>
>>>    <http://www.samba.org/samba/security/CVE-2014-3560>
>>>    <https://git.samba.org/?p=samba.git;a=commitdiff;h=e6a848630d>
>>
>> This only proves that strlcpy isn't any better at preventing security
>> bugs.
>
> It also shows that there is a real cost to not providing strlcpy in glibc.

No, you got it backwards.  Had samba used the standard string functions
it would have been "protected" by fortification.  Of course,
fortification is just a workaround for sloppy programming anyway.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Time to add strlcpy/strlcat FINALLY
  2014-08-14  3:02                             ` William Park
@ 2014-08-14 13:01                               ` Mike Frysinger
  2014-08-15 10:37                                 ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 79+ messages in thread
From: Mike Frysinger @ 2014-08-14 13:01 UTC (permalink / raw)
  To: libc-alpha; +Cc: William Park

[-- Attachment #1: Type: text/plain, Size: 421 bytes --]

On Wed 13 Aug 2014 23:02:05 William Park wrote:
> What are the alternatives, though?  Everyone criticizes 'strcpy' but
> it's included in glibc.  But, 'strlcpy' is excluded on the ground that
> it's not sufficiently better.  Seems to me, someone has sensitive toes.

err, no, your example makes no sense.  one of these things is in the standards 
documents while the other is not.  get strlcpy into POSIX :P.
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-14  6:15                                       ` Russ Allbery
  2014-08-14  9:55                                         ` Florian Weimer
@ 2014-08-14 15:20                                         ` Paul Eggert
  2014-08-14 17:20                                           ` Russ Allbery
  2014-08-14 17:46                                           ` Rich Felker
  1 sibling, 2 replies; 79+ messages in thread
From: Paul Eggert @ 2014-08-14 15:20 UTC (permalink / raw)
  To: Russ Allbery, libc-alpha

Russ Allbery wrote:
> I would really like ... for someone to explain to me why I'm wrong
> in the form of more readable and maintainable code
>that doesn't use strlcpy/strlcat.

It seems that our emails crossed; please take a look at
https://sourceware.org/ml/libc-alpha/2014-08/msg00226.html

Here's another (untested) way to skin the cat, also considerably nicer 
than the strlcpy+strlcat version:

  char *
  vector_join(const struct vector *vector, const char *sep)
  {
    char *string;
    size_t size;
    FILE *f = xopen_memstream(&string, &size);
    for (size_t i = 0; i < vector->count; i++)
      xfprintf(f, "%s%s", i ? sep : "", vector->strings[i]);
    xfclose(f);
    return string;
  }

This may help to explain why POSIX didn't standardize strlcpy.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-14 10:13                                               ` Andreas Schwab
@ 2014-08-14 16:26                                                 ` Rich Felker
  2014-08-14 16:53                                                   ` Andreas Schwab
  0 siblings, 1 reply; 79+ messages in thread
From: Rich Felker @ 2014-08-14 16:26 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Florian Weimer, libc-alpha

On Thu, Aug 14, 2014 at 12:12:59PM +0200, Andreas Schwab wrote:
> Florian Weimer <fweimer@redhat.com> writes:
> 
> > On 08/14/2014 12:02 PM, Andreas Schwab wrote:
> >> Florian Weimer <fweimer@redhat.com> writes:
> >>
> >>> Here's a security bug which resulted from the incorrect use of strlcpy:
> >>>
> >>>    <http://www.samba.org/samba/security/CVE-2014-3560>
> >>>    <https://git.samba.org/?p=samba.git;a=commitdiff;h=e6a848630d>
> >>
> >> This only proves that strlcpy isn't any better at preventing security
> >> bugs.
> >
> > It also shows that there is a real cost to not providing strlcpy in glibc.
> 
> No, you got it backwards.  Had samba used the standard string functions
> it would have been "protected" by fortification.  Of course,
> fortification is just a workaround for sloppy programming anyway.

Except that we don't have any control over what Samba does, much less
any control over what every single broken app out there does. Unless
you want to go audit them all, file bug reports, and work through the
fights with their maintainers to get them to fix things, the practical
way to improve security is to provide a correct, fortify-compatible
strlcpy/strlcat in glibc so that these bugs can be caught
automatically. That's something the glibc team _can_ actually do.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-14 16:26                                                 ` Rich Felker
@ 2014-08-14 16:53                                                   ` Andreas Schwab
  2014-08-14 17:04                                                     ` Rich Felker
  0 siblings, 1 reply; 79+ messages in thread
From: Andreas Schwab @ 2014-08-14 16:53 UTC (permalink / raw)
  To: Rich Felker; +Cc: Florian Weimer, libc-alpha

Rich Felker <dalias@libc.org> writes:

> On Thu, Aug 14, 2014 at 12:12:59PM +0200, Andreas Schwab wrote:
>> Florian Weimer <fweimer@redhat.com> writes:
>> 
>> > On 08/14/2014 12:02 PM, Andreas Schwab wrote:
>> >> Florian Weimer <fweimer@redhat.com> writes:
>> >>
>> >>> Here's a security bug which resulted from the incorrect use of strlcpy:
>> >>>
>> >>>    <http://www.samba.org/samba/security/CVE-2014-3560>
>> >>>    <https://git.samba.org/?p=samba.git;a=commitdiff;h=e6a848630d>
>> >>
>> >> This only proves that strlcpy isn't any better at preventing security
>> >> bugs.
>> >
>> > It also shows that there is a real cost to not providing strlcpy in glibc.
>> 
>> No, you got it backwards.  Had samba used the standard string functions
>> it would have been "protected" by fortification.  Of course,
>> fortification is just a workaround for sloppy programming anyway.
>
> Except that we don't have any control over what Samba does, much less
> any control over what every single broken app out there does.

There's a lot of ways to screw things up.

> Unless you want to go audit them all, file bug reports, and work
> through the fights with their maintainers to get them to fix things,
> the practical way to improve security is to provide a correct,
> fortify-compatible strlcpy/strlcat in glibc so that these bugs can be
> caught automatically. That's something the glibc team _can_ actually
> do.

What is really neded is a high-performance, high-level string library.
Something of the expressiveness of std::string.  Then kludges like
strlcpy wouldn't be needed.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-14 16:53                                                   ` Andreas Schwab
@ 2014-08-14 17:04                                                     ` Rich Felker
  2014-08-18  7:31                                                       ` Andreas Schwab
  0 siblings, 1 reply; 79+ messages in thread
From: Rich Felker @ 2014-08-14 17:04 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Florian Weimer, libc-alpha

On Thu, Aug 14, 2014 at 06:53:51PM +0200, Andreas Schwab wrote:
> Rich Felker <dalias@libc.org> writes:
> 
> > On Thu, Aug 14, 2014 at 12:12:59PM +0200, Andreas Schwab wrote:
> >> Florian Weimer <fweimer@redhat.com> writes:
> >> 
> >> > On 08/14/2014 12:02 PM, Andreas Schwab wrote:
> >> >> Florian Weimer <fweimer@redhat.com> writes:
> >> >>
> >> >>> Here's a security bug which resulted from the incorrect use of strlcpy:
> >> >>>
> >> >>>    <http://www.samba.org/samba/security/CVE-2014-3560>
> >> >>>    <https://git.samba.org/?p=samba.git;a=commitdiff;h=e6a848630d>
> >> >>
> >> >> This only proves that strlcpy isn't any better at preventing security
> >> >> bugs.
> >> >
> >> > It also shows that there is a real cost to not providing strlcpy in glibc.
> >> 
> >> No, you got it backwards.  Had samba used the standard string functions
> >> it would have been "protected" by fortification.  Of course,
> >> fortification is just a workaround for sloppy programming anyway.
> >
> > Except that we don't have any control over what Samba does, much less
> > any control over what every single broken app out there does.
> 
> There's a lot of ways to screw things up.
> 
> > Unless you want to go audit them all, file bug reports, and work
> > through the fights with their maintainers to get them to fix things,
> > the practical way to improve security is to provide a correct,
> > fortify-compatible strlcpy/strlcat in glibc so that these bugs can be
> > caught automatically. That's something the glibc team _can_ actually
> > do.
> 
> What is really neded is a high-performance, high-level string library.
> Something of the expressiveness of std::string.  Then kludges like
> strlcpy wouldn't be needed.

I disagree strongly here. If you want that, you should be using a
language where it's idiomatic. Doing it in C just gets you the worst
of both worlds: (1) C's lack of syntax to express it concisely and
lack of exceptions to handle allocation failures, which are a horrible
pain to handle when they can happen at each operation rather than at
isolated resource-allocation points, and (2) the HLL style loss of
fail-safety (all operations can fail because they do allocation under
the hood), massive overhead in memory usage and memory fragmentation,
load on malloc/free, etc.

The right way to solve this problem is either not to use C, or to use
the tools you have in C (mainly, snprintf) to make string handling
safe while not sacrificing the only values C has: efficiency and
fail-safety in the form of the ability to control where and when you
do allocation and write functions that don't depend on allocation.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-14 15:20                                         ` Paul Eggert
@ 2014-08-14 17:20                                           ` Russ Allbery
  2014-08-14 17:46                                           ` Rich Felker
  1 sibling, 0 replies; 79+ messages in thread
From: Russ Allbery @ 2014-08-14 17:20 UTC (permalink / raw)
  To: Paul Eggert; +Cc: libc-alpha

Paul Eggert <eggert@cs.ucla.edu> writes:

> It seems that our emails crossed; please take a look at
> https://sourceware.org/ml/libc-alpha/2014-08/msg00226.html

Indeed, they did.

Hm.  For some reason, a malloc and free for every component felt really
heavy to me, even for code that wasn't performance-critical.  On further
inspection of that feeling, it's probably just premature optimization and
something I should just get over.

Thank you!

> Here's another (untested) way to skin the cat, also considerably nicer
> than the strlcpy+strlcat version:

>  char *
>  vector_join(const struct vector *vector, const char *sep)
>  {
>    char *string;
>    size_t size;
>    FILE *f = xopen_memstream(&string, &size);
>    for (size_t i = 0; i < vector->count; i++)
>      xfprintf(f, "%s%s", i ? sep : "", vector->strings[i]);
>    xfclose(f);
>    return string;
>  }

> This may help to explain why POSIX didn't standardize strlcpy.

Ah, indeed.  I had completely missed the existence of open_memstream.
Thank you for this as well!

-- 
Russ Allbery (eagle@eyrie.org)              <http://www.eyrie.org/~eagle/>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-14 15:20                                         ` Paul Eggert
  2014-08-14 17:20                                           ` Russ Allbery
@ 2014-08-14 17:46                                           ` Rich Felker
  2014-08-15  7:51                                             ` Florian Weimer
  1 sibling, 1 reply; 79+ messages in thread
From: Rich Felker @ 2014-08-14 17:46 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Russ Allbery, libc-alpha

On Thu, Aug 14, 2014 at 08:19:46AM -0700, Paul Eggert wrote:
> Russ Allbery wrote:
> >I would really like ... for someone to explain to me why I'm wrong
> >in the form of more readable and maintainable code
> >that doesn't use strlcpy/strlcat.
> 
> It seems that our emails crossed; please take a look at
> https://sourceware.org/ml/libc-alpha/2014-08/msg00226.html
> 
> Here's another (untested) way to skin the cat, also considerably
> nicer than the strlcpy+strlcat version:
> 
>  char *
>  vector_join(const struct vector *vector, const char *sep)
>  {
>    char *string;
>    size_t size;
>    FILE *f = xopen_memstream(&string, &size);
>    for (size_t i = 0; i < vector->count; i++)
>      xfprintf(f, "%s%s", i ? sep : "", vector->strings[i]);
>    xfclose(f);
>    return string;
>  }
> 
> This may help to explain why POSIX didn't standardize strlcpy.

Actually open_memstream (without the x, uhg) is a really nice solution
to the problem, because it allows you to isolate your checks for
failure to two places:

1. Failure of the initial open_memstream() call
2. A single ferror() check before fclose().

Since the error flag is sticky and it's not an error to attempt to
write to a stream that already has errors, there's no need to check
individual write operations.

I think the synchronization overhead is a bit heavier than desired
(POSIX makes it difficult or impossible to implement open_memstream
without putting the memstream in the global open file list) but this
might be a small price to pay for the convenience and avoidance of the
major cons (like constantly having to check for errors, or worse,
built-in abort-on-fail) of other 'string library' solutions.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-14 17:46                                           ` Rich Felker
@ 2014-08-15  7:51                                             ` Florian Weimer
  0 siblings, 0 replies; 79+ messages in thread
From: Florian Weimer @ 2014-08-15  7:51 UTC (permalink / raw)
  To: Rich Felker, Paul Eggert; +Cc: Russ Allbery, libc-alpha

On 08/14/2014 07:46 PM, Rich Felker wrote:
> I think the synchronization overhead is a bit heavier than desired
> (POSIX makes it difficult or impossible to implement open_memstream
> without putting the memstream in the global open file list)

This part could be addressed partially by having per-CPU open file 
lists.  You still need synchronization, but it's less to contend with 
other threads using the same functionality anymore.

-- 
Florian Weimer / Red Hat Product Security

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Time to add strlcpy/strlcat FINALLY
  2014-08-14 13:01                               ` Mike Frysinger
@ 2014-08-15 10:37                                 ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 79+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-08-15 10:37 UTC (permalink / raw)
  To: Mike Frysinger, libc-alpha; +Cc: mtk.manpages, William Park

On 08/14/2014 03:01 PM, Mike Frysinger wrote:
> On Wed 13 Aug 2014 23:02:05 William Park wrote:
>> What are the alternatives, though?  Everyone criticizes 'strcpy' but
>> it's included in glibc.  But, 'strlcpy' is excluded on the ground that
>> it's not sufficiently better.  Seems to me, someone has sensitive toes.
> 
> err, no, your example makes no sense.  one of these things is in the standards 
> documents while the other is not.  get strlcpy into POSIX :P.

These functions were apparently explicitly rejected by POSIX.

For those new to this well worn debate, my article here provides some
background: https://lwn.net/Articles/507319/

Cheers,

Michael


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-14  6:08                                 ` Paul Eggert
@ 2014-08-15 14:25                                   ` David A. Wheeler
  2014-08-15 15:36                                     ` Paul Eggert
  0 siblings, 1 reply; 79+ messages in thread
From: David A. Wheeler @ 2014-08-15 14:25 UTC (permalink / raw)
  To: eggert; +Cc: libc-alpha, eagle

On Wed, 13 Aug 2014 23:08:42 -0700, Paul Eggert <eggert@cs.ucla.edu> wrote:
> The following is certainly shorter and to my eyes considerably easier to 
> follow, which appears to be the goal here (not efficiency, obviously, or 
> we wouldn't be talking about strlcat).
> 
>   char *
>   vector_join(const struct vector *vector, const char *sep)
>   {
>     char *string = xstrdup("");
>     for (size_t i = 0; i < vector->count; i++) {
>       char *t = xasprintf("%s%s%s", string, i ? sep : "", 
> vector->strings[i]);
>       free (string);
>       string = t;
>     }
>     return string;
>   }

But xasprintf dynamically allocates memory, and thus is not a substitute for bounds-checking
functions like strlcpy, strcpy_s, etc. E.G., if you are *handed* a block of memory to use (with its length),
or if dynamic allocation is forbidden, or if you're trying to change cost-effectively change
1 million lines of C code to reduce the risks of buffer overflow.  Truncation is undesirable, but is a
WAY better result of usage mistakes compared to "1 million devices taken over by attackers" - the current model.

Please don't tell me "just recalculate the bounds correctly and constantly", because that's a
recipe for off-by-one errors.  Don't tell me "write it perfectly", that's been failing for 4 decades+ now.

This discussion about the need for bounds-checking routines keeps coming up, over and over again,
because it's STILL a serious problem.

The discussion won't go away until the problem is fixed.

--- David A. Wheeler

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-15 14:25                                   ` David A. Wheeler
@ 2014-08-15 15:36                                     ` Paul Eggert
  2014-08-15 16:14                                       ` David A. Wheeler
  0 siblings, 1 reply; 79+ messages in thread
From: Paul Eggert @ 2014-08-15 15:36 UTC (permalink / raw)
  To: dwheeler; +Cc: libc-alpha, eagle

David A. Wheeler wrote:
> if you are *handed* a block of memory to use (with its length),
> or if dynamic allocation is forbidden

That example reponded to Russ Allbery, who asked for dynamic allocation. 
  Other techniques should be used for fixed-size buffers.  For reasons 
that should be obvious I don't recommend the technique of silent 
truncation; but applications that require it can use snprintf, which is 
more portable and useful than strlcpy anyway.

> Don't tell me "write it perfectly", that's been failing for 4 decades+ now.

That's a false choice.  It's not strlcpy versus "write it perfectly". 
It's strlcpy versus other available approaches.  And the little 
empirical evidence that we have suggests that strlcpy is worse.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-15 15:36                                     ` Paul Eggert
@ 2014-08-15 16:14                                       ` David A. Wheeler
  2014-08-15 16:39                                         ` Rich Felker
  2014-08-15 22:04                                         ` Paul Eggert
  0 siblings, 2 replies; 79+ messages in thread
From: David A. Wheeler @ 2014-08-15 16:14 UTC (permalink / raw)
  To: libc-alpha, eagle

On Fri, 15 Aug 2014 08:35:52 -0700, Paul Eggert <eggert@cs.ucla.edu> wrote:
>   Other techniques should be used for fixed-size buffers.  For reasons 
> that should be obvious I don't recommend the technique of silent 
> truncation; but applications that require it can use snprintf, which is 
> more portable and useful than strlcpy anyway.

snprintf is overcomplicated for simple string copying (a common operation)
and does not directly support concatenation.
The presence of a formatting string when it's unnecessary is a serious problem;
a common mistake is to allow attackers to control a formatting string.
No formatting string means no opportunity for that mistake.
There's also the overhead (admittedly slight) of passing and processing the formatting string.

Many people who are trying to write secure software in C (such as the OpenBSD and Microsoft folks)
are increasingly trying to *stop* the use of traditional functions like strcpy and strncpy,
replacing them with functions that easily prevent out-of-bounds access when they
deal with statically-allocated buffers.
Their solutions are captured in strlcpy/strlcat and annex K, respectively.
The LibreSSL folks, like many security-minded folks, have to re-add strlcpy/strlcat:
  https://github.com/GostCrypt/libressl-portable/blob/master/configure.ac.tpl
But hey, the LibreSSL folks created strlcpy/strlcat, so that's unique to them, right?
Nope, OpenSSL defines "OPENSSL_strlcpy" (for example) in the first place
because they have to work around libraries without them.

Could you please post your empirical data?  Specific lines with context, etc.?
It's hard to seriously discuss pros & cons of such statements without them being posted for discussion.

--- David A. Wheeler

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-15 16:14                                       ` David A. Wheeler
@ 2014-08-15 16:39                                         ` Rich Felker
  2014-08-15 22:01                                           ` David A. Wheeler
  2014-08-15 22:04                                         ` Paul Eggert
  1 sibling, 1 reply; 79+ messages in thread
From: Rich Felker @ 2014-08-15 16:39 UTC (permalink / raw)
  To: David A. Wheeler; +Cc: libc-alpha, eagle

On Fri, Aug 15, 2014 at 12:14:21PM -0400, David A. Wheeler wrote:
> On Fri, 15 Aug 2014 08:35:52 -0700, Paul Eggert <eggert@cs.ucla.edu> wrote:
> >   Other techniques should be used for fixed-size buffers.  For reasons 
> > that should be obvious I don't recommend the technique of silent 
> > truncation; but applications that require it can use snprintf, which is 
> > more portable and useful than strlcpy anyway.
> 
> snprintf is overcomplicated for simple string copying (a common operation)

One extra argument of "%s" is not really over-complicated. I'll grant
one thing: having the return value with type int rather than size_t is
mildly problematic, in that snprintf can't be fully equivalent to
strlcpy or strncpy_s, but in cases where the input string is >2GB
you'd probably prefer to have an error rather than processing an
unbounded amount of input, as it's almost surely malicious. If your
data is really potentially that large, C strings are not the proper
format for it.

> and does not directly support concatenation.

Agreed, but concatenation without storing a "current position" between
operations is bad anyway (see: Schlemiel the painter). Still I agree
with you that this makes it hard to fix existing broken code using
only snprintf, and in fact it's dangerous when people try because they
do idiotic things that invoke UB like:

snprintf(buf, size, "%s%s", buf, foo);

> The presence of a formatting string when it's unnecessary is a serious problem;
> a common mistake is to allow attackers to control a formatting string.

Only if two conditions are met: (1) the programmer is a complete
idiot, and (2) you don't have compiler warnings to catch this usage.
Calling a printf-family function with a non-literal format string and
no variadic arguments is almost always an error, and that's why we
have warning options to flag it.

> No formatting string means no opportunity for that mistake.
> There's also the overhead (admittedly slight) of passing and processing the formatting string.

Actually it's a fairly heavy overhead; for small strings it certainly
dominates the runtime of the whole operation.

> Many people who are trying to write secure software in C (such as the OpenBSD and Microsoft folks)
> are increasingly trying to *stop* the use of traditional functions like strcpy and strncpy,

I reject the claim that MS's interest in stopping the use of these
traditional functions has anything to do with security. It's obvious
from the number of functions they "deprecated" and "replaced" with *_s
functions where there was no "security problem" with the original
function, and where the *_s function has so many arguments that it's
easy to misuse, that the intent was never security, and my view is
that it was always to undermine portability and fragment C as a
language. I think MS's continual interference in the C standards
process while they simultaneously refuse to implement any version of
the standard newer than the 1989 version supports this view of their
motives.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-15 16:39                                         ` Rich Felker
@ 2014-08-15 22:01                                           ` David A. Wheeler
  2014-08-16  2:19                                             ` Rich Felker
  2014-08-16  2:26                                             ` Russ Allbery
  0 siblings, 2 replies; 79+ messages in thread
From: David A. Wheeler @ 2014-08-15 22:01 UTC (permalink / raw)
  To: libc-alpha

David A. Wheeler (me) said:
> > snprintf is overcomplicated for simple string copying (a common operation)

On Fri, 15 Aug 2014 12:39:12 -0400, Rich Felker <dalias@libc.org> wrote:
> One extra argument of "%s" is not really over-complicated. I'll grant
> one thing: having the return value with type int rather than size_t is
> mildly problematic, in that snprintf can't be fully equivalent to
> strlcpy or strncpy_s, but in cases where the input string is >2GB
> you'd probably prefer to have an error rather than processing an
> unbounded amount of input, as it's almost surely malicious. If your
> data is really potentially that large, C strings are not the proper
> format for it.

I agree snprintf's return type (int) is problematic compared to strlcpy/strlcat (size_t).

Your comments here are a little oblique; let me explain what *I* think the return type problem is
(I'm *guessing* we mean the same thing). snprintf returns the number of chars it *WOULD* have written,
not what it ACTUALLY wrote, so the "obvious" way to use it for a string copy looks like this:
 char buf[100];
 if (snprintf(buf, sizeof(buf), "%s", src) >= sizeof(buf)) {  // FAIL
   // handle truncation
 }
But wait, "int" is signed!!  Thus, if you give it a source with a length a little longer than INT_MAX,
then the int will (probably) wrap; snprintf will return a NEGATIVE number on
truncation, which will always less less than a sizeof, and thus the "handle truncation" is *skipped*
even when truncation has occurred. INT_MAX is only guaranteed to be +32767, so this
can happen easily on small systems (think "internet of things"), but even on some
32-bit systems this can happen.

This can be handled, but you can't just use a simple cast; errors also return negatives from snprintf.
Why do I need this complexity for trivial copy and concatenation again?
It might be better for it to just not copy so much (that's the point of rsize_t in annex K),
but clearly returning a value that is unlikely to be interpreted correctly is a problem.

> ... concatenation without storing a "current position" between
> operations is bad anyway (see: Schlemiel the painter).

I completely disagree; this is simply another trade-off.

In many cases you *should* hire Schlemiel.  Tracking current position
can be much faster, sure, but it is also a great way to create off-by-one errors.
In practice errors are often more than one, since it's easy to forget to change the position.
Concatenation code is often not in anything time-critical, anyway.
Thus the performance time "saved" via tracking current position is actually a net project negative
once you consider the increased development time and the increased risk and
potential impact of dangerous buffer overflows from using approaches
that easily lead to mistakes. Even in time-critical code Schlemiel
does quite well when the normal case is fairly short, and this is a
really really common case.

As strings get long, Schlemiel gets slower, but he's a more reliable worker :-).

> Still I agree with you that this makes it hard to fix existing broken code using
> only snprintf, and in fact it's dangerous when people try because they
> do idiotic things that invoke UB like:
> 
> snprintf(buf, size, "%s%s", buf, foo);

...

> > No formatting string means no opportunity for that mistake.
> > There's also the overhead (admittedly slight) of passing and processing the formatting string.
> 
> Actually it's a fairly heavy overhead; for small strings it certainly
> dominates the runtime of the whole operation.

That makes my argument stronger, then.

> > Many people who are trying to write secure software in C (such as the OpenBSD and Microsoft folks)
> > are increasingly trying to *stop* the use of traditional functions like strcpy and strncpy,
> 
> I reject the claim that MS's interest in stopping the use of these
> traditional functions has anything to do with security.

I disagree, but that's not my point.  My point is that buffer overflows are
a serious problem; we need widely-available functions for common cases that are
easy to use and make them much less likely.  Also, strlcpy/strlcat were created by OpenBSD
to counter buffer overflows, so clearly it's not just Microsoft that see a need for
standard easy-to-use functions to counter buffer overflows in fixed-length buffers.

--- David A. Wheeler

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-15 16:14                                       ` David A. Wheeler
  2014-08-15 16:39                                         ` Rich Felker
@ 2014-08-15 22:04                                         ` Paul Eggert
  2014-08-15 22:25                                           ` David A. Wheeler
  2014-08-18  0:15                                           ` David A. Wheeler
  1 sibling, 2 replies; 79+ messages in thread
From: Paul Eggert @ 2014-08-15 22:04 UTC (permalink / raw)
  To: dwheeler, libc-alpha, eagle

David A. Wheeler wrote:
> Could you please post your empirical data?

I'm afraid I long ago discarded the raw data, but you should be able to 
reproduce it by following the recipe outlined in 
<https://www.sourceware.org/ml/libc-alpha/2002-01/msg00159.html> and by 
looking at the first few grep hits.  Sorry, I didn't note down the 
working environment, but it was either C or en_US (either Latin-1 or 
UTF-8) and it was run on either GNU/Linux or Solaris at the time.

That URL is part of circa-2002 discussion on this topic, a discussion 
which closely resembled this one, right down to the list of 
participants.  Maybe we should repeat again in 2026?

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-15 22:04                                         ` Paul Eggert
@ 2014-08-15 22:25                                           ` David A. Wheeler
  2014-08-15 22:43                                             ` Adhemerval Zanella
  2014-08-18  0:15                                           ` David A. Wheeler
  1 sibling, 1 reply; 79+ messages in thread
From: David A. Wheeler @ 2014-08-15 22:25 UTC (permalink / raw)
  To: libc-alpha

On Fri, 15 Aug 2014 15:04:00 -0700, Paul Eggert <eggert@cs.ucla.edu> wrote:
> That URL is part of circa-2002 discussion on this topic, a discussion 
> which closely resembled this one, right down to the list of 
> participants.  Maybe we should repeat again in 2026?

I think there will be *no* need to wait until 2026 :-).
The article https://lwn.net/Articles/507319/
notes requests to add strlcpy/strlcat to glibc have occurred in at least
2000, 2004, 2007, 2011, and 2012.  We can now add 2014. There are probably other
years, and I'm not even including annex K (which would be ANOTHER list).
Yet with the data I have it looks like there's an increasing frequency of requests!

If glibc doesn't add functions that *easily* counter buffer
overflows in fixed-length strings, I predict with high confidence that requests will
keep coming, because this continues to be widely perceived as an unmet need.
I'm sure many requests are rejected by the glibc developers and stay rejected...
but this is *NOT* one of them.

These functions are used in OpenSSL, LibreSSL, OpenSSH, OpenBSD,
and the Linux kernel at *least*, and probably many others.
Projects keep re-implementing them, typically less efficiently than a glibc
implementation would be.  There are reasons for this.

--- David A. Wheeler

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-15 22:25                                           ` David A. Wheeler
@ 2014-08-15 22:43                                             ` Adhemerval Zanella
  2014-08-16  4:41                                               ` David A. Wheeler
  0 siblings, 1 reply; 79+ messages in thread
From: Adhemerval Zanella @ 2014-08-15 22:43 UTC (permalink / raw)
  To: libc-alpha

On 15-08-2014 19:24, David A. Wheeler wrote:
> These functions are used in OpenSSL, LibreSSL, OpenSSH, OpenBSD,
> and the Linux kernel at *least*, and probably many others.
> Projects keep re-implementing them, typically less efficiently than a glibc
> implementation would be.  There are reasons for this.

Well, from last kernel developers iterations [1], I would say these functions 
are also not well-liked in Linux...

[1] https://plus.google.com/111049168280159033135/posts/1amLbuhWbh5

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-15 22:01                                           ` David A. Wheeler
@ 2014-08-16  2:19                                             ` Rich Felker
  2014-08-16  2:26                                             ` Russ Allbery
  1 sibling, 0 replies; 79+ messages in thread
From: Rich Felker @ 2014-08-16  2:19 UTC (permalink / raw)
  To: libc-alpha

On Fri, Aug 15, 2014 at 06:00:59PM -0400, David A. Wheeler wrote:
> David A. Wheeler (me) said:
> > > snprintf is overcomplicated for simple string copying (a common operation)
> 
> On Fri, 15 Aug 2014 12:39:12 -0400, Rich Felker <dalias@libc.org> wrote:
> > One extra argument of "%s" is not really over-complicated. I'll grant
> > one thing: having the return value with type int rather than size_t is
> > mildly problematic, in that snprintf can't be fully equivalent to
> > strlcpy or strncpy_s, but in cases where the input string is >2GB
> > you'd probably prefer to have an error rather than processing an
> > unbounded amount of input, as it's almost surely malicious. If your
> > data is really potentially that large, C strings are not the proper
> > format for it.
> 
> I agree snprintf's return type (int) is problematic compared to strlcpy/strlcat (size_t).
> 
> Your comments here are a little oblique; let me explain what *I* think the return type problem is
> (I'm *guessing* we mean the same thing). snprintf returns the number of chars it *WOULD* have written,
> not what it ACTUALLY wrote, so the "obvious" way to use it for a string copy looks like this:
>  char buf[100];
>  if (snprintf(buf, sizeof(buf), "%s", src) >= sizeof(buf)) {  // FAIL
>    // handle truncation
>  }
> But wait, "int" is signed!!  Thus, if you give it a source with a length a little longer than INT_MAX,
> then the int will (probably) wrap;

No, it's not permitted to wrap, at least not by POSIX. It fails with
EOVERFLOW and returns a negative value. (ISO C is less explicit here,
but it says "Thus, the null-terminated output has been completely
written if and only if the returned value is nonnegative and less than
n." which makes it clear that you can check with such a test.)

> snprintf will return a NEGATIVE number on
> truncation, which will always less less than a sizeof, and thus the "handle truncation" is *skipped*

This can happen only if size_t has integer conversion rank lower than
int, which doesn't happen in practice and would only happen on a
pathological implementation. In reality, size_t has greater integer
conversion rank, and thus any negative int is implicitly converted to
a value larger than INT_MAX.

> even when truncation has occurred. INT_MAX is only guaranteed to be +32767, so this
> can happen easily on small systems (think "internet of things"), but even on some
> 32-bit systems this can happen.
> 
> This can be handled, but you can't just use a simple cast; errors also return negatives from snprintf.

A simple cast to uintmax_t will ensure the desired behavior of
catching both truncation and errors simply by >= comparison against
the buffer size.

> Why do I need this complexity for trivial copy and concatenation again?
> It might be better for it to just not copy so much (that's the point of rsize_t in annex K),
> but clearly returning a value that is unlikely to be interpreted correctly is a problem.

I consider this a purely theoretical problem, since there are no
systems anyone is interested in where size_t has rank lower than int.

> > ... concatenation without storing a "current position" between
> > operations is bad anyway (see: Schlemiel the painter).
> 
> I completely disagree; this is simply another trade-off.
> 
> In many cases you *should* hire Schlemiel.  Tracking current position

OK you just got your email a tweet. :)

> > > Many people who are trying to write secure software in C (such as the OpenBSD and Microsoft folks)
> > > are increasingly trying to *stop* the use of traditional functions like strcpy and strncpy,
> > 
> > I reject the claim that MS's interest in stopping the use of these
> > traditional functions has anything to do with security.
> 
> I disagree, but that's not my point.  My point is that buffer overflows are
> a serious problem; we need widely-available functions for common cases that are
> easy to use and make them much less likely.  Also, strlcpy/strlcat were created by OpenBSD
> to counter buffer overflows, so clearly it's not just Microsoft that see a need for
> standard easy-to-use functions to counter buffer overflows in fixed-length buffers.

Oh, I believe the OpenBSD ones are actually about security. They did
the right thing, made the "secure" functions have simple signatures
that are easy to use and hard to get wrong, and only "replaced" the
functions that really needed replacement. I just don't believe it for
the MS ones.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-15 22:01                                           ` David A. Wheeler
  2014-08-16  2:19                                             ` Rich Felker
@ 2014-08-16  2:26                                             ` Russ Allbery
  2014-08-16  2:49                                               ` Rich Felker
  1 sibling, 1 reply; 79+ messages in thread
From: Russ Allbery @ 2014-08-16  2:26 UTC (permalink / raw)
  To: libc-alpha

"David A. Wheeler" <dwheeler@dwheeler.com> writes:

> snprintf returns the number of chars it *WOULD* have written, not what
> it ACTUALLY wrote, so the "obvious" way to use it for a string copy
> looks like this:

>  char buf[100];
>  if (snprintf(buf, sizeof(buf), "%s", src) >= sizeof(buf)) {  // FAIL
>    // handle truncation
>  }

> But wait, "int" is signed!!  Thus, if you give it a source with a length
> a little longer than INT_MAX, then the int will (probably) wrap;
> snprintf will return a NEGATIVE number on truncation,

I certainly hope not.  snprintf should decline to do any formatting at all
in that case and, yes, return a negative number, but because it failed
entirely.

The above code does not check for snprintf failure, and hence is not
really correct snprintf code.  Truncation and failure have to be checked
independently.  Yes, that means that you can't use snprintf to format
strings longer than INT_MAX.  I'm quite dubious that this is a serious
issue in all but very rare code, and that code can use other techniques.

>> ... concatenation without storing a "current position" between
>> operations is bad anyway (see: Schlemiel the painter).

> I completely disagree; this is simply another trade-off.

Right, I agree with this.  I think tracking that current position is, for
most code, representative of one of the bad habits of C programmers:
micro-optimizing code that no one is ever going to care about being fast.
Yes, the computer has to do more work to re-find the end of the string on
each operation, but usually no one is ever going to care, and the computer
will always get the correct answer.  If it makes the code any cleaner or
any more secure, one should absolutely take that tradeoff.

However, as Paul rightly points out, the same argument applies to malloc,
realloc, and free, and the asprintf or open_memstream code for
constructing strings is even cleaner and simpler.

> My point is that buffer overflows are a serious problem; we need
> widely-available functions for common cases that are easy to use and
> make them much less likely.  Also, strlcpy/strlcat were created by
> OpenBSD to counter buffer overflows, so clearly it's not just Microsoft
> that see a need for standard easy-to-use functions to counter buffer
> overflows in fixed-length buffers.

What generally you get from strlcpy/strlcat is turning buffer overflows
into truncations (which one can then check for, but which a lot of code
does not).  I generally agree with you that this is useful for legacy code
that uses a lot of static buffers.  I also generally agree with Paul that
static buffers are *themselves* a design flaw that should be avoided, and
it's not entirely clear to me that it's a good idea to modify code to add
strlcpy/strlcat and stop there, rather than modifying it to stop using
static buffers entirely.

And, honestly, if I were working on such legacy code, I would be very
tempted to use asprintf into a temporary variable to do whatever I want,
check the total length when finished to see if it will fit, and then
either return an error or memcpy the results into the static buffer.  That
doesn't change the calling contract, but gives you many of the safety
benefits of dynamic allocation.

-- 
Russ Allbery (eagle@eyrie.org)              <http://www.eyrie.org/~eagle/>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-16  2:26                                             ` Russ Allbery
@ 2014-08-16  2:49                                               ` Rich Felker
  2014-08-16  3:03                                                 ` Russ Allbery
  0 siblings, 1 reply; 79+ messages in thread
From: Rich Felker @ 2014-08-16  2:49 UTC (permalink / raw)
  To: libc-alpha

On Fri, Aug 15, 2014 at 07:25:51PM -0700, Russ Allbery wrote:
> > My point is that buffer overflows are a serious problem; we need
> > widely-available functions for common cases that are easy to use and
> > make them much less likely.  Also, strlcpy/strlcat were created by
> > OpenBSD to counter buffer overflows, so clearly it's not just Microsoft
> > that see a need for standard easy-to-use functions to counter buffer
> > overflows in fixed-length buffers.
> 
> What generally you get from strlcpy/strlcat is turning buffer overflows
> into truncations (which one can then check for, but which a lot of code
> does not).  I generally agree with you that this is useful for legacy code
> that uses a lot of static buffers.  I also generally agree with Paul that
> static buffers are *themselves* a design flaw that should be avoided, and
> it's not entirely clear to me that it's a good idea to modify code to add
> strlcpy/strlcat and stop there, rather than modifying it to stop using
> static buffers entirely.

If they're really using _static_ storage, that's a design bug, yes.
But in practice they're often using automatic-storage buffers with a
constant size limit. This is rarely a design bug unless the
requirements actually require supporting arbitrary-length data;
continually allocating memory until you OOM is, in practice, a much
bigger design bug.

> And, honestly, if I were working on such legacy code, I would be very
> tempted to use asprintf into a temporary variable to do whatever I want,
> check the total length when finished to see if it will fit, and then
> either return an error or memcpy the results into the static buffer.  That
> doesn't change the calling contract, but gives you many of the safety
> benefits of dynamic allocation.

It does change the calling contract. It changes a function which
cannot fail to a function which has a failure case that it needs to be
able to report to the caller, which is a radically different contract
because now the caller has to check whether it succeeded or failed.

And if you make the function abort the whole program on error (a
common disgusting solution to this problem which a depressing number
of people do) you've still changed the interface contract: now a
function which previously always returned (successfully, no less)
might return or might abort the program.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-16  2:49                                               ` Rich Felker
@ 2014-08-16  3:03                                                 ` Russ Allbery
  0 siblings, 0 replies; 79+ messages in thread
From: Russ Allbery @ 2014-08-16  3:03 UTC (permalink / raw)
  To: libc-alpha

Rich Felker <dalias@libc.org> writes:
> On Fri, Aug 15, 2014 at 07:25:51PM -0700, Russ Allbery wrote:

>> What generally you get from strlcpy/strlcat is turning buffer overflows
>> into truncations (which one can then check for, but which a lot of code
>> does not).  I generally agree with you that this is useful for legacy
>> code that uses a lot of static buffers.  I also generally agree with
>> Paul that static buffers are *themselves* a design flaw that should be
>> avoided, and it's not entirely clear to me that it's a good idea to
>> modify code to add strlcpy/strlcat and stop there, rather than
>> modifying it to stop using static buffers entirely.

> If they're really using _static_ storage, that's a design bug, yes.  But
> in practice they're often using automatic-storage buffers with a
> constant size limit. This is rarely a design bug unless the requirements
> actually require supporting arbitrary-length data; continually
> allocating memory until you OOM is, in practice, a much bigger design
> bug.

Well, I don't think you should be assuming that any code that uses dynamic
buffers continues to allocate memory until gets OOM.  That's certainly not
a necessary property of code that uses dynamic memory allocation.  But I
think we're largely in agreement here.

>> And, honestly, if I were working on such legacy code, I would be very
>> tempted to use asprintf into a temporary variable to do whatever I
>> want, check the total length when finished to see if it will fit, and
>> then either return an error or memcpy the results into the static
>> buffer.  That doesn't change the calling contract, but gives you many
>> of the safety benefits of dynamic allocation.

> It does change the calling contract. It changes a function which cannot
> fail to a function which has a failure case that it needs to be able to
> report to the caller, which is a radically different contract because
> now the caller has to check whether it succeeded or failed.

Er, sorry, you seem to have lept to a conclusion that I hadn't intended,
probably because I was insufficiently clear.  I was assuming a legacy
contract such as the calling contract for getcwd, where one returns an
error if the passed buffer is too small.  (Yes, I know there are better
functions available now that does the same thing; just take it as an
example of the sort of contract that's often seen in legacy code.)  Even
if you're stuck with that contract and can't do the trick that was applied
to getcwd to do dynamic allocation, I would still be tempted to work in
terms of dynamic allocation internal to the fuction and only check to see
if the value will fit at the end of the operation.  It's usually safer and
cleaner.

> And if you make the function abort the whole program on error (a common
> disgusting solution to this problem which a depressing number of people
> do) you've still changed the interface contract: now a function which
> previously always returned (successfully, no less) might return or might
> abort the program.

Yes, we're in firm agreement here.  I only use abort() and assert() (at
least in library code) for cases where the triggering of those functions
represents a serious coding error *inside that library*, such as
double-checking a length calculation or some similar invariant to provide
a second level of protection against coding errors.  Not for reporting
errors from the caller.

-- 
Russ Allbery (eagle@eyrie.org)              <http://www.eyrie.org/~eagle/>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-15 22:43                                             ` Adhemerval Zanella
@ 2014-08-16  4:41                                               ` David A. Wheeler
  2014-08-16  5:01                                                 ` Rich Felker
  0 siblings, 1 reply; 79+ messages in thread
From: David A. Wheeler @ 2014-08-16  4:41 UTC (permalink / raw)
  To: azanella; +Cc: libc-alpha

On Fri, 15 Aug 2014 19:43:35 -0300, Adhemerval Zanella <azanella@linux.vnet.ibm.com> wrote:
> Well, from last kernel developers iterations [1], I would say these functions 
> are also not well-liked in Linux...
> [1] https://plus.google.com/111049168280159033135/posts/1amLbuhWbh5

Well, let's take a look...

>David Herrmann originally shared:
>> A big thanks to BSD for introducing the safe strlcpy as replacement for strncpy. There's no unexpected behavior anymo-- wait, no.. oh god! strlcpy requires the source to be 0 terminated, even if its longer than the target size. Why? Of course, so the return-value can be the length of the string that was tried to be written, instead of the real written length.
> Not the first time I see kernel-patches replacing the good old:
> strncpy(kernel, from_user, len - 1) + kernel[len] = 0
> with:
> strlcpy(kernel, from_user, len)

The "recommended" approach using strncpy is absurd.  As Linus Torvalds quickly responds on Jul 28, 2014
> Well, to be fair, "strncpy()" sucks too. It does the insane "pad with zero" which is a performance disaster with any sanely sized buffers.

Good thing we have snprintf to the rescue!! Oh wait, that *also* reads from the source even if it's longer than the target size.

I actually think that David Herrmann has a good point.  However, the usually-recommended alternatives, strncpy and snprintf, have the same semantics. If strlcpy/strlcat are to be faulted for this, then strncpy and snprintf should be rejected for the same reasons.

Now, it *is* true that Linus Torvalds continues with:
>  And yeah, the strlcpy return value is broken by design.
> If you're actually copying from user space in the kernel, do
>   ret = strncpy_from_user(buf, userptr, len);
>   if (ret < 0) return ret;
>   if (ret == len) return -ETOOLONG;

There is no similar function in glibc to my knowledge.  (I think the return value should just be negative if the data reaches len, to simplify truncation handling and force a strlcpy-like guarantee that the dest is terminated if it has length.)  Would something like that be more acceptable, since that would overcome the objection above, and obviously the Linux kernel developers *do* use a special copying routine for copying up to a given length into fixed-size buffers?

--- David A. Wheeler

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-16  4:41                                               ` David A. Wheeler
@ 2014-08-16  5:01                                                 ` Rich Felker
  2014-08-17 18:03                                                   ` David A. Wheeler
  0 siblings, 1 reply; 79+ messages in thread
From: Rich Felker @ 2014-08-16  5:01 UTC (permalink / raw)
  To: libc-alpha

On Sat, Aug 16, 2014 at 12:40:44AM -0400, David A. Wheeler wrote:
> The "recommended" approach using strncpy is absurd. As Linus
> Torvalds quickly responds on Jul 28, 2014
> > Well, to be fair, "strncpy()" sucks too. It does the insane "pad
> > with zero" which is a performance disaster with any sanely sized
> > buffers.
> 
> Good thing we have snprintf to the rescue!! Oh wait, that *also*
> reads from the source even if it's longer than the target size.

Actually snprintf has both if you want:

snprintf(dest, destsize, "%.*s", srcsize, src);

In this case, src is not required to point to a null-terminated
string; it needs only point to something which is either a
null-terminated string or an array of at least srcsize bytes.

Conceptually this approach makes a lot of sense, since controlling how
much you write into the (limited size) destination and controlling how
much you read from a particular source (to prevent DoS from insanely
long input) are two separate issues. In particular, if you were using
two source strings, and wanted to put a limit of 256 bytes on each of
them when concatenating, you wouldn't want just a 512-byte limit on
the output (this would wrongly accept one or the other being overly
long as long as the other was small) but individual limits on each
input. Note that it's somewhat difficult to treat this kind of source
truncation as an error with snprintf, but it can be done with the help
of %n.

Only caveat: srcsize needs to have type int. If you pass a size_t
here, you invoke UB, since the * expects an int. In practice it
probably always works anyway due to calling conventions, but formally
it's UB.

> Now, it *is* true that Linus Torvalds continues with:
> >  And yeah, the strlcpy return value is broken by design.
> > If you're actually copying from user space in the kernel, do
> >   ret = strncpy_from_user(buf, userptr, len);
> >   if (ret < 0) return ret;
> >   if (ret == len) return -ETOOLONG;
> 
> There is no similar function in glibc to my knowledge. (I think the
> return value should just be negative if the data reaches len, to
> simplify truncation handling and force a strlcpy-like guarantee that
> the dest is terminated if it has length.) Would something like that
> be more acceptable, since that would overcome the objection above,
> and obviously the Linux kernel developers *do* use a special copying
> routine for copying up to a given length into fixed-size buffers?

I wouldn't really take anything the kernel developers do as good
practice. The kernel is FULL of integer type/size mismatch bugs and
haphazard use of types like they don't matter. Many of these even
crept into the public APIs...

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-16  5:01                                                 ` Rich Felker
@ 2014-08-17 18:03                                                   ` David A. Wheeler
  2014-08-17 19:05                                                     ` dalias
  0 siblings, 1 reply; 79+ messages in thread
From: David A. Wheeler @ 2014-08-17 18:03 UTC (permalink / raw)
  To: dalias; +Cc: libc-alpha

On Sat, 16 Aug 2014 01:01:01 -0400, Rich Felker <dalias@libc.org> wrote:
> Actually snprintf has both if you want:
> 
> snprintf(dest, destsize, "%.*s", srcsize, src);

As far as I can tell this doesn't guarantee that *reading* stops immediately once
you exceed the length.

ISO/IEC 9899:2011 (E) section 7.21.6.1 ("The fprintf function") says under "s":
"If no l length modifier is present, the argument shall be a pointer to the initial
element of an array of character type.280) Characters from the array are
written up to (but not including) the terminating null character. If the
precision is specified, no more than that many bytes are written."

Sure, src isn't required to point to a null-terminated string, so the spec requires that
the implementation stop reading eventually.  But there's no guarantee in the spec
that it'll stop at any particular point.  It could read 2G, then notice :-).
If people are *really* objecting to strlcpy because it keeps reading, then this
is an overly complex *AND* weak alternative.

> Only caveat: srcsize needs to have type int. If you pass a size_t
> here, you invoke UB, since the * expects an int. In practice it
> probably always works anyway due to calling conventions, but formally
> it's UB.

String manipulations happen often, so a common pattern like this shouldn't be UB.
You could argue if the maxlength should be srcsize or destsize.
I could imagine that snprintf really stops reading once the precision is met, and
perhaps a future spec could be modified to guarantee that.

So the "simple" way to copy a string with fixed buffers is:

  len = snprintf(dest, destsize, "%.*s", (int) srcsize, src);
  if (len < 0 || len >= destsize) ... // error or truncation

I think that's absurdly complex for a common case, and it appears to depend on
assumptions not actually specified.

--- David A. Wheeler

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-17 18:03                                                   ` David A. Wheeler
@ 2014-08-17 19:05                                                     ` dalias
  2014-08-17 20:33                                                       ` David A. Wheeler
  0 siblings, 1 reply; 79+ messages in thread
From: dalias @ 2014-08-17 19:05 UTC (permalink / raw)
  To: libc-alpha

On Sun, Aug 17, 2014 at 02:03:25PM -0400, David A. Wheeler wrote:
> On Sat, 16 Aug 2014 01:01:01 -0400, Rich Felker <dalias@libc.org> wrote:
> > Actually snprintf has both if you want:
> > 
> > snprintf(dest, destsize, "%.*s", srcsize, src);
> 
> As far as I can tell this doesn't guarantee that *reading* stops immediately once
> you exceed the length.
> 
> ISO/IEC 9899:2011 (E) section 7.21.6.1 ("The fprintf function") says under "s":
> "If no l length modifier is present, the argument shall be a pointer to the initial
> element of an array of character type.280) Characters from the array are
> written up to (but not including) the terminating null character. If the
> precision is specified, no more than that many bytes are written."

You missed the next sentence:

"If the precision is not specified or is greater than the size of the
array, the array shall contain a null character."

> Sure, src isn't required to point to a null-terminated string, so the spec requires that
> the implementation stop reading eventually.  But there's no guarantee in the spec
> that it'll stop at any particular point.  It could read 2G, then notice :-).

Formally there is no such thing as reading past the end of the array
(or even indexing past the end of the array). Yes a pathological
implementation on a machine without trapping could in principle do
such reads on the hardware, but it could also contain useless loops
that spin doing nothing for 10 seconds. There is no way to specify
that actions which are undefined don't happen without specifying
bounded time to return, which is way outside the scope of C.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-17 19:05                                                     ` dalias
@ 2014-08-17 20:33                                                       ` David A. Wheeler
  2014-08-17 23:25                                                         ` Rich Felker
  0 siblings, 1 reply; 79+ messages in thread
From: David A. Wheeler @ 2014-08-17 20:33 UTC (permalink / raw)
  To: libc-alpha

On Sun, 17 Aug 2014 15:05:50 -0400, dalias <dalias@libc.org> wrote:
> You missed the next sentence:
> 
> "If the precision is not specified or is greater than the size of the
> array, the array shall contain a null character."

That does NOT say that the reader must not read
beyond the precision; it merely specifies data constructs *within* it.
Granted, you'd *hope* the implementer would stop at the precision value,
but that is simply a hope.

> Formally there is no such thing as reading past the end of the array
> (or even indexing past the end of the array).

The spec could promise that it will not read more chars than those
given the precision when one is given.  No such promise is given.

--- David A. Wheeler

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-17 20:33                                                       ` David A. Wheeler
@ 2014-08-17 23:25                                                         ` Rich Felker
  2014-08-18  0:59                                                           ` David A. Wheeler
  0 siblings, 1 reply; 79+ messages in thread
From: Rich Felker @ 2014-08-17 23:25 UTC (permalink / raw)
  To: libc-alpha

On Sun, Aug 17, 2014 at 04:33:41PM -0400, David A. Wheeler wrote:
> On Sun, 17 Aug 2014 15:05:50 -0400, dalias <dalias@libc.org> wrote:
> > You missed the next sentence:
> > 
> > "If the precision is not specified or is greater than the size of the
> > array, the array shall contain a null character."
> 
> That does NOT say that the reader must not read
> beyond the precision; it merely specifies data constructs *within* it.
> Granted, you'd *hope* the implementer would stop at the precision value,
> but that is simply a hope.
> 
> > Formally there is no such thing as reading past the end of the array
> > (or even indexing past the end of the array).
> 
> The spec could promise that it will not read more chars than those
> given the precision when one is given.  No such promise is given.

Since the interface contract only requires an array of precision
bytes (or even fewer if a null terminator appears sooner), there is no
way to an implementation to "read more chars" unless it has some
hidden mechanism to determine the size of the pointed-to object, which
in reality is not going to exist. However I agree that it would be
nice to formally specify that no other part of the array (if the array
actually is larger) shall be accessed. This is not for timing purposes
(ISO C does not specify timing) but rather data races: if another
thread is modifying part of the array past precision, this should not
result in UB when snprintf accesses the part of the array bounded
between offsets 0 and precision-1.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-15 22:04                                         ` Paul Eggert
  2014-08-15 22:25                                           ` David A. Wheeler
@ 2014-08-18  0:15                                           ` David A. Wheeler
  2014-08-18  8:03                                             ` Paul Eggert
  1 sibling, 1 reply; 79+ messages in thread
From: David A. Wheeler @ 2014-08-18  0:15 UTC (permalink / raw)
  To: libc-alpha

> David A. Wheeler wrote:
> > Could you please post your empirical data?

On Fri, 15 Aug 2014 15:04:00 -0700, Paul Eggert <eggert@cs.ucla.edu> graciously replied:
> I'm afraid I long ago discarded the raw data, but you should be able to 
> reproduce it by following the recipe outlined in 
> <https://www.sourceware.org/ml/libc-alpha/2002-01/msg00159.html> and by 
> looking at the first few grep hits.  Sorry, I didn't note down the 
> working environment, but it was either C or en_US (either Latin-1 or 
> UTF-8) and it was run on either GNU/Linux or Solaris at the time.

Thanks very much for the URL!!
I downloaded "openssh-6.6p1" (the current version) and ran:
  egrep -n 'strlcpy|strlcat' `find * -type f -name '*.[ch]' -print

If strlcpy/strlcat were truly security disasters, or unhelpful,
you'd expect their use in OpenSSH to have disappeared by now (12 years later).
It has not.  There are 140 instances of the terms strlcpy or strlcat, but that includes
5 inside comments.  Thus, there were 135 active uses of strlcpy or strlcat.
So... there are clearly people who care about security and
*specifically* use strlcpy/strlcat.

A few instances and discussion are below.  I don't have time for
a serious security analysis (and I don't think anyone else did so earlier),
so these are more "quick impressions" than anything else.
In many cases the lengths were checked; in many others
they were not (allowing truncation but still countering buffer overflow).

But first, a bottom line from my viewpoint.
If you are *checking* the results of simple
copies or concats, strlcpy/strlcat win easily (same for strcpy_s/strcat_s).
If you aren't checking and using strlcpy, snprintf isn't too bad compared to strlcpy,
but it obscures the purpose.  snprintf is no substitute for strlcat in general,
though in some cases it works okay.
If strlcpy encourages sloppy programming, we ought to ban snprintf too :-).


addrmatch.c:321:
 if (p == NULL || strlcpy(addrbuf, p, sizeof(addrbuf)) >= sizeof(addrbuf))
   return -1;

 This strlcpy use seems reasonable.  If the buffer is exceeded, the routine
 returns a parse error.  You *do* want to check for buffer excess here,
 since it's user input.  Nice and simple. Great!

 The one-line snprintf version is this horror:
 if (p == NULL || (len = snprintf(addrbuf, sizeof(addrbuf), "%s", p), len < 0 || len >= sizeof(addrbuf)))
   return -1;

 So the likely result of eliminating strlcpy is many more lines & complexity:
 if (p == NULL)
   return -1;
 len = snprintf(addrbuf, sizeof(addrbuf), "%s", p);
 if (len < 0 || len >= sizeof(addrbuf))
   return -1;

 You have to define "int len;" for both snprintf uses, in addition.

 Strlcpy wins.



auth.c:486:
 strlcpy(buf, cp, sizeof(buf));

 This is part of auth_secure_path(); it's processing a dirname() of a buf
 of length MAXPATHLEN that was originally filled by realpath().
 The realpath() function itself is broken by design (you can't really
 know how long a buffer to allocate yet it provides no control over
 its length). Thus, I'm going to assume that this is passed
 in by the user who invoked this in the first
 place (and thus we're not leaving a trust boundary, we're just
 doing sanity-checking).

 So.. do you really believe that MAXPATHLEN really is the max length?
 If you believe that, then it can't be exceeded, and thus there's no
 need to worry about truncation (and indeed, this code doesn't try to
 detect truncation).  The real problem is that MAXPATHLEN is not
 necessarily a constant, but you'd hit problems earlier in that case.
 So the strlcpy is probably not strictly necessary (if you believe
 in MAXPATHLEN) but harmless.

 But here's a philosophical difference. I'd rather people use limiting
 functions instead of requiring deep analysis to figure out if there's
 a potential buffer overflow.  It's belt-and-suspenders; adding an
 extra check costs nearly nothing in performance time, but now the analyst
 no longer has to determine if the buffer can be overwritten (it can't).
 If someone later replaces realpath() with a sensible function, the
 strlcpy() will prevent an easy buffer overflow... making debugging
 much simpler.

 Strlcpy slightly wins.


authfd.c:107:
 strlcpy(sunaddr.sun_path, authsocket, sizeof(sunaddr.sun_path));

 Here strlcpy is used to copy a path to sun_path.
 Truncation isn't checked... but it's not clear what else you
 could do when truncation occurs.  sun_path is defined
 to be a fixed length that the application program cannot change.
 Chopping it to a fixed length is a reasonable approach, since
 you really can't do much else.  An snprintf wouldn't be too bad
 here, but it would obscure the fact that it's a simple string copy:
 snprintf(sunaddr.sun_path, sizeof(sunaddr.sun_path), "%s", authsocket);

 Strlcpy slightly wins, but snprintf is reasonable enough.


authfile.c:1179-1180:
  if ((strlcpy(file, filename, sizeof file) < sizeof(file)) &&
      (strlcat(file, ".pub", sizeof file) < sizeof(file)) &&
      (key_try_load_public(pub, file, commentp) == 1))
      return pub;

  Here we have the first use of strlcat. 
  The obvious snprintf alternative is:

  len = snprintf(file, sizeof(file), "%s.pub", filename);
  if (len >= 0 && len < sizeof(file) &&
      (key_try_load_public(pub, file, commentp) == 1))
      return pub;

  I think the snprint version is cleaner to read, so I'd prefer it.
  However, note that in this case the "Schlemiel argument" is basically
  irrelevant. Yes, the strlcat version invokes Schlemiel,
  but we're talking filenames, so Schlemiel will be pretty quick,
  while setting up format processing is heavyweight.

  I would use snprintf in this case... but that's not an
  argument against strlcat itself.


auth-pam.c:742:
  mlen = strlen(msg);
  ...
  len = plen + mlen + 1;
  **prompts = xrealloc(**prompts, 1, len);
  strlcpy(**prompts + plen, msg, len - plen);
  plen += mlen;

  On a quick look, this appears to (re)allocate the necessary space,
  and then copy a string in.  It *might* be possible to substitute strcpy
  instead... but why in the world would I *WANT* to do that!?

  I think there is a philosophical difference here.  If you make a mistake,
  what is the ramification?  If you use functions that are designed to
  limit damage, then the mistakes you inevitably make are less likely
  to hurt your users.

  Advantage strlcpy, due to a philosophical preference for code that
  self-defends.


I completely agree that silent truncation can cause serious problems.
But silent truncation is typically *way* better than buffer overflows.

Below is the complete output from:
 egrep -n 'strlcpy|strlcat' `find * -type f -name '*.[ch]' -print` | sort


--- David A. Wheeler


==================================================


addrmatch.c:321:	if (p == NULL || strlcpy(addrbuf, p, sizeof(addrbuf)) >= sizeof(addrbuf))
auth.c:486:		strlcpy(buf, cp, sizeof(buf));
authfd.c:107:	strlcpy(sunaddr.sun_path, authsocket, sizeof(sunaddr.sun_path));
authfile.c:1179:	if ((strlcpy(file, filename, sizeof file) < sizeof(file)) &&
authfile.c:1180:	    (strlcat(file, ".pub", sizeof file) < sizeof(file)) &&
auth-pam.c:742:			strlcpy(**prompts + plen, msg, len - plen);
auth-pam.c:752:			strlcpy(**prompts + plen, msg, len - plen);
auth-pam.c:754:			strlcat(**prompts + plen, "\n", len - plen);
auth-rhosts.c:110:			strlcpy(userbuf, server_user, sizeof(userbuf));
channels.c:1084:	strlcpy(username, p, sizeof(username));
channels.c:3542:	strlcpy(addr.sun_path, pathname, sizeof addr.sun_path);
channels.c:3614:	strlcpy(buf, display, sizeof(buf));
clientloop.c:412:		strlcpy(proto, SSH_X11_PROTO, sizeof proto);
entropy.c:101:		strlcpy(addr_un->sun_path, socket_path,
key.c:437:		strlcat(retval, hex, dgst_raw_len * 3 + 1);
loginrec.c:1230:			strlcpy(li->hostname, ut.ut_host,
loginrec.c:1392:			strlcpy(li->hostname, utx.ut_host,
loginrec.c:1489:		strlcpy(lastlog_file, LASTLOG_FILE, sizeof(lastlog_file));
loginrec.c:1543:		strlcpy(last.ll_host, li->hostname,
loginrec.c:1578:	strlcpy(li->hostname, ll->ll_host,
loginrec.c:1603:		strlcpy(li->hostname, last.ll_host,
loginrec.c:1640:	strlcpy(li->hostname, utx->ut_host,
loginrec.c:1691:	strlcpy(ut.ut_line, "ssh:notty", sizeof(ut.ut_line));
loginrec.c:313:	if (strlcpy(li->username, pw->pw_name, sizeof(li->username)) >=
loginrec.c:380:		strlcpy(li->username, username, sizeof(li->username));
loginrec.c:390:		strlcpy(li->hostname, hostname, sizeof(li->hostname));
loginrec.c:564:		strlcpy(dst, src, dstsize);
loginrec.c:566:		strlcpy(dst, "/dev/", dstsize);
loginrec.c:567:		strlcat(dst, src, dstsize);
loginrec.c:578:		strlcpy(dst, src + 5, dstsize);
loginrec.c:580:		strlcpy(dst, src, dstsize);
loginrec.c:614:		/* note: _don't_ change this to strlcpy */
logintest.c:101:	strlcpy(username, pw->pw_name, sizeof(username));
logintest.c:109:	strlcpy(li1->progname, "OpenSSH-logintest", sizeof(li1->progname));
logintest.c:123:		strlcpy(li1->hostname, "localhost", sizeof(li1->hostname));
logintest.c:144:	strlcpy(s_t0, ctime(&t0), sizeof(s_t0));
logintest.c:146:	strlcpy(s_t1, ctime(&t1), sizeof(s_t1));
logintest.c:155:	strlcpy(s_logintime, ctime(&logintime), sizeof(s_logintime));
logintest.c:171:	strlcpy(s_logouttime, ctime(&logouttime), sizeof(s_logouttime));
logintest.c:186:	strlcpy(s_t2, ctime(&t2), sizeof(s_t2));
md5crypt.c:145:	strlcat(passwd, to64(l, 4), sizeof(passwd));
md5crypt.c:147:	strlcat(passwd, to64(l, 4), sizeof(passwd));
md5crypt.c:149:	strlcat(passwd, to64(l, 4), sizeof(passwd));
md5crypt.c:151:	strlcat(passwd, to64(l, 4), sizeof(passwd));
md5crypt.c:153:	strlcat(passwd, to64(l, 4), sizeof(passwd));
md5crypt.c:155:	strlcat(passwd, to64(l, 2), sizeof(passwd));
misc.c:608:				i = strlcat(buf, keys[j].repl, sizeof(buf));
misc.c:754:		strlcat(r, b, hl);
monitor_wrap.c:736:	strlcpy(namebuf, p, namebuflen); /* Possible truncation */
mux.c:1171:	if (strlcpy(addr.sun_path, options.control_path,
mux.c:2029:	if (strlcpy(addr.sun_path, path,
openbsd-compat/bsd-cray.c:226:	strlcpy(hostname,
openbsd-compat/bsd-cray.c:322:		strlcpy(ue.ue_name, "root", sizeof(ue.ue_name));
openbsd-compat/bsd-cray.c:323:		strlcpy(ue.ue_dir, "/", sizeof(ue.ue_dir));
openbsd-compat/bsd-cray.c:324:		strlcpy(ue.ue_shell, "/bin/sh", sizeof(ue.ue_shell));
openbsd-compat/bsd-cray.c:504:					strlcpy(acct_name, acid2nam(valid_acct), MAXACID);
openbsd-compat/fake-rfc2553.c:58:		if (strlcpy(serv, tmpserv, servlen) >= servlen)
openbsd-compat/fake-rfc2553.c:64:			if (strlcpy(host, inet_ntoa(sin->sin_addr),
openbsd-compat/fake-rfc2553.c:75:			if (strlcpy(host, hp->h_name, hostlen) >= hostlen)
openbsd-compat/fmt_scaled.c:232:		strlcpy(result, "0B", FMT_SCALED_STRSIZE);
openbsd-compat/glob.c:988:		strlcpy(buf, ".", sizeof buf);
openbsd-compat/inet_ntop.c:207:	strlcpy(dst, tmp, size);
openbsd-compat/inet_ntop.c:97:	strlcpy(dst, tmp, size);
openbsd-compat/openbsd-compat.h:75:size_t strlcpy(char *dst, const char *src, size_t siz);
openbsd-compat/openbsd-compat.h:80:size_t strlcat(char *dst, const char *src, size_t siz);
openbsd-compat/port-aix.c:420:			strlcpy(host, "::", hostlen);
openbsd-compat/port-linux.c:210:	strlcpy(newctx + len, newname, newlen - len);
openbsd-compat/port-linux.c:212:		strlcat(newctx, cx, newlen);
openbsd-compat/realpath.c:133:		resolved_len = strlcat(resolved, next_token, PATH_MAX);
openbsd-compat/realpath.c:179:				left_len = strlcat(symlink, left, sizeof(left));
openbsd-compat/realpath.c:185:			left_len = strlcpy(left, symlink, sizeof(left));
openbsd-compat/realpath.c:69:		left_len = strlcpy(left, path + 1, sizeof(left));
openbsd-compat/realpath.c:72:			strlcpy(resolved, ".", PATH_MAX);
openbsd-compat/realpath.c:76:		left_len = strlcpy(left, path, sizeof(left));
openbsd-compat/setproctitle.c:140:	strlcpy(buf, __progname, sizeof(buf));
openbsd-compat/setproctitle.c:145:		len = strlcat(buf, ": ", sizeof(buf));
openbsd-compat/setproctitle.c:161:	len = strlcpy(argv_start, ptitle, argv_env_len);
openbsd-compat/strlcat.c:1:/*	$OpenBSD: strlcat.c,v 1.13 2005/08/08 08:05:37 espie Exp $	*/
openbsd-compat/strlcat.c:19:/* OPENBSD ORIGINAL: lib/libc/string/strlcat.c */
openbsd-compat/strlcat.c:35:strlcat(char *dst, const char *src, size_t siz)
openbsd-compat/strlcpy.c:1:/*	$OpenBSD: strlcpy.c,v 1.11 2006/05/05 15:27:38 millert Exp $	*/
openbsd-compat/strlcpy.c:19:/* OPENBSD ORIGINAL: lib/libc/string/strlcpy.c */
openbsd-compat/strlcpy.c:33:strlcpy(char *dst, const char *src, size_t siz)
progressmeter.c:185:	strlcat(buf, " ", win_size);
progressmeter.c:190:	strlcat(buf, "/s ", win_size);
progressmeter.c:199:		strlcat(buf, "- stalled -", win_size);
progressmeter.c:201:		strlcat(buf, "  --:-- ETA", win_size);
progressmeter.c:221:			strlcat(buf, " ETA", win_size);
progressmeter.c:223:			strlcat(buf, "    ", win_size);
readconf.c:1118:			strlcpy(fwdarg, arg, sizeof(fwdarg));
readconf.c:526:			strlcpy(shorthost, thishost, sizeof(shorthost));
servconf.c:1359:			strlcat(p, " ", len);
servconf.c:1360:			strlcat(p, arg, len);
session.c:1470:			strlcpy(component, path, sizeof(component));
session.c:1859:		if (strlcpy(argv0 + 1, shell0, sizeof(argv0) - 1)
session.c:222:	strlcpy(sunaddr.sun_path, auth_sock_name, sizeof(sunaddr.sun_path));
session.c:2620:				strlcat(buf, ",", sizeof buf);
session.c:2621:			strlcat(buf, cp, sizeof buf);
session.c:2625:		strlcpy(buf, "notty", sizeof buf);
sftp.c:2103:			if (strlcpy(cmd, line, sizeof(cmd)) >= sizeof(cmd)) {
sftp.c:955:		strlcpy(s_used, "error", sizeof(s_used));
sftp.c:956:		strlcpy(s_avail, "error", sizeof(s_avail));
sftp.c:957:		strlcpy(s_root, "error", sizeof(s_root));
sftp.c:958:		strlcpy(s_total, "error", sizeof(s_total));
sftp-client.c:1728:	strlcpy(ret, p1, len);
sftp-client.c:1730:		strlcat(ret, "/", len);
sftp-client.c:1731:	strlcat(ret, p2, len);
sftp-glob.c:84:	strlcpy(ret->d_name, od->dir[od->offset++]->filename, MAXPATHLEN);
sftp-glob.c:86:	strlcpy(ret->d_name, od->dir[od->offset++]->filename,
sftp-server.c:255:			strlcat(ret, ",", sizeof(ret));	\
sftp-server.c:256:		strlcat(ret, str, sizeof(ret));		\
ssh.c:265:		if (strlcpy(cname, res->ai_canonname, clen) >= clen) {
ssh.c:981:	strlcpy(shorthost, thishost, sizeof(shorthost));
ssh-add.c:360:	strlcpy(prompt, "Enter lock password: ", sizeof(prompt));
ssh-add.c:363:		strlcpy(prompt, "Again: ", sizeof prompt);
ssh-agent.c:1138:		strlcpy(socket_name, agentsocket, sizeof socket_name);
ssh-agent.c:1153:	strlcpy(sunaddr.sun_path, socket_name, sizeof(sunaddr.sun_path));
sshconnect.c:1175:			strlcat(msg, "\nAre you sure you want "
sshconnect.c:1303:	strlcpy(padded, password, size);
sshconnect2.c:129:			strlcat(to, ",", maxlen); \
sshconnect2.c:130:		strlcat(to, from, maxlen); \
ssh-keygen.c:1024:		if (strlcpy(identity_file, cp, sizeof(identity_file)) >=
ssh-keygen.c:1038:		if (strlcpy(tmp, identity_file, sizeof(tmp)) >= sizeof(tmp) ||
ssh-keygen.c:1039:		    strlcat(tmp, ".XXXXXXXXXX", sizeof(tmp)) >= sizeof(tmp) ||
ssh-keygen.c:1040:		    strlcpy(old, identity_file, sizeof(old)) >= sizeof(old) ||
ssh-keygen.c:1041:		    strlcat(old, ".old", sizeof(old)) >= sizeof(old))
ssh-keygen.c:1394:		strlcpy(new_comment, identity_comment, sizeof(new_comment));
ssh-keygen.c:1421:	strlcat(identity_file, ".pub", sizeof(identity_file));
ssh-keygen.c:2320:			if (strlcpy(identity_file, optarg, sizeof(identity_file)) >=
ssh-keygen.c:2413:			if (strlcpy(out_file, optarg, sizeof(out_file)) >=
ssh-keygen.c:2419:			if (strlcpy(out_file, optarg, sizeof(out_file)) >=
ssh-keygen.c:252:		strlcpy(identity_file, buf, sizeof(identity_file));
ssh-keygen.c:2648:		strlcpy(comment, identity_comment, sizeof(comment));
ssh-keygen.c:2672:	strlcat(identity_file, ".pub", sizeof(identity_file));
ssh-keygen.c:560:		strlcat(encoded, line, sizeof(encoded));
ssh-keygen.c:931:		strlcpy(identity_file, key_types[i].path, sizeof(identity_file));
ssh-keygen.c:952:		strlcat(identity_file, ".pub", sizeof(identity_file));
sshlogin.c:78:	strlcpy(buf, li.hostname, bufsize);
sshpty.c:79:	strlcpy(namebuf, name, namebuflen);	/* possible truncation */
xmalloc.c:84:	strlcpy(cp, str, len);

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-17 23:25                                                         ` Rich Felker
@ 2014-08-18  0:59                                                           ` David A. Wheeler
  0 siblings, 0 replies; 79+ messages in thread
From: David A. Wheeler @ 2014-08-18  0:59 UTC (permalink / raw)
  To: libc-alpha

On Sun, 17 Aug 2014 19:25:20 -0400, Rich Felker <dalias@libc.org> wrote:
> Since the interface contract only requires an array of precision
> bytes (or even fewer if a null terminator appears sooner), there is no
> way to an implementation to "read more chars" unless it has some
> hidden mechanism to determine the size of the pointed-to object, which
> in reality is not going to exist.

Okay, that's a fair argument from a *spec* point of view.

> However I agree that it would be
> nice to formally specify that no other part of the array (if the array
> actually is larger) shall be accessed.

Agreed.

--- David A. Wheeler

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-14 17:04                                                     ` Rich Felker
@ 2014-08-18  7:31                                                       ` Andreas Schwab
  2014-08-18 19:20                                                         ` Rich Felker
  0 siblings, 1 reply; 79+ messages in thread
From: Andreas Schwab @ 2014-08-18  7:31 UTC (permalink / raw)
  To: Rich Felker; +Cc: Florian Weimer, libc-alpha

Rich Felker <dalias@libc.org> writes:

> I disagree strongly here. If you want that, you should be using a
> language where it's idiomatic. Doing it in C just gets you the worst
> of both worlds: (1) C's lack of syntax to express it concisely and
> lack of exceptions to handle allocation failures, which are a horrible
> pain to handle when they can happen at each operation rather than at
> isolated resource-allocation points,

There is no need for exceptions.  Only need to add a flag to the string
type that is set for an allocation failure (or if the buffer is full for
static allocation) that can be tested at the end of a sequence of calls.
Much like stdio's ferror, but without the stdio overhead.  The key point
is that the allocation size should be maintained as part of the string
type.

> and (2) the HLL style loss of fail-safety (all operations can fail
> because they do allocation under the hood), massive overhead in memory
> usage and memory fragmentation, load on malloc/free, etc.

It is easy to also support static allocation, but with the failure flag
as above the truncation can be detected reliably after the fact.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-18  0:15                                           ` David A. Wheeler
@ 2014-08-18  8:03                                             ` Paul Eggert
  2014-08-18 19:22                                               ` Rich Felker
  2014-08-21 22:45                                               ` David A. Wheeler
  0 siblings, 2 replies; 79+ messages in thread
From: Paul Eggert @ 2014-08-18  8:03 UTC (permalink / raw)
  To: dwheeler, libc-alpha

David A. Wheeler wrote:

> If strlcpy/strlcat were truly security disasters, or unhelpful,
> you'd expect their use in OpenSSH to have disappeared by now

I wouldn't expect that at all.  OpenSSH's authors have strongly 
advocated strlcpy and have much invested in it over the years.  Even if 
they conceded that strlcpy is not that helpful now (admittedly 
unlikely), inertia would probably induce them to keep it.

I looked at the five examples you discussed, and the results are 
striking.  Details at the end of this email.  Here's a summary:

* auth.c:486's use of strlcpy can lead to undefined behavior.

* authfd.c:107's use of strlcpy is silently truncating data in a context 
where the caller expects an error return.

* In authfile.c:1179-1180, snprintf is clearly preferable.

* In auth-pam.c:742, strcpy would be simpler.

* In addrmatch.c:321, strlen + strcpy would be clear, snprintf would 
also be OK, and strlcpy doesn't fix any bugs or make the code 
significantly clearer.

So in your five examples, the use of strlcpy (A) has not fixed any bugs, 
(B) has not made the code significantly clearer, (C) is involved with 
one bug, and (D) has possibly caused another bug due to silent truncation.

This is a worse result than from my quick perusal of OpenSSH a dozen 
years ago.  If this is the best evidence-based argument *for* strlcpy, 
imagine what the argument *against* it would look like!

Here are details for the above analysis.

> addrmatch.c:321:
> ... The one-line snprintf version is this horror:

That's because you wrote it in a horrible way.  This is better:

    if (snprintf(addrbuf, sizeof addrbuf, "%s", p) >= sizeof addrbuf)
      return -1;

Though I wouldn't use snprintf here, as the following distinguishes the 
check from the action more clearly:

    if (strlen(p) >= sizeof addrbuf)
      return -1;
    strcpy(addrbuf, p);

Regardless of the form one prefers, the use of strlcpy here does not fix 
any bugs or make the code significantly clearer, compared to using 
standard functions.

> auth.c:486:
>  strlcpy(buf, cp, sizeof(buf));
>  ... So.. do you really believe that MAXPATHLEN really is the max length?

It's not a matter of belief.  It's obvious from the code that sets 'cp', 
four lines earlier.  Worse, this use of strlcpy has undefined behavior 
when cp points into buf.  A fix would be:

    memmove(buf, cp, strlen(cp) + 1);

> authfd.c:107:
>  strlcpy(sunaddr.sun_path, authsocket, sizeof(sunaddr.sun_path));
> ... Truncation isn't checked... but it's not clear what else you
>  could do when truncation occurs.

No, it's quite clear.  You could return -1, which is what strlcpy's 
caller is supposed to do on failure.  Here, strlcpy might be 
contributing to a bug, and it certainly isn't helping: a programmer who 
had used strlen + strcpy would likely have done better here and returned 
-1 on overlong inputs.

> authfile.c:1179-1180:
>   if ((strlcpy(file, filename, sizeof file) < sizeof(file)) &&
>       (strlcat(file, ".pub", sizeof file) < sizeof(file)) &&
>       (key_try_load_public(pub, file, commentp) == 1))
>       return pub;
> ...
>   I would use snprintf in this case

Agreed.

> auth-pam.c:742:
>   mlen = strlen(msg);
>   ...
>   len = plen + mlen + 1;
>   **prompts = xrealloc(**prompts, 1, len);
>   strlcpy(**prompts + plen, msg, len - plen);
>   plen += mlen;
> ....
>   Advantage strlcpy, due to a philosophical preference

I'm afraid that veers too closely to "I like strlcpy because I like 
strlcpy".  strlcpy does not fix any bugs here compared to strcpy, and 
this was the point I originally made.  And strcpy would be simpler here.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-18  7:31                                                       ` Andreas Schwab
@ 2014-08-18 19:20                                                         ` Rich Felker
  0 siblings, 0 replies; 79+ messages in thread
From: Rich Felker @ 2014-08-18 19:20 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Florian Weimer, libc-alpha

On Mon, Aug 18, 2014 at 09:31:21AM +0200, Andreas Schwab wrote:
> Rich Felker <dalias@libc.org> writes:
> 
> > I disagree strongly here. If you want that, you should be using a
> > language where it's idiomatic. Doing it in C just gets you the worst
> > of both worlds: (1) C's lack of syntax to express it concisely and
> > lack of exceptions to handle allocation failures, which are a horrible
> > pain to handle when they can happen at each operation rather than at
> > isolated resource-allocation points,
> 
> There is no need for exceptions.  Only need to add a flag to the string
> type that is set for an allocation failure (or if the buffer is full for
> static allocation) that can be tested at the end of a sequence of calls.
> Much like stdio's ferror, but without the stdio overhead.

Indeed. So why has nobody designed and pushed such a good API? All the
"secure string" libraries I've seen have been full of abort() calls or
simply dereference null pointers and crash...

> The key point
> is that the allocation size should be maintained as part of the string
> type.

If you use high-level string objects as inputs to your string
functions, this makes it impossible (or at least tricky) to use the
tail of a string, or an embedded substring of a string, in-place as
the argument to another function. On the other hand if the objects
always provide read-only access to the C string, and all functions
take C strings (and possibly also lengths for when you want to use
non-null-terminated input) as input, I think it works fine.

> > and (2) the HLL style loss of fail-safety (all operations can fail
> > because they do allocation under the hood), massive overhead in memory
> > usage and memory fragmentation, load on malloc/free, etc.
> 
> It is easy to also support static allocation, but with the failure flag
> as above the truncation can be detected reliably after the fact.

Yes.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-18  8:03                                             ` Paul Eggert
@ 2014-08-18 19:22                                               ` Rich Felker
  2014-08-21 22:45                                               ` David A. Wheeler
  1 sibling, 0 replies; 79+ messages in thread
From: Rich Felker @ 2014-08-18 19:22 UTC (permalink / raw)
  To: Paul Eggert; +Cc: dwheeler, libc-alpha

On Mon, Aug 18, 2014 at 01:03:30AM -0700, Paul Eggert wrote:
> Here are details for the above analysis.
> 
> >addrmatch.c:321:
> >... The one-line snprintf version is this horror:
> 
> That's because you wrote it in a horrible way.  This is better:
> 
>    if (snprintf(addrbuf, sizeof addrbuf, "%s", p) >= sizeof addrbuf)
>      return -1;
> 
> Though I wouldn't use snprintf here, as the following distinguishes
> the check from the action more clearly:
> 
>    if (strlen(p) >= sizeof addrbuf)
>      return -1;
>    strcpy(addrbuf, p);

Or even better:

>    if (strnlen(p, sizeof addrbuf) >= sizeof addrbuf)
>      return -1;
>    strcpy(addrbuf, p);

This avoids the unbounded read time ("DoS") issue.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-18  8:03                                             ` Paul Eggert
  2014-08-18 19:22                                               ` Rich Felker
@ 2014-08-21 22:45                                               ` David A. Wheeler
  2014-08-22  0:37                                                 ` Rich Felker
                                                                   ` (2 more replies)
  1 sibling, 3 replies; 79+ messages in thread
From: David A. Wheeler @ 2014-08-21 22:45 UTC (permalink / raw)
  To: libc-alpha

On Mon, 18 Aug 2014 01:03:30 -0700, Paul Eggert <eggert@cs.ucla.edu> wrote:
> OpenSSH's authors have strongly 
> advocated strlcpy and have much invested in it over the years.  Even if 
> they conceded that strlcpy is not that helpful now (admittedly 
> unlikely), inertia would probably induce them to keep it.

There are now *hundreds* of packages that use strlcpy/strlcat.
It is NOT just the original OpenSSH authors that use them.
Every package has to keep re-implementing them, typically in less-efficient
portable ways, because glibc fails to include them.

Here's a list put together by deraadt, there are probably others:
http://marc.info/?l=openbsd-tech&m=138733933417096&w=2

I'd like to convince you to think about *risk*.  There's no doubt that it
is *possible* to write secure software using only strcpy, strlen, etc.
But while that is *true*, it is also *irrelevant*.

The probability of making a mistake using routines like strcpy is very high.
This is amply demonstrated by the continuing march of buffer overflows.
We need alternatives, ones that automatically *prevent* buffer overflows
and can be easily applied.

So with that, let me comment on the comments...

> > addrmatch.c:321:
> > ... The one-line snprintf version is this horror:
> 
> That's because you wrote it in a horrible way.  This is better:
> 
>     if (snprintf(addrbuf, sizeof addrbuf, "%s", p) >= sizeof addrbuf)
>       return -1;

The spec says snprintf can return <0, which this code fails to handle.
It's still not better, anyway; it sure isn't obvious that this is a string copy.
But to continue...

> Though I wouldn't use snprintf here, as the following distinguishes the 
> check from the action more clearly:
> 
>     if (strlen(p) >= sizeof addrbuf)
>       return -1;
>     strcpy(addrbuf, p);

No. That wastes a lot of *people* time and is dangerous during maintenance.
In many projects, every strcpy() will (correctly) set off warning bells
requiring multiple people to *prove* that it can't exceed the buffer overflows.
And it's not just during writing the initial code.  It's also easy to turn this kind
of code into a vulnerability when the code is maintained
(which is one reason this kind of code wastes a lot of people-time),
so every time the function is changed, people will have to re-check it, and
over time this can get painful.

So every strcpy(), even if it's safe, increases *development* and *maintenance* time.
Developer and reviewer time is often MORE important than execution time
(even in C code, only some paths usually matter).
It's not TOO hard in this case to show that the strcpy cannot be
exceeded, sure, but that's not the end.

A lot of developers who care about secure code are trying to *eliminate*
the use of strcpy and functions like it, because it's just too easy to make a
disastrous mistake. Use strcpy() where it's provably safe and on a path where
performance is very important, sure.  But it should NOT be used elsewhere.
Telling people "don't make any mistakes" hasn't worked so far, and it won't in the future.

So NO, this is NOT better.  This is WORSE.  The call to strcpy() is
*faster*, but it's only justifiable if it's on a fast path where the speed matters.

> Regardless of the form one prefers, the use of strlcpy here does not fix 
> any bugs or make the code significantly clearer, compared to using 
> standard functions.

The use of strlcpy *does* help, quite substantially.

It eliminates the risk that a later modification will cause a buffer overflow;
at worst it becomes a truncation instead.  That *matters*.

So I reach radically different conclusions here.  I think that
pattern will continue... :-).

> > auth.c:486:
> >  strlcpy(buf, cp, sizeof(buf));
> >  ... So.. do you really believe that MAXPATHLEN really is the max length?
> 
> It's not a matter of belief.  It's obvious from the code that sets 'cp', 
> four lines earlier.

(I was actually complaining about MAXPATHLEN nonsense in the standard,
and not really about the subject-at-hand, so let's skip that.)

>  Worse, this use of strlcpy has undefined behavior 
> when cp points into buf.

I don't think so.  strlcpy is required to copy the source left-to-right, since it stops at the \0,
and it's copying to the beginning of "buf", so I don't see an undefined behavior.

>  A fix would be:
> 
>     memmove(buf, cp, strlen(cp) + 1);

That's not a fix.  That's horrific.

That way of using memmove checks the length of the *source*,
and copies it, and it fails to do *any* checks that the buffer has enough length.
That kind of construct is *BEGGING* to be part of a buffer overflow.

It's true that in *this particular case* it won't overflow a buffer, as long as
nothing ever changes in the rest of the code surrounding it.
But anyone who keeps writing code this way, using functions
that do not *always* check the destination length, will eventually make a mistake.
Actually, in my experience, they'll make dozens of them.

We need simple mechanisms that people can use *every time* that
prevent buffer overflows.  Ones that don't require deep analysis and
deep re-analysis every time someone changes a line of code.

Most code in most C programs is *not* on a speed-critical path, but is instead
doing setup, special-case error handling, etc.  It should be *easy* to write
code that obviously cannot ever cause a buffer overflow, when it's
okay to spend a few extra cycles to do it.

> > authfd.c:107:
> >  strlcpy(sunaddr.sun_path, authsocket, sizeof(sunaddr.sun_path));
> > ... Truncation isn't checked... but it's not clear what else you
> >  could do when truncation occurs.
> 
> No, it's quite clear.  You could return -1, which is what strlcpy's 
> caller is supposed to do on failure.  Here, strlcpy might be 
> contributing to a bug, and it certainly isn't helping: a programmer who 
> had used strlen + strcpy would likely have done better here and returned 
> -1 on overlong inputs.

Or, a programmer using strcpy could just allow a buffer overflow, the usual result :-).

I agree that returning -1 would be a good idea. But note that even when
you don't check the return, which you could call a mistake, the usual result
of strlcpy is much safer - it is merely truncation (which often isn't exploitable).
A common result of strcpy misuse is a CVE id :-).

But let's go with your thought.  Changing this to return -1 is also easy:

if (strlcpy(sunaddr.sun_path, authsocket, sizeof(sunaddr.sun_path)) >= sizeof(sunaddr.sun_path))
  return -1;

> > auth-pam.c:742:
> >   mlen = strlen(msg);
> >   ...
> >   len = plen + mlen + 1;
> >   **prompts = xrealloc(**prompts, 1, len);
> >   strlcpy(**prompts + plen, msg, len - plen);
> >   plen += mlen;
> > ....
> >   Advantage strlcpy, due to a philosophical preference
> 
> I'm afraid that veers too closely to "I like strlcpy because I like 
> strlcpy".  strlcpy does not fix any bugs here compared to strcpy, and 
> this was the point I originally made.  And strcpy would be simpler here.

No. strlcpy/strlcat, and other routines like strcpy_s, have the advantage
that they significantly reduce the likelihood that the inevitable programming
mistakes will become a serious vulnerability.

It's a *risk* decision.  C should have routines that give you the fastest
possible routines when you need them... but it should also have some
less-risky functions for common cases.

--- David A. Wheeler

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-21 22:45                                               ` David A. Wheeler
@ 2014-08-22  0:37                                                 ` Rich Felker
  2014-08-22  1:39                                                 ` William Park
  2014-08-22  2:32                                                 ` Paul Eggert
  2 siblings, 0 replies; 79+ messages in thread
From: Rich Felker @ 2014-08-22  0:37 UTC (permalink / raw)
  To: David A. Wheeler; +Cc: libc-alpha

On Thu, Aug 21, 2014 at 06:45:29PM -0400, David A. Wheeler wrote:
> On Mon, 18 Aug 2014 01:03:30 -0700, Paul Eggert <eggert@cs.ucla.edu> wrote:
> > OpenSSH's authors have strongly 
> > advocated strlcpy and have much invested in it over the years.  Even if 
> > they conceded that strlcpy is not that helpful now (admittedly 
> > unlikely), inertia would probably induce them to keep it.
> 
> There are now *hundreds* of packages that use strlcpy/strlcat.
> It is NOT just the original OpenSSH authors that use them.
> Every package has to keep re-implementing them, typically in less-efficient
> portable ways, because glibc fails to include them.

This is exactly my reason for wanting them in glibc: I've repeatedly
seen buggy re-implementations that probably introduce new security
bugs. Even if these functions are bad (and I think they are), they're
sufficiently famous that lots of people cargo-cult the API (but not
the implementations) into their software, thereby introducing new
security bugs.

> Here's a list put together by deraadt, there are probably others:
> http://marc.info/?l=openbsd-tech&m=138733933417096&w=2

Thanks for digging this up!

> > > addrmatch.c:321:
> > > ... The one-line snprintf version is this horror:
> > 
> > That's because you wrote it in a horrible way.  This is better:
> > 
> >     if (snprintf(addrbuf, sizeof addrbuf, "%s", p) >= sizeof addrbuf)
> >       return -1;
> 
> The spec says snprintf can return <0, which this code fails to handle.
> It's still not better, anyway; it sure isn't obvious that this is a string copy.
> But to continue...

This is simply wrong as I already addressed. The resulting type of the
sizeof operator is size_t, and on all systems of any practical
interest, size_t has integer conversion rank higher than (signed) int.
Thus any negative value, after conversion, is >=INT_MAX, and in
practice, >= the size of any struct.

> > Though I wouldn't use snprintf here, as the following distinguishes the 
> > check from the action more clearly:
> > 
> >     if (strlen(p) >= sizeof addrbuf)
> >       return -1;
> >     strcpy(addrbuf, p);
> 
> No. That wastes a lot of *people* time and is dangerous during maintenance.
> In many projects, every strcpy() will (correctly) set off warning bells
> requiring multiple people to *prove* that it can't exceed the buffer overflows.
> And it's not just during writing the initial code.  It's also easy to turn this kind
> of code into a vulnerability when the code is maintained
> (which is one reason this kind of code wastes a lot of people-time),
> so every time the function is changed, people will have to re-check it, and
> over time this can get painful.
> 
> So every strcpy(), even if it's safe, increases *development* and *maintenance* time.
> Developer and reviewer time is often MORE important than execution time
> (even in C code, only some paths usually matter).
> It's not TOO hard in this case to show that the strcpy cannot be
> exceeded, sure, but that's not the end.

For the most part I agree with this argument. I've used the same
argument to reject use of sprintf as an "optimization" over snprintf
even where it's easy to prove that sprintf would be safe.

> >  Worse, this use of strlcpy has undefined behavior 
> > when cp points into buf.
> 
> I don't think so.  strlcpy is required to copy the source left-to-right, since it stops at the \0,
> and it's copying to the beginning of "buf", so I don't see an undefined behavior.

Are you sure? For standard functions this is false; aside from memcpy,
they are all documented with:

"If copying takes place between objects that overlap, the behavior is
undefined."

Since this is the general pattern, I would treat use of
strlcpy/strlcat with overlapping buffers as UB unless they're
explicitly documented to support such usage. Note that, if they are
documented to support overlapping buffers, this is another thing that
re-implementors could easily get wrong.

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-21 22:45                                               ` David A. Wheeler
  2014-08-22  0:37                                                 ` Rich Felker
@ 2014-08-22  1:39                                                 ` William Park
  2014-08-22  1:53                                                   ` Jonathan Nieder
  2014-08-22  2:32                                                 ` Paul Eggert
  2 siblings, 1 reply; 79+ messages in thread
From: William Park @ 2014-08-22  1:39 UTC (permalink / raw)
  To: libc-alpha

On Thu, Aug 21, 2014 at 06:45:29PM -0400, David A. Wheeler wrote:
> Every package has to keep re-implementing them, typically in less-efficient
> portable ways, because glibc fails to include them.
...
> We need alternatives, ones that automatically *prevent* buffer overflows
> and can be easily applied.
...
> We need simple mechanisms that people can use *every time* that
> prevent buffer overflows.
...
> It's a *risk* decision.  C should have routines that give you the fastest
> possible routines when you need them... but it should also have some
> less-risky functions for common cases.

I agree totally.  Main problem of not including in glibc is that every
programmer has to carry around his/her own version.  For me, I want
automatic termination, so my routines always assign \0 to the end of
destination array, if there is space for it.  But, it's pain to carry
them around.

Does it have to be in glibc, though?  Couldn't you include them (and
others) in a separate library?  Maybe there is, but it's not included in
my distro.  Hence, problem...
-- 
William

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-22  1:39                                                 ` William Park
@ 2014-08-22  1:53                                                   ` Jonathan Nieder
  2014-08-22  4:36                                                     ` William Park
  0 siblings, 1 reply; 79+ messages in thread
From: Jonathan Nieder @ 2014-08-22  1:53 UTC (permalink / raw)
  To: libc-alpha

William Park wrote:

>                                       Couldn't you include them (and
> others) in a separate library?  Maybe there is, but it's not included in
> my distro.

Does your distro include libbsd <http://libbsd.freedesktop.org/>?

Curious,
Jonathan

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-21 22:45                                               ` David A. Wheeler
  2014-08-22  0:37                                                 ` Rich Felker
  2014-08-22  1:39                                                 ` William Park
@ 2014-08-22  2:32                                                 ` Paul Eggert
  2014-08-22  2:51                                                   ` Rich Felker
  2014-09-08 23:21                                                   ` David A. Wheeler
  2 siblings, 2 replies; 79+ messages in thread
From: Paul Eggert @ 2014-08-22  2:32 UTC (permalink / raw)
  To: dwheeler, libc-alpha

David A. Wheeler wrote:

> I'd like to convince you to think about *risk*.

I had already thought about it.  There's no evidence that using strlcpy 
reduces risk significantly, compared to spending an equivalent amount of 
effort using standard alternatives.  If anything, the little evidence 
we've seen indicates the contrary.

Most of your email was about *style*, not about *risk*.  Style arguments 
are a recipe for endless dispute, which I'd rather avoid; so I'll let 
you have the last word on style preferences.  Going onto the technical 
points:

>>> addrmatch.c:321:
> The spec says snprintf can return <0, which this code fails to handle.

In general snprintf can return <0, but this call won't.  And even if it 
did, it wouldn't be a problem in practice, as Rich Felker mentioned.

>>> auth.c:486:
>>>   strlcpy(buf, cp, sizeof(buf));
>>>   ... So.. do you really believe that MAXPATHLEN really is the max length?
>>

>> ... this use of strlcpy has undefined behavior ...
>
> I don't think so.  strlcpy is required to copy the source left-to-right

The OpenBSD man page for strlcpy disagrees with you; see 
<http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/strlcpy.3>, 
which says "If the src and dst strings overlap, the behavior is 
undefined."  If strlcpy were standardized no doubt the same language 
would apply, as it's de rigueur for the string functions.

For the other three calls to strlcpy, you raised only style-based 
objections.

So, on a technical basis we have the same results as before: of the five 
strlcpy calls you brought up, one call can have undefined behavior, one 
call does silent truncation, and the other three do not fix any bugs 
compared to using standard routines.

To be honest, I was surprised by these results: I didn't think strlcpy 
would be this strikingly bad.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-22  2:32                                                 ` Paul Eggert
@ 2014-08-22  2:51                                                   ` Rich Felker
  2014-09-08 23:21                                                   ` David A. Wheeler
  1 sibling, 0 replies; 79+ messages in thread
From: Rich Felker @ 2014-08-22  2:51 UTC (permalink / raw)
  To: Paul Eggert; +Cc: dwheeler, libc-alpha

On Thu, Aug 21, 2014 at 07:32:40PM -0700, Paul Eggert wrote:
> David A. Wheeler wrote:
> 
> >I'd like to convince you to think about *risk*.
> 
> I had already thought about it.  There's no evidence that using
> strlcpy reduces risk significantly, compared to spending an
> equivalent amount of effort using standard alternatives.  If
> anything, the little evidence we've seen indicates the contrary.
> 
> Most of your email was about *style*, not about *risk*.  Style
> arguments are a recipe for endless dispute, which I'd rather avoid;
> so I'll let you have the last word on style preferences.  Going onto
> the technical points:

I agree totally that style arguments are a dead-end, but can we focus
on the facts (cited in the linked email from Theo) about how many
programs are using strlcpy/strlcat, and providing their own (high
risk) when the system doesn't provide one? This is the one situation
where adding strlcpy/strlcat to glibc woukd make an immediate
difference to security: programs which only define their own version
when the system lacks it would immediately get a safe version.

For other programs which are unconditionally defining their own
version, there would be no immediate effect, but it's plausible that
they would gradually transition to using the system one when it's
available.

> >>>auth.c:486:
> >>>  strlcpy(buf, cp, sizeof(buf));
> >>>  ... So.. do you really believe that MAXPATHLEN really is the max length?
> >>
> 
> >>... this use of strlcpy has undefined behavior ...
> >
> >I don't think so.  strlcpy is required to copy the source left-to-right
> 
> The OpenBSD man page for strlcpy disagrees with you; see <http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/strlcpy.3>,
> which says "If the src and dst strings overlap, the behavior is
> undefined."  If strlcpy were standardized no doubt the same language
> would apply, as it's de rigueur for the string functions.

I'm actually somewhat surprised OpenBSD specified any aspect of the
function, except passing an invalid pointer or lying about the buffer
size, to yield undefined behavior. Without looking at the man page I
thought it was more likely they might have just overlooked this issue,
if they hadn't explicitly defined the function to check for it and
either support it or consider it a reportable error.

> For the other three calls to strlcpy, you raised only style-based
> objections.
> 
> So, on a technical basis we have the same results as before: of the
> five strlcpy calls you brought up, one call can have undefined
> behavior, one call does silent truncation, and the other three do
> not fix any bugs compared to using standard routines.
> 
> To be honest, I was surprised by these results: I didn't think
> strlcpy would be this strikingly bad.

:-)

Rich

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-22  1:53                                                   ` Jonathan Nieder
@ 2014-08-22  4:36                                                     ` William Park
  0 siblings, 0 replies; 79+ messages in thread
From: William Park @ 2014-08-22  4:36 UTC (permalink / raw)
  To: libc-alpha

On Thu, Aug 21, 2014 at 06:52:54PM -0700, Jonathan Nieder wrote:
> William Park wrote:
> 
> >                                       Couldn't you include them (and
> > others) in a separate library?  Maybe there is, but it's not included in
> > my distro.
> 
> Does your distro include libbsd <http://libbsd.freedesktop.org/>?

Mine (Slackware) and all other major Linux distros don't install it by
default.  Searching pkgs.org, however, packages are available.  I guess
it's better than compiling from source every time.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-08-22  2:32                                                 ` Paul Eggert
  2014-08-22  2:51                                                   ` Rich Felker
@ 2014-09-08 23:21                                                   ` David A. Wheeler
  2014-09-09  3:34                                                     ` Paul Eggert
  1 sibling, 1 reply; 79+ messages in thread
From: David A. Wheeler @ 2014-09-08 23:21 UTC (permalink / raw)
  To: libc-alpha

Paul Eggert <eggert@cs.ucla.edu> wrote:
> ... There's no evidence that using strlcpy 
> reduces risk significantly, compared to spending an equivalent amount of 
> effort using standard alternatives.  If anything, the little evidence 
> we've seen indicates the contrary.

We may have to agree to disagree, since we don't agree on what the examples *mean*.

With the same data I conclude "strlcpy is obviously much better".
Yes, you CAN write secure code without strlcpy/strlcat,
as you showed in your examples, but I think that proves nothing.

> Most of your email was about *style*, not about *risk*.

I disagree. Strlcpy/strlcat (and the annex K functions) *always* limit
buffer overwrites to the stated length (as well as guaranteeing \0 termination
when there is a length). That is not a style; that is a fact.  In contrast,
any function that doesn't limit writing length (and doesn't allocate)
can produce a buffer overflow. That is not a style; that is a fact.

Therefore, using strlcpy/strlcat provides built-in protections against the risk of
programming error that can produce a buffer overflow, as compared to using
functions like strcpy.  We can argue about whether or not
the reduction is *worth* it (fair enough!), but that's a different argument.

You've shown how to rewrite code that uses strlcpy/strlcat into something else
that doesn't have buffer overflows.  But that is not the point.
If developers never make mistakes, there are no problems.
However, there is no evidence that mistakes have disappeared recently :-).

If someone makes a mistake with strlcpy/strlcat, typically the worst that
happens is silent truncation, which is not good but is typically much harder to exploit than
the buffer overflow vulnerabilities that occur when incorrectly using functions like strcpy and sprintf.

You pointed out that some strlcpy calls allow truncation, but that's the point: That is
the *worst* that happens with mistakes.  Having functions that decrease the risk of vulnerabilities
due to programmer error is not just a "style"; it is a key point.
Strlcpy/strlcat and annex K produce code that is less likely to be vulnerable
to buffer overflows *even if* the code around it makes a mistake.

> >>> addrmatch.c:321:
> > The spec says snprintf can return <0, which this code fails to handle.
> 
> In general snprintf can return <0, but this call won't.  And even if it 
> did, it wouldn't be a problem in practice, as Rich Felker mentioned.

As you note, snprintf *may* return a negative value.  Indeed, POSIX specifically requires it
if the # of characters exceeds INT_MAX.  Also, a cast to unsigned doesn't always work,
it depends on the system architecture and C implementation.  The non-spec assumptions
you have to make turn out to be painfully long when trying to use casts.

Perhaps focusing on my intended goal would help.
My goal is to have a set of functions that reduce the risk that a
programming mistake will lead to a security vulnerability (especially buffer overflow),
better than the current functions in GNU libc.  Ideally it would be a set of functions
portably available across C library implementations.
I accept that asprintf/vasprintf are very helpful for the dynamically-allocated case.

I'm still looking for easy-to-use functions that are designed to prevent
buffer overflows when the destination has a pre-allocated size.
Yes, snprintf helps, but is that really the best that can be done?
It's clunky and hides common cases (e.g., string copy and concatenation).

If annex K is unacceptable, would a change to it be acceptable?
For example, perhaps some _nhs variants that guaranteed to never invoke a handler?
Or something else?

Suggestions welcome.

--- David A. Wheeler

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: Implement C11 annex K?
  2014-09-08 23:21                                                   ` David A. Wheeler
@ 2014-09-09  3:34                                                     ` Paul Eggert
  0 siblings, 0 replies; 79+ messages in thread
From: Paul Eggert @ 2014-09-09  3:34 UTC (permalink / raw)
  To: dwheeler, libc-alpha

David A. Wheeler wrote:

> any function that doesn't limit writing length (and doesn't allocate)
> can produce a buffer overflow.

Sure, and that's also true for functions that *do* limit writing length. 
  Even strlcpy can produce a buffer overflow, because it can be given 
the wrong pointer or size.  None of these techniques guarantee 
well-defined behavior in all cases.  It's question not of absolutes, but 
of relative advantages.

> some strlcpy calls allow truncation, but that's the point: That is
> the *worst* that happens with mistakes.

No, a mistaken use of strlcpy can lead to undefined behavior, both as 
mentioned above and (in a different way) as in the auth.c:486 example 
discussed earlier.  Undefined behavior is worse than silent truncation.

> We can argue about whether or not the reduction is *worth* it

That's not what I'm saying.  I'm saying there's no empirical evidence 
that having programmers rewrite code to use strlcpy improves reliability 
significantly more than spending the same effort to improve reliability 
via standard functions and/or widely-used technology.  This issue is 
independent of whether the extra reliability is worth the effort.

Certainly the evidence we've seen doesn't look good for strlcpy/strlcat: 
of the five uses examined, one had undefined behavior, one had silent 
truncation, and the other three didn't fix any bugs compared to standard 
functions.  Admittedly this is a tiny set of cases, but still ... these 
cases were supposed to be evidence *for* strlcpy, and instead they lean 
*against* it.

>>>>> addrmatch.c:321:
> a cast to unsigned doesn't always work,
> it depends on the system architecture and C implementation.

It is not a problem for any platform that code runs on.  To allay any 
concerns about hypothetical oddball platforms I suggest adding 
'static_assert (INT_MAX < SIZE_MAX);' (or its pre-C11 equivalent, for 
older platforms).  Hypothetical problem solved.  Quite possibly OpenSSH 
is already making that assumption elsewhere anyway.

> snprintf helps, but is that really the best that can be done?

No, snprintf can be improved.  For example, the sprintf function family 
should be extended to support size_t rather than int values for sizes. 
This would fix an obvious botch in the traditional API, and would also 
solve the abovementioned hypothetical problem.

More generally, though, we should encourage static checks for these 
sorts of problems, instead of run-time checks or (worse) silent truncation.

> It's clunky

That's a style argument, a time-sink I'd rather avoid.

^ permalink raw reply	[flat|nested] 79+ messages in thread

end of thread, other threads:[~2014-09-09  3:34 UTC | newest]

Thread overview: 79+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1407616492.31098.ezmlm@sourceware.org>
2014-08-09 20:52 ` Implement C11 annex K? David A. Wheeler
2014-08-10  7:52   ` Andreas Jaeger
2014-08-10 15:03     ` Adhemerval Zanella
2014-08-11 15:32       ` Joseph S. Myers
2014-08-11 15:52         ` Paul Eggert
2014-08-11 16:06           ` Joseph S. Myers
2014-08-11 15:56         ` David A. Wheeler
2014-08-12  4:23       ` Rich Felker
     [not found]         ` <3565dfa0-060c-46b9-b08c-6edc4eaa1179@email.android.com>
2014-08-12 21:00           ` Rich Felker
     [not found]             ` <d4ae8119-f629-4235-8981-dd2ccc220fea@email.android.com>
2014-08-12 22:08               ` Rich Felker
2014-08-12 23:15                 ` David A. Wheeler
2014-08-12 23:48                   ` dalias
2014-08-13 19:23                     ` David A. Wheeler
2014-08-13 19:44                       ` Adhemerval Zanella
2014-08-13 19:45                         ` Adhemerval Zanella
2014-08-13 20:49                         ` Rich Felker
2014-08-13 20:41                       ` dalias
2014-08-13 20:55                       ` Joseph S. Myers
2014-08-13 21:25                         ` Paul Eggert
2014-08-13 21:35                           ` Rich Felker
2014-08-13 22:46                             ` Tolga Dalman
2014-08-13 23:59                               ` Russ Allbery
2014-08-14  0:55                                 ` Joseph S. Myers
2014-08-14  1:01                                   ` Russ Allbery
2014-08-14  2:25                                 ` Rich Felker
2014-08-14  5:25                                   ` Russ Allbery
2014-08-14  5:46                                     ` Rich Felker
2014-08-14  6:15                                       ` Russ Allbery
2014-08-14  9:55                                         ` Florian Weimer
2014-08-14 10:02                                           ` Andreas Schwab
2014-08-14 10:06                                             ` Florian Weimer
2014-08-14 10:13                                               ` Andreas Schwab
2014-08-14 16:26                                                 ` Rich Felker
2014-08-14 16:53                                                   ` Andreas Schwab
2014-08-14 17:04                                                     ` Rich Felker
2014-08-18  7:31                                                       ` Andreas Schwab
2014-08-18 19:20                                                         ` Rich Felker
2014-08-14 15:20                                         ` Paul Eggert
2014-08-14 17:20                                           ` Russ Allbery
2014-08-14 17:46                                           ` Rich Felker
2014-08-15  7:51                                             ` Florian Weimer
2014-08-14  6:08                                 ` Paul Eggert
2014-08-15 14:25                                   ` David A. Wheeler
2014-08-15 15:36                                     ` Paul Eggert
2014-08-15 16:14                                       ` David A. Wheeler
2014-08-15 16:39                                         ` Rich Felker
2014-08-15 22:01                                           ` David A. Wheeler
2014-08-16  2:19                                             ` Rich Felker
2014-08-16  2:26                                             ` Russ Allbery
2014-08-16  2:49                                               ` Rich Felker
2014-08-16  3:03                                                 ` Russ Allbery
2014-08-15 22:04                                         ` Paul Eggert
2014-08-15 22:25                                           ` David A. Wheeler
2014-08-15 22:43                                             ` Adhemerval Zanella
2014-08-16  4:41                                               ` David A. Wheeler
2014-08-16  5:01                                                 ` Rich Felker
2014-08-17 18:03                                                   ` David A. Wheeler
2014-08-17 19:05                                                     ` dalias
2014-08-17 20:33                                                       ` David A. Wheeler
2014-08-17 23:25                                                         ` Rich Felker
2014-08-18  0:59                                                           ` David A. Wheeler
2014-08-18  0:15                                           ` David A. Wheeler
2014-08-18  8:03                                             ` Paul Eggert
2014-08-18 19:22                                               ` Rich Felker
2014-08-21 22:45                                               ` David A. Wheeler
2014-08-22  0:37                                                 ` Rich Felker
2014-08-22  1:39                                                 ` William Park
2014-08-22  1:53                                                   ` Jonathan Nieder
2014-08-22  4:36                                                     ` William Park
2014-08-22  2:32                                                 ` Paul Eggert
2014-08-22  2:51                                                   ` Rich Felker
2014-09-08 23:21                                                   ` David A. Wheeler
2014-09-09  3:34                                                     ` Paul Eggert
2014-08-13 22:20                         ` Time to add strlcpy/strlcat FINALLY David A. Wheeler
2014-08-14  1:09                           ` Paul Eggert
2014-08-14  2:34                             ` Rich Felker
2014-08-14  3:02                             ` William Park
2014-08-14 13:01                               ` Mike Frysinger
2014-08-15 10:37                                 ` Michael Kerrisk (man-pages)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).