public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: "Alejandro Colomar (man-pages)" <alx.manpages@gmail.com>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>,
	Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
	Noah Goldstein <goldstein.w.n@gmail.com>,
	"H.J. Lu" <hjl.tools@gmail.com>
Cc: GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH v2] x86-64: Optimize bzero
Date: Thu, 10 Feb 2022 18:50:06 +0100	[thread overview]
Message-ID: <78cdba88-9e00-798a-846b-f0f77559bfd5@gmail.com> (raw)
In-Reply-To: <1f75bda3-9e89-6860-a042-ef0406b072c1@linaro.org>

Hi,


On 2/10/22 14:16, Adhemerval Zanella via Libc-alpha wrote:
> On 10/02/2022 10:10, Adhemerval Zanella wrote:
>> On 10/02/2022 10:01, Wilco Dijkstra wrote:
>>>>> Agreed it's not clear if it's worth it to start replacing memset
calls with
>>>>> bzero calls, but at the very least this will improve existing code
that
>>>>> uses bzero.
>>>
>>> No code uses bzero, no compiler emits bzero. It died 2 decades ago...

Sorry to ruin your day, but there's a bzero(3) user here :)

There are rational reasons to prefer bzero(3) due to it's better
interface, and only one to prefer memset(3): standards people decided to
remove bzero(3).  See <https://stackoverflow.com/a/17097978/6872717>.

Consider the following quote from "UNIX Network Programming" by Stevens
et al., Section 1.2 (emphasis added):
> `bzero` is not an ANSI C function. It is derived from early Berkely
> networking code. Nevertheless, we use it throughout the text, instead
> of the ANSI C `memset` function, because `bzero` is easier to remember
> (with only two arguments) than `memset` (with three arguments). Almost
> every vendor that supports the sockets API also provides `bzero`, and
> if not, we provide a macro definition in our `unp.h` header.
>
> Indeed, **the author of TCPv3 [TCP/IP Illustrated, Volume 3 - Stevens
1996] made the mistake of swapping the second
> and third arguments to `memset` in 10 occurrences in the first
> printing**. A C compiler cannot catch this error because both arguments
> are of the same type. (Actually, the second argument is an `int` and
> the third argument is `size_t`, which is typically an `unsigned int`,
> but the values specified, 0 and 16, respectively, are still acceptable
> for the other type of argument.) The call to `memset` still worked,
> because only a few of the socket functions actually require that the
> final 8 bytes of an Internet socket address structure be set to 0.
> Nevertheless, it was an error, and one that could be avoided by using
> `bzero`, because swapping the two arguments to `bzero` will always be
> caught by the C compiler if function prototypes are used.

I consistently use bzero(3), and dislike the interface of memset(3) for
zeroing memory.  I checked how many memset(3) calls there are in a
codebase of mine, and there is exactly 1 call to memset(3), for setting
an array representing a large bitfield to 1s: memset(arr, UCHAR_MAX,
sizeof(arr)).  And there are 41 calls to bzero(3).

>>>
>>>> My point is this is a lot of code and infrastructure for a symbol
marked
>>>> as legacy for POSIX.1-2001 and removed on POSIX.1-2008 for the sake of
>>>> marginal gains in specific cases.
>>>
>>> Indeed, what we really should discuss is how to remove the last
traces of
>>> bcopy and bcmp from GLIBC. Do we need to keep a compatibility symbol
>>> or could we just get rid of it altogether?

I think it's sad that POSIX removed bzero(3).  In the end, people need
to zero memory, and there's no simpler interface than bzero(3) for that.
 memset(3) has a useless extra parameter.  Even if you just do a simple
wrapper as the following, which is no big overhead for glibc I guess,
you would be improving (or not worsening) my and hopefully others' lives:

static inline bzero(s, n)
{
	memset(s, 0, n);
}

Is that really a pain to maintain?

If libc ever removes bzero(3), it's not like I'm going to start using
memset(3).  I'll just define it myself.  That's not an improvement, but
the opposite, IMO.

Ideally, POSIX would reconsider some day, and reincorporate bzero(3),
but I don't expect that any soon :(.

>>
>> We need to keep the symbols as-is afaiu, since callers might still target
>> old POSIX where the symbol is defined as supported.  We might add either
>> compiler or linker warning stating the symbols is deprecated, but imho it
>> would just be better if we stop trying to microoptimize it and just use
>> the generic interface (which call memset).

No please, I know they are deprecated, and explicitly want to use it.
Don't need some extra spurious warning.

Other projects that I have touched (including nginx and shadow-utils),
seem to have some similar opinion to me, and they define memzero() or
some other similar name, with an interface identical to bzero(3) (just a
different name to avoid problems), to wrap either bzero(3) or memset(3),
depending on what is available.

Thanks,

Alex

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

  reply	other threads:[~2022-02-10 17:50 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-08 22:43 H.J. Lu
2022-02-08 23:56 ` Noah Goldstein
2022-02-09 11:41 ` Adhemerval Zanella
2022-02-09 22:14   ` Noah Goldstein
2022-02-10 12:35     ` Adhemerval Zanella
2022-02-10 13:01       ` Wilco Dijkstra
2022-02-10 13:10         ` Adhemerval Zanella
2022-02-10 13:16           ` Adhemerval Zanella
2022-02-10 13:17           ` Wilco Dijkstra
2022-02-10 13:22             ` Adhemerval Zanella
2022-02-10 17:50               ` Alejandro Colomar (man-pages) [this message]
2022-02-10 19:19                 ` Wilco Dijkstra
2022-02-10 20:27                   ` Alejandro Colomar (man-pages)
2022-02-10 20:42                     ` Adhemerval Zanella
2022-02-10 21:07                       ` Patrick McGehearty
2022-02-11 13:01                         ` Adhemerval Zanella
2022-02-12 23:46                           ` Noah Goldstein
2022-02-14 12:07                             ` Adhemerval Zanella
2022-02-14 12:41                               ` Noah Goldstein
2022-02-14 14:07                                 ` Adhemerval Zanella
2022-02-14 15:03                                   ` H.J. Lu
2022-05-04  6:35                                     ` Sunil Pandey
2022-05-04 12:52                                       ` Adhemerval Zanella
2022-05-04 14:50                                         ` H.J. Lu
2022-05-04 14:54                                           ` Adhemerval Zanella
2022-02-10 22:00                       ` Alejandro Colomar (man-pages)
2022-02-10 19:42                 ` Adhemerval Zanella
2022-02-10 18:28         ` Noah Goldstein
2022-02-10 18:35         ` Noah Goldstein
2022-02-15 13:38 Wilco Dijkstra
2022-02-23  8:12 ` Noah Goldstein
2022-02-23 12:09   ` Adhemerval Zanella
2022-02-24 13:16   ` Wilco Dijkstra
2022-02-24 15:48     ` H.J. Lu
2022-02-24 22:58     ` Noah Goldstein
2022-02-24 23:21       ` Noah Goldstein
2022-02-25 17:37         ` Noah Goldstein
2022-02-25 13:51       ` Wilco Dijkstra
2022-02-25 17:35         ` Noah Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=78cdba88-9e00-798a-846b-f0f77559bfd5@gmail.com \
    --to=alx.manpages@gmail.com \
    --cc=Wilco.Dijkstra@arm.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=goldstein.w.n@gmail.com \
    --cc=hjl.tools@gmail.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).