public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: "Alejandro Colomar (man-pages)" <alx.manpages@gmail.com>,
	Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
	Noah Goldstein <goldstein.w.n@gmail.com>,
	"H.J. Lu" <hjl.tools@gmail.com>
Cc: GNU C Library <libc-alpha@sourceware.org>
Subject: Re: [PATCH v2] x86-64: Optimize bzero
Date: Thu, 10 Feb 2022 16:42:37 -0300	[thread overview]
Message-ID: <b7e8e4c2-37fe-42bd-7648-a390323e5ea1@linaro.org> (raw)
In-Reply-To: <78cdba88-9e00-798a-846b-f0f77559bfd5@gmail.com>



On 10/02/2022 14:50, Alejandro Colomar (man-pages) wrote:
> Hi,
> 
> 
> On 2/10/22 14:16, Adhemerval Zanella via Libc-alpha wrote:
>> On 10/02/2022 10:10, Adhemerval Zanella wrote:
>>> On 10/02/2022 10:01, Wilco Dijkstra wrote:
>>>>>> Agreed it's not clear if it's worth it to start replacing memset
> calls with
>>>>>> bzero calls, but at the very least this will improve existing code
> that
>>>>>> uses bzero.
>>>>
>>>> No code uses bzero, no compiler emits bzero. It died 2 decades ago...
> 
> Sorry to ruin your day, but there's a bzero(3) user here :)
> 
> There are rational reasons to prefer bzero(3) due to it's better
> interface, and only one to prefer memset(3): standards people decided to
> remove bzero(3).  See <https://stackoverflow.com/a/17097978/6872717>.
> 
> Consider the following quote from "UNIX Network Programming" by Stevens
> et al., Section 1.2 (emphasis added):
>> `bzero` is not an ANSI C function. It is derived from early Berkely
>> networking code. Nevertheless, we use it throughout the text, instead
>> of the ANSI C `memset` function, because `bzero` is easier to remember
>> (with only two arguments) than `memset` (with three arguments). Almost
>> every vendor that supports the sockets API also provides `bzero`, and
>> if not, we provide a macro definition in our `unp.h` header.
>>
>> Indeed, **the author of TCPv3 [TCP/IP Illustrated, Volume 3 - Stevens
> 1996] made the mistake of swapping the second
>> and third arguments to `memset` in 10 occurrences in the first
>> printing**. A C compiler cannot catch this error because both arguments
>> are of the same type. (Actually, the second argument is an `int` and
>> the third argument is `size_t`, which is typically an `unsigned int`,
>> but the values specified, 0 and 16, respectively, are still acceptable
>> for the other type of argument.) The call to `memset` still worked,
>> because only a few of the socket functions actually require that the
>> final 8 bytes of an Internet socket address structure be set to 0.
>> Nevertheless, it was an error, and one that could be avoided by using
>> `bzero`, because swapping the two arguments to `bzero` will always be
>> caught by the C compiler if function prototypes are used.
> 
> I consistently use bzero(3), and dislike the interface of memset(3) for
> zeroing memory.  I checked how many memset(3) calls there are in a
> codebase of mine, and there is exactly 1 call to memset(3), for setting
> an array representing a large bitfield to 1s: memset(arr, UCHAR_MAX,
> sizeof(arr)).  And there are 41 calls to bzero(3).

The bzero won't go away since glibc will continue to support old POSIX 
interfaces, but Wilco has a point: it is an interface that has been 
deprecated for more than 2 decades and usually only BSD-like code
still keeps using it (explicit_bzero was initialized from openbsd).

> 
>>>>
>>>>> My point is this is a lot of code and infrastructure for a symbol
> marked
>>>>> as legacy for POSIX.1-2001 and removed on POSIX.1-2008 for the sake of
>>>>> marginal gains in specific cases.
>>>>
>>>> Indeed, what we really should discuss is how to remove the last
> traces of
>>>> bcopy and bcmp from GLIBC. Do we need to keep a compatibility symbol
>>>> or could we just get rid of it altogether?
> 
> I think it's sad that POSIX removed bzero(3).  In the end, people need
> to zero memory, and there's no simpler interface than bzero(3) for that.
>  memset(3) has a useless extra parameter.  Even if you just do a simple
> wrapper as the following, which is no big overhead for glibc I guess,
> you would be improving (or not worsening) my and hopefully others' lives:
> 
> static inline bzero(s, n)
> {
> 	memset(s, 0, n);
> }
> 
> Is that really a pain to maintain?

Yes, this is *exactly* what we are trying to remove and I will oppose to
reintroduce it after all the cleanups we did in this front (for instance
18b10de7ced9e9).

Besides your code most likely is calling memset under the hoods, gcc by 
default converts bzero calls to memset.  It seems to be really tricky to 
actually force gcc emit a bzero call to libc, I would could make by not 
including string.h, adding an explicit prototype to bzero, along with
-fno-builtin:

$ cat bzero.c 
#include <stddef.h>
extern void bzero (void *, size_t s);

int main (int argc, char *argv[])
{
  char b[1024];
  bzero (b, sizeof b);
  return 0;
}
$ gcc -Wall -fno-builtin bzero.c -o test && objdump -R test | grep -w bzero
0000000000003fd0 R_X86_64_JUMP_SLOT  bzero@GLIBC_2.2.5

> 
> If libc ever removes bzero(3), it's not like I'm going to start using
> memset(3).  I'll just define it myself.  That's not an improvement, but
> the opposite, IMO.
> 
> Ideally, POSIX would reconsider some day, and reincorporate bzero(3),
> but I don't expect that any soon :(.

I really think it is unlikely, but it is just my view.

> 
>>>
>>> We need to keep the symbols as-is afaiu, since callers might still target
>>> old POSIX where the symbol is defined as supported.  We might add either
>>> compiler or linker warning stating the symbols is deprecated, but imho it
>>> would just be better if we stop trying to microoptimize it and just use
>>> the generic interface (which call memset).
> 
> No please, I know they are deprecated, and explicitly want to use it.
> Don't need some extra spurious warning.
> 
> Other projects that I have touched (including nginx and shadow-utils),
> seem to have some similar opinion to me, and they define memzero() or
> some other similar name, with an interface identical to bzero(3) (just a
> different name to avoid problems), to wrap either bzero(3) or memset(3),
> depending on what is available.

We are discussing different subjects here: what I want is to remove the
glibc *internal* optimization for bzero, which is essentially an
implementation detail.  In a first glance it would change performance,
however gcc does a hard job replacing bzero/bcmp/bcopy with their
str* counterparts, so it highly unlike that newer binaries will actually
call bzero.

I did an experiment on my system and indeed there is no calls to bzero.

$ cat count_bzero.sh
#!/bin/bash

files=`IFS=':';for i in $PATH; do test -d "$i" && find "$i" -maxdepth 1 -executable -type f; done`
total=0
for file in $files; do
  bzero=`objdump -R $file 2>&1`
  if [ $? -eq 0 ]; then
    ncalls=`echo $bzero | grep -w bzero | wc -l`
    ((total=total+ncalls))
    if [ $ncalls -gt 0 ]; then
      echo "$file: $ncalls"
    fi
  fi
done
echo "TOTAL=$total"
$ ./count_bzero.sh
TOTAL=0



  parent reply	other threads:[~2022-02-10 19:42 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-08 22:43 H.J. Lu
2022-02-08 23:56 ` Noah Goldstein
2022-02-09 11:41 ` Adhemerval Zanella
2022-02-09 22:14   ` Noah Goldstein
2022-02-10 12:35     ` Adhemerval Zanella
2022-02-10 13:01       ` Wilco Dijkstra
2022-02-10 13:10         ` Adhemerval Zanella
2022-02-10 13:16           ` Adhemerval Zanella
2022-02-10 13:17           ` Wilco Dijkstra
2022-02-10 13:22             ` Adhemerval Zanella
2022-02-10 17:50               ` Alejandro Colomar (man-pages)
2022-02-10 19:19                 ` Wilco Dijkstra
2022-02-10 20:27                   ` Alejandro Colomar (man-pages)
2022-02-10 20:42                     ` Adhemerval Zanella
2022-02-10 21:07                       ` Patrick McGehearty
2022-02-11 13:01                         ` Adhemerval Zanella
2022-02-12 23:46                           ` Noah Goldstein
2022-02-14 12:07                             ` Adhemerval Zanella
2022-02-14 12:41                               ` Noah Goldstein
2022-02-14 14:07                                 ` Adhemerval Zanella
2022-02-14 15:03                                   ` H.J. Lu
2022-05-04  6:35                                     ` Sunil Pandey
2022-05-04 12:52                                       ` Adhemerval Zanella
2022-05-04 14:50                                         ` H.J. Lu
2022-05-04 14:54                                           ` Adhemerval Zanella
2022-02-10 22:00                       ` Alejandro Colomar (man-pages)
2022-02-10 19:42                 ` Adhemerval Zanella [this message]
2022-02-10 18:28         ` Noah Goldstein
2022-02-10 18:35         ` Noah Goldstein
2022-02-15 13:38 Wilco Dijkstra
2022-02-23  8:12 ` Noah Goldstein
2022-02-23 12:09   ` Adhemerval Zanella
2022-02-24 13:16   ` Wilco Dijkstra
2022-02-24 15:48     ` H.J. Lu
2022-02-24 22:58     ` Noah Goldstein
2022-02-24 23:21       ` Noah Goldstein
2022-02-25 17:37         ` Noah Goldstein
2022-02-25 13:51       ` Wilco Dijkstra
2022-02-25 17:35         ` Noah Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b7e8e4c2-37fe-42bd-7648-a390323e5ea1@linaro.org \
    --to=adhemerval.zanella@linaro.org \
    --cc=Wilco.Dijkstra@arm.com \
    --cc=alx.manpages@gmail.com \
    --cc=goldstein.w.n@gmail.com \
    --cc=hjl.tools@gmail.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).