From: "Alejandro Colomar (man-pages)"
Date: Thu, 10 Feb 2022 18:50:06 +0100
Subject: Re: [PATCH v2] x86-64: Optimize bzero
To: Adhemerval Zanella, Wilco Dijkstra, Noah Goldstein, "H.J. Lu"
Cc: GNU C Library

Hi,

On 2/10/22 14:16, Adhemerval Zanella via Libc-alpha wrote:
> On 10/02/2022 10:10, Adhemerval Zanella wrote:
>> On 10/02/2022 10:01, Wilco Dijkstra wrote:
>>>>> Agreed it's not clear if it's worth it to start replacing memset calls with
>>>>> bzero calls, but at the very least this will improve existing code that
>>>>> uses bzero.
>>>
>>> No code uses bzero, no compiler emits bzero. It died 2 decades ago...

Sorry to ruin your day, but there's a bzero(3) user here :)

There are rational reasons to prefer bzero(3) for its better interface, and only one to prefer memset(3): the standards people decided to remove bzero(3). See .

Consider the following quote from "UNIX Network Programming" by Stevens et al., Section 1.2 (emphasis added):

> `bzero` is not an ANSI C function. It is derived from early Berkeley
> networking code. Nevertheless, we use it throughout the text, instead
> of the ANSI C `memset` function, because `bzero` is easier to remember
> (with only two arguments) than `memset` (with three arguments).
> Almost every vendor that supports the sockets API also provides `bzero`, and
> if not, we provide a macro definition in our `unp.h` header.
>
> Indeed, **the author of TCPv3 [TCP/IP Illustrated, Volume 3 - Stevens 1996] made the mistake of swapping the second
> and third arguments to `memset` in 10 occurrences in the first
> printing**. A C compiler cannot catch this error because both arguments
> are of the same type. (Actually, the second argument is an `int` and
> the third argument is `size_t`, which is typically an `unsigned int`,
> but the values specified, 0 and 16, respectively, are still acceptable
> for the other type of argument.) The call to `memset` still worked,
> because only a few of the socket functions actually require that the
> final 8 bytes of an Internet socket address structure be set to 0.
> Nevertheless, it was an error, and one that could be avoided by using
> `bzero`, because swapping the two arguments to `bzero` will always be
> caught by the C compiler if function prototypes are used.

I consistently use bzero(3), and dislike the interface of memset(3) for zeroing memory. I checked how many memset(3) calls there are in a codebase of mine: there is exactly one, which sets an array representing a large bitfield to all 1s:

	memset(arr, UCHAR_MAX, sizeof(arr));

And there are 41 calls to bzero(3).

>>>
>>>> My point is this is a lot of code and infrastructure for a symbol marked
>>>> as legacy for POSIX.1-2001 and removed on POSIX.1-2008 for the sake of
>>>> marginal gains in specific cases.
>>>
>>> Indeed, what we really should discuss is how to remove the last traces of
>>> bcopy and bcmp from GLIBC. Do we need to keep a compatibility symbol
>>> or could we just get rid of it altogether?

I think it's sad that POSIX removed bzero(3). In the end, people need to zero memory, and there's no simpler interface than bzero(3) for that. memset(3) has a useless extra parameter.
Even if you just add a simple wrapper like the following, which I guess is not much overhead for glibc, you would be improving (or at least not worsening) my life and hopefully others':

	static inline void bzero(void *s, size_t n) { memset(s, 0, n); }

Is that really a pain to maintain?

If libc ever removes bzero(3), it's not like I'm going to start using memset(3). I'll just define it myself. That's not an improvement, but the opposite, IMO.

Ideally, POSIX would reconsider some day and reincorporate bzero(3), but I don't expect that any time soon :(.

>>
>> We need to keep the symbols as-is afaiu, since callers might still target
>> old POSIX where the symbol is defined as supported. We might add either
>> compiler or linker warning stating the symbols is deprecated, but imho it
>> would just be better if we stop trying to microoptimize it and just use
>> the generic interface (which call memset).

Please don't. I know they are deprecated, and I explicitly want to use bzero(3). I don't need an extra spurious warning.

Other projects that I have touched (including nginx and shadow-utils) seem to share my opinion: they define memzero() or some similarly named function, with an interface identical to bzero(3) (just a different name, to avoid conflicts), which wraps either bzero(3) or memset(3), depending on what is available.

Thanks,

Alex

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/