From: Noah Goldstein
Date: Thu, 16 Sep 2021 13:31:42 -0500
Subject: Re: [libc-coord] Add new ABI '__memcmpeq()' to libc
To: Chris Kennelly
Cc: libc-coord@lists.openwall.com, gcc@gcc.gnu.org, GNU C Library
MIME-Version: 1.0
Content-Type: text/plain;
 charset="UTF-8"
List-Id: Libc-alpha mailing list

On Thu, Sep 16, 2021 at 12:55 PM Chris Kennelly via Libc-alpha
<libc-alpha@sourceware.org> wrote:

> On Thu, Sep 16, 2021 at 1:04 PM Noah Goldstein wrote:
>
> > Hi All,
> >
> > This is a proposal for a new interface to be supported by libc.
> >
> > The new interface is the same as the old 'bcmp()' routine. Essentially,
> > the goal of this proposal is to add a name in the reserved namespace
> > for a new function, '__memcmpeq()', which shares the same behavior as
> > the old 'bcmp()'.
> >
> > #### Interface ####
> >
> > int __memcmpeq(const void *s1, const void *s2, size_t n)
> >
> >
> > #### Description ####
> >
> > The '__memcmpeq()' function would compare the two byte sequences 's1'
> > and 's2', each of length 'n'. If the two byte sequences are equal, the
> > return value would be zero. Otherwise it would return some non-zero
> > value. 'memcmp()' is a valid implementation of '__memcmpeq()'.
> >
> >
> > #### Use Case ####
> >
> > 1. The goal is that '__memcmpeq()' will be usable as an optimization
> >    by compilers if a program uses the return value of 'memcmp()' only
> >    as a boolean. For example:
> >
> >
> > void foo(const void* s1, const void* s2, size_t n)
> > {
> >     if (!memcmp(s1, s2, n)) {
> >         printf("memcmp can be optimized to __memcmpeq in this use case\n");
> >     }
> > }
> >
> >
> >    - In the above case '__memcmpeq()' could be used instead. Due to the
> >      simpler constraints on the return value of '__memcmpeq()', it can
> >      be implemented more efficiently for this case than 'memcmp()'.
> >      If there is no separately optimized version, '__memcmpeq()' can
> >      alias 'memcmp()' and thus be at least as fast.
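As a rough illustration of which 'memcmp()' calls would qualify, here is a
minimal sketch. The function names 'buffers_equal()' and 'buffer_less()' are
hypothetical and not part of the proposal; the comments describe what a
compiler would be allowed to do, not what any particular compiler does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: only the zero/non-zero outcome of memcmp() is
   observed, so an equality-only routine such as the proposed
   __memcmpeq() could legally serve this call. */
int buffers_equal(const void *s1, const void *s2, size_t n)
{
    return memcmp(s1, s2, n) == 0;
}

/* Hypothetical helper: the sign of the result is observed, so the
   ordering guarantee of memcmp() is needed and the call must stay. */
int buffer_less(const void *s1, const void *s2, size_t n)
{
    return memcmp(s1, s2, n) < 0;
}
```

Only the first helper is a candidate for the transformation; the second
depends on which sequence sorts first, which '__memcmpeq()' would not
report.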
> >
>
> LLVM does this transformation (but to bcmp), as part of
> https://reviews.llvm.org/rG8e16d73346f8091461319a7dfc4ddd18eedcff13. I
> seem to recall a small amount of trickiness around determining whether the
> platform had a bcmp.
>
> Since this is intentionally the same as bcmp, is it possible to clarify the
> motivation for the additional symbol?
>

The motivation is to get a name in a reserved namespace for a function that
memcmp() calls can be transformed to if the return value is only used
for its boolean value.

I tried to add an optimized version of bcmp() to support LLVM's
transformation:
https://patches-gcc.linaro.org/patch/60168/

But the consensus seems to be that bcmp() is not suitable because 1) it is
not in a reserved namespace and 2) since it has so far behaved the same as
memcmp(), programs might have started relying on that behavior.

Do you want me to update the above proposal with this information, or were
you just asking for more clarity for the thread?

> >
> > 2. Possible use cases in security, as the runtime of the function will
> >    be *more* oblivious to the byte sequences being compared.
> >
> >
> > #### Argument Specifications ####
> >
> > 1. 's1'
> >    - All 'n' bytes in the byte sequence starting at 's1' and ending
> >      at, but not including, 's1 + n' must be accessible memory. There
> >      are no guarantees about the order in which the sequence will be
> >      traversed.
> > 2. 's2'
> >    - All 'n' bytes in the byte sequence starting at 's2' and ending
> >      at, but not including, 's2 + n' must be accessible memory. There
> >      are no guarantees about the order in which the sequence will be
> >      traversed.
> > 3. 'n'
> >    - 'n' may be any value that does not violate the specifications on
> >      's1' and 's2'.
> >
> > If any of the argument specifications are violated there are no
> > guarantees about the behavior of the interface.
> >
> >
> > #### Return Value Specification ####
> >
> > If the byte sequences starting at 's1' and 's2' are equal the
> > function will return zero.
> > Otherwise the function will return a non-zero value.
> >
> > Equality between the byte sequences starting at 's1' and 's2' is
> > defined as follows:
> >
> > 1. If 'n' is zero the two sequences are equal.
> > 2. If 'n' is non-zero then for all 'i' in range [0, n) the byte at
> >    offset 'i' of 's1' equals the byte at offset 'i' of 's2'.
> >
> > A simple C implementation of '__memcmpeq()' could be as follows:
> >
> >
> > int __memcmpeq(const void* s1, const void* s2, size_t n)
> > {
> >     int ret;
> >     size_t i;
> >     const char *s1c, *s2c;
> >     s1c = (const char*)s1;
> >     s2c = (const char*)s2;
> >     for (i = 0, ret = 0; ret == 0 && i < n; ++i) {
> >         ret = s1c[i] - s2c[i];
> >     }
> >     return ret;
> > }
> >
> >
> > #### Notes ####
> >
> > This interface is essentially the old 'bcmp()', and 'memcmp()' will
> > always be a valid implementation of '__memcmpeq()'.
> >
> >
> > #### ABI vs API ####
> >
> > This proposal is for '__memcmpeq()' as a new ABI. As an ABI
> > '__memcmpeq()' will have value, as using the return value of
> > 'memcmp()' as a boolean is quite idiomatic in C code.
> >
> > It is, however, possible that this would also be useful as a new
> > API, especially if there are likely use cases where the compiler
> > would be unable to prove that '__memcmpeq()' would be a valid
> > replacement for 'memcmp()'.
> >
> >
> > #### Further Options ####
> >
> > If this proposal is received positively, libc could also add
> > interfaces for '__streq()', '__strneq()', '__wcseq()' and '__wcsneq()'
> > which similarly would loosen return value restrictions on 'strcmp()',
> > 'strncmp()', 'wcscmp()' and 'wcsncmp()' respectively.
> >
> > Best,
> > Noah
> >
>
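To illustrate why the looser return value leaves room for a simpler
routine, here is a minimal sketch of an equality-only comparison. The name
'memcmpeq_sketch()' is hypothetical, and this is not the glibc
implementation or a tuned one; it only assumes the zero/non-zero contract
described in the proposal. Because it never has to locate the first
differing byte or report an ordering, it can OR together the XOR of whole
words and test the accumulator once at the end.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not the proposed libc symbol: returns 0 if the
   first n bytes of s1 and s2 are equal, and 1 otherwise. */
int memcmpeq_sketch(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1;
    const unsigned char *b = s2;
    uint64_t diff = 0;
    size_t i = 0;

    /* Process 8 bytes per step; memcpy() sidesteps alignment and
       strict-aliasing problems with direct word loads. */
    for (; n - i >= 8; i += 8) {
        uint64_t wa, wb;
        memcpy(&wa, a + i, sizeof wa);
        memcpy(&wb, b + i, sizeof wb);
        diff |= wa ^ wb;
    }

    /* Handle the remaining 0..7 bytes one at a time. */
    for (; i < n; ++i) {
        diff |= (uint64_t)(a[i] ^ b[i]);
    }

    /* Any set bit means at least one byte differed. */
    return diff != 0;
}
```

This version has no early exit, so it always reads all 'n' bytes, which
relates to the "more oblivious" point in use case 2. That is a property of
this sketch only, not a timing guarantee: a compiler is free to restructure
the loops, and a production routine would likely exit early for speed.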