From: Chris Kennelly
Date: Thu, 16 Sep 2021 13:55:37 -0400
Subject: Re: [libc-coord] Add new ABI '__memcmpeq()' to libc
To: libc-coord@lists.openwall.com
Cc: GNU C Library, gcc@gcc.gnu.org
On Thu, Sep 16, 2021 at 1:04 PM Noah Goldstein wrote:

> Hi All,
>
> This is a proposal for a new interface to be supported by libc.
>
> The new interface is the same as the old 'bcmp()' routine. Essentially,
> the goal of this proposal is to add a reserved namespace for a new
> function, '__memcmpeq()', which shares the same behavior as the old
> 'bcmp()'.
>
> #### Interface ####
>
> int __memcmpeq(const void *s1, const void *s2, size_t n)
>
>
> #### Description ####
>
> The '__memcmpeq()' function would compare the two byte sequences 's1'
> and 's2', each of length 'n'. If the two byte sequences are equal, the
> return value would be zero. Otherwise it would return some non-zero
> value. 'memcmp()' is a valid implementation of '__memcmpeq()'.
>
>
> #### Use Case ####
>
> 1. The goal is that '__memcmpeq()' will be usable as an optimization
>    by compilers if a program uses the return value of 'memcmp()' as a
>    boolean. For example:
>
>
> #include <stdio.h>
> #include <string.h>
>
> void foo(const void* s1, const void* s2, size_t n)
> {
>     if (!memcmp(s1, s2, n)) {
>         printf("memcmp can be optimized to __memcmpeq in this use case\n");
>     }
> }
>
>
>    - In the above case '__memcmpeq()' could be used instead. Due to the
>      simpler constraints on its return value, '__memcmpeq()' can be
>      implemented more efficiently for this case than 'memcmp()'. If
>      there is no separately optimized version, '__memcmpeq()' can alias
>      'memcmp()' and thus be at least as fast.
>

LLVM does this transformation (but to bcmp), as part of
https://reviews.llvm.org/rG8e16d73346f8091461319a7dfc4ddd18eedcff13.
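A minimal sketch of the rewrite discussed above. 'memcmpeq_sketch()', 'buffers_equal()', 'buffers_equal_rewritten()' and 'buffer_less()' are hypothetical names introduced only for illustration; 'memcmpeq_sketch()' stands in for '__memcmpeq()' and simply forwards to 'memcmp()', which the proposal says is always a valid implementation. The substitution is only legal when the result is tested against zero:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for '__memcmpeq()': with no separately optimized
   version available, it just forwards to memcmp(), which is always a
   valid implementation of the equality-only contract. */
static int memcmpeq_sketch(const void *s1, const void *s2, size_t n)
{
    return memcmp(s1, s2, n);
}

/* Original form: the memcmp() result is only compared against zero. */
bool buffers_equal(const void *s1, const void *s2, size_t n)
{
    return memcmp(s1, s2, n) == 0;
}

/* Rewritten form: legal because only zero vs. non-zero is observed. */
bool buffers_equal_rewritten(const void *s1, const void *s2, size_t n)
{
    return memcmpeq_sketch(s1, s2, n) == 0;
}

/* Counter-example: the sign of the result is observed, so a compiler
   must keep this call as memcmp(). */
bool buffer_less(const void *s1, const void *s2, size_t n)
{
    return memcmp(s1, s2, n) < 0;
}
```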
I seem to recall a small amount of trickiness around determining whether
the platform had a bcmp.

Since this is intentionally the same as bcmp, is it possible to clarify the
motivation for the additional symbol?

> 2. Possible use cases in security, since the runtime of the function
>    will be *more* oblivious to the contents of the byte sequences being
>    compared.
>
>
> #### Argument Specifications ####
>
> 1. 's1'
>    - All 'n' bytes in the byte sequence starting at 's1' and ending
>      at, but not including, 's1 + n' must be accessible memory. There
>      are no guarantees about the order in which the sequence will be
>      traversed.
> 2. 's2'
>    - All 'n' bytes in the byte sequence starting at 's2' and ending
>      at, but not including, 's2 + n' must be accessible memory. There
>      are no guarantees about the order in which the sequence will be
>      traversed.
> 3. 'n'
>    - 'n' may be any value that does not violate the specifications on
>      's1' and 's2'.
>
> If any of the argument specifications are violated, there are no
> guarantees about the behavior of the interface.
>
>
> #### Return Value Specification ####
>
> If the byte sequences starting at 's1' and 's2' are equal, the
> function will return zero. Otherwise the function will return a
> non-zero value.
>
> Equality between the byte sequences starting at 's1' and 's2' is
> defined as follows:
>
> 1. If 'n' is zero, the two sequences are equal.
> 2. If 'n' is non-zero, then for all 'i' in the range [0, n) the byte at
>    offset 'i' of 's1' equals the byte at offset 'i' of 's2'.
>
> A simple C implementation of '__memcmpeq()' could be as follows:
>
>
> #include <stddef.h>
>
> int __memcmpeq(const void* s1, const void* s2, size_t n)
> {
>     int ret;
>     size_t i;
>     const unsigned char *s1c, *s2c;
>     s1c = (const unsigned char*)s1;
>     s2c = (const unsigned char*)s2;
>     /* Stop at the first differing byte; ret stays 0 if none differ. */
>     for (i = 0, ret = 0; ret == 0 && i < n; ++i) {
>         ret = s1c[i] - s2c[i];
>     }
>     return ret;
> }
>
>
> #### Notes ####
>
> This interface is essentially the old 'bcmp()', and 'memcmp()' will
> always be a valid implementation of '__memcmpeq()'.
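Because only zero versus non-zero is required, an equality-only routine never has to locate the first differing byte or compute its sign. The following is a hedged sketch of one way to exploit that: 'memcmpeq_wordwise()' is a hypothetical name, and folding differences 8 bytes at a time through 'memcpy()' into a 'uint64_t' is an illustrative choice, not how any particular libc implements '__memcmpeq()'. It also has no data-dependent early exit, which relates to the security point in use case 2, although C compilers make no constant-time guarantees.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical equality-only comparison in the spirit of '__memcmpeq()'.
   Returns 0 if the n bytes at s1 and s2 are equal, 1 otherwise.
   The loop count depends only on n, never on the byte values
   (not a guaranteed constant-time property; see lead-in). */
int memcmpeq_wordwise(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;
    uint64_t diff = 0;   /* accumulates every differing bit */
    size_t i = 0;

    /* Bulk: XOR 8-byte chunks; memcpy avoids alignment/aliasing issues.
       Writing the bound as n - i avoids overflow since i <= n. */
    for (; n - i >= sizeof(uint64_t); i += sizeof(uint64_t)) {
        uint64_t wa, wb;
        memcpy(&wa, a + i, sizeof wa);
        memcpy(&wb, b + i, sizeof wb);
        diff |= wa ^ wb;
    }

    /* Tail: the remaining 0..7 bytes, one at a time. */
    for (; i < n; ++i)
        diff |= (uint64_t)(a[i] ^ b[i]);

    /* Collapse to an int; no ordering information is computed. */
    return diff != 0;
}
```

A caller uses it exactly like the boolean 'memcmp()' idiom: 'if (memcmpeq_wordwise(p, q, len) == 0)'.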
>
>
> #### ABI vs API ####
>
> This proposal is for '__memcmpeq()' as a new ABI. As an ABI,
> '__memcmpeq()' will have value, because using the return value of
> 'memcmp()' as a boolean is quite idiomatic in C code.
>
> It is, however, possible that this would also be useful as a new API,
> especially if there are likely use cases where the compiler would be
> unable to prove that '__memcmpeq()' is a valid replacement for
> 'memcmp()'.
>
>
> #### Further Options ####
>
> If this proposal is received positively, libc could also add
> interfaces for '__streq()', '__strneq()', '__wcseq()' and '__wcsneq()',
> which similarly would loosen the return value restrictions on
> 'strcmp()', 'strncmp()', 'wcscmp()' and 'wcsncmp()' respectively.
>
> Best,
> Noah
>
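The proposal only names '__streq()' and its siblings. Assuming they would mirror '__memcmpeq()' (zero if equal, any non-zero value otherwise), a minimal sketch of the semantics might look like this. 'streq_sketch()' is a hypothetical name, and 'strcmp()' would remain a valid implementation, just as 'memcmp()' is for '__memcmpeq()':

```c
/* Hypothetical sketch of the semantics suggested for '__streq()':
   returns 0 if the NUL-terminated strings are equal, 1 otherwise.
   Unlike strcmp(), a non-zero result carries no ordering information,
   so callers may only test it against zero. */
int streq_sketch(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    /* Advance while the bytes match and s1 has not ended; if s1 ends
       here, the strings are equal only if s2 ends at the same place. */
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }
    return *a != *b;
}
```

A caller would write 'if (streq_sketch(a, b) == 0)' wherever it would otherwise write 'if (!strcmp(a, b))'.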