From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Kennelly
Date: Thu, 16 Sep 2021 16:32:28 -0400
Subject: Re: [libc-coord] Add new ABI '__memcmpeq()' to libc
To: Noah Goldstein
Cc: libc-coord@lists.openwall.com, gcc@gcc.gnu.org, GNU C Library
List-Id: Libc-alpha mailing list
Content-Type: text/plain; charset="UTF-8"

On Thu, Sep 16, 2021 at 2:31 PM Noah Goldstein wrote:

>
> On Thu, Sep 16, 2021 at 12:55 PM Chris Kennelly via Libc-alpha <
> libc-alpha@sourceware.org> wrote:
>
>> On Thu, Sep 16, 2021 at 1:04 PM Noah Goldstein
>> wrote:
>>
>> > Hi All,
>> >
>> > This is a proposal for a new interface to be supported by libc.
>> >
>> > The new interface is the same as the old 'bcmp()' routine. Essentially
>> > the goal of this proposal is to add a reserved namespace for a new
>> > function, '__memcmpeq()', which shares the same behavior as the old
>> > 'bcmp()'.
>> >
>> > #### Interface ####
>> >
>> > int __memcmpeq(void const * s1, const void * s2, size_t n)
>> >
>> >
>> > #### Description ####
>> >
>> > The '__memcmpeq()' function would compare the two byte sequences 's1'
>> > and 's2', each of length 'n'. If the two byte sequences are equal, the
>> > return would be zero. Otherwise it would return some non-zero
>> > value. 'memcmp()' is a valid implementation of '__memcmpeq()'.
>> >
>> >
>> > #### Use Case ####
>> >
>> > 1. The goal is that '__memcmpeq()' will be usable as an optimization
>> >    by compilers if a program uses the return value of 'memcmp()' as a
>> >    boolean. For example:
>> >
>> >
>> > void foo(const void* s1, const void* s2, size_t n)
>> > {
>> >     if (!memcmp(s1, s2, n)) {
>> >         printf("memcmp can be optimized to __memcmpeq in this use case\n");
>> >     }
>> > }
>> >
>> >
>> > - In the above case '__memcmpeq()' could be used instead. Due to the
>> >   simpler constraints on the return value of '__memcmpeq()', it will
>> >   be able to be implemented more optimally for this case than
>> >   'memcmp()'.
>> >   If there is no separately optimized version,
>> >   '__memcmpeq()' can alias 'memcmp()' and thus be at least as
>> >   fast.
>> >
>>
>> LLVM does this transformation (but to bcmp), as part of
>> https://reviews.llvm.org/rG8e16d73346f8091461319a7dfc4ddd18eedcff13. I
>> seem to recall a small amount of trickiness around determining whether the
>> platform had a bcmp.
>>
>> Since this is intentionally the same as bcmp, is it possible to clarify
>> the motivation for an additional symbol?
>>
>
> The motivation is to get a new reserved namespace for a function that
> memcmp() calls can be transformed to if the return value is only used
> for its boolean value.
>
> I tried to add an optimized version of bcmp() to support LLVM's
> transformation: https://patches-gcc.linaro.org/patch/60168/
> But the consensus seems to be that bcmp() is not suitable because 1)
> it is not a reserved namespace and 2) since it has had the same
> functionality as memcmp(), programs might have started relying on that
> feature.
>

llvm-libc's bcmp differs from memcmp, but agreed that Hyrum's Law can cause
problems on point #2.

In terms of relying on the feature: If __memcmpeq is ever exposed as a
simple alias for memcmp (since the notes mention that it's a valid
implementation), does that open up the possibility of depending on the
bcmp-like behavior that we were trying to escape?

>
> Do you want me to update the above proposal with this information or
> were you just asking for more clarity for the thread?
>
>>
>> > 2. Possible use cases in security, as the runtime of the function will
>> >    be *more* oblivious to the byte sequences being compared.
>> >
>> >
>> > #### Argument Specifications ####
>> >
>> > 1. 's1'
>> >    - All 'n' bytes in the byte sequence starting at 's1' and ending
>> >      at, but not including, 's1 + n' must be accessible memory. There
>> >      are no guarantees about the order the sequence will be
>> >      traversed.
>> > 2. 's2'
>> >    - All 'n' bytes in the byte sequence starting at 's2' and ending
>> >      at, but not including, 's2 + n' must be accessible memory. There
>> >      are no guarantees about the order the sequence will be
>> >      traversed.
>> > 3. 'n'
>> >    - 'n' may be any value that does not violate the specifications on
>> >      's1' and 's2'.
>> >
>> > If any of the argument specifications are violated there are no
>> > guarantees about the behavior of the interface.
>> >
>> >
>> > #### Return Value Specification ####
>> >
>> > If the byte sequences starting at 's1' and 's2' are equal, the
>> > function will return zero. Otherwise the function will return a
>> > non-zero value.
>> >
>> > Equality between the byte sequences starting at 's1' and 's2' is
>> > defined as follows:
>> >
>> > 1. If 'n' is zero the two sequences are equal.
>> > 2. If 'n' is non-zero then for all 'i' in range [0, n) the byte at
>> >    offset 'i' of 's1' equals the byte at offset 'i' in 's2'.
>> >
>> > A simple C implementation of '__memcmpeq()' could be as follows:
>> >
>> >
>> > int __memcmpeq(const void* s1, const void* s2, size_t n)
>> > {
>> >     int ret;
>> >     size_t i;
>> >     const char *s1c, *s2c;
>> >     s1c = (const char*)s1;
>> >     s2c = (const char*)s2;
>> >     for (i = 0, ret = 0; ret == 0 && i < n; ++i) {
>> >         ret = s1c[i] - s2c[i];
>> >     }
>> >     return ret;
>> > }
>> >
>> >
>> > #### Notes ####
>> >
>> > This interface is essentially old 'bcmp()' and 'memcmp()' will always
>> > be a valid implementation of '__memcmpeq()'.
>> >
>> >
>> > #### ABI vs API ####
>> >
>> > This proposal is for '__memcmpeq()' as a new ABI. As an ABI
>> > '__memcmpeq()' will have value, as using the return value of
>> > 'memcmp()' is quite idiomatic in C code.
>> >
>> > It is, however, possible that this would also be useful as a new
>> > API. Especially if there are likely use cases where the compiler
>> > would be unable to prove that '__memcmpeq()' would be a valid
>> > replacement for 'memcmp()'.
>> >
>> >
>> > #### Further Options ####
>> >
>> > If this proposal is received positively, libc could also add
>> > interfaces for '__streq()', '__strneq()', '__wcseq()' and '__wcsneq()'
>> > which similarly would loosen return value restrictions on 'strcmp()',
>> > 'strncmp()', 'wcscmp()' and 'wcsncmp()' respectively.
>> >
>> > Best,
>> > Noah
>> >
>>
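
For illustration, here is a minimal sketch of what the relaxed return value
in the quoted proposal permits. It is only an assumption about what an
equality-only routine might look like, not the proposed libc code or any
existing implementation; the name 'memcmpeq_sketch' is hypothetical and
deliberately avoids the reserved '__memcmpeq' symbol. Because the result
only has to be zero or non-zero, the loop can OR together the XOR of each
byte pair and never needs to work out which side is greater:

```c
#include <stddef.h>

/* Hypothetical helper, not the proposed libc symbol.
   Returns zero if the first n bytes of s1 and s2 are equal,
   and some non-zero value otherwise. */
static int memcmpeq_sketch(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;
    unsigned char diff = 0;
    size_t i;

    /* Accumulate every byte difference; the loop has no early exit
       and keeps no information about ordering. */
    for (i = 0; i < n; ++i)
        diff |= (unsigned char)(a[i] ^ b[i]);

    return diff;
}
```

A caller would use it the same way as the boolean 'memcmp()' idiom quoted
above, e.g. 'if (!memcmpeq_sketch(s1, s2, n))'. Whether a compiler keeps
the generated loop free of data-dependent branches is not guaranteed by
this source alone.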