From: Noah Goldstein
Date: Thu, 16 Sep 2021 18:24:02 -0500
Subject: Re: [libc-coord] Add new ABI '__memcmpeq()' to libc
To: Chris Kennelly
Cc: libc-coord@lists.openwall.com, gcc@gcc.gnu.org, GNU C Library
List-Id: Libc-alpha mailing list

On Thu, Sep 16, 2021 at 5:25 PM Chris Kennelly via Libc-alpha <libc-alpha@sourceware.org> wrote:

> On Thu, Sep 16, 2021 at 5:50 PM enh wrote:
>
> > plus testing for _equality_ can (as mentioned earlier) have slightly
> > different properties from the three-way comparator behavior of
> > bcmp()/memcmp().
> >
> > llvm-libc's implementation only returns the boolean, though.
>
> The mem* functions are extremely sensitive to instruction cache effects, so
> having 3 unique implementations (__memcmpeq, bcmp, memcmp) that do similar,
> but subtly different things can be a hidden performance cost--one that is
> hard to demonstrate with a microbenchmark. Our experience developing
> optimized mem* routines ended up showing better performance in actual
> applications when we accepted seemingly worse microbenchmark performance by
> optimizing for code footprint instead (more extensive notes for mem* in
> general
> <https://storage.googleapis.com/pub-tools-public-publication-data/pdf/4f7c3da72d557ed418828823a8e59942859d677f.pdf>
> and memcmp specifically (section 4.4)
> <https://storage.googleapis.com/pub-tools-public-publication-data/pdf/e52f61fd2c51e8962305120548581efacbc06ffc.pdf>).

Regarding the code bloat found in memcmp in the paper, I think that is
largely specific to the sse4 implementation:

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/memcmp-sse4.S;h=b82adcd5fab5b60a0327819f6041a689a276916a;hb=HEAD

And I think there is a fair argument for not including a __memcmpeq()
based on that implementation.

The older versions:

sse2: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/memcmp.S;h=870e15c5a080162b336b13bac24cf7afbac6874b;hb=HEAD
avx2: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S;h=2621ec907aedb781fcf0444e831c801f18fa68ba;hb=HEAD
evex: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/memcmp-evex-movbe.S;h=654dc7ac8ccb9445b2c7107a7cf2d9f6ce4b1010;hb=HEAD

have a much more reasonable code size footprint.

Also, the __memcmpeq() code will itself have a smaller code size footprint
than memcmp(). With the implementations from my patch, the code size
changes as follows:

sse2: -66
avx2: -436
evex: -500

> The alternative would be to alias (as the NOTES suggest as a possible
> implementation), but I think that raises James' question of why not just
> use bcmp? Dependencies on non-boolean implementations of bcmp seem
> rare--namely, I haven't actually seen one.
>
> > On Thu, Sep 16, 2021 at 2:43 PM Joseph Myers wrote:
> >
> >> On Thu, 16 Sep 2021, James Y Knight wrote:
> >>
> >> > Wouldn't it be far simpler to just un-deprecate bcmp?
> >>
> >> The aim is to have something to which calls can be generated in all
> >> standards modes. bcmp has never been part of ISO C; there's nothing to
> >> undeprecate there.
> >
> >> --
> >> Joseph S. Myers
> >> joseph@codesourcery.com
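
As a rough illustration of the point quoted in this thread that testing for
equality has different properties from a three-way comparison, here is a
minimal sketch in C. The names memeq_sketch and check_agrees_with_memcmp and
the word-at-a-time loop are assumptions made for illustration only; this is
not the __memcmpeq() from the patch, nor any glibc code. It returns 0 when
the buffers match and nonzero otherwise, so it never has to locate the first
differing byte or decide which side is larger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical equality-only comparison (illustrative name, not a libc API).
   Returns 0 if the first n bytes are equal, nonzero otherwise.
   The nonzero value carries no ordering information. */
static int memeq_sketch(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

    /* Compare 8 bytes at a time. Any mismatch ends the search at once;
       unlike memcmp(), there is no need to find which byte differed
       or to account for byte order. */
    while (n >= sizeof(uint64_t)) {
        uint64_t wa, wb;
        memcpy(&wa, pa, sizeof wa); /* memcpy avoids unaligned access */
        memcpy(&wb, pb, sizeof wb);
        if (wa != wb)
            return 1;
        pa += sizeof wa;
        pb += sizeof wb;
        n -= sizeof wa;
    }

    /* Remaining tail of fewer than 8 bytes. */
    while (n > 0) {
        if (*pa++ != *pb++)
            return 1;
        n--;
    }
    return 0;
}

/* Callers that only test "== 0" cannot tell the two functions apart. */
static void check_agrees_with_memcmp(const void *a, const void *b, size_t n)
{
    assert((memeq_sketch(a, b, n) == 0) == (memcmp(a, b, n) == 0));
}
```

A caller that only tests the result against zero, e.g.
"if (memeq_sketch(p, q, n) == 0)", sees the same outcome as with memcmp();
only callers that inspect the sign of the result need the three-way version.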