From: Sunil Pandey
Date: Wed, 26 Jul 2023 13:43:29 -0700
Subject: Re: [PATCH] x86_64: Optimize ffsll function code size.
To: Noah Goldstein
Cc: Richard Henderson, libc-alpha@sourceware.org, hjl.tools@gmail.com
References: <20230726160524.1955013-1-skpgkp2@gmail.com>

On Wed, Jul 26, 2023 at 10:00 AM Noah Goldstein wrote:
> On Wed, Jul 26, 2023 at 11:52 AM Sunil Pandey via Libc-alpha wrote:
> >
> > On Wed, Jul 26, 2023 at 9:38 AM Richard Henderson <richard.henderson@linaro.org> wrote:
> >
> > > On 7/26/23 09:05, Sunil K Pandey via Libc-alpha wrote:
> > > > The ffsll function is 17 bytes; this patch reduces it to 16 bytes.
> > > > Currently ffsll performance randomly regresses by ~20%, depending on
> > > > how the code gets aligned.
> > > >
> > > > This patch fixes the random ffsll performance regression.
> > > >
> > > > ---
> > > >  sysdeps/x86_64/ffsll.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/sysdeps/x86_64/ffsll.c b/sysdeps/x86_64/ffsll.c
> > > > index a1c13d4906..dbded6f0a1 100644
> > > > --- a/sysdeps/x86_64/ffsll.c
> > > > +++ b/sysdeps/x86_64/ffsll.c
> > > > @@ -29,7 +29,7 @@ ffsll (long long int x)
> > > >    long long int tmp;
> > > >
> > > >    asm ("bsfq %2,%0\n"        /* Count low bits in X and store in %1.  */
> > > > -       "cmoveq %1,%0\n"      /* If number was zero, use -1 as result.  */
> > > > +       "cmove %k1,%k0\n"     /* If number was zero, use -1 as result.  */
> > >
> > > This no longer produces -1, but 0xffffffff in cnt.  However, since the
> > > return type is 'int', cnt need not be 'long long int' either.  I'm not
> > > sure why tmp exists at all, since cnt is the only register modified.
> > >
> >
> > Here is the exact assembly produced with this change.
> > ./build-x86_64-linux/string/ffsll.o:     file format elf64-x86-64
> >
> >
> > Disassembly of section .text:
> >
> > 0000000000000000 :
> >    0:   ba ff ff ff ff          mov    $0xffffffff,%edx
> >    5:   48 0f bc c7             bsf    %rdi,%rax
> >    9:   0f 44 c2                cmove  %edx,%eax
> >    c:   83 c0 01                add    $0x1,%eax
> >    f:   c3                      ret
> >
>
> FWIW it should be:
> ```
> 0000000000000000 <.text>:
>    0:   b8 ff ff ff ff          mov    $0xffffffff,%eax
>    5:   48 0f bc c7             bsf    %rdi,%rax
>    9:   ff c0                   inc    %eax
> ```
>
> And since it's in inline asm, there's no reason not to get that.
>

We shouldn't remove the cmove: according to the Intel manual entry for BSF,
if the source operand is 0, the content of the destination operand is
undefined. Removing the cmove also doesn't provide any performance advantage.

> > >
> > > r~
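For reference, a minimal C sketch of the two points under discussion, assuming GCC/Clang-style extended inline asm on x86-64: a portable ffsll reference, and a cmove-based variant with an `int` result and no separate `tmp` output, along the lines Richard suggests. The names `ref_ffsll`, `cmove_ffsll` and `check_ffsll` are hypothetical; this is not the glibc implementation or the patch itself. On non-x86-64 targets the asm variant falls back to the reference so the sketch still builds.

```c
#include <assert.h>

/* Hypothetical portable reference: 1-based index of the least
   significant set bit of X, or 0 if X is 0 (the ffsll contract).  */
static int
ref_ffsll (long long int x)
{
  unsigned long long int u = (unsigned long long int) x;
  int i = 1;

  if (u == 0)
    return 0;
  while ((u & 1) == 0)
    {
      u >>= 1;
      i++;
    }
  return i;
}

#if defined (__x86_64__) && defined (__GNUC__)
/* Hypothetical sketch: 32-bit 'int' result, -1 passed as a plain input
   instead of a tied 'tmp' output.  Intel documents the BSF destination
   as undefined when X is 0, so the CMOVE still overwrites it with -1.  */
static int
cmove_ffsll (long long int x)
{
  int cnt;

  __asm__ ("bsfq %2, %q0\n\t"  /* cnt = index of lowest set bit; ZF = (x == 0).  */
           "cmove %1, %0"      /* If x was zero, cnt = -1 (32-bit move).  */
           : "=&r" (cnt)       /* Early clobber: written before %1 is read.  */
           : "r" (-1), "rm" (x)
           : "cc");
  return cnt + 1;              /* -1 + 1 == 0 when x == 0.  */
}
#else
/* Fallback so the sketch also builds on other targets.  */
static int
cmove_ffsll (long long int x)
{
  return ref_ffsll (x);
}
#endif

/* Compare both versions on zero, every single-bit input and a
   negative value.  */
static void
check_ffsll (void)
{
  assert (cmove_ffsll (0) == 0);
  for (int i = 0; i < 64; i++)
    {
      /* For i == 63 this conversion is implementation-defined;
         GCC and Clang yield LLONG_MIN.  */
      long long int x = (long long int) (1ULL << i);
      assert (ref_ffsll (x) == i + 1);
      assert (cmove_ffsll (x) == i + 1);
    }
  assert (cmove_ffsll (-8) == ref_ffsll (-8));
}
```

Keeping the CMOVE makes the zero case independent of whatever BSF leaves in the destination register. Noah's shorter `mov`/`bsf`/`inc` sequence instead relies on BSF leaving the destination unchanged for a zero source, which AMD documents for BSF but Intel's manual leaves undefined.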