From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-x336.google.com (mail-wm1-x336.google.com [IPv6:2a00:1450:4864:20::336]) by sourceware.org (Postfix) with ESMTPS id A84403858D35 for ; Wed, 26 Jul 2023 22:38:14 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A84403858D35 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-wm1-x336.google.com with SMTP id 5b1f17b1804b1-3fd190065a8so2787535e9.3 for ; Wed, 26 Jul 2023 15:38:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1690411093; x=1691015893; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=FXhrl9CCcztb6+rpH+zlk2dlVWTgb6HkP+wiyAjbz2Y=; b=AipfxG+WEuK+vC/+iVW8kk3ljQTVNyxfpyl00gLXtWwAAllK8NAmQTMg5l9ilyIRll q+vQDQN8eh6PaG+9aFtOIOxzHcv9pL5IBVX5wOKVBD4oO0DJdYT/GJ2fXNMsygp3XGj0 x9/Oc4rm1EW9Dy+Ikd6qikPif2BVRkgOrBFrR3sMtYWgedfqXfk4BHOjlct5Roh8mH8N j8soI1y6I+3PUUSPUvsSxo1cC+gg9WL7Zv1056FjsnTRmqI9At8FsR1u3+3Nld+89z+V Sv72+CiBqvzUqHomM32ZPSnqp9kgerrUuovptqvhDJle+LF0DhTW0Zlqv8CLs2XqliER v5fA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690411093; x=1691015893; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=FXhrl9CCcztb6+rpH+zlk2dlVWTgb6HkP+wiyAjbz2Y=; b=asungUtaAjMdodSSkevp0KIXWDoKUMcpy3mwmyMl3ysyy/ObSL2As6zD807DyZbepm 1JQP2qOxSSKwTPOYOWV6SyAOKkelNGicp2uqDat0w2iJcZWOs6cGvaOoaxYYhPXsx1W7 BYc5gOdnekX6QdSr8I92FSstxc7KruM6HlAAfxqjSb7pTVbz46821455oYPX378puQTy 3asL8dVAksx+g4aOfZLhfCIXBZvqv1amCG3cJu5f+TvPVF6IQ2dUUV7jTdJitrAV7aJZ B0s+lW5VKaE+F1Mt7kPL1lEbmnqQGxeUDJ+HDmAf1JmRDUU2z3V9b3MtzR5kTPd1fvaW R+VQ== X-Gm-Message-State: ABy/qLbGGOzXK+VI6nZHzojNurotedZxy6H2VphNI613aYYsJgWqLDU6 ZlVT6ACl20H43BBnw0xnA6w+czvD2QNzvOq69rM= X-Google-Smtp-Source: APBJJlEsgvSGf/skpYfPLHk0GS7Ila2uC3awEhNz0RneXszVRg5tSPoda0gS7YQfq8PtIaN9HxZWgZUmIt6j6movh/U= X-Received: by 2002:a5d:5510:0:b0:317:54e2:26ca with SMTP id b16-20020a5d5510000000b0031754e226camr321754wrv.50.1690411092857; Wed, 26 Jul 2023 15:38:12 -0700 (PDT) MIME-Version: 1.0 References: <20230726160524.1955013-1-skpgkp2@gmail.com> In-Reply-To: From: Sunil Pandey Date: Wed, 26 Jul 2023 15:37:36 -0700 Message-ID: Subject: Re: [PATCH] x86_64: Optimize ffsll function code size. To: Noah Goldstein Cc: Richard Henderson , libc-alpha@sourceware.org, hjl.tools@gmail.com Content-Type: multipart/alternative; boundary="000000000000b6d7e606016b8026" X-Spam-Status: No, score=-8.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,GIT_PATCH_0,HK_RANDOM_ENVFROM,HK_RANDOM_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --000000000000b6d7e606016b8026 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, Jul 26, 2023 at 2:05=E2=80=AFPM Noah Goldstein wrote: > On Wed, Jul 26, 2023 at 3:44=E2=80=AFPM Sunil Pandey = wrote: > > > > > > > > On Wed, Jul 26, 2023 at 10:00=E2=80=AFAM Noah Goldstein > wrote: > >> > >> On Wed, Jul 26, 2023 at 11:52=E2=80=AFAM Sunil Pandey via Libc-alpha > >> wrote: > >> > > >> > On Wed, Jul 26, 2023 at 9:38=E2=80=AFAM Richard Henderson < > >> > richard.henderson@linaro.org> wrote: > >> > > >> > > On 7/26/23 09:05, Sunil K Pandey via Libc-alpha wrote: > >> > > > Ffsll function size is 17 byte, this patch optimizes size to 16 > byte. > >> > > > Currently ffsll function randomly regress by ~20%, depending on > how > >> > > > code get aligned. > >> > > > > >> > > > This patch fixes ffsll function random performance regression. > >> > > > --- > >> > > > sysdeps/x86_64/ffsll.c | 2 +- > >> > > > 1 file changed, 1 insertion(+), 1 deletion(-) > >> > > > > >> > > > diff --git a/sysdeps/x86_64/ffsll.c b/sysdeps/x86_64/ffsll.c > >> > > > index a1c13d4906..dbded6f0a1 100644 > >> > > > --- a/sysdeps/x86_64/ffsll.c > >> > > > +++ b/sysdeps/x86_64/ffsll.c > >> > > > @@ -29,7 +29,7 @@ ffsll (long long int x) > >> > > > long long int tmp; > >> > > > > >> > > > asm ("bsfq %2,%0\n" /* Count low bits in X and > store > >> > > in %1. */ > >> > > > - "cmoveq %1,%0\n" /* If number was zero, use > -1 as > >> > > result. */ > >> > > > + "cmove %k1,%k0\n" /* If number was zero, use -1 as > result. > >> > > */ > >> > > > >> > > This no longer produces -1, but 0xffffffff in cnt. However, since > the > >> > > return type is > >> > > 'int', cnt need not be 'long long int' either. I'm not sure why t= mp > >> > > exists at all, since > >> > > cnt is the only register modified. > >> > > > >> > > >> > Here is the exact assembly produced with this change. > >> > ./build-x86_64-linux/string/ffsll.o: file format elf64-x86-64 > >> > > >> > > >> > Disassembly of section .text: > >> > > >> > 0000000000000000 : > >> > 0: ba ff ff ff ff mov $0xffffffff,%edx > >> > 5: 48 0f bc c7 bsf %rdi,%rax > >> > 9: 0f 44 c2 cmove %edx,%eax > >> > c: 83 c0 01 add $0x1,%eax > >> > f: c3 ret > >> > > >> > >> FWIW it should be: > >> ``` > >> 0000000000000000 <.text>: > >> 0: b8 ff ff ff ff mov $0xffffffff,%eax > >> 5: 48 0f bc c7 bsf %rdi,%rax > >> 9: ff c0 inc %eax > >> ``` > >> > >> And since its in inline asm no reason not to get that. > > > > > > We shouldn't remove cmove because as per Intel BSF instruction manual, > if the content of source operand > > is 0, the content of the destination operand is undefined. Also > removing cmove doesn't provide any perf > > advantage. > > We rely on that behavior in other areas (memchr-evex512 for example). > It's been confirmed it is defined to not changed the destination (like > AMD). > > It saves an instructions.. > > Either way, however, I think adhemerval is right and we should use builti= ns > for this. > Few things to notice here. This code is more than 20 years old, not sure what all intel processors this code may be running. Builtin will have the same issue, unless it can produce code 16 byte or less. Also the purpose of this patch is to fix random unexpected ~20% perf loss. > > > > >> > >> > > >> > > > >> > > > >> > > r~ > >> > > > --000000000000b6d7e606016b8026--