From: Noah Goldstein
Date: Wed, 1 Nov 2023 17:17:36 -0500
Subject: Re: x86: Prepare `strrchr-evex` and `strrchr-evex512` for AVX10
To: Florian Weimer
Cc: Florian Weimer, libc-alpha@sourceware.org, hjl.tools@gmail.com, carlos@systemhalted.org, Sunil Pandey
References: <20230921143837.2903914-1-goldstein.w.n@gmail.com> <20231004184855.3517478-1-goldstein.w.n@gmail.com> <87fs28gug9.fsf@oldenburg.str.redhat.com> <877cn1tcb7.fsf@mid.deneb.enyo.de>

On Wed, Nov 1, 2023 at 4:22 PM Noah Goldstein wrote:
>
> On Wed, Nov 1, 2023 at 4:11 PM Noah Goldstein wrote:
> >
> > On Wed, Nov 1, 2023 at 4:04 PM Florian Weimer wrote:
> > >
> > > * Florian Weimer:
> > >
> > > > * Noah Goldstein:
> > > >
> > > >> This commit refactors `strrchr-evex` and `strrchr-evex512` to use a
> > > >> common implementation: `strrchr-evex-base.S`.
> > > >>
> > > >> The motivation is `strrchr-evex` needed to be refactored to not use
> > > >> 64-bit masked registers in preparation for AVX10.
> > > >>
> > > >> Once vec-width masked register combining was removed, the EVEX and
> > > >> EVEX512 implementations can easily be implemented in the same file
> > > >> without any major overhead.
> > > >>
> > > >> The net result is performance improvements (measured on TGL) for both
> > > >> `strrchr-evex` and `strrchr-evex512`. Although, note there are some
> > > >> regressions in the test suite and it may be many of the cases that
> > > >> make the total-geomean of improvement/regression across bench-strrchr
> > > >> are cold. The point of the performance measurement is to show there
> > > >> are no major regressions, but the primary motivation is preparation
> > > >> for AVX10.
> > > >>
> > > >> Benchmarks were taken on TGL:
> > > >> https://www.intel.com/content/www/us/en/products/sku/213799/intel-core-i711850h-processor-24m-cache-up-to-4-80-ghz/specifications.html
> > > >>
> > > >> EVEX    geometric_mean(N=5) of all benchmarks New / Original: 0.74
> > > >> EVEX512 geometric_mean(N=5) of all benchmarks New / Original: 0.87
> > > >>
> > > >> Full check passes on x86.
> > > >
> > > > I believe this caused some sort of regression because when we upgraded
> > > > glibc in the Fedora rawhide buildroot, a lot of things started failing:
> > > >
> > > >   glibc-2.38.9000-13.fc40 broke rawhide buildroot on x86_64
> > > >
> > > >
> > > > The list of changes relative to the previous version is rather short:
> > > >
> > > > - stdlib: fix grouping verification with multi-byte thousands separator (bug 30964)
> > > > - build-many-glibcs: Check for required system tools
> > > > - x86: Prepare `strrchr-evex` and `strrchr-evex512` for AVX10
> > > > - aarch64: Optimise vecmath logs
> > > > - aarch64: Cosmetic change in SVE exp routines
> > > > - aarch64: Optimize SVE cos & cosf
> > > > - aarch64: Improve vecmath sin routines
> > > > - nss: Get rid of alloca usage in makedb's write_output.
> > > > - debug: Add regression tests for BZ 30932
> > > > - Fix FORTIFY_SOURCE false positive
> > > > - nss: Rearrange and sort Makefile variables
> > > > - inet: Rearrange and sort Makefile variables
> > > > - Fix off-by-one OOB write in iconv/tst-iconv-mt
> > > >
> > > > And this patch is the most likely one to cause issues.  I will try to
> > > > revert the patch and see if it fixes the observed issues.
> > >
> > > We did the revert and the issues were gone.  So I think this commit is
> > > faulty.
> >
> > Bah, didn't see your last email.
> > Thank you for reverting. Will look into the issue.
>
> Okay bug is missing VBMI2 check. But the VBMI2 stuff
> isn't really needed so will update and repost w/ fixed ISA.

Posted fix.
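
For readers following the thread, here is a rough sketch of what a "missing VBMI2 check" means in a feature-gated dispatcher. It is not glibc's ifunc selector: the feature bits, the table, the variant names (including `strrchr_evex_vbmi2`), and the per-variant requirements are all hypothetical and only illustrate the idea. A variant may be picked only when every ISA extension it uses is present on the CPU; leaving one bit out of `required` lets the variant run on hardware that lacks the instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits for illustration; NOT glibc's internal names.  */
enum cpu_feature
{
  FEAT_AVX2        = 1u << 0,
  FEAT_BMI2        = 1u << 1,
  FEAT_AVX512VL    = 1u << 2,
  FEAT_AVX512BW    = 1u << 3,
  FEAT_AVX512VBMI2 = 1u << 4
};

struct impl
{
  const char *name;   /* hypothetical variant name */
  uint32_t required;  /* every bit here must be set in the CPU's features */
};

/* Ordered from most to least specialized.  The last entry requires
   nothing and acts as the baseline fallback.  The requirement sets are
   assumptions made for this sketch.  */
const struct impl strrchr_impls[] =
{
  { "strrchr_evex_vbmi2",
    FEAT_AVX512VL | FEAT_AVX512BW | FEAT_BMI2 | FEAT_AVX512VBMI2 },
  { "strrchr_evex", FEAT_AVX512VL | FEAT_AVX512BW | FEAT_BMI2 },
  { "strrchr_avx2", FEAT_AVX2 | FEAT_BMI2 },
  { "strrchr_sse2", 0 }
};

/* Return the first table entry whose requirements are all satisfied.  */
const struct impl *
select_impl (const struct impl *table, size_t n, uint32_t cpu_features)
{
  /* The table must end with an entry that needs no features.  */
  assert (n > 0 && table[n - 1].required == 0);

  for (size_t i = 0; i < n; i++)
    if ((cpu_features & table[i].required) == table[i].required)
      return &table[i];

  return &table[n - 1];  /* not reached while the assert above holds */
}
```

With this table, a CPU that reports AVX512VL, AVX512BW and BMI2 but not VBMI2 gets the second entry rather than the first. The other way to fix such a bug, and the one described in the message above, is to stop using the extra extension in the code so that no additional check is needed.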