Date: Sun, 2 Oct 2022 11:35:19 +0200
From: Aurelien Jarno
To: Noah Goldstein
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH 0/4] x86: Fix AVX2 string functions requiring BMI2 or LZCNT (BZ #29611)
References: <20221001190911.2994478-1-aurelien@aurel32.net>

On 2022-10-01 15:17, Noah Goldstein via Libc-alpha wrote:
> On Sat, Oct 1, 2022 at 3:11 PM Noah Goldstein wrote:
> >
> > On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno wrote:
> > >
> > > Some early Intel Haswell CPUs have AVX2 instructions, but do not have
> > > BMI2 instructions. Some AVX2 string functions only check for AVX2, but
> > > use BMI2 or LZCNT instructions. This patchset tries to fix that.
> > >
> > > While most fixes only change ifunc-impl-list.c, and thus only concern
> > > the testsuite, the strn(case)cmp is a real issue affecting early Intel
> > > Haswell CPUs, reported to affect Debian Sid and Fedora Rawhide.
> > >
> > > On the other hand, the check for LZCNT in memrchr is purely for
> > > correctness, I am not aware of a CPU implementing AVX2 without LZCNT.
> > >
> > > This has been tested by replacing all BMI2 and LZCNT instructions in the
> > > source code by the "ud2" instruction and disabling the BMI1, BMI2
> > > feature detection, and running the testsuite.
> > >
> > > Resolves: BZ #29611
> > >
> > > Aurelien Jarno (4):
> > >   x86: include BMI1 and BMI2 in x86-64-v3 level
> > >   x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp
> > >     implementations
> > >   x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations
> > >   x86-64: Require LZCNT for AVX2 memrchr implementation
> > >
> >
> > We also need a BMI2 check in ifunc-impl-list for:
> > strcasecmp
> > strcmp
> > strcasecmp_l
> > strrchr
> > wcsrchr
> > wcscmp
> >
> > If you want you can make patches, otherwise I can.
>
> This is a duplicate of a comment I left in the strn(case)cmp patchset,
> but leaving here so the information is not scattered:
>
> The ifunc change in strncmp.c and ifunc-strcasecmp.h need to be backported
> to 2.33, 2.34, 2.35.
>
> Also separate changes for ifunc need to be backported to strncmp.c:
> 2.32, 2.31, 2.30, 2.29, 2.28 for a `tzcnt` usage that needs
> BMI1.

Is that really correct? According to the commit log, TZCNT is used in a way
that is compatible with BSF:

commit 1457016337072d1b6739f571846b619596990cb7
Author: Leonardo Sandoval
Date:   Thu May 3 11:09:30 2018 -0500

    x86-64: Optimize strcmp/wcscmp and strncmp/wcsncmp with AVX2

    Optimize x86-64 strcmp/wcscmp and strncmp/wcsncmp with AVX2. It uses vector
    comparison as much as possible. Peak performance observed on a SkyLake
    machine: 9x, 3x, 2.5x and 5.5x for strcmp, strncmp, wcscmp and wcsncmp,
    respectively. The larger the comparison length, the more benefit using
    avx2 functions, except on the strcmp, where peak is observed at length
    == 32 bytes. Select AVX2 strcmp/wcscmp on AVX2 machines where vzeroupper
    is preferred and AVX unaligned load is fast.

    NB: It uses TZCNT instead of BSF since TZCNT produces the same result
    as BSF for non-zero input. TZCNT is faster than BSF and is executed
    as BSF if machine doesn't support TZCNT.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net