From: Noah Goldstein
Date: Sun, 2 Oct 2022 09:19:32 -0700
Subject: Re: [PATCH 0/4] x86: Fix AVX2 string functions requiring BMI2 or LZCNT (BZ #29611)
To: Noah Goldstein, libc-alpha@sourceware.org
References: <20221001190911.2994478-1-aurelien@aurel32.net>

On Sun, Oct 2, 2022 at 2:35 AM Aurelien Jarno wrote:
>
> On 2022-10-01 15:17, Noah Goldstein via Libc-alpha wrote:
> > On Sat, Oct 1, 2022 at 3:11 PM Noah Goldstein wrote:
> > >
> > > On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno wrote:
> > > >
> > > > Some early Intel Haswell CPUs have AVX2 instructions, but do not have
> > > > BMI2 instructions. Some AVX2 string functions only check for AVX2, but
> > > > use BMI2 or LZCNT instructions. This patchset tries to fix that.

Think you're right.

> > > >
> > > > While most fixes only change ifunc-impl-list.c, and thus only concern
> > > > the testsuite, the strn(case)cmp one is a real issue affecting early
> > > > Intel Haswell CPUs, reported to affect Debian Sid and Fedora Rawhide.
> > > >
> > > > On the other hand, the check for LZCNT in memrchr is purely for
> > > > correctness; I am not aware of a CPU implementing AVX2 without LZCNT.
> > > >
> > > > This has been tested by replacing all BMI2 and LZCNT instructions in the
> > > > source code by the "ud2" instruction and disabling the BMI1, BMI2
> > > > feature detection, and running the testsuite.
> > > >
> > > > Resolves: BZ #29611
> > > >
> > > > Aurelien Jarno (4):
> > > >   x86: include BMI1 and BMI2 in x86-64-v3 level
> > > >   x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp
> > > >     implementations
> > > >   x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations
> > > >   x86-64: Require LZCNT for AVX2 memrchr implementation
> > > >
> > >
> > > We also need BMI2 check in ifunc-impl-list for:
> > > strcasecmp
> > > strcmp
> > > strcasecmp_l
> > > strrchr
> > > wcsrchr
> > > wcscmp
> > >
> > > If you want you can make patches, otherwise I can.
> >
> > This is a duplicate of a comment I left in the strn(case)cmp patchset,
> > but leaving here so the information is not scattered:
> >
> > The ifunc changes in strncmp.c and ifunc-strcasecmp.h need to be
> > backported to 2.33, 2.34, 2.35.
> >
> > Also, separate ifunc changes to strncmp.c need to be backported to
> > 2.32, 2.31, 2.30, 2.29, 2.28 for a `tzcnt` usage that needs
> > BMI1.
>
> Is that really correct? According to the commit log TZCNT is used in a way
> that is compatible with BSF:
>
> commit 1457016337072d1b6739f571846b619596990cb7
> Author: Leonardo Sandoval
> Date: Thu May 3 11:09:30 2018 -0500
>
>     x86-64: Optimize strcmp/wcscmp and strncmp/wcsncmp with AVX2
>
>     Optimize x86-64 strcmp/wcscmp and strncmp/wcsncmp with AVX2. It uses vector
>     comparison as much as possible. Peak performance observed on a SkyLake
>     machine: 9x, 3x, 2.5x and 5.5x for strcmp, strncmp, wcscmp and wcsncmp,
>     respectively. The larger the comparison length, the more benefit using
>     avx2 functions, except on the strcmp, where peak is observed at length
>     == 32 bytes. Select AVX2 strcmp/wcscmp on AVX2 machines where vzeroupper
>     is preferred and AVX unaligned load is fast.
>
>     NB: It uses TZCNT instead of BSF since TZCNT produces the same result
>     as BSF for non-zero input. TZCNT is faster than BSF and is executed
>     as BSF if machine doesn't support TZCNT.
>
> --
> Aurelien Jarno GPG: 4096R/1DDD8C9B
> aurelien@aurel32.net http://www.aurel32.net
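
For reference, a minimal sketch of the BSF/TZCNT point made in the quoted commit log. These are portable C models written for illustration, not glibc code, and the function names (model_tzcnt32, model_bsf32, models_agree_on_nonzero) are hypothetical. TZCNT is encoded as BSF with an F3 prefix, which is why a CPU without BMI1 executes it as BSF. The sketch models only the destination value, not the flags, and the sample masks are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of TZCNT on a 32-bit operand: the number of
   trailing zero bits, defined as 32 (the operand size) for zero.  */
static unsigned int
model_tzcnt32 (uint32_t x)
{
  unsigned int count = 0;

  if (x == 0)
    return 32;
  while ((x & 1) == 0)
    {
      x >>= 1;
      count++;
    }
  return count;
}

/* Hypothetical model of BSF on a 32-bit operand: the index of the
   lowest set bit.  The destination is undefined for a zero input, so
   the model rejects that case instead of inventing a value.  */
static unsigned int
model_bsf32 (uint32_t x)
{
  unsigned int index = 0;

  assert (x != 0);
  while ((x & 1) == 0)
    {
      x >>= 1;
      index++;
    }
  return index;
}

/* Return 1 if both models agree on a few non-zero masks of the kind a
   vector compare followed by a movemask could produce, 0 otherwise.  */
static int
models_agree_on_nonzero (void)
{
  static const uint32_t masks[] =
    { 0x00000001u, 0x80000000u, 0x00010000u, 0xfffffff0u, 0x00ff00f8u };
  size_t i;

  for (i = 0; i < sizeof masks / sizeof masks[0]; i++)
    if (model_tzcnt32 (masks[i]) != model_bsf32 (masks[i]))
      return 0;
  return 1;
}
```

models_agree_on_nonzero () returns 1, while model_tzcnt32 (0) returns 32, the one input for which BSF gives no defined result. Whether the older strncmp.c code can ever pass a zero mask to `tzcnt`, or depends on the flags it sets, is not something this sketch settles.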