From: "H.J. Lu"
Date: Fri, 01 Jun 2018 14:46:00 -0000
Subject: Re: [PATCH v2] x86-64: Optimize strcmp/wcscmp with AVX2
To: Leonardo Sandoval
Cc: GNU C Library
In-Reply-To: <20180529185339.11541-1-leonardo.sandoval.gonzalez@linux.intel.com>
References: <20180529185339.11541-1-leonardo.sandoval.gonzalez@linux.intel.com>
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm

On Tue, May 29, 2018 at 11:53 AM, Leonardo Sandoval wrote:
> From: Leonardo Sandoval
>
> Optimize x86-64 strcmp/strncmp/wcscmp/wcsncmp with AVX2. It uses vector
> comparison as much as possible. Peak performance observed on a SkyLake
> machine: 9x, 3x, 2.5x and 5.5x for strcmp, strncmp, wcscmp and wcsncmp,
> respectively. The larger the comparison length, the more benefit using
> avx2 functions, except on the strcmp, where peak is observed at length
> == 32 bytes. Select AVX2 strcmp/wcscmp on AVX2 machines where vzeroupper
> is preferred and AVX unaligned load is fast.
>
> NB: It uses TZCNT instead of BSF since TZCNT produces the same result
> as BSF for non-zero input. TZCNT is faster than BSF and is executed
> as BSF if machine doesn't support TZCNT.
>
> * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
> strcmp-avx2, strncmp-avx2, wcscmp-avx2, wcscmp-sse2, wcsncmp-avx2 and
> wcsncmp-sse2.
> * sysdeps/x86_64/multiarch/ifunc-impl-list.c
> (__libc_ifunc_impl_list): Add tests for __strcmp_avx2,
> __strncmp_avx2, __wcscmp_avx2, __wcsncmp_avx2, __wcscmp_sse2
> and __wcsncmp_sse2.
> * sysdeps/x86_64/multiarch/strcmp.c (OPTIMIZE (avx2)):
> (IFUNC_SELECTOR): Return OPTIMIZE (avx2) on AVX 2 machines if
> AVX unaligned load is fast and vzeroupper is preferred.
> * sysdeps/x86_64/multiarch/strncmp.c: Likewise.
> * sysdeps/x86_64/multiarch/strcmp-avx2.S: New file.
> * sysdeps/x86_64/multiarch/strncmp-avx2.S: Likewise.
> * sysdeps/x86_64/multiarch/wcscmp-avx2.S: Likewise.
> * sysdeps/x86_64/multiarch/wcscmp-sse2.S: Likewise.
> * sysdeps/x86_64/multiarch/wcscmp.c: Likewise.
> * sysdeps/x86_64/multiarch/wcsncmp-avx2.S: Likewise.
> * sysdeps/x86_64/multiarch/wcsncmp-sse2.c: Likewise.
> * sysdeps/x86_64/multiarch/wcsncmp.c: Likewise.
> * sysdeps/x86_64/wcscmp.S (__wcscmp): Add alias only if __wcscmp
> is undefined.

Please mention strncmp and wcsncmp in commit subject.

OK with this change.

Thanks.

-- 
H.J.
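
For illustration, the approach described in the quoted commit message
(compare a whole chunk at once, then count trailing zeros of the mismatch
mask to locate the first differing byte) can be sketched in portable C.
This is a minimal scalar emulation under stated assumptions, not the AVX2
assembly from the patch: chunk_mask and sketch_strcmp are hypothetical
names, the 32-byte chunk stands in for one YMM register, and
__builtin_ctz (a GCC/Clang builtin) stands in for TZCNT.

```c
#include <stdint.h>

/* Hypothetical helper: scalar stand-in for one 32-byte vector compare.
   Bit i of the result is set when a[i] != b[i] or a[i] is the NUL
   terminator.  Scanning stops at the first such byte, because bytes
   after it may lie past the end of either string.  */
static uint32_t chunk_mask(const unsigned char *a, const unsigned char *b)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < 32; i++) {
        if (a[i] != b[i] || a[i] == '\0') {
            mask |= (uint32_t)1 << i;
            break;
        }
    }
    return mask;
}

/* Hypothetical sketch of a chunked strcmp.  Returns <0, 0 or >0 like
   strcmp.  The mask is checked for non-zero before __builtin_ctz is
   used, mirroring the note that TZCNT and BSF agree only for non-zero
   input.  */
static int sketch_strcmp(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    for (;;) {
        uint32_t mask = chunk_mask(a, b);
        if (mask != 0) {
            /* Index of the first mismatch or terminator in the chunk.  */
            unsigned idx = (unsigned)__builtin_ctz(mask);
            return (int)a[idx] - (int)b[idx];
        }
        /* All 32 bytes equal and non-NUL: advance to the next chunk.  */
        a += 32;
        b += 32;
    }
}
```

With these definitions, sketch_strcmp ("abc", "abd") returns a negative
value and sketch_strcmp ("abc", "abc") returns 0.  A real vector version
would build the whole mask with one compare instruction, which is where
the speedup reported above would come from.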