From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mayshao-oc
To: Noah Goldstein
CC: "H.J. Lu", GNU C Library, Florian Weimer, "Carlos O'Donell", "Louis Qi(BJ-RD)"
Subject: Re: Re: [PATCH v1 3/6] x86: Remove mem{move|cpy}-ssse3
Date: Thu, 31 Mar 2022 04:54:55 +0000
Message-ID: <991a21f7e3784cf99c1e725347acbea4@zhaoxin.com>
References: <89bb3f1942814671ae858dcef4b3b870@zhaoxin.com> <09816f3ba25043339d57121bbae3d991@zhaoxin.com> <239c5445e4ea4c55b85a5b7db70983a2@zhaoxin.com>
List-Id: Libc-alpha mailing list

On Thu, Mar 31, 2022 at 11:47 AM Noah Goldstein wrote:
> On Wed, Mar 30, 2022 at 10:34 PM Mayshao-oc wrote:
> >
> > On Thu, Mar 31, 2022 at 12:45 AM Noah Goldstein wrote:
> > >
> > > On Wed, Mar 30, 2022 at 4:57 AM Mayshao-oc wrote:
> > > >
> > > > On Tue, Mar 29, 2022 at 10:57 AM Noah Goldstein wrote:
> > > > >
> > > > > On Mon, Mar 28, 2022 at 9:51 PM Mayshao-oc wrote:
> > > > > >
> > > > > > On Mon, Mar 28, 2022 at 9:07 PM H.J. Lu wrote:
> > > > > > >
> > > > > > > On Mon, Mar 28, 2022 at 1:10 AM Mayshao-oc wrote:
> > > > > > > >
> > > > > > > > On Fri, Mar 25, 2022 at 6:36 PM Noah Goldstein wrote:
> > > > > > > > >
> > > > > > > > > With SSE2, SSE4.1, AVX2, and EVEX versions very few targets prefer
> > > > > > > > > SSSE3. As a result it's no longer worth the code size cost.
> > > > > > > > > ---
> > > > > > > > >  sysdeps/x86_64/multiarch/Makefile          |    2 -
> > > > > > > > >  sysdeps/x86_64/multiarch/ifunc-impl-list.c |   15 -
> > > > > > > > >  sysdeps/x86_64/multiarch/ifunc-memmove.h   |   18 +-
> > > > > > > > >  sysdeps/x86_64/multiarch/memcpy-ssse3.S    | 3151 --------------------
> > > > > > > > >  sysdeps/x86_64/multiarch/memmove-ssse3.S   |    4 -
> > > > > > > > >  5 files changed, 7 insertions(+), 3183 deletions(-)
> > > > > > > > >  delete mode 100644 sysdeps/x86_64/multiarch/memcpy-ssse3.S
> > > > > > > > >  delete mode 100644 sysdeps/x86_64/multiarch/memmove-ssse3.S
> > > > > > > >
> > > > > > > > On some platforms, such as Zhaoxin, the memcpy performance of SSSE3
> > > > > > > > is better than that of AVX2, and current computer systems have
> > > > > > > > sufficient disk capacity and memory capacity.
> > > > > > >
> > > > > > > How does the SSSE3 version compare against the SSE2 version?
> > > > > >
> > > > > > On some Zhaoxin processors, the overall performance of SSSE3 is about
> > > > > > 10% higher than that of SSE2.
> > > > > >
> > > > > >
> > > > > > Best Regards,
> > > > > > May Shao
> > > > >
> > > > > Any chance you can post the result from running `bench-memset` or some
> > > > > equivalent benchmark? Curious where the regressions are. Ideally we would
> > > > > fix the SSE2 version so it's optimal.
> > > >
> > > > Bench-memcpy on the Zhaoxin KX-6000 processor shows that, when length <= 4
> > > > or length >= 128, memcpy SSSE3 can achieve an average performance
> > > > improvement of 25% compared to SSE2.
> > >
> > > Thanks
> > >
> > > The size <= 4 regression is expected, as profiles of SPEC show the
> > > [5, 32] sized copies to be significantly hotter.
> > >
> > > Regarding the large sizes, it seems to be because the SSSE3 version
> > > avoids unaligned loads/stores much more aggressively.
> >
> > Agree.
> >
> > > For now we will keep the function. Will look into a replacement that
> > > isn't so costly to code size.
> >
> > Thanks very much for your support.
>
> Will SSE4.1 be an issue for you? I think the only reasonable way to fix
> this is with `pshufb`.

Zhaoxin supports SSE4.1, so I think there should be no problem.
If you have a ready patch, I'd love to try it soon.
Thanks again.

> > > Out of curiosity, is bench-memcpy-random performance also improved with
> > > SSSE3? The jump table / branches generally look really nice in
> > > micro-benchmarks but that may not be fully indicative of how it will
> > > perform in an application.
> >
> > Bench-memcpy-random shows about a 5% performance drop for SSSE3:
>
> Thanks.
>
> >                  __memcpy_sse2_unaligned  __memcpy_ssse3  Improvement(ssse3 over sse2)
> > length=32768                      805982          874585  -8.51%
> > length=65536                      885317          940458  -6.23%
> > length=131072                     929177          979173  -5.38%
> > length=262144                     980083         1033130  -5.41%
> > length=524288                    1042590         1095560  -5.08%
> > length=1048576                   1078020         1127990  -4.64%
> >
> > > >
> > > > I have attached the test results, hope this is what you want to see.
> > > > > > > >
> > > > > > > > It is strongly recommended to keep the SSSE3 version.
> > > > > > >
> > > > > > > --
> > > > > > > H.J.
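
The "Improvement(ssse3 over sse2)" column in the bench-memcpy-random table above appears to be the relative difference (sse2 - ssse3) / sse2, which fits the figures if they are timings where lower is better. That reading is an assumption, not something stated in the thread. A minimal C sketch, with `improvement_pct` as a hypothetical helper name that is not part of glibc:

```c
/* Hypothetical helper, not part of glibc: relative improvement of
   `candidate` over `baseline`, in percent.  Assumes both values are
   benchmark timings, so a larger candidate yields a negative result.  */
double
improvement_pct (double baseline, double candidate)
{
  return (baseline - candidate) / baseline * 100.0;
}
```

Under that assumption, `improvement_pct (805982, 874585)` evaluates to roughly -8.51, which matches the length=32768 row.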