From: Xi Ruoyao
To: Jeff Law, "gcc-patches@gcc.gnu.org"
Cc: Jivan Hakobyan
Date: Mon, 20 Nov 2023 10:23:46 +0800
Subject: Re: [RFA] New pass for sign/zero extension elimination
In-Reply-To: <6d5f8ba7-0c60-4789-87ae-68617ce6ac2c@ventanamicro.com>

On Sun, 2023-11-19 at 17:47 -0700, Jeff Law wrote:
> This is work originally started by Joern @ Embecosm.
>
> There's been a long standing sense that we're generating too many
> sign/zero extensions on the RISC-V port.  REE is useful, but it's really
> focused on a relatively narrow part of the extension problem.
>
> What Joern's patch does is introduce a new pass which tracks liveness of
> chunks of pseudo regs.  Specifically it tracks bits 0..7, 8..15, 16..31
> and 32..63.
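
(Illustrative note, not part of the patch: the chunk scheme quoted above can be modelled roughly as below. The names CHUNKS, chunks_for_bits and chunks_for_width are hypothetical and exist only for this sketch; the real pass works on GCC bitmaps and RTL, not Python sets.)

```python
# Hypothetical model of the four liveness chunks named in the quoted text.
# Each entry is an inclusive (low_bit, high_bit) range of a 64-bit pseudo.
CHUNKS = [(0, 7), (8, 15), (16, 31), (32, 63)]


def chunks_for_bits(lo, hi):
    """Return the indices of every chunk overlapped by bits lo..hi (inclusive)."""
    return {i for i, (c_lo, c_hi) in enumerate(CHUNKS) if lo <= c_hi and hi >= c_lo}


def chunks_for_width(width):
    """Return the chunks covered by the low `width` bits of a register."""
    return chunks_for_bits(0, width - 1)
```

Under this model a 32-bit value occupies chunks 0, 1 and 2, so an extension from 32 to 64 bits only adds information in chunk 3 (bits 32..63).
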
>
> If it encounters a sign/zero extend that sets bits that are never read,
> then it replaces the sign/zero extension with a narrowing subreg.  The
> narrowing subreg usually gets eliminated by subsequent passes (it's just
> a copy after all).
>
> Jivan has done some analysis and found that it eliminates roughly 1% of
> the dynamic instruction stream for x264 as well as some redundant
> extensions in the coremark benchmark (both on rv64).  In my own testing
> as I worked through issues on other architectures I clearly saw it
> helping in various places within GCC itself or in the testsuite.
>
> The basic structure is to first do a fairly standard liveness analysis
> on the chunks, seeding original state with the liveness data from DF.
> Once that's stable, we do a final pass to identify the useless
> extensions and transform them into narrowing subregs.
>
> A few key points to remember.
>
> For destination processing it is always safe to ignore a destination.
> Ignoring a destination merely means that whatever was live after the
> given insn will continue to be live before the insn.  What is not safe
> is to clear a bit in the LIVENOW bitmap for a destination chunk that is
> not set.  This comes into play with things like STRICT_LOW_PART.
>
> For source processing the safe thing to do is to set all the chunks in a
> register as live.  It is never safe to fail to process a source operand.
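
(Illustrative note, not part of the patch: a minimal sketch of the two safety rules quoted above and of the "useless extension" test, assuming liveness is kept as a set of (register, chunk) pairs. step_backward, extension_is_useless and the partial_write flag are hypothetical names for this sketch; the actual ext-dce.cc code may be organised quite differently.)

```python
# Chunk indices 0..3 stand for bits 0..7, 8..15, 16..31 and 32..63.
ALL_CHUNKS = frozenset(range(4))


def step_backward(livenow, dest, written_chunks, sources, partial_write=False):
    """Return the live set before an insn, given the live set after it.

    livenow is a set of (register, chunk) pairs.
    """
    live = set(livenow)
    if dest is not None and not partial_write:
        # Only chunks the insn definitely overwrites may be cleared.
        live -= {(dest, c) for c in written_chunks}
    # Otherwise the destination is ignored, which is always safe: whatever
    # was live after the insn simply stays live before it.
    for reg in sources:
        # Conservative source handling: mark every chunk of the source live.
        live |= {(reg, c) for c in ALL_CHUNKS}
    return live


def extension_is_useless(live_after, dest, source_chunks):
    """True when no chunk filled in by the extension is read afterwards."""
    extended = ALL_CHUNKS - frozenset(source_chunks)
    return all((dest, c) not in live_after for c in extended)
```

For example, if only chunks 0 and 1 of r1 are live after "r1 = sign_extend (r2)" from 32 bits, extension_is_useless returns True, because chunk 3 is never read; that is the case the pass would turn into a narrowing subreg.
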
>
> When a destination object is not fully live, we try to transfer that
> limited liveness to the source operands.  So for example if bits 16..63
> are dead in a destination of a PLUS, we need not mark bits 16..63 as
> live for the source operands.  We have to be careful -- consider a shift
> count on a target without SHIFT_COUNT_TRUNCATED set.  So we have both a
> list of RTL codes where we can transfer liveness and a few codes where
> one of the operands may need to be fully live (ex, a shift count) while
> the other input may not need to be fully live (value left shifted).
>
> Locally we have had this enabled at -O1 and above to encourage testing,
> but I'm thinking that for the trunk enabling at -O2 and above is the
> right thing to do.
>
> This has (of course) been tested on rv64.  It's also been bootstrapped
> and regression tested on x86.  Bootstrap and regression tested (C only)
> for m68k, sh4, sh4eb, alpha.  Earlier versions were also bootstrapped
> and regression tested on ppc, hppa and s390x (C only for those as well).
> It's also been tested on the various crosses in my tester.  So we've
> got reasonable coverage of 16, 32 and 64 bit targets, big and little
> endian, with and without SHIFT_COUNT_TRUNCATED and all kinds of other
> oddities.
>
> The included tests are for RISC-V only because not all targets are going
> to have extraneous extensions.  There's tests from coremark, x264 and
> GCC's bz database.  It probably wouldn't be hard to add aarch64
> testcases.  The BZs listed are improved by this patch for aarch64.
>
> Given the amount of work Jivan and I have done, I'm not comfortable
> self-approving at this time.  I'd much rather have another set of eyes
> on the code.  Hopefully the code is documented well enough for that to
> be a useful exercise.
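
(Illustrative note, not part of the patch: the liveness-transfer rule quoted above, PLUS versus a shift count, might be sketched as follows. source_liveness and low_chunks_through are hypothetical names. The sketch assumes that a low result bit of an addition or left shift depends only on source bits at the same or lower positions, so the sources need every chunk up to the highest live destination chunk; whether the patch uses exactly this rule is an assumption.)

```python
# Chunk indices 0..3 stand for bits 0..7, 8..15, 16..31 and 32..63.
ALL_CHUNKS = frozenset(range(4))


def low_chunks_through(live_dest):
    """Chunks 0..max(live_dest), or nothing if the destination is fully dead."""
    return frozenset(range(max(live_dest) + 1)) if live_dest else frozenset()


def source_liveness(code, live_dest):
    """Return [live chunks of operand 0, live chunks of operand 1].

    Only binary operations are modelled in this sketch.
    """
    if code == "plus":
        # Carries move upward only, so dead high chunks stay dead in both inputs.
        need = low_chunks_through(live_dest)
        return [need, need]
    if code == "ashift":
        # The shifted value gets limited liveness; the count stays fully live,
        # which is the conservative choice without SHIFT_COUNT_TRUNCATED.
        return [low_chunks_through(live_dest), ALL_CHUNKS]
    # Any code not known to allow transfer: treat both sources as fully live.
    return [ALL_CHUNKS, ALL_CHUNKS]
```

With bits 16..63 dead in the destination of a PLUS, source_liveness("plus", frozenset({0, 1})) asks only for chunks 0 and 1 of each operand, matching the example in the quoted text.
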
>
> So, no need to work from Pago Pago for this patch.  I may make another
> attempt at the eswin conditional move work while working virtually in
> Pago Pago though.
>
> Thoughts, comments, recommendations?

Unfortunately, I get an ICE when building stage 1 libgcc with this patch
on loongarch64-linux-gnu:

during RTL pass: ext_dce
../../../gcc/libgcc/libgcc2.c: In function ‘__absvdi2’:
../../../gcc/libgcc/libgcc2.c:224:1: internal compiler error: Segmentation fault
  224 | }
      | ^
0x120baa477 crash_signal
	../../gcc/gcc/toplev.cc:316
0x1216aeeb4 ext_dce_process_sets
	../../gcc/gcc/ext-dce.cc:128
0x1216afbaf ext_dce_process_bb
	../../gcc/gcc/ext-dce.cc:647
0x1216afbaf ext_dce
	../../gcc/gcc/ext-dce.cc:802
0x1216afbaf execute
	../../gcc/gcc/ext-dce.cc:868
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See for instructions.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University