From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
From: Bernd Edlinger
To: "Richard Earnshaw (lists)", "gcc-patches@gcc.gnu.org"
CC: Ramana Radhakrishnan, Kyrill Tkachov, Wilco Dijkstra
Subject: [PING**2] [PATCH, ARM] correctly encode the CC reg data flow
Date: Sat, 29 Apr 2017 17:32:00 -0000
References: <3f5e5538-5dd3-b416-904f-b87f115336fe@arm.com>
X-SW-Source: 2017-04/txt/msg01564.txt.bz2

Ping...

On 04/20/17 20:11, Bernd Edlinger wrote:
> Ping...
>
> for this patch:
> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html
>
> On 01/18/17 16:36, Bernd Edlinger wrote:
>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>> Hi,
>>>>>>
>>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>>> one.
>>>>>>
>>>>>> When trying to split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>> before reload, a mis-compilation in libgcc function
>>>>>> __gnu_satfractdasq was discovered, see [1] for more details.
>>>>>>
>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>> up into this:
>>>>>>
>>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>>          (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>     (parallel [(set (reg:CC CC_REGNUM)
>>>>>>                     (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>                (set (match_dup 2)
>>>>>>                     (minus:SI (match_dup 5)
>>>>>>                               (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>
>>>>>>    [(set (reg:CC CC_REGNUM)
>>>>>>          (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>     (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>                (set (reg:CC CC_REGNUM)
>>>>>>                     (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>
>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>>> from before. Therefore the *arm_cmpsi_insn appears to be
>>>>>> redundant and thus got removed, because the data values are
>>>>>> identical.
>>>>>>
>>>>>> I think that applies to a number of similar patterns where data
>>>>>> flow is happening through the CC reg.
>>>>>>
>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>> independently from the optimization issue PR77308.
>>>>>>
>>>>>> Therefore I think the patterns need to specify the true
>>>>>> value that will be in the CC reg, in order for cse to
>>>>>> know what the instructions are really doing.
>>>>>>
>>>>>>
>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>> Is it OK for trunk?
>>>>>>
>>>>>
>>>>> I agree you've found a valid problem here, but I have some issues with
>>>>> the patch itself.
>>>>>
>>>>>
>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>>>         (compare:CC_NCV
>>>>>           (match_operand:DI 1 "register_operand" "r")
>>>>>           (match_operand:DI 2 "register_operand" "r")))
>>>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>         (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>   "TARGET_32BIT"
>>>>>   "#"
>>>>>   "&& reload_completed"
>>>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>>>                    (compare:CC (match_dup 1) (match_dup 2)))
>>>>>               (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>                    (compare:CC_C
>>>>>                      (zero_extend:DI (match_dup 4))
>>>>>                      (plus:DI (zero_extend:DI (match_dup 5))
>>>>>                               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>               (set (match_dup 3)
>>>>>                    (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>
>>>>>
>>>>> This pattern is now no longer self-consistent, in that before the split
>>>>> the overall result for the condition register is in mode CC_NCV, but
>>>>> afterwards it is just CC_C.
>>>>>
>>>>> I think CC_NCV is the correct mode (the N, C and V bits all correctly
>>>>> reflect the result of the 64-bit comparison), but that then implies
>>>>> that the cc mode of subsi3_carryin_compare is incorrect as well and
>>>>> should in fact also be CC_NCV. Thinking about this pattern, I'm
>>>>> inclined to agree that CC_NCV is the correct mode for this operation.
>>>>>
>>>>> I'm not sure if there are other consequences that will fall out from
>>>>> fixing this (it's possible that we might need a change to
>>>>> select_cc_mode as well).
>>>>>
>>>>
>>>> Yes, this is still a bit awkward...
>>>>
>>>> The N and V bits will be the correct result for the subdi3_compare1,
>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>> only gets the C bit correct, the expression for N and V is a different
>>>> one.
>>>>
>>>> It probably works, because the subsi3_carryin_compare instruction sets
>>>> more CC bits than the pattern explicitly specifies.
>>>> We know the subsi3_carryin_compare also computes the NV bits, but it is
>>>> hard to write down the correct rtl expression for it.
>>>>
>>>> In theory the pattern should describe everything correctly,
>>>> maybe, like:
>>>>
>>>> set (reg:CC_C CC_REGNUM)
>>>>     (compare:CC_C
>>>>       (zero_extend:DI (match_dup 4))
>>>>       (plus:DI (zero_extend:DI (match_dup 5))
>>>>                (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>> set (reg:CC_NV CC_REGNUM)
>>>>     (compare:CC_NV
>>>>       (match_dup 4))
>>>>       (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))
>>>> set (match_dup 3)
>>>>     (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>               (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>
>>>>
>>>> But I doubt that it will work to set CC_REGNUM with two different modes
>>>> in parallel?
>>>>
>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>> defines C from the DImode result, and NV from the SImode result,
>>>> similar to CC_NOOVmode, which also leaves open which bits it
>>>> really defines?
>>>>
>>>>
>>>> What do you think?
>>>>
>>>>
>>>> Thanks
>>>> Bernd.
>>>
>>> I think maybe the right solution is to invent a new CCmode
>>> that defines C as if the comparison is done in DImode
>>> but N and V as if the comparison is done in SImode.
>>>
>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>> only N and Z are set correctly), but in a different patch of course.
>>>
>>> Attached is a new version that implements the new CCmode.
>>>
>>> How do you like this new version?
>>>
>>> It seems to be able to build a cross-compiler at least.
>>>
>>> I will start a new bootstrap with this new patch, but that can take some
>>> time until I have definitive results.
>>>
>>> Is there still a chance that it can go into gcc-7 or should it wait
>>> for the next stage1?
>>>
>>> Thanks
>>> Bernd.
>>
>>
>> I thought I should also look at where the subdi_compare1 and the
>> negdi2_compare patterns are used, and check whether the caller is fine
>> with not having all CC bits available.
>>
>> And indeed usubv4 turns out to be questionable, because it
>> emits gen_sub3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>> CCmode.
>>
>> To correct this, the branch should use CC_Cmode which is always defined.
>>
>> So I tried to test this pattern, with the following test programs,
>> and found that the code actually improves when the branch uses CC_Cmode
>> instead of CCmode, both for SImode and DImode data, which was a bit
>> surprising.
>>
>> I used this test program to see how the usubv4 pattern works:
>>
>> cat test.c (DImode)
>> unsigned long long x, y, z;
>> int b;
>> void test()
>> {
>>    b = __builtin_sub_overflow (y,z, &x);
>> }
>>
>>
>> unpatched code used 8 bytes more stack than patched,
>> because the DImode subtraction is effectively done twice.
>>
>> cat test1.c (SImode)
>> unsigned long x, y, z;
>> int b;
>> void test()
>> {
>>    b = __builtin_sub_overflow (y,z, &x);
>> }
>>
>> which generates (unpatched):
>>          cmp     r3, r0
>>          sub     ip, r3, r0
>>
>> instead of expected (patched):
>>          subs    r3, r3, r2
>>
>>
>> The condition is extracted by ifconversion and/or combine
>> and complicates the resulting code instead of simplifying it.
>>
>> I think this happens only when the branch and the subsi/di3_compare1
>> are using the same CC mode.
>>
>> That does not happen when the CC modes disagree, as with the
>> proposed patch. All other uses of the pattern are already using
>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>
>> Attached is an updated version of the patch, that happens to
>> improve the code generation of the usubsi4 and usubdi4 pattern,
>> as a side effect.
>>
>>
>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
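
The flag behaviour discussed in the thread can be illustrated with a small C model. This is only a sketch and not part of the patch: `struct flags`, `sub32_flags`, `sub64_via_pair` and `self_check` are hypothetical names, and the model assumes the usual ARM convention that C set means "no borrow". It performs a 64-bit subtraction as a SUBS/SBCS-style pair of 32-bit steps and compares the flags left by the second step against the true 64-bit result and against `__builtin_sub_overflow`, the builtin used in the test programs above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARM-style condition flags; the names are illustrative, not taken from GCC. */
struct flags { bool n, z, c, v; };

/* 32-bit subtract with carry-in, modelling SUBS (carry_in = true) and
   SBCS (carry_in = C flag of the previous step).  C set means "no borrow". */
static uint32_t sub32_flags(uint32_t a, uint32_t b, bool carry_in,
                            struct flags *f)
{
    uint32_t borrow = carry_in ? 0u : 1u;
    uint32_t r = a - b - borrow;

    f->n = (r >> 31) != 0;                       /* sign of the 32-bit result */
    f->z = (r == 0);                             /* 32-bit result is zero */
    f->c = (uint64_t)a >= (uint64_t)b + borrow;  /* no unsigned borrow */
    f->v = (((a ^ b) & (a ^ r)) >> 31) != 0;     /* signed overflow */
    return r;
}

/* 64-bit subtract as a low/high pair; *f receives the flags of the high step. */
static uint64_t sub64_via_pair(uint64_t a, uint64_t b, struct flags *f)
{
    struct flags lo_flags;
    uint32_t lo = sub32_flags((uint32_t)a, (uint32_t)b, true, &lo_flags);
    uint32_t hi = sub32_flags((uint32_t)(a >> 32), (uint32_t)(b >> 32),
                              lo_flags.c, f);
    return ((uint64_t)hi << 32) | lo;
}

/* Check N, C and V of the high step against the full 64-bit operation. */
void self_check(void)
{
    static const struct { uint64_t a, b; } cases[] = {
        { 0, 0 }, { 0, 1 }, { 1, 0 },
        { 0x100000000ULL, 1 },
        { 0x8000000000000000ULL, 1 },
        { 0x7FFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFFULL },
        { 0x123456789ABCDEF0ULL, 0x0FEDCBA987654321ULL },
    };

    for (unsigned i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        uint64_t a = cases[i].a, b = cases[i].b, ures;
        int64_t sres;
        struct flags f;
        uint64_t r = sub64_via_pair(a, b, &f);
        bool uov = __builtin_sub_overflow(a, b, &ures);
        bool sov = __builtin_sub_overflow((int64_t)a, (int64_t)b, &sres);

        assert(r == ures);               /* same 64-bit difference */
        assert(f.c == !uov);             /* C clear <=> unsigned borrow */
        assert(f.n == ((r >> 63) != 0)); /* N is the 64-bit sign bit */
        assert(f.v == sov);              /* V is the 64-bit signed overflow */
    }
}
```

In this model, subtracting 1 from 0x100000000 leaves Z set by the high step even though the 64-bit difference is 0xFFFFFFFF, so Z says nothing about the 64-bit comparison, while N, C and V agree with it in every listed case. That matches the observation in the thread that N, C and V are valid after the pair, and it also shows why the unsigned-overflow branch only needs the C bit.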