From: Bernd Edlinger
To: "Richard Earnshaw (lists)", "gcc-patches@gcc.gnu.org"
CC: Ramana Radhakrishnan, Kyrill Tkachov, Wilco Dijkstra
Subject: [PING**6] [PATCH, ARM] correctly encode the CC reg data flow
Date: Wed, 05 Jul 2017 18:11:00 -0000
References: <3f5e5538-5dd3-b416-904f-b87f115336fe@arm.com> <74eaaa44-40f0-4b12-1aec-4b9926158efe@hotmail.de>
In-Reply-To: <74eaaa44-40f0-4b12-1aec-4b9926158efe@hotmail.de>

Ping...

On 06/14/17 14:33, Bernd Edlinger wrote:
> Ping...
>
> On 06/01/17 18:00, Bernd Edlinger wrote:
>> Ping...
>>
>> On 05/12/17 18:49, Bernd Edlinger wrote:
>>> Ping...
>>>
>>> On 04/29/17 19:21, Bernd Edlinger wrote:
>>>> Ping...
>>>>
>>>> On 04/20/17 20:11, Bernd Edlinger wrote:
>>>>> Ping...
>>>>>
>>>>> for this patch:
>>>>> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html
>>>>>
>>>>> On 01/18/17 16:36, Bernd Edlinger wrote:
>>>>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> this is related to PR77308, the follow-up patch will depend on
>>>>>>>>>> this one.
>>>>>>>>>>
>>>>>>>>>> When trying to split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>>>>> before reload, a mis-compilation in libgcc function
>>>>>>>>>> __gnu_satfractdasq was discovered, see [1] for more details.
>>>>>>>>>>
>>>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>>>>> up into this:
>>>>>>>>>>
>>>>>>>>>>   [(set (reg:CC CC_REGNUM)
>>>>>>>>>>         (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>>>>    (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>>                    (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>>>>               (set (match_dup 2)
>>>>>>>>>>                    (minus:SI (match_dup 5)
>>>>>>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>>>>>                                      (const_int 0))))])]
>>>>>>>>>>
>>>>>>>>>>   [(set (reg:CC CC_REGNUM)
>>>>>>>>>>         (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>>>>    (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>>>>               (set (reg:CC CC_REGNUM)
>>>>>>>>>>                    (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>>>>
>>>>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>>>>>>> from before. Therefore the *arm_cmpsi_insn appears to be
>>>>>>>>>> redundant and thus got removed, because the data values are
>>>>>>>>>> identical.
>>>>>>>>>>
>>>>>>>>>> I think that applies to a number of similar patterns where data
>>>>>>>>>> flow is happening through the CC reg.
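
To illustrate why the high-word step of such a split cannot be treated as
an independent compare, here is a minimal C sketch of the ARM cmp/sbcs
carry arithmetic. The function names (cmp_carry, sbcs_carry,
ult64_via_split) are hypothetical and only the C flag is modelled; nothing
here is taken from the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: carry flag after "cmp a, b" on ARM.
   For a subtraction, C is set when no borrow occurs (a >= b unsigned). */
static int cmp_carry(uint32_t a, uint32_t b)
{
    return a >= b;
}

/* Hypothetical model: carry flag after "sbcs a, b" with an incoming carry.
   SBC computes a - b - (1 - carry_in); C again means "no borrow". */
static int sbcs_carry(uint32_t a, uint32_t b, int carry_in)
{
    uint64_t subtrahend = (uint64_t)b + (uint64_t)(1 - carry_in);
    return (uint64_t)a >= subtrahend;
}

/* 64-bit unsigned "a < b" computed the way the split sequence does it:
   compare the low words, then feed that carry into the high-word step. */
static int ult64_via_split(uint64_t a, uint64_t b)
{
    int c = cmp_carry((uint32_t)a, (uint32_t)b);
    c = sbcs_carry((uint32_t)(a >> 32), (uint32_t)(b >> 32), c);
    return !c; /* carry clear means a < b */
}
```

With identical high words the outcome is decided entirely by the incoming
carry: sbcs_carry(1, 1, 1) and sbcs_carry(1, 1, 0) differ, so two such steps
with the same data operands are not interchangeable unless the carry input
is part of the description.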
>>>>>>>>>>
>>>>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>>>>> independently from the optimization issue PR77308.
>>>>>>>>>>
>>>>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>>>>> know what the instructions are really doing.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>>>> Is it OK for trunk?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>>>>> with the patch itself.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>>>>>>>         (compare:CC_NCV
>>>>>>>>>           (match_operand:DI 1 "register_operand" "r")
>>>>>>>>>           (match_operand:DI 2 "register_operand" "r")))
>>>>>>>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>>>>         (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>>>>   "TARGET_32BIT"
>>>>>>>>>   "#"
>>>>>>>>>   "&& reload_completed"
>>>>>>>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>                    (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>>>>               (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>>>>>>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>>>>                    (compare:CC_C
>>>>>>>>>                      (zero_extend:DI (match_dup 4))
>>>>>>>>>                      (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>>                               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>>>               (set (match_dup 3)
>>>>>>>>>                    (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This pattern is now no longer self-consistent in that before the
>>>>>>>>> split the overall result for the condition register is in mode
>>>>>>>>> CC_NCV, but afterwards it is just CC_C.
>>>>>>>>>
>>>>>>>>> I think CC_NCV is the correct mode (the N, C and V bits all
>>>>>>>>> correctly reflect the result of the 64-bit comparison), but that
>>>>>>>>> then implies that the cc mode of subsi3_carryin_compare is
>>>>>>>>> incorrect as well and should in fact also be CC_NCV. Thinking
>>>>>>>>> about this pattern, I'm inclined to agree that CC_NCV is the
>>>>>>>>> correct mode for this operation.
>>>>>>>>>
>>>>>>>>> I'm not sure if there are other consequences that will fall out
>>>>>>>>> from fixing this (it's possible that we might need a change to
>>>>>>>>> select_cc_mode as well).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, this is still a bit awkward...
>>>>>>>>
>>>>>>>> The N and V bits will be the correct result for the subdi3_compare1,
>>>>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>>>>>> only gets the C bit correct; the expression for N and V is a
>>>>>>>> different one.
>>>>>>>>
>>>>>>>> It probably works, because the subsi3_carryin_compare
>>>>>>>> instruction sets more CC bits than the pattern explicitly
>>>>>>>> specifies.
>>>>>>>> We know the subsi3_carryin_compare also computes the NV bits, but
>>>>>>>> it is hard to write down the correct rtl expression for it.
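
The difference between the C bit and the N/V bits discussed here can be
sketched in C. This is a hedged model of the ARM instruction semantics
only, not of GCC's RTL: the struct and the function names (flags,
sbcs_flags, cmp64_flags) are hypothetical. C needs an unsigned comparison
wider than 32 bits, while N and V come from the signed 32-bit view of the
same subtraction. The int32_t casts assume the usual two's complement
conversion that GCC and Clang implement.

```c
#include <assert.h>
#include <stdint.h>

struct flags { int n, c, v; };

/* Hypothetical model of N, C and V after "sbcs a, b" with the given
   incoming carry (1 means no borrow is pending). */
static struct flags sbcs_flags(uint32_t a, uint32_t b, int carry_in)
{
    int borrow = 1 - carry_in;
    uint32_t res = a - b - (uint32_t)borrow;
    /* Exact signed result, wide enough that it cannot overflow. */
    int64_t wide = (int64_t)(int32_t)a - (int64_t)(int32_t)b - borrow;
    struct flags f;
    f.n = (int)(res >> 31);                               /* sign of the 32-bit result */
    f.c = (uint64_t)a >= (uint64_t)b + (uint64_t)borrow;  /* unsigned, wider than 32 bits */
    f.v = wide < INT32_MIN || wide > INT32_MAX;           /* signed 32-bit overflow */
    return f;
}

/* Flags after "cmp lo(a), lo(b)" followed by "sbcs hi(a), hi(b)". */
static struct flags cmp64_flags(uint64_t a, uint64_t b)
{
    int c = (uint32_t)a >= (uint32_t)b;
    return sbcs_flags((uint32_t)(a >> 32), (uint32_t)(b >> 32), c);
}
```

In this model, after the two-instruction sequence the unsigned 64-bit
"a < b" corresponds to C == 0 and the signed 64-bit "a < b" corresponds to
N != V. The Z flag is deliberately left out, since it would reflect the
high word only.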
>>>>>>>>
>>>>>>>> In theory the pattern should describe everything correctly,
>>>>>>>> maybe, like:
>>>>>>>>
>>>>>>>> (set (reg:CC_C CC_REGNUM)
>>>>>>>>      (compare:CC_C
>>>>>>>>        (zero_extend:DI (match_dup 4))
>>>>>>>>        (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>                 (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>> (set (reg:CC_NV CC_REGNUM)
>>>>>>>>      (compare:CC_NV
>>>>>>>>        (match_dup 4)
>>>>>>>>        (plus:SI (match_dup 5)
>>>>>>>>                 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>> (set (match_dup 3)
>>>>>>>>      (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>                (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))
>>>>>>>>
>>>>>>>>
>>>>>>>> But I doubt that it will work to set CC_REGNUM with two different
>>>>>>>> modes in parallel?
>>>>>>>>
>>>>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>>>>>> bits it really defines?
>>>>>>>>
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Bernd.
>>>>>>>
>>>>>>> I think maybe the right solution is to invent a new CCmode
>>>>>>> that defines C as if the comparison is done in DImode
>>>>>>> but N and V as if the comparison is done in SImode.
>>>>>>>
>>>>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>>>>> only N and Z are set correctly), but in a different patch of course.
>>>>>>>
>>>>>>> Attached is a new version that implements the new CCmode.
>>>>>>>
>>>>>>> How do you like this new version?
>>>>>>>
>>>>>>> It seems to be able to build a cross-compiler at least.
>>>>>>>
>>>>>>> I will start a new bootstrap with this new patch, but that can take
>>>>>>> some time until I have definitive results.
>>>>>>>
>>>>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>>>>> for the next stage1?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Bernd.
>>>>>>
>>>>>>
>>>>>> I thought I should also look at where the subdi_compare1 and the
>>>>>> negdi2_compare patterns are used, and look if the caller is fine with
>>>>>> not having all CC bits available.
>>>>>>
>>>>>> And indeed usubv4 turns out to be questionable, because it
>>>>>> emits gen_sub3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>>>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>>>>> CCmode.
>>>>>>
>>>>>> To correct this, the branch should use CC_Cmode which is always
>>>>>> defined.
>>>>>>
>>>>>> So I tried to test this pattern, with the following test programs,
>>>>>> and found that the code actually improves when the branch uses
>>>>>> CC_Cmode instead of CCmode, both for SImode and DImode data, which
>>>>>> was a bit surprising.
>>>>>>
>>>>>> I used this test program to see how the usubv4 pattern works:
>>>>>>
>>>>>> cat test.c (DImode)
>>>>>> unsigned long long x, y, z;
>>>>>> int b;
>>>>>> void test()
>>>>>> {
>>>>>>    b = __builtin_sub_overflow (y,z, &x);
>>>>>> }
>>>>>>
>>>>>>
>>>>>> unpatched code used 8 bytes more stack than patched,
>>>>>> because the DImode subtraction is effectively done twice.
>>>>>>
>>>>>> cat test1.c (SImode)
>>>>>> unsigned long x, y, z;
>>>>>> int b;
>>>>>> void test()
>>>>>> {
>>>>>>    b = __builtin_sub_overflow (y,z, &x);
>>>>>> }
>>>>>>
>>>>>> which generates (unpatched):
>>>>>>          cmp     r3, r0
>>>>>>          sub     ip, r3, r0
>>>>>>
>>>>>> instead of expected (patched):
>>>>>>          subs    r3, r3, r2
>>>>>>
>>>>>>
>>>>>> The condition is extracted by ifconversion and/or combine
>>>>>> and complicates the resulting code instead of simplifying.
>>>>>>
>>>>>> I think this happens only when the branch and the subsi/di3_compare1
>>>>>> are using the same CC mode.
>>>>>>
>>>>>> That does not happen when the CC modes disagree, as with the
>>>>>> proposed patch. All other uses of the pattern are already using
>>>>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>>>>
>>>>>> Attached is an updated version of the patch, that happens to
>>>>>> improve the code generation of the usubsi4 and usubdi4 patterns,
>>>>>> as a side effect.
>>>>>>
>>>>>>
>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>> Is it OK for trunk?
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Bernd.
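
For anyone who wants to check the behaviour that test.c and test1.c rely
on, here is a small host-side C sketch of the unsigned subtract-with-
overflow builtin. The wrapper names (usub32_overflows, usub64_overflows)
are hypothetical, and fixed-width types are used instead of unsigned long
so the sketch behaves the same on hosts where long is 64 bits.
__builtin_sub_overflow is a GCC extension (also supported by Clang), so
this is not portable ISO C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrappers around the builtin used in test.c and test1.c.
   Each returns 1 when the unsigned subtraction wraps around, and always
   stores the wrapped difference through x. */
static int usub32_overflows(uint32_t y, uint32_t z, uint32_t *x)
{
    return __builtin_sub_overflow(y, z, x);
}

static int usub64_overflows(uint64_t y, uint64_t z, uint64_t *x)
{
    return __builtin_sub_overflow(y, z, x);
}
```

For unsigned operands the overflow result is the borrow out of the
subtraction, which is why a branch on the carry flag alone is enough to
implement it.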