From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 31570 invoked by alias); 14 Jun 2017 12:34:13 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 31545 invoked by uid 89); 14 Jun 2017 12:34:11 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2 spammy= X-HELO: EUR03-DB5-obe.outbound.protection.outlook.com Received: from mail-oln040092071039.outbound.protection.outlook.com (HELO EUR03-DB5-obe.outbound.protection.outlook.com) (40.92.71.39) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 14 Jun 2017 12:34:09 +0000 Received: from VE1EUR03FT025.eop-EUR03.prod.protection.outlook.com (10.152.18.52) by VE1EUR03HT100.eop-EUR03.prod.protection.outlook.com (10.152.19.128) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P384) id 15.1.1143.11; Wed, 14 Jun 2017 12:34:10 +0000 Received: from AM4PR0701MB2162.eurprd07.prod.outlook.com (10.152.18.54) by VE1EUR03FT025.mail.protection.outlook.com (10.152.18.74) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256_P256) id 15.1.1157.12 via Frontend Transport; Wed, 14 Jun 2017 12:34:10 +0000 Received: from AM4PR0701MB2162.eurprd07.prod.outlook.com ([fe80::f508:a3ef:5f6e:1e39]) by AM4PR0701MB2162.eurprd07.prod.outlook.com ([fe80::f508:a3ef:5f6e:1e39%17]) with mapi id 15.01.1178.013; Wed, 14 Jun 2017 12:34:10 +0000 From: Bernd Edlinger To: "Richard Earnshaw (lists)" , "gcc-patches@gcc.gnu.org" CC: Ramana Radhakrishnan , Kyrill Tkachov , Wilco Dijkstra Subject: [PING**5] [PATCH, ARM] correctly encode the CC reg data flow Date: Wed, 14 Jun 2017 12:34:00 -0000 Message-ID: References: <3f5e5538-5dd3-b416-904f-b87f115336fe@arm.com> In-Reply-To: authentication-results: arm.com; dkim=none (message not signed) header.d=none;arm.com; dmarc=none action=none header.from=hotmail.de; x-incomingtopheadermarker: OriginalChecksum:0A5FCF3099652BEFCC622497CD3AE00B546D7FEA43E2E67CAF8BC5FFBBDE6B5A;UpperCasedChecksum:58F558361FC34BBA358CB2F27EF3B564C46583D8A1051DCEFC72E0F5068CBACC;SizeAsReceived:8164;Count:46 x-ms-exchange-messagesentrepresentingtype: 1 x-tmn: [kzWoviQMFAAdMmPV7PxNdkI6KCSYDaJs] x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1;VE1EUR03HT100;24:nIydyK9LHAfJNRT6y2LVCoooruBnXXH09cnX4GQKzAzD8Yx6mq4tfBB62EFDVIw6qYGbi0yntt3ogSFyNMxxKFUMMQQjWZKxyzHr2/GFpiY=;7:MA1891If9WNrkKuk/Dd3OF+ZZY37bWBKiPqAOM/lp3U8G8NuDpC0X1GkvgL40YYJdh6PBtEPODQd2G0gGE5c5FzpeE1ROTMGxOLOiuwN7eDrVQHB06o/DWQD4Ejk84N4Uxd+msxQKg+tg7iUmvVpM8avl4yMVQPQQvV6vNaofVmU2/7imIeSBFiyF7GZOZgFPAJ5Rs9nxs9DLphrhGOdyBUbxYU9iwYCZwbnUP4jSxU1hdI203aqZdSWoQ5JHT4MDqv3pfebxZAwLIoTIVH8VWbpWQOU+j7DtripVCwHx3rJIAGMU/dk6ZgBD6mPsjfZ x-incomingheadercount: 46 x-eopattributedmessage: 0 x-forefront-antispam-report: EFV:NLI;SFV:NSPM;SFS:(7070007)(98901004);DIR:OUT;SFP:1901;SCL:1;SRVR:VE1EUR03HT100;H:AM4PR0701MB2162.eurprd07.prod.outlook.com;FPR:;SPF:None;LANG:en; x-ms-traffictypediagnostic: VE1EUR03HT100: x-ms-office365-filtering-correlation-id: 60081f01-9100-4c9b-4968-08d4b321a76c x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:(22001)(201702061074)(5061506573)(5061507331)(1603103135)(2017031320274)(2017031324274)(2017031323274)(2017031322274)(1601125374)(1603101448)(1701031045);SRVR:VE1EUR03HT100; x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:(100000700101)(100105000095)(100000701101)(100105300095)(100000702101)(100105100095)(444000031);SRVR:VE1EUR03HT100;BCL:0;PCL:0;RULEID:(100000800101)(100110000095)(100000801101)(100110300095)(100000802101)(100110100095)(100000803101)(100110400095)(100000804101)(100110200095)(100000805101)(100110500095);SRVR:VE1EUR03HT100; x-forefront-prvs: 033857D0BD spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="Windows-1252" Content-ID: Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: outlook.com X-MS-Exchange-CrossTenant-originalarrivaltime: 14 Jun 2017 12:34:10.0875 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Internet X-MS-Exchange-CrossTenant-id: 84df9e7f-e9f6-40af-b435-aaaaaaaaaaaa X-MS-Exchange-Transport-CrossTenantHeadersStamped: VE1EUR03HT100 X-SW-Source: 2017-06/txt/msg01031.txt.bz2 Ping... On 06/01/17 18:00, Bernd Edlinger wrote: > Ping... >=20 > On 05/12/17 18:49, Bernd Edlinger wrote: >> Ping... >> >> On 04/29/17 19:21, Bernd Edlinger wrote: >>> Ping... >>> >>> On 04/20/17 20:11, Bernd Edlinger wrote: >>>> Ping... >>>> >>>> for this patch: >>>> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html >>>> >>>> On 01/18/17 16:36, Bernd Edlinger wrote: >>>>> On 01/13/17 19:28, Bernd Edlinger wrote: >>>>>> On 01/13/17 17:10, Bernd Edlinger wrote: >>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote: >>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> this is related to PR77308, the follow-up patch will depend on=20 >>>>>>>>> this >>>>>>>>> one. >>>>>>>>> >>>>>>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned >>>>>>>>> before reload, a mis-compilation in libgcc function >>>>>>>>> __gnu_satfractdasq >>>>>>>>> was discovered, see [1] for more details. >>>>>>>>> >>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly >>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split >>>>>>>>> up into this: >>>>>>>>> >>>>>>>>> [(set (reg:CC CC_REGNUM) >>>>>>>>> (compare:CC (match_dup 0) (match_dup 1))) >>>>>>>>> (parallel [(set (reg:CC CC_REGNUM) >>>>>>>>> (compare:CC (match_dup 3) (match_dup 4))) >>>>>>>>> (set (match_dup 2) >>>>>>>>> (minus:SI (match_dup 5) >>>>>>>>> (ltu:SI (reg:CC_C CC_REGNUM)=20 >>>>>>>>> (const_int >>>>>>>>> 0))))])] >>>>>>>>> >>>>>>>>> [(set (reg:CC CC_REGNUM) >>>>>>>>> (compare:CC (match_dup 2) (match_dup 3))) >>>>>>>>> (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0)) >>>>>>>>> (set (reg:CC CC_REGNUM) >>>>>>>>> (compare:CC (match_dup 0) (match_dup 1))))] >>>>>>>>> >>>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare >>>>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC >>>>>>>>> from before. Therefore the *arm_cmpsi_insn appears to be >>>>>>>>> redundant and thus got removed, because the data values are >>>>>>>>> identical. >>>>>>>>> >>>>>>>>> I think that applies to a number of similar pattern where data >>>>>>>>> flow is happening through the CC reg. >>>>>>>>> >>>>>>>>> So this is a kind of correctness issue, and should be fixed >>>>>>>>> independently from the optimization issue PR77308. >>>>>>>>> >>>>>>>>> Therefore I think the patterns need to specify the true >>>>>>>>> value that will be in the CC reg, in order for cse to >>>>>>>>> know what the instructions are really doing. >>>>>>>>> >>>>>>>>> >>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf. >>>>>>>>> Is it OK for trunk? >>>>>>>>> >>>>>>>> >>>>>>>> I agree you've found a valid problem here, but I have some issues >>>>>>>> with >>>>>>>> the patch itself. >>>>>>>> >>>>>>>> >>>>>>>> (define_insn_and_split "subdi3_compare1" >>>>>>>> [(set (reg:CC_NCV CC_REGNUM) >>>>>>>> (compare:CC_NCV >>>>>>>> (match_operand:DI 1 "register_operand" "r") >>>>>>>> (match_operand:DI 2 "register_operand" "r"))) >>>>>>>> (set (match_operand:DI 0 "register_operand" "=3D&r") >>>>>>>> (minus:DI (match_dup 1) (match_dup 2)))] >>>>>>>> "TARGET_32BIT" >>>>>>>> "#" >>>>>>>> "&& reload_completed" >>>>>>>> [(parallel [(set (reg:CC CC_REGNUM) >>>>>>>> (compare:CC (match_dup 1) (match_dup 2))) >>>>>>>> (set (match_dup 0) (minus:SI (match_dup 1) (match_dup >>>>>>>> 2)))]) >>>>>>>> (parallel [(set (reg:CC_C CC_REGNUM) >>>>>>>> (compare:CC_C >>>>>>>> (zero_extend:DI (match_dup 4)) >>>>>>>> (plus:DI (zero_extend:DI (match_dup 5)) >>>>>>>> (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0))))) >>>>>>>> (set (match_dup 3) >>>>>>>> (minus:SI (minus:SI (match_dup 4) (match_dup 5)) >>>>>>>> (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])] >>>>>>>> >>>>>>>> >>>>>>>> This pattern is now no-longer self consistent in that before the >>>>>>>> split >>>>>>>> the overall result for the condition register is in mode CC_NCV,=20 >>>>>>>> but >>>>>>>> afterwards it is just CC_C. >>>>>>>> >>>>>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly >>>>>>>> reflect the result of the 64-bit comparison), but that then implies >>>>>>>> that >>>>>>>> the cc mode of subsi3_carryin_compare is incorrect as well and >>>>>>>> should in >>>>>>>> fact also be CC_NCV. Thinking about this pattern, I'm inclined to >>>>>>>> agree >>>>>>>> that CC_NCV is the correct mode for this operation >>>>>>>> >>>>>>>> I'm not sure if there are other consequences that will fall out=20 >>>>>>>> from >>>>>>>> fixing this (it's possible that we might need a change to >>>>>>>> select_cc_mode >>>>>>>> as well). >>>>>>>> >>>>>>> >>>>>>> Yes, this is still a bit awkward... >>>>>>> >>>>>>> The N and V bit will be the correct result for the subdi3_compare1 >>>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...) >>>>>>> only gets the C bit correct, the expression for N and V is a=20 >>>>>>> different >>>>>>> one. >>>>>>> >>>>>>> It probably works, because the subsi3_carryin_compare instruction=20 >>>>>>> sets >>>>>>> more CC bits than the pattern does explicitly specify the value. >>>>>>> We know the subsi3_carryin_compare also computes the NV bits, but >>>>>>> it is >>>>>>> hard to write down the correct rtl expression for it. >>>>>>> >>>>>>> In theory the pattern should describe everything correctly, >>>>>>> maybe, like: >>>>>>> >>>>>>> set (reg:CC_C CC_REGNUM) >>>>>>> (compare:CC_C >>>>>>> (zero_extend:DI (match_dup 4)) >>>>>>> (plus:DI (zero_extend:DI (match_dup 5)) >>>>>>> (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0))))) >>>>>>> set (reg:CC_NV CC_REGNUM) >>>>>>> (compare:CC_NV >>>>>>> (match_dup 4)) >>>>>>> (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int >>>>>>> 0))) >>>>>>> set (match_dup 3) >>>>>>> (minus:SI (minus:SI (match_dup 4) (match_dup 5)) >>>>>>> (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))) >>>>>>> >>>>>>> >>>>>>> But I doubt that will work to set CC_REGNUM with two different modes >>>>>>> in parallel? >>>>>>> >>>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly >>>>>>> defines C from the DImode result, and NV from the SImode result, >>>>>>> similar to the CC_NOOVmode, that also leaves something open what >>>>>>> bits it really defines? >>>>>>> >>>>>>> >>>>>>> What do you think? >>>>>>> >>>>>>> >>>>>>> Thanks >>>>>>> Bernd. >>>>>> >>>>>> I think maybe the right solution is to invent a new CCmode >>>>>> that defines C as if the comparison is done in DImode >>>>>> but N and V as if the comparison is done in SImode. >>>>>> >>>>>> I thought maybe I would call it CC_NCV_CIC (CIC =3D Carry-In-Compare= ), >>>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because >>>>>> only N and Z are set correctly), but in a different patch of course. >>>>>> >>>>>> Attached is a new version that implements the new CCmode. >>>>>> >>>>>> How do you like this new version? >>>>>> >>>>>> It seems to be able to build a cross-compiler at least. >>>>>> >>>>>> I will start a new bootstrap with this new patch, but that can take >>>>>> some >>>>>> time until I have definitive results. >>>>>> >>>>>> Is there still a chance that it can go into gcc-7 or should it wait >>>>>> for the next stage1? >>>>>> >>>>>> Thanks >>>>>> Bernd. >>>>> >>>>> >>>>> I thought I should also look at where the subdi_compare1 amd the >>>>> negdi2_compare patterns are used, and look if the caller is fine with >>>>> not having all CC bits available. >>>>> >>>>> And indeed usubv4 turns out to be questionabe, because it >>>>> emits gen_sub3_compare1 and uses arm_gen_unlikely_cbranch (LTU, >>>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses >>>>> CCmode. >>>>> >>>>> To correct this, the branch should use CC_Cmode which is always=20 >>>>> defined. >>>>> >>>>> So I tried to test this pattern, with the following test programs, >>>>> and found that the code actually improves when the branch uses=20 >>>>> CC_Cmode >>>>> instead of CCmode, both for SImode and DImode data, which was a bit >>>>> surprising. >>>>> >>>>> I used this test program to see how the usubv4 pattern works: >>>>> >>>>> cat test.c (DImode) >>>>> unsigned long long x, y, z; >>>>> int b; >>>>> void test() >>>>> { >>>>> b =3D __builtin_sub_overflow (y,z, &x); >>>>> } >>>>> >>>>> >>>>> unpatched code used 8 byte more stack than patched, >>>>> because the DImode subtraction is effectively done twice. >>>>> >>>>> cat test1.c (SImode) >>>>> unsigned long x, y, z; >>>>> int b; >>>>> void test() >>>>> { >>>>> b =3D __builtin_sub_overflow (y,z, &x); >>>>> } >>>>> >>>>> which generates (unpatched): >>>>> cmp r3, r0 >>>>> sub ip, r3, r0 >>>>> >>>>> instead of expected (patched): >>>>> subs r3, r3, r2 >>>>> >>>>> >>>>> The condition is extracted by ifconversion and/or combine >>>>> and complicates the resulting code instead of simplifying. >>>>> >>>>> I think this happens only when the branch and the subsi/di3_compare1 >>>>> is using the same CC mode. >>>>> >>>>> That does not happen when the CC modes disagree, as with the >>>>> proposed patch. All other uses of the pattern are already using >>>>> CC_Cmode or CC_Vmode in the branch, and these do not change. >>>>> >>>>> Attached is an updated version of the patch, that happens to >>>>> improve the code generation of the usubsi4 and usubdi4 pattern, >>>>> as a side effect. >>>>> >>>>> >>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf. >>>>> Is it OK for trunk? >>>>> >>>>> >>>>> Thanks >>>>> Bernd.