From: Bernd Edlinger
To: Kyrill Tkachov, "Richard Earnshaw (lists)", "gcc-patches@gcc.gnu.org"
CC: Ramana Radhakrishnan, Wilco Dijkstra
Subject: Re: [PATCH, ARM] correctly encode the CC reg data flow
Date: Wed, 06 Sep 2017 12:44:00 -0000
References: <3f5e5538-5dd3-b416-904f-b87f115336fe@arm.com> <59AD5B44.2080509@foss.arm.com>

On 09/04/17 21:54, Bernd Edlinger wrote:
> Hi Kyrill,
>
> Thanks for your review!
>
>
> On 09/04/17 15:55, Kyrill Tkachov wrote:
>> Hi Bernd,
>>
>> On 18/01/17 15:36, Bernd Edlinger wrote:
>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>>>> one.
>>>>>>>
>>>>>>> When trying to split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>> before reload, a mis-compilation in libgcc function
>>>>>>> __gnu_satfractdasq was discovered, see [1] for more details.
>>>>>>>
>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>> up into this:
>>>>>>>
>>>>>>> [(set (reg:CC CC_REGNUM)
>>>>>>>       (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>  (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>                  (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>             (set (match_dup 2)
>>>>>>>                  (minus:SI (match_dup 5)
>>>>>>>                            (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>
>>>>>>> [(set (reg:CC CC_REGNUM)
>>>>>>>       (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>  (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>             (set (reg:CC CC_REGNUM)
>>>>>>>                  (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>
>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>>>> from before. Therefore the *arm_cmpsi_insn appears to be
>>>>>>> redundant and thus got removed, because the data values are
>>>>>>> identical.
>>>>>>>
>>>>>>> I think that applies to a number of similar patterns where data
>>>>>>> flow is happening through the CC reg.
>>>>>>>
>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>> independently from the optimization issue PR77308.
>>>>>>>
>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>> know what the instructions are really doing.
>>>>>>>
>>>>>>>
>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>> Is it OK for trunk?
>>>>>>>
>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>> with the patch itself.
>>>>>>
>>>>>>
>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>>>>         (compare:CC_NCV
>>>>>>           (match_operand:DI 1 "register_operand" "r")
>>>>>>           (match_operand:DI 2 "register_operand" "r")))
>>>>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>         (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>   "TARGET_32BIT"
>>>>>>   "#"
>>>>>>   "&& reload_completed"
>>>>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>                    (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>               (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>>>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>                    (compare:CC_C
>>>>>>                      (zero_extend:DI (match_dup 4))
>>>>>>                      (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>                               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>               (set (match_dup 3)
>>>>>>                    (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>
>>>>>>
>>>>>> This pattern is now no longer self-consistent in that before the
>>>>>> split the overall result for the condition register is in mode
>>>>>> CC_NCV, but afterwards it is just CC_C.
>>>>>>
>>>>>> I think CC_NCV is the correct mode (the N, C and V bits all correctly
>>>>>> reflect the result of the 64-bit comparison), but that then
>>>>>> implies that the cc mode of subsi3_carryin_compare is incorrect as
>>>>>> well and should in fact also be CC_NCV.
>>>>>> Thinking about this pattern, I'm inclined to agree
>>>>>> that CC_NCV is the correct mode for this operation.
>>>>>>
>>>>>> I'm not sure if there are other consequences that will fall out from
>>>>>> fixing this (it's possible that we might need a change to
>>>>>> select_cc_mode as well).
>>>>>>
>>>>> Yes, this is still a bit awkward...
>>>>>
>>>>> The N and V bits will be the correct result for subdi3_compare1,
>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>>> only gets the C bit correct, the expression for N and V is a different
>>>>> one.
>>>>>
>>>>> It probably works, because the subsi3_carryin_compare instruction sets
>>>>> more CC bits than the pattern explicitly specifies.
>>>>> We know the subsi3_carryin_compare also computes the NV bits, but
>>>>> it is hard to write down the correct rtl expression for it.
>>>>>
>>>>> In theory the pattern should describe everything correctly,
>>>>> maybe, like:
>>>>>
>>>>> (set (reg:CC_C CC_REGNUM)
>>>>>      (compare:CC_C
>>>>>        (zero_extend:DI (match_dup 4))
>>>>>        (plus:DI (zero_extend:DI (match_dup 5))
>>>>>                 (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>> (set (reg:CC_NV CC_REGNUM)
>>>>>      (compare:CC_NV
>>>>>        (match_dup 4)
>>>>>        (plus:SI (match_dup 5)
>>>>>                 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>> (set (match_dup 3)
>>>>>      (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>                (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))
>>>>>
>>>>>
>>>>> But I doubt that will work to set CC_REGNUM with two different modes
>>>>> in parallel?
>>>>>
>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>>> bits it really defines?
>>>>>
>>>>>
>>>>> What do you think?
>>>>>
>>>>>
>>>>> Thanks
>>>>> Bernd.
>>>> I think maybe the right solution is to invent a new CCmode
>>>> that defines C as if the comparison is done in DImode
>>>> but N and V as if the comparison is done in SImode.
>>>>
>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>> only N and Z are set correctly), but in a different patch of course.
>>>>
>>>> Attached is a new version that implements the new CCmode.
>>>>
>>>> How do you like this new version?
>>>>
>>>> It seems to be able to build a cross-compiler at least.
>>>>
>>>> I will start a new bootstrap with this new patch, but that can take
>>>> some time until I have definitive results.
>>>>
>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>> for the next stage1?
>>>>
>>>> Thanks
>>>> Bernd.
>>>
>>> I thought I should also look at where the subdi_compare1 and the
>>> negdi2_compare patterns are used, and look if the caller is fine with
>>> not having all CC bits available.
>>>
>>> And indeed usubv4 turns out to be questionable, because it
>>> emits gen_sub3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>> CCmode.
>>>
>>> To correct this, the branch should use CC_Cmode which is always defined.
>>>
>>> So I tried to test this pattern, with the following test programs,
>>> and found that the code actually improves when the branch uses CC_Cmode
>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>> surprising.
>>>
>>> I used this test program to see how the usubv4 pattern works:
>>>
>>> cat test.c (DImode)
>>> unsigned long long x, y, z;
>>> int b;
>>> void test()
>>> {
>>>   b = __builtin_sub_overflow (y,z, &x);
>>> }
>>>
>>>
>>> unpatched code used 8 bytes more stack than patched,
>>> because the DImode subtraction is effectively done twice.
>>>
>>> cat test1.c (SImode)
>>> unsigned long x, y, z;
>>> int b;
>>> void test()
>>> {
>>>   b = __builtin_sub_overflow (y,z, &x);
>>> }
>>>
>>> which generates (unpatched):
>>> cmp r3, r0
>>> sub ip, r3, r0
>>>
>>> instead of expected (patched):
>>> subs r3, r3, r2
>>>
>>>
>>> The condition is extracted by ifconversion and/or combine
>>> and complicates the resulting code instead of simplifying.
>>>
>>> I think this happens only when the branch and the subsi/di3_compare1
>>> is using the same CC mode.
>>>
>>> That does not happen when the CC modes disagree, as with the
>>> proposed patch. All other uses of the pattern are already using
>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>
>>> Attached is an updated version of the patch, that happens to
>>> improve the code generation of the usubsi4 and usubdi4 pattern,
>>> as a side effect.
>>>
>>>
>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>> Is it OK for trunk?
>>
>> I'm very sorry it has taken so long to review.
>> I've been ramping up on the context recently now so I'll try to move
>> this along...
>>
>> This patch looks mostly ok to me from reading the patterns and the
>> discussion around it.
>> I have one concern:
>>
>>
>>  (define_insn_and_split "negdi2_compare"
>> -  [(set (reg:CC CC_REGNUM)
>> -        (compare:CC
>> +  [(set (reg:CC_NCV CC_REGNUM)
>> +        (compare:CC_NCV
>>            (const_int 0)
>>            (match_operand:DI 1 "register_operand" "0,r")))
>>     (set (match_operand:DI 0 "register_operand" "=r,&r")
>> @@ -4647,8 +4650,12 @@
>>                     (compare:CC (const_int 0) (match_dup 1)))
>>                (set (match_dup 0) (minus:SI (const_int 0)
>>                                             (match_dup 1)))])
>> -   (parallel [(set (reg:CC CC_REGNUM)
>> -                   (compare:CC (const_int 0) (match_dup 3)))
>> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
>> +                   (compare:CC_NCV_CIC
>> +                     (const_int 0)
>> +                     (plus:DI
>> +                       (zero_extend:DI (match_dup 3))
>> +                       (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>                (set (match_dup 2)
>>                     (minus:SI
>>                      (minus:SI (const_int 0) (match_dup 3))
>>
>>
>> I was somewhat concerned with having the first operand of the COMPARE
>> being a const_int 0 and the second being
>> a complex expression as the RTL canonicalization rules usually require
>> the complex operand going first if possible.
>> Reading the RTL rules in rtl.texi I see it says this:
>> "If one of the operands is a constant, it should be placed in the
>> second operand and the comparison code adjusted as appropriate."
>> So it seems that the pre-existing pattern that puts const_int 0 as the
>> first operand already breaks that rule.
>> I think we should fix that and update the use of condition code to a
>> GEU rather than LTU as well.
>>
>

Well, the sentence before that one is even more explicit:

"Normally, @var{x} and @var{y} must have the same mode. Otherwise,
@code{compare} is valid only if the mode of @var{x} is in class
@code{MODE_INT} and @var{y} is a @code{const_int} or
@code{const_double} with mode @code{VOIDmode}."

So, because the const_int 0 has VOIDmode, the comparison is done
in the mode of y, not in the mode of x.

But unfortunately I see no way to accomplish this, because the
rule assumes that the compare can easily be swapped if the conditional
instruction just uses one of GT/GE/LE/LT or GTU/GEU/LEU/LTU.

But that is only the case for plain CCmode.

And in this example we ask for "overflow", but while 0-X can overflow,
X-0 simply can't.

Moreover, there are non-symmetric modes like CC_NCVmode which
only support LT/GE/LTU/GEU but not the swapped conditions GT/LE/GTU/LEU.

I think the only solution would be to adjust the spec to
reflect the implementation:

Index: rtl.texi
===================================================================
--- rtl.texi	(revision 251752)
+++ rtl.texi	(working copy)
@@ -2252,6 +2252,13 @@
 If one of the operands is a constant, it should be placed in the
 second operand and the comparison code adjusted as appropriate.
 
+There may be exceptions from this rule if the mode @var{m} carries
+not enough information for the swapped comparison operator, or
+if we ask for overflow from the subtraction. That means, while
+0-X may overflow, X-0 can never overflow. Under these conditions
+a compare may have the constant expression at the left side.
+Examples are the ARM negdi2_compare pattern and similar.
+
 A @code{compare} specifying two @code{VOIDmode} constants is not valid
 since there is no way to know in what mode the comparison is to be
 performed; the comparison must either be folded during the compilation


Please advise.


Thanks
Bernd.

>
> Hmmm...
>
> I think the compare is not a commutative operation, and swapping
> the arguments will imply a different value in the flags.
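
To make the point about the flags concrete, here is a small C sketch.
It is only a model of the ARM N, Z, C and V flags after a 32-bit
"cmp a, b", written for illustration; the names struct flags, cmp32 and
check_not_commutative are made up here and are not part of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the ARM condition flags; not GCC code.  */
struct flags { int n, z, c, v; };

/* Flags after "cmp a, b", i.e. after computing a - b in 32 bits.  */
static struct flags cmp32 (uint32_t a, uint32_t b)
{
  uint32_t r = a - b;
  struct flags f;
  f.n = (int) (r >> 31);                    /* result is negative */
  f.z = (r == 0);                           /* result is zero */
  f.c = (a >= b);                           /* ARM: C set means no borrow */
  f.v = (int) (((a ^ b) & (a ^ r)) >> 31);  /* signed overflow */
  return f;
}

/* 0 - x and x - 0 leave different C and V flags for x = 0x80000000.  */
static void check_not_commutative (void)
{
  uint32_t x = 0x80000000u;
  struct flags lhs = cmp32 (0, x);   /* 0 - x */
  struct flags rhs = cmp32 (x, 0);   /* x - 0 */

  assert (lhs.c == 0 && lhs.v == 1); /* borrow, and signed overflow */
  assert (rhs.c == 1 && rhs.v == 0); /* no borrow, no overflow */
  assert (lhs.n == 1 && rhs.n == 1); /* N alone cannot tell them apart */
}
```

In this model V is set for 0 - x only when x is 0x80000000 and is never
set for x - 0, which is the asymmetry the "overflow" argument relies on.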
>
> So if I write
> (set (reg:CC_NCV CC_REGNUM)
>      (compare:CC_NCV
>        (const_int 0)
>        (reg:DI 123)))
>
> I have C,N,V set to the result of (0 - r123), C = usable for LTU or GEU,
> N,V = usable for LT, GE
>
> But if I write
> (set (reg:CC_NCV CC_REGNUM)
>      (compare:CC_NCV
>        (reg:DI 123)
>        (const_int 0)))
>
> I have C,N,V set to the result of (r123 - 0), but the expansion stays
> the same and the actual value in the flags is defined by the expansion.
> Of course there exists probably no matching expansion for that.
>
> Note that both LTU in the above hunk are in a parallel-stmt and operate
> on the flags from the previous pattern, so changing these to GEU
> will probably be wrong.
>
> Both (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)) in the negdi2_compare
> use the flags from the previous (set (reg:CC CC_REGNUM) (compare:CC
> (const_int 0) (match_dup 1))).
>
> One use of the resulting flags (I know of) is in negvdi3 where we
> have:
>
> emit_insn (gen_negdi2_compare (operands[0], operands[1]));
> arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
>
> I think only 0-x can overflow while x-0 can never overflow.
>
> Of course the CC_NCV_CIC mode bends the definition of the RTL compare
> a lot and I guess if this pattern is created by a splitter, this can
> only be expanded by an exactly matching pattern, there is (hopefully)
> no way how combine could mess with this pattern due to the exotic
> CCmode. So while I think it would work to swap only the notation of
> all CC_NCV_CIC patterns, _without_ changing the assembler-parts and the
> consuming statements, that would make it quite hard to follow for the
> human reader at least.
>
> What do you think?
>
>
> Bernd.
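
For readers following the data flow through the CC register, the
carry chain discussed above can be sketched in C. This is a simplified
model under my own assumptions, not the GCC implementation: sbcs32,
sub64_flags and check_sub64_flags are hypothetical names, and the flag
formulas follow the usual add-with-carry description of subs/sbcs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the ARM condition flags; not GCC code.  */
struct flags { int n, z, c, v; };

/* Model of "sbcs": r = a - b - (1 - carry_in), computed as
   a + ~b + carry_in.  carry_in == 1 means "no borrow", so "subs" is
   the same operation with carry_in fixed to 1.  */
static uint32_t sbcs32 (uint32_t a, uint32_t b, int carry_in, struct flags *f)
{
  uint64_t wide = (uint64_t) a + (uint32_t) ~b + (uint64_t) carry_in;
  uint32_t r = (uint32_t) wide;
  f->n = (int) (r >> 31);
  f->z = (r == 0);
  f->c = (int) (wide >> 32);                 /* carry out of bit 31 */
  f->v = (int) (((a ^ b) & (a ^ r)) >> 31);  /* signed overflow */
  return r;
}

/* 64-bit a - b as subs (low words) followed by sbcs (high words).
   The second step reads the C flag written by the first one; the
   returned flags are those left by the second step.  */
static struct flags sub64_flags (uint64_t a, uint64_t b, uint64_t *result)
{
  struct flags f;
  uint32_t lo = sbcs32 ((uint32_t) a, (uint32_t) b, 1, &f);
  uint32_t hi = sbcs32 ((uint32_t) (a >> 32), (uint32_t) (b >> 32), f.c, &f);
  *result = ((uint64_t) hi << 32) | lo;
  return f;
}

static void check_sub64_flags (void)
{
  uint64_t r;
  struct flags f;

  f = sub64_flags (0, 0x8000000000000000ull, &r);  /* 0 - x overflows */
  assert (f.c == 0 && f.n == 1 && f.v == 1);

  f = sub64_flags (0x8000000000000000ull, 0, &r);  /* x - 0 cannot */
  assert (f.c == 1 && f.v == 0);

  f = sub64_flags (0x100000000ull, 1, &r);         /* Z is not reliable */
  assert (r == 0xffffffffull && f.z == 1);
}
```

In this model the final C, N and V agree with a full 64-bit
subtraction, while Z only reflects the high word, which is consistent
with using a mode that promises N, C and V but not Z.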