Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Subject: Re: [PATCH, ARM] correctly encode the CC reg data flow
To: Bernd Edlinger, Kyrill Tkachov, "gcc-patches@gcc.gnu.org"
Cc: Ramana Radhakrishnan, Wilco Dijkstra
References: <3f5e5538-5dd3-b416-904f-b87f115336fe@arm.com> <59AD5B44.2080509@foss.arm.com>
From: "Richard Earnshaw (lists)"
Message-ID: <6f24b217-9131-6aef-0cdb-f9f26a538fe9@arm.com>
Date: Wed, 06 Sep 2017 12:52:00 -0000

On 06/09/17 13:44, Bernd Edlinger wrote:
> On 09/04/17 21:54, Bernd Edlinger wrote:
>> Hi Kyrill,
>>
>> Thanks for your review!
>>
>>
>> On 09/04/17 15:55, Kyrill Tkachov wrote:
>>> Hi Bernd,
>>>
>>> On 18/01/17 15:36, Bernd Edlinger wrote:
>>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>>>>> one.
>>>>>>>>
>>>>>>>> When trying to split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>>> before reload, a mis-compilation in libgcc function
>>>>>>>> __gnu_satfractdasq was discovered, see [1] for more details.
>>>>>>>>
>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>>> up into this:
>>>>>>>>
>>>>>>>>   [(set (reg:CC CC_REGNUM)
>>>>>>>>         (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>>    (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>                    (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>>               (set (match_dup 2)
>>>>>>>>                    (minus:SI (match_dup 5)
>>>>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>>>                                      (const_int 0))))])]
>>>>>>>>
>>>>>>>>   [(set (reg:CC CC_REGNUM)
>>>>>>>>         (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>>    (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>>               (set (reg:CC CC_REGNUM)
>>>>>>>>                    (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>>
>>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>>>>> redundant and thus got removed, because the data values are
>>>>>>>> identical.
>>>>>>>>
>>>>>>>> I think that applies to a number of similar patterns where data
>>>>>>>> flow is happening through the CC reg.
>>>>>>>>
>>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>>> independently from the optimization issue PR77308.
>>>>>>>>
>>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>>> know what the instructions are really doing.
>>>>>>>>
>>>>>>>>
>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>> Is it OK for trunk?
>>>>>>>>
>>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>>> with the patch itself.
>>>>>>>
>>>>>>>
>>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>>>>>         (compare:CC_NCV
>>>>>>>           (match_operand:DI 1 "register_operand" "r")
>>>>>>>           (match_operand:DI 2 "register_operand" "r")))
>>>>>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>>         (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>>   "TARGET_32BIT"
>>>>>>>   "#"
>>>>>>>   "&& reload_completed"
>>>>>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>>                    (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>>               (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>>>>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>>                    (compare:CC_C
>>>>>>>                      (zero_extend:DI (match_dup 4))
>>>>>>>                      (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>                               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>               (set (match_dup 3)
>>>>>>>                    (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>
>>>>>>>
>>>>>>> This pattern is now no longer self-consistent in that before the
>>>>>>> split the overall result for the condition register is in mode
>>>>>>> CC_NCV, but afterwards it is just CC_C.
>>>>>>>
>>>>>>> I think CC_NCV is the correct mode (the N, C and V bits all correctly
>>>>>>> reflect the result of the 64-bit comparison), but that then
>>>>>>> implies that the cc mode of subsi3_carryin_compare is incorrect as
>>>>>>> well and should in fact also be CC_NCV.
>>>>>>> Thinking about this pattern, I'm inclined to agree
>>>>>>> that CC_NCV is the correct mode for this operation
>>>>>>>
>>>>>>> I'm not sure if there are other consequences that will fall out from
>>>>>>> fixing this (it's possible that we might need a change to
>>>>>>> select_cc_mode as well).
>>>>>>>
>>>>>> Yes, this is still a bit awkward...
>>>>>>
>>>>>> The N and V bit will be the correct result for the subdi3_compare1
>>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>>>> only gets the C bit correct, the expression for N and V is a different
>>>>>> one.
>>>>>>
>>>>>> It probably works, because the subsi3_carryin_compare instruction sets
>>>>>> more CC bits than the pattern does explicitly specify the value.
>>>>>> We know the subsi3_carryin_compare also computes the NV bits, but
>>>>>> it is hard to write down the correct rtl expression for it.
>>>>>>
>>>>>> In theory the pattern should describe everything correctly,
>>>>>> maybe, like:
>>>>>>
>>>>>> set (reg:CC_C CC_REGNUM)
>>>>>>     (compare:CC_C
>>>>>>       (zero_extend:DI (match_dup 4))
>>>>>>       (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>                (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>> set (reg:CC_NV CC_REGNUM)
>>>>>>     (compare:CC_NV
>>>>>>       (match_dup 4))
>>>>>>       (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>                (const_int 0)))
>>>>>> set (match_dup 3)
>>>>>>     (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>               (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>
>>>>>>
>>>>>> But I doubt that will work to set CC_REGNUM with two different modes
>>>>>> in parallel?
>>>>>>
>>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>>>> bits it really defines?
>>>>>>
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Bernd.
>>>>> I think maybe the right solution is to invent a new CCmode
>>>>> that defines C as if the comparison is done in DImode
>>>>> but N and V as if the comparison is done in SImode.
>>>>>
>>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>>> only N and Z are set correctly), but in a different patch of course.
>>>>>
>>>>> Attached is a new version that implements the new CCmode.
>>>>>
>>>>> How do you like this new version?
>>>>>
>>>>> It seems to be able to build a cross-compiler at least.
>>>>>
>>>>> I will start a new bootstrap with this new patch, but that can take
>>>>> some time until I have definitive results.
>>>>>
>>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>>> for the next stage1?
>>>>>
>>>>> Thanks
>>>>> Bernd.
>>>>
>>>> I thought I should also look at where the subdi_compare1 and the
>>>> negdi2_compare patterns are used, and look if the caller is fine with
>>>> not having all CC bits available.
>>>>
>>>> And indeed usubv4 turns out to be questionable, because it
>>>> emits gen_sub3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>>> CCmode.
>>>>
>>>> To correct this, the branch should use CC_Cmode which is always defined.
>>>>
>>>> So I tried to test this pattern, with the following test programs,
>>>> and found that the code actually improves when the branch uses CC_Cmode
>>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>>> surprising.
>>>>
>>>> I used this test program to see how the usubv4 pattern works:
>>>>
>>>> cat test.c (DImode)
>>>> unsigned long long x, y, z;
>>>> int b;
>>>> void test()
>>>> {
>>>>    b = __builtin_sub_overflow (y,z, &x);
>>>> }
>>>>
>>>>
>>>> unpatched code used 8 bytes more stack than patched,
>>>> because the DImode subtraction is effectively done twice.
>>>>
>>>> cat test1.c (SImode)
>>>> unsigned long x, y, z;
>>>> int b;
>>>> void test()
>>>> {
>>>>    b = __builtin_sub_overflow (y,z, &x);
>>>> }
>>>>
>>>> which generates (unpatched):
>>>>          cmp     r3, r0
>>>>          sub     ip, r3, r0
>>>>
>>>> instead of expected (patched):
>>>>          subs    r3, r3, r2
>>>>
>>>>
>>>> The condition is extracted by ifconversion and/or combine
>>>> and complicates the resulting code instead of simplifying.
>>>>
>>>> I think this happens only when the branch and the subsi/di3_compare1
>>>> is using the same CC mode.
>>>>
>>>> That does not happen when the CC modes disagree, as with the
>>>> proposed patch.  All other uses of the pattern are already using
>>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>>
>>>> Attached is an updated version of the patch, that happens to
>>>> improve the code generation of the usubsi4 and usubdi4 pattern,
>>>> as a side effect.
>>>>
>>>>
>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>> Is it OK for trunk?
>>>
>>> I'm very sorry it has taken so long to review.
>>> I've been ramping up on the context recently now so I'll try to move
>>> this along...
>>>
>>> This patch looks mostly ok to me from reading the patterns and the
>>> discussion around it.
>>> I have one concern:
>>>
>>>
>>>  (define_insn_and_split "negdi2_compare"
>>> -  [(set (reg:CC CC_REGNUM)
>>> -        (compare:CC
>>> +  [(set (reg:CC_NCV CC_REGNUM)
>>> +        (compare:CC_NCV
>>>            (const_int 0)
>>>            (match_operand:DI 1 "register_operand" "0,r")))
>>>     (set (match_operand:DI 0 "register_operand" "=r,&r")
>>> @@ -4647,8 +4650,12 @@
>>>                     (compare:CC (const_int 0) (match_dup 1)))
>>>                (set (match_dup 0) (minus:SI (const_int 0)
>>>                                             (match_dup 1)))])
>>> -   (parallel [(set (reg:CC CC_REGNUM)
>>> -                   (compare:CC (const_int 0) (match_dup 3)))
>>> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
>>> +                   (compare:CC_NCV_CIC
>>> +                     (const_int 0)
>>> +                     (plus:DI
>>> +                       (zero_extend:DI (match_dup 3))
>>> +                       (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>                (set (match_dup 2)
>>>                     (minus:SI
>>>                       (minus:SI (const_int 0) (match_dup 3))
>>>
>>>
>>> I was somewhat concerned with having the first operand of the COMPARE
>>> being a const_int 0 and the second being
>>> a complex expression as the RTL canonicalization rules usually require
>>> the complex operand going first if possible.
>>> Reading the RTL rules in rtl.texi I see it says this:
>>> "If one of the operands is a constant, it should be placed in the
>>> second operand and the comparison code adjusted as appropriate."
>>> So it seems that the pre-existing pattern that puts const_int 0 as the
>>> first operand already breaks that rule.
>>> I think we should fix that and update the use of condition code to a
>>> GEU rather than LTU as well.
>>>
>>
>
> Well, the sentence before that one is even more explicit:
>
> "Normally, @var{x} and @var{y} must have the same mode.  Otherwise,
> @code{compare} is valid only if the mode of @var{x} is in class
> @code{MODE_INT} and @var{y} is a @code{const_int} or
> @code{const_double} with mode @code{VOIDmode}."
>
> So because the const_int 0 has VOIDmode the comparison is done
> in y-mode not x-mode.
>
> But unfortunately I see no way how to accomplish this,
> because this assumes that the compare can be easily swapped
> if the conditional instruction just uses one of GT/GE/LE/LT
> or GTU/GEU/LEU/LTU.  But that is only the case for plain CCmode.
>
> And in this example we ask for "overflow", but while 0-X can
> overflow X-0 simply can't.  And moreover there are non-symmetric
> modes like CC_NCVmode which only support LT/GE/LTU/GEU but not
> the swapped conditions GT/LE/GTU/LEU.
>
> I think the only solution would be to adjust the spec to
> reflect the implementation:
>
> Index: rtl.texi
> ===================================================================
> --- rtl.texi	(revision 251752)
> +++ rtl.texi	(working copy)
> @@ -2252,6 +2252,13 @@
>  If one of the operands is a constant, it should be placed in the
>  second operand and the comparison code adjusted as appropriate.
>
> +There may be exceptions from this rule if the mode @var{m} carries
> +not enough information for the swapped comparison operator, or

There may be exceptions _to_ ... if mode @var{m} does not carry enough...

> +if we ask for overflow from the subtraction.

Aren't we really trying to 'detect overflow' rather than 'ask' for it?

> That means, while
> +0-X may overfow X-0 can never overflow.  Under these conditions
> +a compare may have the constant expression at the left side.

In these circumstances the constant will be in the first operand.
(left and right don't really make sense for RTL).

> +Examples are the ARM negdi2_compare pattern and similar.
> +
>  A @code{compare} specifying two @code{VOIDmode} constants is not valid
>  since there is no way to know in what mode the comparison is to be
>  performed; the comparison must either be folded during the compilation
>
>
>
> Please advise.
>
> Thanks
> Bernd.
>
>
>>
>> Hmmm...
>>
>> I think the compare is not a commutative operation, and swapping
>> the arguments will imply a different value in the flags.
>>
>> So if I write
>> (set (reg:CC_NCV CC_REGNUM)
>>      (compare:CC_NCV
>>        (const_int 0)
>>        (reg:DI 123)))
>>
>> I have C,N,V set to the result of (0 - r123), C = usable for LTU or GEU,
>> N,V = usable for LT, GE
>>
>> But if I write
>> (set (reg:CC_NCV CC_REGNUM)
>>      (compare:CC_NCV
>>        (reg:DI 123)
>>        (const_int 0)))
>>
>> I have C,N,V set to the result of (r123 - 0), but the expansion stays
>> the same and the actual value in the flags is defined by the expansion.
>> Of course there exists probably no matching expansion for that.
>>
>> Note that both LTU in the above hunk are in a parallel-stmt and operate
>> on the flags from the previous pattern, so changing these to GEU
>> will probably be wrong.
>>
>> Both (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)) in the negdi2_compare
>> use the flags from the previous (set (reg:CC CC_REGNUM) (compare:CC
>> (const_int 0) (match_dup 1)).
>>
>> One use of the resulting flags (I know of) is in negvdi3 where we
>> have:
>>
>>    emit_insn (gen_negdi2_compare (operands[0], operands[1]));
>>    arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
>>
>> I think only 0-x can overflow while x-0 can never overflow.
>>
>> Of course the CC_NCV_CIC mode bends the definition of the RTL compare
>> a lot and I guess if this pattern is created by a splitter, this can
>> only be expanded by an exactly matching pattern, there is (hopefully)
>> no way how combine could mess with this pattern due to the exotic
>> CCmode.  So while I think it would work to swap only the notation of
>> all CC_NCV_CIC patterns, _without_ changing the assembler-parts and the
>> consuming statements, that would make it quite hard to follow for the
>> human reader at least.
>>
>> What do you think?
>>
>>
>> Bernd.
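
To make the asymmetry between 0-X and X-0 concrete, here is a minimal C sketch that models the N, C and V flags of a 64-bit subtraction the way an ARM-style compare sets them (C set means "no borrow"). The names `struct nzcv_model` and `sub_flags` are hypothetical helpers made up for illustration; they are not GCC internals and say nothing about how the patterns are actually implemented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the flags discussed in this thread.
   Not a GCC data structure; for illustration only.  */
struct nzcv_model
{
  bool n;  /* sign bit of the 64-bit result */
  bool c;  /* carry: set when the subtraction needs no borrow */
  bool v;  /* signed overflow of the 64-bit subtraction */
};

/* Model the flags produced by the 64-bit subtraction a - b.  */
static struct nzcv_model
sub_flags (uint64_t a, uint64_t b)
{
  uint64_t r = a - b;  /* wraps modulo 2^64 */
  struct nzcv_model f;

  f.n = (r >> 63) != 0;
  /* ARM convention: C is the inverse of borrow, i.e. a >= b unsigned.  */
  f.c = a >= b;
  /* Signed overflow: operands differ in sign and the result's sign
     differs from the sign of the first operand.  */
  f.v = (((a ^ b) & (a ^ r)) >> 63) != 0;
  return f;
}
```

With x = INT64_MIN, sub_flags (0, x) reports V set, whereas sub_flags (x, 0) has V clear for every x, because the result is always equal to the first operand. C differs as well: 0-X clears C for any non-zero X, while X-0 always sets it, so swapping the operands of the compare really does describe a different value in the flags.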