From: "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com>
To: Bernd Edlinger <bernd.edlinger@hotmail.de>,
Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>,
Wilco Dijkstra <wilco.dijkstra@arm.com>
Subject: Re: [PATCH, ARM] correctly encode the CC reg data flow
Date: Wed, 06 Sep 2017 12:52:00 -0000
Message-ID: <6f24b217-9131-6aef-0cdb-f9f26a538fe9@arm.com>
In-Reply-To: <AM5PR0701MB26572C99CDF0198F38A7DB9BE4970@AM5PR0701MB2657.eurprd07.prod.outlook.com>
On 06/09/17 13:44, Bernd Edlinger wrote:
> On 09/04/17 21:54, Bernd Edlinger wrote:
>> Hi Kyrill,
>>
>> Thanks for your review!
>>
>>
>> On 09/04/17 15:55, Kyrill Tkachov wrote:
>>> Hi Bernd,
>>>
>>> On 18/01/17 15:36, Bernd Edlinger wrote:
>>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>>>>> one.
>>>>>>>>
>>>>>>>> When trying to split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>>> before reload, a mis-compilation in libgcc function
>>>>>>>> __gnu_satfractdasq
>>>>>>>> was discovered, see [1] for more details.
>>>>>>>>
>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>>> up into this:
>>>>>>>>
>>>>>>>> [(set (reg:CC CC_REGNUM)
>>>>>>>> (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>> (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>> (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>> (set (match_dup 2)
>>>>>>>> (minus:SI (match_dup 5)
>>>>>>>> (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>>> (const_int
>>>>>>>> 0))))])]
>>>>>>>>
>>>>>>>> [(set (reg:CC CC_REGNUM)
>>>>>>>> (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>> (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>> (set (reg:CC CC_REGNUM)
>>>>>>>> (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>>
>>>>>>>> The problem is that the *subsi3_carryin_compare pattern does not
>>>>>>>> mention that the reg:CC it sets also depends on the reg:CC
>>>>>>>> from before. Therefore the *arm_cmpsi_insn appears to be
>>>>>>>> redundant and thus got removed, because the data values are
>>>>>>>> identical.
>>>>>>>>
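As a rough illustration of the dependency described above (a sketch only, not part of the patch): the C model below mimics an ARM SBCS-style subtract-with-carry and shows that identical operands give different flags when the incoming carry differs. The names `struct flags` and `sbcs_flags` are hypothetical, and the model assumes the usual ARM convention that C set means "no borrow".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the ARM N, Z, C and V condition flags. */
struct flags { bool n, z, c, v; };

/* Model of "sbcs rd, a, b": rd = a - b - (1 - carry_in).
   The flags depend on carry_in as well as on a and b.  */
static struct flags sbcs_flags(uint32_t a, uint32_t b, bool carry_in)
{
  uint32_t borrow = carry_in ? 0u : 1u;
  uint32_t res = a - b - borrow;
  struct flags f;
  f.n = (res >> 31) != 0;                         /* sign of the result */
  f.z = res == 0;                                 /* result is zero */
  f.c = (uint64_t)a >= (uint64_t)b + borrow;      /* no borrow out */
  f.v = (((a ^ b) & (a ^ res)) >> 31) != 0;       /* signed overflow */
  return f;
}
```

With a = b = 5 the model reports Z set when the carry comes in set, and Z clear (with N set) when it comes in clear, so an RTL description that only mentions the two data operands does not determine the flag result.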
>>>>>>>> I think that applies to a number of similar patterns where data
>>>>>>>> flow is happening through the CC reg.
>>>>>>>>
>>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>>> independently from the optimization issue PR77308.
>>>>>>>>
>>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>>> know what the instructions are really doing.
>>>>>>>>
>>>>>>>>
>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>> Is it OK for trunk?
>>>>>>>>
>>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>>> with
>>>>>>> the patch itself.
>>>>>>>
>>>>>>>
>>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>> [(set (reg:CC_NCV CC_REGNUM)
>>>>>>> (compare:CC_NCV
>>>>>>> (match_operand:DI 1 "register_operand" "r")
>>>>>>> (match_operand:DI 2 "register_operand" "r")))
>>>>>>> (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>> (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>> "TARGET_32BIT"
>>>>>>> "#"
>>>>>>> "&& reload_completed"
>>>>>>> [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>> (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>> (set (match_dup 0) (minus:SI (match_dup 1) (match_dup
>>>>>>> 2)))])
>>>>>>> (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>> (compare:CC_C
>>>>>>> (zero_extend:DI (match_dup 4))
>>>>>>> (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>> (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>> (set (match_dup 3)
>>>>>>> (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>> (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>
>>>>>>>
>>>>>>> This pattern is no longer self-consistent: before the split the
>>>>>>> overall result for the condition register is in mode CC_NCV, but
>>>>>>> afterwards it is just CC_C.
>>>>>>>
>>>>>>> I think CC_NCV is the correct mode (the N, C and V bits all correctly
>>>>>>> reflect the result of the 64-bit comparison), but that then implies
>>>>>>> that the cc mode of subsi3_carryin_compare is incorrect as well and
>>>>>>> should in fact also be CC_NCV. Thinking about this pattern, I'm
>>>>>>> inclined to agree that CC_NCV is the correct mode for this operation.
>>>>>>>
>>>>>>> I'm not sure if there are other consequences that will fall out from
>>>>>>> fixing this (it's possible that we might need a change to
>>>>>>> select_cc_mode as well).
>>>>>>>
>>>>>> Yes, this is still a bit awkward...
>>>>>>
>>>>>> The N and V bits will be the correct result for the subdi3_compare1,
>>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>>>> only gets the C bit correct; the expression for N and V is a different
>>>>>> one.
>>>>>>
>>>>>> It probably works, because the subsi3_carryin_compare instruction sets
>>>>>> more CC bits than the pattern explicitly specifies.
>>>>>> We know the subsi3_carryin_compare also computes the NV bits, but it is
>>>>>> hard to write down the correct rtl expression for it.
>>>>>>
>>>>>> In theory the pattern should describe everything correctly,
>>>>>> maybe, like:
>>>>>>
>>>>>> set (reg:CC_C CC_REGNUM)
>>>>>> (compare:CC_C
>>>>>> (zero_extend:DI (match_dup 4))
>>>>>> (plus:DI (zero_extend:DI (match_dup 5))
>>>>>> (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>> set (reg:CC_NV CC_REGNUM)
>>>>>> (compare:CC_NV
>>>>>> (match_dup 4))
>>>>>> (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>> (const_int 0)))
>>>>>> set (match_dup 3)
>>>>>> (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>> (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>
>>>>>>
>>>>>> But I doubt that setting CC_REGNUM with two different modes
>>>>>> in parallel will work?
>>>>>>
>>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>>>> bits it really defines?
>>>>>>
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Bernd.
>>>>> I think maybe the right solution is to invent a new CCmode
>>>>> that defines C as if the comparison is done in DImode
>>>>> but N and V as if the comparison is done in SImode.
>>>>>
>>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>>> only N and Z are set correctly), but in a different patch of course.
>>>>>
>>>>> Attached is a new version that implements the new CCmode.
>>>>>
>>>>> How do you like this new version?
>>>>>
>>>>> It seems to be able to build a cross-compiler at least.
>>>>>
>>>>> I will start a new bootstrap with this new patch, but that can take
>>>>> some time until I have definitive results.
>>>>>
>>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>>> for the next stage1?
>>>>>
>>>>> Thanks
>>>>> Bernd.
>>>>
>>>> I thought I should also look at where the subdi_compare1 and the
>>>> negdi2_compare patterns are used, and check whether the caller is fine
>>>> with not having all CC bits available.
>>>>
>>>> And indeed usubv<mode>4 turns out to be questionable, because it
>>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>>> CCmode.
>>>>
>>>> To correct this, the branch should use CC_Cmode which is always defined.
>>>>
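For illustration only (not taken from the patch): unsigned subtraction overflow is exactly a borrow, i.e. y < z, which on ARM corresponds to the C flag being clear after the subtraction (the LTU condition). That is why the carry flag alone is enough for this branch. The helper name `usub_borrows` is hypothetical; `__builtin_sub_overflow` is the GCC/Clang builtin already used in the test programs in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when y - z wraps around, which for unsigned operands
   happens exactly when y < z.  The wrapped difference is stored in *diff.  */
static bool usub_borrows(uint32_t y, uint32_t z, uint32_t *diff)
{
  return __builtin_sub_overflow(y, z, diff);
}
```

Calling `usub_borrows(1, 2, &d)` reports a borrow and leaves the wrapped value in `d`, while `usub_borrows(2, 1, &d)` reports none.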
>>>> So I tried to test this pattern, with the following test programs,
>>>> and found that the code actually improves when the branch uses CC_Cmode
>>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>>> surprising.
>>>>
>>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>>
>>>> cat test.c (DImode)
>>>> unsigned long long x, y, z;
>>>> int b;
>>>> void test()
>>>> {
>>>> b = __builtin_sub_overflow (y,z, &x);
>>>> }
>>>>
>>>>
>>>> Unpatched code used 8 bytes more stack than patched,
>>>> because the DImode subtraction is effectively done twice.
>>>>
>>>> cat test1.c (SImode)
>>>> unsigned long x, y, z;
>>>> int b;
>>>> void test()
>>>> {
>>>> b = __builtin_sub_overflow (y,z, &x);
>>>> }
>>>>
>>>> which generates (unpatched):
>>>> cmp r3, r0
>>>> sub ip, r3, r0
>>>>
>>>> instead of expected (patched):
>>>> subs r3, r3, r2
>>>>
>>>>
>>>> The condition is extracted by ifconversion and/or combine
>>>> and complicates the resulting code instead of simplifying.
>>>>
>>>> I think this happens only when the branch and the subsi/di3_compare1
>>>> are using the same CC mode.
>>>>
>>>> That does not happen when the CC modes disagree, as with the
>>>> proposed patch. All other uses of the pattern are already using
>>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>>
>>>> Attached is an updated version of the patch, that happens to
>>>> improve the code generation of the usubsi4 and usubdi4 pattern,
>>>> as a side effect.
>>>>
>>>>
>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>> Is it OK for trunk?
>>>
>>> I'm very sorry it has taken so long to review.
>>> I've been ramping up on the context recently now so I'll try to move
>>> this along...
>>>
>>> This patch looks mostly ok to me from reading the patterns and the
>>> discussion around it.
>>> I have one concern:
>>>
>>>
>>> (define_insn_and_split "negdi2_compare"
>>> - [(set (reg:CC CC_REGNUM)
>>> - (compare:CC
>>> + [(set (reg:CC_NCV CC_REGNUM)
>>> + (compare:CC_NCV
>>> (const_int 0)
>>> (match_operand:DI 1 "register_operand" "0,r")))
>>> (set (match_operand:DI 0 "register_operand" "=r,&r")
>>> @@ -4647,8 +4650,12 @@
>>> (compare:CC (const_int 0) (match_dup 1)))
>>> (set (match_dup 0) (minus:SI (const_int 0)
>>> (match_dup 1)))])
>>> - (parallel [(set (reg:CC CC_REGNUM)
>>> - (compare:CC (const_int 0) (match_dup 3)))
>>> + (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
>>> + (compare:CC_NCV_CIC
>>> + (const_int 0)
>>> + (plus:DI
>>> + (zero_extend:DI (match_dup 3))
>>> + (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>> (set (match_dup 2)
>>> (minus:SI
>>> (minus:SI (const_int 0) (match_dup 3))
>>>
>>>
>>> I was somewhat concerned with having the first operand of the COMPARE
>>> being a const_int 0 and the second being
>>> a complex expression as the RTL canonicalization rules usually require
>>> the complex operand going first if possible.
>>> Reading the RTL rules in rtl.texi I see it says this:
>>> "If one of the operands is a constant, it should be placed in the
>>> second operand and the comparison code adjusted as appropriate."
>>> So it seems that the pre-existing pattern that puts const_int 0 as the
>>> first operand already breaks that rule.
>>> I think we should fix that and update the use of condition code to a
>>> GEU rather than LTU as well.
>>>
>>
>
> Well, the sentence before that one is even more explicit:
>
> "Normally, @var{x} and @var{y} must have the same mode. Otherwise,
> @code{compare} is valid only if the mode of @var{x} is in class
> @code{MODE_INT} and @var{y} is a @code{const_int} or
> @code{const_double} with mode @code{VOIDmode}."
>
> So because the const_int 0 has VOIDmode the comparison is done
> in y-mode not x-mode.
>
> But unfortunately I see no way how to accomplish this,
> because this assumes that the compare can be easily swapped
> if the conditional instruction just uses one of GT/GE/LE/LT
> or GTU/GEU/LEU/LTU. But that is only the case for plain CCmode.
>
> And in this example we ask for "overflow", but while 0-X can
> overflow, X-0 simply can't. And moreover there are non-symmetric
> modes like CC_NCVmode which only support LT/GE/LTU/GEU but not
> the swapped conditions GT/LE/GTU/LEU.
>
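A small C model may help to see why such a mode is non-symmetric. This is a sketch under assumptions, not the GCC implementation: it models a 64-bit comparison done as a 32-bit SUBS on the low words followed by SBCS on the high words, and keeps only the flags left by the final SBCS. All names (`struct flags`, `cmp64_flags`, `cond_lt`, `cond_ltu`, `cond_gt`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the ARM N, Z, C and V condition flags. */
struct flags { bool n, z, c, v; };

/* Flags after "subs lo(a), lo(b)" followed by "sbcs hi(a), hi(b)".  */
static struct flags cmp64_flags(uint64_t a, uint64_t b)
{
  uint32_t alo = (uint32_t)a, ahi = (uint32_t)(a >> 32);
  uint32_t blo = (uint32_t)b, bhi = (uint32_t)(b >> 32);
  uint32_t borrow = alo < blo ? 1u : 0u;          /* borrow out of the low half */
  uint32_t hi = ahi - bhi - borrow;
  struct flags f;
  f.n = (hi >> 31) != 0;                          /* bit 63 of the 64-bit result */
  f.z = hi == 0;                                  /* high word only, NOT all 64 bits */
  f.c = (uint64_t)ahi >= (uint64_t)bhi + borrow;  /* no 64-bit borrow */
  f.v = (((ahi ^ bhi) & (ahi ^ hi)) >> 31) != 0;  /* 64-bit signed overflow */
  return f;
}

/* LT and LTU need only N, V and C, which are valid for the 64-bit compare.  */
static bool cond_lt(struct flags f)  { return f.n != f.v; }
static bool cond_ltu(struct flags f) { return !f.c; }

/* GT also needs Z, which this sequence does not compute for all 64 bits.  */
static bool cond_gt(struct flags f)  { return !f.z && f.n == f.v; }
```

In this model `cond_lt` and `cond_ltu` agree with the true 64-bit signed and unsigned comparisons, but `cond_gt(cmp64_flags(2, 1))` is false although 2 > 1, because Z only reflects the high word.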
> I think the only solution would be to adjust the spec to
> reflect the implementation:
>
> Index: rtl.texi
> ===================================================================
> --- rtl.texi (revision 251752)
> +++ rtl.texi (working copy)
> @@ -2252,6 +2252,13 @@
> If one of the operands is a constant, it should be placed in the
> second operand and the comparison code adjusted as appropriate.
>
> +There may be exceptions from this rule if the mode @var{m} carries
> +not enough information for the swapped comparison operator, or
There may be exceptions _to_ ... if mode @var{m} does not carry enough...
> +if we ask for overflow from the subtraction.
Aren't we really trying to 'detect overflow' rather than 'ask' for it?
> That means, while
> +0-X may overflow, X-0 can never overflow. Under these conditions
> +a compare may have the constant expression at the left side.
In these circumstances the constant will be in the first operand.
(left and right don't really make sense for RTL).
> +Examples are the ARM negdi2_compare pattern and similar.
> +
> A @code{compare} specifying two @code{VOIDmode} constants is not valid
> since there is no way to know in what mode the comparison is to be
> performed; the comparison must either be folded during the compilation
>
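The asymmetry between 0-X and X-0 can be checked with a short C sketch. It is an illustration only and assumes a compiler that provides `__builtin_sub_overflow` (GCC 5 or later, Clang); the helper names `neg_overflows` and `sub_zero_overflows` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if 0 - x overflows in signed 64-bit arithmetic.
   This happens only for x == INT64_MIN.  */
static bool neg_overflows(int64_t x)
{
  int64_t res;
  return __builtin_sub_overflow((int64_t)0, x, &res);
}

/* True if x - 0 overflows; the result is always x, so this is never true.  */
static bool sub_zero_overflows(int64_t x)
{
  int64_t res;
  return __builtin_sub_overflow(x, (int64_t)0, &res);
}
```

So a compare written with the operands swapped would describe a V flag that can never be set, which is not what the negation instruction sequence produces.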
>
>
> Please advise.
>
> Thanks
> Bernd.
>
>
>>
>> Hmmm...
>>
>> I think the compare is not a commutative operation, and swapping
>> the arguments will imply a different value in the flags.
>>
>> So if I write
>> (set (reg:CC_NCV CC_REGNUM)
>> (compare:CC_NCV
>> (const_int 0)
>> (reg:DI 123)))
>>
>> I have C,N,V set to the result of (0 - r123), C = usable for LTU or GEU,
>> N,V = usable for LT, GE
>>
>> But if I write
>> (set (reg:CC_NCV CC_REGNUM)
>> (compare:CC_NCV
>> (reg:DI 123)
>> (const_int 0)))
>>
>> I have C,N,V set to the result of (r123 - 0), but the expansion stays
>> the same and the actual value in the flags is defined by the expansion.
>> Of course there exists probably no matching expansion for that.
>>
>> Note that both LTU in the above hunk are in a parallel-stmt and operate
>> on the flags from the previous pattern, so changing these to GEU
>> will probably be wrong.
>>
>> Both (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)) in the negdi2_compare
>> use the flags from the previous (set (reg:CC CC_REGNUM) (compare:CC
>> (const_int 0) (match_dup 1)).
>>
>> One use of the resulting flags (I know of) is in negvdi3 where we
>> have:
>>
>> emit_insn (gen_negdi2_compare (operands[0], operands[1]));
>> arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
>>
>> I think only 0-x can overflow while x-0 can never overflow.
>>
>> Of course the CC_NCV_CIC mode bends the definition of the RTL compare
>> a lot and I guess if this pattern is created by a splitter, this can
>> only be expanded by an exactly matching pattern, there is (hopefully)
>> no way how combine could mess with this pattern due to the exotic
>> CCmode. So while I think it would work to swap only the notation of
>> all CC_NCV_CIC patterns, _without_ changing the assembler-parts and the
>> consuming statements, that would make it quite hard to follow for the
>> human reader at least.
>>
>> What do you think?
>>
>>
>> Bernd.
Thread overview: 21+ messages
2016-12-18 13:15 Bernd Edlinger
2017-01-13 13:50 ` Richard Earnshaw (lists)
2017-01-13 16:10 ` Bernd Edlinger
2017-01-13 18:29 ` Bernd Edlinger
2017-01-18 15:43 ` Bernd Edlinger
2017-04-20 19:10 ` [PING] " Bernd Edlinger
2017-04-29 17:32 ` [PING**2] " Bernd Edlinger
2017-05-12 16:50 ` [PING**3] " Bernd Edlinger
2017-06-01 16:01 ` [PING**4] " Bernd Edlinger
[not found] ` <eb07f6a9-522b-0497-fc13-f3e4508b8277@hotmail.de>
2017-06-14 12:34 ` [PING**5] " Bernd Edlinger
[not found] ` <74eaaa44-40f0-4b12-1aec-4b9926158efe@hotmail.de>
2017-07-05 18:11 ` [PING**6] " Bernd Edlinger
2017-09-04 13:55 ` Kyrill Tkachov
2017-09-04 19:54 ` Bernd Edlinger
[not found] ` <a55cfa36-bb99-3433-f99e-c261fbe5dac1@hotmail.de>
2017-09-06 12:44 ` Bernd Edlinger
2017-09-06 12:52 ` Richard Earnshaw (lists) [this message]
2017-09-06 13:00 ` Bernd Edlinger
2017-09-06 13:17 ` Bernd Edlinger
2017-09-06 15:31 ` Kyrill Tkachov
2017-09-17 8:38 ` [PING] " Bernd Edlinger
2017-10-09 13:02 ` Richard Earnshaw (lists)
2017-10-10 19:11 ` Bernd Edlinger