From: Bernd Edlinger <bernd.edlinger@hotmail.de>
To: "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>,
Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>,
Wilco Dijkstra <wilco.dijkstra@arm.com>
Subject: [PING**6] [PATCH, ARM] correctly encode the CC reg data flow
Date: Wed, 05 Jul 2017 18:11:00 -0000
Message-ID: <AM5PR0701MB265712DF15C66EDC87ACCEB9E4D40@AM5PR0701MB2657.eurprd07.prod.outlook.com>
In-Reply-To: <74eaaa44-40f0-4b12-1aec-4b9926158efe@hotmail.de>

Ping...
On 06/14/17 14:33, Bernd Edlinger wrote:
> Ping...
>
> On 06/01/17 18:00, Bernd Edlinger wrote:
>> Ping...
>>
>> On 05/12/17 18:49, Bernd Edlinger wrote:
>>> Ping...
>>>
>>> On 04/29/17 19:21, Bernd Edlinger wrote:
>>>> Ping...
>>>>
>>>> On 04/20/17 20:11, Bernd Edlinger wrote:
>>>>> Ping...
>>>>>
>>>>> for this patch:
>>>>> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01351.html
>>>>>
>>>>> On 01/18/17 16:36, Bernd Edlinger wrote:
>>>>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> this is related to PR77308; the follow-up patch will depend on
>>>>>>>>>> this one.
>>>>>>>>>>
>>>>>>>>>> When trying to split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>>>>> patterns before reload, a mis-compilation in the libgcc function
>>>>>>>>>> __gnu_satfractdasq was discovered; see [1] for more details.
>>>>>>>>>>
>>>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>>>>> up into this:
>>>>>>>>>>
>>>>>>>>>> [(set (reg:CC CC_REGNUM)
>>>>>>>>>>       (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>>>>  (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>>                  (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>>>>             (set (match_dup 2)
>>>>>>>>>>                  (minus:SI (match_dup 5)
>>>>>>>>>>                            (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>>>>
>>>>>>>>>> [(set (reg:CC CC_REGNUM)
>>>>>>>>>>       (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>>>>  (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>>>>             (set (reg:CC CC_REGNUM)
>>>>>>>>>>                  (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>>>>
>>>>>>>>>> The problem is that the reg:CC set by *subsi3_carryin_compare
>>>>>>>>>> does not mention that it also depends on the previous value of
>>>>>>>>>> reg:CC. Therefore the *arm_cmpsi_insn appears to be redundant and
>>>>>>>>>> was removed, because the data values are identical.
>>>>>>>>>>
>>>>>>>>>> I think that applies to a number of similar patterns where data
>>>>>>>>>> flows through the CC reg.
>>>>>>>>>>
>>>>>>>>>> So this is a correctness issue, and it should be fixed
>>>>>>>>>> independently of the optimization issue PR77308.
>>>>>>>>>>
>>>>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>>>>> know what the instructions are really doing.
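A minimal C model can make the missing dependency concrete. It assumes the usual ARM semantics for SBCS (rd = rn - rm - !C, where C set means "no borrow"); flags_t, sbc_flags and demo_carry_in_dependence are hypothetical names that only model the condition flags and are not a GCC or ARM API. With identical data operands the resulting flags differ depending on the incoming carry, so a carry-in compare is not equivalent to an earlier compare of the same registers unless its pattern also names the CC input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the ARM N, Z, C and V condition flags. */
typedef struct {
    bool n, z, c, v;
} flags_t;

/* Models "SBCS rd, rn, rm": rd = rn - rm - !carry_in.
   C is set when no borrow occurs; V is set on signed overflow.
   Assumes GCC/Clang modulo semantics for uint32_t -> int32_t casts. */
static flags_t sbc_flags(uint32_t rn, uint32_t rm, bool carry_in, uint32_t *rd)
{
    uint32_t borrow = carry_in ? 0u : 1u;
    uint32_t res = rn - rm - borrow;
    int64_t exact = (int64_t)(int32_t)rn - (int64_t)(int32_t)rm - (int64_t)borrow;
    flags_t f;

    f.n = (res >> 31) != 0;
    f.z = res == 0;
    f.c = (uint64_t)rn >= (uint64_t)rm + borrow;
    f.v = exact < INT32_MIN || exact > INT32_MAX;
    if (rd != 0)
        *rd = res;
    return f;
}

/* Same data operands, different carry-in: the resulting flags differ. */
static void demo_carry_in_dependence(void)
{
    flags_t with_carry = sbc_flags(5, 5, true, 0);     /* 5 - 5 - 0 = 0 */
    flags_t without_carry = sbc_flags(5, 5, false, 0); /* 5 - 5 - 1 = 0xffffffff */

    assert(with_carry.z && with_carry.c && !with_carry.n);
    assert(!without_carry.z && !without_carry.c && without_carry.n);
}
```

The carry is an explicit parameter here because it is a genuine input of the operation, in the same way that the previous reg:CC value would need to appear in the RTL pattern.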
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>>>> Is it OK for trunk?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>>>>> with the patch itself.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>>>>   [(set (reg:CC_NCV CC_REGNUM)
>>>>>>>>>         (compare:CC_NCV
>>>>>>>>>           (match_operand:DI 1 "register_operand" "r")
>>>>>>>>>           (match_operand:DI 2 "register_operand" "r")))
>>>>>>>>>    (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>>>>         (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>>>>   "TARGET_32BIT"
>>>>>>>>>   "#"
>>>>>>>>>   "&& reload_completed"
>>>>>>>>>   [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>                    (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>>>>               (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>>>>>>>>    (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>>>>                    (compare:CC_C
>>>>>>>>>                      (zero_extend:DI (match_dup 4))
>>>>>>>>>                      (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>>                               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>>>               (set (match_dup 3)
>>>>>>>>>                    (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>>                              (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This pattern is now no longer self-consistent: before the split,
>>>>>>>>> the overall result for the condition register is in mode CC_NCV,
>>>>>>>>> but afterwards it is just CC_C.
>>>>>>>>>
>>>>>>>>> I think CC_NCV is the correct mode (the N, C and V bits all
>>>>>>>>> correctly reflect the result of the 64-bit comparison), but that
>>>>>>>>> then implies that the CC mode of subsi3_carryin_compare is
>>>>>>>>> incorrect as well and should in fact also be CC_NCV. Thinking about
>>>>>>>>> this pattern, I'm inclined to agree that CC_NCV is the correct mode
>>>>>>>>> for this operation.
>>>>>>>>>
>>>>>>>>> I'm not sure if there are other consequences that will fall out
>>>>>>>>> from fixing this (it's possible that we might need a change to
>>>>>>>>> select_cc_mode as well).
>>>>>>>>>
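The claim that N, C and V of the split sequence reflect the 64-bit comparison can be checked with a small C sketch. It assumes ARM semantics (SUBS on the low words, then SBCS on the high words consuming the carry from the first step; C set means "no borrow") and uses hypothetical helper names. It compares the split flags against a direct 64-bit computation over a set of edge values, and also shows that Z is not covered, since after SBCS it only reflects the high word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the ARM N, Z, C and V condition flags. */
typedef struct {
    bool n, z, c, v;
} flags_t;

/* rd = rn - rm - !carry_in; SUBS is the special case carry_in == true.
   Assumes GCC/Clang modulo semantics for unsigned -> signed casts. */
static flags_t sub_flags32(uint32_t rn, uint32_t rm, bool carry_in)
{
    uint32_t borrow = carry_in ? 0u : 1u;
    uint32_t res = rn - rm - borrow;
    int64_t exact = (int64_t)(int32_t)rn - (int64_t)(int32_t)rm - (int64_t)borrow;
    flags_t f = { (res >> 31) != 0, res == 0,
                  (uint64_t)rn >= (uint64_t)rm + borrow,
                  exact < INT32_MIN || exact > INT32_MAX };
    return f;
}

/* Flags of a direct 64-bit comparison a - b. */
static flags_t cmp_flags64(uint64_t a, uint64_t b)
{
    int64_t sa = (int64_t)a, sb = (int64_t)b;
    uint64_t res = a - b;
    flags_t f = { (res >> 63) != 0, res == 0, a >= b,
                  sb < 0 ? sa > INT64_MAX + sb : sa < INT64_MIN + sb };
    return f;
}

/* The split form: SUBS on the low words, SBCS on the high words. */
static flags_t cmp_flags_split(uint64_t a, uint64_t b)
{
    flags_t lo = sub_flags32((uint32_t)a, (uint32_t)b, true);
    return sub_flags32((uint32_t)(a >> 32), (uint32_t)(b >> 32), lo.c);
}

/* True if N, C and V agree for every pair drawn from some edge values. */
static bool ncv_agree_on_edge_cases(void)
{
    static const uint64_t vals[] = {
        0, 1, 0xffffffffull, 0x100000000ull, 0x123456789abcdef0ull,
        0x7fffffffffffffffull, 0x8000000000000000ull, 0xffffffffffffffffull
    };
    const size_t count = sizeof vals / sizeof vals[0];

    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < count; j++) {
            flags_t s = cmp_flags_split(vals[i], vals[j]);
            flags_t w = cmp_flags64(vals[i], vals[j]);
            if (s.n != w.n || s.c != w.c || s.v != w.v)
                return false;
        }
    return true;
}

/* a = 1, b = 0: the high words are equal, so SBCS sets Z even though
   the 64-bit values differ. */
static void demo_z_not_covered(void)
{
    assert(cmp_flags_split(1, 0).z && !cmp_flags64(1, 0).z);
}
```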
>>>>>>>>
>>>>>>>> Yes, this is still a bit awkward...
>>>>>>>>
>>>>>>>> The N and V bits will hold the correct result for the 64-bit
>>>>>>>> comparison in subdi3_compare1, but the expression
>>>>>>>> (compare:CC_C (zero_extend:DI (match_dup 4)) (plus:DI ...)) only
>>>>>>>> gets the C bit right; the expression for N and V is a different one.
>>>>>>>>
>>>>>>>> It probably works because the subsi3_carryin_compare instruction
>>>>>>>> sets more CC bits than the pattern explicitly specifies. We know
>>>>>>>> that subsi3_carryin_compare also computes the N and V bits, but it
>>>>>>>> is hard to write down the correct RTL expression for them.
>>>>>>>>
>>>>>>>> In theory the pattern should describe everything correctly,
>>>>>>>> perhaps like this:
>>>>>>>>
>>>>>>>> (set (reg:CC_C CC_REGNUM)
>>>>>>>>      (compare:CC_C
>>>>>>>>        (zero_extend:DI (match_dup 4))
>>>>>>>>        (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>                 (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>> (set (reg:CC_NV CC_REGNUM)
>>>>>>>>      (compare:CC_NV
>>>>>>>>        (match_dup 4)
>>>>>>>>        (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>> (set (match_dup 3)
>>>>>>>>      (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>                (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))
>>>>>>>>
>>>>>>>>
>>>>>>>> But would setting CC_REGNUM in two different modes in parallel
>>>>>>>> actually work? I doubt it.
>>>>>>>>
>>>>>>>> Another idea would be to invent a CC_CNV_NOOV mode that implicitly
>>>>>>>> defines C from the DImode result and N and V from the SImode
>>>>>>>> result, similar to CC_NOOVmode, which also leaves somewhat open
>>>>>>>> which bits it really defines.
>>>>>>>>
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Bernd.
>>>>>>>
>>>>>>> I think maybe the right solution is to invent a new CCmode
>>>>>>> that defines C as if the comparison is done in DImode
>>>>>>> but N and V as if the comparison is done in SImode.
>>>>>>>
>>>>>>> I thought I might call it CC_NCV_CIC (CIC = Carry-In-Compare).
>>>>>>> Furthermore, I think CC_NOOV should be renamed to CC_NZ (because
>>>>>>> only N and Z are set correctly), but in a different patch, of course.
>>>>>>>
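The C part of such a mode can be illustrated with a short C sketch, assuming that the (ltu ... (reg:CC_C CC_REGNUM) (const_int 0)) term evaluates to the incoming borrow as 0 or 1. All helper names are hypothetical. Adding the borrow to op5 in 32 bits can wrap around to 0 and hide the borrow-out, which is why C has to come from the zero-extended DImode comparison, while N only needs the 32-bit result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* C flag ("no borrow") of op4 - op5 - borrow_in, computed like the
   zero_extend:DI / plus:DI form: op5 + borrow_in is evaluated in 64 bits. */
static bool carry_dimode(uint32_t op4, uint32_t op5, bool borrow_in)
{
    return (uint64_t)op4 >= (uint64_t)op5 + (borrow_in ? 1u : 0u);
}

/* Hypothetical naive variant: op5 + borrow_in is evaluated in 32 bits
   and can wrap around to 0. Shown only to illustrate the problem. */
static bool carry_simode_naive(uint32_t op4, uint32_t op5, bool borrow_in)
{
    uint32_t rhs = op5 + (borrow_in ? 1u : 0u);
    return op4 >= rhs;
}

/* N flag: sign bit of the 32-bit result op4 - op5 - borrow_in. */
static bool n_simode(uint32_t op4, uint32_t op5, bool borrow_in)
{
    uint32_t res = op4 - op5 - (borrow_in ? 1u : 0u);
    return (res >> 31) != 0;
}

static void demo_borrow_wraparound(void)
{
    /* op5 = 0xffffffff plus a borrow of 1 wraps to 0 in 32 bits. */
    assert(!carry_dimode(0, 0xffffffffu, true));      /* real borrow-out */
    assert(carry_simode_naive(0, 0xffffffffu, true)); /* borrow-out lost */
    assert(!n_simode(0, 0xffffffffu, true));          /* 32-bit result is 0 */
}
```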
>>>>>>> Attached is a new version that implements the new CCmode.
>>>>>>>
>>>>>>> How do you like this new version?
>>>>>>>
>>>>>>> At least it seems to be able to build a cross-compiler.
>>>>>>>
>>>>>>> I will start a new bootstrap with this new patch, but it may take
>>>>>>> some time until I have definitive results.
>>>>>>>
>>>>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>>>>> for the next stage1?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Bernd.
>>>>>>
>>>>>>
>>>>>> I thought I should also look at where the subdi3_compare1 and
>>>>>> negdi2_compare patterns are used, and check whether the callers are
>>>>>> fine with not having all CC bits available.
>>>>>>
>>>>>> And indeed usubv<mode>4 turns out to be questionable, because it
>>>>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>>>>> CCmode), which is inconsistent once subdi3_compare1 no longer uses
>>>>>> CCmode.
>>>>>>
>>>>>> To correct this, the branch should use CC_Cmode, which is always
>>>>>> defined.
>>>>>>
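For unsigned operands, the overflow that usubv<mode>4 detects is exactly a borrow, i.e. the ARM C flag ending up clear, so the C flag alone decides the branch. A minimal C sketch of that equivalence for SImode and for the two-step DImode sequence, under the same ARM flag assumptions (C set means "no borrow"); the helper names are hypothetical, and the cross-check uses the GCC/Clang builtin __builtin_sub_overflow that also appears in the test programs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SImode: the C flag after SUBS is set when no borrow occurs, so
   unsigned overflow of y - z is exactly "C clear". */
static bool usub32_overflows(uint32_t y, uint32_t z)
{
    bool c = y >= z;
    return !c;
}

/* DImode: SUBS on the low words, then SBCS on the high words.
   Only the final C flag is needed to detect unsigned overflow. */
static bool usub64_overflows(uint64_t y, uint64_t z)
{
    uint32_t y_lo = (uint32_t)y, z_lo = (uint32_t)z;
    uint32_t y_hi = (uint32_t)(y >> 32), z_hi = (uint32_t)(z >> 32);
    uint64_t borrow_lo = y_lo < z_lo;                         /* !C after SUBS */
    bool c_hi = (uint64_t)y_hi >= (uint64_t)z_hi + borrow_lo; /* C after SBCS */
    return !c_hi;
}

/* Cross-check against the GCC/Clang builtin used in the test programs. */
static void check_against_builtin(uint64_t y, uint64_t z)
{
    uint64_t r64;
    uint32_t r32;
    assert(usub64_overflows(y, z) == __builtin_sub_overflow(y, z, &r64));
    assert(usub32_overflows((uint32_t)y, (uint32_t)z)
           == __builtin_sub_overflow((uint32_t)y, (uint32_t)z, &r32));
}
```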
>>>>>> So I tested this pattern with the following test programs and found,
>>>>>> somewhat surprisingly, that the code actually improves when the
>>>>>> branch uses CC_Cmode instead of CCmode, for both SImode and DImode
>>>>>> data.
>>>>>>
>>>>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>>>>
>>>>>> cat test.c (DImode)
>>>>>> unsigned long long x, y, z;
>>>>>> int b;
>>>>>> void test()
>>>>>> {
>>>>>>   b = __builtin_sub_overflow (y, z, &x);
>>>>>> }
>>>>>>
>>>>>>
>>>>>> The unpatched code uses 8 bytes more stack than the patched code,
>>>>>> because the DImode subtraction is effectively done twice.
>>>>>>
>>>>>> cat test1.c (SImode)
>>>>>> unsigned long x, y, z;
>>>>>> int b;
>>>>>> void test()
>>>>>> {
>>>>>>   b = __builtin_sub_overflow (y, z, &x);
>>>>>> }
>>>>>>
>>>>>> which generates (unpatched):
>>>>>>         cmp     r3, r0
>>>>>>         sub     ip, r3, r0
>>>>>>
>>>>>> instead of the expected (patched):
>>>>>>         subs    r3, r3, r2
>>>>>>
>>>>>>
>>>>>> The condition is extracted by if-conversion and/or combine, which
>>>>>> complicates the resulting code instead of simplifying it.
>>>>>>
>>>>>> I think this happens only when the branch and the subsi/di3_compare1
>>>>>> pattern use the same CC mode.
>>>>>>
>>>>>> That does not happen when the CC modes disagree, as with the
>>>>>> proposed patch. All other uses of the pattern are already using
>>>>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>>>>
>>>>>> Attached is an updated version of the patch, which happens to
>>>>>> improve the code generation of the usubsi4 and usubdi4 patterns
>>>>>> as a side effect.
>>>>>>
>>>>>>
>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>> Is it OK for trunk?
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Bernd.
Thread overview: 21+ messages
2016-12-18 13:15 Bernd Edlinger
2017-01-13 13:50 ` Richard Earnshaw (lists)
2017-01-13 16:10 ` Bernd Edlinger
2017-01-13 18:29 ` Bernd Edlinger
2017-01-18 15:43 ` Bernd Edlinger
2017-04-20 19:10 ` [PING] " Bernd Edlinger
2017-04-29 17:32 ` [PING**2] " Bernd Edlinger
2017-05-12 16:50 ` [PING**3] " Bernd Edlinger
2017-06-01 16:01 ` [PING**4] " Bernd Edlinger
[not found] ` <eb07f6a9-522b-0497-fc13-f3e4508b8277@hotmail.de>
2017-06-14 12:34 ` [PING**5] " Bernd Edlinger
[not found] ` <74eaaa44-40f0-4b12-1aec-4b9926158efe@hotmail.de>
2017-07-05 18:11 ` Bernd Edlinger [this message]
2017-09-04 13:55 ` Kyrill Tkachov
2017-09-04 19:54 ` Bernd Edlinger
[not found] ` <a55cfa36-bb99-3433-f99e-c261fbe5dac1@hotmail.de>
2017-09-06 12:44 ` Bernd Edlinger
2017-09-06 12:52 ` Richard Earnshaw (lists)
2017-09-06 13:00 ` Bernd Edlinger
2017-09-06 13:17 ` Bernd Edlinger
2017-09-06 15:31 ` Kyrill Tkachov
2017-09-17 8:38 ` [PING] " Bernd Edlinger
2017-10-09 13:02 ` Richard Earnshaw (lists)
2017-10-10 19:11 ` Bernd Edlinger