public inbox for gcc-patches@gcc.gnu.org
From: Bernd Edlinger <bernd.edlinger@hotmail.de>
To: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>,
	"Richard Earnshaw (lists)"	<Richard.Earnshaw@arm.com>,
	"gcc-patches@gcc.gnu.org"	<gcc-patches@gcc.gnu.org>
Cc: Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>,
	Wilco Dijkstra	<wilco.dijkstra@arm.com>
Subject: Re: [PATCH, ARM] correctly encode the CC reg data flow
Date: Wed, 06 Sep 2017 12:44:00 -0000
Message-ID: <AM5PR0701MB26572C99CDF0198F38A7DB9BE4970@AM5PR0701MB2657.eurprd07.prod.outlook.com>
In-Reply-To: <a55cfa36-bb99-3433-f99e-c261fbe5dac1@hotmail.de>

On 09/04/17 21:54, Bernd Edlinger wrote:
> Hi Kyrill,
> 
> Thanks for your review!
> 
> 
> On 09/04/17 15:55, Kyrill Tkachov wrote:
>> Hi Bernd,
>>
>> On 18/01/17 15:36, Bernd Edlinger wrote:
>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>>>> one.
>>>>>>>
>>>>>>> When trying to split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>> before reload, a mis-compilation in the libgcc function
>>>>>>> __gnu_satfractdasq was discovered, see [1] for more details.
>>>>>>>
>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>> up into this:
>>>>>>>
>>>>>>>     [(set (reg:CC CC_REGNUM)
>>>>>>>           (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>      (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>                      (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>                 (set (match_dup 2)
>>>>>>>                      (minus:SI (match_dup 5)
>>>>>>>                               (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>
>>>>>>>     [(set (reg:CC CC_REGNUM)
>>>>>>>           (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>      (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>                 (set (reg:CC CC_REGNUM)
>>>>>>>                      (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>
>>>>>>> The problem is that the reg:CC set by the *subsi3_carryin_compare
>>>>>>> does not mention that it also depends on the reg:CC set by the
>>>>>>> previous instruction.  Therefore the *arm_cmpsi_insn appears to be
>>>>>>> redundant and gets removed, because the data values are identical.
>>>>>>>
>>>>>>> I think that applies to a number of similar patterns where data
>>>>>>> flow is happening through the CC reg.
>>>>>>>
>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>> independently from the optimization issue PR77308.
>>>>>>>
>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>> know what the instructions are really doing.
>>>>>>>
>>>>>>>
>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>> Is it OK for trunk?
>>>>>>>
>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>> with the patch itself.
>>>>>>
>>>>>>
>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>    [(set (reg:CC_NCV CC_REGNUM)
>>>>>>      (compare:CC_NCV
>>>>>>        (match_operand:DI 1 "register_operand" "r")
>>>>>>        (match_operand:DI 2 "register_operand" "r")))
>>>>>>     (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>      (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>    "TARGET_32BIT"
>>>>>>    "#"
>>>>>>    "&& reload_completed"
>>>>>>    [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>             (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>            (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
>>>>>>     (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>             (compare:CC_C
>>>>>>               (zero_extend:DI (match_dup 4))
>>>>>>               (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>                    (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>            (set (match_dup 3)
>>>>>>             (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>                   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>
>>>>>>
>>>>>> This pattern is now no longer self-consistent, in that before the
>>>>>> split the overall result for the condition register is in mode
>>>>>> CC_NCV, but afterwards it is just CC_C.
>>>>>>
>>>>>> I think CC_NCV is the correct mode (the N, C and V bits all correctly
>>>>>> reflect the result of the 64-bit comparison), but that then implies
>>>>>> that the cc mode of subsi3_carryin_compare is incorrect as well and
>>>>>> should in fact also be CC_NCV.  Thinking about this pattern, I'm
>>>>>> inclined to agree that CC_NCV is the correct mode for this operation.
>>>>>>
>>>>>> I'm not sure if there are other consequences that will fall out from
>>>>>> fixing this (it's possible that we might need a change to
>>>>>> select_cc_mode as well).
>>>>>>
>>>>> Yes, this is still a bit awkward...
>>>>>
>>>>> The N and V bits will be the correct result for the subdi3_compare1,
>>>>> a 64-bit comparison, but the expression
>>>>> (compare (zero_extend:DI (match_dup 4)) (plus:DI ...)) only gets the
>>>>> C bit correct; the expression for N and V is a different one.
>>>>>
>>>>> It probably works because the subsi3_carryin_compare instruction sets
>>>>> more CC bits than the pattern explicitly specifies.
>>>>> We know the subsi3_carryin_compare also computes the N and V bits,
>>>>> but it is hard to write down the correct rtl expression for them.
>>>>>
>>>>> In theory the pattern should describe everything correctly,
>>>>> maybe, like:
>>>>>
>>>>> (set (reg:CC_C CC_REGNUM)
>>>>>      (compare:CC_C
>>>>>        (zero_extend:DI (match_dup 4))
>>>>>        (plus:DI (zero_extend:DI (match_dup 5))
>>>>>                 (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>> (set (reg:CC_NV CC_REGNUM)
>>>>>      (compare:CC_NV
>>>>>        (match_dup 4)
>>>>>        (plus:SI (match_dup 5)
>>>>>                 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>> (set (match_dup 3)
>>>>>      (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>                (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))
>>>>>
>>>>>
>>>>> But I doubt that setting CC_REGNUM with two different modes in
>>>>> parallel will work?
>>>>>
>>>>> Another idea would be to invent a CC_CNV_NOOV mode that implicitly
>>>>> defines C from the DImode result and NV from the SImode result,
>>>>> similar to CC_NOOVmode, which also leaves open which bits it
>>>>> really defines?
>>>>>
>>>>>
>>>>> What do you think?
>>>>>
>>>>>
>>>>> Thanks
>>>>> Bernd.
>>>> I think maybe the right solution is to invent a new CCmode
>>>> that defines C as if the comparison is done in DImode
>>>> but N and V as if the comparison is done in SImode.
>>>>
>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>> only N and Z are set correctly), but in a different patch of course.
>>>>
>>>> Attached is a new version that implements the new CCmode.
>>>>
>>>> How do you like this new version?
>>>>
>>>> It seems to be able to build a cross-compiler at least.
>>>>
>>>> I will start a new bootstrap with this new patch, but it can take
>>>> some time until I have definitive results.
>>>>
>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>> for the next stage1?
>>>>
>>>> Thanks
>>>> Bernd.
>>>
>>> I thought I should also look at where the subdi3_compare1 and the
>>> negdi2_compare patterns are used, and check whether the callers are
>>> fine with not having all CC bits available.
>>>
>>> And indeed usubv<mode>4 turns out to be questionable, because it
>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>> CCmode), which is inconsistent when subdi3_compare1 no longer uses
>>> CCmode.
>>>
>>> To correct this, the branch should use CC_Cmode, which is always defined.
>>>
>>> So I tried to test this pattern, with the following test programs,
>>> and found that the code actually improves when the branch uses CC_Cmode
>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>> surprising.
>>>
>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>
>>> cat test.c (DImode)
>>> unsigned long long x, y, z;
>>> int b;
>>> void test()
>>> {
>>>     b = __builtin_sub_overflow (y,z, &x);
>>> }
>>>
>>>
>>> The unpatched code used 8 bytes more stack than the patched code,
>>> because the DImode subtraction is effectively done twice.
>>>
>>> cat test1.c (SImode)
>>> unsigned long x, y, z;
>>> int b;
>>> void test()
>>> {
>>>     b = __builtin_sub_overflow (y,z, &x);
>>> }
>>>
>>> which generates (unpatched):
>>>           cmp     r3, r0
>>>           sub     ip, r3, r0
>>>
>>> instead of expected (patched):
>>>     subs    r3, r3, r2
>>>
>>>
>>> The condition is extracted by ifconversion and/or combine,
>>> which complicates the resulting code instead of simplifying it.
>>>
>>> I think this happens only when the branch and the subsi/di3_compare1
>>> use the same CC mode.
>>>
>>> That does not happen when the CC modes disagree, as with the
>>> proposed patch.  All other uses of the pattern are already using
>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>
>>> Attached is an updated version of the patch, which happens to
>>> improve the code generation of the usubvsi4 and usubvdi4 patterns
>>> as a side effect.
>>>
>>>
>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>> Is it OK for trunk?
>>
>> I'm very sorry it has taken so long to review.
>> I've been ramping up on the context recently, so I'll try to move
>> this along...
>>
>> This patch looks mostly ok to me from reading the patterns and the
>> discussion around it.
>> I have one concern:
>>
>>
>>   (define_insn_and_split "negdi2_compare"
>> -  [(set (reg:CC CC_REGNUM)
>> -    (compare:CC
>> +  [(set (reg:CC_NCV CC_REGNUM)
>> +    (compare:CC_NCV
>>         (const_int 0)
>>         (match_operand:DI 1 "register_operand" "0,r")))
>>      (set (match_operand:DI 0 "register_operand" "=r,&r")
>> @@ -4647,8 +4650,12 @@
>>              (compare:CC (const_int 0) (match_dup 1)))
>>             (set (match_dup 0) (minus:SI (const_int 0)
>>                          (match_dup 1)))])
>> -   (parallel [(set (reg:CC CC_REGNUM)
>> -           (compare:CC (const_int 0) (match_dup 3)))
>> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
>> +           (compare:CC_NCV_CIC
>> +             (const_int 0)
>> +             (plus:DI
>> +               (zero_extend:DI (match_dup 3))
>> +               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>            (set (match_dup 2)
>>             (minus:SI
>>              (minus:SI (const_int 0) (match_dup 3))
>>
>>
>> I was somewhat concerned with having the first operand of the COMPARE
>> be a const_int 0 and the second a complex expression, as the RTL
>> canonicalization rules usually require the complex operand to go
>> first if possible.
>> Reading the RTL rules in rtl.texi I see it says this:
>> "If one of the operands is a constant, it should be placed in the
>> second operand and the comparison code adjusted as appropriate."
>> So it seems that the pre-existing pattern that puts const_int 0 as the
>> first operand already breaks that rule.
>> I think we should fix that and update the use of the condition code to
>> GEU rather than LTU as well.
>>
> 

Well, the sentence before that one is even more explicit:

"Normally, @var{x} and @var{y} must have the same mode.  Otherwise,
@code{compare} is valid only if the mode of @var{x} is in class
@code{MODE_INT} and @var{y} is a @code{const_int} or
@code{const_double} with mode @code{VOIDmode}."

So because the const_int 0 has VOIDmode, the comparison is done
in the mode of y, not the mode of x.

But unfortunately I see no way to accomplish this, because the rule
assumes that the compare can easily be swapped, which works only if
the conditional instruction uses one of GT/GE/LE/LT or
GTU/GEU/LEU/LTU.  And that is only the case for plain CCmode.
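
To illustrate what swapping means for the consumer, here is a minimal
C sketch (not GCC code).  It models the flags of a 32-bit ARM compare
of a against b and shows that LTU on (a, b) gives the same answer as
GTU on (b, a), but that GTU has to look at Z in addition to C.  The
names cmp_flags, cond_ltu and cond_gtu are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the ARM condition flags, for illustration only. */
struct flags { bool n, z, c, v; };

/* Flags as set by a 32-bit compare of a against b, i.e. by a - b. */
static struct flags cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    struct flags f;
    f.n = (r >> 31) != 0;                    /* sign bit of the result */
    f.z = (r == 0);                          /* result is zero         */
    f.c = (a >= b);                          /* carry set = no borrow  */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;  /* signed overflow        */
    return f;
}

/* LTU only needs the carry flag. */
static bool cond_ltu(struct flags f) { return !f.c; }

/* GTU, the mirrored condition, needs carry set and Z clear. */
static bool cond_gtu(struct flags f) { return f.c && !f.z; }
```

With this model, cond_ltu (cmp_flags (a, b)) and cond_gtu (cmp_flags (b, a))
agree for all inputs, but a CC mode that does not track Z can only serve
the LTU/GEU side.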

And in this example we ask for "overflow", but while 0-X can
overflow, X-0 simply can't.  Moreover, there are non-symmetric
modes like CC_NCVmode which only support LT/GE/LTU/GEU but not
the swapped conditions GT/LE/GTU/LEU.
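
The overflow asymmetry can be checked with a small C sketch.  It
assumes a compiler that provides __builtin_sub_overflow (GCC or
Clang); the helper names neg_overflows and sub_zero_overflows are
made up for this example.

```c
#include <limits.h>
#include <stdbool.h>

/* Signed overflow of 0 - x: only possible for x == LLONG_MIN. */
static bool neg_overflows(long long x)
{
    long long r;
    return __builtin_sub_overflow(0LL, x, &r);
}

/* Signed overflow of x - 0: the result is always x, so never. */
static bool sub_zero_overflows(long long x)
{
    long long r;
    return __builtin_sub_overflow(x, 0LL, &r);
}
```

neg_overflows (LLONG_MIN) is true, while sub_zero_overflows is false
for every input, so the V flag of the swapped compare carries no
information.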

I think the only solution would be to adjust the spec to
reflect the implementation:

Index: rtl.texi
===================================================================
--- rtl.texi	(revision 251752)
+++ rtl.texi	(working copy)
@@ -2252,6 +2252,13 @@
  If one of the operands is a constant, it should be placed in the
  second operand and the comparison code adjusted as appropriate.

+There may be exceptions to this rule if the mode @var{m} does not
+carry enough information for the swapped comparison operator, or
+if we ask for overflow from the subtraction.  That means, while
+0-X may overflow, X-0 can never overflow.  Under these conditions
+a compare may have the constant expression on the left side.
+Examples are the ARM negdi2_compare pattern and similar.
+
  A @code{compare} specifying two @code{VOIDmode} constants is not valid
  since there is no way to know in what mode the comparison is to be
  performed; the comparison must either be folded during the compilation



Please advise.

Thanks
Bernd.


> 
> Hmmm...
> 
> I think the compare is not a commutative operation, and swapping
> the arguments will imply a different value in the flags.
> 
> So if I write
> (set (reg:CC_NCV CC_REGNUM)
>       (compare:CC_NCV
>         (const_int 0)
>         (reg:DI 123)))
> 
> I have C,N,V set to the result of (0 - r123), C = usable for LTU or GEU,
> N,V = usable for LT, GE
> 
> But if I write
> (set (reg:CC_NCV CC_REGNUM)
>       (compare:CC_NCV
>         (reg:DI 123)
>         (const_int 0)))
> 
> I have C,N,V set to the result of (r123 - 0), but the expansion stays
> the same and the actual value in the flags is defined by the expansion.
> Of course there exists probably no matching expansion for that.
> 
> Note that both LTU in the above hunk are in a parallel-stmt and operate
> on the flags from the previous pattern, so changing these to GEU
> will probably be wrong.
> 
> Both (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)) in the negdi2_compare
> use the flags from the previous (set (reg:CC CC_REGNUM) (compare:CC
> (const_int 0) (match_dup 1)).
> 
> One use of the resulting flags (I know of) is in negvdi3 where we
> have:
> 
>    emit_insn (gen_negdi2_compare (operands[0], operands[1]));
>    arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
> 
> I think only 0-x can overflow while x-0 can never overflow.
> 
> Of course the CC_NCV_CIC mode bends the definition of the RTL compare
> a lot, and I guess if this pattern is created by a splitter, it can
> only be expanded by an exactly matching pattern; there is (hopefully)
> no way combine could mess with this pattern due to the exotic
> CCmode.  So while I think it would work to swap only the notation of
> all CC_NCV_CIC patterns, _without_ changing the assembler parts and the
> consuming statements, that would make it quite hard to follow for the
> human reader at least.
> 
> What do you think?
> 
> 
> Bernd.


Thread overview: 21+ messages
2016-12-18 13:15 Bernd Edlinger
2017-01-13 13:50 ` Richard Earnshaw (lists)
2017-01-13 16:10   ` Bernd Edlinger
2017-01-13 18:29     ` Bernd Edlinger
2017-01-18 15:43       ` Bernd Edlinger
2017-04-20 19:10         ` [PING] " Bernd Edlinger
2017-04-29 17:32           ` [PING**2] " Bernd Edlinger
2017-05-12 16:50             ` [PING**3] " Bernd Edlinger
2017-06-01 16:01               ` [PING**4] " Bernd Edlinger
     [not found]               ` <eb07f6a9-522b-0497-fc13-f3e4508b8277@hotmail.de>
2017-06-14 12:34                 ` [PING**5] " Bernd Edlinger
     [not found]                 ` <74eaaa44-40f0-4b12-1aec-4b9926158efe@hotmail.de>
2017-07-05 18:11                   ` [PING**6] " Bernd Edlinger
2017-09-04 13:55         ` Kyrill Tkachov
2017-09-04 19:54           ` Bernd Edlinger
     [not found]           ` <a55cfa36-bb99-3433-f99e-c261fbe5dac1@hotmail.de>
2017-09-06 12:44             ` Bernd Edlinger [this message]
2017-09-06 12:52               ` Richard Earnshaw (lists)
2017-09-06 13:00                 ` Bernd Edlinger
2017-09-06 13:17                 ` Bernd Edlinger
2017-09-06 15:31                   ` Kyrill Tkachov
2017-09-17  8:38                     ` [PING] " Bernd Edlinger
2017-10-09 13:02                   ` Richard Earnshaw (lists)
2017-10-10 19:11                     ` Bernd Edlinger
