public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Kyrill  Tkachov <kyrylo.tkachov@foss.arm.com>
To: Bernd Edlinger <bernd.edlinger@hotmail.de>,
	 Christophe Lyon <christophe.lyon@linaro.org>
Cc: Ramana Radhakrishnan <ramana.gcc@googlemail.com>,
	 GCC Patches <gcc-patches@gcc.gnu.org>,
	Richard Earnshaw <richard.earnshaw@arm.com>,
	 Wilco Dijkstra <wilco.dijkstra@arm.com>
Subject: Re: [PING**2] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)
Date: Tue, 05 Sep 2017 17:45:00 -0000	[thread overview]
Message-ID: <59AEE2B6.7040102@foss.arm.com> (raw)
In-Reply-To: <AM5PR0701MB2657A6382FD554966E1E2D35E4960@AM5PR0701MB2657.eurprd07.prod.outlook.com>

Hi Bernd,

On 05/09/17 15:25, Bernd Edlinger wrote:
> Hi Christophe,
>
> On 09/05/17 10:45, Christophe Lyon wrote:
>> Hi Bernd,
>>
>>
>> On 4 September 2017 at 16:52, Kyrill  Tkachov
>> <kyrylo.tkachov@foss.arm.com> wrote:
>>> On 29/04/17 18:45, Bernd Edlinger wrote:
>>>> Ping...
>>>>
>>>> I attached a rebased version since there was a merge conflict in
>>>> the xordi3 pattern, otherwise the patch is still identical.
>>>> It splits adddi3, subdi3, anddi3, iordi3, xordi3 and one_cmpldi2
>>>> early when the target has no neon or iwmmxt.
>>>>
>>>>
>>>> Thanks
>>>> Bernd.
>>>>
>>>>
>>>>
>>>> On 11/28/16 20:42, Bernd Edlinger wrote:
>>>>> On 11/25/16 12:30, Ramana Radhakrishnan wrote:
>>>>>> On Sun, Nov 6, 2016 at 2:18 PM, Bernd Edlinger
>>>>>> <bernd.edlinger@hotmail.de> wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> This improves the stack usage on the sha512 test case for the case
>>>>>>> without hardware fpu and without iwmmxt by splitting all di-mode
>>>>>>> patterns right while expanding which is similar to what the
>>>>>>> shift-pattern
>>>>>>> does.  It does nothing in the case iwmmxt and fpu=neon or vfp as well
>>>>>>> as
>>>>>>> thumb1.
>>>>>>>
>>>>>> I would go further and do this in the absence of Neon, the VFP unit
>>>>>> being there doesn't help with DImode operations i.e. we do not have 64
>>>>>> bit integer arithmetic instructions without Neon. The main reason why
>>>>>> we have the DImode patterns split so late is to give a chance for
>>>>>> folks who want to do 64 bit arithmetic in Neon a chance to make this
>>>>>> work as well as support some of the 64 bit Neon intrinsics which IIRC
>>>>>> map down to these instructions. Doing this just for soft-float doesn't
>>>>>> improve the default case only. I don't usually test iwmmxt and I'm not
>>>>>> sure who has the ability to do so, thus keeping this restriction for
>>>>>> iwMMX is fine.
>>>>>>
>>>>>>
>>>>> Yes I understand, thanks for pointing that out.
>>>>>
>>>>> I was not aware what iwmmxt exists at all, but I noticed that most
>>>>> 64bit expansions work completely different, and would break if we split
>>>>> the pattern early.
>>>>>
>>>>> I can however only look at the assembler outout for iwmmxt, and make
>>>>> sure that the stack usage does not get worse.
>>>>>
>>>>> Thus the new version of the patch keeps only thumb1, neon and iwmmxt as
>>>>> it is: around 1570 (thumb1), 2300 (neon) and 2200 (wimmxt) bytes stack
>>>>> for the test cases, and vfp and soft-float at around 270 bytes stack
>>>>> usage.
>>>>>
>>>>>>> It reduces the stack usage from 2300 to near optimal 272 bytes (!).
>>>>>>>
>>>>>>> Note this also splits many ldrd/strd instructions and therefore I will
>>>>>>> post a followup-patch that mitigates this effect by enabling the
>>>>>>> ldrd/strd
>>>>>>> peephole optimization after the necessary reg-testing.
>>>>>>>
>>>>>>>
>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>> What do you mean by arm-linux-gnueabihf - when folks say that I
>>>>>> interpret it as --with-arch=armv7-a --with-float=hard
>>>>>> --with-fpu=vfpv3-d16 or (--with-fpu=neon).
>>>>>>
>>>>>> If you've really bootstrapped and regtested it on armhf, doesn't this
>>>>>> patch as it stand have no effect there i.e. no change ?
>>>>>> arm-linux-gnueabihf usually means to me someone has configured with
>>>>>> --with-float=hard, so there are no regressions in the hard float ABI
>>>>>> case,
>>>>>>
>>>>> I know it proves little.  When I say arm-linux-gnueabihf
>>>>> I do in fact mean --enable-languages=all,ada,go,obj-c++
>>>>> --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16
>>>>> --with-float=hard.
>>>>>
>>>>> My main interest in the stack usage is of course not because of linux,
>>>>> but because of eCos where we have very small task stacks and in fact
>>>>> no fpu support by the O/S at all, so that patch is exactly what we need.
>>>>>
>>>>>
>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf
>>>>> Is it OK for trunk?
>>>
>>> The code is ok.
>>> AFAICS testing this with --with-fpu=vfpv3-d16 does exercise the new code as
>>> the splits
>>> will happen for !TARGET_NEON (it is of course !TARGET_IWMMXT and
>>> TARGET_IWMMXT2
>>> is irrelevant here).
>>>
>>> So this is ok for trunk.
>>> Thanks, and sorry again for the delay.
>>> Kyrill
>>>
>> This patch (r251663) causes a regression on armeb-none-linux-gnueabihf
>> --with-mode arm
>> --with-cpu cortex-a9
>> --with-fpu vfpv3-d16-fp16
>> FAIL:    gcc.dg/vect/vect-singleton_1.c (internal compiler error)
>> FAIL:    gcc.dg/vect/vect-singleton_1.c -flto -ffat-lto-objects
>> (internal compiler error)
>>
>> the test passes if gcc is configured --with-fpu neon-fp16
>>
> Thank you very much for what you do!
>
> I am able to reproduce this.
>
> Combine creates an invalid insn out of these two insns:
>
> (insn 12 8 18 2 (parallel [
>               (set (reg:DI 122)
>                   (plus:DI (reg:DI 116 [ aD.5331 ])
>                       (reg:DI 119 [ bD.5332 ])))
>               (clobber (reg:CC 100 cc))
>           ]) "vect-singleton_1.c":28 1 {*arm_adddi3}
>        (expr_list:REG_DEAD (reg:DI 119 [ bD.5332 ])
>           (expr_list:REG_DEAD (reg:DI 116 [ aD.5331 ])
>               (expr_list:REG_UNUSED (reg:CC 100 cc)
>                   (nil)))))
> (insn 18 12 19 2 (set (reg/i:DI 16 s0)
>           (reg:DI 122)) "vect-singleton_1.c":28 650 {*movdi_vfp}
>        (expr_list:REG_DEAD (reg:DI 122)
>           (nil)))
>
> =>
>
> (insn 18 12 19 2 (parallel [
>               (set (reg/i:DI 16 s0)
>                   (plus:DI (reg:DI 116 [ aD.5331 ])
>                       (reg:DI 119 [ bD.5332 ])))
>               (clobber (reg:CC 100 cc))
>           ]) "vect-singleton_1.c":28 1 {*arm_adddi3}
>        (expr_list:REG_UNUSED (reg:CC 100 cc)
>           (expr_list:REG_DEAD (reg:DI 116 [ aD.5331 ])
>               (expr_list:REG_DEAD (reg:DI 119 [ bD.5332 ])
>                   (nil)))))
>
>
> But after reload this is fixed again, so that is no issue for a
> splitter that runs always after reload.
>
> But now the splitter runs into an assertion in gen_highpart(op0).
> The same problem should exist with the subdi3 as well, but
> not with the other patterns in this patch.
>
> I think the right way to fix this is use a non-vfp register
> predicate to prevent combine to create the un-splittable
> instruction in the first place.
>
> See attached my proposed fix for this defect.
>
> Christophe, You could do me a favor, and test it in your environment
> as I have nothing comparable.
>
>
> Kyrill, is this patch OK after bootstrap / reg-testing?
>

Thanks for tracking this down so quickly.
Like Wilco said in his reply, you can just use arm_general_register_operand
and for the adddi_operand predicate use a name like 
arm_general_adddi_operand
(_nv associates to some with non-volatile or non-valid).

I don't think anddi_notdi_di has the same issue as that one still 
requires splitting after reload
in all cases and the registers are guaranteed to be core registers by then.

So ok with using arm_general_register_operand and renaming the new 
adjusted adddi_operand-based predicate to arm_general_adddi_operand, 
assuming bootstrap and testing goes ok.

Thanks,
Kyrill

  parent reply	other threads:[~2017-09-05 17:45 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-06 14:18 Bernd Edlinger
2016-11-25 11:30 ` Ramana Radhakrishnan
2016-11-28 19:42   ` Bernd Edlinger
     [not found]     ` <VI1PR0802MB2621FFBFA3252B40E5978C9F838D0@VI1PR0802MB2621.eurprd08.prod.outlook.com>
2016-11-29 21:37       ` Bernd Edlinger
     [not found]         ` <AM5PR0802MB261038521472515DDE3E58DA838D0@AM5PR0802MB2610.eurprd08.prod.outlook.com>
2016-11-30 12:01           ` Wilco Dijkstra
2016-11-30 17:01             ` Bernd Edlinger
2016-12-08 19:50               ` Bernd Edlinger
2017-01-11 16:55                 ` Richard Earnshaw (lists)
2017-01-11 17:19                   ` Bernd Edlinger
2017-04-29 19:17     ` [PING**2] " Bernd Edlinger
2017-05-12 16:51       ` [PING**3] " Bernd Edlinger
2017-06-01 16:01         ` [PING**4] " Bernd Edlinger
     [not found]         ` <bd5e03b1-860f-dd16-2030-9ce0f9a94c7c@hotmail.de>
2017-06-14 12:35           ` [PING**5] " Bernd Edlinger
     [not found]           ` <9a0fbb5d-9909-ef4d-6871-0cb4f7971bbb@hotmail.de>
2017-07-05 18:14             ` [PING**6] " Bernd Edlinger
2017-09-04 14:52       ` [PING**2] " Kyrill Tkachov
2017-09-05  8:47         ` Christophe Lyon
2017-09-05 14:25           ` Bernd Edlinger
2017-09-05 15:02             ` Wilco Dijkstra
2017-09-05 17:48               ` Bernd Edlinger
2017-09-05 17:53                 ` Kyrill Tkachov
2017-09-05 18:20                   ` Christophe Lyon
2017-09-06  7:35                     ` Christophe Lyon
2017-09-05 21:28                 ` Wilco Dijkstra
2017-09-06  9:31                   ` Bernd Edlinger
2017-09-05 17:45             ` Kyrill Tkachov [this message]
2016-11-16 17:28 [PING] " Bernd Edlinger
2016-11-24 14:24 ` [PING**2] " Bernd Edlinger
2016-12-05 13:41   ` [PING] " Bernd Edlinger
2016-12-12  5:59     ` [PING**2] " Bernd Edlinger
2016-12-19  8:22       ` Bernd Edlinger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=59AEE2B6.7040102@foss.arm.com \
    --to=kyrylo.tkachov@foss.arm.com \
    --cc=bernd.edlinger@hotmail.de \
    --cc=christophe.lyon@linaro.org \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=ramana.gcc@googlemail.com \
    --cc=richard.earnshaw@arm.com \
    --cc=wilco.dijkstra@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).