public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Bernd Edlinger <bernd.edlinger@hotmail.de>
To: Ramana Radhakrishnan <ramana.gcc@googlemail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
	Kyrill Tkachov	<kyrylo.tkachov@foss.arm.com>,
	Richard Earnshaw <richard.earnshaw@arm.com>,
	Wilco Dijkstra <wilco.dijkstra@arm.com>
Subject: [PING**4] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)
Date: Thu, 01 Jun 2017 16:01:00 -0000	[thread overview]
Message-ID: <AM4PR0701MB2162DAE800C4EEDB4C6E41E0E4F60@AM4PR0701MB2162.eurprd07.prod.outlook.com> (raw)
In-Reply-To: <AM4PR0701MB2162C4EC4B8E114B38F714DCE4E20@AM4PR0701MB2162.eurprd07.prod.outlook.com>

Ping...

On 05/12/17 18:49, Bernd Edlinger wrote:
> Ping...
> 
> On 04/29/17 19:45, Bernd Edlinger wrote:
>> Ping...
>>
>> I attached a rebased version since there was a merge conflict in
>> the xordi3 pattern, otherwise the patch is still identical.
>> It splits adddi3, subdi3, anddi3, iordi3, xordi3 and one_cmpldi2
>> early when the target has no neon or iwmmxt.
>>
>>
>> Thanks
>> Bernd.
>>
>>
>>
>> On 11/28/16 20:42, Bernd Edlinger wrote:
>>> On 11/25/16 12:30, Ramana Radhakrishnan wrote:
>>>> On Sun, Nov 6, 2016 at 2:18 PM, Bernd Edlinger
>>>> <bernd.edlinger@hotmail.de> wrote:
>>>>> Hi!
>>>>>
>>>>> This improves the stack usage on the sha512 test case for the case
>>>>> without hardware fpu and without iwmmxt by splitting all di-mode
>>>>> patterns right while expanding which is similar to what the
>>>>> shift-pattern
>>>>> does.  It does nothing in the case iwmmxt and fpu=neon or vfp as
>>>>> well as
>>>>> thumb1.
>>>>>
>>>>
>>>> I would go further and do this in the absence of Neon, the VFP unit
>>>> being there doesn't help with DImode operations i.e. we do not have 64
>>>> bit integer arithmetic instructions without Neon. The main reason why
>>>> we have the DImode patterns split so late is to give a chance for
>>>> folks who want to do 64 bit arithmetic in Neon a chance to make this
>>>> work as well as support some of the 64 bit Neon intrinsics which IIRC
>>>> map down to these instructions. Doing this just for soft-float doesn't
>>>> improve the default case only. I don't usually test iwmmxt and I'm not
>>>> sure who has the ability to do so, thus keeping this restriction for
>>>> iwMMX is fine.
>>>>
>>>>
>>>
>>> Yes I understand, thanks for pointing that out.
>>>
>>> I was not aware what iwmmxt exists at all, but I noticed that most
>>> 64bit expansions work completely different, and would break if we split
>>> the pattern early.
>>>
>>> I can however only look at the assembler outout for iwmmxt, and make
>>> sure that the stack usage does not get worse.
>>>
>>> Thus the new version of the patch keeps only thumb1, neon and iwmmxt as
>>> it is: around 1570 (thumb1), 2300 (neon) and 2200 (wimmxt) bytes stack
>>> for the test cases, and vfp and soft-float at around 270 bytes stack
>>> usage.
>>>
>>>>> It reduces the stack usage from 2300 to near optimal 272 bytes (!).
>>>>>
>>>>> Note this also splits many ldrd/strd instructions and therefore I will
>>>>> post a followup-patch that mitigates this effect by enabling the
>>>>> ldrd/strd
>>>>> peephole optimization after the necessary reg-testing.
>>>>>
>>>>>
>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>
>>>> What do you mean by arm-linux-gnueabihf - when folks say that I
>>>> interpret it as --with-arch=armv7-a --with-float=hard
>>>> --with-fpu=vfpv3-d16 or (--with-fpu=neon).
>>>>
>>>> If you've really bootstrapped and regtested it on armhf, doesn't this
>>>> patch as it stand have no effect there i.e. no change ?
>>>> arm-linux-gnueabihf usually means to me someone has configured with
>>>> --with-float=hard, so there are no regressions in the hard float ABI
>>>> case,
>>>>
>>>
>>> I know it proves little.  When I say arm-linux-gnueabihf
>>> I do in fact mean --enable-languages=all,ada,go,obj-c++
>>> --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16
>>> --with-float=hard.
>>>
>>> My main interest in the stack usage is of course not because of linux,
>>> but because of eCos where we have very small task stacks and in fact
>>> no fpu support by the O/S at all, so that patch is exactly what we need.
>>>
>>>
>>> Bootstrapped and reg-tested on arm-linux-gnueabihf
>>> Is it OK for trunk?
>>>
>>>
>>> Thanks
>>> Bernd.

  reply	other threads:[~2017-06-01 16:01 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-06 14:18 Bernd Edlinger
2016-11-25 11:30 ` Ramana Radhakrishnan
2016-11-28 19:42   ` Bernd Edlinger
     [not found]     ` <VI1PR0802MB2621FFBFA3252B40E5978C9F838D0@VI1PR0802MB2621.eurprd08.prod.outlook.com>
2016-11-29 21:37       ` Bernd Edlinger
     [not found]         ` <AM5PR0802MB261038521472515DDE3E58DA838D0@AM5PR0802MB2610.eurprd08.prod.outlook.com>
2016-11-30 12:01           ` Wilco Dijkstra
2016-11-30 17:01             ` Bernd Edlinger
2016-12-08 19:50               ` Bernd Edlinger
2017-01-11 16:55                 ` Richard Earnshaw (lists)
2017-01-11 17:19                   ` Bernd Edlinger
2017-04-29 19:17     ` [PING**2] " Bernd Edlinger
2017-05-12 16:51       ` [PING**3] " Bernd Edlinger
2017-06-01 16:01         ` Bernd Edlinger [this message]
     [not found]         ` <bd5e03b1-860f-dd16-2030-9ce0f9a94c7c@hotmail.de>
2017-06-14 12:35           ` [PING**5] " Bernd Edlinger
     [not found]           ` <9a0fbb5d-9909-ef4d-6871-0cb4f7971bbb@hotmail.de>
2017-07-05 18:14             ` [PING**6] " Bernd Edlinger
2017-09-04 14:52       ` [PING**2] " Kyrill Tkachov
2017-09-05  8:47         ` Christophe Lyon
2017-09-05 14:25           ` Bernd Edlinger
2017-09-05 15:02             ` Wilco Dijkstra
2017-09-05 17:48               ` Bernd Edlinger
2017-09-05 17:53                 ` Kyrill Tkachov
2017-09-05 18:20                   ` Christophe Lyon
2017-09-06  7:35                     ` Christophe Lyon
2017-09-05 21:28                 ` Wilco Dijkstra
2017-09-06  9:31                   ` Bernd Edlinger
2017-09-05 17:45             ` Kyrill Tkachov
2016-11-16 17:28 [PING] " Bernd Edlinger
2016-11-24 14:24 ` [PING**2] " Bernd Edlinger
2016-12-05 13:41   ` [PING] " Bernd Edlinger
2016-12-12  5:59     ` [PING**2] " Bernd Edlinger
2016-12-19  8:22       ` Bernd Edlinger
2016-12-27 14:13         ` [PING**3] " Bernd Edlinger
2017-01-05  9:11           ` [PING**4] " Bernd Edlinger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AM4PR0701MB2162DAE800C4EEDB4C6E41E0E4F60@AM4PR0701MB2162.eurprd07.prod.outlook.com \
    --to=bernd.edlinger@hotmail.de \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=kyrylo.tkachov@foss.arm.com \
    --cc=ramana.gcc@googlemail.com \
    --cc=richard.earnshaw@arm.com \
    --cc=wilco.dijkstra@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).