From: Jeff Law <law@redhat.com>
To: Richard Biener <rguenther@suse.de>, Jakub Jelinek <jakub@redhat.com>
Cc: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>,
    Uros Bizjak <ubizjak@gmail.com>,
    GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH][v3] GIMPLE store merging pass
Date: Wed, 07 Sep 2016 12:43:00 -0000
Message-ID: <6571269d-c92c-6fa6-7878-7a456886d807@redhat.com>
In-Reply-To: <alpine.LSU.2.11.1609071010210.26629@t29.fhfr.qr>

On 09/07/2016 02:19 AM, Richard Biener wrote:
> On Tue, 6 Sep 2016, Jakub Jelinek wrote:
>
>> On Tue, Sep 06, 2016 at 04:59:23PM +0100, Kyrill Tkachov wrote:
>>> On 06/09/16 16:32, Jakub Jelinek wrote:
>>>> On Tue, Sep 06, 2016 at 04:14:47PM +0100, Kyrill Tkachov wrote:
>>>>> The v3 of this patch addresses feedback I received on the version posted at [1].
>>>>> The merged store buffer is now represented as a char array that we splat values onto with
>>>>> native_encode_expr and native_interpret_expr. This allows us to merge anything that native_encode_expr
>>>>> accepts, including floating point values and short vectors. So this version extends the functionality
>>>>> of the previous one in that it handles floating point values as well.
>>>>>
>>>>> The first phase of the algorithm that detects the contiguous stores is also slightly refactored according
>>>>> to feedback to read more fluently.
>>>>>
>>>>> Richi, I experimented with merging up to MOVE_MAX bytes rather than word size but I got worse results on aarch64.
>>>>> MOVE_MAX there is 16 (because it has load/store register pair instructions) but the 128-bit immediates that we ended up
>>>>> synthesising were too complex. Perhaps the TImode immediate store RTL expansions could be improved, but for now
>>>>> I've left the maximum merge size to be BITS_PER_WORD.
>>>> At least from playing with this kind of thing in the RTL PR22141 patch,
>>>> I remember storing 64-bit constants on x86_64 compared to storing 2 32-bit
>>>> constants usually isn't a win (not just for speed optimized blocks but also for
>>>> -Os).  For 64-bit store if the constant isn't signed 32-bit or unsigned
>>>> 32-bit you need movabsq into some temporary register which has like 3 times worse
>>>> latency than normal store if I remember well, and then store it.
>>>
>>> We could restrict the maximum width of the stores generated to 32 bits on x86_64.
>>> I think this would need another parameter or target macro for the target to set.
>>> Alternatively, is it a possibility for x86 to be a bit smarter in its DImode mov-immediate
>>> expansion? For example break up the 64-bit movabsq immediate into two SImode immediates?
>>
>> If you want a 64-bit store, you'd need to merge the two, and that would be
>> even more expensive.  It is a matter of say:
>> 	movl $0x12345678, (%rsp)
>> 	movl $0x09abcdef, 4(%rsp)
>> vs.
>> 	movabsq $0x09abcdef12345678, %rax
>> 	movq %rax, (%rsp)
>> vs.
>> 	movl $0x09abcdef, %eax
>> 	salq $32, %rax
>> 	orq $0x12345678, %rax
>> 	movq %rax, (%rsp)
>
> vs.
>
>         movq $LC0, (%rsp)
>
> ?
>
>> etc.  Guess it needs to be benchmarked on contemporary CPUs, I'm pretty sure
>> the last sequence is the worst one.
>
> I think the important part to notice is that it should be straightforward
> for a target / the expander to split a large store from an immediate
> into any of the above but very hard to do the opposite.  Thus from a
> GIMPLE side "canonicalizing" to large stores (that are eventually
> supported and well-aligned) seems best to me.
Agreed.
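
For illustration only (this is not code from the patch or from the x86
backend), here is a minimal C sketch of why the split direction is the easy
one. Both helper names are hypothetical. The first checks whether a 64-bit
constant fits the sign-extended 32-bit immediate that a single x86_64
"movq $imm32, mem" can encode; the second cuts a little-endian 64-bit store
into the two 32-bit halves used by the first sequence quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code: true if VALUE can be encoded as the
   sign-extended 32-bit immediate of an x86_64 "movq $imm32, mem".  */
static bool
fits_sext_imm32 (uint64_t value)
{
  return value <= UINT64_C (0x7fffffff)
         || value >= UINT64_C (0xffffffff80000000);
}

/* Hypothetical helper: split one little-endian 64-bit store into two
   32-bit stores.  LO belongs at byte offset 0, HI at byte offset 4.  */
static void
split_store64 (uint64_t value, uint32_t *lo, uint32_t *hi)
{
  *lo = (uint32_t) value;
  *hi = (uint32_t) (value >> 32);
}
```

Splitting needs only the constant itself. Going the other way requires
proving that two separate stores are adjacent, non-aliased and not
separated by a conflicting access, which is the analysis the GIMPLE pass
is there to do.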


>
>>> I'm aware of that. The patch already has logic to avoid emitting unaligned accesses
>>> for SLOW_UNALIGNED_ACCESS targets. Beyond that the patch introduces the parameter
>>> PARAM_STORE_MERGING_ALLOW_UNALIGNED that can be used by the user or target to
>>> forbid generation of unaligned stores by the pass altogether. Beyond that I'm not sure
>>> how to behave more intelligently here. Any ideas?
>>
>> Dunno, the heuristics were the main problem with my patch.  Generally, I'd
>> say there is a difference between cold and hot blocks, in cold ones perhaps
>> unaligned stores are more appropriate (if supported at all and not way too
>> slow), while in hot ones less desirable.
>
> Note that I repeatedly argue that if we can canonicalize something to "larger"
> then even if unaligned, the expander should be able to produce optimal
> code again (it might not do, of course).
And agreed.  Furthermore, it's in line with our guiding principles WRT 
separation of the tree/SSA optimizers from target dependencies.

So let's push those decisions into the expanders/backend/target and 
canonicalize to the larger stores.
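
As a rough sketch of what such a target-side decision could look like (an
assumption for illustration, not how any GCC expander is actually written),
the hypothetical function below decomposes one wide store into naturally
aligned power-of-two pieces, as a strict-alignment target might want. The
name split_aligned and its parameters are invented here; max_width is
assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not GCC code: decompose a store of SIZE bytes at
   address ADDR into naturally aligned power-of-two pieces no wider than
   MAX_WIDTH (assumed to be a power of two).  The piece widths are written
   to PIECES, which has room for CAP entries; the count is returned.  */
static size_t
split_aligned (uintptr_t addr, size_t size, size_t max_width,
               size_t *pieces, size_t cap)
{
  size_t n = 0;

  while (size > 0)
    {
      size_t width = max_width;

      /* Shrink the piece until it fits in the remaining bytes and ADDR
         is a multiple of its width.  A width of 1 always qualifies.  */
      while (width > size || (addr & (width - 1)) != 0)
        width >>= 1;

      assert (n < cap);
      pieces[n++] = width;
      addr += width;
      size -= width;
    }

  return n;
}
```

An 8-byte store at an 8-byte-aligned address stays a single piece, while
the same store at an address that is only 2-byte aligned comes out as
2 + 4 + 2 bytes. The GIMPLE side does not need to know which of these the
target prefers.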

jeff


