Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Sandiford <richard.sandiford@arm.com>
To: Iain Sandoe <iain@sandoe.co.uk>
Cc: Vladimir Makarov via Gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].
Date: Mon, 21 Feb 2022 10:55:15 +0000	[thread overview]
Message-ID: <mpto830lca4.fsf@arm.com> (raw)
In-Reply-To: <3252FEE6-00A8-4201-84DA-623850C16698@sandoe.co.uk> (Iain Sandoe's message of "Sun, 20 Feb 2022 17:34:33 +0000")

Iain Sandoe <iain@sandoe.co.uk> writes:
> Hi Folks.
>> On 14 Feb 2022, at 16:58, Vladimir Makarov <vmakarov@redhat.com> wrote:
>> On 2022-02-14 11:00, Richard Sandiford wrote:
>
>>> Vladimir Makarov via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>> 
>>>> Hi, Richard.  Change LRA is mine and I approved it for Iain's patch.
>>>> 
>>>> I think there is no need for this code and it is misleading.  If
>>>> 'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]'
>>>> will help for any existing target.  As machine-dependent code for any
>>>> target most probably (for ppc64 darwin it is exactly the case) checks
>>>> address only in memory, it can wrongly accept wrong address by reloading
>>>> it into reg and use it in memory. So these are my arguments for the
>>>> remove this code from process_address_1.
>>> I'm probably making too much of this, but:
>>> 
>>> I think the code is potentially useful in that existing targets do forbid
>>> forbid lo_sum addresses in certain contexts (due to limited offset range)
>>> while still wanting lo_sum to be used to be load the address.  If we
>>> handle the high/lo_sum split in generic code then we have more chance
>>> of being able to optimise things.  So it feels like this is setting an
>>> unfortunate precedent.
>>> 
>>> I still don't understand what went wrong before though (the PR trail
>>> was a bit too long to process :-)).  Is there a case where
>>> (lo_sum (high X) X) != X?  If so, that seems like a target bug to me.
>>> Or does the target accept (set R1 (lo_sum R2 X)) for an X that cannot
>>> be split into a HIGH/LO_SUM pair?  I'd argue that's a target bug too.
>>> 
>> Sometimes it is hard to make a line where an RA bug is a bug in machine-dependent code or in RA itself.
>> 
>> For this case I would say it is a bug in the both parts.
>> 
>> Low-sum is generated by LRA and it does not know that it should be wrapped by unspec for darwin. Generally speaking we could avoid the change in LRA but it would require to do non-trivial analysis in machine dependent code to find cases when 'reg=low_sum ... mem[reg]' is incorrect code for darwin (PIC) target (and may be some other PIC targets too). Therefore I believe the change in LRA is a good solution even if the change can potentially result in less optimized code for some cases.  Taking your concern into account we could probably improve the patch by introducing a hook (I never liked such solutions as we already have too many hooks directing RA) or better to make the LRA change working only for PIC target. Something like this (it probably needs better recognition of pic target):
>> 
>> --- a/gcc/lra-constraints.cc
>> +++ b/gcc/lra-constraints.cc
>> @@ -3616,21 +3616,21 @@ process_address_1 (int nop, bool check_only_p,
>>           if (HAVE_lo_sum)
>>             {
>>               /* addr => lo_sum (new_base, addr), case (2) above.  */
>>               insn = emit_insn (gen_rtx_SET
>>                                 (new_reg,
>>                                  gen_rtx_HIGH (Pmode, copy_rtx (addr))));
>>               code = recog_memoized (insn);
>>               if (code >= 0)
>>                 {
>>                   *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
>> -                 if (!valid_address_p (op, &ad, cn))
>> +                 if (!valid_address_p (op, &ad, cn) && !flag_pic)
>
> IMO the PIC aspect of this is possibly misleading
>  - the issue is that we have an invalid address, and that such addresses in this case need to be legitimised by wrapping them in an UNSPEC. 
> - My concern about the generic code was that I would not expect Darwin to be the only platform that might need to wrap an invlaid address in an unspec  [TOC, TLS, small data etc. spring to mind].

Yeah, that part is pretty common.

> I need some help understanding the expected pathway through this code that could be useful.
>
> we start with an invalid address.
>
> 1. we build (set reg (high invalid_address))
>  - Darwin was allowing this (and the lo_sum) [eveywhere, not just here] on the basis that the target legitimizer would be called later to fix it up.  (that is why the initial recog passes) - but AFAICT we never call the target’s address legitimizer.

I think it's really the other way around: legitimisers are (or should be)
called before recog, with the legitimisers producing instructions that
recog is known to accept.

Once something has been recog()ed, the target has to be prepared
to generate code for it, either directly from an asm template or
indirectly from a define_split.  There's no backing out beyond
that point.

For:

  (set … (mem (lo_sum (reg Y) (…))))

the lo_sum is an address that needs to be legitimised (because it's
in a mem) but in:

  (set (reg X) (lo_sum (reg Y) (…)))    // A
  (set … (mem (reg X)))

it isn't: it's just an ordinary bit of arithmetic whose result happens
to be used as an address later.  If the target accepts A then it must be
prepared to generate code for it, without other hooks being called first.

That distinction also (unfortunately) applies to the canonical form
of multiplication by 2^N.  In an address (i.e. inside a mem) it's:

  (mult … (const_int 2^N))

whereas outside of an address (outside of a mem) it's:

  (lshift … (const_int N))

Also, the address legitimisation hooks are just optimisations.  It should
always be valid to legitimise an address by moving the address into a
register via force_operand instead (at least when can_create_psuedos_p).

So the real work of legitimising symbolic references is done by the move
patterns.  Any legitimisation done by the address hooks is just an
optimisation on top of that.

>  - I am curious about what (other) circumstance there would be where a (high of an invalid address would be useful.
>
> 2. …  assuming the we allowed the build of the (high invalid)
>
>  - we now build the lo_sum and check to see if it is valid.
>
> 3. if it is _not_ valid, we load it into a reg
>
>   - I am not sure (outside the comment about about post-legitimiizer use) about how an invalid lo_sum can be used in this way.
>
>   - assuming we accept this, we then test to see if the register is a valid address (my guess is that test will pass pretty much everywhere, since we picked a suitable register in the first place).
>
> ^^^ this is mostly for my education - the stuff below is a potential solution to leaving lra-constraints unchanged and fixing the Darwin bug….
>
> [ part of me wonders why we do not just call the target’s address legitimizer when we have an illegal address ]
>
> ——— current WIP:
>
> So .. I have split the Darwin patterns into a hi/lo pair for non-PIC and a hi/lo pair for PIC.
>
> I added a predicate for the PIC case that requires it to match (unspec [….] UNSPEC_MACHOPIC_OFFSET).
>
> Thus both the attempts (high and (lo_sum will fail recog now which has the same effect as punting on the bad lo_sum (and the original testcase now generates correct code) …

I don't know the Darwin code, so I can't speak to whether splitting
the PIC patterns out makes sense, but: yeah, the end result sounds
right to me FWIW.

Thanks,
Richard

> … the hardware is slow (well, it isn’t really - faster than a Cray XMP .. but …) .. so regstraps on 11.2 (to check the fix works) and master will take some time.
>
> Then, hopefully, there is a target-local fix (and the LRA change could be reverted if that’s the concensus about the best solution) ...
>
> cheers
> Iain

next prev parent reply	other threads:[~2022-02-21 10:55 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-11 23:59 Iain Sandoe
2022-02-14  9:44 ` Richard Sandiford
2022-02-14 14:36   ` Vladimir Makarov
2022-02-14 16:00     ` Richard Sandiford
2022-02-14 16:21       ` Iain Sandoe
2022-02-14 16:35         ` Richard Sandiford
2022-02-14 16:58       ` Vladimir Makarov
2022-02-20 17:34         ` Iain Sandoe
2022-02-21 10:55           ` Richard Sandiford [this message]
2022-02-22 14:44           ` Vladimir Makarov
2022-02-24 16:02             ` Iain Sandoe
2022-02-26 16:52               ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mpto830lca4.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=iain@sandoe.co.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).