From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <iain@sandoe.co.uk>
Received: from smtp001-out.apm-internet.net (smtp001-out.apm-internet.net
 [85.119.248.222])
 by sourceware.org (Postfix) with ESMTPS id DBE063858D28
 for <gcc-patches@gcc.gnu.org>; Sun, 20 Feb 2022 17:34:35 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DBE063858D28
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=sandoe.co.uk
Authentication-Results: sourceware.org; spf=none smtp.mailfrom=sandoe.co.uk
Received: (qmail 63110 invoked from network); 20 Feb 2022 17:34:34 -0000
X-APM-Out-ID: 16453784746310
X-APM-Authkey: 257869/1(257869/1) 3
Received: from unknown (HELO ?192.168.1.214?) (81.138.1.83)
 by smtp001.apm-internet.net with SMTP; 20 Feb 2022 17:34:34 -0000
Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.21\))
Subject: Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced
 constants [PR104117].
From: Iain Sandoe <iain@sandoe.co.uk>
In-Reply-To: <54dd4562-2ef0-34f0-a99f-56695e3788aa@redhat.com>
Date: Sun, 20 Feb 2022 17:34:33 +0000
Cc: Vladimir Makarov via Gcc-patches <gcc-patches@gcc.gnu.org>,
 Vladimir Makarov <vmakarov@redhat.com>
Content-Transfer-Encoding: quoted-printable
Message-Id: <3252FEE6-00A8-4201-84DA-623850C16698@sandoe.co.uk>
References: <20220211235904.25765-1-iain@sandoe.co.uk>
 <mptv8xhye7g.fsf@arm.com> <dc92ba2a-e5b3-bc16-ee1a-cfc0860dfbf4@redhat.com>
 <mpt7d9xwi86.fsf@arm.com> <54dd4562-2ef0-34f0-a99f-56695e3788aa@redhat.com>
To: Richard Sandiford <richard.sandiford@arm.com>
X-Mailer: Apple Mail (2.3445.104.21)
X-Spam-Status: No, score=-7.3 required=5.0 tests=BAYES_00, BODY_8BITS, KAM_COUK,
 KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, RCVD_IN_DNSWL_LOW,
 RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Sun, 20 Feb 2022 17:34:38 -0000

Hi Folks.

> On 14 Feb 2022, at 16:58, Vladimir Makarov <vmakarov@redhat.com> =
wrote:
> On 2022-02-14 11:00, Richard Sandiford wrote:

>> Vladimir Makarov via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>=20
>>> Hi, Richard.  Change LRA is mine and I approved it for Iain's patch.
>>>=20
>>> I think there is no need for this code and it is misleading.  If
>>> 'mem[low_sum]' does not work, I don't think that =
'reg=3Dlow_sum;mem[reg]'
>>> will help for any existing target.  As machine-dependent code for =
any
>>> target most probably (for ppc64 darwin it is exactly the case) =
checks
>>> address only in memory, it can wrongly accept wrong address by =
reloading
>>> it into reg and use it in memory. So these are my arguments for the
>>> remove this code from process_address_1.
>> I'm probably making too much of this, but:
>>=20
>> I think the code is potentially useful in that existing targets do =
forbid
>> forbid lo_sum addresses in certain contexts (due to limited offset =
range)
>> while still wanting lo_sum to be used to be load the address.  If we
>> handle the high/lo_sum split in generic code then we have more chance
>> of being able to optimise things.  So it feels like this is setting =
an
>> unfortunate precedent.
>>=20
>> I still don't understand what went wrong before though (the PR trail
>> was a bit too long to process :-)).  Is there a case where
>> (lo_sum (high X) X) !=3D X?  If so, that seems like a target bug to =
me.
>> Or does the target accept (set R1 (lo_sum R2 X)) for an X that cannot
>> be split into a HIGH/LO_SUM pair?  I'd argue that's a target bug too.
>>=20
> Sometimes it is hard to make a line where an RA bug is a bug in =
machine-dependent code or in RA itself.
>=20
> For this case I would say it is a bug in the both parts.
>=20
> Low-sum is generated by LRA and it does not know that it should be =
wrapped by unspec for darwin. Generally speaking we could avoid the =
change in LRA but it would require to do non-trivial analysis in machine =
dependent code to find cases when 'reg=3Dlow_sum ... mem[reg]' is =
incorrect code for darwin (PIC) target (and may be some other PIC =
targets too). Therefore I believe the change in LRA is a good solution =
even if the change can potentially result in less optimized code for =
some cases.  Taking your concern into account we could probably improve =
the patch by introducing a hook (I never liked such solutions as we =
already have too many hooks directing RA) or better to make the LRA =
change working only for PIC target. Something like this (it probably =
needs better recognition of pic target):
>=20
> --- a/gcc/lra-constraints.cc
> +++ b/gcc/lra-constraints.cc
> @@ -3616,21 +3616,21 @@ process_address_1 (int nop, bool check_only_p,
>           if (HAVE_lo_sum)
>             {
>               /* addr =3D> lo_sum (new_base, addr), case (2) above.  =
*/
>               insn =3D emit_insn (gen_rtx_SET
>                                 (new_reg,
>                                  gen_rtx_HIGH (Pmode, copy_rtx =
(addr))));
>               code =3D recog_memoized (insn);
>               if (code >=3D 0)
>                 {
>                   *ad.inner =3D gen_rtx_LO_SUM (Pmode, new_reg, addr);
> -                 if (!valid_address_p (op, &ad, cn))
> +                 if (!valid_address_p (op, &ad, cn) && !flag_pic)

IMO the PIC aspect of this is possibly misleading
 - the issue is that we have an invalid address, and that such addresses =
in this case need to be legitimised by wrapping them in an UNSPEC.=20
- My concern about the generic code was that I would not expect Darwin =
to be the only platform that might need to wrap an invlaid address in an =
unspec  [TOC, TLS, small data etc. spring to mind].

I need some help understanding the expected pathway through this code =
that could be useful.

we start with an invalid address.

1. we build (set reg (high invalid_address))
 - Darwin was allowing this (and the lo_sum) [eveywhere, not just here] =
on the basis that the target legitimizer would be called later to fix it =
up.  (that is why the initial recog passes) - but AFAICT we never call =
the target=E2=80=99s address legitimizer.

 - I am curious about what (other) circumstance there would be where a =
(high of an invalid address would be useful.

2. =E2=80=A6  assuming the we allowed the build of the (high invalid)

 - we now build the lo_sum and check to see if it is valid.

3. if it is _not_ valid, we load it into a reg

  - I am not sure (outside the comment about about post-legitimiizer =
use) about how an invalid lo_sum can be used in this way.

  - assuming we accept this, we then test to see if the register is a =
valid address (my guess is that test will pass pretty much everywhere, =
since we picked a suitable register in the first place).

^^^ this is mostly for my education - the stuff below is a potential =
solution to leaving lra-constraints unchanged and fixing the Darwin =
bug=E2=80=A6.

[ part of me wonders why we do not just call the target=E2=80=99s =
address legitimizer when we have an illegal address ]

=E2=80=94=E2=80=94=E2=80=94 current WIP:

So .. I have split the Darwin patterns into a hi/lo pair for non-PIC and =
a hi/lo pair for PIC.

I added a predicate for the PIC case that requires it to match (unspec =
[=E2=80=A6.] UNSPEC_MACHOPIC_OFFSET).

Thus both the attempts (high and (lo_sum will fail recog now which has =
the same effect as punting on the bad lo_sum (and the original testcase =
now generates correct code) =E2=80=A6

=E2=80=A6 the hardware is slow (well, it isn=E2=80=99t really - faster =
than a Cray XMP .. but =E2=80=A6) .. so regstraps on 11.2 (to check the =
fix works) and master will take some time.

Then, hopefully, there is a target-local fix (and the LRA change could =
be reverted if that=E2=80=99s the concensus about the best solution) ...

cheers
Iain