Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Uros Bizjak <ubizjak@gmail.com>,
	 Richard Biener <richard.guenther@gmail.com>
Subject: Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]
Date: Wed, 12 Jul 2023 11:23:23 +0100	[thread overview]
Message-ID: <mptfs5tv36s.fsf@arm.com> (raw)
In-Reply-To: <CAFiYyc3Hq_9sj1ypgmcJp++FHYHFCMBNiCBKyAwskk0Vo5QCXA@mail.gmail.com> (Richard Biener via Gcc-patches's message of "Mon, 10 Jul 2023 13:57:32 +0200")

Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>>
>> On Mon, Jul 10, 2023 at 11:47 AM Richard Biener
>> <richard.guenther@gmail.com> wrote:
>> >
>> > On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>> > >
>> > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
>> > > <richard.guenther@gmail.com> wrote:
>> > > >
>> > > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches
>> > > > <gcc-patches@gcc.gnu.org> wrote:
>> > > > >
>> > > > > As shown in the PR, simplify_gen_subreg call in simplify_replace_fn_rtx:
>> > > > >
>> > > > > (gdb) list
>> > > > > 469           if (code == SUBREG)
>> > > > > 470             {
>> > > > > 471               op0 = simplify_replace_fn_rtx (SUBREG_REG (x),
>> > > > > old_rtx, fn, data);
>> > > > > 472               if (op0 == SUBREG_REG (x))
>> > > > > 473                 return x;
>> > > > > 474               op0 = simplify_gen_subreg (GET_MODE (x), op0,
>> > > > > 475                                          GET_MODE (SUBREG_REG (x)),
>> > > > > 476                                          SUBREG_BYTE (x));
>> > > > > 477               return op0 ? op0 : x;
>> > > > > 478             }
>> > > > >
>> > > > > simplifies with following arguments:
>> > > > >
>> > > > > (gdb) p debug_rtx (op0)
>> > > > > (const_vector:V4QI [
>> > > > >         (const_int -52 [0xffffffffffffffcc]) repeated x4
>> > > > >     ])
>> > > > > (gdb) p debug_rtx (x)
>> > > > > (subreg:V16QI (reg:V4QI 98) 0)
>> > > > >
>> > > > > to:
>> > > > >
>> > > > > (gdb) p debug_rtx (op0)
>> > > > > (const_vector:V16QI [
>> > > > >         (const_int -52 [0xffffffffffffffcc]) repeated x16
>> > > > >     ])
>> > > > >
>> > > > > This simplification is invalid, it is not possible to get V16QImode vector
>> > > > > from V4QImode vector, even when all elements are duplicates.
>> >
>> > ^^^
>> >
>> > I think this simplification is valid.  A simplification to
>> >
>> > (const_vector:V16QI [
>> >          (const_int -52 [0xffffffffffffffcc]) repeated x4
>> >          (const_int 0 [0]) repeated x12
>> >      ])
>> >
>> > would be valid as well.
>> >
>> > > > > The simplification happens in simplify_context::simplify_subreg:
>> > > > >
>> > > > > (gdb) list
>> > > > > 7558          if (VECTOR_MODE_P (outermode)
>> > > > > 7559              && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
>> > > > > 7560              && vec_duplicate_p (op, &elt))
>> > > > > 7561            return gen_vec_duplicate (outermode, elt);
>> > > > >
>> > > > > but the above simplification is valid only for non-paradoxical registers,
>> > > > > where outermode <= innermode.  We should not assume that elements outside
>> > > > > the original register are valid, let alone all duplicates.
>> > > >
>> > > > Hmm, but looking at the audit trail the x86 backend expects them to be zero?
>> > > > Isn't that wrong as well?
>> > >
>> > > If you mean Comment #10, it is just an observation that
>> > > simplify_replace_rtx simplifies arguments from Comment #9 to:
>> > >
>> > > (gdb) p debug_rtx (src)
>> > > (const_vector:V8HI [
>> > >         (const_int 204 [0xcc]) repeated x4
>> > >         (const_int 0 [0]) repeated x4
>> > >     ])
>> > >
>> > > instead of:
>> > >
>> > > (gdb) p debug_rtx (src)
>> > > (const_vector:V8HI [
>> > >         (const_int 204 [0xcc]) repeated x8
>> > >     ])
>> > >
>> > > which is in line with the statement below.
>> > > >
>> > > > That is, I think putting any random value into the upper lanes when
>> > > > constant folding
>> > > > a paradoxical subreg sounds OK to me, no?
>> > >
>> > > The compiler is putting zero there as can be seen from the above new RTX.
>> > >
>> > > > Of course we might choose to not do such constant propagation for
>> > > > efficiency reason - at least
>> > > > when the resulting CONST_* would require a larger constant pool entry
>> > > > or more costly
>> > > > construction.
>> > >
>> > > This is probably a follow-up improvement, where this patch tries to
>> > > fix a specific invalid simplification of simplify_replace_rtx that is
>> > > invalid universally.
>> >
>> > How so?  What specifies the values of the paradoxical subreg for the
>> > bytes not covered by the subreg operand?
>>
>> I don't know why 0 is generated here (and if it is valid) for
>> paradoxical bytes, but 0xcc is not correct, since it sets REG_EQUAL to
>> the wrong constant and triggers unwanted propagation later on.
>
> Quoting what I wrote in the PR below.  I think pragmatically the fix is
> good - we might miss some opportunistic folding this way but we for
> sure may not optimistically register an equality via REG_EQUAL without
> enforcing it (removing the producer and replacing it with the optimistic
> constant).
>
> So consider the patch approved if no other RTL maintainer chimes in
> within 48h.

Sorry, can you hold off a bit longer?  Wanted to have a look but the
deadline is about to expire.

I think at least a comment is needed, since like Richard says,
the transformation seems correct for paradoxical subregs, even if it
isn't wanted for other reasons.

Thanks,
Richard

>
> Thanks,
> Richard.
>
>
> I can see cprop1 adds the REG_EQUAL note:
>
> (insn 22 21 23 4 (set (reg:V8HI 100)
>         (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
>                 (parallel [
>                         (const_int 0 [0])
>                         (const_int 1 [0x1])
>                         (const_int 2 [0x2])
>                         (const_int 3 [0x3])
>                         (const_int 4 [0x4])
>                         (const_int 5 [0x5])
>                          (const_int 6 [0x6])
>                          (const_int 7 [0x7])
>                      ])))) "t.c":12:42 7557 {sse4_1_zero_extendv8qiv8hi2}
> -     (expr_list:REG_DEAD (reg:V4QI 98)
> -        (nil)))
> +     (expr_list:REG_EQUAL (const_vector:V8HI [
> +                (const_int 204 [0xcc]) repeated x8
> +            ])
> +        (expr_list:REG_DEAD (reg:V4QI 98)
> +            (nil))))
>
> but I don't see yet what the actual wrong transform based on this
> REG_EQUAL note is?
>
> It looks like we CSE the above with
>
> -   46: r122:V8QI=[`*.LC3']
> -      REG_EQUAL const_vector
> -   48: r125:V8HI=zero_extend(vec_select(r122:V8QI#0,parallel))
> -      REG_EQUAL const_vector
> -      REG_DEAD r122:V8QI
> -   49: r126:V8HI=r124:V8HI*r125:V8HI
> -      REG_DEAD r125:V8HI
> +   49: r126:V8HI=r124:V8HI*r100:V8HI
>
> but otherwise do nothing.  So the issue is that we rely on the "undefined"
> vals to have a specific value (from the earlier REG_EQUAL note) but actual
> code generation doesn't ensure this (it doesn't need to).  That said,
> the issue isn't the constant folding per-se but that we do not actually
> constant fold but register an equality that doesn't hold.
>
>
>
>> Uros.

next prev parent reply	other threads:[~2023-07-12 10:23 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-09  8:52 Uros Bizjak
2023-07-10  9:16 ` Richard Biener
2023-07-10  9:26   ` Uros Bizjak
2023-07-10  9:47     ` Richard Biener
2023-07-10 11:01       ` Uros Bizjak
2023-07-10 11:57         ` Richard Biener
2023-07-12 10:23           ` Richard Sandiford [this message]
2023-07-12 10:58             ` Uros Bizjak
2023-07-12 11:04               ` Uros Bizjak
2023-07-12 11:13                 ` Richard Biener
2023-07-12 11:47                   ` Richard Sandiford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mptfs5tv36s.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=richard.guenther@gmail.com \
    --cc=ubizjak@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).