public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Sandiford <richard.sandiford@arm.com>
To: Hongtao Liu <crazylht@gmail.com>
Cc: Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org>,
	ebotcazou@libertysurf.fr, steven@gcc.gnu.org,
	Jakub Jelinek <jakub@redhat.com>
Subject: Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.
Date: Wed, 20 Jan 2021 14:14:35 +0000	[thread overview]
Message-ID: <mptzh13zodw.fsf@arm.com> (raw)
In-Reply-To: <CAMZc-byHHb4ZUur0TQ-yeNUvm_oHstbProRsSCLPED7eRHaOsQ@mail.gmail.com> (Hongtao Liu's message of "Wed, 20 Jan 2021 12:35:15 +0800")

Hongtao Liu <crazylht@gmail.com> writes:
> On Wed, Jan 20, 2021 at 12:10 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > On Tue, Jan 19, 2021 at 12:38:47PM +0000, Richard Sandiford via Gcc-patches wrote:
>> >> > actually only the lower 16bits are needed, the original insn is like
>> >> >
>> >> > .294.r.ira
>> >> > (insn 69 68 70 13 (set (reg:HI 96 [ _52 ])
>> >> >         (subreg:HI (reg:DI 82 [ var_6.0_1 ]) 0)) "test.c":21:23 76
>> >> > {*movhi_internal}
>> >> >      (nil))
>> >> > (insn 78 75 82 13 (set (reg:V4HI 140 [ _283 ])
>> >> >         (vec_duplicate:V4HI (truncate:HI (subreg:SI (reg:HI 96 [ _52
>> >> > ]) 0)))) 1412 {*vec_dupv4hi}
>> >> >      (nil))
>> >> >
>> >> > .295r.reload
>> >> > (insn 69 68 70 13 (set (reg:HI 5 di [orig:96 _52 ] [96])
>> >> >         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "test.c":21:23 76
>> >> > {*movhi_internal}
>> >> >      (nil))
>> >> > (insn 489 75 78 13 (set (reg:SI 22 xmm2 [297])
>> >> >         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
>> >> >      (nil))
>> >> > (insn 78 489 490 13 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
>> >> >         (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297]))))
>> >> > 1412 {*vec_dupv4hi}
>> >> >      (nil))
>> >> >
>> >> > and insn 489 is created by lra/reload which seems ok for the sequence,
>> >> > but problemistic with considering the logic of hardreg_cprop.
>> >>
>> >> It looks OK even with the regcprop behaviour though:
>> >>
>> >> - insn 69 defines only the low 16 bits of di,
>> >> - insn 489 defines only the low 16 bits of xmm2, but copies bits 16-31
>> >>   too (with unknown contents)
>> >> - insn 78 uses only the low 16 bits of xmm2 (the unknown contents
>> >>   introduced by insn 489 are truncated away)
>> >>
>> >> So where do bits 16-31 become significant?  What goes wrong if they're
>> >> not zero?
>> >
>> > The k0 register is initialized I believe with
>> > (insn 20 2 21 2 (set (reg:DI 68 k0 [orig:82 var_6.0_1 ] [82])
>> >         (mem/c:DI (symbol_ref:DI ("var_6") [flags 0x40]  <var_decl 0x7f7babeaaf30 var_6>) [3 var_6+0 S8 A64])) "pr98694.C":21:10 74 {*movdi_internal}
>> >      (nil))
>> > and so it contains all 64-bits, and then the code sometimes uses all the
>> > bits, sometimes just the low 16-bits and sometimes low 32-bits of that
>> > value.
>> > (insn 69 68 70 12 (set (reg:HI 5 di [orig:96 _52 ] [96])
>> >         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":27:23 76 {*movhi_internal}
>> >      (nil))
>> > (insn 74 73 75 12 (set (reg:SI 36 r8 [orig:149 _52 ] [149])
>> >         (zero_extend:SI (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82]))) 144 {*zero_extendhisi2}
>> >      (nil))
>> > (insn 489 75 78 12 (set (reg:SI 22 xmm2 [297])
>> >         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
>> >      (nil))
>> > (insn 78 489 490 12 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
>> >         (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297])))) 1412 {*vec_dupv4hi}
>> >      (expr_list:REG_DEAD (reg:SI 22 xmm2 [297])
>> >         (nil)))
>> > are examples when it uses only the low 16 bits from that, and
>> > (insn 487 72 73 12 (set (reg:SI 1 dx [148])
>> >         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal}
>> >      (nil))
>> >
>> > (insn 85 84 491 13 (set (reg:SI 37 r9 [orig:86 _11 ] [86])
>> >         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":28:14 75 {*movsi_internal}
>> >      (nil))
>> >
>> > (insn 491 85 88 13 (set (reg:SI 3 bx [299])
>> >         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal}
>> >      (nil))
>> > (insn 88 491 89 13 (set (reg:CCNO 17 flags)
>> >         (compare:CCNO (reg:SI 3 bx [299])
>> >             (const_int 0 [0]))) 7 {*cmpsi_ccno_1}
>> >      (expr_list:REG_DEAD (reg:SI 3 bx [299])
>> >         (nil)))
>> >
>> > (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
>> >         (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
>> >      (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
>> >         (nil)))
>> > are examples where it uses low 32-bits from k0.
>> > So the
>> >  (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
>> > -        (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
>> > -     (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
>> > +        (reg:SI 22 xmm2 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 {*movsi_internal}
>> > +     (expr_list:REG_DEAD (reg:SI 22 xmm2 [orig:86 _11 ] [86])
>> >          (nil)))
>> > cprop_hardreg change indeed looks bogus, while xmm2 has SImode, it holds
>> > only the low 16-bits of the value and has the upper bits undefined, while r9
>> > it is replacing had all of the low 32-bits well defined.
>>
>> Ah, ok, thanks for the extra context.
>>
>> So AIUI the problem when recording xmm2<-di isn't just:
>>
>>  [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
>>
>> but also that:
>>
>>  [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
>>
>> For example, all registers in this sequence can be part of the same chain:
>>
>>     (set (reg:HI R1) (reg:HI R0))
>>     (set (reg:SI R2) (reg:SI R1)) // [A]
>>     (set (reg:DI R3) (reg:DI R2)) // [A]
>>     (set (reg:SI R4) (reg:SI R[0-3]))
>>     (set (reg:HI R5) (reg:HI R[0-4]))
>>
>> But:
>>
>>     (set (reg:SI R1) (reg:SI R0))
>>     (set (reg:HI R2) (reg:HI R1))
>>     (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
>>
>> is problematic because it dips below the precision of the oldest regno
>> and then increases again.
>>
>> When this happens, I guess we have two choices:
>>
>> (1) what the patch does: treat R3 as the start of a new chain.
>> (2) pretend that the copy occured in vd->e[sr].mode instead
>>     (i.e. copy vd->e[sr].mode to vd->e[dr].mode)
>>
>> I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
>> Maybe the optimisation provided by (2) compared to (1) isn't common
>> enough to be worth the complication.
>>
>> I think we should test [B] as well as [A] though.  The pass is set
>> up to do some quite elaborate mode changes and I think rejecting
>> [A] on its own would make some of the other code redundant.
>> It also feels like it should be a seperate “if” or “else if”,
>> with its own comment.
>>
> Update patch.
>> Thanks,
>> Richard
>
>
>
> -- 
> BR,
> Hongtao
>
> From a52b3c8a90a0bf6cbda8ce86d99c82c6182863a7 Mon Sep 17 00:00:00 2001
> From: liuhongt <hongtao.liu@intel.com>
> Date: Mon, 18 Jan 2021 16:55:32 +0800
> Subject: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by
>  cprop_hardreg.
>
> If SRC had been assigned a mode narrower than the copy, we can't link
> DEST into the chain even they have same
> hard_regno_nregs(i.e. HImode/SImode in i386 backend).

This is a bit out of date now.  Maybe just say “can't always link”
instead of just “can't link”.

> i.e
>         kmovw   %k0, %edi
>         vmovd   %edi, %xmm2
> 	vpshuflw        $0, %xmm2, %xmm0
>         kmovw   %k0, %r8d
>         kmovd   %k0, %r9d
> ...
> -	 movl %r9d, %r11d
> +	 vmovd %xmm2, %r11d
>
> gcc/ChangeLog:
>
> 	PR rtl-optimization/98694
> 	* regcprop.c (copy_value): If SRC had been assigned a mode
> 	narrower than the copy, we can't link DEST into the chain even
> 	they have same hard_regno_nregs(i.e. HImode/SImode in i386
> 	backend).
>
> gcc/testsuite/ChangeLog:
>
> 	PR rtl-optimization/98694
> 	* gcc.target/i386/pr98694.c: New test.
> ---
>  gcc/regcprop.c                          | 33 +++++++++++++++++++++
>  gcc/testsuite/gcc.target/i386/pr98694.c | 38 +++++++++++++++++++++++++
>  2 files changed, 71 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr98694.c
>
> diff --git a/gcc/regcprop.c b/gcc/regcprop.c
> index dd62cb36013..908298beaea 100644
> --- a/gcc/regcprop.c
> +++ b/gcc/regcprop.c
> @@ -358,6 +358,39 @@ copy_value (rtx dest, rtx src, struct value_data *vd)
>    else if (sn > hard_regno_nregs (sr, vd->e[sr].mode))
>      return;
>  
> +  /* If SRC had been assigned a mode narrower than the copy, Although
> +     they have same hard_regno_nregs, it's not safe to link DEST into the
> +     chain. .i.e.

How about:

  It is not safe to link DEST into the chain if SRC was defined in some
  narrower mode M and if M is also narrower than the mode of the first
  register in the chain.  For example:

> +     (set (reg:DI r1) (reg:DI r0))
> +     (set (reg:HI r2) (reg:HI r1))
> +     (set (reg:SI r3) (reg:SI r2)) //Should be a new chain start at r3
> +     (set (reg:SI r4) (reg:SI r1))
> +     (set (reg:SI r5) (reg:SI r4))
> +     the upper part of r3 is undefined, if adding it to the chain, it may be
> +     prop to r5 which has defined upper bits, .i.e. pr98694.

And for this:

     (set (reg:DI r1) (reg:DI r0))
     (set (reg:HI r2) (reg:HI r1))
     (set (reg:SI r3) (reg:SI r2)) // Should be a new chain starting at r3
     (set (reg:SI r4) (reg:SI r1))
     (set (reg:SI r5) (reg:SI r4))

  the upper part of r3 is undefined.  If we added it to the chain,
  it may be used to replace r5, which has defined upper bits.
  See PR98694 for details.

> +
> +     [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
> +     [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)
> +     Condition B is added to to catch optimization opportunities of
> +
> +     (set (reg:HI R1) (reg:HI R0))
> +     (set (reg:SI R2) (reg:SI R1)) // [A]
> +     (set (reg:DI R3) (reg:DI R2)) // [A]
> +     (set (reg:SI R4) (reg:SI R[0-3]))
> +     (set (reg:HI R5) (reg:HI R[0-4]))
> +

Maybe add here:

  in which all registers have only 16 defined bits.

> +     but problematic for
> +
> +     (set (reg:SI R1) (reg:SI R0))
> +     (set (reg:HI R2) (reg:HI R1))
> +     (set (reg:SI R3) (reg:SI R2)) // [A] && [B]
> +
> +     to be fixed????   */

I think we should drop this part.  Your example above covers it in
more detail.

OK with those changes and the one that HJ asked for.

Thanks,
Richard

  parent reply	other threads:[~2021-01-20 14:14 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-18  9:16 Hongtao Liu
2021-01-18 10:18 ` Richard Sandiford
2021-01-18 10:43   ` Hongtao Liu
2021-01-18 10:51     ` Hongtao Liu
2021-01-18 11:10     ` Richard Sandiford
2021-01-19  0:59       ` Hongtao Liu
2021-01-19 12:38         ` Richard Sandiford
2021-01-19 14:45           ` Jakub Jelinek
2021-01-19 16:10             ` Richard Sandiford
2021-01-20  4:35               ` Hongtao Liu
2021-01-20  4:40                 ` Hongtao Liu
2021-01-20 12:56                 ` H.J. Lu
2021-01-20 14:14                 ` Richard Sandiford [this message]
2021-01-21  5:25                   ` Hongtao Liu
2021-05-05 17:44               ` [PATCH] regcprop: Fix another cprop_hardreg bug [PR100342] Jakub Jelinek
2021-05-06  8:50                 ` Jakub Jelinek
2021-05-11 10:59                 ` Richard Sandiford
2021-05-13 15:37                   ` Jakub Jelinek
2021-05-13 17:01                     ` Jakub Jelinek
2021-05-14  9:09                       ` Richard Sandiford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mptzh13zodw.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=crazylht@gmail.com \
    --cc=ebotcazou@libertysurf.fr \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=steven@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).