public inbox for gcc-bugs@sourceware.org
From: "lehua.ding at rivai dot ai" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/108999] New: Maybe LRA produce inaccurate hardware register occupancy information for subreg operand
Date: Fri, 03 Mar 2023 01:31:38 +0000
Message-ID: <bug-108999-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108999

            Bug ID: 108999
           Summary: Maybe LRA produce inaccurate hardware register
                    occupancy information for subreg operand
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lehua.ding at rivai dot ai
  Target Milestone: ---

The problem code on the compiler explorer is here:
https://godbolt.org/z/GaGWEahPY

The problem is that the line `mov z1.d, z4.d` in the assembly code[1] is
unnecessary. The cause appears to be that the LRA pass[2] thinks the hard
registers occupied by
`(subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) [16, 16])` conflict with those
of `(reg/v:VNx2DI 98 [ v19 ])`[3]. That is not true: according to the IRA
dump[4], r103 occupies hard registers 32 and 33, and r98 occupies 34. The
conflict comes from the function process_alt_operands in
lra-constraints.cc[5]. When it checks whether operand 0 of insn 39 conflicts
with the other operands of insn 39, it marks operand 0 as occupying 33 and 34,
based on the mode (`biggest_mode[i]`) and the starting hard regno 33
(`clobbered_hard_regno`). The mode it uses is VNx4DI; I think it should use
VNx2DI, which is the proper mode for the whole of operand 0. So for an
ordinary subreg operand, using the inner reg's mode to compute the occupied
hard registers may be too wide.
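
To make the arithmetic concrete, here is a minimal sketch of the conflict
check in a simplified model. It is not GCC code: `occupied_regs`,
`subreg_start_regno` and `conflicts` are hypothetical helpers, and the model
assumes that VNx4DI spans two vector registers, VNx2DI spans one, and the
subreg offset [16, 16] corresponds to one whole vector register (consistent
with r103 occupying 32 and 33 and the start regno being 33).

```cpp
#include <cassert>
#include <set>

// Hypothetical helper, not a GCC function: the set of hard registers covered
// by a value that starts at `start_regno` and spans `nregs` registers.
std::set<int> occupied_regs(int start_regno, int nregs) {
    assert(nregs > 0);
    std::set<int> regs;
    for (int r = start_regno; r < start_regno + nregs; ++r)
        regs.insert(r);
    return regs;
}

// Hypothetical helper: first hard register of a subreg that begins
// `vector_offset` whole vector registers into a pseudo at `base_regno`.
int subreg_start_regno(int base_regno, int vector_offset) {
    return base_regno + vector_offset;
}

// True when the two register sets share at least one hard register.
bool conflicts(const std::set<int> &a, const std::set<int> &b) {
    for (int r : a)
        if (b.count(r) != 0)
            return true;
    return false;
}
```

With the values from this report, `subreg_start_regno(32, 1)` is 33.
`occupied_regs(33, 2)` (the VNx4DI width) covers 33 and 34 and so overlaps
r98 in 34, while `occupied_regs(33, 1)` (the VNx2DI width) covers only 33 and
does not.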

References:

[1] assembly code
```
subreg_coalesce5:
        mov     p1.b, p0.b
        ld2d    {z0.d - z1.d}, p0/z, [x0]
        cmp     w1, 0
        ble     .L2
        sxtw    x1, w1
        mov     x0, 0
.L3:
        ld1d    z3.d, p1/z, [x2, x0, lsl 3]
        ld1d    z2.d, p1/z, [x3, x0, lsl 3]
        add     x0, x0, 1
        movprfx z4.d, p0/z, z1.d
        mla     z4.d, p0/m, z3.d, z2.d
        movprfx z0.d, p0/z, z0.d
        mla     z0.d, p0/m, z3.d, z2.d
        mov     z1.d, z4.d
        cmp     x1, x0
        bne     .L3
.L2:
        st2d    {z0.d - z1.d}, p0, [x4]
        ret
```

[2] partial content of LRA dump info
```
...
            0 Early clobber: reject++
            0 Conflict early clobber reload: reject--
          alt=0,overall=6,losers=1,rld_nregs=0
            0 Early clobber: reject++
          alt=1,overall=1,losers=0,rld_nregs=0
         Choosing alt 1 in insn 36:  (0) &w  (1) Upl  (2) w  (3) w  (4) 0  (5) Dz {*cond_fmavnx2di_any}
            0 Early clobber: reject++
            0 Matched conflict early clobber reloads: reject--
          alt=0,overall=6,losers=1,rld_nregs=0
            0 Early clobber: reject++
            0 Conflict early clobber reload: reject--
          alt=1,overall=6,losers=1,rld_nregs=0
            0 Early clobber: reject++
            2 Matching earlyclobber alt: reject--
          alt=2,overall=6,losers=1,rld_nregs=1
            0 Early clobber: reject++
            3 Matching earlyclobber alt: reject--
          alt=3,overall=6,losers=1,rld_nregs=1
            0 Early clobber: reject++
            5 Matching earlyclobber alt: reject--
            5 Non-pseudo reload: reject+=2
            5 Non input pseudo reload: reject++
            alt=4,overall=9,losers=1 -- refuse
            Staticly defined alt reject+=6
            0 Early clobber: reject++
            5 Non-pseudo reload: reject+=2
            5 Non input pseudo reload: reject++
            alt=5,overall=16,losers=1 -- refuse
         Choosing alt 0 in insn 39:  (0) =&w  (1) Upl  (2) w  (3) w  (4) w  (5) Dz {*cond_fmavnx2di_any}
      Creating newreg=117, assigning class FP_REGS to r117
   39: r117:VNx2DI=unspec[r104:VNx16BI#0,r97:VNx2DI*r98:VNx2DI+r103:VNx4DI#[16,16],const_vector] 284
      REG_DEAD r98:VNx2DI
      REG_DEAD r97:VNx2DI
    Inserting insn reload after:
   76: r103:VNx4DI#[16,16]=r117:VNx2DI
...
```

[3] partial rtl of IRA pass
```lisp
(insn 36 43 37 4 (set (subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) 0)
        (unspec:VNx2DI [
                (subreg:VNx2BI (reg/v:VNx16BI 104 [ pg ]) 0)
                (plus:VNx2DI (mult:VNx2DI (reg/v:VNx2DI 97 [ v18 ])
                        (reg/v:VNx2DI 98 [ v19 ]))
                    (subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) 0))
                (const_vector:VNx2DI repeat [
                        (const_int 0 [0])
                    ])
            ] UNSPEC_SEL)) "/app/example.c":13:25 discrim 1 7465 {*cond_fmavnx2di_any}
     (nil))
(insn 39 37 40 4 (set (subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) [16, 16])
        (unspec:VNx2DI [
                (subreg:VNx2BI (reg/v:VNx16BI 104 [ pg ]) 0)
                (plus:VNx2DI (mult:VNx2DI (reg/v:VNx2DI 97 [ v18 ])
                        (reg/v:VNx2DI 98 [ v19 ]))
                    (subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) [16, 16]))
                (const_vector:VNx2DI repeat [
                        (const_int 0 [0])
                    ])
            ] UNSPEC_SEL)) "/app/example.c":14:25 discrim 1 7465 {*cond_fmavnx2di_any}
     (expr_list:REG_DEAD (reg/v:VNx2DI 98 [ v19 ])
        (expr_list:REG_DEAD (reg/v:VNx2DI 97 [ v18 ])
            (nil))))
```

[4] partial content of IRA dump info
```
Disposition:
    6:r96  l0    69   24:r97  l0    35   23:r98  l0    34    4:r99  l0     1
    3:r102 l0     0    2:r103 l0    32    1:r104 l0    68    5:r106 l0     1
    7:r107 l0     2    8:r108 l0     3    0:r109 l0     4   12:r110 l0    68
    9:r111 l0     0   14:r112 l0     1   13:r113 l0     2   11:r114 l0     3
   10:r115 l0     4
```

[5] partial source code of process_alt_operands
```c++
/* lra-constraints.cc */
static bool
process_alt_operands (int only_alternative)
{
  for (nop = 0; nop < n_operands; nop++)
    {
      ...
      biggest_mode[nop] = GET_MODE (op);
      if (GET_CODE (op) == SUBREG)
        {
          /* !!! Here the wider of the subreg's mode and the inner reg's
             mode is used (VNx4DI for operand 0 of insn 39), not the
             subreg's own mode.  */
          biggest_mode[nop] = wider_subreg_mode (op);
          operand_reg[nop] = reg = SUBREG_REG (op);
        }
    }
  ...
  for (nalt = 0; nalt < n_alternatives; nalt++)
    {
      ...
      for (nop = 0; nop < early_clobbered_regs_num; nop++)
        {
          ...
          /* !!! Here operand 0 is marked as occupying 33 and 34, where:
                   biggest_mode[i] is VNx4DI
                   clobbered_hard_regno is 33 */
          add_to_hard_reg_set (&temp_set, biggest_mode[i],
                               clobbered_hard_regno);
          ...
        }
    }
  ...
}
```
