public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/108999] New: Maybe LRA produce inaccurate hardware register occupancy information for subreg operand
@ 2023-03-03  1:31 lehua.ding at rivai dot ai
  2023-03-03 15:09 ` [Bug rtl-optimization/108999] " vmakarov at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: lehua.ding at rivai dot ai @ 2023-03-03  1:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108999

            Bug ID: 108999
           Summary: Maybe LRA produce inaccurate hardware register
                    occupancy information for subreg operand
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lehua.ding at rivai dot ai
  Target Milestone: ---

The code that shows the problem is on Compiler Explorer:
https://godbolt.org/z/GaGWEahPY

The problem is that the instruction `mov z1.d, z4.d` in the assembly code [1] is
unnecessary. I found that the reason is that the LRA pass [2] thinks the hard
registers occupied by `(subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) [16, 16])`
conflict with `(reg/v:VNx2DI 98 [ v19 ])` [3]. That is not true: according to
the IRA dump [4], r103 occupies hard registers 32 and 33, and r98 occupies 34.
The false conflict comes from the function process_alt_operands in the
lra-constraints.cc source file [5]. When it checks whether operand 0 of insn 39
conflicts with the other operands of insn 39, it marks operand 0 as occupying
hard registers 33 and 34, based on the mode (`biggest_mode[i]`) and the starting
hard regno 33 (`clobbered_hard_regno`). The mode it uses is VNx4DI; I think it
should use VNx2DI, which is the mode of operand 0 as a whole. So, for an
ordinary subreg operand, using the inner reg's mode can make the computed set of
occupied hard registers wider than it really is.

References:

[1] assembly code
```
subreg_coalesce5:
        mov     p1.b, p0.b
        ld2d    {z0.d - z1.d}, p0/z, [x0]
        cmp     w1, 0
        ble     .L2
        sxtw    x1, w1
        mov     x0, 0
.L3:
        ld1d    z3.d, p1/z, [x2, x0, lsl 3]
        ld1d    z2.d, p1/z, [x3, x0, lsl 3]
        add     x0, x0, 1
        movprfx z4.d, p0/z, z1.d
        mla     z4.d, p0/m, z3.d, z2.d
        movprfx z0.d, p0/z, z0.d
        mla     z0.d, p0/m, z3.d, z2.d
        mov     z1.d, z4.d
        cmp     x1, x0
        bne     .L3
.L2:
        st2d    {z0.d - z1.d}, p0, [x4]
        ret
```

[2] Excerpt from the LRA dump
```
...
            0 Early clobber: reject++
            0 Conflict early clobber reload: reject--
          alt=0,overall=6,losers=1,rld_nregs=0
            0 Early clobber: reject++
          alt=1,overall=1,losers=0,rld_nregs=0
         Choosing alt 1 in insn 36:  (0) &w  (1) Upl  (2) w  (3) w  (4) 0  (5) Dz {*cond_fmavnx2di_any}
            0 Early clobber: reject++
            0 Matched conflict early clobber reloads: reject--
          alt=0,overall=6,losers=1,rld_nregs=0
            0 Early clobber: reject++
            0 Conflict early clobber reload: reject--
          alt=1,overall=6,losers=1,rld_nregs=0
            0 Early clobber: reject++
            2 Matching earlyclobber alt: reject--
          alt=2,overall=6,losers=1,rld_nregs=1
            0 Early clobber: reject++
            3 Matching earlyclobber alt: reject--
          alt=3,overall=6,losers=1,rld_nregs=1
            0 Early clobber: reject++
            5 Matching earlyclobber alt: reject--
            5 Non-pseudo reload: reject+=2
            5 Non input pseudo reload: reject++
            alt=4,overall=9,losers=1 -- refuse
            Staticly defined alt reject+=6
            0 Early clobber: reject++
            5 Non-pseudo reload: reject+=2
            5 Non input pseudo reload: reject++
            alt=5,overall=16,losers=1 -- refuse
         Choosing alt 0 in insn 39:  (0) =&w  (1) Upl  (2) w  (3) w  (4) w  (5) Dz {*cond_fmavnx2di_any}
      Creating newreg=117, assigning class FP_REGS to r117
   39: r117:VNx2DI=unspec[r104:VNx16BI#0,r97:VNx2DI*r98:VNx2DI+r103:VNx4DI#[16,16],const_vector] 284
      REG_DEAD r98:VNx2DI
      REG_DEAD r97:VNx2DI
    Inserting insn reload after:
   76: r103:VNx4DI#[16,16]=r117:VNx2DI
...
```

[3] Excerpt of the RTL from the IRA pass
```lisp
(insn 36 43 37 4 (set (subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) 0)
        (unspec:VNx2DI [
                (subreg:VNx2BI (reg/v:VNx16BI 104 [ pg ]) 0)
                (plus:VNx2DI (mult:VNx2DI (reg/v:VNx2DI 97 [ v18 ])
                        (reg/v:VNx2DI 98 [ v19 ]))
                    (subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) 0))
                (const_vector:VNx2DI repeat [
                        (const_int 0 [0])
                    ])
            ] UNSPEC_SEL)) "/app/example.c":13:25 discrim 1 7465 {*cond_fmavnx2di_any}
     (nil))
(insn 39 37 40 4 (set (subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) [16, 16])
        (unspec:VNx2DI [
                (subreg:VNx2BI (reg/v:VNx16BI 104 [ pg ]) 0)
                (plus:VNx2DI (mult:VNx2DI (reg/v:VNx2DI 97 [ v18 ])
                        (reg/v:VNx2DI 98 [ v19 ]))
                    (subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) [16, 16]))
                (const_vector:VNx2DI repeat [
                        (const_int 0 [0])
                    ])
            ] UNSPEC_SEL)) "/app/example.c":14:25 discrim 1 7465 {*cond_fmavnx2di_any}
     (expr_list:REG_DEAD (reg/v:VNx2DI 98 [ v19 ])
        (expr_list:REG_DEAD (reg/v:VNx2DI 97 [ v18 ])
            (nil))))
```

[4] Excerpt from the IRA dump
```
Disposition:
    6:r96  l0    69   24:r97  l0    35   23:r98  l0    34    4:r99  l0     1
    3:r102 l0     0    2:r103 l0    32    1:r104 l0    68    5:r106 l0     1
    7:r107 l0     2    8:r108 l0     3    0:r109 l0     4   12:r110 l0    68
    9:r111 l0     0   14:r112 l0     1   13:r113 l0     2   11:r114 l0     3
   10:r115 l0     4
```

[5] Excerpt from the source code of process_alt_operands
```c++
/* lra-constraints.cc */
static bool
process_alt_operands (int only_alternative)
{
  for (nop = 0; nop < n_operands; nop++)
    {
      ...
      biggest_mode[nop] = GET_MODE (op);
      if (GET_CODE (op) == SUBREG)
        {
          /* !!! Here the wider of the subreg's mode and the inner reg's mode
             is used, i.e. VNx4DI rather than VNx2DI for insn 39.  */
          biggest_mode[nop] = wider_subreg_mode (op);
          operand_reg[nop] = reg = SUBREG_REG (op);
        }
    }
  ...
  for (nalt = 0; nalt < n_alternatives; nalt++)
    {
      ...
      for (nop = 0; nop < early_clobbered_regs_num; nop++)
        {
          ...
          /* !!! Here operand 0 is marked as occupying hard regs 33 and 34,
             because biggest_mode[i] is VNx4DI and clobbered_hard_regno
             is 33.  */
          add_to_hard_reg_set (&temp_set, biggest_mode[i],
                               clobbered_hard_regno);
          ...
        }
    }
  ...
}
```

* [Bug rtl-optimization/108999] Maybe LRA produce inaccurate hardware register occupancy information for subreg operand
From: vmakarov at gcc dot gnu.org @ 2023-03-03 15:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108999

Vladimir Makarov <vmakarov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org

--- Comment #1 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
Thank you for filing this PR.

I am going to fix this next week.

* [Bug rtl-optimization/108999] Maybe LRA produce inaccurate hardware register occupancy information for subreg operand
From: cvs-commit at gcc dot gnu.org @ 2023-03-09 13:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108999

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Vladimir Makarov <vmakarov@gcc.gnu.org>:

https://gcc.gnu.org/g:a6457974a1f443ab58d2334c02260299616c78b8

commit r13-6551-ga6457974a1f443ab58d2334c02260299616c78b8
Author: Vladimir N. Makarov <vmakarov@redhat.com>
Date:   Thu Mar 9 08:41:09 2023 -0500

    LRA: For clobbered regs use operand mode instead of the biggest mode

    LRA is too conservative in calculation of conflicts with clobbered regs by
    using the biggest access mode.  This results in failure of possible reg
    coalescing and worse code.  This patch solves the problem.

            PR rtl-optimization/108999

    gcc/ChangeLog:

            * lra-constraints.cc (process_alt_operands): Use operand modes for
            clobbered regs instead of the biggest access mode.

    gcc/testsuite/ChangeLog:

            * gcc.target/aarch64/pr108999.c: New.

* [Bug rtl-optimization/108999] Maybe LRA produce inaccurate hardware register occupancy information for subreg operand
From: lehua.ding at rivai dot ai @ 2023-03-30  8:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108999

Lehua Ding <lehua.ding at rivai dot ai> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Lehua Ding <lehua.ding at rivai dot ai> ---
Fixed.
