From: "lehua.ding at rivai dot ai"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/108999] New: Maybe LRA produce inaccurate hardware register occupancy information for subreg operand
Date: Fri, 03 Mar 2023 01:31:38 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108999

            Bug ID: 108999
           Summary: Maybe LRA produce inaccurate hardware register
                    occupancy information for subreg operand
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lehua.ding at rivai dot ai
  Target Milestone: ---

The problem code on the compiler explorer is
here: https://godbolt.org/z/GaGWEahPY

The problem is that the line `mov z1.d, z4.d` in the assembly code[1] is
unnecessary. The reason, as far as I can tell, is that the LRA pass[2] thinks
the hard registers occupied by
`(subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) [16, 16])` conflict with
`(reg/v:VNx2DI 98 [ v19 ])`[3]. That is not true: according to the IRA dump[4],
r103 occupies hard registers 32 and 33, and r98 occupies 34.

The cause is in the function process_alt_operands in the lra-constraints.cc
source file[5]. When it checks whether operand 0 of insn 39 conflicts with the
other operands of insn 39, it marks operand 0 as occupying 33 and 34, based on
the mode (`biggest_mode[i]`) and the starting hard regno 33
(`clobbered_hard_regno`). The mode it uses is VNx4DI; I think it should use
VNx2DI, which is the proper mode for the entire operand 0. So when computing
the hard registers occupied by a normal subreg operand, using the inner reg's
mode may be too wide.

References:

[1] assembly code
```
subreg_coalesce5:
        mov     p1.b, p0.b
        ld2d    {z0.d - z1.d}, p0/z, [x0]
        cmp     w1, 0
        ble     .L2
        sxtw    x1, w1
        mov     x0, 0
.L3:
        ld1d    z3.d, p1/z, [x2, x0, lsl 3]
        ld1d    z2.d, p1/z, [x3, x0, lsl 3]
        add     x0, x0, 1
        movprfx z4.d, p0/z, z1.d
        mla     z4.d, p0/m, z3.d, z2.d
        movprfx z0.d, p0/z, z0.d
        mla     z0.d, p0/m, z3.d, z2.d
        mov     z1.d, z4.d
        cmp     x1, x0
        bne     .L3
.L2:
        st2d    {z0.d - z1.d}, p0, [x4]
        ret
```

[2] partial content of LRA dump info
```
...
            0 Early clobber: reject++
            0 Conflict early clobber reload: reject--
          alt=0,overall=6,losers=1,rld_nregs=0
            0 Early clobber: reject++
          alt=1,overall=1,losers=0,rld_nregs=0
         Choosing alt 1 in insn 36:  (0) &w  (1) Upl  (2) w  (3) w  (4) 0  (5) Dz {*cond_fmavnx2di_any}
            0 Early clobber: reject++
            0 Matched conflict early clobber reloads: reject--
          alt=0,overall=6,losers=1,rld_nregs=0
            0 Early clobber: reject++
            0 Conflict early clobber reload: reject--
          alt=1,overall=6,losers=1,rld_nregs=0
            0 Early clobber: reject++
            2 Matching earlyclobber alt: reject--
          alt=2,overall=6,losers=1,rld_nregs=1
            0 Early clobber: reject++
            3 Matching earlyclobber alt: reject--
          alt=3,overall=6,losers=1,rld_nregs=1
            0 Early clobber: reject++
            5 Matching earlyclobber alt: reject--
            5 Non-pseudo reload: reject+=2
            5 Non input pseudo reload: reject++
          alt=4,overall=9,losers=1 -- refuse
            Staticly defined alt reject+=6
            0 Early clobber: reject++
            5 Non-pseudo reload: reject+=2
            5 Non input pseudo reload: reject++
          alt=5,overall=16,losers=1 -- refuse
         Choosing alt 0 in insn 39:  (0) =&w  (1) Upl  (2) w  (3) w  (4) w  (5) Dz {*cond_fmavnx2di_any}
      Creating newreg=117, assigning class FP_REGS to r117
   39: r117:VNx2DI=unspec[r104:VNx16BI#0,r97:VNx2DI*r98:VNx2DI+r103:VNx4DI#[16,16],const_vector] 284
      REG_DEAD r98:VNx2DI
      REG_DEAD r97:VNx2DI
    Inserting insn reload after:
   76: r103:VNx4DI#[16,16]=r117:VNx2DI
...
```

[3] partial rtl of IRA pass
```lisp
(insn 36 43 37 4 (set (subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) 0)
        (unspec:VNx2DI [
                (subreg:VNx2BI (reg/v:VNx16BI 104 [ pg ]) 0)
                (plus:VNx2DI (mult:VNx2DI (reg/v:VNx2DI 97 [ v18 ])
                        (reg/v:VNx2DI 98 [ v19 ]))
                    (subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) 0))
                (const_vector:VNx2DI repeat [
                        (const_int 0 [0])
                    ])
            ] UNSPEC_SEL)) "/app/example.c":13:25 discrim 1 7465 {*cond_fmavnx2di_any}
     (nil))
(insn 39 37 40 4 (set (subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) [16, 16])
        (unspec:VNx2DI [
                (subreg:VNx2BI (reg/v:VNx16BI 104 [ pg ]) 0)
                (plus:VNx2DI (mult:VNx2DI (reg/v:VNx2DI 97 [ v18 ])
                        (reg/v:VNx2DI 98 [ v19 ]))
                    (subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) [16, 16]))
                (const_vector:VNx2DI repeat [
                        (const_int 0 [0])
                    ])
            ] UNSPEC_SEL)) "/app/example.c":14:25 discrim 1 7465 {*cond_fmavnx2di_any}
     (expr_list:REG_DEAD (reg/v:VNx2DI 98 [ v19 ])
        (expr_list:REG_DEAD (reg/v:VNx2DI 97 [ v18 ])
            (nil))))
```

[4] partial content of IRA dump info
```
Disposition:
    6:r96  l0    69   24:r97  l0    35   23:r98  l0    34    4:r99  l0     1
    3:r102 l0     0    2:r103 l0    32    1:r104 l0    68    5:r106 l0     1
    7:r107 l0     2    8:r108 l0     3    0:r109 l0     4   12:r110 l0    68
    9:r111 l0     0   14:r112 l0     1   13:r113 l0     2   11:r114 l0     3
   10:r115 l0     4
```

[5] partial source code of process_alt_operands
```c++
/* lra-constraints.cc */
static bool
process_alt_operands (int only_alternative)
{
  for (nop = 0; nop < n_operands; nop++)
    {
      ...
      biggest_mode[nop] = GET_MODE (op);
      if (GET_CODE (op) == SUBREG)
        {
          /* !!! Here use reg instead of subreg's mode */
          biggest_mode[nop] = wider_subreg_mode (op);
          operand_reg[nop] = reg = SUBREG_REG (op);
        }
    }
  ...
  for (nalt = 0; nalt < n_alternatives; nalt++)
    {
      ...
      for (nop = 0; nop < early_clobbered_regs_num; nop++)
        {
          ...
          /* !!! Here set operand0 occupied 33 and 34, where:
                 biggest_mode[i] is VNx4DI
                 clobbered_hard_regno is 33 */
          add_to_hard_reg_set (&temp_set, biggest_mode[i], clobbered_hard_regno);
          ...
        }
    }
  ...
}
```
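
To make the arithmetic behind the suspected false conflict concrete, here is a small standalone model. It is not GCC code. `HardRegSet`, `add_regs`, `overlaps`, and `early_clobber_conflicts` are hypothetical names made up for this illustration, and the register counts (two hard registers for VNx4DI, one for VNx2DI) are assumptions read off the IRA dump in [4], where r103 starts at 32 and r98 sits at 34.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for GCC's HARD_REG_SET: a plain set of hard regnos.
using HardRegSet = std::set<unsigned>;

// Assumed register counts per mode, taken from the numbers in the report.
constexpr unsigned kNregsVNx4DI = 2;  // tuple of two SVE vectors
constexpr unsigned kNregsVNx2DI = 1;  // single SVE vector

// Simplified model of what add_to_hard_reg_set does in the excerpt [5]:
// mark `nregs` consecutive hard registers starting at `regno`.
// (GCC derives the count from the machine mode; here it is passed directly.)
void add_regs(HardRegSet &set, unsigned nregs, unsigned regno) {
  assert(nregs > 0);
  for (unsigned r = regno; r < regno + nregs; ++r)
    set.insert(r);
}

// True if the two sets share at least one hard register.
bool overlaps(const HardRegSet &a, const HardRegSet &b) {
  for (unsigned r : a)
    if (b.count(r) != 0)
      return true;
  return false;
}

// Does an early-clobbered operand that starts at `clobbered_hard_regno` and
// is assumed to span `nregs` registers collide with a single-register
// operand living in `other_regno`?
bool early_clobber_conflicts(unsigned nregs, unsigned clobbered_hard_regno,
                             unsigned other_regno) {
  HardRegSet clobbered, other;
  add_regs(clobbered, nregs, clobbered_hard_regno);
  add_regs(other, kNregsVNx2DI, other_regno);
  return overlaps(clobbered, other);
}
```

With the values from insn 39, `early_clobber_conflicts(kNregsVNx4DI, 33, 34)` models the current behaviour (operand 0 is treated as covering 33 and 34, so it collides with r98 in 34), while `early_clobber_conflicts(kNregsVNx2DI, 33, 34)` models the subreg's own mode (only 33 is covered, so there is no collision and no reload would be needed on that account).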