From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 1E98F3858C54; Mon, 28 Aug 2023 12:53:34 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 1E98F3858C54
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1693227214;
	bh=bdSmB8neGxKGYY4G3IwEBAvje8NQ3xhqmcAUjCao5Gw=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=PiMdKoRF3+wsqXqc5aJOroZ11D1SatYaXs3ku2CvsWlWfvnfSXhi+i9zfYoqQQZf9
	 qoFKExYgf+qT22tP97OfDVxWk3o3+flUuYYQ/lU86U6tLhrT/KJ0/h/cERxyvrkqA9
	 TVOPTd7QJQGnQdaqOIkgJK3LSMBPZBBYHINuFtrA=
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/111166] gcc unnecessarily creates vector operations for
 packing 32 bit integers into struct (x86_64)
Date: Mon, 28 Aug 2023 12:53:33 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 13.2.1
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cc
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111166

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |guojiufu at gcc dot gnu.org,
                   |                            |sayle at gcc dot gnu.org

--- Comment #6 from Richard Biener ---
Roger was working on TImode incoming(?)
argument code generation, this is TImode outgoing argument code generation
where we produce for 32bit parts

    7: NOTE_INSN_BASIC_BLOCK 2
    2: r84:SI=di:SI
    3: r85:SI=si:SI
    4: r86:SI=dx:SI
    5: r87:SI=cx:SI
    6: NOTE_INSN_FUNCTION_BEG
    9: r88:DI=zero_extend(r84:SI)
   10: r89:DI=r82:TI#0
   11: r91:DI=0xffffffff00000000
   12: {r90:DI=r89:DI&r91:DI;clobber flags:CC;}
   13: {r92:DI=r90:DI|r88:DI;clobber flags:CC;}
   14: r82:TI=r82:TI&<0xffffffffffffffff,0>|zero_extend(r92:DI)
   15: r93:DI=zero_extend(r85:SI)
   16: {r94:DI=r93:DI<<0x20;clobber flags:CC;}
   17: r95:DI=r82:TI#0
   18: r96:DI=zero_extend(r95:DI#0)
   19: {r97:DI=r96:DI|r94:DI;clobber flags:CC;}
   20: r82:TI=r82:TI&<0xffffffffffffffff,0>|zero_extend(r97:DI)
   21: r98:DI=zero_extend(r86:SI)
   22: r99:DI=r82:TI#8
   23: r101:DI=0xffffffff00000000
   24: {r100:DI=r99:DI&r101:DI;clobber flags:CC;}
   25: {r102:DI=r100:DI|r98:DI;clobber flags:CC;}
   26: r82:TI=r82:TI&<0,0xffffffffffffffff>|zero_extend(r102:DI)<<0x40
   27: r103:DI=zero_extend(r87:SI)
   28: {r104:DI=r103:DI<<0x20;clobber flags:CC;}
   29: r105:DI=r82:TI#8
   30: r106:DI=zero_extend(r105:DI#0)
   31: {r107:DI=r106:DI|r104:DI;clobber flags:CC;}
   32: r82:TI=r82:TI&<0,0xffffffffffffffff>|zero_extend(r107:DI)<<0x40
   33: r108:DI=r82:TI#0
   34: r109:DI=r82:TI#8
   35: di:DI=r108:DI
   36: si:DI=r109:DI
   37: ax:DI=call [`do_smth_with_4_u32'] argc:0

and we fail to dissect "backwards" from the

   33: r108:DI=r82:TI#0
   34: r109:DI=r82:TI#8

subregs.  Possibly one issue is that we re-use r82.  The dual-use of
r82 at the end also poses issues as combine tries to match things like

(parallel [
        (set (reg:DI 108 [ D.2865 ])
            (subreg:DI (reg:TI 82 [ D.2865 ]) 0))
        (set (reg:TI 82 [ D.2865 ])
            (ior:TI (and:TI (reg:TI 82 [ D.2865 ])
                    (const_wide_int 0x0ffffffffffffffff))
                (ashift:TI (zero_extend:TI (reg:DI 107))
                    (const_int 64 [0x40]))))
    ])

but fails to "rename" r82 to split the parallel.
At RTL expansion time we store to D.2865, whose DECL_RTL is r82:TI, so we
can hardly fix it there.  Only a later pass could figure out that the insns
fully define the reg.

Jiufu Guo is working to improve what we choose for DECL_RTL, but for
incoming params / outgoing return.  This is a case where we could, with
-fno-tree-vectorize, improve DECL_RTL for an automatic var and choose
not TImode but something like a (concat:TI reg:DI reg:DI).
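
To illustrate the point about the insns fully defining the reg, here is a
rough C model of the sequence above.  It is only a sketch: ti_pair,
pack_via_ti and pack_via_two_di are made-up names, and the two uint64_t
fields merely stand in for the low and high DImode halves of r82.  The
first function mirrors the read-modify-write insns 9-32, the second is
roughly what two independent DImode pseudos (as with a concat) would compute.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the TImode pseudo r82: two 64-bit halves. */
typedef struct { uint64_t lo, hi; } ti_pair;

/* Mirrors insns 9-32: every 32-bit part is merged into r82 through a
   read-modify-write of one 64-bit half. */
static ti_pair pack_via_ti(ti_pair r82, uint32_t a, uint32_t b,
                           uint32_t c, uint32_t d)
{
    /* insns 9-14: keep the upper 32 bits of the low half, insert a */
    r82.lo = (r82.lo & 0xffffffff00000000ULL) | a;
    /* insns 15-20: keep the lower 32 bits of the low half, insert b */
    r82.lo = (uint64_t)(uint32_t)r82.lo | ((uint64_t)b << 32);
    /* insns 21-26: same as 9-14, on the high half with c */
    r82.hi = (r82.hi & 0xffffffff00000000ULL) | c;
    /* insns 27-32: same as 15-20, on the high half with d */
    r82.hi = (uint64_t)(uint32_t)r82.hi | ((uint64_t)d << 32);
    return r82;
}

/* Assumed shape of the result if each half were its own DImode value:
   no dependence on any previous contents of r82. */
static ti_pair pack_via_two_di(uint32_t a, uint32_t b,
                               uint32_t c, uint32_t d)
{
    ti_pair r;
    r.lo = (uint64_t)a | ((uint64_t)b << 32);
    r.hi = (uint64_t)c | ((uint64_t)d << 32);
    return r;
}

/* Checks that the old contents of r82 never reach the final halves. */
static void check_packing(void)
{
    ti_pair garbage = { 0xdeadbeefcafef00dULL, 0x0123456789abcdefULL };
    ti_pair x = pack_via_ti(garbage, 1u, 2u, 3u, 4u);
    ti_pair y = pack_via_two_di(1u, 2u, 3u, 4u);
    assert(x.lo == y.lo && x.hi == y.hi);
}
```

Whatever r82 held before, both functions end up with the same two halves,
which is the property a later pass would have to discover from the insn
stream.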