From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 0104C385842C; Wed, 4 Oct 2023 19:25:37 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0104C385842C
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1696447537;
	bh=bVrDhsitPHoPhEG3U4o/DH5pmQdcvZ5UtKIRadYw0CM=;
	h=From:To:Subject:Date:From;
	b=Gx1RHZNK/cdGZlP9ODiW4qjSDtu+XT0lxWe4hWOPNN7VebIhatgRuwqu2HzAB/bRo
	 fdAh6wGKLZr3gV7qjCAjV+VsjsE7kVIMM7PDwXeF5dbx+CUn84fRzBI19NL5T9Gj28
	 iYPtR8pkJ4YdRsbme0UU6wsNSFERUx42RDu873fo=
From: "prathamesh3492 at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/111697] New: Sub optimal code gen for initialising vector using loop
Date: Wed, 04 Oct 2023 19:25:36 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: prathamesh3492 at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter target_milestone
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111697

            Bug ID: 111697
           Summary: Sub optimal code gen for initialising vector using loop
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: prathamesh3492 at gcc dot gnu.org
  Target Milestone: ---

Hi,
For the following test-case:

typedef int v4si __attribute__((vector_size
(sizeof (int) * 4)));

v4si f(int x)
{
  v4si v;
  for (int i = 0; i < 4; i++)
    v[i] = x;
  return v;
}

Compiling with -O2 results in the following .optimized dump:

v4si f (int x)
{
  v4si v;

  [local count: 214748368]:
  v_16 = BIT_INSERT_EXPR ;
  v_20 = BIT_INSERT_EXPR ;
  v_24 = BIT_INSERT_EXPR ;
  v_2 = BIT_INSERT_EXPR ;
  return v_2;

}

and the following code-gen on aarch64:

f:
        movi    v0.4s, 0
        fmov    s31, w0
        ins     v0.s[0], v31.s[0]
        ins     v0.s[1], v31.s[0]
        ins     v0.s[2], v31.s[0]
        ins     v0.s[3], v31.s[0]
        ret

which could instead be a single dup instruction:

f:
        dup     v0.4s, w0
        ret

Similarly, code-gen on x86_64:

f:
        movd    %edi, %xmm0
        movd    %edi, %xmm1
        pshufd  $225, %xmm0, %xmm0
        movss   %xmm1, %xmm0
        pshufd  $225, %xmm0, %xmm0
        pshufd  $198, %xmm0, %xmm0
        movss   %xmm1, %xmm0
        pshufd  $198, %xmm0, %xmm0
        pshufd  $39, %xmm0, %xmm0
        movss   %xmm1, %xmm0
        pshufd  $39, %xmm0, %xmm0
        ret
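
For comparison, here is a minimal sketch of the same splat written as a vector initializer instead of a loop. Both forms set all four lanes to x, so ideally they would lower to the same code. The names splat_loop, splat_init and splat_forms_agree are illustrative only and not part of the original test-case. The sketch assumes GCC's vector extensions, and the code generated for either form depends on the compiler version and target, so it is best checked locally with -O2 -S.

```c
#include <string.h>

typedef int v4si __attribute__((vector_size (sizeof (int) * 4)));

/* Loop form from the report: each lane is written separately. */
v4si splat_loop (int x)
{
  v4si v;
  for (int i = 0; i < 4; i++)
    v[i] = x;
  return v;
}

/* Assumed alternative (not from the report): the same value built
   with a GNU vector initializer, which names all four lanes at once. */
v4si splat_init (int x)
{
  return (v4si) { x, x, x, x };
}

/* Returns 1 when both forms produce identical lanes for X. */
int splat_forms_agree (int x)
{
  v4si a = splat_loop (x);
  v4si b = splat_init (x);
  return memcmp (&a, &b, sizeof a) == 0;
}
```

Comparing the assembly for splat_loop and splat_init on the same target is one way to see whether the loop form has been recognised as a duplicate of a single scalar.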