From: "linkw at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/96933] New: inefficient code for char/short vec CTOR
Date: Fri, 04 Sep 2020 09:31:00 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933

            Bug ID: 96933
           Summary: inefficient code for char/short vec CTOR
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: linkw at gcc dot gnu.org
  Target Milestone: ---

While investigating the vectorization cost for vec_construct, I found that
the code generated for vector construction is inefficient when DIRECT_MOVE
is supported. The test case:

#include <altivec.h>

vector unsigned char
test_char (unsigned char f1, unsigned char f2, unsigned char f3,
           unsigned char f4, unsigned char f5, unsigned char f6,
           unsigned char f7, unsigned char f8, unsigned char f9,
           unsigned char f10, unsigned char f11, unsigned char f12,
           unsigned char f13, unsigned char f14, unsigned char f15,
           unsigned char f16)
{
  vector unsigned char v = {f1, f2, f3, f4, f5, f6, f7, f8,
                            f9, f10, f11, f12, f13, f14, f15, f16};
  return v;
}

The code currently generated with -mcpu=power9 stores all 16 elements to a
stack slot one byte at a time (16 stb) and then reloads them as a whole
vector with a single lxv; it also has to save and restore r29-r31:

0000000000000000 :
   0:   e8 ff a1 fb     std     r29,-24(r1)
   4:   f0 ff c1 fb     std     r30,-16(r1)
   8:   f8 ff e1 fb     std     r31,-8(r1)
   c:   60 00 a1 8b     lbz     r29,96(r1)
  10:   68 00 c1 8b     lbz     r30,104(r1)
  14:   70 00 e1 8b     lbz     r31,112(r1)
  18:   d1 ff 81 98     stb     r4,-47(r1)
  1c:   d2 ff a1 98     stb     r5,-46(r1)
  20:   78 00 81 89     lbz     r12,120(r1)
  24:   80 00 01 88     lbz     r0,128(r1)
  28:   88 00 61 89     lbz     r11,136(r1)
  2c:   90 00 81 88     lbz     r4,144(r1)
  30:   98 00 a1 88     lbz     r5,152(r1)
  34:   d0 ff 61 98     stb     r3,-48(r1)
  38:   d3 ff c1 98     stb     r6,-45(r1)
  3c:   d4 ff e1 98     stb     r7,-44(r1)
  40:   d8 ff a1 9b     stb     r29,-40(r1)
  44:   d5 ff 01 99     stb     r8,-43(r1)
  48:   d6 ff 21 99     stb     r9,-42(r1)
  4c:   d7 ff 41 99     stb     r10,-41(r1)
  50:   d9 ff c1 9b     stb     r30,-39(r1)
  54:   da ff e1 9b     stb     r31,-38(r1)
  58:   db ff 81 99     stb     r12,-37(r1)
  5c:   dc ff 01 98     stb     r0,-36(r1)
  60:   dd ff 61 99     stb     r11,-35(r1)
  64:   de ff 81 98     stb     r4,-34(r1)
  68:   df ff a1 98     stb     r5,-33(r1)
  6c:   e8 ff a1 eb     ld      r29,-24(r1)
  70:   f0 ff c1 eb     ld      r30,-16(r1)
  74:   f8 ff e1 eb     ld      r31,-8(r1)
  78:   d9 ff 41 f4     lxv     vs34,-48(r1)
  7c:   20 00 80 4e     blr

It could be done more efficiently with direct moves and vector merges: each
element is moved into its own vector register with mtvsrd, and the registers
are then combined pairwise with vmrghb, vmrglh, vmrglw and finally xxmrgld,
for example:

   0:   67 01 43 7c     mtvsrd  vs34,r3
   4:   68 00 61 80     lwz     r3,104(r1)
   8:   60 00 61 81     lwz     r11,96(r1)
   c:   67 01 64 7c     mtvsrd  vs35,r4
  10:   70 00 81 80     lwz     r4,112(r1)
  14:   67 01 03 7d     mtvsrd  vs40,r3
  18:   78 00 61 80     lwz     r3,120(r1)
  1c:   67 01 85 7c     mtvsrd  vs36,r5
  20:   67 01 a6 7c     mtvsrd  vs37,r6
  24:   67 01 07 7c     mtvsrd  vs32,r7
  28:   67 01 28 7c     mtvsrd  vs33,r8
  2c:   67 01 24 7d     mtvsrd  vs41,r4
  30:   80 00 81 80     lwz     r4,128(r1)
  34:   0c 10 43 10     vmrghb  v2,v3,v2
  38:   67 01 63 7c     mtvsrd  vs35,r3
  3c:   88 00 61 80     lwz     r3,136(r1)
  40:   67 01 eb 7c     mtvsrd  vs39,r11
  44:   0c 20 85 10     vmrghb  v4,v5,v4
  48:   67 01 a4 7c     mtvsrd  vs37,r4
  4c:   90 00 81 80     lwz     r4,144(r1)
  50:   0c 00 01 10     vmrghb  v0,v1,v0
  54:   67 01 23 7c     mtvsrd  vs33,r3
  58:   98 00 61 80     lwz     r3,152(r1)
  5c:   67 01 c9 7c     mtvsrd  vs38,r9
  60:   0c 38 e8 10     vmrghb  v7,v8,v7
  64:   67 01 04 7d     mtvsrd  vs40,r4
  68:   0c 48 63 10     vmrghb  v3,v3,v9
  6c:   67 01 23 7d     mtvsrd  vs41,r3
  70:   0c 28 a1 10     vmrghb  v5,v1,v5
  74:   67 01 2a 7c     mtvsrd  vs33,r10
  78:   0c 40 09 11     vmrghb  v8,v9,v8
  7c:   0c 30 21 10     vmrghb  v1,v1,v6
  80:   4c 11 44 10     vmrglh  v2,v4,v2
  84:   4c 39 63 10     vmrglh  v3,v3,v7
  88:   4c 29 88 10     vmrglh  v4,v8,v5
  8c:   4c 01 a1 10     vmrglh  v5,v1,v0
  90:   8c 19 64 10     vmrglw  v3,v4,v3
  94:   8c 11 45 10     vmrglw  v2,v5,v2
  98:   57 13 43 f0     xxmrgld vs34,vs35,vs34