From: "hubicka at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/99395] New: s116 benchmark of TSVC is vectorized by clang and not by gcc
Date: Thu, 04 Mar 2021 23:01:14 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395

            Bug ID: 99395
           Summary: s116 benchmark of TSVC is vectorized by clang and not by gcc
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

The s116 loop is:

real_t s116(struct args_t * func_args)
{

//    linear dependence testing

    initialise_arrays(__func__);
    gettimeofday(&func_args->t1, NULL);

    for (int nl = 0; nl < iterations*10; nl++) {
        for (int i = 0; i < LEN_1D - 5; i += 5) {
            a[i] = a[i + 1] * a[i];
            a[i + 1] = a[i + 2] * a[i + 1];
            a[i + 2] = a[i + 3] * a[i + 2];
            a[i + 3] = a[i + 4] * a[i + 3];
            a[i + 4] = a[i + 5] * a[i + 4];
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }

    gettimeofday(&func_args->t2, NULL);
    return calc_checksum(__func__);
}

and the vectorized code produced by clang 11 is about 2 times faster on a Zen 3 machine:

0000000000401d00 :
  401d00:  41 56                   push   %r14
  401d02:  53                      push   %rbx
  401d03:  50                      push   %rax
  401d04:  49 89 fe                mov    %rdi,%r14
  401d07:  bf 66 e1 42 00          mov    $0x42e166,%edi
  401d0c:  e8 ff 58 01 00          call   417610
  401d11:  31 db                   xor    %ebx,%ebx
  401d13:  4c 89 f7                mov    %r14,%rdi
  401d16:  31 f6                   xor    %esi,%esi
  401d18:  e8 43 f3 ff ff          call   401060
  401d1d:  eb 47                   jmp    401d66
  401d1f:  90                      nop
  401d20:  bf 00 25 45 00          mov    $0x452500,%edi
  401d25:  be 00 31 43 00          mov    $0x433100,%esi
  401d2a:  ba 00 19 47 00          mov    $0x471900,%edx
  401d2f:  b9 00 0d 49 00          mov    $0x490d00,%ecx
  401d34:  41 b8 00 01 4b 00       mov    $0x4b0100,%r8d
  401d3a:  41 b9 00 f5 4c 00       mov    $0x4cf500,%r9d
  401d40:  c5 f8 57 c0             vxorps %xmm0,%xmm0,%xmm0
  401d44:  68 00 f5 54 00          push   $0x54f500
  401d49:  68 00 f5 50 00          push   $0x50f500
  401d4e:  e8 6d 3c 01 00          call   4159c0
  401d53:  48 83 c4 10             add    $0x10,%rsp
  401d57:  83 c3 01                add    $0x1,%ebx
  401d5a:  81 fb 40 42 0f 00       cmp    $0xf4240,%ebx
  401d60:  0f 84 9a 00 00 00       je     401e00
  401d66:  c5 fa 10 05 92 07 05    vmovss 0x50792(%rip),%xmm0        # 452500
  401d6d:  00
  401d6e:  31 c0                   xor    %eax,%eax
  401d70:  c5 fa 10 0c 85 04 25    vmovss 0x452504(,%rax,4),%xmm1
  401d77:  45 00
  401d79:  c5 fa 59 c1             vmulss %xmm1,%xmm0,%xmm0
  401d7d:  c5 fa 11 04 85 00 25    vmovss %xmm0,0x452500(,%rax,4)
  401d84:  45 00
  401d86:  c5 f8 10 04 85 08 25    vmovups 0x452508(,%rax,4),%xmm0
  401d8d:  45 00
  401d8f:  c5 f0 c6 c8 00          vshufps $0x0,%xmm0,%xmm1,%xmm1
  401d94:  c5 f0 c6 c8 98          vshufps $0x98,%xmm0,%xmm1,%xmm1
  401d99:  c5 f8 59 c9             vmulps %xmm1,%xmm0,%xmm1
  401d9d:  c5 f8 11 0c 85 04 25    vmovups %xmm1,0x452504(,%rax,4)
  401da4:  45 00
  401da6:  48 3d f5 7c 00 00       cmp    $0x7cf5,%rax
  401dac:  0f 87 6e ff ff ff       ja     401d20
  401db2:  c4 e3 79 04 c0 e7       vpermilps $0xe7,%xmm0,%xmm0
  401db8:  c5 fa 10 0c 85 18 25    vmovss 0x452518(,%rax,4),%xmm1
  401dbf:  45 00
  401dc1:  c5 fa 59 c1             vmulss %xmm1,%xmm0,%xmm0
  401dc5:  c5 fa 11 04 85 14 25    vmovss %xmm0,0x452514(,%rax,4)
  401dcc:  45 00
  401dce:  c5 f8 10 04 85 1c 25    vmovups 0x45251c(,%rax,4),%xmm0
  401dd5:  45 00
  401dd7:  c5 f0 c6 c8 00          vshufps $0x0,%xmm0,%xmm1,%xmm1
  401ddc:  c5 f0 c6 c8 98          vshufps $0x98,%xmm0,%xmm1,%xmm1
  401de1:  c5 f8 59 c9             vmulps %xmm1,%xmm0,%xmm1
  401de5:  c5 fa 10 04 85 28 25    vmovss 0x452528(,%rax,4),%xmm0
  401dec:  45 00
  401dee:  c5 f8 11 0c 85 18 25    vmovups %xmm1,0x452518(,%rax,4)
  401df5:  45 00
  401df7:  48 83 c0 0a             add    $0xa,%rax
  401dfb:  e9 70 ff ff ff          jmp    401d70
  401e00:  49 83 c6 10             add    $0x10,%r14
  401e04:  4c 89 f7                mov    %r14,%rdi
  401e07:  31 f6                   xor    %esi,%esi
  401e09:  e8 52 f2 ff ff          call   401060
  401e0e:  bf 66 e1 42 00          mov    $0x42e166,%edi
  401e13:  48 83 c4 08             add    $0x8,%rsp
  401e17:  5b                      pop    %rbx
  401e18:  41 5e                   pop    %r14
  401e1a:  e9 e1 51 02 00          jmp    427000
  401e1f:  90                      nop
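
To make the dependence structure of the s116 loop easier to see, here is a minimal standalone sketch. Every statement in the loop reads a[k + 1] before any statement overwrites it, so the loop appears to carry only write-after-read dependences, and a group of loads can be hoisted above the matching stores. The listing above seems to use that shape: one vmulss followed by one 4-wide vmulps per group of five elements. The names s116_scalar, s116_grouped and s116_variants_agree are hypothetical, the value of LEN_1D and the float type for real_t are assumptions (the report does not state them), and the TSVC timing harness is left out.

```c
#include <string.h>

#define LEN_1D 32000        /* assumed array length, not stated in the report */
typedef float real_t;       /* assumed single precision, as vmulss/vmulps suggest */

/* The inner loop exactly as written in the report. */
static void s116_scalar(real_t *a)
{
    for (int i = 0; i < LEN_1D - 5; i += 5) {
        a[i]     = a[i + 1] * a[i];
        a[i + 1] = a[i + 2] * a[i + 1];
        a[i + 2] = a[i + 3] * a[i + 2];
        a[i + 3] = a[i + 4] * a[i + 3];
        a[i + 4] = a[i + 5] * a[i + 4];
    }
}

/* Hypothetical regrouping: one scalar multiply, then a group of four
   whose loads all complete before any of its stores. */
static void s116_grouped(real_t *a)
{
    for (int i = 0; i < LEN_1D - 5; i += 5) {
        real_t lo[4], hi[4];

        a[i] = a[i + 1] * a[i];          /* a[i + 1] is still the old value */

        for (int k = 0; k < 4; k++) {    /* load old a[i+1..i+4] and a[i+2..i+5] */
            lo[k] = a[i + 1 + k];
            hi[k] = a[i + 2 + k];
        }
        for (int k = 0; k < 4; k++)      /* store only after all loads are done */
            a[i + 1 + k] = hi[k] * lo[k];
    }
}

/* Returns 1 if both variants leave bit-identical arrays, 0 otherwise. */
static int s116_variants_agree(void)
{
    static real_t x[LEN_1D], y[LEN_1D];

    for (int i = 0; i < LEN_1D; i++)
        x[i] = y[i] = 1.0f + (real_t)(i % 7) * 0.125f;

    s116_scalar(x);
    s116_grouped(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

Calling s116_variants_agree() from a small main should return 1, since both variants multiply the same pairs of old values. This only illustrates why the reordering is legal; it says nothing about why GCC's vectorizer rejects the loop.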