From: "hubicka at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/103797] New: Clang vectorized LightPixel while GCC does not
Date: Tue, 21 Dec 2021 23:29:00 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103797

            Bug ID: 103797
           Summary: Clang vectorized LightPixel while GCC does not
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

Clang vectorises divss in LightPixel while GCC does not (at -O3). This seems
to account for a 17% difference in the rasterflood_svg benchmark of Firefox.

       │ 0000000001864660 const&, mozilla::gfx::Point3DTyped const&, unsigned int)>:
       │ mozilla::gfx::(anonymous namespace)::SpecularLightingSoftware::LightPixel(mozilla::gfx::Point3DTyped const&, mozilla::gfx::Point3DTyped const&, unsigned int):
  0.05 │      push      %rbp
  0.07 │      mov       %rsp,%rbp
  0.71 │      xorps     %xmm6,%xmm6
  0.32 │      addss     %xmm6,%xmm4
       │      unpcklps  %xmm3,%xmm5
  0.78 │      movss     anon.5bcbce9b5eeaaf1a18a99b9a5b62e1ce.3.llvm.5306652999446557335+0x6d8,%xmm8
  0.01 │      addps     %xmm8,%xmm5
  1.47 │      movaps    %xmm4,%xmm9
       │      mulss     %xmm4,%xmm9
       │      movaps    %xmm5,%xmm7
  0.01 │      mulps     %xmm5,%xmm7
  3.35 │      movaps    %xmm7,%xmm3
       │      shufps    $0x55,%xmm7,%xmm3
  0.99 │      addss     %xmm9,%xmm3
  1.59 │      addss     %xmm7,%xmm3
  2.01 │      sqrtss    %xmm3,%xmm3
 11.43 │      divss     %xmm3,%xmm4
  6.76 │      shufps    $0x0,%xmm3,%xmm3
  0.01 │      divps     %xmm3,%xmm5
  2.58 │      mulss     %xmm1,%xmm4
  0.04 │      unpcklps  %xmm0,%xmm2
       │      mulps     %xmm5,%xmm2
  2.67 │      movaps    %xmm2,%xmm0
  0.04 │      shufps    $0x55,%xmm2,%xmm0
  2.11 │      addss     %xmm4,%xmm0
  1.87 │      addss     %xmm2,%xmm0
  2.82 │      cmpless   %xmm0,%xmm6
  2.20 │      andps     %xmm8,%xmm6
  1.05 │      mulss     %xmm0,%xmm6
  4.04 │      mulss     .str.6.llvm.231702015065810902+0x77,%xmm6
  3.14 │      cvttss2si %xmm6,%eax
  4.45 │      mov       0x8(%rdi),%ecx
  0.00 │      mov       0xc(%rdi),%edx
       │      movzwl    %ax,%eax
  1.10 │      test      %edx,%edx
       │    ↓ jle       92
       │88:   imul      %eax,%eax
  9.06 │      shr       $0xf,%eax
  3.12 │      dec       %edx
       │    ↑ jne       88
       │92:   shr       $0x8,%eax
  1.95 │      movzwl    0x10(%rdi,%rax,2),%eax
  6.48 │      imul      %eax,%ecx
  0.99 │      shr       $0x8,%ecx
  1.06 │      mov       %esi,%eax
  0.01 │      shr       $0x8,%eax
       │      mov       %esi,%edx
       │      shr       $0x10,%edx
  0.01 │      mov       $0xff,%edi
       │      and       %edi,%esi
  0.01 │      imul      %ecx,%esi
  3.32 │      shr       $0xf,%esi
  1.81 │      cmp       %edi,%esi
  0.04 │      cmovae    %edi,%esi
  1.99 │      and       %edi,%eax
  0.01 │      imul      %ecx,%eax
       │      shr       $0xf,%eax
  0.01 │      cmp       %edi,%eax
  0.28 │      cmovae    %edi,%eax
  0.96 │      and       %edi,%edx
       │      imul      %ecx,%edx
       │      shr       $0xf,%edx
  0.92 │      cmp       %edi,%edx
  0.85 │      cmovae    %edi,%edx
  1.00 │      cmp       %eax,%edx
  1.20 │      mov       %eax,%ecx
       │      cmova     %edx,%ecx
  2.17 │      cmp       %esi,%ecx
  1.15 │      cmovbe    %esi,%ecx
  1.79 │      shl       $0x18,%ecx
  1.17 │      shl       $0x10,%edx
       │      shl       $0x8,%eax
  0.03 │      or        %edx,%eax
  0.01 │      or        %esi,%eax
  0.14 │      or        %ecx,%eax
  0.72 │      pop       %rbp
  0.04 │    ← ret
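
For reference, the divss/divps pair in the listing looks like a vector normalisation followed by a dot product, with the square root broadcast by shufps $0x0 before the packed divide. The following is only a sketch of that shape, not the Firefox source: Vec3 and NormalizedDot are hypothetical names, and it is an assumption that this reduction triggers the same codegen difference at -O3.

```cpp
// Hypothetical reduction of the normalise-then-dot pattern; the names
// Vec3 and NormalizedDot are illustrative and do not come from Firefox.
#include <cmath>

struct Vec3 {
  float x, y, z;
};

// Normalise v and return its dot product with n.  All three divisions
// share the divisor len, so a compiler may pack two of them into one
// packed divide and keep the third scalar.
float NormalizedDot(const Vec3& v, const Vec3& n) {
  float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
  float nx = v.x / len;
  float ny = v.y / len;
  float nz = v.z / len;
  return nx * n.x + ny * n.y + nz * n.z;
}
```

Comparing the output of g++ -O3 -S and clang++ -O3 -S on a file like this would show whether each compiler keeps three scalar divides or packs some of them.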