From: "crazylht at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/108707] New: suboptimal allocation with same memory op for many different instructions.
Date: Wed, 08 Feb 2023 03:11:30 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108707

            Bug ID: 108707
           Summary: suboptimal allocation with same memory op for many
                    different instructions.
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---

#include <immintrin.h>

void
foo (__m512* pv, float* __restrict ps, int n,
     __m512* pdest, __m512* p1, __m512* p2, __m512* p3)
{
  __m512 a = _mm512_setzero_ps ();
  __m512 b = a;
  __m512 c = a;
  for (int i = 0; i != n; i++)
    {
      a = _mm512_fmadd_ps (p1[i], pv[i], a);
      b = _mm512_fmadd_ps (p2[i], pv[i], b);
      c = _mm512_fmadd_ps (p3[i], pv[i], c);
    }
  pdest[0] = a;
  pdest[1] = b;
  pdest[2] = c;
}

With g++ -O2 -mavx512f -S, the loop compiles to:

.L3:
        vmovaps     (%r8,%rax), %zmm3
        vmovaps     (%r9,%rax), %zmm4
        vmovaps     (%rsi,%rax), %zmm5
        vfmadd231ps (%rdi,%rax), %zmm3, %zmm2
        vfmadd231ps (%rdi,%rax), %zmm4, %zmm1
        vfmadd231ps (%rdi,%rax), %zmm5, %zmm0
        addq        $64, %rax
        cmpq        %rax, %rdx
        jne         .L3

Here the same memory operand, (%rdi,%rax) (pv[i]), is folded into all
three vfmadd231ps instructions, so it is read three times per iteration,
while p1[i], p2[i] and p3[i] each need a separate vmovaps. It would be
better to load (%rdi,%rax) into a zmm register once and fold the other
three loads into the FMAs instead, which needs 4 memory reads instead of 6
and two fewer instructions per iteration:

.L3:
        vmovaps     (%rdi,%rax), %zmm0
        vfmadd231ps (%r8,%rax), %zmm0, %zmm3
        vfmadd231ps (%r9,%rax), %zmm0, %zmm2
        vfmadd231ps (%rsi,%rax), %zmm0, %zmm1
        addq        $64, %rax
        cmpq        %rax, %rdx
        jne         .L3
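For reference, the computation can be modeled portably without AVX-512. This is only a sketch for checking what the loop computes, not a claim about how GCC should allocate registers. It assumes a hypothetical 16-lane struct `vec512` standing in for __m512 and a hypothetical function name `foo_ref`, and it drops the unused `ps` argument. Each iteration reads pv[i] once and feeds it to all three accumulators, which is the reuse the proposed loop exploits.

```c
#include <assert.h>

/* Hypothetical stand-in for __m512: 16 single-precision lanes. */
typedef struct { float v[16]; } vec512;

/* Hypothetical portable model of foo() from the report (unused `ps`
   argument dropped). pv[i] is read once per iteration and shared by
   all three multiply-accumulate chains. */
void foo_ref(const vec512 *pv, int n, vec512 *pdest,
             const vec512 *p1, const vec512 *p2, const vec512 *p3)
{
    assert(n >= 0); /* the original loop uses i != n, so n must not be negative */
    vec512 a = {{0}}, b = {{0}}, c = {{0}};
    for (int i = 0; i != n; i++) {
        const vec512 shared = pv[i]; /* single load of the common operand */
        for (int k = 0; k < 16; k++) {
            /* Separate multiply and add may round twice, whereas
               vfmadd231ps rounds once, so inexact inputs can differ
               in the last bit from the intrinsic version. */
            a.v[k] = p1[i].v[k] * shared.v[k] + a.v[k];
            b.v[k] = p2[i].v[k] * shared.v[k] + b.v[k];
            c.v[k] = p3[i].v[k] * shared.v[k] + c.v[k];
        }
    }
    pdest[0] = a;
    pdest[1] = b;
    pdest[2] = c;
}
```

With small integer inputs every product and sum is exact, so the model can be compared lane by lane against the AVX-512 version.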