Public inbox for gcc-bugs@sourceware.org
From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/98350] Reassociation breaks FMA chains
Date: Tue, 30 May 2023 06:03:03 +0000
Message-ID: <bug-98350-4-cOcItVPiZU@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-98350-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98350

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Lili Cui <cuilili@gcc.gnu.org>:

https://gcc.gnu.org/g:e5405f065bace0685cb3b8878d1dfc7a6e7ef409

commit r14-1371-ge5405f065bace0685cb3b8878d1dfc7a6e7ef409
Author: Lili Cui <lili.cui@intel.com>
Date:   Tue May 30 05:47:47 2023 +0000

Handle FMA friendly in reassoc pass

Make some changes in the reassoc pass to make it friendlier to the FMA
pass that runs later. Using FMA instead of mult + add reduces register
pressure and the number of instructions retired.

There are two main changes:
1. Put non-mult ops and mult ops alternately at the end of the queue,
   which allows more FMAs to be generated and reduces the loss of FMAs
   when the chain is broken.
2. Rewrite the rewrite_expr_tree_parallel function to build parallel
   chains according to the given correlation width, preserving as many
   FMA opportunities as possible.

With the patch applied:

On ICX:
507.cactuBSSN_r: Improved by 1.7% for multi-copy.
503.bwaves_r   : Improved by 0.60% for single copy.
507.cactuBSSN_r: Improved by 1.10% for single copy.
519.lbm_r      : Improved by 2.21% for single copy.
No measurable changes for other benchmarks.

On aarch64:
507.cactuBSSN_r: Improved by 1.7% for multi-copy.
503.bwaves_r   : Improved by 6.00% for single copy.
No measurable changes for other benchmarks.
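Change 1 can be pictured with a small standalone model. The C sketch below is not the GCC implementation: the `struct op` entry, the `interleave_for_fma` helper and `MAX_OPS` are hypothetical, and it assumes, as a simplification, that the operands at the end of the array are the first to be combined. It splits the operands into multiply and non-multiply groups, keeps the original order when there are fewer than two multiplies or no non-multiplies, and otherwise inserts the non-multiplies so that the tail of the queue alternates between the two kinds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operand entry: a printable name and whether the operand is
   defined by a multiplication (i.e. a candidate for an FMA). */
struct op {
    const char *name;
    bool is_mult;
};

#define MAX_OPS 32 /* arbitrary limit for this sketch */

/* Illustrative sketch, not GCC code: reorder ops in place so that multiply
   and non-multiply operands alternate at the end of the array (assumed to
   be the end that is combined first).  Returns the number of multiplies. */
static size_t interleave_for_fma(struct op *ops, size_t n)
{
    struct op mults[MAX_OPS], others[MAX_OPS];
    size_t nm = 0, no = 0;

    assert(n <= MAX_OPS);
    for (size_t i = 0; i < n; i++) {
        if (ops[i].is_mult)
            mults[nm++] = ops[i];
        else
            others[no++] = ops[i];
    }

    /* Fewer than two multiplies, or only multiplies: keep the order. */
    if (nm < 2 || no == 0)
        return nm;

    /* Start from the multiplies, then insert the non-multiplies (last one
       first) at idx, moving idx one slot toward the front after each
       insertion so the tail alternates mult / non-mult. */
    memcpy(ops, mults, nm * sizeof *ops);
    size_t len = nm, idx = nm;
    while (no > 0) {
        if (idx < len)
            memmove(&ops[idx + 1], &ops[idx], (len - idx) * sizeof *ops);
        ops[idx] = others[--no];
        len++;
        if (idx > 0)
            idx--;
    }
    return nm;
}
```

For TEST1 the operand list {*e, a*b, c*d} becomes {a*b, c*d, *e}, so the lone addend sits next to a product and can serve as an FMA accumulator; the patched TEST1 output indeed uses two vfmadd instructions and no separate add.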
TEST1:

    float
    foo (float a, float b, float c, float d, float *e)
    {
      return *e + a * b + c * d;
    }

For "-Ofast -mfpmath=sse -mfma" GCC generates:

        vmulss      %xmm3, %xmm2, %xmm2
        vfmadd132ss %xmm1, %xmm2, %xmm0
        vaddss      (%rdi), %xmm0, %xmm0
        ret

With this patch GCC generates:

        vfmadd213ss (%rdi), %xmm1, %xmm0
        vfmadd231ss %xmm2, %xmm3, %xmm0
        ret

TEST2:

    for (int i = 0; i < N; i++)
      {
        a[i] += b[i] * c[i] + d[i] * e[i] + f[i] * g[i] + h[i] * j[i]
                + k[i] * l[i] + m[i] * o[i] + p[i];
      }

For "-Ofast -mfpmath=sse -mfma" GCC generates:

        vmovapd     e(%rax), %ymm4
        vmulpd      d(%rax), %ymm4, %ymm3
        addq        $32, %rax
        vmovapd     c-32(%rax), %ymm5
        vmovapd     j-32(%rax), %ymm6
        vmulpd      h-32(%rax), %ymm6, %ymm2
        vmovapd     a-32(%rax), %ymm6
        vaddpd      p-32(%rax), %ymm6, %ymm0
        vmovapd     g-32(%rax), %ymm7
        vfmadd231pd b-32(%rax), %ymm5, %ymm3
        vmovapd     o-32(%rax), %ymm4
        vmulpd      m-32(%rax), %ymm4, %ymm1
        vmovapd     l-32(%rax), %ymm5
        vfmadd231pd f-32(%rax), %ymm7, %ymm2
        vfmadd231pd k-32(%rax), %ymm5, %ymm1
        vaddpd      %ymm3, %ymm0, %ymm0
        vaddpd      %ymm2, %ymm0, %ymm0
        vaddpd      %ymm1, %ymm0, %ymm0
        vmovapd     %ymm0, a-32(%rax)
        cmpq        $8192, %rax
        jne         .L4
        vzeroupper
        ret

With this patch applied, GCC breaks the chain with width = 2 and generates 6 FMAs:

        vmovapd     a(%rax), %ymm2
        vmovapd     c(%rax), %ymm0
        addq        $32, %rax
        vmovapd     e-32(%rax), %ymm1
        vmovapd     p-32(%rax), %ymm5
        vmovapd     g-32(%rax), %ymm3
        vmovapd     j-32(%rax), %ymm6
        vmovapd     l-32(%rax), %ymm4
        vmovapd     o-32(%rax), %ymm7
        vfmadd132pd b-32(%rax), %ymm2, %ymm0
        vfmadd132pd d-32(%rax), %ymm5, %ymm1
        vfmadd231pd f-32(%rax), %ymm3, %ymm0
        vfmadd231pd h-32(%rax), %ymm6, %ymm1
        vfmadd231pd k-32(%rax), %ymm4, %ymm0
        vfmadd231pd m-32(%rax), %ymm7, %ymm1
        vaddpd      %ymm1, %ymm0, %ymm0
        vmovapd     %ymm0, a-32(%rax)
        cmpq        $8192, %rax
        jne         .L2
        vzeroupper
        ret

gcc/ChangeLog:

        PR tree-optimization/98350
        * tree-ssa-reassoc.cc
        (rewrite_expr_tree_parallel): Rewrite this function.
        (rank_ops_for_fma): New.
        (reassociate_bb): Handle new function.

gcc/testsuite/ChangeLog:

        PR tree-optimization/98350
        * gcc.dg/pr98350-1.c: New test.
        * gcc.dg/pr98350-2.c: Ditto.
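The width = 2 grouping in TEST2 can be modelled in plain C. The sketch below is hypothetical (`parallel_chain_sum` and `MAX_WIDTH` are illustrative names, not part of GCC): it seeds `width` independent accumulator chains round-robin with the non-multiply operands, folds each product into its chain with one multiply-add, and adds the chains together only at the end. It leaves FMA formation to the compiler: `chain[k] += x[i] * y[i]` is eligible for contraction under GCC's -ffp-contract=fast, which is the default outside ISO C modes, when the target has FMA (for example with -mfma).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WIDTH 8 /* arbitrary limit for this sketch */

/* Hypothetical helper, not GCC code: computes
     addends[0] + ... + addends[nadd-1] + x[0]*y[0] + ... + x[nprod-1]*y[nprod-1]
   using `width` independent chains.  Each chain is seeded with non-multiply
   operands and then extended with multiply-adds that the compiler may
   contract into FMAs; chains are only combined in the final reduction. */
static double parallel_chain_sum(const double *x, const double *y, size_t nprod,
                                 const double *addends, size_t nadd,
                                 size_t width)
{
    double chain[MAX_WIDTH] = {0.0};

    assert(width >= 1 && width <= MAX_WIDTH);

    /* Seed the chains round-robin with the non-multiply operands. */
    for (size_t i = 0; i < nadd; i++)
        chain[i % width] += addends[i];

    /* Fold each product into its chain: one multiply-add per product. */
    for (size_t i = 0; i < nprod; i++)
        chain[i % width] += x[i] * y[i];

    /* The chains are independent until this final reduction. */
    double sum = chain[0];
    for (size_t w = 1; w < width; w++)
        sum += chain[w];
    return sum;
}
```

Feeding it the TEST2 operands for one element, with addends {a[i], p[i]} and products in source order b*c, d*e, f*g, h*j, k*l, m*o, chain 0 accumulates a + b*c + f*g + k*l and chain 1 accumulates p + d*e + h*j + m*o. That is the same split visible in the patched assembly (%ymm0 and %ymm1): six multiply-adds followed by a single vaddpd.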
Thread overview: 7+ messages
2020-12-17 15:24  ktkachov at gcc dot gnu.org  [Bug tree-optimization/98350] New: Reassociation breaks FMA chains
2021-01-05  8:31  rguenth at gcc dot gnu.org
2021-09-06 18:40  pinskia at gcc dot gnu.org
2023-03-23  4:16  dizhao at os dot amperecomputing.com
2023-03-23  4:22  dizhao at os dot amperecomputing.com
2023-05-19  7:24  pinskia at gcc dot gnu.org
2023-05-30  6:03  cvs-commit at gcc dot gnu.org  [this message]