public inbox for gcc-bugs@sourceware.org
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/99971] GCC generates partially vectorized and scalar code at once
Date: Fri, 09 Apr 2021 07:05:46 +0000
Message-ID: <bug-99971-4-Mmw3DZgY1p@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-99971-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99971

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Keywords|                              |missed-optimization
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
     Ever confirmed|0                             |1
   Last reconfirmed|                              |2021-04-09
             Status|UNCONFIRMED                   |ASSIGNED

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  While we manage to analyze for the "perfect" solution, we fail
because dependence testing doesn't handle a piece, and this throws away half
of the vectorization.  We do actually see that we'll retain the scalar loads
and computations, but doing two vector loads, a vector add and a vector store
still seems cheaper than doing four scalar stores:

    0x1fdb5a0 x_2(D)->a 1 times unaligned_load (misalign -1) costs 12 in body
    0x1fdb5a0 y1_3(D)->a 1 times unaligned_load (misalign -1) costs 12 in body
    0x1fdb5a0 _13 + _14 1 times vector_stmt costs 4 in body
    0x1fdb5a0 _15 1 times unaligned_store (misalign -1) costs 12 in body
    0x1fddcb0 _15 1 times scalar_store costs 12 in body
    0x1fddcb0 _18 1 times scalar_store costs 12 in body
    0x1fddcb0 _21 1 times scalar_store costs 12 in body
    0x1fddcb0 _24 1 times scalar_store costs 12 in body
    t.C:28:1: note: Cost model analysis:
      Vector inside of basic block cost: 40
      Vector prologue cost: 0
      Vector epilogue cost: 0
      Scalar cost of basic block: 48
    t.C:28:1: note: Basic block will be vectorized using SLP

Now, fortunately, GCC 11 will improve on this [a bit] and we'll produce

    _Z4testR1ARKS_S2_:
    .LFB2:
            .cfi_startproc
            movdqu  (%rsi), %xmm0
            movdqu  (%rdi), %xmm1
            paddd   %xmm1, %xmm0
            movups  %xmm0, (%rdi)
            movd    %xmm0, %eax
            subl    (%rdx), %eax
            movl    %eax, (%rdi)
            pextrd  $1, %xmm0, %eax
            subl    4(%rdx), %eax
            movl    %eax, 4(%rdi)
            pextrd  $2, %xmm0, %eax
            subl    8(%rdx), %eax
            movl    %eax, 8(%rdi)
            pextrd  $3, %xmm0, %eax
            subl    12(%rdx), %eax
            movl    %eax, 12(%rdi)
            ret

which is not re-doing the scalar loads/adds but instead uses the vector
result.  Still, the same dependence issue is present:

    t.C:16:11: missed: can't determine dependence between y1_3(D)->b and x_2(D)->a
    t.C:16:11: note: removing SLP instance operations starting from: x_2(D)->a = _6;

The scalar code before vectorization looks like

    <bb 2> [local count: 1073741824]:
    _13 = x_2(D)->a;
    _14 = y1_3(D)->a;
    _15 = _13 + _14;
    x_2(D)->a = _15;
    _16 = x_2(D)->b;
    _17 = y1_3(D)->b;    <---
    _18 = _16 + _17;
    x_2(D)->b = _18;
    _19 = x_2(D)->c;
    _20 = y1_3(D)->c;
    _21 = _19 + _20;
    x_2(D)->c = _21;
    _22 = x_2(D)->d;
    _23 = y1_3(D)->d;
    _24 = _22 + _23;
    x_2(D)->d = _24;
    _5 = y2_4(D)->a;
    _6 = _15 - _5;
    x_2(D)->a = _6;    <---
    _7 = y2_4(D)->b;
    _8 = _18 - _7;
    x_2(D)->b = _8;
    _9 = y2_4(D)->c;
    _10 = _21 - _9;
    x_2(D)->c = _10;
    _11 = y2_4(D)->d;
    _12 = _24 - _11;
    x_2(D)->d = _12;
    return;

Using

    void test(A& __restrict x, A const& y1, A const& y2)
    {
        x += y1;
        x -= y2;
    }

produces optimal assembly even with GCC 10:

    _Z4testR1ARKS_S2_:
    .LFB2:
            .cfi_startproc
            movdqu  (%rsi), %xmm0
            movdqu  (%rdx), %xmm1
            movdqu  (%rdi), %xmm2
            psubd   %xmm1, %xmm0
            paddd   %xmm2, %xmm0
            movups  %xmm0, (%rdi)
            ret

Note that I think we should be able to handle the dependences even without
the __restrict annotation.
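For readers without the original attachment, the following is a minimal sketch of what the test case might look like. It is a reconstruction, not the reporter's code: the layout of `A` (four `int` members `a` to `d`) and the bodies of `operator+=` and `operator-=` are assumptions inferred from the GIMPLE dump and the mangled name `_Z4testR1ARKS_S2_`, and `test_restrict` and `same` are hypothetical names introduced only so both variants can sit in one file.

```cpp
#include <cassert>

// Assumed layout: four ints, matching the fields a..d seen in the GIMPLE dump.
struct A {
    int a, b, c, d;

    // Assumed element-wise operators; the real bodies are not in the message.
    A& operator+=(A const& r) { a += r.a; b += r.b; c += r.c; d += r.d; return *this; }
    A& operator-=(A const& r) { a -= r.a; b -= r.b; c -= r.c; d -= r.d; return *this; }
};

// Variant without the annotation: x, y1 and y2 may refer to the same object,
// so the compiler has to reason about the dependences itself.
void test(A& x, A const& y1, A const& y2) {
    x += y1;
    x -= y2;
}

// Variant with __restrict (a GNU extension also accepted by Clang): the caller
// promises that x is not accessed through y1 or y2 inside the function.
// Hypothetical name; in the message this variant is also called test.
void test_restrict(A& __restrict x, A const& y1, A const& y2) {
    x += y1;
    x -= y2;
}

// Hypothetical helper for comparing results field by field.
bool same(A const& l, A const& r) {
    return l.a == r.a && l.b == r.b && l.c == r.c && l.d == r.d;
}
```

With distinct objects both variants compute `x + y1 - y2`. If `y2` refers to the same object as `x`, the unannotated `test` leaves every field at zero, whereas a sequence that loads all three operands before storing (as in the `__restrict` assembly) would leave a copy of `y1`. That difference is presumably why the fully reordered code is only valid under the `__restrict` promise, even if a vectorized sequence that keeps the original load/store order would still be possible without it.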