public inbox for gcc-bugs@sourceware.org
From: "hubicka at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake
Date: Wed, 31 May 2023 16:11:19 +0000
Message-ID: <bug-109812-4-yumD54N8hU@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-109812-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenther at suse dot de
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=110062

--- Comment #13 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
The only difference going into SLP vectorization is:

-  # _68 = PHI <_5(3)>
-  # _67 = PHI <_11(3)>
-  # _66 = PHI <_16(3)>
-  <retval>.r = _68;
-  <retval>.g = _67;
-  <retval>.b = _66;
+  # _70 = PHI <_5(3)>
+  # _69 = PHI <_11(3)>
+  # _68 = PHI <_16(3)>
+  <retval>.r = _70;
+  <retval>.g = _69;
+  <retval>.b = _68;
+  <retval>.o = r$o_33(D);

So SRA invents r$o_33(D) even though that variable is undefined.

The SLP vectorizer then sees an interleaving store of size 4 instead of size 3:

-t.c:19:16: note: _1 = rgbs[i_35].r;
-t.c:19:16: note: _7 = rgbs[i_35].g;
-t.c:19:16: note: _12 = rgbs[i_35].b;
-t.c:19:16: note: Detected interleaving store of size 3
-t.c:19:16: note: <retval>.r = _68;
-t.c:19:16: note: <retval>.g = _67;
-t.c:19:16: note: <retval>.b = _66;
+t.c:19:16: note: _1 = rgbs[i_37].r;
+t.c:19:16: note: _7 = rgbs[i_37].g;
+t.c:19:16: note: _12 = rgbs[i_37].b;
+t.c:19:16: note: Detected interleaving store of size 4
+t.c:19:16: note: <retval>.r = _70;
+t.c:19:16: note: <retval>.g = _69;
+t.c:19:16: note: <retval>.b = _68;
+t.c:19:16: note: <retval>.o = r$o_33(D);

In the first case it first tries to vectorize the group of 3 doubles and fails:

-t.c:19:16: note: <retval>.r = _68;
-t.c:19:16: note: <retval>.g = _67;
-t.c:19:16: note: <retval>.b = _66;
-t.c:19:16: note: starting SLP discovery for node 0x2cb4fe8
-t.c:19:16: note: Build SLP for <retval>.r = _68;
-t.c:19:16: note: get vectype for scalar type (group size 3): double
-t.c:19:16: note: vectype: vector(2) double
-t.c:19:16: note: nunits = 2
-t.c:19:16: missed: Build SLP failed: unrolling required in basic block SLP
-t.c:19:16: note: Build SLP for <retval>.g = _67;
-t.c:19:16: note: get vectype for scalar type (group size 3): double
-t.c:19:16: note: vectype: vector(2) double
-t.c:19:16: note: nunits = 2
-t.c:19:16: missed: Build SLP failed: unrolling required in basic block SLP
-t.c:19:16: note: Build SLP for <retval>.b = _66;
-t.c:19:16: note: get vectype for scalar type (group size 3): double
-t.c:19:16: note: vectype: vector(2) double
-t.c:19:16: note: nunits = 2
-t.c:19:16: missed: Build SLP failed: unrolling required in basic block SLP
-t.c:19:16: note: SLP discovery for node 0x2cb4fe8 failed

Later it tries to vectorize the first 2 items:

-t.c:19:16: note: Splitting SLP group at stmt 2
-t.c:19:16: note: Split group into 2 and 1
-t.c:19:16: note: Starting SLP discovery for
-t.c:19:16: note: <retval>.r = _68;
-t.c:19:16: note: <retval>.g = _67;
-t.c:19:16 ...

and after a lot of blablabla succeeds.

If the opacity field is present, we start with a vector of size 4:

+t.c:19:16: note: <retval>.r = _70;
+t.c:19:16: note: <retval>.g = _69;
+t.c:19:16: note: <retval>.b = _68;
+t.c:19:16: note: <retval>.o = r$o_33(D);
+t.c:19:16: note: vect_is_simple_use: operand _70 = PHI <_5(3)>, type of def: internal
+t.c:19:16: note: vect_is_simple_use: operand _69 = PHI <_11(3)>, type of def: internal
+t.c:19:16: note: vect_is_simple_use: operand _68 = PHI <_16(3)>, type of def: internal
+t.c:19:16: note: vect_is_simple_use: operand r$o_33(D), type of def: external
+t.c:19:16: missed: treating operand as external
+t.c:19:16: note: SLP discovery for node 0x2e80058 succeeded
+t.c:19:16: note: SLP size 1 vs. limit 23.
+t.c:19:16: note: Final SLP tree for instance 0x2def840:
+t.c:19:16: note: node 0x2e80058 (max_nunits=4, refcnt=2) vector(4) double
+t.c:19:16: note: op template: <retval>.r = _70;
+t.c:19:16: note: stmt 0 <retval>.r = _70;
+t.c:19:16: note: stmt 1 <retval>.g = _69;
+t.c:19:16: note: stmt 2 <retval>.b = _68;
+t.c:19:16: note: stmt 3 <retval>.o = r$o_33(D);
+t.c:19:16: note: children 0x2e800d8
+t.c:19:16: note: node (external) 0x2e800d8 (max_nunits=1, refcnt=1)
+t.c:19:16: note: { _70, _69, _68, r$o_33(D) }

So it seems to succeed in vectorizing with 4 entries, but it does so only for
the single return statement:

  <bb 3> [local count: 1063004409]:
  # i_37 = PHI <i_22(5), 0(2)>
  # r$r_40 = PHI <_5(5), r$r_25(D)(2)>
  # r$g_42 = PHI <_11(5), r$g_26(D)(2)>
  # r$b_44 = PHI <_16(5), r$b_27(D)(2)>
  # ivtmp_67 = PHI <ivtmp_66(5), 10000000(2)>
  _1 = rgbs[i_37].r;
  _2 = (int) _1;
  _3 = (double) _2;
  _4 = _3 * w_21(D);
  _5 = _4 + r$r_40;
  _7 = rgbs[i_37].g;
  _8 = (int) _7;
  _9 = (double) _8;
  _10 = _9 * w_21(D);
  _11 = _10 + r$g_42;
  _12 = rgbs[i_37].b;
  _13 = (int) _12;
  _14 = (double) _13;
  _15 = _14 * w_21(D);
  _16 = _15 + r$b_44;
  i_22 = i_37 + 1;
  ivtmp_66 = ivtmp_67 - 1;
  if (ivtmp_66 != 0)
    goto <bb 5>; [99.00%]
  else
    goto <bb 4>; [1.00%]

  <bb 5> [local count: 1052374367]:
  goto <bb 3>; [100.00%]

  <bb 4> [local count: 10737416]:
  # _70 = PHI <_5(3)>
  # _69 = PHI <_11(3)>
  # _68 = PHI <_16(3)>
  _65 = {_70, _69, _68, r$o_33(D)};
  MEM <vector(4) double> [(double *)&<retval>] = _65;

That seems somewhat pointless. If one adds code initializing the opacity field,
vectorization works well. So perhaps the SLP vectorizer needs to be told how to
deal with uninitialized variables, which may be common in code like this after
SRA?

Richi, it is not clear to me where the SLP vectorizer discards the idea of
vectorizing the loop body in this case. But I think one needs to address:

+t.c:19:16: missed: treating operand as external

I wonder if the loop would run faster if it used vectors of size 4 with the
last field unused.
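
For readers without the test case at hand, here is a rough C sketch of the kind of loop the dump above appears to come from, with the opacity field initialized as suggested in the comment. It is not the actual t.c: the type names (struct rgb, struct drgb), the unsigned char field width, the function name and its pointer/length parameters are all assumptions reconstructed from the GIMPLE. Unlike the dump, where r$r_25(D), r$g_26(D) and r$b_27(D) are also undefined on loop entry, the accumulators here start at zero so that the behavior is fully defined.

```c
#include <stddef.h>

/* Hypothetical layouts: the dump only shows that r, g, b are read from
   rgbs[] and that r, g, b, o are stored to <retval> as doubles.  The
   exact source types are guesses. */
struct rgb  { unsigned char r, g, b; };
struct drgb { double r, g, b, o; };

/* Weighted sum of the three channels.  The explicit 'acc.o = 0.0' is the
   initialization the comment refers to; without it, 'o' would have no
   definition in the function, which is the situation that shows up as
   r$o_33(D) in the dump. */
struct drgb weighted_sum(const struct rgb *px, size_t n, double w)
{
    struct drgb acc;
    acc.r = 0.0;
    acc.g = 0.0;
    acc.b = 0.0;
    acc.o = 0.0;  /* opacity field given a defined value */

    for (size_t i = 0; i < n; i++) {
        /* Mirrors the (int) then (double) conversions seen in <bb 3>. */
        acc.r += (double)(int)px[i].r * w;
        acc.g += (double)(int)px[i].g * w;
        acc.b += (double)(int)px[i].b * w;
    }
    return acc;
}
```

Whether a given GCC version vectorizes this sketch the same way as the original test case has not been checked here; it is only meant to make the shape of the loop and of the four-field return value concrete.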