[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

From: "hubicka at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake
Date: Wed, 31 May 2023 16:11:19 +0000	[thread overview]
Message-ID: <bug-109812-4-yumD54N8hU@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-109812-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenther at suse dot de
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=110062

--- Comment #13 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
The only difference between slp vectorization is:

-  # _68 = PHI <_5(3)>
-  # _67 = PHI <_11(3)>
-  # _66 = PHI <_16(3)>
-  <retval>.r = _68;
-  <retval>.g = _67;
-  <retval>.b = _66;
+  # _70 = PHI <_5(3)>
+  # _69 = PHI <_11(3)>
+  # _68 = PHI <_16(3)>
+  <retval>.r = _70;
+  <retval>.g = _69;
+  <retval>.b = _68;
+  <retval>.o = r$o_33(D);

so SRA invents r$o_33(D) even if that variable is undefined.

SLP vectorizer then sees it as interleaving stores:

-t.c:19:16: note:       _1 = rgbs[i_35].r;
-t.c:19:16: note:       _7 = rgbs[i_35].g;
-t.c:19:16: note:       _12 = rgbs[i_35].b;
-t.c:19:16: note:   Detected interleaving store of size 3
-t.c:19:16: note:       <retval>.r = _68;
-t.c:19:16: note:       <retval>.g = _67;
-t.c:19:16: note:       <retval>.b = _66;
+t.c:19:16: note:       _1 = rgbs[i_37].r;
+t.c:19:16: note:       _7 = rgbs[i_37].g;
+t.c:19:16: note:       _12 = rgbs[i_37].b;
+t.c:19:16: note:   Detected interleaving store of size 4
+t.c:19:16: note:       <retval>.r = _70;
+t.c:19:16: note:       <retval>.g = _69;
+t.c:19:16: note:       <retval>.b = _68;
+t.c:19:16: note:       <retval>.o = r$o_33(D);

For first case it first tries to vectorize for vector of 3 doubles and fails:

-t.c:19:16: note:     <retval>.r = _68;
-t.c:19:16: note:     <retval>.g = _67;
-t.c:19:16: note:     <retval>.b = _66;
-t.c:19:16: note:   starting SLP discovery for node 0x2cb4fe8
-t.c:19:16: note:   Build SLP for <retval>.r = _68;
-t.c:19:16: note:   get vectype for scalar type (group size 3): double
-t.c:19:16: note:   vectype: vector(2) double
-t.c:19:16: note:   nunits = 2
-t.c:19:16: missed:   Build SLP failed: unrolling required in basic block SLP
-t.c:19:16: note:   Build SLP for <retval>.g = _67;
-t.c:19:16: note:   get vectype for scalar type (group size 3): double
-t.c:19:16: note:   vectype: vector(2) double
-t.c:19:16: note:   nunits = 2
-t.c:19:16: missed:   Build SLP failed: unrolling required in basic block SLP
-t.c:19:16: note:   Build SLP for <retval>.b = _66;
-t.c:19:16: note:   get vectype for scalar type (group size 3): double
-t.c:19:16: note:   vectype: vector(2) double
-t.c:19:16: note:   nunits = 2
-t.c:19:16: missed:   Build SLP failed: unrolling required in basic block SLP
-t.c:19:16: note:   SLP discovery for node 0x2cb4fe8 failed

And later it tries to vectorize first 2 items:

-t.c:19:16: note:   Splitting SLP group at stmt 2
-t.c:19:16: note:   Split group into 2 and 1
-t.c:19:16: note:   Starting SLP discovery for
-t.c:19:16: note:     <retval>.r = _68;
-t.c:19:16: note:     <retval>.g = _67;
-t.c:19:16

... and after a lot of blablabla succeeds.

If opaque field is present we start with vector of size 4:
+t.c:19:16: note:     <retval>.r = _70;
+t.c:19:16: note:     <retval>.g = _69;
+t.c:19:16: note:     <retval>.b = _68;
+t.c:19:16: note:     <retval>.o = r$o_33(D);


+t.c:19:16: note:   vect_is_simple_use: operand _70 = PHI <_5(3)>, type of def:
internal
+t.c:19:16: note:   vect_is_simple_use: operand _69 = PHI <_11(3)>, type of
def: internal
+t.c:19:16: note:   vect_is_simple_use: operand _68 = PHI <_16(3)>, type of
def: internal
+t.c:19:16: note:   vect_is_simple_use: operand r$o_33(D), type of def:
external
+t.c:19:16: missed:   treating operand as external
+t.c:19:16: note:   SLP discovery for node 0x2e80058 succeeded
+t.c:19:16: note:   SLP size 1 vs. limit 23.
+t.c:19:16: note:   Final SLP tree for instance 0x2def840:
+t.c:19:16: note:   node 0x2e80058 (max_nunits=4, refcnt=2) vector(4) double
+t.c:19:16: note:   op template: <retval>.r = _70;
+t.c:19:16: note:       stmt 0 <retval>.r = _70;
+t.c:19:16: note:       stmt 1 <retval>.g = _69;
+t.c:19:16: note:       stmt 2 <retval>.b = _68;
+t.c:19:16: note:       stmt 3 <retval>.o = r$o_33(D);
+t.c:19:16: note:       children 0x2e800d8
+t.c:19:16: note:   node (external) 0x2e800d8 (max_nunits=1, refcnt=1)
+t.c:19:16: note:       { _70, _69, _68, r$o_33(D) }

So it seems to succeed vectorizing with 4 entries but it does so for the single
return statement:

  <bb 3> [local count: 1063004409]:
  # i_37 = PHI <i_22(5), 0(2)>
  # r$r_40 = PHI <_5(5), r$r_25(D)(2)>
  # r$g_42 = PHI <_11(5), r$g_26(D)(2)>
  # r$b_44 = PHI <_16(5), r$b_27(D)(2)>
  # ivtmp_67 = PHI <ivtmp_66(5), 10000000(2)>
  _1 = rgbs[i_37].r;
  _2 = (int) _1;
  _3 = (double) _2;
  _4 = _3 * w_21(D);
  _5 = _4 + r$r_40;
  _7 = rgbs[i_37].g;
  _8 = (int) _7;
  _9 = (double) _8;
  _10 = _9 * w_21(D);
  _11 = _10 + r$g_42;
  _12 = rgbs[i_37].b;
  _13 = (int) _12;
  _14 = (double) _13;
  _15 = _14 * w_21(D);
  _16 = _15 + r$b_44;
  i_22 = i_37 + 1;
  ivtmp_66 = ivtmp_67 - 1;
  if (ivtmp_66 != 0)
    goto <bb 5>; [99.00%]
  else
    goto <bb 4>; [1.00%]

  <bb 5> [local count: 1052374367]:
  goto <bb 3>; [100.00%]

  <bb 4> [local count: 10737416]:
  # _70 = PHI <_5(3)>
  # _69 = PHI <_11(3)>
  # _68 = PHI <_16(3)>
  _65 = {_70, _69, _68, r$o_33(D)};
  MEM <vector(4) double> [(double *)&<retval>] = _65;

that seems somewhat pointless.
If one adds code initializing opacity field then vectorization works well. So
perhaps SLP vectorizer needs to be told how to deal with uninitialized
variabels that may be common in code like this after SRA?

Richi, it is not clear to me where SLP vectorizer discards the idea of
vectorizing the loop body in this case. But I think one needs to address:
+t.c:19:16: missed:   treating operand as external

I wonder if the loop would work faster it it used vectors of size 4 with the
last field unused.

next prev parent reply	other threads:[~2023-05-31 16:11 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-11 14:25 [Bug tree-optimization/109812] New: GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 aros at gmx dot com
2023-05-11 14:26 ` [Bug tree-optimization/109812] " aros at gmx dot com
2023-05-11 15:20 ` [Bug target/109812] " pinskia at gcc dot gnu.org
2023-05-11 15:50 ` aros at gmx dot com
2023-05-12  8:47 ` aros at gmx dot com
2023-05-16 22:43 ` juzhe.zhong at rivai dot ai
2023-05-17  0:08 ` sjames at gcc dot gnu.org
2023-05-28 16:46 ` hubicka at gcc dot gnu.org
2023-05-28 17:29 ` [Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake hubicka at gcc dot gnu.org
2023-05-28 17:39 ` hubicka at gcc dot gnu.org
2023-05-28 18:11 ` hubicka at gcc dot gnu.org
2023-05-28 18:50 ` hubicka at gcc dot gnu.org
2023-05-30  0:05 ` zhangjungcc at gmail dot com
2023-05-31 12:42 ` hubicka at ucw dot cz
2023-05-31 16:11 ` hubicka at gcc dot gnu.org [this message]
2023-05-31 16:52 ` jamborm at gcc dot gnu.org
2023-06-01  9:38 ` jamborm at gcc dot gnu.org
2023-06-01 11:19 ` jakub at gcc dot gnu.org
2023-06-01 12:28 ` hubicka at gcc dot gnu.org
2023-06-21  9:46 ` ubizjak at gmail dot com
2023-10-12  4:48 ` cvs-commit at gcc dot gnu.org
2023-11-24 23:38 ` hubicka at gcc dot gnu.org
2023-11-25 10:21 ` liuhongt at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-109812-4-yumD54N8hU@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).