public inbox for gcc-bugs@sourceware.org
From: "rdapp at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.
Date: Fri, 26 Jan 2024 09:50:52 +0000
Message-ID: <bug-113583-4-4ocaad3Zmt@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-113583-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583

--- Comment #9 from Robin Dapp <rdapp at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #6)

> t.c:47:21: missed:   the size of the group of accesses is not a power of 2 or not equal to 3
> t.c:47:21: missed:   not falling back to elementwise accesses
> t.c:58:15: missed:   not vectorized: relevant stmt not supported: _4 = *_3;
> t.c:47:21: missed:  bad operation or unsupported loop bound.
> 
> where we don't consider using gather because we have a known constant
> stride (20).  Since the stores are really scatters we don't attempt
> to SLP either.
> 
> Disabling the above heuristic we get this vectorized as well, avoiding
> gather/scatter by manually implementing them and using a quite high
> VF of 8 (with -mprefer-vector-width=256 you get VF 4 and likely
> faster code in the end).
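To make the quoted explanation concrete, here is a minimal sketch of the access pattern. It assumes an lbm-like grid in which each cell stores 20 consecutive doubles, so touching one entry per cell is a load and a store with constant stride 20. `N_CELL_ENTRIES`, `N_CELLS`, `scale_scalar` and `scale_elementwise` are hypothetical names, not the lbm source. The second function shows roughly what "manually implementing" gather/scatter elementwise amounts to for VF 4: four scalar loads build one vector, one vector operation runs, and four scalar stores follow. It uses the GCC/clang generic vector extension and is an illustration only, not what the vectorizer literally emits. With 8-byte doubles, a 256-bit vector holds 4 lanes and a 512-bit one holds 8. That is presumably where the VF 4 vs. VF 8 difference above comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout loosely modeled on lbm: each cell stores
   N_CELL_ENTRIES consecutive doubles.  */
#define N_CELL_ENTRIES 20
#define N_CELLS 10

/* GCC/clang generic vector extension: 4 doubles (VF = 4).  */
typedef double v4df __attribute__ ((vector_size (4 * sizeof (double))));

/* Scalar reference: touches entry K of every cell, i.e. a load and a
   store with a constant stride of N_CELL_ENTRIES elements.  */
void scale_scalar (double *dst, const double *src, size_t k, double f)
{
  assert (k < N_CELL_ENTRIES);
  for (size_t i = 0; i < N_CELLS; i++)
    dst[i * N_CELL_ENTRIES + k] = f * src[i * N_CELL_ENTRIES + k];
}

/* Hand-written elementwise version: four scalar loads build one vector
   (an emulated gather), one vector multiply, four scalar stores (an
   emulated scatter).  A scalar epilogue handles the leftover cells.  */
void scale_elementwise (double *dst, const double *src, size_t k, double f)
{
  assert (k < N_CELL_ENTRIES);
  const v4df fv = { f, f, f, f };
  size_t i = 0;
  for (; i + 4 <= N_CELLS; i += 4)
    {
      const double *s = src + i * N_CELL_ENTRIES + k;
      double *d = dst + i * N_CELL_ENTRIES + k;
      v4df v = { s[0], s[N_CELL_ENTRIES], s[2 * N_CELL_ENTRIES],
                 s[3 * N_CELL_ENTRIES] };
      v = v * fv;
      for (int lane = 0; lane < 4; lane++)
        d[lane * N_CELL_ENTRIES] = v[lane];
    }
  for (; i < N_CELLS; i++)
    dst[i * N_CELL_ENTRIES + k] = f * src[i * N_CELL_ENTRIES + k];
}
```

The per-lane scalar loads and stores are exactly the "elementwise accesses" whose cost the FIXME below says is underestimated: for each vector operation there are VF scalar memory operations plus lane inserts and extracts.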

I suppose you're referring to this?

  /* FIXME: At the moment the cost model seems to underestimate the
     cost of using elementwise accesses.  This check preserves the
     traditional behavior until that can be fixed.  */
  stmt_vec_info first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
  if (!first_stmt_info)
    first_stmt_info = stmt_info;
  if (*memory_access_type == VMAT_ELEMENTWISE
      && !STMT_VINFO_STRIDED_P (first_stmt_info)
      && !(stmt_info == DR_GROUP_FIRST_ELEMENT (stmt_info)
           && !DR_GROUP_NEXT_ELEMENT (stmt_info)
           && !pow2p_hwi (DR_GROUP_SIZE (stmt_info))))
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "not falling back to elementwise accesses\n");
      return false;
    }
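Spelled out as plain boolean logic, the check refuses the elementwise fallback unless one of two things holds: the group's first statement is strided (`STMT_VINFO_STRIDED_P`), or the access is a single-element group whose `DR_GROUP_SIZE` is not a power of 2. The following is a minimal standalone model of that logic. `struct access_model` and `refuses_elementwise` are hypothetical names; the boolean fields stand in for GCC's `stmt_vec_info` accessors, and `pow2p` mirrors what `pow2p_hwi` tests. A multi-element group of size 20 is refused, which presumably is the lbm case behind the "not falling back" message quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same test as GCC's pow2p_hwi for positive values: X is a power of 2.  */
static bool pow2p (uint64_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical flat model of the access being analyzed.  */
struct access_model
{
  bool elementwise;       /* *memory_access_type == VMAT_ELEMENTWISE */
  bool first_is_strided;  /* STMT_VINFO_STRIDED_P (first_stmt_info) */
  bool is_group_first;    /* stmt_info == DR_GROUP_FIRST_ELEMENT (stmt_info);
                             false when the access is not in a group */
  bool has_next;          /* DR_GROUP_NEXT_ELEMENT (stmt_info) != NULL */
  uint64_t group_size;    /* DR_GROUP_SIZE (stmt_info) */
};

/* Returns true when the quoted check would bail out with
   "not falling back to elementwise accesses".  */
static bool refuses_elementwise (const struct access_model *a)
{
  bool single_elt_non_pow2_group
    = a->is_group_first && !a->has_next && !pow2p (a->group_size);
  return a->elementwise && !a->first_is_strided && !single_elt_non_pow2_group;
}
```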


I did some more tests on my laptop.  As mentioned above, the whole loop in lbm
is larger and contains two ifs.  The first one prevents both clang and GCC from
vectorizing the loop; the second one

                if( TEST_FLAG_SWEEP( srcGrid, ACCEL )) {
                        ux = 0.005;
                        uy = 0.002;
                        uz = 0.000;
                }

seems to be if-converted by clang, or at least it does not inhibit vectorization.
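The second `if` only overwrites three per-iteration scalars, so it can be if-converted into selects. The result is straight-line code that a vectorizer can turn into masked blends. Below is a minimal sketch of the branchy form and the equivalent if-converted form. It assumes a hypothetical `ACCEL_FLAG` bit in a per-cell `flags` array in place of the real `TEST_FLAG_SWEEP( srcGrid, ACCEL )` test, plus a made-up output expression. It illustrates the transformation only, not what clang actually emits.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for ACCEL in TEST_FLAG_SWEEP.  */
#define ACCEL_FLAG 0x4u

/* Branchy form mirroring the lbm snippet: per-iteration locals ux/uy/uz
   are overridden for cells that carry the flag.  */
void velocity_branch (size_t n, const unsigned *flags, const double *u,
                      double *out)
{
  for (size_t i = 0; i < n; i++)
    {
      double ux = u[3 * i], uy = u[3 * i + 1], uz = u[3 * i + 2];
      if (flags[i] & ACCEL_FLAG)
        {
          ux = 0.005;
          uy = 0.002;
          uz = 0.000;
        }
      out[i] = ux * ux + uy * uy + uz * uz;
    }
}

/* If-converted form: no control flow inside the body; each local is a
   select on the flag, which a vectorizer can map to a masked blend.  */
void velocity_select (size_t n, const unsigned *flags, const double *u,
                      double *out)
{
  for (size_t i = 0; i < n; i++)
    {
      bool accel = (flags[i] & ACCEL_FLAG) != 0;
      double ux = accel ? 0.005 : u[3 * i];
      double uy = accel ? 0.002 : u[3 * i + 1];
      double uz = accel ? 0.000 : u[3 * i + 2];
      out[i] = ux * ux + uy * uy + uz * uz;
    }
}
```

Because only locals are assigned, no conditional stores are involved, which is why this `if` is the easy one to convert.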

Now, if I comment out the first, larger if, clang does vectorize the loop.  With
the return false in the above GCC snippet commented out, GCC also vectorizes,
but only when both ifs are commented out.

Results (with both ifs commented out, -march=native, which results in AVX2
here; best of 3 runs, as lbm is notoriously fickle):

gcc trunk vanilla: 156.04s
gcc trunk with elementwise: 132.10s
clang 17: 143.06s
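For reference, relative to vanilla trunk the timings work out as follows (runtime reduction, then speedup):

```latex
\begin{aligned}
\text{GCC + elementwise:}\quad & \frac{156.04 - 132.10}{156.04} \approx 15.3\%, \qquad \frac{156.04}{132.10} \approx 1.18 \\
\text{clang 17:}\quad & \frac{156.04 - 143.06}{156.04} \approx 8.3\%, \qquad \frac{156.04}{143.06} \approx 1.09
\end{aligned}
```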

Of course, the comment itself already says that the costing is difficult, and
removing the check will surely cause regressions elsewhere.  However, the ~15%
runtime reduction with vectorization (or ~8% for clang) IMHO shows that it is
worth looking into this further.  On top of that, clang for riscv does not seem
to care about the first if either and still vectorizes the loop.  I haven't
looked more closely at what happens there, though.


Thread overview: 21+ messages
2024-01-24 14:21 [Bug tree-optimization/113583] New: " rdapp at gcc dot gnu.org
2024-01-24 14:42 ` [Bug tree-optimization/113583] " juzhe.zhong at rivai dot ai
2024-01-24 14:44 ` rdapp at gcc dot gnu.org
2024-01-24 15:00 ` juzhe.zhong at rivai dot ai
2024-01-25  3:06 ` juzhe.zhong at rivai dot ai
2024-01-25  3:13 ` juzhe.zhong at rivai dot ai
2024-01-25  5:41 ` pinskia at gcc dot gnu.org
2024-01-25  9:05 ` rguenther at suse dot de
2024-01-25  9:16 ` juzhe.zhong at rivai dot ai
2024-01-25  9:34 ` rguenth at gcc dot gnu.org
2024-01-26  9:50 ` rdapp at gcc dot gnu.org [this message]
2024-01-26 10:21 ` rguenther at suse dot de
2024-02-05  6:59 ` juzhe.zhong at rivai dot ai
2024-02-07  3:39 ` juzhe.zhong at rivai dot ai
2024-02-07  7:48 ` juzhe.zhong at rivai dot ai
2024-02-07  8:04 ` rguenther at suse dot de
2024-02-07  8:08 ` juzhe.zhong at rivai dot ai
2024-02-07  8:13 ` juzhe.zhong at rivai dot ai
2024-02-07 10:24 ` rguenther at suse dot de
2024-05-13 14:17 ` rdapp at gcc dot gnu.org
2024-05-16 12:41 ` rguenth at gcc dot gnu.org
