public inbox for gcc-bugs@sourceware.org
From: "rdapp at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/113583] New: Main loop in 519.lbm not vectorized.
Date: Wed, 24 Jan 2024 14:21:14 +0000
Message-ID: <bug-113583-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583

            Bug ID: 113583
           Summary: Main loop in 519.lbm not vectorized.
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rdapp at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-* riscv*-*-*

This might be a known issue but a bugzilla search regarding lbm didn't show
anything related.

GCC does not vectorize the main loop of SPEC2017 519.lbm on riscv, while clang
does.  On x86, neither clang nor GCC seems to vectorize it.

The following example is not entirely minimal, but let's start somewhere.
Unlike the full loop, this one is vectorized by clang-17 on x86, but not by
GCC trunk on x86 or on the other targets I checked.

#define CST1 (1.0 / 3.0)

typedef enum
{
  C = 0,
  N, S, E, W, T, B, NW,
  NE, A, BB, CC, D, EE, FF, GG,
  HH, II, JJ, FLAGS, NN
} CELL_ENTRIES;

#define SX 100
#define SY 100
#define SZ 130

#define CALC_INDEX(x, y, z, e) ((e) + NN * ((x) + (y) * SX + (z) * SX * SY))

#define GRID_ENTRY_SWEEP(g, dx, dy, dz, e) \
  ((g)[CALC_INDEX (dx, dy, dz, e) + (i)])

#define LOCAL(g, e) (GRID_ENTRY_SWEEP (g, 0, 0, 0, e))
#define NEIGHBOR_C(g, e) (GRID_ENTRY_SWEEP (g, 0, 0, 0, e))
#define NEIGHBOR_S(g, e) (GRID_ENTRY_SWEEP (g, 0, -1, 0, e))
#define NEIGHBOR_N(g, e) (GRID_ENTRY_SWEEP (g, 0, +1, 0, e))
#define NEIGHBOR_E(g, e) (GRID_ENTRY_SWEEP (g, +1, 0, 0, e))

#define SRC_C(g) (LOCAL (g, C))
#define SRC_N(g) (LOCAL (g, N))
#define SRC_S(g) (LOCAL (g, S))
#define SRC_E(g) (LOCAL (g, E))
#define SRC_W(g) (LOCAL (g, W))

#define DST_C(g) (NEIGHBOR_C (g, C))
#define DST_N(g) (NEIGHBOR_N (g, N))
#define DST_S(g) (NEIGHBOR_S (g, S))
#define DST_E(g) (NEIGHBOR_E (g, E))

typedef double arr[SX * SY * SZ * NN];

#define OMEGA 0.123

void
foo (arr src, arr dst)
{
  double ux, uy, u2;
  const double lambda0 = 1.0 / (0.5 + 3.0 / (16.0 * (1.0 / OMEGA - 0.5)));
  double fs[NN], fa[NN], feqs[NN], feqa[NN];

  for (int i = 0; i < SX * SY * SZ * NN; i += NN)
    {
      ux = 1.0;
      uy = 1.0;

      feqs[C] = CST1 * (1.0);
      feqs[N] = feqs[S] = CST1 * (1.0 + 4.5 * (+uy) * (+uy));

      feqa[C] = 0.0;
      feqa[N] = 0.2;

      fs[C] = SRC_C (src);
      fs[N] = fs[S] = 0.5 * (SRC_N (src) + SRC_S (src));

      fa[C] = 0.0;
      fa[N] = 0.1;

      DST_C (dst) = SRC_C (src) - OMEGA * (fs[C] - feqs[C]);
      DST_N (dst)
        = SRC_N (src) - OMEGA * (fs[N] - feqs[N])
          - lambda0 * (fa[N] - feqa[N]);
    }
}
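
To make the access pattern easier to see, here is a hand-expanded sketch of
the loop above, with the macros substituted and the loop-invariant values
folded.  It is not compiler output.  The name foo_expanded and the n_cells
parameter are assumptions added so that the sketch can run on small buffers.
With NN = 20 and SX = 100, DST_N (dst) expands to dst[i + 1 + NN * SX], so
each iteration reads three adjacent doubles out of a 20-double cell and
stores to two locations that are 2001 doubles apart.

```c
/* Hand-expanded sketch of foo () from the report; not compiler output.  */
enum { NN = 20, SX = 100 };   /* values taken from the report */
#define OMEGA 0.123
#define CST1 (1.0 / 3.0)

/* n_cells is an assumed parameter (the report iterates over the whole grid).
   src must hold at least n_cells * NN doubles and dst at least
   (n_cells - 1) * NN + NN * SX + 2 doubles.  */
void
foo_expanded (const double *src, double *dst, int n_cells)
{
  const double lambda0 = 1.0 / (0.5 + 3.0 / (16.0 * (1.0 / OMEGA - 0.5)));
  const double feqs_n = CST1 * (1.0 + 4.5);   /* uy == 1.0 */
  const double fa_n = 0.1, feqa_n = 0.2;

  for (int i = 0; i < n_cells * NN; i += NN)
    {
      /* SRC_N and SRC_S are entries 1 and 2 of the current cell.  */
      double fs_n = 0.5 * (src[i + 1] + src[i + 2]);

      /* DST_C: entry 0 of the current cell.  */
      dst[i] = src[i] - OMEGA * (src[i] - CST1);

      /* DST_N: entry 1 of the cell one row (SX cells) further on.  */
      dst[i + 1 + NN * SX]
        = src[i + 1] - OMEGA * (fs_n - feqs_n) - lambda0 * (fa_n - feqa_n);
    }
}
```

Both the loads and the stores advance by NN = 20 doubles per iteration, so
the vectorizer sees strided groups rather than contiguous accesses.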



missed.c:19:2: note:   ==> examining statement: _4 = *_3;
missed.c:19:2: missed:   no array mode for V8DF[20]
missed.c:19:2: missed:   no array mode for V8DF[20]
missed.c:19:2: missed:   the size of the group of accesses is not a power of 2 or not equal to 3
missed.c:19:2: missed:   not falling back to elementwise accesses
missed.c:43:11: missed:   not vectorized: relevant stmt not supported: _4 = *_3;
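
The group size of 20 in the dump presumably corresponds to NN = 20, the
number of doubles per cell.  The condition named in the third message can be
restated as a small predicate.  This is only a sketch of the wording of the
dump, not GCC's implementation, and the helper name is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper restating the dump message "the size of the group of
   accesses is not a power of 2 or not equal to 3".  Not GCC code.  */
bool
group_size_ok_for_permutes (unsigned count)
{
  bool power_of_two = count != 0 && (count & (count - 1)) == 0;
  return count == 3 || power_of_two;
}
```

For count = 20 the predicate is false, which matches the rejection reported
above; group sizes such as 3, 4 or 16 would pass this particular check.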


Also refer to https://godbolt.org/z/P517qc3Yf for riscv and
https://godbolt.org/z/M134KvEEo for aarch64.  For aarch64 it seems clang would
vectorize the snippet but does not consider it profitable to do so.

For riscv and the full lbm workload, the clang build executes roughly one
third as many dynamic instructions under qemu as the GCC build: 340 billion
vs 1200 billion.

