public inbox for gcc-bugs@sourceware.org
* [Bug c/113358] New: OpenMP inhibits vectorization
@ 2024-01-12 17:23 thomas.koopman at ru dot nl
  2024-01-12 17:41 ` [Bug tree-optimization/113358] " pinskia at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: thomas.koopman at ru dot nl @ 2024-01-12 17:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113358

            Bug ID: 113358
           Summary: OpenMP inhibits vectorization
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: thomas.koopman at ru dot nl
  Target Milestone: ---

Created attachment 57054
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57054&action=edit
The three different versions as well as preprocessed output.

I attached three programs that compute an array accel of points in R^3 from a
3d array of positions in R^3. accel[i] is sum_j ||positions[i] -
positions[j]||^2. 

These are seq.c, the most basic version; omp.c, which parallelises the outer
loop with OpenMP; and block.c, which uses the blocking optimisation. The first
version vectorizes as expected, but the other two do not: objdump -d omp.o |
grep ymm prints nothing.

They are compiled with

gcc -c seq.c -Ofast -mavx2 -mfma -save-temps -Wall -Wextra -o seq.o -lm

gcc -c omp.c -Ofast -mavx2 -mfma -save-temps -Wall -Wextra -o omp.o -lm -fopenmp

gcc -c block.c -Ofast -mavx2 -mfma -save-temps -Wall -Wextra -o block.o -lm

and gcc -v gives the following:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/home/thomas/.local/libexec/gcc/x86_64-pc-linux-gnu/13.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ./configure --prefix=/home/thomas/.local
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.0 (GCC)


* [Bug tree-optimization/113358] OpenMP inhibits vectorization
  2024-01-12 17:23 [Bug c/113358] New: OpenMP inhibits vectorization thomas.koopman at ru dot nl
@ 2024-01-12 17:41 ` pinskia at gcc dot gnu.org
  2024-01-12 17:44 ` pinskia at gcc dot gnu.org
  2024-01-15  8:16 ` rguenth at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-01-12 17:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113358

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 57055
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57055&action=edit
testcase that does vectorize with openmp though

What is happening is that loop invariant motion does not happen in the OpenMP
version; I have not looked into exactly why. But you can do it manually for
the loop and get a vectorized version of the loop.
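
A minimal sketch of what hoisting by hand could look like. The attachment is
not reproduced in this thread, so the struct layout and the names Points and
accel_hoisted are assumptions made for illustration only. The pointer loads
and the i-th coordinates are copied into locals before the inner loop, and the
sum is kept in a local instead of being written to accel[i] on every iteration.

```c
/* Hypothetical layout; the real testcase may differ. */
typedef struct {
    double *x;
    double *y;
    double *z;
} Points;

/* accel[i] = sum_j ||p[i] - p[j]||^2, outer loop parallelised with OpenMP. */
void accel_hoisted(const Points *pos, double *accel, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        /* Loop-invariant loads hoisted by hand out of the j loop. */
        const double *px = pos->x;
        const double *py = pos->y;
        const double *pz = pos->z;
        const double xi = px[i], yi = py[i], zi = pz[i];

        double sum = 0.0;  /* local accumulator, stored once per i */
        for (int j = 0; j < n; j++) {
            const double dx = xi - px[j];
            const double dy = yi - py[j];
            const double dz = zi - pz[j];
            sum += dx * dx + dy * dy + dz * dz;
        }
        accel[i] = sum;
    }
}
```

Whether a given GCC version vectorizes this form has to be checked, for
example by compiling with -fopenmp and -fopt-info-vec-missed.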


* [Bug tree-optimization/113358] OpenMP inhibits vectorization
  2024-01-12 17:23 [Bug c/113358] New: OpenMP inhibits vectorization thomas.koopman at ru dot nl
  2024-01-12 17:41 ` [Bug tree-optimization/113358] " pinskia at gcc dot gnu.org
@ 2024-01-12 17:44 ` pinskia at gcc dot gnu.org
  2024-01-15  8:16 ` rguenth at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-01-12 17:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113358

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-01-12
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=49761

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
One thing I should note is __restrict__ inside a struct is almost definitely
not going to help here. See PR 49761 for that.
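
One possible way around this, sketched under assumptions: put the restrict
qualifier on function parameters rather than on struct members, and unpack the
struct at the call site. The names Positions, accel_kernel and
accel_from_struct are hypothetical and do not come from the attachment.

```c
/* Hypothetical layout; no restrict on the members. */
typedef struct {
    double *x;
    double *y;
    double *z;
} Positions;

/* restrict on the parameters tells the compiler that the output array
   does not overlap the input arrays inside this function. */
static void accel_kernel(double *restrict accel,
                         const double *restrict px,
                         const double *restrict py,
                         const double *restrict pz,
                         int n)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = 0; j < n; j++) {
            const double dx = px[i] - px[j];
            const double dy = py[i] - py[j];
            const double dz = pz[i] - pz[j];
            sum += dx * dx + dy * dy + dz * dz;
        }
        accel[i] = sum;
    }
}

/* Unpack the struct once and pass plain pointers to the kernel. */
void accel_from_struct(const Positions *pos, double *accel, int n)
{
    accel_kernel(accel, pos->x, pos->y, pos->z, n);
}
```

The caller has to guarantee that accel really does not overlap the position
arrays; otherwise the behaviour is undefined.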


* [Bug tree-optimization/113358] OpenMP inhibits vectorization
  2024-01-12 17:23 [Bug c/113358] New: OpenMP inhibits vectorization thomas.koopman at ru dot nl
  2024-01-12 17:41 ` [Bug tree-optimization/113358] " pinskia at gcc dot gnu.org
  2024-01-12 17:44 ` pinskia at gcc dot gnu.org
@ 2024-01-15  8:16 ` rguenth at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-15  8:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113358

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue with block.c is

Analyzing loop at block.c:22
block.c:22:39: note:  === analyze_loop_nest ===
block.c:22:39: note:   === vect_analyze_loop_form ===
block.c:22:39: note:    === get_loop_niters ===
block.c:22:39: missed:   not vectorized: number of iterations cannot be computed.
block.c:22:39: missed:  bad loop form.
block.c:22:39: missed: couldn't vectorize loop

we fail to compute an expression for the number of scalar iterations in the
innermost loop.  That's because we have 'j < J + BLOCK && j < n' as
the terminating condition.  I suspect that the blocking should peel the
case where J + BLOCK > n, basically

      if (J + BLOCK > n || I + BLOCK > n)
        {
          ... blocking nest with < n exit condition
        }
      else
        {
          ... blocking nest with < {J,I} + BLOCK exit condition
        }
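
A runnable sketch of that peeling, with several assumptions: block.c is not
shown in this thread, so the kernel is reduced to one coordinate, BLOCK is set
to a small illustrative value, and the name accel_blocked_peeled is
hypothetical. Unlike the outline above, the partial-block branch keeps the
clamped two-part exit conditions so that blocks cut off at n are not counted
twice.

```c
#define BLOCK 4  /* illustrative value only */

/* 1-D simplification: accel[i] = sum_j (x[i] - x[j])^2, block by block. */
void accel_blocked_peeled(const double *x, double *accel, int n)
{
    for (int i = 0; i < n; i++)
        accel[i] = 0.0;

    for (int I = 0; I < n; I += BLOCK) {
        for (int J = 0; J < n; J += BLOCK) {
            if (J + BLOCK > n || I + BLOCK > n) {
                /* Partial block: keep the clamped exit conditions. */
                for (int i = I; i < I + BLOCK && i < n; i++) {
                    for (int j = J; j < J + BLOCK && j < n; j++) {
                        const double d = x[i] - x[j];
                        accel[i] += d * d;
                    }
                }
            } else {
                /* Full block: both loops run exactly BLOCK iterations. */
                for (int i = I; i < I + BLOCK; i++) {
                    for (int j = J; j < J + BLOCK; j++) {
                        const double d = x[i] - x[j];
                        accel[i] += d * d;
                    }
                }
            }
        }
    }
}
```

Only the full-block branch has a single simple exit test per loop; the
partial-block branch is expected to stay scalar.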

The vectorizer (or rather niter analysis) could try to recover in a similar
way by using 'assumptions': basically we can compute the number of
iterations to be BLOCK if we assume that J + BLOCK <= n.  The exit condition
looks like

  _145 = J_86 + 999;
...
  <bb 4> [local count: 958878294]:
  # j_88 = PHI <j_58(18), J_86(7)>
...
  j_58 = j_88 + 1;
  _63 = n_49(D) > j_58;
  _64 = j_58 <= _145;
  _65 = _63 & _64;
  if (_65 != 0)

We could try to pattern-match this NE_EXPR (we need to choose which
condition we use as assumption and which to base the niters on).
Another possibility would be (I think this came up in another bug report
as well) to use j < MIN (J + BLOCK, n).

The following source modification works:

    for (int i = I; i < I + BLOCK && i < n; i++) {
        int m = J + BLOCK > n ? n : J + BLOCK;
        for (int j = J; j < m; j++) {

Whether it's a generally profitable transform, or should again be matched
only during niter analysis, I'm not sure (if the MIN is loop invariant
and this is an exit condition it surely is profitable).
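
For reference, a self-contained sketch of the MIN-based bound in a full
blocked nest. It rests on assumptions: the kernel is reduced to one
coordinate, BLOCK is a small illustrative value, and the names min_int and
accel_blocked_min are hypothetical. The same clamping is applied to the i
loop here, which goes beyond the modification quoted above.

```c
#define BLOCK 4  /* illustrative value only */

static inline int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* 1-D simplification: accel[i] = sum_j (x[i] - x[j])^2, block by block. */
void accel_blocked_min(const double *x, double *accel, int n)
{
    for (int i = 0; i < n; i++)
        accel[i] = 0.0;

    for (int I = 0; I < n; I += BLOCK) {
        for (int J = 0; J < n; J += BLOCK) {
            /* Loop-invariant upper bounds, computed once per block. */
            const int i_end = min_int(I + BLOCK, n);
            const int j_end = min_int(J + BLOCK, n);

            for (int i = I; i < i_end; i++) {
                double sum = 0.0;
                /* Single exit test, so the trip count is j_end - J. */
                for (int j = J; j < j_end; j++) {
                    const double d = x[i] - x[j];
                    sum += d * d;
                }
                accel[i] += sum;
            }
        }
    }
}
```

With one comparison per loop the iteration count is a plain expression, which
is the property the dump above reports as missing.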

