public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/96888] New: Missing vectorization opportunity depending on integer type
@ 2020-09-02  0:39 pmenon at cs dot cmu.edu
  2020-09-02  0:46 ` [Bug tree-optimization/96888] " pmenon at cs dot cmu.edu
  2020-09-02  8:11 ` rguenth at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: pmenon at cs dot cmu.edu @ 2020-09-02  0:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96888

            Bug ID: 96888
           Summary: Missing vectorization opportunity depending on integer
                    type
           Product: gcc
           Version: 10.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pmenon at cs dot cmu.edu
  Target Milestone: ---

The loop in the following test case isn't vectorized:

#include <cstdlib>
#include <cstdint>

// Add x to each v[i] if bit 'i' is set in LSB-encoded bits.
void Test(int8_t *__restrict v, int8_t x, const uint64_t *bits, unsigned n) {
    for (int i = 0, num_words=(n+64-1)/64; i , n; i++) {
        const uint64_t word = bits[i];
        for (int j = 0; j < 64; j++) {
            v[i*64+j] += x * (bool)(word & (uint64_t(1)<<j));
        }
    }
}

<source>:7:9: missed: couldn't vectorize loop
<source>:7:9: missed: not vectorized: control flow in loop.
<source>:8:27: missed: couldn't vectorize loop
<source>:9:30: missed: not vectorized: relevant stmt not supported: _10 = word_24 >> j_34;

However, changing one line (the one constructing the mask) from an explicit
uint64_t(1) to a plain 1U (which is not correct) gives auto-vectorization:

#include <cstdlib>
#include <cstdint>

// Add x to each v[i] if bit 'i' is set in LSB-encoded bits.
void Test(int8_t *__restrict v, int8_t x, const uint64_t *bits, unsigned n) {
    for (int i = 0, num_words=(n+64-1)/64; i , n; i++) {
        const uint64_t word = bits[i];
        for (int j = 0; j < 64; j++) {
            v[i*64+j] += x * (bool)(word & (1<<j)); // CHANGE HERE
        }
    }
}

Is this a known issue? Is there a reason why the former code can't be
vectorized, or do I need to restructure the code to trick the compiler?
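
One possible restructuring, given here only as a sketch: handle each 64-bit
word as two 32-bit halves, so the mask can be written as 1u << j with j < 32
and no shift ever reaches the width of its type. The name TestHalves is
hypothetical, n is assumed to be the number of 64-bit words, and whether any
particular GCC version vectorizes this form has not been checked.

```cpp
#include <cstdint>

// Hypothetical variant of Test(): same result as the uint64_t(1) << j
// version, but every mask is built in a 32-bit type.
// Assumption: n is the number of 64-bit words in 'bits'.
void TestHalves(int8_t *__restrict v, int8_t x, const uint64_t *bits, unsigned n) {
    for (unsigned i = 0; i < n; i++) {
        // Split the word into its low and high 32 bits.
        const uint32_t lo = static_cast<uint32_t>(bits[i]);
        const uint32_t hi = static_cast<uint32_t>(bits[i] >> 32);
        // Bits 0..31 of the word map to v[i*64 + 0..31].
        for (int j = 0; j < 32; j++) {
            v[i * 64 + j] += x * (bool)(lo & (1u << j));
        }
        // Bits 32..63 of the word map to v[i*64 + 32..63].
        for (int j = 0; j < 32; j++) {
            v[i * 64 + 32 + j] += x * (bool)(hi & (1u << j));
        }
    }
}
```

The two inner loops have a fixed trip count of 32 and a 32-bit mask, which is
the shape the 1<<j variant above has, without its incorrect handling of
bits 32..63.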


* [Bug tree-optimization/96888] Missing vectorization opportunity depending on integer type
  2020-09-02  0:39 [Bug tree-optimization/96888] New: Missing vectorization opportunity depending on integer type pmenon at cs dot cmu.edu
@ 2020-09-02  0:46 ` pmenon at cs dot cmu.edu
  2020-09-02  8:11 ` rguenth at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: pmenon at cs dot cmu.edu @ 2020-09-02  0:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96888

--- Comment #1 from pmenon at cs dot cmu.edu ---
Correction: outer loop condition should read 'i < n'.


* [Bug tree-optimization/96888] Missing vectorization opportunity depending on integer type
  2020-09-02  0:39 [Bug tree-optimization/96888] New: Missing vectorization opportunity depending on integer type pmenon at cs dot cmu.edu
  2020-09-02  0:46 ` [Bug tree-optimization/96888] " pmenon at cs dot cmu.edu
@ 2020-09-02  8:11 ` rguenth at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-09-02  8:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96888

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-09-02
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  We currently do not support promoting/demoting the shift amount
vector operand when vectorizing shifts.  Note the fact that we
turn word & (1ul<<j) into word >> j likely triggers this issue - this
does not happen when you write word & (1u<<j).
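
The rewrite described above is valid because, for a 64-bit mask, testing
word & (1ul<<j) and testing the low bit of word >> j give the same answer
for every j in 0..63. A minimal sketch of the two forms, with hypothetical
helper names (these are not GCC internals, just the source-level shapes):

```cpp
#include <cstdint>

// Hypothetical helper: the mask form as written in the test case.
inline bool BitSetMask(uint64_t word, int j) {
    return (word & (uint64_t(1) << j)) != 0;
}

// Hypothetical helper: the shift form, i.e. word >> j followed by a test
// of the lowest bit. Here the variable shift is applied to the 64-bit word.
inline bool BitSetShift(uint64_t word, int j) {
    return ((word >> j) & 1) != 0;
}
```

In the shift form the variable shift operates on the 64-bit word itself,
which matches the unsupported statement _10 = word_24 >> j_34 in the
reporter's dump.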

Still, AVX2 is needed for the 1<<j induction.  Note that the generated code
doesn't exactly look good ...

It seems we should be able to use outer loop vectorization here, but
after fixing some things we still run into dependence analysis issues
there:

t.ii:4:46: note:   === vect_analyze_data_ref_dependences ===
t.ii:4:46: note:   dependence distance  = 0.
t.ii:4:46: note:   dependence distance == 0 between *_8 and *_8
t.ii:4:46: note:   dependence distance  = 1.
t.ii:7:23: missed:   not vectorized, possible dependence between data-refs *_8 and *_8
t.ii:4:46: missed:  bad data dependence.

that's the v[i*16+j] read/write.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
