public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/114061] New: GCC fails vectorization when using __builtin_prefetch
@ 2024-02-22 19:04 tnfchris at gcc dot gnu.org
  2024-02-22 19:07 ` [Bug tree-optimization/114061] " pinskia at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-02-22 19:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114061

            Bug ID: 114061
           Summary: GCC fails vectorization when using __builtin_prefetch
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

The following example:

void foo(double * restrict a, double * restrict b, int n){
  int i;
  for(i=0; i<n; ++i){
    a[i] = a[i] + b[i];
    __builtin_prefetch(&(b[i+8]));
  }
}

fails to vectorize because of the __builtin_prefetch.

/app/example.c:5:5: missed:  statement clobbers memory: __builtin_prefetch (_10);
/app/example.c:3:13: missed:  not vectorized: loop contains function calls or data references that cannot be analyzed

However two things:

1. Prefetches are usually hints anyway and not a correctness thing.  It should
be safe to elide the call and vectorize as normal.
2. SVE has vector prefetch operations which we can use here.  The vector
prefetches are also predicated so they need to be actually codegened.

Perhaps one solution here would be to have a vect-pattern which checks for
COND_PREFETCH support and, if it is not supported, just elides the prefetch?
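
Point 1 can be checked directly: because the prefetch is only a hint, a copy of
the loop with the call removed should compute the same values. The sketch below
is illustrative only; `foo_no_prefetch`, `same_results`, `N` and `PAD` are names
assumed for the example, and the arrays carry 8 elements of padding so that
`&b[i + 8]` always stays inside the allocation.

```c
/* Original loop from the report, with the prefetch hint. */
void foo(double *restrict a, double *restrict b, int n)
{
  for (int i = 0; i < n; ++i) {
    a[i] = a[i] + b[i];
    __builtin_prefetch(&b[i + 8]);
  }
}

/* Hypothetical variant with the hint elided. */
void foo_no_prefetch(double *restrict a, double *restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = a[i] + b[i];
}

/* N elements are processed; PAD extra elements keep &b[i + 8] in bounds. */
enum { N = 64, PAD = 8 };

/* Returns 1 if both variants produce identical results, 0 otherwise. */
int same_results(void)
{
  static double a1[N + PAD], a2[N + PAD], b[N + PAD];
  for (int i = 0; i < N + PAD; ++i) {
    a1[i] = a2[i] = (double)i;
    b[i] = 0.5 * i;
  }
  foo(a1, b, N);
  foo_no_prefetch(a2, b, N);
  for (int i = 0; i < N; ++i)
    if (a1[i] != a2[i])
      return 0;
  return 1;
}
```

This only demonstrates that the results match; whether the hint helps or hurts
performance depends on the target and is not something the sketch measures.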

* [Bug tree-optimization/114061] GCC fails vectorization when using __builtin_prefetch
From: pinskia at gcc dot gnu.org @ 2024-02-22 19:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114061

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I thought there was already one recorded about this.

* [Bug tree-optimization/114061] GCC fails vectorization when using __builtin_prefetch
From: tnfchris at gcc dot gnu.org @ 2024-02-22 19:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114061

--- Comment #2 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> I thought there was already one recorded about this.

I could only find https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103938 about an
ICE when prefetching a vector address.

* [Bug tree-optimization/114061] GCC fails vectorization when using __builtin_prefetch
From: pinskia at gcc dot gnu.org @ 2024-02-22 19:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114061

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-02-22
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
           Severity|normal                      |enhancement

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.

Though maybe we should drop them in the vectorized version of the loop. HW
prefetchers usually do a decent job and sometimes (maybe most) SW hinted
prefetches interfere with the HW prefetchers.

* [Bug tree-optimization/114061] GCC fails vectorization when using __builtin_prefetch
From: tnfchris at gcc dot gnu.org @ 2024-02-22 19:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114061

--- Comment #4 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #3)
> Confirmed.
> 
> Though maybe we should drop them in the vectorized version of the loop. HW
> prefetchers usually do a decent job and sometimes (maybe most) SW hinted
> prefetches interfere with the HW prefetchers.

Agreed. I'm not sure how useful they are either, but some customers
definitely seem to want them.

* [Bug tree-optimization/114061] GCC fails vectorization when using __builtin_prefetch
From: rguenth at gcc dot gnu.org @ 2024-02-23  7:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114061

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think we could try to "vectorize" them by only updating the address (the
builtin doesn't specify a size) when that evolves in the scalar loop, updating
the step with the chosen VF.

Dependence shouldn't be a concern here.

The main issue is representational - how to handle this in data-ref and
dependence analysis (or whether to just "skip" them in the vectorizer).
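
As a rough source-level picture of this idea (not how the vectorizer would
represent it internally), the prefetch address keeps the same base and offset
as in the scalar loop, but its step is scaled by the vectorization factor, so
one hint is issued per vector iteration. `VF`, `foo_blocked` and the fixed
factor of 4 are assumptions made for the illustration; a real vectorizer
chooses the VF per target.

```c
/* Assumed vectorization factor, for illustration only. */
#define VF 4

/* Hand-written stand-in for a vectorized loop: the main loop handles VF
   elements per iteration and issues a single prefetch whose address
   advances by VF elements each time. */
void foo_blocked(double *restrict a, double *restrict b, int n)
{
  int i = 0;
  for (; i + VF <= n; i += VF) {
    for (int j = 0; j < VF; ++j)    /* stands in for one vector add */
      a[i + j] = a[i + j] + b[i + j];
    __builtin_prefetch(&b[i + 8]);  /* same offset, step scaled by VF */
  }
  /* Scalar tail for the remaining n % VF elements. */
  for (; i < n; ++i) {
    a[i] = a[i] + b[i];
    __builtin_prefetch(&b[i + 8]);
  }
}
```

As in the original example, the caller is assumed to provide at least 8
elements of slack after `b[n - 1]` so that the address expression stays valid.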

* [Bug tree-optimization/114061] GCC fails vectorization when using __builtin_prefetch
From: victorldn at gcc dot gnu.org @ 2024-04-08 14:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114061

Victor Do Nascimento <victorldn at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |victorldn at gcc dot gnu.org
             Status|NEW                         |ASSIGNED
                 CC|                            |victorldn at gcc dot gnu.org

* [Bug tree-optimization/114061] GCC fails vectorization when using __builtin_prefetch
From: cvs-commit at gcc dot gnu.org @ 2024-06-12 13:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114061

--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Victor Do Nascimento
<victorldn@gcc.gnu.org>:

https://gcc.gnu.org/g:adcc815a01ae009d2768b6afb546e357bd37bbd2

commit r15-1211-gadcc815a01ae009d2768b6afb546e357bd37bbd2
Author: Victor Do Nascimento <victor.donascimento@arm.com>
Date:   Wed May 22 12:14:11 2024 +0100

    middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]

    At present the autovectorizer fails to vectorize simple loops
    involving calls to `__builtin_prefetch'.  A simple example of such
    loop is given below:

    void foo(double * restrict a, double * restrict b, int n){
      int i;
      for(i=0; i<n; ++i){
        a[i] = a[i] + b[i];
        __builtin_prefetch(&(b[i+8]));
      }
    }

    The failure stems from two issues:

    1. Given that it is typically not possible to fully reason about a
       function call due to the possibility of side effects, the
       autovectorizer does not attempt to vectorize loops which make such
       calls.

       Given the memory reference passed to `__builtin_prefetch', in the
       absence of assurances about its effect on the passed memory
       location the compiler deems the function unsafe to vectorize,
       marking it as clobbering memory in `vect_find_stmt_data_reference'.
       This leads to the failure in autovectorization.

    2. Notwithstanding the above issue, though the prefetch statement
       would be classed as `vect_unused_in_scope', the induction variable
       that is used in the address of the prefetch is the scalar loop's
       and not the vector loop's IV. That is, it still uses `i' and not
       `vec_iv' because the instruction wasn't vectorized, causing DCE to
       think the value is live, such that we now have both the vector and
       scalar induction variables actively used in the loop.

    This patch addresses both of these:

    1. About the issue regarding the memory clobber, data prefetch does
       not generate faults if its address argument is invalid and does not
       write to memory.  Therefore, it does not alter the internal state
       of the program or its control flow under any circumstance.  As
       such, it is reasonable that the function be marked as not affecting
       memory contents.

       To achieve this, we add the necessary logic to
       `get_references_in_stmt' to ensure that builtin functions are given
       the same treatment as internal functions.  If the gimple call
       is to a builtin function and its function code is
       `BUILT_IN_PREFETCH', we mark `clobbers_memory' as false.

    2. Finding precedent in the way clobber statements are handled,
       whereby the vectorizer drops these from both the scalar and
       vectorized versions of a given loop, we choose to drop prefetch
       hints in a similar fashion.  This seems appropriate given how
       software prefetch hints are typically ignored by processors across
       architectures, as they seldom lead to performance gain over their
       hardware counterparts.

    gcc/ChangeLog:

            PR tree-optimization/114061
            * tree-data-ref.cc (get_references_in_stmt): Set
            `clobbers_memory' to false for __builtin_prefetch.
            * tree-vect-loop.cc (vect_transform_loop): Drop all
            __builtin_prefetch calls from loops.

    gcc/testsuite/ChangeLog:

            * gcc.dg/vect/vect-prefetch-drop.c: New test.
            * gcc.target/aarch64/vect-prefetch-drop.c: Likewise.

* [Bug tree-optimization/114061] GCC fails vectorization when using __builtin_prefetch
From: pinskia at gcc dot gnu.org @ 2024-06-12 17:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114061

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
   Target Milestone|---                         |15.0
             Status|ASSIGNED                    |RESOLVED

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Fixed I think.
