public inbox for gcc-bugs@sourceware.org
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/115340] Loop/SLP vectorization possible inefficiency
Date: Tue, 04 Jun 2024 08:06:40 +0000	[thread overview]
Message-ID: <bug-115340-4-oNQh8NlFzW@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-115340-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-06-04
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that the DRs for the loads tmp[0][i] and tmp[1][i] are not
related - they are off different base pointers.  At the moment we are
not merging unrelated "groups" (even though the loads are not marked
as grouped) into one SLP node.

The stores are not considered "grouped" because they have gaps.
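A minimal sketch of the shape being discussed (hypothetical names and layout, not the PR's exact testcase): the two loads come from distinct base pointers tmp[0] and tmp[1], and the per-iteration stores skip elements in between, i.e. a store group with gaps.

```c
#define N 16

double tmp[2][N];
double out[4 * N];

/* Hypothetical reduction: loads off two different base pointers
   (tmp[0] and tmp[1]) feed stores that leave out[4*i+2] and
   out[4*i+3] untouched - a store group with gaps.  */
void
foo (void)
{
  for (int i = 0; i < N; i++)
    {
      out[4 * i + 0] = tmp[0][i];   /* load from base pointer tmp[0] */
      out[4 * i + 1] = tmp[1][i];   /* load from base pointer tmp[1] */
      /* out[4*i+2] and out[4*i+3] are not written: the gap.  */
    }
}
```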

With SLP-ification you'd get four instances and the same code-gen as now.

To do better we'd have to improve the store dataref analysis to see
that a vectorization factor of four would "close" the gaps, or more
generally support store groups with gaps.  Stores with gaps can be
handled by masking for example.
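A scalar model of that masking idea (illustrative only; masked_store4 is a hypothetical helper, not a GCC internal API): a vector store whose inactive lanes are masked off leaves the gap elements' memory untouched, so a gapped store group can still be done as one masked vector store.

```c
#include <stdbool.h>

/* Scalar stand-in for a 4-lane masked vector store: only lanes whose
   mask bit is set write memory; the gap lanes keep their previous
   contents.  Hypothetical helper for illustration.  */
static void
masked_store4 (double *dst, const double vals[4], const bool mask[4])
{
  for (int lane = 0; lane < 4; lane++)
    if (mask[lane])
      dst[lane] = vals[lane];
}
```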

You get the store side handled when using -fno-tree-loop-vectorize to
get basic-block vectorization after unrolling the loop.  But you
still run into the issue that we do not combine loads from different
load groups during SLP discovery.  That's another angle you can attack:
during greedy discovery we also do not consider splitting the store
but instead build the loads from scalars, which is less than optimal,
especially since we do not re-process the built vector CTORs for
further basic-block vectorization opportunities.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

Thread overview: 2+ messages
2024-06-04  7:46 [Bug tree-optimization/115340] New: " rdapp at gcc dot gnu.org
2024-06-04  8:06 ` rguenth at gcc dot gnu.org [this message]
