[Bug tree-optimization/95845] New: Failure to optimize vector load made in separate operations to single load

* [Bug tree-optimization/95845] New: Failure to optimize vector load made in separate operations to single load
@ 2020-06-23 15:43 gabravier at gmail dot com
  2020-06-24  6:51 ` [Bug tree-optimization/95845] " rguenth at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: gabravier at gmail dot com @ 2020-06-23 15:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95845

            Bug ID: 95845
           Summary: Failure to optimize vector load made in separate
                    operations to single load
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

typedef float __attribute__((vector_size(8))) v2f32;

v2f32 f(const float *ptr)
{
    v2f32 r;
    r[0] = ptr[0];
    r[1] = ptr[1];
    return r;
}

This can be optimized to `return (v2f32){ptr[0], ptr[1]};`. This transformation
is done by LLVM, but not by GCC.
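
A minimal sketch of the two forms side by side, assuming GCC's vector
extensions; the names load_by_lane, load_by_ctor and check_equivalent are
hypothetical and only serve the illustration:

```c
#include <assert.h>

typedef float __attribute__((vector_size(8))) v2f32;

/* Form from the report: the vector is filled one lane at a time. */
v2f32 load_by_lane(const float *ptr)
{
    v2f32 r;
    r[0] = ptr[0];
    r[1] = ptr[1];
    return r;
}

/* Form the report asks for: a single vector constructor. */
v2f32 load_by_ctor(const float *ptr)
{
    return (v2f32){ptr[0], ptr[1]};
}

/* Hypothetical helper: both forms must agree lane by lane.
   The comparison assumes the inputs are not NaN. */
void check_equivalent(const float *ptr)
{
    v2f32 a = load_by_lane(ptr);
    v2f32 b = load_by_ctor(ptr);
    assert(a[0] == b[0] && a[1] == b[1]);
}
```

Comparing the assembly of the two functions (for example with -O2 -S)
should show whether the per-lane form is still emitted as separate loads.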

* [Bug tree-optimization/95845] Failure to optimize vector load made in separate operations to single load
From: rguenth at gcc dot gnu.org @ 2020-06-24  6:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95845

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2020-06-24
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
So this is

  VIEW_CONVERT_EXPR<float[2]>(r)[0] = *ptr;
  VIEW_CONVERT_EXPR<float[2]>(r)[1] = *(ptr + 4);

from the FEs and

  _1 = *ptr_4(D);
  r_6 = BIT_INSERT_EXPR <r_5(D), _1, 0>;
  _2 = MEM[(const float *)ptr_4(D) + 4B];
  r_7 = BIT_INSERT_EXPR <r_6, _2, 32>;

after SSA rewrite.  No further combining of the inserts happens.
I guess forwprop might want to check whether an insert chain forms a full
CTOR.  BB vectorization might also be a candidate to look at, but it would
be quite late.

The issue with forwprop is avoiding quadratic behavior when searching
the chain, which will be difficult given its structure.  One possibility
would be to perform a forward search from BIT_INSERT_EXPRs with a
default def arg and mark the last BIT_INSERT_EXPR in a single-use chain
as to be processed.  Or declare it not a problem.
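
The forward search suggested above could look roughly like the sketch
below.  It works on a toy model, not on GCC's GIMPLE: struct toy_insert,
chain_tail and mark_chain_tails are hypothetical names, and the model
assumes the src and user links are consistent with each other.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for one BIT_INSERT_EXPR statement.  This is a
   hypothetical model for illustration, not GCC's representation. */
struct toy_insert {
    int src;   /* index of the insert defining the input vector,
                  or -1 if the input is an uninitialized default def */
    int user;  /* index of the insert that is the single use of this
                  result, or -1 if there is no such insert */
};

/* Follow the single-use chain forward from HEAD and return the index
   of the last insert in it.  HEAD must take a default def as input. */
static int chain_tail(const struct toy_insert *stmts, int head)
{
    assert(stmts[head].src == -1);
    int cur = head;
    while (stmts[cur].user != -1)
        cur = stmts[cur].user;
    return cur;
}

/* Set MARKED[i] to 1 for every insert that ends a chain starting at a
   default def, and to 0 otherwise.  Each statement belongs to at most
   one such chain, so the total work is linear in N. */
void mark_chain_tails(const struct toy_insert *stmts, size_t n, int *marked)
{
    for (size_t i = 0; i < n; i++)
        marked[i] = 0;
    for (size_t i = 0; i < n; i++)
        if (stmts[i].src == -1)
            marked[chain_tail(stmts, (int)i)] = 1;
}
```

Only the marked statements would then be candidates for building a CTOR,
so no chain is searched more than once.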

* [Bug tree-optimization/95845] Failure to optimize vector load made in separate operations to single load
From: gabravier at gmail dot com @ 2021-09-01 23:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95845

--- Comment #2 from Gabriel Ravier <gabravier at gmail dot com> ---
This appears to be fixed in trunk: GCC now seems to use a movq instead of a
movlps on x86.

* [Bug tree-optimization/95845] Failure to optimize vector load made in separate operations to single load
From: rguenth at gcc dot gnu.org @ 2021-09-02  7:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95845

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's now vectorized at -O3 or with -ftree-slp-vectorize.  In particular
vect_slp_check_for_constructors now matches

      else if (code == BIT_INSERT_EXPR
               && VECTOR_TYPE_P (TREE_TYPE (rhs))
               && TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs)).is_constant ()
               && TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs)).to_constant () > 1
               && integer_zerop (gimple_assign_rhs3 (assign))
               && useless_type_conversion_p
                    (TREE_TYPE (TREE_TYPE (rhs)),
                     TREE_TYPE (gimple_assign_rhs2 (assign)))
               && bb_vinfo->lookup_def (gimple_assign_rhs2 (assign)))
        {
          /* We start to match on insert to lane zero but since the
             inserts need not be ordered we'd have to search both
             the def and the use chains.  */

This matching could be factored out and used by forwprop to build a
vector CTOR.  So I don't think it's fully fixed yet, and there's an
opportunity to improve things earlier.

Partial defs of otherwise uninitialized vectors might also be an interesting
target.  When not keying on lane zero to start the match, one possibility
is to start matching on the insert whose result does not have a single
immediate use in another BIT_INSERT_EXPR.
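
As a rough sketch of that alternative, the toy model below starts at the
last insert of a chain and walks the def chain backward, so the order of
the lanes does not matter and partial defs show up as an incomplete lane
mask.  The names toy_insert, lanes_defined and is_full_ctor are
hypothetical, and every chain is assumed to end at a default def.

```c
#include <assert.h>

#define TOY_LANES 2  /* two lanes, as in v2f32 from the report */

/* Toy stand-in for one BIT_INSERT_EXPR statement.  This is a
   hypothetical model for illustration, not GCC's representation. */
struct toy_insert {
    int src;   /* index of the insert defining the input vector,
                  or -1 if the input is an uninitialized default def */
    int lane;  /* lane written by this insert, 0 .. TOY_LANES - 1 */
};

/* Walk the def chain backward from TAIL and return a bitmask with one
   bit set for every lane written somewhere in the chain. */
unsigned lanes_defined(const struct toy_insert *stmts, int tail)
{
    unsigned mask = 0;
    for (int cur = tail; cur != -1; cur = stmts[cur].src) {
        assert(stmts[cur].lane >= 0 && stmts[cur].lane < TOY_LANES);
        mask |= 1u << stmts[cur].lane;
    }
    return mask;
}

/* Nonzero if the chain ending at TAIL writes every lane, i.e. it could
   be replaced by a full vector CTOR in this model. */
int is_full_ctor(const struct toy_insert *stmts, int tail)
{
    return lanes_defined(stmts, tail) == (1u << TOY_LANES) - 1;
}
```

A chain that leaves some bit of the mask clear is a partial def of an
otherwise uninitialized vector.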

* [Bug tree-optimization/95845] Failure to optimize vector load made in separate operations to single load
From: pinskia at gcc dot gnu.org @ 2023-05-12  6:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95845

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note this has been working with the SLP vectorizer (at -O3) since GCC 8. Since
GCC 12, the SLP vectorizer is turned on at -O2 and above.

So I am just going to close this as fixed.
