public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/109747] New: [12/13/14 Regression] SLP cost of constructors is off
@ 2023-05-05 11:56 rguenth at gcc dot gnu.org
  2023-05-05 12:02 ` [Bug target/109747] " rguenth at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-05 11:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109747

            Bug ID: 109747
           Summary: [12/13/14 Regression] SLP cost of constructors is off
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

When r12-7319-g90d693bdc9d718 added the accounting for GPR->XMM moves to the
cost of SLP CTORs I failed to realize that the costing code will pass down
the very same (and full) SLP node from vect_prologue_cost_for_slp but
will generate costs for each individual actual vector that's constructed.

So for the case where the SLP node covers more than one vector the costing
will be off.  That's visible in the costing of the testcase for PR108724
for example which is

void foo(int *a, const int *__restrict b, const int *__restrict c)
{
  for (int i = 0; i < 16; i++) {
    a[i] = b[i] + c[i];
  }
}

and we end up with

_17 8 times unaligned_store (misalign -1) costs 96 in body
node 0x3fb5838 1 times vec_construct costs 100 in prologue
node 0x3fb5838 1 times vec_construct costs 100 in prologue
node 0x3fb5838 1 times vec_construct costs 100 in prologue
node 0x3fb5838 1 times vec_construct costs 100 in prologue
node 0x3fb5838 1 times vec_construct costs 100 in prologue
node 0x3fb5838 1 times vec_construct costs 100 in prologue
node 0x3fb5838 1 times vec_construct costs 100 in prologue
node 0x3fb5838 1 times vec_construct costs 100 in prologue

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/109747] [12/13/14 Regression] SLP cost of constructors is off
  2023-05-05 11:56 [Bug target/109747] New: [12/13/14 Regression] SLP cost of constructors is off rguenth at gcc dot gnu.org
@ 2023-05-05 12:02 ` rguenth at gcc dot gnu.org
  2023-05-08 12:27 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-05 12:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109747

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.3
             Target|                            |x86_64-*-* i?86-*-*
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |ASSIGNED
                 CC|                            |rsandifo at gcc dot gnu.org
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2023-05-05

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
A fix, and maybe exactly a step in the right direction, would be to construct
individual new SLP nodes for each call to record_stmt_cost from
vect_prologue_cost_for_slp:

  /* ???  We're just tracking whether vectors in a single node are the same.
     Ideally we'd do something more global.  */
  for (unsigned int start : starts)
    { 
      vect_cost_for_stmt kind;
      if (SLP_TREE_DEF_TYPE (node) == vect_constant_def)
        kind = vector_load;
      else if (vect_scalar_ops_slice { ops, start, nelt_limit }.all_same_p ())
        kind = scalar_to_vec;
      else
        kind = vec_construct;
      record_stmt_cost (cost_vec, 1, kind, node, vectype, 0, vect_prologue);
    }                       

alternatively we could pass down 'start' as well.  The x86 backend code
could also detect the mismatch of TYPE_VECTOR_SUBPARTS * count and
the number of SLP lanes (but not sure what it should do in that case).

Note we can't currently meaningfully put such a split set of SLP nodes
into the SLP graph, but in the end we might want to go into the direction
of splitting it into individual vector ops, esp. for load/store vectorization
and interleaving.

Short-term passing down 'start' (and only interpreting it with count is one?)
might be easiest.

Any opinions?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/109747] [12/13/14 Regression] SLP cost of constructors is off
  2023-05-05 11:56 [Bug target/109747] New: [12/13/14 Regression] SLP cost of constructors is off rguenth at gcc dot gnu.org
  2023-05-05 12:02 ` [Bug target/109747] " rguenth at gcc dot gnu.org
@ 2023-05-08 12:27 ` rguenth at gcc dot gnu.org
  2023-05-09 11:05 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-08 12:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109747

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.3                        |12.4

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 12.3 is being released, retargeting bugs to GCC 12.4.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/109747] [12/13/14 Regression] SLP cost of constructors is off
  2023-05-05 11:56 [Bug target/109747] New: [12/13/14 Regression] SLP cost of constructors is off rguenth at gcc dot gnu.org
  2023-05-05 12:02 ` [Bug target/109747] " rguenth at gcc dot gnu.org
  2023-05-08 12:27 ` rguenth at gcc dot gnu.org
@ 2023-05-09 11:05 ` rguenth at gcc dot gnu.org
  2023-05-23 16:59 ` cvs-commit at gcc dot gnu.org
  2023-05-23 17:00 ` [Bug target/109747] [12/13 " rguenth at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-09 11:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109747

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
void foo(int *a, int b, int c)
{
  a[0] = b;
  a[1] = c;
  a[2] = b;
  a[3] = c;
  a[4] = c;
  a[5] = b;
  a[6] = c;
  a[7] = b;
}

for a simpler testcase.  We get

node 0x583ae48 1 times vec_construct costs 24 in prologue 
node 0x583ae48 1 times vec_construct costs 24 in prologue

and that loop in x86 add_stmt_cost iterates over the whole CTOR twice
rather than just once for each subset (the actual cost is the same
here because I repeat b and c only).

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/109747] [12/13/14 Regression] SLP cost of constructors is off
  2023-05-05 11:56 [Bug target/109747] New: [12/13/14 Regression] SLP cost of constructors is off rguenth at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-05-09 11:05 ` rguenth at gcc dot gnu.org
@ 2023-05-23 16:59 ` cvs-commit at gcc dot gnu.org
  2023-05-23 17:00 ` [Bug target/109747] [12/13 " rguenth at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-05-23 16:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109747

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:b6b8870ec585947a03a797f9037d02380316e235

commit r14-1139-gb6b8870ec585947a03a797f9037d02380316e235
Author: Richard Biener <rguenther@suse.de>
Date:   Tue May 23 15:03:00 2023 +0200

    tree-optimization/109747 - SLP cost of CTORs

    The x86 backend looks at the SLP node passed to the add_stmt_cost
    hook when costing vec_construct, looking for elements that require
    a move from a GPR to a vector register and cost that.  But since
    vect_prologue_cost_for_slp decomposes the cost for an external
    SLP node into individual pieces this cost gets applied N times
    without a chance for the backend to know it's just dealing with
    a part of the SLP node.  Just looking at a part is also not perfect
    since the GPR to XMM move cost applies only once per distinct
    element so handling the whole SLP node one more correctly reflects
    cost (albeit without considering other external SLP nodes).

    The following addresses the issue by passing down the SLP node
    only for one piece and nullptr for the rest.  The x86 backend
    is currently the only one looking at it.

    In the future the cost of external elements is something to deal
    with globally but that would require the full SLP tree be available
    to costing.

    It's difficult to write a testcase, at the tipping point not
    vectorizing is better so I'll followup with x86 specific adjustments
    and will see to add a testcase later.

            PR tree-optimization/109747
            * tree-vect-slp.cc (vect_prologue_cost_for_slp): Pass down
            the SLP node only once to the cost hook.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/109747] [12/13 Regression] SLP cost of constructors is off
  2023-05-05 11:56 [Bug target/109747] New: [12/13/14 Regression] SLP cost of constructors is off rguenth at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-05-23 16:59 ` cvs-commit at gcc dot gnu.org
@ 2023-05-23 17:00 ` rguenth at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-23 17:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109747

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[12/13/14 Regression] SLP   |[12/13 Regression] SLP cost
                   |cost of constructors is off |of constructors is off
      Known to work|                            |14.0
           Priority|P3                          |P2

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed on trunk sofar.  Note fixing on the branch(es) will eventually cause more
SLP vectorization which isn't always desired.  So I'm at least waiting for
regressions on trunk because of this.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-05-23 17:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-05 11:56 [Bug target/109747] New: [12/13/14 Regression] SLP cost of constructors is off rguenth at gcc dot gnu.org
2023-05-05 12:02 ` [Bug target/109747] " rguenth at gcc dot gnu.org
2023-05-08 12:27 ` rguenth at gcc dot gnu.org
2023-05-09 11:05 ` rguenth at gcc dot gnu.org
2023-05-23 16:59 ` cvs-commit at gcc dot gnu.org
2023-05-23 17:00 ` [Bug target/109747] [12/13 " rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).