* [Bug tree-optimization/102176] New: BB SLP scalar costing is off with extern promoted nodes
@ 2021-09-02 12:18 rguenth at gcc dot gnu.org
  2021-09-02 12:18 ` [Bug tree-optimization/102176] " rguenth at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-09-02 12:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102176

            Bug ID: 102176
           Summary: BB SLP scalar costing is off with extern promoted
                    nodes
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

On aarch64 we can see

int foo(long *restrict res, long *restrict foo, long a, long b)
{
  res[0] = ((foo[0] * a) >> 1) + foo[0];
  res[1] = ((foo[1] * b) >> 1) + foo[1];
}

being vectorized as

t.c:3:10: note: Costing subgraph:
t.c:3:10: note: node 0x35f03b0 (max_nunits=2, refcnt=1)
t.c:3:10: note: op template: *res_12(D) = _4;
t.c:3:10: note:         stmt 0 *res_12(D) = _4;
t.c:3:10: note:         stmt 1 MEM[(long int *)res_12(D) + 8B] = _8;
t.c:3:10: note:         children 0x35f0440
t.c:3:10: note: node 0x35f0440 (max_nunits=2, refcnt=1)
t.c:3:10: note: op template: _4 = _1 + _3;
t.c:3:10: note:         stmt 0 _4 = _1 + _3;
t.c:3:10: note:         stmt 1 _8 = _5 + _7;
t.c:3:10: note:         children 0x35f04d0 0x35f0560
t.c:3:10: note: node 0x35f04d0 (max_nunits=2, refcnt=2)
t.c:3:10: note: op template: _1 = *foo_10(D);
t.c:3:10: note:         stmt 0 _1 = *foo_10(D);
t.c:3:10: note:         stmt 1 _5 = MEM[(long int *)foo_10(D) + 8B];
t.c:3:10: note: node 0x35f0560 (max_nunits=2, refcnt=1)
t.c:3:10: note: op template: _3 = _2 >> 1;
t.c:3:10: note:         stmt 0 _3 = _2 >> 1;
t.c:3:10: note:         stmt 1 _7 = _6 >> 1;
t.c:3:10: note:         children 0x35f05f0 0x35f0710
t.c:3:10: note: node (external) 0x35f05f0 (max_nunits=2, refcnt=1)
t.c:3:10: note:         stmt 0 _2 = _1 * a_11(D);
t.c:3:10: note:         stmt 1 _6 = _5 * b_14(D);
t.c:3:10: note:         children 0x35f04d0 0x35f0680
t.c:3:10: note: node (external) 0x35f0680 (max_nunits=1, refcnt=1)
t.c:3:10: note:         { a_11(D), b_14(D) }
t.c:3:10: note: node (constant) 0x35f0710 (max_nunits=1, refcnt=1)
t.c:3:10: note:         { 1, 1 }

so the promoted external node 0x35f05f0 should keep the load live.
vect_bb_slp_scalar_cost relies on PURE_SLP_STMT, but that is
unreliable here since the per-stmt setting cannot capture the
different uses.  The code shares intent (and some bugs) with
vect_bb_slp_mark_live_stmts, and the problem in general is a bit
difficult given the lack of a back-mapping from a stmt to the SLP
nodes referencing it.
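
The effect on the scalar cost can be illustrated with a toy model of the
ten scalar stmts in the dump above.  This is only a sketch and not GCC
code: the struct, the field names, the function scalar_cost and the unit
cost per stmt are all assumptions made for illustration (GCC uses target
cost hooks and its own data structures).  A stmt counts as saved scalar
work only if none of its uses stays scalar.

```c
#include <assert.h>
#include <stdbool.h>

enum { N_STMTS = 10, MAX_USES = 2 };

/* Hypothetical, simplified view of one scalar stmt. */
struct stmt {
    const char *text;
    bool in_vector_node;  /* member of a non-external SLP node */
    bool stale_pure_slp;  /* flag computed before extern promotion */
    int n_uses;
    int uses[MAX_USES];   /* indices of the stmts using this def */
};

static const struct stmt stmts[N_STMTS] = {
    { "_1 = foo[0]",  true,  true, 2, {2, 6} },
    { "_5 = foo[1]",  true,  true, 2, {3, 7} },
    { "_2 = _1 * a",  false, true, 1, {4} },   /* promoted to extern */
    { "_6 = _5 * b",  false, true, 1, {5} },   /* promoted to extern */
    { "_3 = _2 >> 1", true,  true, 1, {6} },
    { "_7 = _6 >> 1", true,  true, 1, {7} },
    { "_4 = _1 + _3", true,  true, 1, {8} },
    { "_8 = _5 + _7", true,  true, 1, {9} },
    { "res[0] = _4",  true,  true, 0, {0} },
    { "res[1] = _8",  true,  true, 0, {0} },
};

/* Sum a unit cost (an assumption) over the stmts of vectorized nodes
   that become dead, i.e. whose uses are all considered vectorized. */
static int scalar_cost(bool trust_stale_flag)
{
    int cost = 0;
    for (int i = 0; i < N_STMTS; i++) {
        if (!stmts[i].in_vector_node)
            continue;
        bool live = false;
        for (int u = 0; u < stmts[i].n_uses; u++) {
            int user = stmts[i].uses[u];
            assert(user >= 0 && user < N_STMTS);
            bool user_vectorized = trust_stale_flag
                ? stmts[user].stale_pure_slp
                : stmts[user].in_vector_node;
            if (!user_vectorized)
                live = true;  /* def is still needed by scalar code */
        }
        if (!live)
            cost++;
    }
    return cost;
}
```

In this model scalar_cost(true) also counts the two loads as removed
scalar work, while scalar_cost(false) keeps them live because the
multiplications that use them stay scalar.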


* [Bug tree-optimization/102176] BB SLP scalar costing is off with extern promoted nodes
From: rguenth at gcc dot gnu.org @ 2021-09-02 12:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102176

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-09-02
             Status|UNCONFIRMED                 |ASSIGNED
           Keywords|                            |missed-optimization


* [Bug tree-optimization/102176] BB SLP scalar costing is off with extern promoted nodes
From: rguenth at gcc dot gnu.org @ 2021-09-02 12:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102176

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
So in this case we have _2 = _1 * a_11(D) still marked pure_slp even though
it does not participate in any vectorized SLP node.

Unfortunately marking of PURE_SLP_STMTs happens before analyzing operations
(the vectorizable_* functions called rely on the SLP type here for no good
reason).  But that analysis can promote nodes to extern, and the SLP type is
not adjusted afterwards.


* [Bug tree-optimization/102176] BB SLP scalar costing is off with extern promoted nodes
From: rguenth at gcc dot gnu.org @ 2021-09-02 12:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102176

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 51404
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51404&action=edit
patch

This brute-force approach of re-computing something like PURE_SLP_STMT minus
the set of defs used in extern def SLP nodes does the trick.
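
The idea of re-computing the participating stmts can be sketched on a
toy SLP graph shaped like the one in the costing dump.  This is not the
attached patch and not GCC code: struct slp_node, gather_vectorized_stmts
and gather_example are hypothetical names, and stmts are plain indices
(0,1 loads; 2,3 multiplies; 4,5 shifts; 6,7 adds; 8,9 stores).  The walk
collects stmts only from internal nodes and stops at external and
constant nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_LANES = 2, MAX_CHILDREN = 2, N_STMTS = 10 };

enum node_kind { NODE_INTERNAL, NODE_EXTERNAL, NODE_CONSTANT };

/* Hypothetical, simplified SLP node. */
struct slp_node {
    enum node_kind kind;
    int n_stmts;
    int stmts[MAX_LANES];
    int n_children;
    struct slp_node *children[MAX_CHILDREN];
    bool visited;  /* nodes can be shared (refcnt > 1) */
};

/* Mark the scalar stmts covered by internal (vectorized) nodes.
   External and constant nodes are built from scalars, so neither
   their stmts nor their children are collected through them. */
static void gather_vectorized_stmts(struct slp_node *node,
                                    bool vectorized[N_STMTS])
{
    if (node == NULL || node->visited)
        return;
    node->visited = true;
    if (node->kind != NODE_INTERNAL)
        return;
    for (int i = 0; i < node->n_stmts; i++) {
        assert(node->stmts[i] >= 0 && node->stmts[i] < N_STMTS);
        vectorized[node->stmts[i]] = true;
    }
    for (int i = 0; i < node->n_children; i++)
        gather_vectorized_stmts(node->children[i], vectorized);
}

/* Build the graph from the dump, run the walk, return the set size. */
static int gather_example(bool vectorized[N_STMTS])
{
    struct slp_node ab    = { NODE_EXTERNAL, 0, {0}, 0, {NULL}, false };
    struct slp_node ones  = { NODE_CONSTANT, 0, {0}, 0, {NULL}, false };
    struct slp_node load  = { NODE_INTERNAL, 2, {0, 1}, 0, {NULL}, false };
    struct slp_node mul   = { NODE_EXTERNAL, 2, {2, 3}, 2, {&load, &ab}, false };
    struct slp_node shift = { NODE_INTERNAL, 2, {4, 5}, 2, {&mul, &ones}, false };
    struct slp_node add   = { NODE_INTERNAL, 2, {6, 7}, 2, {&load, &shift}, false };
    struct slp_node store = { NODE_INTERNAL, 2, {8, 9}, 1, {&add}, false };
    int count = 0;

    for (int i = 0; i < N_STMTS; i++)
        vectorized[i] = false;
    gather_vectorized_stmts(&store, vectorized);
    for (int i = 0; i < N_STMTS; i++)
        count += vectorized[i] ? 1 : 0;
    return count;
}
```

With such a set, a costing walk can treat a def as live whenever one of
its uses is outside the set, which here keeps the two loads live because
the multiplies are not in it.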


* [Bug tree-optimization/102176] BB SLP scalar costing is off with extern promoted nodes
From: cvs-commit at gcc dot gnu.org @ 2021-09-06  6:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102176

--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:a3fb781d4b341c0d50ef1b92cd3e8734e673ef18

commit r12-3362-ga3fb781d4b341c0d50ef1b92cd3e8734e673ef18
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Sep 2 14:48:10 2021 +0200

    tree-optimization/102176 - locally compute participating SLP stmts

    This performs local re-computation of participating scalar stmts
    in BB vectorization subgraphs to allow precise computation of
    liveness of scalar stmts after vectorization and thus precise
    costing.  This treats all extern defs as live but continues
    to optimistically handle scalar defs that we think we can handle
    by lane-extraction even though that can still fail late during
    code-generation.

    2021-09-02  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/102176
            * tree-vect-slp.c (vect_slp_gather_vectorized_scalar_stmts):
            New function.
            (vect_bb_slp_scalar_cost): Use the computed set of
            vectorized scalar stmts instead of relying on the out-of-date
            and not accurate PURE_SLP_STMT.
            (vect_bb_vectorization_profitable_p): Compute the set
            of vectorized scalar stmts.


* [Bug tree-optimization/102176] BB SLP scalar costing is off with extern promoted nodes
From: rguenth at gcc dot gnu.org @ 2021-09-06  6:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102176

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |12.0

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed on trunk.  Since we enabled whole-function BB vectorization for GCC 11,
I'm considering backporting this.

