public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/113091] New: Over-estimate SLP vector-to-scalar cost for non-live pattern statement
@ 2023-12-20  9:54 fxue at os dot amperecomputing.com
  2023-12-20 13:09 ` [Bug tree-optimization/113091] " rguenth at gcc dot gnu.org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: fxue at os dot amperecomputing.com @ 2023-12-20  9:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091

            Bug ID: 113091
           Summary: Over-estimate SLP vector-to-scalar cost for non-live
                    pattern statement
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fxue at os dot amperecomputing.com
  Target Milestone: ---

GCC fails to vectorize the testcase below on aarch64.

  int test(unsigned array[8]);

  int foo(char *a, char *b)
  {
    unsigned array[8];

    array[0] = (a[0] - b[0]);
    array[1] = (a[1] - b[1]);
    array[2] = (a[2] - b[2]);
    array[3] = (a[3] - b[3]);
    array[4] = (a[4] - b[4]);
    array[5] = (a[5] - b[5]);
    array[6] = (a[6] - b[6]);
    array[7] = (a[7] - b[7]);

    return test(array);
  }

The dump shows that the loads from a[i] and b[i] are considered live as scalar
references, which results in an over-estimated vector-to-scalar cost:

*a_50(D) 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 1B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 2B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 3B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 4B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 5B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 6B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 7B] 1 times vec_to_scalar costs 2 in epilogue
*b_51(D) 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 1B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 2B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 3B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 4B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 5B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 6B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 7B] 1 times vec_to_scalar costs 2 in epilogue

The subtraction on char operands is recognized as a widening subtraction
(widen-sub) and involves two kinds of pattern replacement.

 * Original
 _1 = *a_50(D);
 _2 = (int) _1;
 _3 = *b_51(D);
 _4 = (int) _3;
 _5 = _2 - _4;


 * After pattern replacement
 patt_63 = (unsigned short) _1;  //  _2 = (int) _1;
 patt_64 = (int) patt_63;        //  _2 = (int) _1;

 patt_65 = (unsigned short) _3;  //  _4 = (int) _3;
 patt_66 = (int) patt_65;        //  _4 = (int) _3;

 patt_67 = .VEC_WIDEN_MINUS (_1, _3);  //  _5 = _2 - _4;
 patt_68 = (signed short) patt_67;     //  _5 = _2 - _4;
 patt_69 = (int) patt_68;              //  _5 = _2 - _4;
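
A small standalone C sketch (an editorial addition, not part of the report) can confirm that the replacement chain computes the same value as the original scalar statements for every pair of unsigned 8-bit inputs (plain char is unsigned on aarch64). The widening minus is modelled as a 16-bit unsigned subtraction, the code assumes GCC's modulo behaviour when converting an out-of-range value to signed short, and the function names are hypothetical.

```c
/* Original scalar statements:
     _2 = (int) _1;  _4 = (int) _3;  _5 = _2 - _4;  */
int original_sub(unsigned char a, unsigned char b)
{
  int a_i = (int) a;
  int b_i = (int) b;
  return a_i - b_i;
}

/* Hypothetical scalar model of the pattern chain:
     patt_67 = .VEC_WIDEN_MINUS (_1, _3);   16-bit unsigned difference
     patt_68 = (signed short) patt_67;
     patt_69 = (int) patt_68;  */
int pattern_sub(unsigned char a, unsigned char b)
{
  unsigned short patt_67 = (unsigned short) (a - b); /* wraps modulo 2^16 */
  signed short patt_68 = (signed short) patt_67;     /* GCC: modulo conversion */
  return (int) patt_68;
}

/* Exhaustively compare both forms over all 8-bit input pairs.  */
int count_mismatches(void)
{
  int mismatches = 0;
  for (int a = 0; a < 256; a++)
    for (int b = 0; b < 256; b++)
      if (original_sub((unsigned char) a, (unsigned char) b)
          != pattern_sub((unsigned char) a, (unsigned char) b))
        mismatches++;
  return mismatches;
}
```

Called from a main function, count_mismatches() should return 0, i.e. patt_69 agrees with _5 on all 65536 input pairs.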

For the statement "_2 = (int) _1", its vectorization representative "patt_64 =
(int) patt_63" is not marked as PURE_SLP, so the load defining _1 is
conservatively considered to have a scalar use and to be live outside of the
SLP bb (in the function vect_bb_slp_mark_live_stmts). However, the pattern
definition is actually dead and should not contribute to the vector-to-scalar
cost.

Definitions from pattern statements are not part of the function body, so we
cannot track their def/use chains as we do for ordinary SSA names. A quick fix
may be possible for one situation: if the original SSA name "_2" has a single
use, its existence should be covered only by the vectorized operation, no
matter what it would look like without pattern replacement.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: rguenth at gcc dot gnu.org @ 2023-12-20 13:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's the logic

  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
    {
      if (svisited.contains (stmt_info))
        continue;
      stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
      if (STMT_VINFO_IN_PATTERN_P (orig_stmt_info)
          && STMT_VINFO_RELATED_STMT (orig_stmt_info) != stmt_info)
        /* Only the pattern root stmt computes the original scalar value.  */
        continue;
      bool mark_visited = true;
      gimple *orig_stmt = orig_stmt_info->stmt;
      ssa_op_iter op_iter;
      def_operand_p def_p;
      FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF)
        {
          imm_use_iterator use_iter;
          gimple *use_stmt;
          stmt_vec_info use_stmt_info;
          FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
            if (!is_gimple_debug (use_stmt))
              {
                use_stmt_info = bb_vinfo->lookup_stmt (use_stmt);
                if (!use_stmt_info
                    || !PURE_SLP_STMT (vect_stmt_to_vectorize (use_stmt_info)))
                  {
                    STMT_VINFO_LIVE_P (stmt_info) = true;

specifically the last check.  That's supposed to pick up the "main" pattern
that's now covering the scalar stmt.

But somehow the "main" pattern,

 patt_67 = .VEC_WIDEN_MINUS (_1, _3);  //  _5 = _2 - _4;
 patt_68 = (signed short) patt_67;     //  _5 = _2 - _4;
 patt_69 = (int) patt_68;              //  _5 = _2 - _4;

doesn't get picked up here.  I wonder what's the orig_stmt and the def
picked and what original scalar use we end up in where the
vect_stmt_to_vectorize isn't the "last" pattern.  Maybe we really want
these "overlapping" patterns, but IMHO having "two entries" into
a chain of scalar stmts is bad and we should link up the whole matched
sequence to the final "root" instead?

That said, the current code doesn't see that wherever we end up isn't
dead code (aka fully covered by the vectorization).

IMO vect_stmt_to_vectorize for each of those stmts should end up at

patt_69 = (int) patt_68;
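
The rule quoted above can be modelled in isolation. The C sketch below is a deliberately simplified, hypothetical model (not GCC's data structures): each statement records its non-debug uses, the statement that vect_stmt_to_vectorize would return, and whether that statement is PURE_SLP. A definition is marked live as soon as one use is unknown to the vectorizer or maps to a non-PURE_SLP representative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for stmt_vec_info.  */
struct toy_stmt
{
  const char *name;                    /* SSA name it defines, e.g. "_1" */
  bool pure_slp;                       /* models PURE_SLP_STMT */
  const struct toy_stmt *to_vectorize; /* models vect_stmt_to_vectorize;
                                          NULL = unknown to the vectorizer */
  const struct toy_stmt *uses[4];      /* non-debug uses, NULL-terminated */
};

/* Models the "last check": DEF is live if some use is unknown to the
   vectorizer or its representative is not PURE_SLP.  */
bool toy_mark_live(const struct toy_stmt *def)
{
  for (size_t i = 0; i < 4 && def->uses[i]; i++)
    {
      const struct toy_stmt *rep = def->uses[i]->to_vectorize;
      if (!rep || !rep->pure_slp)
        return true;
    }
  return false;
}
```

With _1's only use being _2 and _2's representative being patt_64 (not PURE_SLP), toy_mark_live returns true; pointing _2's representative at the PURE_SLP patt_69 instead makes it return false.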

* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: fxue at os dot amperecomputing.com @ 2023-12-21  5:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091

--- Comment #2 from Feng Xue <fxue at os dot amperecomputing.com> ---
(In reply to Richard Biener from comment #1)
> It's the logic
> 
>   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
>     {
>       if (svisited.contains (stmt_info))
>         continue;
>       stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
>       if (STMT_VINFO_IN_PATTERN_P (orig_stmt_info)
>           && STMT_VINFO_RELATED_STMT (orig_stmt_info) != stmt_info)
>         /* Only the pattern root stmt computes the original scalar value.  */
>         continue;
>       bool mark_visited = true;
>       gimple *orig_stmt = orig_stmt_info->stmt;
>       ssa_op_iter op_iter;
>       def_operand_p def_p;
>       FOR_EACH_PHI_OR_STMT_DEF (def_p, orig_stmt, op_iter, SSA_OP_DEF)
>         {
>           imm_use_iterator use_iter;
>           gimple *use_stmt;
>           stmt_vec_info use_stmt_info;
>           FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
>             if (!is_gimple_debug (use_stmt))
>               {
>                 use_stmt_info = bb_vinfo->lookup_stmt (use_stmt);
>                 if (!use_stmt_info
>                     || !PURE_SLP_STMT (vect_stmt_to_vectorize
> (use_stmt_info)))
>                   {
>                     STMT_VINFO_LIVE_P (stmt_info) = true;
> 
> specifically the last check.  That's supposed to pick up the "main" pattern
> that's now covering the scalar stmt.
> 
> But somehow the "main" pattern,
> 
> patt_67 = .VEC_WIDEN_MINUS (_1, _3);  //  _5 = _2 - _4;
>  patt_68 = (signed short) patt_67;     //  _5 = _2 - _4;
>  patt_69 = (int) patt_68;              //  _5 = _2 - _4;
> 
> doesn't get picked up here.  I wonder what's the orig_stmt and the def
> picked and what original scalar use we end up in where the
> vect_stmt_to_vectorize isn't the "last" pattern.  Maybe we really want


This problem happens at the following SLP node:

 note: node 0x425bc38 (max_nunits=8, refcnt=1) vector(8) char
 note: op template: _1 = *a_50(D);
 note:  stmt 0 _1 = *a_50(D);
 note:  stmt 1 _7 = MEM[(char *)a_50(D) + 1B];
 note:  stmt 2 _13 = MEM[(char *)a_50(D) + 2B];
 note:  stmt 3 _19 = MEM[(char *)a_50(D) + 3B];
 note:  stmt 4 _25 = MEM[(char *)a_50(D) + 4B];
 note:  stmt 5 _31 = MEM[(char *)a_50(D) + 5B];
 note:  stmt 6 _37 = MEM[(char *)a_50(D) + 6B];
 note:  stmt 7 _43 = MEM[(char *)a_50(D) + 7B];

The orig_stmt is "_1 = *a_50(D)".

The use stmt is "_2 = (int) _1", whose pattern statement is "patt_64 = (int)
patt_63", which is not referenced by any original or other pattern statements.
In other words, the orig_stmt could be absorbed into a vector operation
without any outlying scalar use.

The aforementioned "last check" in vect_bb_slp_mark_live_stmts marks the
orig_stmt as STMT_VINFO_LIVE_P, which implies that it has a scalar use (though
it should not). The effect is that the def is regenerated somewhere (a
vector-to-scalar extraction) rather than the original scalar statement being
retained. The subsequent vectorizable_live_operation then adds these new
operations to the vectorization cost of the SLP instance.

The function vect_bb_vectorization_profitable_p uses a recursive walk to
identify scalar uses; in this case, setting STMT_VINFO_LIVE_P or not would not
change the scalar cost computation. If we can avoid such fake liveness on the
statements we are interested in, the vectorization cost could beat the scalar
cost and vectorization would succeed.

   Unvectorized: 
        mov     x2, x0
        stp     x29, x30, [sp, -48]!
        mov     x29, sp
        ldrb    w3, [x1]
        ldrb    w4, [x1, 1]
        add     x0, sp, 16
        ldrb    w9, [x2]
        ldrb    w8, [x2, 1]
        sub     w9, w9, w3
        ldrb    w7, [x2, 2]
        ldrb    w3, [x1, 2]
        sub     w8, w8, w4
        ldrb    w6, [x2, 3]
        ldrb    w4, [x1, 3]
        sub     w7, w7, w3
        ldrb    w10, [x1, 5]
        ldrb    w3, [x1, 4]
        sub     w6, w6, w4
        ldrb    w5, [x2, 4]
        ldrb    w4, [x2, 5]
        sub     w5, w5, w3
        ldrb    w3, [x2, 6]
        sub     w4, w4, w10
        ldrb    w2, [x2, 7]
        ldrb    w10, [x1, 6]
        ldrb    w1, [x1, 7]
        sub     w3, w3, w10
        stp     w9, w8, [sp, 16]
        sub     w1, w2, w1
        stp     w7, w6, [sp, 24]
        stp     w5, w4, [sp, 32]
        stp     w3, w1, [sp, 40]
        bl      test
        ldp     x29, x30, [sp], 48
        ret


    Vectorized:
        mov     x2, x0
        stp     x29, x30, [sp, -48]!
        mov     x29, sp
        ldr     d1, [x1]
        add     x0, sp, 16
        ldr     d0, [x2]
        usubl   v0.8h, v0.8b, v1.8b
        sxtl    v1.4s, v0.4h
        sxtl2   v0.4s, v0.8h
        stp     q1, q0, [sp, 16]
        bl      test
        ldp     x29, x30, [sp], 48
        ret
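
The size of the spurious charge can be read off the dump in the original report: 16 loads (a[0..7] and b[0..7]) are each costed as one vec_to_scalar at cost 2 in the epilogue. The trivial tally below uses only those dump numbers; the names are illustrative.

```c
/* Numbers taken from the dump in the original report:
   8 loads from a and 8 from b, each "1 times vec_to_scalar costs 2".  */
enum { LIVE_LOADS = 8 + 8, VEC_TO_SCALAR_COST = 2 };

/* Extra epilogue cost charged only because the loads were marked live.  */
int spurious_epilogue_cost(void)
{
  return LIVE_LOADS * VEC_TO_SCALAR_COST; /* 16 * 2 = 32 */
}
```

That extra cost lands on the vector side of the comparison only, which is why removing the fake liveness can change the profitability decision.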


> these "overlapping" patterns, but IMHO having "two entries" into
> a chain of scalar stmts is bad and we should link up the whole matched
> sequence to the final "root" instead?
> 
> That said, the current code doesn't see that wherever we end up isn't
> dead code (aka fully covered by the vectorization).
> 
> IMO vect_stmt_to_vectorize for each of those stmts should end up at
> 
> patt_69 = (int) patt_68;

* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: fxue at os dot amperecomputing.com @ 2023-12-21  5:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091

--- Comment #3 from Feng Xue <fxue at os dot amperecomputing.com> ---
The function vect_bb_vectorization_profitable_p uses a recursive walk to
identify scalar uses; in this case, setting STMT_VINFO_LIVE_P or not would not
change the scalar cost computation.

* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: rguenth at gcc dot gnu.org @ 2023-12-21  7:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-12-21
             Status|UNCONFIRMED                 |NEW

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
"The use stmt is "_2 = (int) _1", whose pattern statement is "patt_64 = (int)
patt_63", which is not referenced by any original or other pattern statements.
In other words, the orig_stmt could be absorbed into a vector operation
without any outlying scalar use."

That means the code sees that _2 = (int) _1 isn't vectorized (the pattern
stmt isn't actually used), which means _2 = (int) _1 stays in the code and
thus _1 is live.

The issue here is that because the "outer" pattern consumes
patt_64 = (int) patt_63 it should have adjusted _2 = (int) _1 stmt-to-vectorize
as being the outer pattern root stmt for all this logic to work correctly.

Otherwise we have no means of identifying whether a scalar stmt takes part
in vectorization or not.

I'm not sure what restrictions we place on pattern recognition of patterns - do
we require single-uses or do we allow the situation that one vectorization
path picks up the "inner" pattern while another picks the "outer" one?

In theory we could hack up the liveness analysis, but as you noticed, that
isn't the part doing the costing.  The costing part is written in the very
same way (vect_bb_vectorization_profitable_p, specifically
vect_slp_gather_vectorized_scalar_stmts and vect_bb_slp_scalar_cost).
Basically, the scalar cost is the cost of the scalar stmts that are fully
replaced by the vector stmts (i.e. that can be DCEd after vectorization).
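
The scalar-cost rule described above can be illustrated with a toy model. This is a hypothetical simplification, not the code of vect_bb_slp_scalar_cost: only scalar statements that are fully replaced (and so can be DCEd) contribute to the scalar cost, while live lanes add epilogue cost on the vector side. Treating equal cost as profitable is an assumption about GCC's tie-breaking, and all cost values are placeholders.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-statement record for the scalar side.  */
struct toy_scalar_stmt
{
  int cost;            /* placeholder scalar cost of the statement */
  bool fully_replaced; /* dead after vectorization, removable by DCE */
};

/* Scalar cost: sum only the statements the vector code replaces.  */
int toy_scalar_cost(const struct toy_scalar_stmt *stmts, size_t n)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    if (stmts[i].fully_replaced)
      total += stmts[i].cost;
  return total;
}

/* Vector side: body cost plus epilogue cost (e.g. vec_to_scalar
   extracts for lanes marked live).  Equal cost counts as profitable
   here, which is an assumption of this sketch.  */
bool toy_profitable(int vector_body, int epilogue, int scalar)
{
  return vector_body + epilogue <= scalar;
}
```

In this model, wrongly marking a lane live changes only the epilogue argument, so the scalar cost stays the same while the vector side grows.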

* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: rsandifo at gcc dot gnu.org @ 2023-12-21 11:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091

--- Comment #5 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
> The issue here is that because the "outer" pattern consumes
> patt_64 = (int) patt_63 it should have adjusted _2 = (int) _1 stmt-to-vectorize
> as being the outer pattern root stmt for all this logic to work correctly.

I don't think it can though, at least not in general.  The final pattern
stmt has to compute the same value as the original scalar stmt.

* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: fxue at os dot amperecomputing.com @ 2023-12-22  3:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091

--- Comment #6 from Feng Xue <fxue at os dot amperecomputing.com> ---
(In reply to Richard Sandiford from comment #5)
> > The issue here is that because the "outer" pattern consumes
> > patt_64 = (int) patt_63 it should have adjusted _2 = (int) _1 stmt-to-vectorize
> > as being the outer pattern root stmt for all this logic to work correctly.
> 
> I don't think it can though, at least not in general.  The final pattern
> stmt has to compute the same value as the original scalar stmt.

Does the current pattern replacement support an N:1 mapping (N stmts -> 1
pattern)? If not, this handling would probably break related code somewhere.

* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: fxue at os dot amperecomputing.com @ 2023-12-26 15:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091

--- Comment #7 from Feng Xue <fxue at os dot amperecomputing.com> ---
> The issue here is that because the "outer" pattern consumes
> patt_64 = (int) patt_63 it should have adjusted _2 = (int) _1
> stmt-to-vectorize
> as being the outer pattern root stmt for all this logic to work correctly.

We cannot simply make this adjustment, since pattern recognition does not
require the replaced SSA name to have a single use. For example, if we change
the above case to add another scalar use of "_2":

  int test(unsigned array[8]);

  int foo(char *a, char *b)
  {
    unsigned array[8];
    int a0 = a[0];  // _2 = (int) _1;

    array[0] = (a0 - b[0]);
    array[1] = (a[1] - b[1]);
    array[2] = (a[2] - b[2]);
    array[3] = (a[3] - b[3]);
    array[4] = (a[4] - b[4]);
    array[5] = (a[5] - b[5]);
    array[6] = (a[6] - b[6]);
    array[7] = (a[7] - b[7]);

    return test(array) + a0;
  }

The pattern statement "patt_64 = (int) patt_63" for "_2 = (int) _1" should
then be kept. So we also need the check for "identifying whether a scalar stmt
takes part in vectorization or not" to ensure the adjustment is doable.

* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: fxue at os dot amperecomputing.com @ 2023-12-29 10:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091

--- Comment #8 from Feng Xue <fxue at os dot amperecomputing.com> ---
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641547.html

* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: cvs-commit at gcc dot gnu.org @ 2024-01-16  3:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091

--- Comment #9 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Feng Xue <fxue@gcc.gnu.org>:

https://gcc.gnu.org/g:57f611604e8bab67af6c0bcfe6ea88c001408412

commit r14-7272-g57f611604e8bab67af6c0bcfe6ea88c001408412
Author: Feng Xue <fxue@os.amperecomputing.com>
Date:   Thu Dec 28 16:55:39 2023 +0800

    Do not count unused scalar use when marking STMT_VINFO_LIVE_P [PR113091]

    When pattern recognition is involved, a statement whose definition is
    consumed in some pattern may not be included in the final replacement
    pattern statements, and would be skipped when building the SLP graph.

     * Original
      char a_c = *(char *) a;
      char b_c = *(char *) b;
      unsigned short a_s = (unsigned short) a_c;
      int a_i = (int) a_s;
      int b_i = (int) b_c;
      int r_i = a_i - b_i;

     * After pattern replacement
      a_s = (unsigned short) a_c;
      a_i = (int) a_s;

      patt_b_s = (unsigned short) b_c;    // b_i = (int) b_c
      patt_b_i = (int) patt_b_s;          // b_i = (int) b_c

      patt_r_s = widen_minus(a_c, b_c);   // r_i = a_i - b_i
      patt_r_i = (int) patt_r_s;          // r_i = a_i - b_i

    The definitions of a_i (original statement) and b_i (pattern statement)
    are related to, but actually not part of, the widen_minus pattern.
    Vectorizing the pattern does not cause these definition statements to
    be marked as PURE_SLP.  For this case, we need to recursively check
    whether their uses are all absorbed into vectorized code.  But there
    is an exception: some use may participate in a vectorized operation
    via an external SLP node containing that use as an element.

    gcc/ChangeLog:

            PR tree-optimization/113091
            * tree-vect-slp.cc (vect_slp_has_scalar_use): New function.
            (vect_bb_slp_mark_live_stmts): New parameter scalar_use_map, check
            scalar use with new function.
            (vect_bb_slp_mark_live_stmts): New function as entry to existing
            overridden functions with same name.
            (vect_slp_analyze_operations): Call new entry function to mark
            live statements.

    gcc/testsuite/ChangeLog:

            * gcc.target/aarch64/bb-slp-pr113091.c: New test.
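
The recursive check described in the commit message can be sketched on a hypothetical toy use graph. This is not the implementation of vect_slp_has_scalar_use: the node kinds, the per-node memo field (standing in loosely for the scalar_use_map mentioned in the ChangeLog) and the assumption of an acyclic use graph (straight-line code without PHIs) are simplifications chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* How a use relates to the vectorized code (toy classification).  */
enum toy_kind
{
  TOY_SCALAR_ONLY,  /* unknown to the vectorizer: a real scalar use */
  TOY_PURE_SLP,     /* fully absorbed into vector code */
  TOY_EXTERNAL_ELT, /* element of an external SLP node: scalar value needed */
  TOY_UNVECTORIZED  /* known, but its own pattern stmt is dead: look through */
};

enum toy_state { TOY_UNKNOWN, TOY_NO, TOY_YES };

struct toy_node
{
  enum toy_kind kind;
  enum toy_state memo;      /* cached answer for this definition */
  struct toy_node *uses[4]; /* NULL-terminated use list */
};

/* True if DEF's value is needed by some scalar code.  Assumes the use
   graph is acyclic; results are memoized per node.  */
bool toy_has_scalar_use(struct toy_node *def)
{
  if (def->memo != TOY_UNKNOWN)
    return def->memo == TOY_YES;
  bool found = false;
  for (size_t i = 0; i < 4 && def->uses[i] && !found; i++)
    {
      struct toy_node *use = def->uses[i];
      switch (use->kind)
        {
        case TOY_SCALAR_ONLY:
        case TOY_EXTERNAL_ELT:
          found = true;
          break;
        case TOY_PURE_SLP:
          break;
        case TOY_UNVECTORIZED:
          found = toy_has_scalar_use(use); /* recurse through dead stmt */
          break;
        }
    }
  def->memo = found ? TOY_YES : TOY_NO;
  return found;
}
```

For the original testcase (_1 used only by _2, whose only use _5 is covered by the PURE_SLP patt_69) the check finds no scalar use; for the variant with "return test(array) + a0;" the extra scalar addition makes it report one, so _1 stays live.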

* [Bug tree-optimization/113091] Over-estimate SLP vector-to-scalar cost for non-live pattern statement
From: fxue at os dot amperecomputing.com @ 2024-01-31  3:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091

Feng Xue <fxue at os dot amperecomputing.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Feng Xue <fxue at os dot amperecomputing.com> ---
Fixed
