public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/113091] New: Over-estimate SLP vector-to-scalar cost for non-live pattern statement
@ 2023-12-20  9:54 fxue at os dot amperecomputing.com
  2023-12-20 13:09 ` [Bug tree-optimization/113091] " rguenth at gcc dot gnu.org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: fxue at os dot amperecomputing.com @ 2023-12-20  9:54 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091

            Bug ID: 113091
           Summary: Over-estimate SLP vector-to-scalar cost for non-live
                    pattern statement
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fxue at os dot amperecomputing.com
  Target Milestone: ---

GCC fails to vectorize the testcase below on aarch64.

  int test(unsigned array[8]);

  int foo(char *a, char *b)
  {
    unsigned array[8];

    array[0] = (a[0] - b[0]);
    array[1] = (a[1] - b[1]);
    array[2] = (a[2] - b[2]);
    array[3] = (a[3] - b[3]);
    array[4] = (a[4] - b[4]);
    array[5] = (a[5] - b[5]);
    array[6] = (a[6] - b[6]);
    array[7] = (a[7] - b[7]);

    return test(array);
  }

The dump shows that the loads from a[i] and b[i] are considered to be live as
scalar references, which results in an over-estimated vector-to-scalar cost.

*a_50(D) 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 1B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 2B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 3B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 4B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 5B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 6B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 7B] 1 times vec_to_scalar costs 2 in epilogue
*b_51(D) 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 1B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 2B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 3B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 4B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 5B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 6B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 7B] 1 times vec_to_scalar costs 2 in epilogue

Subtraction on char type is recognized as a widening subtraction (widen-sub),
which involves two kinds of pattern replacement.

 * Original
 _1 = *a_50(D);
 _2 = (int) _1;
 _3 = *b_51(D);
 _4 = (int) _3;
 _5 = _2 - _4;


 * After pattern replacement
 patt_63 = (unsigned short) _1;  //  _2 = (int) _1;
 patt_64 = (int) patt_63;        //  _2 = (int) _1;

 patt_65 = (unsigned short) _3;  //  _4 = (int) _3;
 patt_66 = (int) patt_65;        //  _4 = (int) _3;

 patt_67 = .VEC_WIDEN_MINUS (_1, _3);  //  _5 = _2 - _4;
 patt_68 = (signed short) patt_67;     //  _5 = _2 - _4;
 patt_69 = (int) patt_68;              //  _5 = _2 - _4;
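
To make the replacement above easier to check, here is a minimal scalar sketch
in C of one lane of the widen-sub sequence. It is only a model, not GCC code:
the function names are hypothetical, char is assumed to be unsigned (the
aarch64 default), and the result of .VEC_WIDEN_MINUS is assumed to be an
unsigned short, as the "(signed short) patt_67" conversion suggests.

```c
#include <stdint.h>

/* Reference: the original GIMPLE, _5 = (int) _1 - (int) _3.
   Hypothetical helper name; char is assumed unsigned.  */
static int sub_via_int(unsigned char a, unsigned char b)
{
  return (int)a - (int)b;
}

/* Scalar model of one lane of the pattern sequence:
     patt_67 = .VEC_WIDEN_MINUS (_1, _3);
     patt_68 = (signed short) patt_67;
     patt_69 = (int) patt_68;
   The 16-bit intermediate type is an assumption made for illustration.  */
static int sub_via_widen(unsigned char a, unsigned char b)
{
  /* Widen both operands to 16 bits and subtract modulo 2^16.  */
  uint16_t wide = (uint16_t)((uint16_t)a - (uint16_t)b);
  /* Reinterpret as signed; the out-of-range conversion is
     implementation-defined before C23, and GCC reduces it modulo 2^16.  */
  int16_t narrow = (int16_t)wide;
  /* Sign-extend to int.  */
  return (int)narrow;
}
```

Since a - b always lies in [-255, 255], the 16-bit intermediate loses no
information, so both helpers should agree for every pair of inputs.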

For the statement "_2 = (int) _1", its vectorization representative "patt_64 =
(int) patt_63" is not marked as PURE_SLP, so it is conservatively considered to
have a scalar use and to be live outside of the SLP bb (in the function
vect_bb_slp_mark_live_stmts). However, the pattern definition is actually dead
and should not contribute to the vector-to-scalar cost.

Defs from pattern statements are not part of the function body, so we cannot
track their def/use chains as we do for ordinary SSA names. A quick fix may be
possible for one situation: if the original SSA name "_2" has a single use, it
should be covered entirely by the vectorized operation, whatever form it would
take without pattern replacement.
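
The single-use idea can be illustrated with a toy cost model in C. This is a
sketch under stated assumptions, not the vectorizer's implementation: the
struct, its fields and both functions are hypothetical and do not correspond
to GCC data structures. Each entry stands for the statement that consumes one
load (for example "_2 = (int) _1").

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the statement that uses a loaded value.
   Hypothetical type; NOT a GCC data structure.  */
struct toy_user {
  bool pure_slp;              /* representative is marked PURE_SLP */
  int result_uses;            /* number of uses of its result, e.g. of _2 */
  bool result_use_vectorized; /* the single use is a vectorized statement */
};

/* Return true if this user should count as a scalar use of the load.  */
static bool toy_counts_as_scalar_use(const struct toy_user *u)
{
  if (u->pure_slp)
    return false;
  /* Assumed quick fix: a result with a single, vectorized use is
     treated as covered by the vector code.  */
  if (u->result_uses == 1 && u->result_use_vectorized)
    return false;
  return true; /* stay conservative otherwise */
}

/* Sum the vec_to_scalar cost charged for n loads, one user each.  */
static int toy_extract_cost(const struct toy_user *users, size_t n,
                            int cost_per_extract)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    if (toy_counts_as_scalar_use(&users[i]))
      total += cost_per_extract;
  return total;
}
```

With 16 users and a cost of 2 per extract, matching the 16 dump lines above,
the model charges 32 when no user qualifies for the single-use rule and 0 when
all of them do.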


Thread overview: 11+ messages
2023-12-20  9:54 [Bug tree-optimization/113091] New: Over-estimate SLP vector-to-scalar cost for non-live pattern statement fxue at os dot amperecomputing.com
2023-12-20 13:09 ` [Bug tree-optimization/113091] " rguenth at gcc dot gnu.org
2023-12-21  5:25 ` fxue at os dot amperecomputing.com
2023-12-21  5:27 ` fxue at os dot amperecomputing.com
2023-12-21  7:31 ` rguenth at gcc dot gnu.org
2023-12-21 11:01 ` rsandifo at gcc dot gnu.org
2023-12-22  3:55 ` fxue at os dot amperecomputing.com
2023-12-26 15:16 ` fxue at os dot amperecomputing.com
2023-12-29 10:35 ` fxue at os dot amperecomputing.com
2024-01-16  3:36 ` cvs-commit at gcc dot gnu.org
2024-01-31  3:13 ` fxue at os dot amperecomputing.com
