[Bug tree-optimization/113091] New: Over-estimate SLP vector-to-scalar cost for non-live pattern statement

From: "fxue at os dot amperecomputing.com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/113091] New: Over-estimate SLP vector-to-scalar cost for non-live pattern statement
Date: Wed, 20 Dec 2023 09:54:01 +0000
Message-ID: <bug-113091-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091

            Bug ID: 113091
           Summary: Over-estimate SLP vector-to-scalar cost for non-live
                    pattern statement
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fxue at os dot amperecomputing.com
  Target Milestone: ---

GCC fails to vectorize the testcase below on aarch64.

  int test(unsigned array[8]);

  int foo(char *a, char *b)
  {
    unsigned array[8];

    array[0] = (a[0] - b[0]);
    array[1] = (a[1] - b[1]);
    array[2] = (a[2] - b[2]);
    array[3] = (a[3] - b[3]);
    array[4] = (a[4] - b[4]);
    array[5] = (a[5] - b[5]);
    array[6] = (a[6] - b[6]);
    array[7] = (a[7] - b[7]);

    return test(array);
  }

The dump shows that the loads from a[i] and b[i] are considered to be live as
scalar references, which results in an over-estimated vector-to-scalar cost.

*a_50(D) 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 1B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 2B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 3B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 4B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 5B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 6B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)a_50(D) + 7B] 1 times vec_to_scalar costs 2 in epilogue
*b_51(D) 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 1B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 2B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 3B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 4B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 5B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 6B] 1 times vec_to_scalar costs 2 in epilogue
MEM[(char *)b_51(D) + 7B] 1 times vec_to_scalar costs 2 in epilogue

The subtraction on char operands is recognized as a widening subtraction
(widen-sub), which involves two kinds of pattern replacement.

 * Original
 _1 = *a_50(D);
 _2 = (int) _1;
 _3 = *b_51(D);
 _4 = (int) _3;
 _5 = _2 - _4;


 * After pattern replacement
 patt_63 = (unsigned short) _1;  //  _2 = (int) _1;
 patt_64 = (int) patt_63;        //  _2 = (int) _1;

 patt_65 = (unsigned short) _3;  //  _4 = (int) _3;
 patt_66 = (int) patt_65;        //  _4 = (int) _3;

 patt_67 = .VEC_WIDEN_MINUS (_1, _3);  //  _5 = _2 - _4;
 patt_68 = (signed short) patt_67;     //  _5 = _2 - _4;
 patt_69 = (int) patt_68;              //  _5 = _2 - _4;

For the statement "_2 = (int) _1", its vectorization representative "patt_64 =
(int) patt_63" is not marked as PURE_SLP, so it is conservatively considered to
have a scalar use and to be live outside of the SLP bb (in the function
vect_bb_slp_mark_live_stmts). However, the pattern definition is actually dead
and should not contribute to the vector-to-scalar cost.

Those defs from pattern statements are not part of the function body, so we
cannot track their def/use chains as we do for ordinary SSA names. We could
probably have a quick fix for one situation: if the original SSA name "_2" has
a single use, its existence should be covered only by the vectorized operation,
no matter what it would look like without pattern replacement.
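
To make the cost arithmetic concrete, here is a toy model in C of the
liveness decision and of the proposed single-use check. It is only a sketch
of one reading of the description above, not GCC code: struct toy_load,
toy_fill_testcase and toy_vec_to_scalar_cost are hypothetical names, and the
only number taken from the report is the per-lane vec_to_scalar cost of 2
shown in the dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in GCC.  */

/* Per-lane vec_to_scalar cost, taken from the dump above.  */
enum { TOY_VEC_TO_SCALAR_COST = 2 };

/* One vectorized load (e.g. "_1 = *a_50(D)") and what we know about
   its only user (e.g. "_2 = (int) _1").  */
struct toy_load
{
  bool user_rep_pure_slp;  /* Is the user's representative PURE_SLP?  */
  unsigned user_def_uses;  /* Number of uses of the user's def "_2".  */
};

/* Fill LOADS with N loads shaped like those in the testcase: the
   representative is not PURE_SLP and the converted value has one use.  */
static void
toy_fill_testcase (struct toy_load *loads, size_t n)
{
  assert (loads != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      loads[i].user_rep_pure_slp = false;
      loads[i].user_def_uses = 1;
    }
}

/* Sum the epilogue vec_to_scalar cost over N loads.  With
   SINGLE_USE_FIX, a load whose user's def has exactly one use is no
   longer treated as live.  */
static unsigned
toy_vec_to_scalar_cost (const struct toy_load *loads, size_t n,
                        bool single_use_fix)
{
  unsigned cost = 0;

  assert (loads != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      bool live = !loads[i].user_rep_pure_slp;

      if (single_use_fix && loads[i].user_def_uses == 1)
        live = false;
      if (live)
        cost += TOY_VEC_TO_SCALAR_COST;
    }
  return cost;
}
```

For the 16 loads of the testcase (8 from a, 8 from b), this model charges
16 * 2 = 32 without the single-use check and 0 with it.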

