[Bug tree-optimization/108724] Poor codegen when summing two arrays without AVX or SSE

public inbox for gcc-bugs@sourceware.org

From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/108724] Poor codegen when summing two arrays without AVX or SSE
Date: Thu, 09 Feb 2023 13:54:25 +0000
Message-ID: <bug-108724-4-Pdl5C3h03N@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-108724-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108724

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
     Ever confirmed|0                             |1
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2023-02-09

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Adding -fopt-info shows

t.c:3:21: optimized: loop vectorized using 8 byte vectors
t.c:1:6: optimized: loop with 7 iterations completely unrolled (header execution count 63136016)
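
For readers without the bug report at hand: t.c itself is not quoted in this
message. A loop of roughly the following shape would fit the bug title and the
assembly shown in this comment (three pointer arguments, 64 bytes processed).
The function name, the use of restrict and the element count of 16 are
assumptions made for illustration, not the reporter's exact test case.

```c
/* Hypothetical stand-in for t.c: sum two int arrays element by element.
   16 ints are 64 bytes, which matches the "cmpq $64" bound in the
   assembly, but the exact original source is an assumption.  */
void
sum_arrays (int *restrict a, const int *restrict b, const int *restrict c)
{
  for (int i = 0; i < 16; i++)
    a[i] = b[i] + c[i];
}
```

Compiling something like this with optimization enabled, SSE disabled (for
example with -mno-sse) and -fopt-info should print vectorizer notes of the
kind quoted above, though the exact output depends on the GCC version.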

Disabling unrolling instead shows

.L2:
        leaq    (%rsi,%rax), %r8
        leaq    (%rdx,%rax), %rdi
        movl    (%r8), %ecx
        addl    (%rdi), %ecx
        movq    %r10, -8(%rsp)
        movl    %ecx, -8(%rsp)
        movq    -8(%rsp), %rcx
        movl    4(%rdi), %edi
        addl    4(%r8), %edi
        movq    %rcx, -16(%rsp)
        movl    %edi, -12(%rsp)
        movq    -16(%rsp), %rcx
        movq    %rcx, (%r9,%rax)
        addq    $8, %rax
        cmpq    $64, %rax
        jne     .L2

What happens is that vector lowering fails to perform generic vector
addition (vector lowering is supposed to materialize that). Instead it
decomposes the vector and does scalar adds, which eventually results in
us spilling ...
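
As an illustration of what a word-at-a-time addition can look like, here is a
minimal sketch of the well-known bit-twiddling approach for two 32-bit lanes
packed into one 64-bit word. It is not GCC's actual lowering code; the name
add_2x32_in_word and the LANE_HIGH_BITS macro are made up for this example.

```c
#include <stdint.h>

/* Mask with the top bit of each 32-bit lane set.  */
#define LANE_HIGH_BITS UINT64_C (0x8000000080000000)

/* Add two pairs of 32-bit lanes held in 64-bit words without letting a
   carry cross from the low lane into the high lane.  */
static uint64_t
add_2x32_in_word (uint64_t a, uint64_t b)
{
  /* With the top bit of every lane cleared, each lane's sum fits in the
     lane: the only carry that can occur lands in the lane's top bit.  */
  uint64_t low_sum = (a & ~LANE_HIGH_BITS) + (b & ~LANE_HIGH_BITS);

  /* The real top bit of each lane is a_top ^ b_top ^ carry_in, and the
     carry_in is what low_sum currently holds at that position.  */
  return low_sum ^ ((a ^ b) & LANE_HIGH_BITS);
}
```

This needs only a handful of register operations per word and no stack
traffic, in contrast to the store/reload sequence in the loop above.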

The reason is that vector lowering does

/* Expand a vector operation to scalars; for integer types we can use
   special bit twiddling tricks to do the sums a word at a time, using
   function F_PARALLEL instead of F.  These tricks are done only if
   they can process at least four items, that is, only if the vector
   holds at least four items and if a word can hold four items.  */
static tree
expand_vector_addition (gimple_stmt_iterator *gsi,
                        elem_op_func f, elem_op_func f_parallel,
                        tree type, tree a, tree b, enum tree_code code)
{
  int parts_per_word = BITS_PER_WORD / vector_element_bits (type);

  if (INTEGRAL_TYPE_P (TREE_TYPE (type))
      && parts_per_word >= 4
      && nunits_for_known_piecewise_op (type) >= 4)
    return expand_vector_parallel (gsi, f_parallel,
                                   type, a, b, code);
  else
    return expand_vector_piecewise (gsi, f,
                                    type, TREE_TYPE (type),
                                    a, b, code, false);
}

so it only treats >= 4 elements as profitable to vectorize this way. The
vectorizer doesn't seem to know that: it applies its own cost model here,
while vector lowering doesn't have any.

Thread overview: 13+ messages
2023-02-08 19:17 [Bug target/108724] New: [11 regression] " gbs at canishe dot com
2023-02-08 19:30 ` [Bug tree-optimization/108724] " pinskia at gcc dot gnu.org
2023-02-09  9:37 ` crazylht at gmail dot com
2023-02-09 13:54 ` rguenth at gcc dot gnu.org [this message]
2023-02-10 10:00 ` rguenth at gcc dot gnu.org
2023-02-10 10:07 ` [Bug tree-optimization/108724] [11/12/13 Regression] " rguenth at gcc dot gnu.org
2023-02-10 11:22 ` cvs-commit at gcc dot gnu.org
2023-02-10 11:22 ` [Bug tree-optimization/108724] [11/12 " rguenth at gcc dot gnu.org
2023-03-15  9:48 ` cvs-commit at gcc dot gnu.org
2023-05-05  8:34 ` [Bug tree-optimization/108724] [11 " rguenth at gcc dot gnu.org
2023-05-05 12:06 ` [Bug target/108724] " rguenth at gcc dot gnu.org
2023-05-23 12:55 ` rguenth at gcc dot gnu.org
2023-05-29 10:08 ` jakub at gcc dot gnu.org
