public inbox for gcc-bugs@sourceware.org
From: "pinskia at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114814] New: Reduction sum of comparison should be better
Date: Mon, 22 Apr 2024 22:15:18 +0000
Message-ID: <bug-114814-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114814

            Bug ID: 114814
           Summary: Reduction sum of comparison should be better
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Take the example from PR 114809:
```
#include <cstdint>
#include <cstdlib>

size_t count_chars(const char *src, size_t len, char c) {
    size_t count = 0;
    for (size_t i=0; i < len; i++) {
        count += src[i] == c;
    }

    return count;
}
```

For aarch64 we currently produce the following for the inner loop:
```
.L4:
        ldr     q31, [x3], 16
        cmeq    v31.16b, v31.16b, v22.16b
        and     v31.16b, v23.16b, v31.16b
        zip1    v27.16b, v31.16b, v29.16b
        zip2    v31.16b, v31.16b, v29.16b
        zip1    v25.8h, v27.8h, v29.8h
        zip2    v27.8h, v27.8h, v29.8h
        zip1    v26.8h, v31.8h, v29.8h
        zip2    v31.8h, v31.8h, v29.8h
        zip2    v30.4s, v25.4s, v29.4s
        zip2    v28.4s, v27.4s, v29.4s
        uaddw   v30.2d, v30.2d, v25.2s
        uaddw   v28.2d, v28.2d, v27.2s
        uaddw   v30.2d, v30.2d, v26.2s
        uaddw2  v28.2d, v28.2d, v26.4s
        uaddw   v30.2d, v30.2d, v31.2s
        uaddw2  v31.2d, v28.2d, v31.4s
        add     v31.2d, v30.2d, v31.2d
        add     v24.2d, v24.2d, v31.2d
        cmp     x5, x3
        bne     .L4
```

Instead, we should be able to do just:
```
.L4:
        ldr     q31, [x3], 16
        cmeq    v31.16b, v31.16b, v22.16b
        and     v31.16b, v23.16b, v31.16b
        addv    b31, v31.16b
        fmov    x0, d31
        add     x1, x1, x0
        cmp     x5, x3
        bne     .L4
```

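That is, do the reduction of the sum of the compare inside the loop (a single
`addv` per vector, accumulated in a general register) rather than widening to
64-bit lanes on every iteration and reducing outside the loop.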
