public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
From: "clyon at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/97875] New: suboptimal loop vectorization
Date: Tue, 17 Nov 2020 13:11:41 +0000	[thread overview]
Message-ID: <bug-97875-4@http.gcc.gnu.org/bugzilla/> (raw)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97875

            Bug ID: 97875
           Summary: suboptimal loop vectorization
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: clyon at gcc dot gnu.org
  Target Milestone: ---

Looking at the code generated for gcc.target/arm/simd/mve-vsub_1.c:
#include <stdint.h>

void test_vsub_i32 (int32_t * dest, int32_t * a, int32_t * b) {
  int i;
  for (i=0; i<4; i++) {
    dest[i] = a[i] - b[i];
  }
}

Compiled with -mfloat-abi=hard -mfpu=auto -march=armv8.1-m.main+mve -mthumb
-O3, we get:
test_vsub_i32:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        add     ip, r1, #4
        adds    r3, r2, #4
        sub     ip, r0, ip
        subs    r3, r0, r3
        cmp     ip, #8
        it      hi
        cmphi   r3, #8
        bls     .L2
        orr     r3, r2, r0
        orrs    r3, r3, r1
        lsls    r3, r3, #28
        bne     .L2
        vldrw.32        q3, [r1]
        vldrw.32        q2, [r2]
        vsub.i32        q3, q3, q2
        vstrw.32        q3, [r0]
        bx      lr
.L2:
        ldr     r3, [r1]
        push    {r4}
        ldr     r4, [r2]
        subs    r3, r3, r4
        str     r3, [r0]
        ldr     r4, [r2, #4]
        ldr     r3, [r1, #4]
        subs    r3, r3, r4
        str     r3, [r0, #4]
        ldr     r4, [r2, #8]
        ldr     r3, [r1, #8]
        subs    r3, r3, r4
        str     r3, [r0, #8]
        ldr     r3, [r1, #12]
        ldr     r2, [r2, #12]
        ldr     r4, [sp], #4
        subs    r3, r3, r2
        str     r3, [r0, #12]
        bx      lr


but only the short vectorized part is necessary:
        vldrw.32        q3, [r1]
        vldrw.32        q2, [r2]
        vsub.i32        q3, q3, q2
        vstrw.32        q3, [r0]
        bx      lr

Since the loop trip count is constant (=4), why isn't this better optimized?


If I declare 'dest' as __restrict__, I get something better, but still not
perfect:
test_vsub_i32:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        orr     r3, r2, r0
        orrs    r3, r3, r1
        lsls    r3, r3, #28
        bne     .L2
        vldrw.32        q3, [r1]
        vldrw.32        q2, [r2]
        vsub.i32        q3, q3, q2
        vstrw.32        q3, [r0]
        bx      lr
.L2:
        push    {r4, r5}
        ldr     r3, [r1]
        ldr     r4, [r2]
        subs    r4, r3, r4
        str     r4, [r0]
        ldr     r3, [r1, #4]
        ldr     r4, [r2, #4]
        subs    r5, r3, r4
        str     r5, [r0, #4]
        ldrd    r4, r3, [r1, #8]
        ldrd    r5, r1, [r2, #8]
        subs    r4, r4, r5
        subs    r3, r3, r1
        strd    r4, r3, [r0, #8]
        pop     {r4, r5}
        bx      lr



Compiling for cortex-a9 and Neon:
-mfloat-abi=hard -mcpu=cortex-a9 -mfpu=neon -O3
test_vsub_i32:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        add     ip, r2, #4
        adds    r3, r1, #4
        sub     ip, r0, ip
        subs    r3, r0, r3
        cmp     ip, #8
        it      hi
        cmphi   r3, #8
        bls     .L2
        vld1.32 {q8}, [r1]
        vld1.32 {q9}, [r2]
        vsub.i32        q8, q8, q9
        vst1.32 {q8}, [r0]
        bx      lr
.L2:
        ldr     r3, [r1]
        push    {r4}
        ldr     r4, [r2]
        subs    r3, r3, r4
        str     r3, [r0]
        ldr     r4, [r2, #4]
        ldr     r3, [r1, #4]
        subs    r3, r3, r4
        str     r3, [r0, #4]
        ldr     r4, [r2, #8]
        ldr     r3, [r1, #8]
        subs    r3, r3, r4
        ldr     r4, [sp], #4
        str     r3, [r0, #8]
        ldr     r3, [r1, #12]
        ldr     r2, [r2, #12]
        subs    r3, r3, r2
        str     r3, [r0, #12]
        bx      lr


But in this case adding __restrict__ works well:
test_vsub_i32:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        vld1.32 {q8}, [r1]
        vld1.32 {q9}, [r2]
        vsub.i32        q8, q8, q9
        vst1.32 {q8}, [r0]
        bx      lr

             reply	other threads:[~2020-11-17 13:11 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-17 13:11 clyon at gcc dot gnu.org [this message]
2020-11-17 15:25 ` [Bug tree-optimization/97875] " rguenth at gcc dot gnu.org
2020-11-17 15:41 ` [Bug target/97875] " clyon at gcc dot gnu.org
2020-11-18  8:17 ` rguenth at gcc dot gnu.org
2020-12-09 15:06 ` clyon at gcc dot gnu.org
2020-12-09 16:59 ` clyon at gcc dot gnu.org
2020-12-10 14:42 ` clyon at gcc dot gnu.org
2021-01-12 16:51 ` cvs-commit at gcc dot gnu.org
2021-01-12 16:52 ` clyon at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-97875-4@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).