public inbox for gcc-bugs@sourceware.org
From: "clyon at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/97875] New: suboptimal loop vectorization
Date: Tue, 17 Nov 2020 13:11:41 +0000
Message-ID: <bug-97875-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97875

            Bug ID: 97875
           Summary: suboptimal loop vectorization
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: clyon at gcc dot gnu.org
  Target Milestone: ---

Looking at the code generated for gcc.target/arm/simd/mve-vsub_1.c:

#include <stdint.h>

void test_vsub_i32 (int32_t * dest, int32_t * a, int32_t * b) {
  int i;
  for (i=0; i<4; i++) {
    dest[i] = a[i] - b[i];
  }
}

Compiled with -mfloat-abi=hard -mfpu=auto -march=armv8.1-m.main+mve -mthumb -O3, we get:

test_vsub_i32:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        add     ip, r1, #4
        adds    r3, r2, #4
        sub     ip, r0, ip
        subs    r3, r0, r3
        cmp     ip, #8
        it      hi
        cmphi   r3, #8
        bls     .L2
        orr     r3, r2, r0
        orrs    r3, r3, r1
        lsls    r3, r3, #28
        bne     .L2
        vldrw.32        q3, [r1]
        vldrw.32        q2, [r2]
        vsub.i32        q3, q3, q2
        vstrw.32        q3, [r0]
        bx      lr
.L2:
        ldr     r3, [r1]
        push    {r4}
        ldr     r4, [r2]
        subs    r3, r3, r4
        str     r3, [r0]
        ldr     r4, [r2, #4]
        ldr     r3, [r1, #4]
        subs    r3, r3, r4
        str     r3, [r0, #4]
        ldr     r4, [r2, #8]
        ldr     r3, [r1, #8]
        subs    r3, r3, r4
        str     r3, [r0, #8]
        ldr     r3, [r1, #12]
        ldr     r2, [r2, #12]
        ldr     r4, [sp], #4
        subs    r3, r3, r2
        str     r3, [r0, #12]
        bx      lr

but only the short vectorized part is necessary:

        vldrw.32        q3, [r1]
        vldrw.32        q2, [r2]
        vsub.i32        q3, q3, q2
        vstrw.32        q3, [r0]
        bx      lr

Since the loop trip count is constant (=4), why isn't this better optimized?

If I declare 'dest' as __restrict__, I get something better, but still not perfect:

test_vsub_i32:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        orr     r3, r2, r0
        orrs    r3, r3, r1
        lsls    r3, r3, #28
        bne     .L2
        vldrw.32        q3, [r1]
        vldrw.32        q2, [r2]
        vsub.i32        q3, q3, q2
        vstrw.32        q3, [r0]
        bx      lr
.L2:
        push    {r4, r5}
        ldr     r3, [r1]
        ldr     r4, [r2]
        subs    r4, r3, r4
        str     r4, [r0]
        ldr     r3, [r1, #4]
        ldr     r4, [r2, #4]
        subs    r5, r3, r4
        str     r5, [r0, #4]
        ldrd    r4, r3, [r1, #8]
        ldrd    r5, r1, [r2, #8]
        subs    r4, r4, r5
        subs    r3, r3, r1
        strd    r4, r3, [r0, #8]
        pop     {r4, r5}
        bx      lr

Compiling for cortex-a9 and Neon: -mfloat-abi=hard -mcpu=cortex-a9 -mfpu=neon -O3

test_vsub_i32:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        add     ip, r2, #4
        adds    r3, r1, #4
        sub     ip, r0, ip
        subs    r3, r0, r3
        cmp     ip, #8
        it      hi
        cmphi   r3, #8
        bls     .L2
        vld1.32 {q8}, [r1]
        vld1.32 {q9}, [r2]
        vsub.i32        q8, q8, q9
        vst1.32 {q8}, [r0]
        bx      lr
.L2:
        ldr     r3, [r1]
        push    {r4}
        ldr     r4, [r2]
        subs    r3, r3, r4
        str     r3, [r0]
        ldr     r4, [r2, #4]
        ldr     r3, [r1, #4]
        subs    r3, r3, r4
        str     r3, [r0, #4]
        ldr     r4, [r2, #8]
        ldr     r3, [r1, #8]
        subs    r3, r3, r4
        ldr     r4, [sp], #4
        str     r3, [r0, #8]
        ldr     r3, [r1, #12]
        ldr     r2, [r2, #12]
        subs    r3, r3, r2
        str     r3, [r0, #12]
        bx      lr

But in this case adding __restrict__ works well:

test_vsub_i32:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        vld1.32 {q8}, [r1]
        vld1.32 {q9}, [r2]
        vsub.i32        q8, q8, q9
        vst1.32 {q8}, [r0]
        bx      lr
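The two runtime tests at the top of the MVE listing look like the vectorizer's versioning checks: one for overlap between dest and each source, one for 16-byte alignment of all three pointers. The sketch below models them in portable C so their effect can be tried on any host. It is a reading of the assembly under that assumption, not GCC's implementation; the names alias_guard_ok, align_guard_ok, test_vsub_i32_model and test_vsub_i32_restrict are hypothetical, and the four-lane "vector path" is emulated with ordinary loads and stores rather than MVE or Neon instructions.

```c
#include <stdint.h>

/* Hypothetical C model of the runtime guards in the MVE listing.
   Reconstructed from the assembly, not taken from GCC sources.  */

/* Alias guard, as in "add ip, r1, #4 / sub ip, r0, ip / cmp ip, #8 / bls".
   The unsigned compare fails only when dest starts 4..12 bytes after src
   (for int32-aligned pointers: dest == src + 1, + 2 or + 3), where
   loading all four lanes before storing would change the result.  */
static int alias_guard_ok(const int32_t *dest, const int32_t *src)
{
  uintptr_t diff = (uintptr_t) dest - ((uintptr_t) src + 4);
  return diff > 8;
}

/* Alignment guard, as in "orr / orrs / lsls r3, r3, #28 / bne .L2":
   the low four bits of dest | a | b must all be zero.  */
static int align_guard_ok(const void *dest, const void *a, const void *b)
{
  return (((uintptr_t) dest | (uintptr_t) a | (uintptr_t) b) & 0xF) == 0;
}

/* Returns 1 when the modelled vector path runs, 0 for the .L2 fallback.  */
static int test_vsub_i32_model(int32_t *dest, const int32_t *a,
                               const int32_t *b)
{
  if (alias_guard_ok(dest, a) && alias_guard_ok(dest, b)
      && align_guard_ok(dest, a, b))
    {
      /* Emulated vector path: read all four lanes of a and b first
         (like the two vldrw.32), then subtract and store
         (like vsub.i32 / vstrw.32).  */
      int32_t va[4], vb[4];
      for (int i = 0; i < 4; i++)
        {
          va[i] = a[i];
          vb[i] = b[i];
        }
      for (int i = 0; i < 4; i++)
        dest[i] = va[i] - vb[i];
      return 1;
    }
  /* Scalar fallback (.L2): element by element, exactly as the source.  */
  for (int i = 0; i < 4; i++)
    dest[i] = a[i] - b[i];
  return 0;
}

/* The report's loop with dest qualified __restrict__ (GCC's spelling of
   C99 restrict).  The qualifier promises dest does not overlap a or b;
   in the listings above the alias guard is gone once it is present.  */
void test_vsub_i32_restrict(int32_t *__restrict__ dest, int32_t *a,
                            int32_t *b)
{
  int i;
  for (i = 0; i < 4; i++)
    dest[i] = a[i] - b[i];
}
```

In this model, a call such as test_vsub_i32_model(a + 1, a, b) fails the alias guard and takes the scalar path. That matters: the element-by-element loop feeds each stored value into the next subtraction, so with a = {10, 20, 30, 40, 50, ...} and b = {1, 2, 3, 4} it leaves a[4] == 0, whereas loading all four lanes first would store 36 there. Disjoint, 16-byte-aligned buffers take the vector path. Note that the MVE __restrict__ listing still keeps the alignment guard; only the Neon one drops both.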
Thread overview: 9+ messages
  2020-11-17 13:11  clyon at gcc dot gnu.org [this message]
  2020-11-17 15:25  [Bug tree-optimization/97875] rguenth at gcc dot gnu.org
  2020-11-17 15:41  [Bug target/97875] clyon at gcc dot gnu.org
  2020-11-18 08:17  rguenth at gcc dot gnu.org
  2020-12-09 15:06  clyon at gcc dot gnu.org
  2020-12-09 16:59  clyon at gcc dot gnu.org
  2020-12-10 14:42  clyon at gcc dot gnu.org
  2021-01-12 16:51  cvs-commit at gcc dot gnu.org
  2021-01-12 16:52  clyon at gcc dot gnu.org