public inbox for gcc-bugs@sourceware.org
* [Bug target/98143] New: arm: missed vectorization with MVE compared to Neon
@ 2020-12-04 14:22 clyon at gcc dot gnu.org
  2021-04-21 15:31 ` [Bug target/98143] " clyon at gcc dot gnu.org
  0 siblings, 1 reply; 2+ messages in thread
From: clyon at gcc dot gnu.org @ 2020-12-04 14:22 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98143

            Bug ID: 98143
           Summary: arm: missed vectorization with MVE compared to Neon
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: clyon at gcc dot gnu.org
  Target Milestone: ---

While working on enabling auto-vectorization for MVE, I noticed a missed
optimization compared to Neon:

#include <stdint.h>
uint16_t *dest;
void func()
{
  int i;
  for (i=0;i<16;i++)
    dest[i]=3;
}

Compiled with -O3 -S -dp -mfloat-abi=hard -mfpu=auto -mcpu=cortex-a9 -mthumb:
func:
        movw    r3, #:lower16:.LANCHOR0 @ 15    [c=4 l=4]  *thumb2_movsi_vfp/4
        vmov.i16        q8, #3  @ v8hi  @ 7     [c=4 l=4]  *neon_movv8hi/2
        movt    r3, #:upper16:.LANCHOR0 @ 16    [c=4 l=4]  *arm_movt/0
        ldr     r3, [r3]        @ 14    [c=12 l=4]  *thumb2_movsi_vfp/5
        vst1.16 {q8}, [r3]!     @ 8     [c=8 l=4]  *movmisalignv8hi_neon_store
        vst1.16 {q8}, [r3]      @ 11    [c=8 l=4]  *movmisalignv8hi_neon_store
        bx      lr      @ 44    [c=8 l=4]  *thumb2_return

Compiled with -O3 -S -dp -mfloat-abi=hard -mfpu=auto -march=armv8.1-m.main+mve
-mthumb:
func:
        movs    r2, #3  @ 7     [c=4 l=2]  *thumb2_movsi_shortim
        ldr     r3, .L3 @ 5     [c=12 l=4]  *thumb2_movsi_vfp/5
        ldr     r3, [r3]        @ 6     [c=12 l=4]  *thumb2_movsi_vfp/5
        strh    r2, [r3]        @ movhi @ 9     [c=4 l=4]  *thumb2_movhi_vfp/4
        strh    r2, [r3, #2]    @ movhi @ 12    [c=4 l=4]  *thumb2_movhi_vfp/4
        strh    r2, [r3, #4]    @ movhi @ 15    [c=4 l=4]  *thumb2_movhi_vfp/4
        strh    r2, [r3, #6]    @ movhi @ 18    [c=4 l=4]  *thumb2_movhi_vfp/4
        strh    r2, [r3, #8]    @ movhi @ 21    [c=4 l=4]  *thumb2_movhi_vfp/4
        strh    r2, [r3, #10]   @ movhi @ 24    [c=4 l=4]  *thumb2_movhi_vfp/4
        strh    r2, [r3, #12]   @ movhi @ 27    [c=4 l=4]  *thumb2_movhi_vfp/4
        strh    r2, [r3, #14]   @ movhi @ 30    [c=4 l=4]  *thumb2_movhi_vfp/4
        strh    r2, [r3, #16]   @ movhi @ 33    [c=4 l=4]  *thumb2_movhi_vfp/4
        strh    r2, [r3, #18]   @ movhi @ 36    [c=4 l=4]  *thumb2_movhi_vfp/4
        strh    r2, [r3, #20]   @ movhi @ 39    [c=4 l=4]  *thumb2_movhi_vfp/4
        strh    r2, [r3, #22]   @ movhi @ 42    [c=4 l=4]  *thumb2_movhi_vfp/4
        strh    r2, [r3, #24]   @ movhi @ 45    [c=4 l=4]  *thumb2_movhi_vfp/4
        strh    r2, [r3, #26]   @ movhi @ 48    [c=4 l=4]  *thumb2_movhi_vfp/4
        strh    r2, [r3, #28]   @ movhi @ 51    [c=4 l=4]  *thumb2_movhi_vfp/4
        strh    r2, [r3, #30]   @ movhi @ 54    [c=4 l=4]  *thumb2_movhi_vfp/4
        bx      lr      @ 84    [c=8 l=4]  *thumb2_return



This PR is about building the constant; the problems with the stores are
probably part of PR97875.

In summary, with Neon we build the constant vector with:
        vmov.i16        q8, #3  @ v8hi  @ 7     [c=4 l=4]  *neon_movv8hi/2
but with MVE:
        movs    r2, #3  @ 7     [c=4 l=2]  *thumb2_movsi_shortim
and then store it as a 16-bit value as many times as needed.

I haven't managed to understand why we can't make use of mve_mov<mode> from
mve.md, which has an alternative with the "Dm" constraint that I would expect
to work here.
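
For comparison, the vectorized form that the Neon output corresponds to can be
written by hand. The following is a minimal sketch, assuming GCC's generic
vector extension (vector_size); the names v8hi_t and func_vectorized are
hypothetical and not part of the original testcase. It builds the constant
vector once and stores it twice, covering the 16 halfword elements. Which
instruction a given GCC version picks to materialize the constant for MVE is
not determined by this source form.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name: 8 x 16-bit lanes in one 128-bit vector
   (GCC/Clang generic vector extension, not an MVE intrinsic type). */
typedef uint16_t v8hi_t __attribute__((vector_size(16)));

uint16_t *dest;

/* Hand-vectorized equivalent of the loop in func(). */
void func_vectorized(void)
{
  const v8hi_t three = {3, 3, 3, 3, 3, 3, 3, 3};

  /* memcpy expresses a 128-bit store with no alignment assumption. */
  memcpy(dest, &three, sizeof three);      /* dest[0..7]  */
  memcpy(dest + 8, &three, sizeof three);  /* dest[8..15] */
}
```

Compiling this with the two command lines above and comparing the -dp output
is one way to check whether the constant is built with an immediate move or
loaded from memory.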


* [Bug target/98143] arm: missed vectorization with MVE compared to Neon
  2020-12-04 14:22 [Bug target/98143] New: arm: missed vectorization with MVE compared to Neon clyon at gcc dot gnu.org
@ 2021-04-21 15:31 ` clyon at gcc dot gnu.org
  0 siblings, 0 replies; 2+ messages in thread
From: clyon at gcc dot gnu.org @ 2021-04-21 15:31 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98143

--- Comment #1 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Current trunk generates for MVE:

        ldr     r3, .L3+16      @ 5     [c=12 l=4]  *thumb2_movsi_vfp/5
        vldr.64 d6, .L3 @ 7     [c=8 l=4]  *mve_movv8hi/8
        vldr.64 d7, .L3+8
        ldr     r3, [r3]        @ 14    [c=12 l=4]  *thumb2_movsi_vfp/5
        mov     r2, r3  @ 17    [c=4 l=2]  *thumb2_movsi_vfp/0
        adds    r3, r3, #16     @ 18    [c=4 l=2]  *thumb2_addsi_short/1
        vstrh.16        q3, [r2]        @ 8     [c=8 l=4]  *movmisalignv8hi_mve_store
        vstrh.16        q3, [r3]        @ 11    [c=8 l=4]  *movmisalignv8hi_mve_store
        bx      lr      @ 45    [c=8 l=4]  *thumb2_return
.L4:
        .align  3
.L3:
        .short  3
        .short  3
        .short  3
        .short  3
        .short  3
        .short  3
        .short  3
        .short  3
