* [Bug target/95265] New: aarch64: suboptimal code generation for common neon intrinsic sequence involving shrn and mull
@ 2020-05-21 22:36 generictoadhuman at gmail dot com
  2020-05-21 23:11 ` [Bug target/95265] " pinskia at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: generictoadhuman at gmail dot com @ 2020-05-21 22:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95265

            Bug ID: 95265
           Summary: aarch64: suboptimal code generation for common neon
                    intrinsic sequence involving shrn and mull
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: generictoadhuman at gmail dot com
  Target Milestone: ---

Compilable example:

#include <arm_neon.h>

int32x4_t func(int32x4_t a, int32x4_t b)
{
    return vshrn_high_n_s64(
        vshrn_n_s64(vmull_s32(vget_low_s32(a), vget_low_s32(b)), 12), 
        vmull_high_s32(a, b), 12);
}

With gcc -O3, the generated code contains two superfluous movs and one
unnecessary dup.
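For checking any generated code against the intended result, here is a minimal scalar sketch of what func() computes per lane: a widening 32x32->64 multiply, a right shift by 12, and truncation back to 32 bits. The names mul_shift12_lane, func_ref and func_ref_selfcheck are hypothetical and not part of the report; the sketch assumes the usual two's-complement conversion from uint32_t to int32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the report): scalar model of one lane.
   The kept bits (12..43 of the 64-bit product) are identical for logical
   and arithmetic shifts, so an unsigned shift is used to stay clear of
   implementation-defined behaviour on negative values. */
static int32_t mul_shift12_lane(int32_t a, int32_t b)
{
    int64_t prod = (int64_t)a * (int64_t)b;   /* cannot overflow for int32 inputs */
    return (int32_t)(uint32_t)((uint64_t)prod >> 12);
}

/* Scalar model of func(): four independent lanes. */
static void func_ref(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = mul_shift12_lane(a[i], b[i]);
}

/* Small self-check with hand-computed values. */
static void func_ref_selfcheck(void)
{
    const int32_t a[4] = { 4096, -4096, 1 << 20, 3 };
    const int32_t b[4] = { 5, 5, 1 << 20, 1000 };
    int32_t out[4];

    func_ref(a, b, out);
    assert(out[0] == 5);          /* 20480 >> 12 */
    assert(out[1] == -5);         /* -20480 >> 12, truncated to 32 bits */
    assert(out[2] == (1 << 28));  /* 2^40 >> 12 */
    assert(out[3] == 0);          /* 3000 >> 12 */
}
```

On an aarch64 target, the output of func() from the example above could be compared lane by lane against func_ref().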

output of gcc -v
Using built-in specs.
COLLECT_GCC=C:\msys64\opt\devkitpro\devkitA64\bin\aarch64-none-elf-gcc.exe
COLLECT_LTO_WRAPPER=c:/msys64/opt/devkitpro/devkita64/bin/../libexec/gcc/aarch64-none-elf/10.1.0/lto-wrapper.exe
Target: aarch64-none-elf
Configured with: ../../gcc-10.1.0/configure --enable-languages=c,c++,objc,lto
--with-gnu-as --with-gnu-ld --with-gcc --with-march=armv8
--enable-cxx-flags=-ffunction-sections --disable-libstdcxx-verbose
--enable-poison-system-directories --enable-interwork --enable-multilib
--enable-threads --disable-win32-registry --disable-nls --disable-debug
--disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch
--enable-libstdcxx-time --enable-libstdcxx-filesystem-ts
--target=aarch64-none-elf --with-newlib=yes
--with-headers=../../newlib-3.3.0/newlib/libc/include
--prefix=/opt/devkitpro/x86_64-w64-mingw32/devkitA64 --enable-lto
--with-system-zlib
--with-bugurl=https://github.com/devkitPro/buildscripts/issues
--with-pkgversion='devkitA64 release 15' --build=x86_64-unknown-linux-gnu
--host=x86_64-w64-mingw32 --with-gmp=/opt/mingw64/mingw
--with-mpfr=/opt/mingw64/mingw --with-mpc=/opt/mingw64/mingw
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.1.0 (devkitA64 release 15)


* [Bug target/95265] aarch64: suboptimal code generation for common neon intrinsic sequence involving shrn and mull
From: pinskia at gcc dot gnu.org @ 2020-05-21 23:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95265

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
         Depends on|                            |92665
           Severity|normal                      |enhancement
   Last reconfirmed|                            |2020-05-21
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The patch for PR 92665 fixes the dup and one mov.  The other mov (zeroing the
upper part of the register) needs to be handled separately, as it is due to the
intrinsic being defined as inline assembly.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665
[Bug 92665] [AArch64] low lanes select not optimized out for vmlal intrinsics


* [Bug target/95265] aarch64: suboptimal code generation for common neon intrinsic sequence involving shrn and mull
From: ktkachov at gcc dot gnu.org @ 2021-01-29  8:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95265
Bug 95265 depends on bug 92665, which changed state.

Bug 92665 Summary: [AArch64] low lanes select not optimized out for vmlal intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED


* [Bug target/95265] aarch64: suboptimal code generation for common neon intrinsic sequence involving shrn and mull
From: ktkachov at gcc dot gnu.org @ 2021-02-10 12:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95265

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
      Known to work|                            |11.0
   Target Milestone|---                         |11.0
             Status|NEW                         |RESOLVED
                 CC|                            |ktkachov at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
This is fixed in GCC 11. It now generates:
func:
        smull   v2.2d, v0.2s, v1.2s
        smull2  v1.2d, v0.4s, v1.4s
        shrn    v0.2s, v2.2d, 12
        shrn2   v0.4s, v1.2d, 12
        ret
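
To illustrate why these four instructions need no extra mov or dup, here is a rough portable C model of the sequence. It assumes the architectural behaviour that shrn writes the low half of its destination and clears the high half, while shrn2 writes only the high half and leaves the low half untouched. The types v4s and v2d and all model_* names are hypothetical, made up for this sketch; they are not arm_neon.h or GCC internals.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model types, not part of arm_neon.h. */
typedef struct { int32_t s[4]; } v4s;   /* models a register viewed as .4s */
typedef struct { int64_t d[2]; } v2d;   /* models a register viewed as .2d */

/* smull (high == 0) multiplies lanes 0..1, smull2 (high != 0) lanes 2..3,
   each producing two 64-bit products. */
static v2d model_smull(v4s a, v4s b, int high)
{
    v2d r;
    int base = high ? 2 : 0;
    for (int i = 0; i < 2; i++)
        r.d[i] = (int64_t)a.s[base + i] * (int64_t)b.s[base + i];
    return r;
}

/* Shift right by n and keep the low 32 bits of the result. */
static int32_t narrow_shift(int64_t x, unsigned n)
{
    return (int32_t)(uint32_t)((uint64_t)x >> n);
}

/* shrn: fills the low half of the result and zeroes the high half. */
static v4s model_shrn(v2d src, unsigned n)
{
    v4s r = { { narrow_shift(src.d[0], n), narrow_shift(src.d[1], n), 0, 0 } };
    return r;
}

/* shrn2: fills the high half, preserving the low half of dst. */
static v4s model_shrn2(v4s dst, v2d src, unsigned n)
{
    dst.s[2] = narrow_shift(src.d[0], n);
    dst.s[3] = narrow_shift(src.d[1], n);
    return dst;
}

/* Mirrors the four-instruction sequence above, one call per instruction. */
static v4s model_func(v4s a, v4s b)
{
    v2d lo = model_smull(a, b, 0);   /* smull  v2.2d, v0.2s, v1.2s */
    v2d hi = model_smull(a, b, 1);   /* smull2 v1.2d, v0.4s, v1.4s */
    v4s r  = model_shrn(lo, 12);     /* shrn   v0.2s, v2.2d, 12    */
    return model_shrn2(r, hi, 12);   /* shrn2  v0.4s, v1.2d, 12    */
}

/* Small self-check with hand-computed values. */
static void model_selfcheck(void)
{
    v4s a = { { 4096, -4096, 1 << 20, 3 } };
    v4s b = { { 5, 5, 1 << 20, 1000 } };
    v4s r = model_func(a, b);
    assert(r.s[0] == 5 && r.s[1] == -5);
    assert(r.s[2] == (1 << 28) && r.s[3] == 0);
}
```

In this model the high half written by shrn is immediately overwritten by shrn2, so no separate zeroing or lane duplication is required.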

