public inbox for gcc-bugs@sourceware.org
* [Bug target/95265] New: aarch64: suboptimal code generation for common neon intrinsic sequence involving shrn and mull
@ 2020-05-21 22:36 generictoadhuman at gmail dot com
2020-05-21 23:11 ` [Bug target/95265] " pinskia at gcc dot gnu.org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: generictoadhuman at gmail dot com @ 2020-05-21 22:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95265
Bug ID: 95265
Summary: aarch64: suboptimal code generation for common neon
intrinsic sequence involving shrn and mull
Product: gcc
Version: 10.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: generictoadhuman at gmail dot com
Target Milestone: ---
Compilable example:
#include <arm_neon.h>

int32x4_t func(int32x4_t a, int32x4_t b)
{
    return vshrn_high_n_s64(
        vshrn_n_s64(vmull_s32(vget_low_s32(a), vget_low_s32(b)), 12),
        vmull_high_s32(a, b), 12);
}
With gcc -O3, the generated code contains two superfluous movs and one
unnecessary dup.
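For readers less familiar with the intrinsics, the sequence above amounts to a per-lane fixed-point multiply: widen each pair of 32-bit lanes to 64 bits, multiply, shift right by 12, and keep the low 32 bits. The following is a portable scalar sketch of that computation, not code from the report. The function name mul_shift12_ref is made up for illustration, and the final conversion from uint32_t to int32_t assumes the two's-complement wraparound that GCC provides.

```c
#include <stdint.h>

/* Scalar reference model (illustrative only; the name is hypothetical).
   Mirrors, lane by lane, what the vmull/vshrn intrinsic sequence computes. */
static void mul_shift12_ref(const int32_t a[4], const int32_t b[4],
                            int32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        /* Widening signed multiply: the job of smull (lanes 0-1)
           and smull2 (lanes 2-3). */
        int64_t wide = (int64_t)a[i] * (int64_t)b[i];

        /* Shift right and narrow: the job of shrn / shrn2. Only bits
           12..43 of the product survive, so shifting in the unsigned
           domain gives the same low 32 bits as an arithmetic shift while
           avoiding an implementation-defined signed shift. */
        uint32_t narrow = (uint32_t)((uint64_t)wide >> 12);

        /* Assumption: out-of-range conversion wraps (true for GCC). */
        out[i] = (int32_t)narrow;
    }
}
```

Comparing the output lanes of the NEON function against this model is one way to check that a different instruction sequence still computes the same values.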
Output of gcc -v:
Using built-in specs.
COLLECT_GCC=C:\msys64\opt\devkitpro\devkitA64\bin\aarch64-none-elf-gcc.exe
COLLECT_LTO_WRAPPER=c:/msys64/opt/devkitpro/devkita64/bin/../libexec/gcc/aarch64-none-elf/10.1.0/lto-wrapper.exe
Target: aarch64-none-elf
Configured with: ../../gcc-10.1.0/configure --enable-languages=c,c++,objc,lto
--with-gnu-as --with-gnu-ld --with-gcc --with-march=armv8
--enable-cxx-flags=-ffunction-sections --disable-libstdcxx-verbose
--enable-poison-system-directories --enable-interwork --enable-multilib
--enable-threads --disable-win32-registry --disable-nls --disable-debug
--disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch
--enable-libstdcxx-time --enable-libstdcxx-filesystem-ts
--target=aarch64-none-elf --with-newlib=yes
--with-headers=../../newlib-3.3.0/newlib/libc/include
--prefix=/opt/devkitpro/x86_64-w64-mingw32/devkitA64 --enable-lto
--with-system-zlib
--with-bugurl=https://github.com/devkitPro/buildscripts/issues
--with-pkgversion='devkitA64 release 15' --build=x86_64-unknown-linux-gnu
--host=x86_64-w64-mingw32 --with-gmp=/opt/mingw64/mingw
--with-mpfr=/opt/mingw64/mingw --with-mpc=/opt/mingw64/mingw
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.1.0 (devkitA64 release 15)
* [Bug target/95265] aarch64: suboptimal code generation for common neon intrinsic sequence involving shrn and mull
From: pinskia at gcc dot gnu.org @ 2020-05-21 23:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95265
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
         Depends on|                            |92665
           Severity|normal                      |enhancement
   Last reconfirmed|                            |2020-05-21
     Ever confirmed|0                           |1
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The patch for PR 92665 fixes the dup and one mov. The other move (zeroing the
upper part of the register) needs a separate fix, as it is due to the
intrinsic being defined as inline assembly.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665
[Bug 92665] [AArch64] low lanes select not optimized out for vmlal intrinsics
* [Bug target/95265] aarch64: suboptimal code generation for common neon intrinsic sequence involving shrn and mull
From: ktkachov at gcc dot gnu.org @ 2021-01-29 8:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95265
Bug 95265 depends on bug 92665, which changed state.
Bug 92665 Summary: [AArch64] low lanes select not optimized out for vmlal intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92665
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
* [Bug target/95265] aarch64: suboptimal code generation for common neon intrinsic sequence involving shrn and mull
From: ktkachov at gcc dot gnu.org @ 2021-02-10 12:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95265
ktkachov at gcc dot gnu.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
      Known to work|                            |11.0
   Target Milestone|---                         |11.0
             Status|NEW                         |RESOLVED
                 CC|                            |ktkachov at gcc dot gnu.org
--- Comment #2 from ktkachov at gcc dot gnu.org ---
This is fixed in GCC 11. It now generates:
func:
        smull   v2.2d, v0.2s, v1.2s
        smull2  v1.2d, v0.4s, v1.4s
        shrn    v0.2s, v2.2d, 12
        shrn2   v0.4s, v1.2d, 12
        ret