public inbox for gcc-bugs@sourceware.org
From: "ebiggers3 at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/107892] New: Unnecessary move between ymm registers in loop using AVX2 intrinsic
Date: Mon, 28 Nov 2022 07:41:05 +0000
Message-ID: <bug-107892-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107892

            Bug ID: 107892
           Summary: Unnecessary move between ymm registers in loop using
                    AVX2 intrinsic
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ebiggers3 at gmail dot com
  Target Milestone: ---

To reproduce with the latest trunk, compile the following .c file on x86_64
at -O2:

    #include <immintrin.h>

    int __attribute__((target("avx2")))
    sum_ints(const __m256i *p, size_t n)
    {
            __m256i a = _mm256_setzero_si256();
            __m128i b;

            do {
                    a = _mm256_add_epi32(a, *p++);
            } while (--n);

            b = _mm_add_epi32(_mm256_extracti128_si256(a, 0),
                              _mm256_extracti128_si256(a, 1));
            b = _mm_add_epi32(b, _mm_shuffle_epi32(b, 0x31));
            b = _mm_add_epi32(b, _mm_shuffle_epi32(b, 0x02));
            return _mm_cvtsi128_si32(b);
    }

The assembly that gcc generates is:

    0000000000000000 <sum_ints>:
       0:   c5 f1 ef c9             vpxor  %xmm1,%xmm1,%xmm1
       4:   0f 1f 40 00             nopl   0x0(%rax)
       8:   c5 f5 fe 07             vpaddd (%rdi),%ymm1,%ymm0
       c:   48 83 c7 20             add    $0x20,%rdi
      10:   c5 fd 6f c8             vmovdqa %ymm0,%ymm1
      14:   48 83 ee 01             sub    $0x1,%rsi
      18:   75 ee                   jne    8 <sum_ints+0x8>
      1a:   c4 e3 7d 39 c1 01       vextracti128 $0x1,%ymm0,%xmm1
      20:   c5 f9 fe c1             vpaddd %xmm1,%xmm0,%xmm0
      24:   c5 f9 70 c8 31          vpshufd $0x31,%xmm0,%xmm1
      29:   c5 f1 fe c8             vpaddd %xmm0,%xmm1,%xmm1
      2d:   c5 f9 70 c1 02          vpshufd $0x2,%xmm1,%xmm0
      32:   c5 f9 fe c1             vpaddd %xmm1,%xmm0,%xmm0
      36:   c5 f9 7e c0             vmovd  %xmm0,%eax
      3a:   c5 f8 77                vzeroupper
      3d:   c3                      ret

The bug is that the inner loop contains an unnecessary vmovdqa:

       8:   vpaddd (%rdi),%ymm1,%ymm0
            add    $0x20,%rdi
            vmovdqa %ymm0,%ymm1
            sub    $0x1,%rsi
            jne    8 <sum_ints+0x8>

It should look like the following instead:

       8:   vpaddd (%rdi),%ymm0,%ymm0
            add    $0x20,%rdi
            sub    $0x1,%rsi
            jne    8 <sum_ints+0x8>

Strangely, the bug goes away if the __v8si type is used instead of __m256i
and the addition is done using "+=" instead of _mm256_add_epi32():

    int __attribute__((target("avx2")))
    sum_ints_good(const __v8si *p, size_t n)
    {
            __v8si a = {};
            __m128i b;

            do {
                    a += *p++;
            } while (--n);

            b = _mm_add_epi32(_mm256_extracti128_si256((__m256i)a, 0),
                              _mm256_extracti128_si256((__m256i)a, 1));
            b = _mm_add_epi32(b, _mm_shuffle_epi32(b, 0x31));
            b = _mm_add_epi32(b, _mm_shuffle_epi32(b, 0x02));
            return _mm_cvtsi128_si32(b);
    }

In the bad version, I noticed that the RTL initially has two separate insns
for 'a += *p': one to do the addition and write the result to a new pseudo
register, and one to convert the value from mode V8SI to V4DI and assign it
to the original pseudo register. These two separate insns never get
combined. (That sort of explains why the bug isn't seen with the __v8si and
+= method; gcc doesn't do a type conversion with that method.)

So, I'm wondering if the bug is in the instruction combining pass. Or
perhaps the RTL should never have had two separate insns in the first place?
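
The difference between the two source forms can be modelled without immintrin.h. The sketch below is only an illustration under stated assumptions: it assumes that _mm256_add_epi32() amounts to a lane-wise add on a vector of 32-bit elements whose result is cast back to the 64-bit-element type behind __m256i (the report describes this conversion at the RTL level but does not quote the header). The typedefs v4di, v8si and v8su, and the names lane_sum, sum_via_casts and sum_direct, are hypothetical stand-ins built on GCC's generic vector extension. The horizontal reduction is done with a plain loop rather than the shuffle sequence from the report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the x86 vector types, built on GCC's generic
   vector extension so the sketch compiles even without AVX2 enabled. */
typedef long long    v4di __attribute__((vector_size(32))); /* role of __m256i */
typedef int          v8si __attribute__((vector_size(32))); /* role of __v8si  */
typedef unsigned int v8su __attribute__((vector_size(32)));

/* Add up the eight 32-bit lanes; unsigned arithmetic avoids signed overflow. */
static int lane_sum(const v8si *v)
{
    unsigned int total = 0;
    for (int i = 0; i < 8; i++)
        total += (unsigned int)(*v)[i];
    return (int)total;
}

/* Models the "bad" form: the accumulator lives in the 64-bit-element type,
   so every iteration casts to 32-bit lanes, adds, and casts back.
   ASSUMPTION: this mirrors what _mm256_add_epi32() expands to. */
int sum_via_casts(const v4di *p, size_t n)
{
    v4di a = {0};
    assert(n > 0); /* like the loop in the report, this runs at least once */
    do {
        a = (v4di)((v8su)a + (v8su)*p++);
    } while (--n);
    v8si r = (v8si)a;
    return lane_sum(&r);
}

/* Models the "good" form: the accumulator already has 32-bit lanes, so the
   loop body needs no conversion. */
int sum_direct(const v8si *p, size_t n)
{
    v8si a = {0};
    assert(n > 0);
    do {
        a += *p++;
    } while (--n);
    return lane_sum(&a);
}
```

Both functions compute the same value; only the accumulator type differs. Compiling the sketch at -O2 with AVX2 enabled and comparing the two loops is one way to look for the extra register move, although the report does not say whether the move shows up outside the intrinsic-based version.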