public inbox for gcc-bugs@sourceware.org
* [Bug target/102591] New: Failure to optimize search for value in vector-sized area to use SIMD
From: gabravier at gmail dot com @ 2021-10-04 12:12 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102591

            Bug ID: 102591
           Summary: Failure to optimize search for value in vector-sized
                    area to use SIMD
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

bool match8(char *tpl)
{
    int found = 0;
    for (int at = 0; at < 16; at++)
        if (tpl[at] == 0)
            found = 1;
    return found;
}

This function can be greatly optimized by using SIMD. It can be optimized to
something like this:

typedef char v16i8 __attribute__((vector_size(16)));

bool match8v2(char *tpl)
{
    v16i8 values;
    __builtin_memcpy(&values, tpl, 16);
    v16i8 compared = (values == 0);
    return _mm_movemask_epi8((__m128i)compared) != 0;
}

This optimization is done by LLVM, but not by GCC.

PS: I've marked this as an x86 bug, but only because I could not find a
portable way of expressing `_mm_movemask_epi8((__m128i)compared)`, I would
assume other architectures have similar ways of expressing the same thing
cheaply. (For example, Altivec should be able to implement that operation with
a `vec_extract(vec_vbpermq((__vector unsigned char)compared, perm), 1)` with
`perm` looking like this: `{120, 112, 104, 96, 88, 80, 72, 64, 56, 48, 40, 32,
24, 16, 8, 0}` and the 1 replaced with 14 on big-endian)
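For readers who want to try the two forms from the report side by side, here is a minimal sketch. It assumes an x86 target with SSE2 and falls back to the scalar loop elsewhere. The names `match16_scalar` and `match16_simd` are hypothetical and do not come from the report, and the load uses `_mm_loadu_si128` instead of the report's `__builtin_memcpy` into a vector type.

```c
#include <stdbool.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Scalar reference: the loop from the report. There is no early exit,
   so all 16 bytes are always read. */
bool match16_scalar(const char *tpl)
{
    int found = 0;
    for (int at = 0; at < 16; at++)
        if (tpl[at] == 0)
            found = 1;
    return found;
}

/* Hand-written SIMD form (hypothetical name). */
bool match16_simd(const char *tpl)
{
#if defined(__SSE2__)
    /* Unaligned 16-byte load: no alignment requirement on tpl. */
    __m128i values = _mm_loadu_si128((const __m128i *)tpl);
    /* Each byte lane becomes 0xFF where the input byte is zero. */
    __m128i compared = _mm_cmpeq_epi8(values, _mm_setzero_si128());
    /* Collect the top bit of every lane into a 16-bit mask. */
    return _mm_movemask_epi8(compared) != 0;
#else
    /* Assumption: without SSE2 we simply reuse the scalar loop. */
    return match16_scalar(tpl);
#endif
}
```

Both functions should agree for any readable 16-byte input; the SIMD form replaces sixteen compare-and-branch steps with one load, one compare and one mask extraction.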
* [Bug target/102591] Failure to optimize search for value in vector-sized area to use SIMD
From: rguenth at gcc dot gnu.org @ 2021-10-05 6:44 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102591

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-10-05
             Status|UNCONFIRMED                 |WAITING

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, but

  __builtin_memcpy(&values, tpl, 16);

could trap since 'tpl' is not aligned to 16 bytes? So LLVM creates wrong
code here?
* [Bug target/102591] Failure to optimize search for value in vector-sized area to use SIMD
From: gabravier at gmail dot com @ 2021-10-05 9:46 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102591

--- Comment #2 from Gabriel Ravier <gabravier at gmail dot com> ---
memcpy can fail on unaligned memory ??? I used it specifically to avoid this
problem !

(also, LLVM's code, I am pretty sure, does not have any issue with alignment,
as it uses either AVX instructions which care not for it, or specifically does
a movdqu (i.e. unaligned load) of the memory)
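The alignment point in comment #2 can be checked with a small sketch using the GCC/Clang vector extension. `memcpy` has no alignment requirement on its source, so copying into a vector object is a valid way to express an unaligned 16-byte load. The helper names `load16_unaligned` and `any_zero16` are hypothetical, and the element type is `signed char` here (the report uses `char`) so that the comparison result has exactly the same vector type.

```c
#include <stdbool.h>

/* 16 lanes of signed char; assumes a compiler with GCC-style vector types. */
typedef signed char v16i8 __attribute__((vector_size(16)));

/* Load 16 bytes from an address with arbitrary alignment.
   memcpy places no alignment requirement on p. */
static v16i8 load16_unaligned(const char *p)
{
    v16i8 v;
    __builtin_memcpy(&v, p, sizeof v);
    return v;
}

/* True if any of the 16 bytes starting at p is zero. */
bool any_zero16(const char *p)
{
    /* Lane-wise compare: -1 where the byte is zero, 0 elsewhere. */
    v16i8 lanes = (load16_unaligned(p) == 0);
    bool found = false;
    for (int i = 0; i < 16; i++)
        found |= (lanes[i] != 0);
    return found;
}
```

Calling `any_zero16(raw + 1)` on a `char raw[32]` buffer is well defined regardless of how `raw` is aligned; on x86 a compiler would typically be expected to emit an unaligned load such as movdqu for the copy, as the comment describes.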
* [Bug tree-optimization/102591] Failure to optimize search for value in vector-sized area to use SIMD
From: rguenth at gcc dot gnu.org @ 2021-10-05 10:19 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102591

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
          Component|target                      |tree-optimization
             Blocks|                            |53947

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Gabriel Ravier from comment #2)
> memcpy can fail on unaligned memory ??? I used it specifically to avoid this
> problem !
>
> (also, LLVM's code, I am pretty sure, does not have any issue with
> alignment, as it uses either AVX instructions which care not for it, or
> specifically does a movdqu (i.e. unaligned load) of the memory)

Ah, sorry - I was reading the loop as

  for (int at = 0; at < 16; at++)
    if (tpl[at] == 0)
      {
        found = 1;
        break;
      }

thus as if the suggested transform would eventually access storage that is
not accessed originally...
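The distinction drawn here, between the loop as reported and the loop with a `break`, can be made concrete with a short sketch. The function names are hypothetical. The point is only which bytes each variant is allowed to touch, which is what decides whether a full 16-byte load is a safe replacement.

```c
#include <stdbool.h>

/* Loop as written in the report: no early exit, so bytes 0..15 are
   always read and the caller must provide 16 readable bytes. */
bool match16_full(const char *tpl)
{
    int found = 0;
    for (int at = 0; at < 16; at++)
        if (tpl[at] == 0)
            found = 1;
    return found;
}

/* Loop as misread in comment #3: it stops at the first zero byte, so
   bytes after that position are never accessed. */
bool match16_early(const char *tpl)
{
    int found = 0;
    for (int at = 0; at < 16; at++)
        if (tpl[at] == 0) {
            found = 1;
            break;
        }
    return found;
}
```

`match16_early("abc")` is well defined because the terminating NUL at index 3 ends the loop, whereas `match16_full("abc")` reads past the 4-byte array. A single 16-byte load is therefore a straightforward replacement only for the first variant, which already reads all 16 bytes.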
Btw, we vectorize

bool match8(char *tpl)
{
    char found = 0;
    for (int at = 0; at < 16; at++)
        if (tpl[at] == 0)
            found = 1;
    return found;
}

but use

  vector(16) char vect_found_4.8;

  vect__3.7_29 = MEM <vector(16) char> [(char *)tpl_10(D)];
  _32 = vect__3.7_29 != { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  vect_found_4.8_33 = VEC_COND_EXPR <_32, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }>;
  _35 = .REDUC_MAX (vect_found_4.8_33);
  _8 = (bool) _35;
  return _8;

where we fail to apply "magic" to the .REDUC_MAX as we know the values
are all 0 or 1. The conditional reduction support doesn't support
producing 'int' from char compares and we fail to narrow the reduction
vector.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
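The comment does not spell out what the "magic" would be. One plausible reading, stated here as an assumption, is that a maximum over lanes known to hold only 0 or 1 is the same as asking whether any lane is set, which an OR reduction or a mask test can answer. The sketch below checks that equivalence with GCC/Clang vector extensions; the function names are hypothetical and the lane loops stand in for the actual reduction instructions.

```c
#include <stdbool.h>

/* 16 lanes of signed char; assumes a compiler with GCC-style vector types. */
typedef signed char v16i8 __attribute__((vector_size(16)));

/* Build the 0/1 "found" vector: 1 where the input byte is zero.
   (v == 0) yields -1 in matching lanes; masking with 1 gives 0 or 1,
   the same values the VEC_COND_EXPR in the dump produces. */
static v16i8 found_lanes(const char *tpl)
{
    v16i8 v;
    __builtin_memcpy(&v, tpl, sizeof v);
    return (v == 0) & 1;
}

/* Mirrors the dump: maximum over all lanes, then convert to bool. */
bool match16_reduc_max(const char *tpl)
{
    v16i8 found = found_lanes(tpl);
    signed char max = 0;
    for (int i = 0; i < 16; i++)
        if (found[i] > max)
            max = found[i];
    return max != 0;
}

/* Assumed simplification: with lanes restricted to 0 or 1, OR-ing the
   lanes together gives the same answer as taking the maximum. */
bool match16_reduc_or(const char *tpl)
{
    v16i8 found = found_lanes(tpl);
    signed char any = 0;
    for (int i = 0; i < 16; i++)
        any |= found[i];
    return any != 0;
}
```

The two functions agree because for values in {0, 1} both max and OR are 1 exactly when at least one operand is 1.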