public inbox for gcc-bugs@sourceware.org
* [Bug other/68109] New: GCC fails to vectorize popcount on x86_64
@ 2015-10-27 3:07 haneef503 at gmail dot com
2015-10-27 9:51 ` [Bug target/68109] " rguenth at gcc dot gnu.org
2021-08-16 4:50 ` [Bug tree-optimization/68109] " pinskia at gcc dot gnu.org
0 siblings, 2 replies; 3+ messages in thread
From: haneef503 at gmail dot com @ 2015-10-27 3:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68109
Bug ID: 68109
Summary: GCC fails to vectorize popcount on x86_64
Product: gcc
Version: 5.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: haneef503 at gmail dot com
Target Milestone: ---
Created attachment 36595
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36595&action=edit
Clang Vectorized Assembly Output
The following code is an SSCCE that GCC doesn't vectorize on x86_64:
#include <stdlib.h>
#include <stdint.h>

size_t hd (const uint8_t *restrict a, const uint8_t *restrict b, size_t l) {
  size_t r = 0, x;
  for (x = 0; x < l; x++)
    r += __builtin_popcount (a[x] ^ b[x]);
  return r;
}
On other architectures, such as power8, GCC successfully vectorizes the loop.
On x86_64, however, there is no vector version of the `popcnt` instruction.
Despite this, as shown by [http://wm.ite.pl/articles/sse-popcount.html], it is
possible to vectorize popcount using SSE2 or SSSE3 instructions. Further
research on [https://software.intel.com/sites/landingpage/IntrinsicsGuide/]
shows that it may be possible to achieve further performance gains on the
latest architectures by using AVX2 instructions along the same lines as in the
article, albeit with 256-bit YMM registers in place of the 128-bit XMM
registers used in the article. Since GCC often supports as-yet unreleased
architectures, I did a bit more research in the Intel Intrinsics Guide
mentioned above for future architectures and found that the same could likely
also be done using AVX-512 with the 512-bit ZMM registers, if you guys are
interested.
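For illustration, here is a rough sketch of the SSSE3 idea: split each byte into two nibbles, look up each nibble's bit count in a 16-entry table with `pshufb`, and sum the per-byte counts with `psadbw`. This is not the code from the linked article, nor what clang emits; the function names `hd_scalar` and `hd_nibble_lut` are made up for this example. The vector path is compiled only when `__SSSE3__` is defined (for example with `-mssse3`); otherwise the same table lookup runs one byte at a time.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSSE3__)
#include <tmmintrin.h>  /* SSSE3 intrinsics, including _mm_shuffle_epi8 */
#endif

/* Scalar reference: the same loop as the test case in this report. */
static size_t hd_scalar(const uint8_t *a, const uint8_t *b, size_t l)
{
    size_t r = 0;
    for (size_t x = 0; x < l; x++)
        r += (size_t)__builtin_popcount(a[x] ^ b[x]);
    return r;
}

/* Hypothetical helper: Hamming distance via a 4-bit lookup table. */
static size_t hd_nibble_lut(const uint8_t *a, const uint8_t *b, size_t l)
{
    /* nib[i] is the number of set bits in the 4-bit value i. */
    static const uint8_t nib[16] = {0, 1, 1, 2, 1, 2, 2, 3,
                                    1, 2, 2, 3, 2, 3, 3, 4};
    size_t r = 0, x = 0;

#if defined(__SSSE3__)
    const __m128i lut = _mm_setr_epi8(0, 1, 1, 2, 1, 2, 2, 3,
                                      1, 2, 2, 3, 2, 3, 3, 4);
    const __m128i low4 = _mm_set1_epi8(0x0f);

    for (; x + 16 <= l; x += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + x));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + x));
        __m128i v  = _mm_xor_si128(va, vb);

        /* Low and high nibble of every byte, each in the range 0..15. */
        __m128i lo = _mm_and_si128(v, low4);
        __m128i hi = _mm_and_si128(_mm_srli_epi16(v, 4), low4);

        /* pshufb performs 16 parallel table lookups per call. */
        __m128i cnt = _mm_add_epi8(_mm_shuffle_epi8(lut, lo),
                                   _mm_shuffle_epi8(lut, hi));

        /* psadbw against zero sums each group of 8 byte counts. */
        __m128i sums = _mm_sad_epu8(cnt, _mm_setzero_si128());
        r += (size_t)_mm_cvtsi128_si32(sums)
           + (size_t)_mm_cvtsi128_si32(_mm_srli_si128(sums, 8));
    }
#endif

    /* Tail bytes, or the whole input when SSSE3 is not enabled. */
    for (; x < l; x++) {
        uint8_t v = (uint8_t)(a[x] ^ b[x]);
        r += (size_t)nib[v & 0x0f] + (size_t)nib[v >> 4];
    }
    return r;
}
```

An AVX2 variant would presumably follow the same pattern with 256-bit registers, but that is not shown here.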
Anyways, I did find that clang has been doing these optimizations since
~clang3.5. I've attached an output of the resulting [vectorized] assembly
emitted by clang3.7 for the above function, since it appears to be done
relatively thoroughly and cleanly.
In both GCC and Clang, I used the following flags:
-xc -O2 -ftree-vectorize -D_GNU_SOURCE -std=gnu11 -fverbose-asm
* [Bug target/68109] GCC fails to vectorize popcount on x86_64
From: rguenth at gcc dot gnu.org @ 2015-10-27 9:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68109
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-10-27
          Component|other                       |target
             Blocks|                            |53947
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. The target would have to provide the necessary target
builtin/expander.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
* [Bug tree-optimization/68109] GCC fails to vectorize popcount on x86_64
From: pinskia at gcc dot gnu.org @ 2021-08-16 4:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68109
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Could there be generic support for popcount added?