public inbox for gcc-bugs@sourceware.org
* [Bug target/98442] New: [X86] suboptimal for memset with CLEAR_BY_PIECES
@ 2020-12-25 1:38 crazylht at gmail dot com
2020-12-31 3:38 ` [Bug target/98442] " crazylht at gmail dot com
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: crazylht at gmail dot com @ 2020-12-25 1:38 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98442
Bug ID: 98442
Summary: [X86] suboptimal for memset with CLEAR_BY_PIECES
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: crazylht at gmail dot com
CC: hjl.tools at gmail dot com, wei3.xiao at intel dot com,
wwwhhhyyy333 at gmail dot com
Target Milestone: ---
Target: x86_64-*-* i?86-*-*
cat test.c
--------
char Tab[64];
void foo(int n)
{
  for (int i = 0; i != 64; i++)
    Tab[i] = 0;
}
----
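For reference, the loop in the test case should behave the same as a 64-byte memset, which is presumably why the summary mentions memset. A minimal sketch of that equivalence follows; foo_loop, foo_memset and clears_tab are hypothetical names used only for illustration and are not part of the report.

```c
#include <assert.h>
#include <string.h>

char Tab[64];

/* The loop from the test case. */
void foo_loop(void)
{
    for (int i = 0; i != 64; i++)
        Tab[i] = 0;
}

/* The same clear written as an explicit memset. */
void foo_memset(void)
{
    memset(Tab, 0, sizeof Tab);
}

/* Hypothetical checker: dirty Tab, run CLEAR, and return 1 if all
   64 bytes are zero afterwards, 0 otherwise. */
int clears_tab(void (*clear)(void))
{
    assert(clear != NULL);
    memset(Tab, 0xff, sizeof Tab);
    clear();
    for (size_t i = 0; i < sizeof Tab; i++)
        if (Tab[i] != 0)
            return 0;
    return 1;
}
```

Both clears_tab(foo_loop) and clears_tab(foo_memset) should return 1; comparing the -O2 assembly of the two functions is one way to see whether they are expanded identically.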
GCC generates
------
foo(int):
vpxor xmm0, xmm0, xmm0
vmovdqa XMMWORD PTR Tab[rip], xmm0
vmovdqa XMMWORD PTR Tab[rip+16], xmm0
vmovdqa XMMWORD PTR Tab[rip+32], xmm0
vmovdqa XMMWORD PTR Tab[rip+48], xmm0
ret
Tab:
.zero 64
---------
It could be better:
----
foo(int):
vpxor ymm0, ymm0, ymm0 #4.5
vmovdqu YMMWORD PTR Tab[rip], ymm0 #4.5
vmovdqu YMMWORD PTR 32+Tab[rip], ymm0 #4.5
vzeroupper #6.1
ret #6.1
Tab:
-----
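A rough C equivalent of the two-store sequence above, written with AVX intrinsics. This is only a sketch: clear64 and clear64_avx are hypothetical helper names, and the x86 guard and memset fallback are assumptions added so the code also builds and runs without AVX. It is not what GCC emits internally.

```c
#include <assert.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Two unaligned 256-bit zero stores covering 64 bytes.  The target
   attribute enables AVX for this function only, so no -mavx is needed. */
__attribute__((target("avx")))
static void clear64_avx(char *p)
{
    __m256i zero = _mm256_setzero_si256();
    _mm256_storeu_si256((__m256i *)p, zero);
    _mm256_storeu_si256((__m256i *)(p + 32), zero);
}
#endif

/* Hypothetical helper: clear exactly 64 bytes at P, using AVX when the
   running CPU supports it and plain memset otherwise. */
void clear64(char *p)
{
    assert(p != NULL);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx")) {
        clear64_avx(p);
        return;
    }
#endif
    memset(p, 0, 64);
}
```

The unaligned store intrinsic is used because the sketch makes no assumption about the alignment of the buffer.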
GCC uses 128-bit as the default:
----
bool
default_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size,
                                        unsigned int alignment,
                                        enum by_pieces_operation op,
                                        bool speed_p)
{
  unsigned int max_size = 0;
  unsigned int ratio = 0;
  switch (op)
    {
    case CLEAR_BY_PIECES:
      max_size = STORE_MAX_PIECES;
      ratio = CLEAR_RATIO (speed_p);
----
Define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P for i386?
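To make the effect of the piece size concrete, here is a simplified model of how the number of stores depends on the widest piece. It is a sketch under stated assumptions: count_pieces and use_by_pieces are hypothetical names, alignment is ignored, pieces are assumed to be powers of two, and the final comparison against the ratio is an assumption because the excerpt above is cut off before it.

```c
#include <assert.h>

/* Hypothetical model: number of stores needed to clear SIZE bytes when
   the widest usable piece is MAX_PIECE bytes.  Uses the widest piece
   first, then halves the piece size for the remainder. */
unsigned count_pieces(unsigned size, unsigned max_piece)
{
    /* MAX_PIECE must be a nonzero power of two in this model. */
    assert(max_piece != 0 && (max_piece & (max_piece - 1)) == 0);

    unsigned n = 0;
    for (unsigned piece = max_piece; piece >= 1 && size > 0; piece /= 2) {
        n += size / piece;
        size %= piece;
    }
    return n;
}

/* Assumed decision rule: expand by pieces only when the store count
   stays below RATIO (a stand-in for CLEAR_RATIO). */
int use_by_pieces(unsigned size, unsigned max_piece, unsigned ratio)
{
    return count_pieces(size, max_piece) < ratio;
}
```

In this model count_pieces(64, 16) is 4 and count_pieces(64, 32) is 2, which matches the four xmm stores and the two ymm stores shown above.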
* [Bug target/98442] [X86] suboptimal for memset with CLEAR_BY_PIECES
From: crazylht at gmail dot com @ 2020-12-31 3:38 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98442
--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
>
> Define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P for i386?
It's actually determined by MOVE_MAX_PIECES, which is in turn limited by
MAX_FIXED_MODE_SIZE:
/* ??? We should use TImode in 32-bit mode and use OImode or XImode
   if they are available.  But since by_pieces_ninsns determines the
   widest mode with MAX_FIXED_MODE_SIZE, we can only use TImode in
   64-bit mode.  */
#define MOVE_MAX_PIECES \
  ((TARGET_64BIT \
    && TARGET_SSE2 \
    && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
    && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
   ? GET_MODE_SIZE (TImode) : UNITS_PER_WORD)
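The selection logic of the macro can be modelled in plain C as follows. This is an illustrative sketch only: move_max_pieces and its boolean parameters are hypothetical stand-ins for the target flags, the 16-byte result follows the comment's statement that only TImode is usable, and the word sizes of 8 and 4 bytes assume the usual x86-64 and i386 values of UNITS_PER_WORD.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of MOVE_MAX_PIECES: returns the widest piece, in
   bytes, that by-pieces expansion may use for the given target flags. */
unsigned move_max_pieces(bool is_64bit, bool sse2,
                         bool unaligned_load_optimal,
                         bool unaligned_store_optimal)
{
    /* Assumed UNITS_PER_WORD: 8 bytes in 64-bit mode, 4 otherwise. */
    unsigned units_per_word = is_64bit ? 8 : 4;

    if (is_64bit && sse2
        && unaligned_load_optimal && unaligned_store_optimal)
        return 16;              /* size of TImode, one xmm register */

    return units_per_word;
}
```

Under this model a 64-byte clear never gets pieces wider than 16 bytes, which is consistent with the four xmm stores in the original report.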
* [Bug target/98442] [X86] suboptimal for memset with CLEAR_BY_PIECES
From: hjl.tools at gmail dot com @ 2020-12-31 3:48 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98442
H.J. Lu <hjl.tools at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-12-31
     Ever confirmed|0                           |1
--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> ---
Please take a look at users/hjl/pieces/master branch:
https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/pieces/master
You may get some ideas.
* [Bug target/98442] [X86] suboptimal for memset with CLEAR_BY_PIECES
From: hjl.tools at gmail dot com @ 2020-12-31 3:56 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98442
--- Comment #3 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to H.J. Lu from comment #2)
> Please take a look at users/hjl/pieces/master branch:
>
> https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/pieces/master
>
> You may get some ideas.
I got
[hjl@gnu-cfl-1 gcc]$ cat /tmp/x.c
char Tab[64];
void foo(int n)
{
  for (int i = 0; i != 64; i++)
    Tab[i] = 0;
}
[hjl@gnu-cfl-1 gcc]$ ./xgcc -B./ -march=skylake -S -O2 /tmp/x.c
[hjl@gnu-cfl-1 gcc]$ cat x.s
.file "x.c"
.text
.p2align 4
.globl foo
.type foo, @function
foo:
.LFB0:
.cfi_startproc
vpxor %xmm0, %xmm0, %xmm0
vmovups %ymm0, Tab(%rip)
vmovups %ymm0, Tab+32(%rip)
vzeroupper
ret
.cfi_endproc
.LFE0:
.size foo, .-foo
.comm Tab,64,32
.ident "GCC: (GNU) 10.0.0 20190523 (experimental)"
.section .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-1 gcc]$
This requires middle-end changes.
* [Bug target/98442] [X86] suboptimal for memset with CLEAR_BY_PIECES
From: rguenth at gcc dot gnu.org @ 2021-01-05 10:05 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98442
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Using ymm might also trigger dynamic stack realignment if we ever spill. It
can also be slower when the memory is unaligned (and/or when the CPU has
split AVX support only), and it requires vzeroupper.
So I wonder if it is really worth it for small structures like this? And with
fast rep;movb isn't that even better? [can fast rep/movb stores be forwarded?]
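To illustrate the alignment concern, one can check whether a buffer meets the 32-byte alignment that full-width ymm stores prefer. A minimal sketch, assuming C11; is_aligned is a hypothetical helper, and the alignas(32) on Tab only mirrors the 32-byte alignment shown in the .comm directive of comment #3.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Buffer with the 32-byte alignment requested explicitly. */
static alignas(32) char Tab[64];

/* Hypothetical helper: return 1 if P is aligned to ALIGN bytes, else 0.
   ALIGN must be a nonzero power of two. */
int is_aligned(const void *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

Here is_aligned(Tab, 32) holds, while is_aligned(Tab + 1, 32) does not; the second case is the one where a 32-byte store may cross a cache-line boundary.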
* [Bug target/98442] [X86] suboptimal for memset with CLEAR_BY_PIECES
From: hjl.tools at gmail dot com @ 2021-10-06 23:48 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98442
H.J. Lu <hjl.tools at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE
--- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> ---
Dup.
*** This bug has been marked as a duplicate of bug 90773 ***