public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/102483] New: Regression in codegen of reduction of 4 chars
From: david.bolvansky at gmail dot com @ 2021-09-25 15:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102483
Bug ID: 102483
Summary: Regression in codegen of reduction of 4 chars
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: david.bolvansky at gmail dot com
Target Milestone: ---
    char foo (char* p)
    {
        char sum = 0;
        for (int i = 0; i != 4; i++)
            sum += p[i];
        return sum;
    }
-O3 -march=x86-64
GCC trunk:
    foo:
            mov     edx, DWORD PTR [rdi]
            movzx   eax, dh
            mov     ecx, edx
            add     eax, edx
            shr     ecx, 16
            add     eax, ecx
            shr     edx, 24
            add     eax, edx
            ret
GCC 11 (much better):
    foo:
            movzx   eax, BYTE PTR [rdi+1]
            add     al, BYTE PTR [rdi]
            add     al, BYTE PTR [rdi+2]
            add     al, BYTE PTR [rdi+3]
            ret
Best? llvm-mca says so (Clang's output):
    foo:                                    # @foo
            movd    xmm0, dword ptr [rdi]   # xmm0 = mem[0],zero,zero,zero
            pxor    xmm1, xmm1
            psadbw  xmm1, xmm0
            movd    eax, xmm1
            ret
https://godbolt.org/z/sT9svvj7W
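For reference, the Clang sequence above can be written with SSE2 intrinsics. This is a minimal sketch, not code from the report: the name sum4_psadbw is hypothetical, and it assumes an x86-64 target where char is signed and narrowing conversions wrap modulo 256 (as with GCC). PSADBW against an all-zero vector sums the unsigned byte values, so with only the low 4 bytes loaded it yields their plain sum (at most 1020), and the low byte of that sum equals the wrapping char sum.

```c
#include <emmintrin.h>  /* SSE2: _mm_sad_epu8, _mm_cvtsi32_si128, ... */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: sum 4 chars the way the Clang output does. */
static char sum4_psadbw(const char *p)
{
    int32_t word;
    memcpy(&word, p, sizeof word);            /* unaligned 4-byte load */
    __m128i v = _mm_cvtsi32_si128(word);      /* movd: bytes 4..15 are zero */
    /* psadbw vs. zero: sum of |v[i] - 0| per 64-bit half, i.e. the byte sum */
    __m128i s = _mm_sad_epu8(v, _mm_setzero_si128());
    /* movd eax, xmm: keep the low byte, which is the sum modulo 256 */
    return (char)_mm_cvtsi128_si32(s);
}
```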
* [Bug target/102483] Reduction of 4 chars can be improved
From: pinskia at gcc dot gnu.org @ 2021-09-25 19:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102483
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|tree-optimization |target
Status|UNCONFIRMED |NEW
Last reconfirmed| |2021-09-25
Target| |x86_64-linux-gnu
Summary|Regression in codegen of |Reduction of 4 chars can be
|reduction of 4 chars |improved
Ever confirmed|0 |1
Keywords| |missed-optimization
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
There is a cost model issue as the backend does not implement a reduction add
for vector(4) char.
Note that a slightly different function (storing the result rather than
returning it) also produces very different code (even with LLVM):
    void foo (unsigned char* p, unsigned char *t)
    {
        char sum = 0;
        for (int i = 0; i != 4; i++)
            sum += p[i];
        *t = sum;
    }
* [Bug target/102483] Reduction of 4 chars can be improved
From: rguenth at gcc dot gnu.org @ 2021-09-27 8:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102483
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Version|unknown |12.0
Blocks| |53947
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
This is an interesting sub-case that we could eventually handle generically as
a mechanism for the final reduction. Note that we would need to constrain the
vector [us]sad patterns as to which lanes are summed; otherwise the partial
sums might be spread across 4 integer lanes.
Of course the target could simply provide reduc_plus_scal_v4qi ...
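To make the lane question concrete on x86 (an illustration only, not the semantics of GCC's generic [us]sad patterns): PSADBW on a full 16-byte vector writes one partial sum per 64-bit half, so which input bytes land in which output lane matters, and a final reduction still has to combine the lanes. The helper name sad_partial_sums is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: expose the two partial sums PSADBW produces.
   out[0] = sum of bytes 0..7, out[1] = sum of bytes 8..15. */
static void sad_partial_sums(const uint8_t bytes[16], uint64_t out[2])
{
    __m128i v = _mm_loadu_si128((const __m128i *)bytes);
    __m128i s = _mm_sad_epu8(v, _mm_setzero_si128());  /* |v - 0| per half */
    _mm_storeu_si128((__m128i *)out, s);
}
```

Only when the input is known to occupy the low 8 bytes (as in the 4-char case, where the upper bytes are zero) does out[0] alone hold the full reduction.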
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
* [Bug target/102483] Reduction of 4 chars can be improved
From: crazylht at gmail dot com @ 2021-09-27 8:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102483
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
Also for reduc_umin/umax/smin/smax_scal_v4qi.
* [Bug target/102483] Reduction of 4 chars can be improved
From: crazylht at gmail dot com @ 2021-10-11 6:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102483
--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #3)
> Also for reduc_umin/umax/smin/smax_scal_v4qi.
After providing expanders for reduc_umin/umax/smin/smax_scal_v4qi, the
functions below are slightly faster than before with -O2 -march=haswell,
-O2 -march=skylake-avx512, and -Ofast -march=skylake-avx512.
    char
    __attribute__((noipa, optimize("Ofast"), target("sse4.1")))
    reduce_add (char* p)
    {
        char sum = 0;
        for (int i = 0; i != 4; i++)
            sum += p[i];
        return sum;
    }

    #define MAX(a, b) ((a) > (b) ? (a) : (b))
    #define MIN(a, b) ((a) > (b) ? (b) : (a))

    unsigned char
    __attribute__((noipa))
    reduce_umax (unsigned char* p)
    {
        unsigned char sum = p[0];
        for (int i = 0; i != 4; i++)
            sum = MAX(sum, p[i]);
        return sum;
    }

    unsigned char
    __attribute__((noipa))
    reduce_umin (unsigned char* p)
    {
        unsigned char sum = p[0];
        for (int i = 0; i != 4; i++)
            sum = MIN(sum, p[i]);
        return sum;
    }

    char
    __attribute__((noipa))
    reduce_smax (char* p)
    {
        char sum = p[0];
        for (int i = 0; i != 4; i++)
            sum = MAX(sum, p[i]);
        return sum;
    }

    char
    __attribute__((noipa))
    reduce_smin (char* p)
    {
        char sum = p[0];
        for (int i = 0; i != 4; i++)
            sum = MIN(sum, p[i]);
        return sum;
    }
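The min/max cases can be reduced with a shift-and-combine pattern on byte lanes. The following is a hedged sketch with SSE2 intrinsics for the unsigned-max case only; reduce_umax_sse2 is a hypothetical name, and the sequence is not claimed to match what the new reduc_umax_scal_v4qi expander emits. Each step shifts the register right by half the live width with PSRLDQ and combines lane-wise with PMAXUB, so after two steps byte 0 holds the maximum.

```c
#include <emmintrin.h>  /* SSE2: _mm_max_epu8, _mm_srli_si128 */
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: unsigned max of 4 bytes in two halving steps. */
static uint8_t reduce_umax_sse2(const uint8_t *p)
{
    int32_t word;
    memcpy(&word, p, sizeof word);                /* unaligned 4-byte load */
    __m128i v = _mm_cvtsi32_si128(word);          /* lanes 0..3 = p[0..3] */
    /* step 1: lane0 = max(p0, p2), lane1 = max(p1, p3) */
    v = _mm_max_epu8(v, _mm_srli_si128(v, 2));
    /* step 2: lane0 = max(lane0, lane1) = max of all four */
    v = _mm_max_epu8(v, _mm_srli_si128(v, 1));
    return (uint8_t)_mm_cvtsi128_si32(v);         /* extract byte 0 */
}
```

The zero bytes shifted in never reach byte 0, so the same shape works for min as well. The signed variants would need PMAXSB/PMINSB, which require SSE4.1.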
* [Bug target/102483] Reduction of 4 chars can be improved
From: cvs-commit at gcc dot gnu.org @ 2021-10-12 7:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102483
--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:73c535a00bc4dfe9a939cd80facbe79a929cab3e
commit r12-4339-g73c535a00bc4dfe9a939cd80facbe79a929cab3e
Author: liuhongt <hongtao.liu@intel.com>
Date:   Sat Oct 9 14:34:38 2021 +0800

    Support reduc_{plus,smax,smin,umax,umin}_scal_v4qi.

    gcc/ChangeLog

            PR target/102483
            * config/i386/i386-expand.c (emit_reduc_half): Handle
            V4QImode.
            * config/i386/mmx.md (reduc_<code>_scal_v4qi): New expander.
            (reduc_plus_scal_v4qi): Ditto.

    gcc/testsuite/ChangeLog

            * gcc.target/i386/pr102483.c: New test.
            * gcc.target/i386/pr102483-2.c: New test.