public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/107718] New: clang optimizes TSVC s317 a lot better
@ 2022-11-16 17:11 hubicka at gcc dot gnu.org
2022-11-17 8:13 ` [Bug middle-end/107718] " rguenth at gcc dot gnu.org
2022-11-21 9:55 ` marxin at gcc dot gnu.org
0 siblings, 2 replies; 3+ messages in thread
From: hubicka at gcc dot gnu.org @ 2022-11-16 17:11 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107718
Bug ID: 107718
Summary: clang optimizes TSVC s317 a lot better
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
This is a stupid benchmark but still...
jh@alberti:~/tsvc/bin> more tt2.c
typedef double real_t;
#define iterations 100000
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D];
real_t qq;
int
main(void)
{
  real_t q;
  for (int nl = 0; nl < 5*iterations; nl++) {
    q = (real_t)1.;
    for (int i = 0; i < LEN_1D/2; i++) {
      q *= (real_t).99;
    }
    qq+=q;
  }
  return q;
}
jh@alberti:~/tsvc/bin> time ./a.out
real 0m0.805s
user 0m0.805s
sys 0m0.000s
jh@alberti:~/tsvc/bin> clang -Ofast -march=native tt2.c
jh@alberti:~/tsvc/bin> time ./a.out
real 0m0.010s
user 0m0.007s
sys 0m0.003s
Clang does:
.LBB0_2:                                # Parent Loop BB0_1 Depth=1
                                        # => This Inner Loop Header: Depth=2
        vmulpd  %zmm2, %zmm3, %zmm3
        vmulpd  %zmm2, %zmm4, %zmm4
        vmulpd  %zmm2, %zmm5, %zmm5
        vmulpd  %zmm2, %zmm6, %zmm6
        addl    $-3200, %ecx            # imm = 0xF380
        jne     .LBB0_2
# %bb.3:                                # in Loop: Header=BB0_1 Depth=1
        vmulpd  %zmm3, %zmm4, %zmm3
So it runs several independent multiplication chains and, because of the unrolling, combines the exponents?
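For readers following along, here is a minimal scalar sketch of what "independent multiplication chains combined after the loop" could look like in C. It is not derived from clang's actual transformation: the function names, the NACC constant, and the choice of four scalar accumulators (loosely mirroring the four vmulpd accumulator registers above) are all assumptions made for illustration. The split form is only equivalent to the serial one when reassociation of floating-point multiplies is allowed, as with -Ofast or -ffast-math.

```c
#define LEN_1D 32000
#define NACC 4   /* hypothetical accumulator count, chosen for illustration */

/* Serial form, as in the test case: one dependent chain of multiplies. */
static double product_serial(double factor, int n)
{
    double q = 1.0;
    for (int i = 0; i < n; i++)
        q *= factor;
    return q;
}

/* Split reduction: NACC independent chains that are multiplied together
   after the loop. This changes the order of the floating-point operations,
   so it is only valid when reassociation is permitted. */
static double product_split(double factor, int n)
{
    double acc[NACC] = { 1.0, 1.0, 1.0, 1.0 };
    int i = 0;

    /* Main loop: each pass advances all NACC chains by one multiply. */
    for (; i + NACC <= n; i += NACC)
        for (int k = 0; k < NACC; k++)
            acc[k] *= factor;

    /* Combine the partial products. */
    double q = (acc[0] * acc[1]) * (acc[2] * acc[3]);

    /* Remainder iterations when n is not a multiple of NACC. */
    for (; i < n; i++)
        q *= factor;
    return q;
}
```

Calling product_split(0.99, LEN_1D / 2) should agree with product_serial(0.99, LEN_1D / 2) up to rounding. Note that every chain multiplies by the same constant, so all accumulators hold the same value.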
* [Bug middle-end/107718] clang optimizes TSVC s317 a lot better
From: rguenth at gcc dot gnu.org @ 2022-11-17 8:13 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107718
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
It seems to split the reduction, performing many 0.99 ** n computations in parallel,
which is stupid in itself as those all compute the same result ...
I'd say the benchmark is stupid, and with -ffast-math we could optimize it to
pow (0.99, LEN_1D/2), i.e. const-fold the inner loop in final value replacement.
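As a rough illustration of the final value replacement idea, the sketch below compares the inner loop with a closed-form computation of the same power. It is not GCC's implementation. The helper power_by_squaring and the other function names are hypothetical, and the helper stands in for pow (0.99, LEN_1D/2) only so the example has no libm dependency. The two results differ in rounding, which is why such a replacement needs -ffast-math.

```c
#define LEN_1D 32000

/* The inner loop from the test case, isolated into a function. */
static double inner_loop_value(void)
{
    double q = 1.0;
    for (int i = 0; i < LEN_1D / 2; i++)
        q *= 0.99;
    return q;
}

/* Hypothetical helper: base**n by square-and-multiply, used here in place
   of libm's pow() so the sketch needs no extra library. */
static double power_by_squaring(double base, unsigned n)
{
    double result = 1.0;
    while (n) {
        if (n & 1u)
            result *= base;   /* fold in the current power of two */
        base *= base;         /* base**1, base**2, base**4, ... */
        n >>= 1;
    }
    return result;
}

/* What the loop could be replaced with once its trip count is known:
   a single closed-form value, computable at compile time. */
static double inner_loop_folded(void)
{
    return power_by_squaring(0.99, LEN_1D / 2);
}
```

With the inner loop folded to a constant, the outer loop in the test case reduces to repeatedly adding that constant to qq.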
* [Bug middle-end/107718] clang optimizes TSVC s317 a lot better
From: marxin at gcc dot gnu.org @ 2022-11-21 9:55 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107718
Martin Liška <marxin at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
                 CC|                            |marxin at gcc dot gnu.org
   Last reconfirmed|                            |2022-11-21
     Ever confirmed|0                           |1