public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/107718] New: clang optimizes TSVC s317 a lot better
@ 2022-11-16 17:11 hubicka at gcc dot gnu.org
  2022-11-17  8:13 ` [Bug middle-end/107718] " rguenth at gcc dot gnu.org
  2022-11-21  9:55 ` marxin at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: hubicka at gcc dot gnu.org @ 2022-11-16 17:11 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107718

            Bug ID: 107718
           Summary: clang optimizes TSVC s317 a lot better
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

This is a stupid benchmark but still...

jh@alberti:~/tsvc/bin> more tt2.c

typedef double real_t;
#define iterations 100000
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D];
real_t qq;
int
main(void)
{

    real_t q;
    for (int nl = 0; nl < 5*iterations; nl++) {
        q = (real_t)1.;
        for (int i = 0; i < LEN_1D/2; i++) {
            q *= (real_t).99;
        }
        qq+=q;
    }

    return q;
}
jh@alberti:~/tsvc/bin> time ./a.out

real    0m0.805s
user    0m0.805s
sys     0m0.000s
jh@alberti:~/tsvc/bin> clang -Ofast -march=native tt2.c  
jh@alberti:~/tsvc/bin> time ./a.out

real    0m0.010s
user    0m0.007s
sys     0m0.003s

Clang generates this inner loop:
.LBB0_2:                                #   Parent Loop BB0_1 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
        vmulpd  %zmm2, %zmm3, %zmm3
        vmulpd  %zmm2, %zmm4, %zmm4
        vmulpd  %zmm2, %zmm5, %zmm5
        vmulpd  %zmm2, %zmm6, %zmm6
        addl    $-3200, %ecx                    # imm = 0xF380
        jne     .LBB0_2
# %bb.3:                                #   in Loop: Header=BB0_1 Depth=1
        vmulpd  %zmm3, %zmm4, %zmm3


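So it keeps four independent vector multiplication chains going and combines
them after the loop? And, judging by the counter stepping by 3200 per
iteration, unrolling apparently lets it fold the repeated multiplications by
.99 into a single precomputed power?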
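
For illustration, the reassociation that the assembly hints at can be
sketched in plain C. This is only a minimal sketch, not what either compiler
does internally: the helper names product_serial and product_reassociated and
the LANES constant are made up for this example, and four scalar accumulators
stand in for the vector registers. The rewrite changes the order of the
floating-point multiplications, so a compiler may only do it when
reassociation is permitted, for example under -Ofast or -ffast-math.

```c
#include <stddef.h>

/* Hypothetical helper: the inner loop as written in the testcase.
   Every multiplication depends on the previous one (one serial chain). */
static double product_serial(double factor, size_t n)
{
    double q = 1.0;
    for (size_t i = 0; i < n; i++)
        q *= factor;
    return q;
}

/* Number of independent accumulators; an assumption for this sketch. */
#define LANES 4

/* Hypothetical helper: the same product, reassociated into LANES
   independent partial products that are combined after the loop.
   The result can differ from product_serial in the last bits. */
static double product_reassociated(double factor, size_t n)
{
    double acc[LANES] = { 1.0, 1.0, 1.0, 1.0 };
    size_t i = 0;

    /* Main loop: each accumulator forms its own dependency chain. */
    for (; i + LANES <= n; i += LANES)
        for (int l = 0; l < LANES; l++)
            acc[l] *= factor;

    /* Combine the partial products once, outside the loop. */
    double q = (acc[0] * acc[1]) * (acc[2] * acc[3]);

    /* Remainder iterations when n is not a multiple of LANES. */
    for (; i < n; i++)
        q *= factor;
    return q;
}
```

Calling both helpers with factor .99 and n = LEN_1D/2 gives results that
agree up to rounding, but the second form has four independent dependency
chains that the CPU can overlap or a vectorizer can pack into SIMD lanes.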
