[Bug middle-end/107718] New: clang optimizes TSVC s317 a lot better


From: hubicka at gcc dot gnu.org @ 2022-11-16 17:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107718

            Bug ID: 107718
           Summary: clang optimizes TSVC s317 a lot better
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

This is a stupid benchmark but still...

jh@alberti:~/tsvc/bin> more tt2.c

typedef double real_t;
#define iterations 100000
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D];
real_t qq;
int
main(void)
{

    real_t q;
    for (int nl = 0; nl < 5*iterations; nl++) {
        q = (real_t)1.;
        for (int i = 0; i < LEN_1D/2; i++) {
            q *= (real_t).99;
        }
        qq+=q;
    }

    return q;
}
jh@alberti:~/tsvc/bin> time ./a.out

real    0m0.805s
user    0m0.805s
sys     0m0.000s
jh@alberti:~/tsvc/bin> clang -Ofast -march=native tt2.c  
jh@alberti:~/tsvc/bin> time ./a.out

real    0m0.010s
user    0m0.007s
sys     0m0.003s

Clang does:
.LBB0_2:                                #   Parent Loop BB0_1 Depth=1
                                        # =>  This Inner Loop Header: Depth=2
        vmulpd  %zmm2, %zmm3, %zmm3
        vmulpd  %zmm2, %zmm4, %zmm4
        vmulpd  %zmm2, %zmm5, %zmm5
        vmulpd  %zmm2, %zmm6, %zmm6
        addl    $-3200, %ecx                    # imm = 0xF380
        jne     .LBB0_2
# %bb.3:                                #   in Loop: Header=BB0_1 Depth=1
        vmulpd  %zmm3, %zmm4, %zmm3


So it runs the multiplications in parallel and, because of the unrolling,
combines the exponents?


From: rguenth at gcc dot gnu.org @ 2022-11-17  8:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107718

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
It seems to split the reduction, performing many 0.99 ** n computations in
parallel, which is itself stupid since they all compute the same result ...
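
For illustration only, a scalar sketch of what "splitting the reduction" means.
The names product_serial, product_split and LANES are hypothetical and the lane
count is arbitrary; this is not the code either compiler generates. Every
accumulator ends up holding the same value, factor ** (n / LANES), which is the
redundancy pointed out above. Reassociating the multiplies like this is only
allowed with -ffast-math style semantics.

```c
#include <assert.h>

#define LANES 4 /* hypothetical number of independent accumulators */

/* Plain reduction: one serial dependency chain of n multiplies. */
double product_serial(double factor, int n)
{
    double q = 1.0;
    for (int i = 0; i < n; i++)
        q *= factor;
    return q;
}

/* Split reduction: LANES independent chains that are combined at the end.
   This reassociates the floating-point multiplies, so the result can differ
   from product_serial in the last bits. */
double product_split(double factor, int n)
{
    assert(n >= 0 && n % LANES == 0); /* sketch has no remainder loop */

    double acc[LANES];
    for (int l = 0; l < LANES; l++)
        acc[l] = 1.0;

    /* Each pass advances all chains by one multiply; the chains do not
       depend on each other, so they can execute in parallel. */
    for (int i = 0; i < n; i += LANES)
        for (int l = 0; l < LANES; l++)
            acc[l] *= factor;

    /* Combine the partial products. */
    double q = 1.0;
    for (int l = 0; l < LANES; l++)
        q *= acc[l];
    return q;
}
```

With factor = 0.99 and n = LEN_1D/2 = 16000 both functions should agree up to
rounding.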

I'd say the benchmark is stupid, and with -ffast-math we could optimize it to
pow (0.99, LEN_1D/2), i.e. const-fold the inner loop in final value replacement.
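
A rough sketch of that equivalence, assuming the inner loop from the test case.
The helper names inner_loop and power_by_squaring are hypothetical, and
square-and-multiply is used here as a stand-in for pow () so that the example
does not need libm. It is not a description of how GCC would implement the
transformation. The two results agree only up to rounding, which is why the
replacement needs -ffast-math.

```c
#include <assert.h>

/* The inner loop of the test case: n serial multiplies by 0.99. */
double inner_loop(int n)
{
    double q = 1.0;
    for (int i = 0; i < n; i++)
        q *= 0.99;
    return q;
}

/* Closed form base ** n via square-and-multiply: roughly 2 * log2(n)
   multiplies instead of n.  Stand-in for pow (base, n). */
double power_by_squaring(double base, unsigned n)
{
    double result = 1.0;
    while (n > 0) {
        if (n & 1u)
            result *= base; /* this bit of the exponent is set */
        base *= base;       /* base ** 1, ** 2, ** 4, ** 8, ... */
        n >>= 1;
    }
    return result;
}
```

For n = LEN_1D/2 = 16000, inner_loop (n) and power_by_squaring (0.99, n) should
differ only by accumulated rounding error. With a constant n the second form
can be evaluated entirely at compile time.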


From: marxin at gcc dot gnu.org @ 2022-11-21  9:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107718

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
                 CC|                            |marxin at gcc dot gnu.org
   Last reconfirmed|                            |2022-11-21
     Ever confirmed|0                           |1
