public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/100076] New: eembc/automotive/basefp01 has 30.3% regression compare -O2 -ftree-vectorize with -O2 on SKX/CLX
@ 2021-04-14  2:21 crazylht at gmail dot com
  2021-04-14  3:16 ` [Bug tree-optimization/100076] eembc/automotive/basefp01 has 30.3% regression compare -O2 -ftree-vectorize with -O2 on CLX/Znver3 hjl.tools at gmail dot com
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: crazylht at gmail dot com @ 2021-04-14  2:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100076

            Bug ID: 100076
           Summary: eembc/automotive/basefp01 has 30.3% regression compare
                    -O2 -ftree-vectorize with -O2 on SKX/CLX
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
                CC: hjl.tools at gmail dot com
  Target Milestone: ---

Refer to https://godbolt.org/z/e3nfz3xvW

cat testcase.c

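/* Assumed declarations, not part of the original report: the 64-bit stores
   described below imply an 8-byte varsize, so double is assumed; the
   coefficient tables are assumed to be external arrays of 9 doubles. */
typedef double varsize;
extern double constantP[9], constantQ[9];

int
t_run_test(double a)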
{

        static double P1, Q1;
        static varsize polyX1[9];
        polyX1[1] = a;
        P1 = (varsize)constantP[0];
        polyX1[1] = a;

// Loop 1
        for( int i1 = 2 ; i1 <= 8 ; i1++ )
        {
            polyX1[i1] = polyX1[i1 - 1] * polyX1[1] ;
        }


        P1 = (varsize)constantP[0] ;
// Loop 2
        for( int i1 = 1 ; i1 <= 8 ; i1++ )
        {
            P1 += (varsize)constantP[i1] * polyX1[i1] ;
        }


        Q1 = (varsize)constantQ[0] ;
// Loop 3
        for( int i1 = 1 ; i1 <= 8 ; i1++ )
        {
            Q1 += (varsize)constantQ[i1] * polyX1[i1] ;
        }


        return a = a * P1 / Q1 ;

}

Loop 1 writes the array polyX1, which is then read by Loop 2 and Loop 3. With
-O2 -ftree-vectorize, Loop 2 and Loop 3 are vectorized, but Loop 1 is not
because it has a loop-carried dependence. As a result, polyX1 is written with
64-bit stores in Loop 1 and read back with 128-bit loads in Loop 2 and Loop 3.
The wide loads cannot be forwarded from the narrower stores, which causes
store-forwarding stalls that hurt performance.
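
For illustration only, the two access patterns can be isolated in a small
standalone sketch. The function names build_powers and eval_poly and the
constant N are hypothetical and do not come from the benchmark. The first
function has the same recurrence as Loop 1, and the second has the same
reduction shape as Loop 2 and Loop 3. Whether a given compiler actually
vectorizes the second loop depends on its version, flags, and cost model.

```c
#define N 9

/* Loop 1 pattern: pw[i] depends on pw[i - 1], which was written in the
   previous iteration, so the loop is a serial chain of scalar stores. */
static void build_powers(double x, double pw[N])
{
    pw[0] = 1.0;
    pw[1] = x;
    for (int i = 2; i < N; i++)
        pw[i] = pw[i - 1] * x;
}

/* Loop 2/3 pattern: every pw[i] is only read, so the loads and multiplies
   are independent across iterations and can be done on wider vectors. */
static double eval_poly(const double coef[N], const double pw[N])
{
    double acc = coef[0];
    for (int i = 1; i < N; i++)
        acc += coef[i] * pw[i];
    return acc;
}
```

Calling build_powers and then eval_poly on the same array, back to back,
reproduces the narrow-store, wide-load sequence described above whenever only
the second loop is vectorized.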


Thread overview: 7+ messages
2021-04-14  2:21 [Bug tree-optimization/100076] New: eembc/automotive/basefp01 has 30.3% regression compare -O2 -ftree-vectorize with -O2 on SKX/CLX crazylht at gmail dot com
2021-04-14  3:16 ` [Bug tree-optimization/100076] eembc/automotive/basefp01 has 30.3% regression compare -O2 -ftree-vectorize with -O2 on CLX/Znver3 hjl.tools at gmail dot com
2021-04-14  5:28 ` crazylht at gmail dot com
2021-04-14  7:08 ` rguenth at gcc dot gnu.org
2021-04-14  8:22 ` crazylht at gmail dot com
2021-04-15  7:35 ` rguenth at gcc dot gnu.org
2021-04-15  9:23 ` crazylht at gmail dot com