public inbox for gcc-bugs@sourceware.org
From: "hubicka at ucw dot cz" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/107715] TSVC s161 for double runs at zen4 30 times slower when vectorization is enabled
Date: Wed, 16 Nov 2022 15:28:30 +0000 [thread overview]
Message-ID: <bug-107715-4-6KPNl9WpfU@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-107715-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107715
--- Comment #2 from Jan Hubicka <hubicka at ucw dot cz> ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107715
>
> --- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
> Because store data races are allowed with -Ofast masked stores are not used so
> we instead get
>
> vect__ifc__80.24_114 = VEC_COND_EXPR <mask__58.15_104, vect__45.20_109,
> vect__ifc__78.23_113>;
> _ifc__80 = _58 ? _45 : _ifc__78;
> MEM <vector(8) double> [(double *)vectp_c.25_116] = vect__ifc__80.24_114;
>
> which somehow is later turned into masked stores? In fact we expand from
>
> vect__43.18_107 = MEM <vector(8) double> [(double *)&a + ivtmp.75_134 * 1];
> vect__ifc__78.23_113 = MEM <vector(8) double> [(double *)&c + 8B +
> ivtmp.75_134 * 1];
> _97 = .COND_FMA (mask__58.15_104, vect_pretmp_36.14_102,
> vect_pretmp_36.14_102, vect__43.18_107, vect__ifc__78.23_113);
> MEM <vector(8) double> [(double *)&c + 8B + ivtmp.75_134 * 1] = _97;
> vect__38.29_121 = MEM <vector(8) double> [(double *)&c + ivtmp.75_134 * 1];
> vect__39.32_124 = MEM <vector(8) double> [(double *)&e + ivtmp.75_134 * 1];
> _98 = vect__35.11_99 >= { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
> _100 = .COND_FMA (_98, vect_pretmp_36.14_102, vect__39.32_124,
> vect__38.29_121, vect__43.18_107);
> MEM <vector(8) double> [(double *)&a + ivtmp.75_134 * 1] = _100;
>
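(In scalar terms, the expanded sequence above behaves like an unconditional read-modify-write; a sketch with made-up names, not the actual generated code:)

```c
#include <stddef.h>

/* Scalar sketch of the VEC_COND_EXPR + plain store above: because
   store data races are allowed, "if (m) c[i] = v[i]" becomes an
   unconditional load + blend + store, so c[i] is read and written on
   every iteration even when the mask is false.  */
void
blend_store (double *c, const double *v, const int *mask, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      double old = c[i];            /* unconditional load */
      c[i] = mask[i] ? v[i] : old;  /* unconditional store */
    }
}
```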
> the vectorizer has optimize_mask_stores () which is supposed to replace
> .MASK_STORE with
>
> if (mask != { 0, 0, 0 ... })
> <code depending on the mask store>
>
> and thus optimize the mask == 0 case. But that only triggers for .MASK_STORE.
>
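(The effect of that transform, sketched in scalar C with made-up helper names; the real pass rewrites GIMPLE .MASK_STORE calls:)

```c
#include <stdbool.h>
#include <stddef.h>

/* Hedged sketch of what optimize_mask_stores buys: branch around the
   whole store sequence when no lane is active, so an all-false mask
   costs one compare instead of any store-side memory traffic.  */
static bool
any_lane_set (const bool *mask, size_t lanes)
{
  for (size_t i = 0; i < lanes; ++i)
    if (mask[i])
      return true;
  return false;
}

void
mask_store (double *dst, const double *src, const bool *mask, size_t lanes)
{
  if (!any_lane_set (mask, lanes))
    return;                  /* mask == { 0, 0, 0 ... }: skip the stores */
  for (size_t i = 0; i < lanes; ++i)
    if (mask[i])
      dst[i] = src[i];       /* only active lanes touch memory */
}
```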
> You can see this when you force .MASK_STORE via -O3 -ffast-math (without
> -fallow-store-data-races), you get this effect:
Yep, -fno-allow-store-data-races fixes the problem
jh@alberti:~/tsvc/bin> /home/jh/trunk-install/bin/gcc test.c -Ofast -march=native -lm
jh@alberti:~/tsvc/bin> perf stat ./a.out

 Performance counter stats for './a.out':

          37,289.50 msec task-clock:u              #    1.000 CPUs utilized
                  0      context-switches:u        #    0.000 /sec
                  0      cpu-migrations:u          #    0.000 /sec
                431      page-faults:u             #   11.558 /sec
    137,411,365,539      cycles:u                  #    3.685 GHz                      (83.33%)
        991,673,172      stalled-cycles-frontend:u #    0.72% frontend cycles idle     (83.34%)
            506,793      stalled-cycles-backend:u  #    0.00% backend cycles idle      (83.34%)
      3,400,375,204      instructions:u            #    0.02  insn per cycle
                                                   #    0.29  stalled cycles per insn  (83.34%)
        200,235,802      branches:u                #    5.370 M/sec                    (83.34%)
             73,962      branch-misses:u           #    0.04% of all branches          (83.33%)

       37.305121352 seconds time elapsed
       37.285467000 seconds user
        0.000000000 seconds sys
jh@alberti:~/tsvc/bin> /home/jh/trunk-install/bin/gcc test.c -Ofast -march=native -lm -fno-allow-store-data-races
jh@alberti:~/tsvc/bin> perf stat ./a.out

 Performance counter stats for './a.out':

             667.95 msec task-clock:u              #    0.999 CPUs utilized
                  0      context-switches:u        #    0.000 /sec
                  0      cpu-migrations:u          #    0.000 /sec
                367      page-faults:u             #  549.439 /sec
      2,434,906,671      cycles:u                  #    3.645 GHz                      (83.24%)
             19,681      stalled-cycles-frontend:u #    0.00% frontend cycles idle     (83.24%)
             12,495      stalled-cycles-backend:u  #    0.00% backend cycles idle      (83.24%)
      2,793,482,139      instructions:u            #    1.15  insn per cycle
                                                   #    0.00  stalled cycles per insn  (83.24%)
        598,879,536      branches:u                #  896.588 M/sec                    (83.78%)
             50,649      branch-misses:u           #    0.01% of all branches          (83.26%)

        0.668807640 seconds time elapsed
        0.668660000 seconds user
        0.000000000 seconds sys
So I suppose it is L1 thrashing: l1-dcache-loads goes up from
2,000,413,936 to 11,044,576,207.

I suppose it would be too fancy for the vectorizer to work out the overall
memory consumption here :) It sort of should have all the info...
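(For reference, the kernel is roughly of this shape; a hedged sketch with illustrative names, not the exact TSVC s161 source. Each scalar iteration stores through only one arm, while the if-converted vector form loads and rewrites both a[] and c[] every iteration, which is where the extra l1-dcache-loads come from:)

```c
#include <stddef.h>

/* Rough sketch of an s161-style kernel (illustrative, not the exact
   TSVC source).  Each scalar iteration stores through exactly one of
   the two arms; after if-conversion the vector loop instead loads and
   rewrites both a[i] and c[i+1] regardless of the condition.  */
void
s161_like (double *a, const double *b, double *c,
           const double *d, const double *e, size_t n)
{
  for (size_t i = 0; i + 1 < n; ++i)
    {
      if (b[i] >= 0.0)
        c[i + 1] = a[i] + d[i] * d[i];
      else
        a[i] = c[i] + d[i] * e[i];
    }
}
```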
Honza