public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/110832] New: 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
@ 2023-07-27 17:27 hubicka at gcc dot gnu.org
  2023-07-27 17:28 ` [Bug middle-end/110832] " hubicka at gcc dot gnu.org
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-07-27 17:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

            Bug ID: 110832
           Summary: 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76
                    (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27
                    03:44) on zen3 and core
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

Biggest regression is seen here
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=466.758.0
zen3
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=466.758.0

Curiously zen2 improves:
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=171.758.0

I can see an instruction count difference in the perf stat output:
 Performance counter stats for './a.out':

          10923.70 msec task-clock:u                     #    1.000 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
             15510      page-faults:u                    #    1.420 K/sec
       59062937176      cycles:u                         #    5.407 GHz                         (83.33%)
          12607081      stalled-cycles-frontend:u        #    0.02% frontend cycles idle        (83.34%)
         122404896      stalled-cycles-backend:u         #    0.21% backend cycles idle         (83.34%)
      112648123380      instructions:u                   #    1.91  insn per cycle
                                                  #    0.00  stalled cycles per insn     (83.34%)
        9666338531      branches:u                       #  884.896 M/sec                       (83.34%)
           2937216      branch-misses:u                  #    0.03% of all branches             (83.31%)

      10.924108973 seconds time elapsed

      10.912056000 seconds user
       0.012000000 seconds sys


 Performance counter stats for './b.out':

          11025.38 msec task-clock:u                     #    1.000 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
             14998      page-faults:u                    #    1.360 K/sec
       59436352848      cycles:u                         #    5.391 GHz                         (83.31%)
           9217660      stalled-cycles-frontend:u        #    0.02% frontend cycles idle        (83.32%)
         210162784      stalled-cycles-backend:u         #    0.35% backend cycles idle         (83.35%)
      131604240004      instructions:u                   #    2.21  insn per cycle
                                                  #    0.00  stalled cycles per insn     (83.35%)
        9657712171      branches:u                       #  875.953 M/sec                       (83.35%)
           3146487      branch-misses:u                  #    0.03% of all branches             (83.33%)

      11.025701172 seconds time elapsed

      11.005646000 seconds user
       0.020002000 seconds sys
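
The two counter dumps above can be compared with a little arithmetic. The following is a minimal C sketch of that arithmetic; the helper names are illustrative and not part of perf, and the inputs are simply the cycles:u and instructions:u values copied from the two runs.

```c
#include <assert.h>

/* Instructions per cycle, the figure perf prints as "insn per cycle". */
double insn_per_cycle(double instructions, double cycles)
{
    assert(cycles > 0.0);
    return instructions / cycles;
}

/* Relative growth of a counter from run A to run B, in percent. */
double percent_increase(double a, double b)
{
    assert(a > 0.0);
    return (b - a) / a * 100.0;
}
```

Fed with the numbers above, insn_per_cycle gives about 1.91 for a.out and 2.21 for b.out, and percent_increase shows b.out retiring about 16.8% more instructions while using only about 0.6% more cycles on this machine.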

However, perf report does not show clear differences in the time spent in individual functions.
I


* [Bug middle-end/110832] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
From: hubicka at gcc dot gnu.org @ 2023-07-27 17:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

--- Comment #1 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
This time it seems that there is only one profile change:

commit 645c67f80c6258c1f54ec567f604008adbdb8a04
Author: Jan Hubicka <jh@suse.cz>
Date:   Wed Jul 26 08:59:23 2023 +0200

    Fix profile_count::to_sreal_scale

    gcc/ChangeLog:

            * profile-count.cc (profile_count::to_sreal_scale): Value is not known
            if we divide by zero.

Which should not be very important.


* [Bug middle-end/110832] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
From: hubicka at ucw dot cz @ 2023-07-27 18:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

--- Comment #2 from Jan Hubicka <hubicka at ucw dot cz> ---
I tested that the profile change makes no difference.


* [Bug middle-end/110832] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
From: rguenth at gcc dot gnu.org @ 2023-07-28  6:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization,
                   |                            |needs-bisection

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Maybe r14-2786-gade30fad6669e5


* [Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
From: rguenth at gcc dot gnu.org @ 2023-07-28  6:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|14% capacita -O2 regression |[14 Regression] 14%
                   |between g:9fdbd7d6fa5e0a76  |capacita -O2 regression
                   |(2023-07-26 01:45) and      |between g:9fdbd7d6fa5e0a76
                   |g:ca912a39cccdd990          |(2023-07-26 01:45) and
                   |(2023-07-27 03:44) on zen3  |g:ca912a39cccdd990
                   |and core                    |(2023-07-27 03:44) on zen3
                   |                            |and core
             Target|                            |x86_64-*-*
   Target Milestone|---                         |14.0
            Version|13.1.0                      |14.0


* [Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
From: ubizjak at gmail dot com @ 2023-07-28 12:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

--- Comment #4 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 55652
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55652&action=edit
Patch to recover performance for -funsafe-math-optimizations

This patch will recover performance with -funsafe-math-optimizations.


* [Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
From: ubizjak at gmail dot com @ 2023-07-28 12:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ubizjak at gmail dot com

--- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #3)
> Maybe r14-2786-gade30fad6669e5

Yes. This is the cost to sanitize operands before every operation.

However, we can recover the performance for -funsafe-math-optimizations with
the patch, attached to the previous message, from:

      21,592075559 seconds time elapsed
to:
      20,047717312 seconds time elapsed


* [Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
From: rguenth at gcc dot gnu.org @ 2023-07-28 13:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Do we know whether we could in theory improve the sanitizing by optimization
without -funsafe-math-optimizations (I think -fno-trapping-math,
-ffinite-math-only, -fno-signaling-nans should be a better guard?)?


* [Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
From: ubizjak at gmail dot com @ 2023-07-28 13:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #6)
> Do we know whether we could in theory improve the sanitizing by optimization
> without -funsafe-math-optimizations (I think -fno-trapping-math,
> -ffinite-math-only, -fno-signaling-nans should be a better guard?)?

I was looking at -funsafe-math-optimizations because the compiler then links in
crtfastmath.c, which sets the DAZ and FTZ flags, so any denormals won't bother
us. -funsafe-math-optimizations also enables -fno-trapping-math, which assumes
masked FP exceptions, so we can still allow V2SF infinities and NaNs. FYI, clang
enables this optimization by default, since it defaults to -fno-trapping-math.
It seems to me that they don't care about denormals.
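
As an illustration of what the DAZ and FTZ flags do, here is a minimal C sketch. It is not the crtfastmath.c source; the helper names are made up for this example, and it assumes an x86-64 target where FP math runs in SSE registers. The bit positions (DAZ is bit 6 and FTZ is bit 15 of MXCSR) are architectural.

```c
#include <assert.h>
#include <float.h>
#if defined(__x86_64__)
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr */
#endif

#define MXCSR_DAZ (1u << 6)    /* treat denormal inputs as zero */
#define MXCSR_FTZ (1u << 15)   /* flush denormal results to zero */

/* Set FTZ and DAZ for the calling thread.  Returns 1 on success,
   0 when the target is not x86-64 and therefore has no MXCSR.  */
int enable_ftz_daz(void)
{
#if defined(__x86_64__)
    unsigned int want = MXCSR_FTZ | MXCSR_DAZ;
    _mm_setcsr(_mm_getcsr() | want);
    assert((_mm_getcsr() & want) == want);   /* both bits are now set */
    return 1;
#else
    return 0;
#endif
}

/* FLT_MIN is the smallest normal float, so halving it yields a denormal
   unless FTZ is active.  volatile keeps the multiply at run time.  */
float half_of_smallest_normal(void)
{
    volatile float x = FLT_MIN;
    volatile float h = 0.5f;
    return x * h;
}
```

After enable_ftz_daz() succeeds, half_of_smallest_normal() returns 0.0f instead of a denormal, which is the behaviour referred to above as denormals not bothering us.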


* [Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
From: ubizjak at gmail dot com @ 2023-07-28 15:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

--- Comment #8 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #6)
> Do we know whether we could in theory improve the sanitizing by optimization
> without -funsafe-math-optimizations (I think -fno-trapping-math,
> -ffinite-math-only, -fno-signaling-nans should be a better guard?)?

Regarding the sanitizing, we can remove all sanitizing MOVQ instructions
between trapping instructions (IOW, the result of ADDPS is guaranteed to have
zeros in the high part outside V2SF, so MOVQ is unnecessary in front of a
follow-up MULPS).

I think that some instruction back-walking pass on the RTL insn stream would be
able to identify these unnecessary instructions and remove them.
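
To make the idea concrete, here is a toy C sketch of such a redundancy scan. It is only an illustration under stated assumptions: the instruction representation is invented for this example and has nothing to do with GCC's RTL, it handles a single straight-line block only, and it walks forward while tracking which registers are known to have a zero upper half rather than walking backwards as suggested above. OP_ARITH stands for add, subtract or multiply, where zero upper lanes in both inputs give zero upper lanes in the result.

```c
#include <assert.h>
#include <stdbool.h>

enum { NREGS = 8 };

enum opcode {
    OP_LOAD,    /* dst = value whose upper half is unknown */
    OP_MOVQ,    /* dst = low 64 bits of src1, upper 64 bits zeroed */
    OP_ARITH    /* dst = src1 op src2, full width, like ADDPS or MULPS */
};

struct insn {
    enum opcode op;
    int dst, src1, src2;
    bool redundant;          /* filled in by mark_redundant_movq */
};

/* Mark every in-place MOVQ whose operand is already known to have a
   zero upper half.  Returns the number of instructions marked.  */
int mark_redundant_movq(struct insn *insns, int n)
{
    bool upper_zero[NREGS] = { false };
    int removed = 0;

    for (int i = 0; i < n; i++) {
        struct insn *in = &insns[i];
        assert(in->dst >= 0 && in->dst < NREGS);
        assert(in->src1 >= 0 && in->src1 < NREGS);
        assert(in->src2 >= 0 && in->src2 < NREGS);
        in->redundant = false;

        switch (in->op) {
        case OP_LOAD:
            upper_zero[in->dst] = false;
            break;
        case OP_MOVQ:
            if (in->dst == in->src1 && upper_zero[in->src1]) {
                in->redundant = true;
                removed++;
            }
            upper_zero[in->dst] = true;
            break;
        case OP_ARITH:
            /* 0 + 0, 0 - 0 and 0 * 0 are all zero. */
            upper_zero[in->dst] =
                upper_zero[in->src1] && upper_zero[in->src2];
            break;
        }
    }
    return removed;
}
```

For a sequence that sanitizes two loaded registers, adds them, and then sanitizes the sum again before a multiply, the scan marks the MOVQ on the sum (and any repeated MOVQ on an already clean input) as redundant.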

Also, as mentioned elsewhere, it is really hard to get a non-zero value into the
high part of an XMM register. The compiler takes great care to always load values
via MOVQ, so one has to craft special code that works around all these
fences. OTOH, in the two years since gcc-11 was released with V2SF support, not
a single PR involving spurious exceptions has been reported. Even the capacita
benchmark enables:

Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG IEEE_DENORMAL

without problems.

As an example here, it looks like Polyhedron capacita greatly benefits from
V2SF vectors, and I was surprised that the sanitizing MOVQ has such an effect here.


* [Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
From: crazylht at gmail dot com @ 2023-07-31  4:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #8)
> (In reply to Richard Biener from comment #6)
> > Do we know whether we could in theory improve the sanitizing by optimization
> > without -funsafe-math-optimizations (I think -fno-trapping-math,
> > -ffinite-math-only, -fno-signaling-nans should be a better guard?)?
> 
> Regarding the sanitizing, we can remove all sanitizing MOVQ instructions
> between trapping instructions (IOW, the result of ADDPS is guaranteed to
> have zeros in the high part outside V2SF, so MOVQ is unnecessary in front of
> a follow-up MULPS).
> 
> I think that some instruction back-walking pass on the RTL insn stream would
> be able to identify these unnecessary instructions and remove them.
> 

A V2SFmode operand can be produced by direct patterns or by a SUBREG.
I'm thinking about sanitizing V2SFmode operations only when there's a
subreg in the source operand, and making sure every other pattern which sets a
V2SFmode dest clears the upper bits (including
mov<mode>_internal, vec_concatv2sf_sse4_1, sse_storehps, sse_storehps, *vec_concatv2sf_sse).
For mov<mode>_internal, we can just set alternative (v,v) with mode DI, then it
will use vmovq; for other alternatives which set sse_regs, the instructions have
already cleared the upper bits.

For vec_concatv2sf_sse4_1/sse_storehps/sse_storehps/*vec_concatv2sf_sse, we can
change them into define_insn_and_split, splitting into a V4SF instruction (like
we did for those V2SFmode patterns), and use SUBREG for the dest or explicitly
sanitize the dest.


BTW looks like *vec_concatv2df_sse4_1 can be merged into *vec_concatv2sf_sse


* [Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
From: ubizjak at gmail dot com @ 2023-07-31  7:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

--- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #9)
> for mov<mode>_internal, we can just set alternative (v,v) with mode DI, then
> it will use vmovq, for other alternatives which set sse_regs, the
> instructions has already cleared the upper bits.
Move instructions can be sanitized in ix86_expand_vector_move. If the target is
in V2SFmode and the source is a subreg register, then movq_v2sf_to_sse should
be emitted. However, we would still like to emit MOVAPS reg, reg for V2SF to
V2SF moves, because MOVAPS may be eliminated by hardware, while MOVQ won't be.


* [Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
From: cvs-commit at gcc dot gnu.org @ 2023-08-08 16:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:ad5b757d99b5a121198b79a6a42c1f15ae86a190

commit r14-3085-gad5b757d99b5a121198b79a6a42c1f15ae86a190
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Tue Aug 8 18:53:51 2023 +0200

    i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

    Also introduce -m[no-]partial-vector-fp-math option to disable trapping
    V2SF named patterns in order to avoid generation of partial vector V4SFmode
    trapping instructions.

    The new option is enabled by default, because even with sanitization,
    a small but consistent speed up of 2 to 3% with Polyhedron capacita
    benchmark can be achieved vs. scalar code.

    Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
    vs. scalar code.  This is what clang does by default, as it defaults
    to -fno-trapping-math.

            PR target/110832

    gcc/ChangeLog:

            * config/i386/i386.opt (mpartial-vector-fp-math): New option.
            * config/i386/mmx.md (movq_<mode>_to_sse): Do not sanitize
            upper part of V2SFmode register with -fno-trapping-math.
            (<plusminusmult:insn>v2sf3): Enable for ix86_partial_vec_fp_math.
            (divv2sf3): Ditto.
            (<smaxmin:code>v2sf3): Ditto.
            (sqrtv2sf2): Ditto.
            (*mmx_haddv2sf3_low): Ditto.
            (*mmx_hsubv2sf3_low): Ditto.
            (vec_addsubv2sf3): Ditto.
            (vec_cmpv2sfv2si): Ditto.
            (vcond<V2FI:mode>v2sf): Ditto.
            (fmav2sf4): Ditto.
            (fmsv2sf4): Ditto.
            (fnmav2sf4): Ditto.
            (fnmsv2sf4): Ditto.
            (fix_truncv2sfv2si2): Ditto.
            (fixuns_truncv2sfv2si2): Ditto.
            (floatv2siv2sf2): Ditto.
            (floatunsv2siv2sf2): Ditto.
            (nearbyintv2sf2): Ditto.
            (rintv2sf2): Ditto.
            (lrintv2sfv2si2): Ditto.
            (ceilv2sf2): Ditto.
            (lceilv2sfv2si2): Ditto.
            (floorv2sf2): Ditto.
            (lfloorv2sfv2si2): Ditto.
            (btruncv2sf2): Ditto.
            (roundv2sf2): Ditto.
            (lroundv2sfv2si2): Ditto.
            * doc/invoke.texi (x86 Options): Document
            -mpartial-vector-fp-math option.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr110832-1.c: New test.
            * gcc.target/i386/pr110832-2.c: New test.
            * gcc.target/i386/pr110832-3.c: New test.
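
For readers who want to compare the three code-generation modes described in the commit message, a small C sketch along the following lines may help. It is not one of the pr110832 testcases; the type alias, function names and file name are made up for this example, and the expectations in the comment are taken from the commit message rather than from inspected assembly.

```c
#include <assert.h>
#include <string.h>

/* Two packed floats: an 8-byte vector, which the x86 backend calls V2SF. */
typedef float v2sf __attribute__((vector_size(8)));

/* Load two adjacent floats from memory into a v2sf. */
v2sf v2sf_load(const float *p)
{
    v2sf v;
    assert(p != 0);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Element-wise a * b + c on two-float vectors. */
v2sf v2sf_muladd(v2sf a, v2sf b, v2sf c)
{
    return a * b + c;
}

/* Hypothetical invocations with GCC 14 or later on x86-64:

     gcc -O2 -S v2sf.c
         strict mode; inputs are expected to be sanitized with MOVQ
     gcc -O2 -fno-trapping-math -S v2sf.c
         the sanitizing MOVQs are expected to disappear
     gcc -O2 -mno-partial-vector-fp-math -S v2sf.c
         the trapping V2SF patterns are disabled  */
```

The source-level result is the same in all three modes; only the generated instructions should differ.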


* [Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
From: ubizjak at gmail dot com @ 2023-08-09  9:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-08-09
           Keywords|needs-bisection             |
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW


* [Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
From: cvs-commit at gcc dot gnu.org @ 2023-08-10  6:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:0c563a935c47e507ad97e15860ac017c14877b31

commit r14-3118-g0c563a935c47e507ad97e15860ac017c14877b31
Author: liuhongt <hongtao.liu@intel.com>
Date:   Wed Aug 9 14:25:53 2023 +0800

    i386: Do not sanitize upper part of V2HFmode and V4HFmode reg with -fno-trapping-math [PR110832]

    Also add ix86_partial_vec_fp_math to the condition of V2HF/V4HF named
    patterns in order to avoid generation of partial vector V8HFmode
    trapping instructions.

    gcc/ChangeLog:

            PR target/110832
            * config/i386/mmx.md (movq_<mode>_to_sse): Also do not
            sanitize upper part of V4HFmode register with
            -fno-trapping-math.
            (<insn>v4hf3): Enable for ix86_partial_vec_fp_math.
            (divv4hf3): Ditto.
            (<insn>v2hf3): Ditto.
            (divv2hf3): Ditto.
            (movd_v2hf_to_sse): Do not sanitize upper part of V2HFmode
            register with -fno-trapping-math.


* [Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
From: ubizjak at gmail dot com @ 2023-08-25  8:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #13 from Uroš Bizjak <ubizjak at gmail dot com> ---
Let's keep this patch to gcc-14+. The runtime regression is now due to strict
IEEE compliance, where the compiler sanitizes every partial vector input to
potentially trapping instructions. OTOH, -fno-trapping-math removes the
sanitization fixups (and the documentation describes possible issues with
assembler and builtins passing non-conformant FP values), and the
-m[no-]partial-vector-fp-math option is introduced to completely disable
potentially trapping instructions for partial vectors.

So, fixed for gcc-14+.
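
For reference, the hazard that the sanitization guards against can be reproduced by hand. The following C sketch is an illustration only, assuming x86-64 and a GNU-compatible compiler (it uses SSE intrinsics and an empty inline asm as an optimization barrier); the function names are made up. It places FLT_MAX in the upper two lanes as stand-in garbage, performs a full-width ADDPS, and reports whether the sticky overflow flag in MXCSR was raised. Exceptions are masked by default, so only the flag is set; with unmasked exceptions the same add would trap.

```c
#include <assert.h>
#include <float.h>
#if defined(__x86_64__) && defined(__GNUC__)
#include <emmintrin.h>   /* SSE2: _mm_move_epi64 and the cast intrinsics */
#define HAVE_SSE_DEMO 1
#else
#define HAVE_SSE_DEMO 0
#endif

int sse_demo_supported(void)
{
    return HAVE_SSE_DEMO;
}

/* Returns 1 if adding the two registers raised the overflow flag,
   0 otherwise.  With sanitize != 0 the upper 64 bits of both inputs
   are zeroed first, as a register-to-register MOVQ would do.  */
int upper_lanes_raise_overflow(int sanitize)
{
    assert(sanitize == 0 || sanitize == 1);
#if HAVE_SSE_DEMO
    /* Lanes 0 and 1 hold the V2SF data, lanes 2 and 3 hold garbage. */
    __m128 a = _mm_set_ps(FLT_MAX, FLT_MAX, 2.0f, 1.0f);
    __m128 b = a;

    if (sanitize) {
        a = _mm_castsi128_ps(_mm_move_epi64(_mm_castps_si128(a)));
        b = _mm_castsi128_ps(_mm_move_epi64(_mm_castps_si128(b)));
    }

    _MM_SET_EXCEPTION_STATE(0);            /* clear the sticky flags */
    /* Hide the values so the add is done at run time, after the clear. */
    __asm__ volatile("" : "+x"(a), "+x"(b) : : "memory");
    __m128 r = _mm_add_ps(a, b);           /* full-width ADDPS */
    __asm__ volatile("" : : "x"(r) : "memory");

    return (_MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_OVERFLOW) != 0;
#else
    return 0;
#endif
}
```

Without sanitization the upper lanes overflow and the flag is raised even though the two V2SF elements add exactly; with sanitization the upper lanes compute 0 + 0 and no flag is set.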



Thread overview: 16+ messages
2023-07-27 17:27 [Bug middle-end/110832] New: 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core hubicka at gcc dot gnu.org
2023-07-27 17:28 ` [Bug middle-end/110832] " hubicka at gcc dot gnu.org
2023-07-27 18:35 ` hubicka at ucw dot cz
2023-07-28  6:21 ` rguenth at gcc dot gnu.org
2023-07-28  6:21 ` [Bug middle-end/110832] [14 Regression] " rguenth at gcc dot gnu.org
2023-07-28 12:30 ` ubizjak at gmail dot com
2023-07-28 12:36 ` ubizjak at gmail dot com
2023-07-28 13:28 ` rguenth at gcc dot gnu.org
2023-07-28 13:41 ` ubizjak at gmail dot com
2023-07-28 15:46 ` ubizjak at gmail dot com
2023-07-31  4:58 ` crazylht at gmail dot com
2023-07-31  7:54 ` ubizjak at gmail dot com
2023-08-08 16:56 ` cvs-commit at gcc dot gnu.org
2023-08-09  9:46 ` ubizjak at gmail dot com
2023-08-10  6:06 ` cvs-commit at gcc dot gnu.org
2023-08-25  8:59 ` ubizjak at gmail dot com
