public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/110713] New: Fatigue2 runs twice as fast with increased inlining limits
@ 2023-07-18 11:03 hubicka at gcc dot gnu.org
From: hubicka at gcc dot gnu.org @ 2023-07-18 11:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110713

            Bug ID: 110713
           Summary: Fatigue2 runs twice as fast with increased inlining
                    limits
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

jh@ryzen3:~/pb11/lin/source> ~/trunk-histogram/bin/gfortran fatigue2.f90 -Ofast
-march=native -fdump-tree-all-details-blocks -fdump-rtl-all-details
-fdump-ipa-all-details --param max-inline-insns-auto=110 ; perf stat ./a.out
>/dev/null

 Performance counter stats for './a.out':

          13937.07 msec task-clock:u                     #    1.000 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
               138      page-faults:u                    #    9.902 /sec
       67489472294      cycles:u                         #    4.842 GHz                         (83.33%)
          38791427      stalled-cycles-frontend:u        #    0.06% frontend cycles idle        (83.33%)
           2351353      stalled-cycles-backend:u         #    0.00% backend cycles idle         (83.33%)
      147268347462      instructions:u                   #    2.18  insn per cycle
                                                  #    0.00  stalled cycles per insn     (83.33%)
        5705431257      branches:u                       #  409.371 M/sec                       (83.35%)
          13638274      branch-misses:u                  #    0.24% of all branches             (83.35%)

      13.941876147 seconds time elapsed

      13.933226000 seconds user
       0.003999000 seconds sys


jh@ryzen3:~/pb11/lin/source> ~/trunk-histogram/bin/gfortran fatigue2.f90 -Ofast
-march=native -fdump-tree-all-details-blocks -fdump-rtl-all-details
-fdump-ipa-all-details  ; perf stat ./a.out >/dev/null

 Performance counter stats for './a.out':

          31300.68 msec task-clock:u                     #    1.000 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
               138      page-faults:u                    #    4.409 /sec
      150619261261      cycles:u                         #    4.812 GHz                         (83.32%)
         779861463      stalled-cycles-frontend:u        #    0.52% frontend cycles idle        (83.33%)
           4695025      stalled-cycles-backend:u         #    0.00% backend cycles idle         (83.34%)
      242822794319      instructions:u                   #    1.61  insn per cycle
                                                  #    0.00  stalled cycles per insn     (83.34%)
       13542051898      branches:u                       #  432.644 M/sec                       (83.34%)
          14587945      branch-misses:u                  #    0.11% of all branches             (83.34%)

      31.301169341 seconds time elapsed

      31.296826000 seconds user
       0.003999000 seconds sys
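
For a quick side-by-side of the two runs above, the counters can be divided directly. The snippet below is only a convenience sketch: the numbers are copied from the perf stat output, while the dictionary and function names (`with_param`, `default`, `ratios`) are made up for illustration.

```python
# Counter values copied from the two `perf stat` runs above.
with_param = {  # built with --param max-inline-insns-auto=110
    "task_clock_msec": 13937.07,
    "cycles": 67489472294,
    "instructions": 147268347462,
    "branches": 5705431257,
}
default = {  # built with the default inlining limits
    "task_clock_msec": 31300.68,
    "cycles": 150619261261,
    "instructions": 242822794319,
    "branches": 13542051898,
}


def ratios(slow, fast):
    """Return slow/fast for every counter present in `slow`."""
    return {name: slow[name] / fast[name] for name in slow}


speedup = ratios(default, with_param)
for name, value in speedup.items():
    print(f"{name:>16}: {value:.2f}x")
```

The default build takes a bit more than twice the time and cycles, executes roughly 1.6 times as many instructions, and more than twice as many branches.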

The main difference is the inlining of generalized_hookes_law. Although it looks
quite big at release_ssa time, after vectorization it becomes loopless, and
inlining it is a big win.

      function generalized_hookes_law (strain_tensor, lambda, mu) result (stress_tensor)
!
!      Author:       Dr. John K. Prentice
!      Affiliation:  Quetzal Computational Associates, Inc.
!      Dates:        28 November 1997
!
!      Purpose:      Apply the generalized Hooke's law for elasticity to the strain tensor
!                    (or strain rate tensor) to compute the stress tensor (or stress rate
!                    tensor)
!
!############################################################################################
!
!      Input:
!
!        strain_tensor                [selected_real_kind(15,90), dimension(3,3)]
!                                     stress tensor
!
!        lambda                       [selected_real_kind(15,90)]
!                                     Lame constant Lambda
!
!        mu                           [selected_real_kind(15,90)]
!                                     Lame constant mu
!
!     Output:
!
!        stress_tensor                [selected_real_kind(15,90), dimension(3,3)]
!                                     stress tensor
!
!############################################################################################
!
!
!=========== formal variables =============
!
      real (kind = LONGreal), dimension(:,:), intent(in) :: strain_tensor
      real (kind = LONGreal), intent(in) :: lambda, mu
      real (kind = LONGreal), dimension(3,3) :: stress_tensor
!
!========== internal variables ============
!
      real (kind = LONGreal), dimension(6) ::generalized_strain_vector,            &
                                             generalized_stress_vector
      real (kind = LONGreal), dimension(6,6) :: generalized_constitutive_tensor
      integer :: i
!
!        construct the generalized constitutive tensor for elasticity
!
      generalized_constitutive_tensor(:,:) = 0.0_LONGreal
      generalized_constitutive_tensor(1,1) = lambda + 2.0_LONGreal * mu
      generalized_constitutive_tensor(1,2) = lambda
      generalized_constitutive_tensor(1,3) = lambda
      generalized_constitutive_tensor(2,1) = lambda
      generalized_constitutive_tensor(2,2) = lambda + 2.0_LONGreal * mu
      generalized_constitutive_tensor(2,3) = lambda
      generalized_constitutive_tensor(3,1) = lambda
      generalized_constitutive_tensor(3,2) = lambda
      generalized_constitutive_tensor(3,3) = lambda + 2.0_LONGreal * mu
      generalized_constitutive_tensor(4,4) = mu
      generalized_constitutive_tensor(5,5) = mu
      generalized_constitutive_tensor(6,6) = mu
!
!        construct the generalized strain vector (using double index notation)
!
      generalized_strain_vector(1) = strain_tensor(1,1)
      generalized_strain_vector(2) = strain_tensor(2,2)
      generalized_strain_vector(3) = strain_tensor(3,3)
      generalized_strain_vector(4) = strain_tensor(2,3)
      generalized_strain_vector(5) = strain_tensor(1,3)
      generalized_strain_vector(6) = strain_tensor(1,2)
!
!        compute the generalized stress vector
!
      do i = 1, 6
          generalized_stress_vector(i) = dot_product(generalized_constitutive_tensor(i,:),  &
                                                     generalized_strain_vector(:))
      end do
!
!        update the stress tensor 
!
      stress_tensor(1,1) = generalized_stress_vector(1)
      stress_tensor(2,2) = generalized_stress_vector(2)
      stress_tensor(3,3) = generalized_stress_vector(3)
      stress_tensor(2,3) = generalized_stress_vector(4)
      stress_tensor(1,3) = generalized_stress_vector(5)
      stress_tensor(1,2) = generalized_stress_vector(6)
      stress_tensor(3,2) = stress_tensor(2,3)
      stress_tensor(3,1) = stress_tensor(1,3)
      stress_tensor(2,1) = stress_tensor(1,2)
!
      end function generalized_hookes_law
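
To give a feel for how little arithmetic the function above performs once the mostly-zero 6x6 tensor is taken into account, here is a small Python transcription (not Fortran, and not anything GCC produces). It compares the matrix form used in the source with the closed form it reduces to. The function names are hypothetical, and this only illustrates the math; it makes no claim about which transformations the compiler applies.

```python
import math


def _fill_symmetric(s):
    """Scatter the 6-vector back into a symmetric 3x3 tensor, as the Fortran does."""
    t = [[0.0] * 3 for _ in range(3)]
    t[0][0], t[1][1], t[2][2] = s[0], s[1], s[2]
    t[1][2] = t[2][1] = s[3]
    t[0][2] = t[2][0] = s[4]
    t[0][1] = t[1][0] = s[5]
    return t


def hooke_matrix_form(strain, lam, mu):
    """Mirror of the Fortran: build the 6x6 tensor, then take six dot products."""
    c = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            c[i][j] = lam
        c[i][i] = lam + 2.0 * mu
    for i in range(3, 6):
        c[i][i] = mu
    e = [strain[0][0], strain[1][1], strain[2][2],
         strain[1][2], strain[0][2], strain[0][1]]
    s = [sum(c[i][k] * e[k] for k in range(6)) for i in range(6)]
    return _fill_symmetric(s)


def hooke_closed_form(strain, lam, mu):
    """Same result with the zero entries of the tensor dropped by hand."""
    trace = strain[0][0] + strain[1][1] + strain[2][2]
    s = [lam * trace + 2.0 * mu * strain[0][0],
         lam * trace + 2.0 * mu * strain[1][1],
         lam * trace + 2.0 * mu * strain[2][2],
         mu * strain[1][2],
         mu * strain[0][2],
         mu * strain[0][1]]
    return _fill_symmetric(s)


strain = [[1.0, 2.0, 3.0], [2.0, 4.0, 5.0], [3.0, 5.0, 6.0]]
a = hooke_matrix_form(strain, 1.0, 2.0)
b = hooke_closed_form(strain, 1.0, 2.0)
same = all(math.isclose(a[i][j], b[i][j]) for i in range(3) for j in range(3))
print(same)
```

Only the six strain components and the two Lame constants contribute to the result, which fits the observation that the body is small once the loops are gone.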
