public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/110713] New: Fatigue2 runs twice as fast with increased inlining limits
@ 2023-07-18 11:03 hubicka at gcc dot gnu.org
From: hubicka at gcc dot gnu.org @ 2023-07-18 11:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110713
Bug ID: 110713
Summary: Fatigue2 runs twice as fast with increased inlining limits
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
jh@ryzen3:~/pb11/lin/source> ~/trunk-histogram/bin/gfortran fatigue2.f90 -Ofast -march=native -fdump-tree-all-details-blocks -fdump-rtl-all-details -fdump-ipa-all-details --param max-inline-insns-auto=110 ; perf stat ./a.out >/dev/null
Performance counter stats for './a.out':
      13937.07 msec task-clock:u              #    1.000 CPUs utilized
             0      context-switches:u        #    0.000 /sec
             0      cpu-migrations:u          #    0.000 /sec
           138      page-faults:u             #    9.902 /sec
   67489472294      cycles:u                  #    4.842 GHz                      (83.33%)
      38791427      stalled-cycles-frontend:u #    0.06% frontend cycles idle     (83.33%)
       2351353      stalled-cycles-backend:u  #    0.00% backend cycles idle      (83.33%)
  147268347462      instructions:u            #    2.18  insn per cycle
                                              #    0.00  stalled cycles per insn  (83.33%)
    5705431257      branches:u                #  409.371 M/sec                    (83.35%)
      13638274      branch-misses:u           #    0.24% of all branches          (83.35%)
  13.941876147 seconds time elapsed
  13.933226000 seconds user
   0.003999000 seconds sys
jh@ryzen3:~/pb11/lin/source> ~/trunk-histogram/bin/gfortran fatigue2.f90 -Ofast -march=native -fdump-tree-all-details-blocks -fdump-rtl-all-details -fdump-ipa-all-details ; perf stat ./a.out >/dev/null
Performance counter stats for './a.out':
      31300.68 msec task-clock:u              #    1.000 CPUs utilized
             0      context-switches:u        #    0.000 /sec
             0      cpu-migrations:u          #    0.000 /sec
           138      page-faults:u             #    4.409 /sec
  150619261261      cycles:u                  #    4.812 GHz                      (83.32%)
     779861463      stalled-cycles-frontend:u #    0.52% frontend cycles idle     (83.33%)
       4695025      stalled-cycles-backend:u  #    0.00% backend cycles idle      (83.34%)
  242822794319      instructions:u            #    1.61  insn per cycle
                                              #    0.00  stalled cycles per insn  (83.34%)
   13542051898      branches:u                #  432.644 M/sec                    (83.34%)
      14587945      branch-misses:u           #    0.11% of all branches          (83.34%)
  31.301169341 seconds time elapsed
  31.296826000 seconds user
   0.003999000 seconds sys
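For reference, the two runs can be compared counter by counter. The short Python sketch below is not part of the original report; it only redoes the arithmetic on the numbers printed by perf stat above (the dictionary keys and helper names are made up for illustration). By these numbers the default build takes roughly 2.2 times the run time, 1.6 times the instructions, and 2.4 times the branches of the build with the raised limit.

```python
# Counter values copied verbatim from the two `perf stat` runs above.
# The dictionary layout and helper names are illustrative only.
inlined = {  # built with --param max-inline-insns-auto=110
    "task_clock_msec": 13937.07,
    "cycles": 67489472294,
    "instructions": 147268347462,
    "branches": 5705431257,
}
default = {  # built with the default inlining limits
    "task_clock_msec": 31300.68,
    "cycles": 150619261261,
    "instructions": 242822794319,
    "branches": 13542051898,
}


def ratio(key):
    """How many times larger the default-build counter is."""
    return default[key] / inlined[key]


def ipc(run):
    """Instructions per cycle, as perf reports in its 'insn per cycle' column."""
    return run["instructions"] / run["cycles"]


for key in inlined:
    print(f"{key:>16}: default/inlined = {ratio(key):.2f}x")
print(f"IPC inlined: {ipc(inlined):.2f}, IPC default: {ipc(default):.2f}")
```

The instruction count drops by less than the run time does, which is consistent with the higher instructions-per-cycle figure perf reports for the inlined build.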
The main difference is the inlining of generalized_hookes_law. While the function looks
quite big at release_ssa time, after vectorization it becomes loopless, and inlining it
is a big win.
function generalized_hookes_law (strain_tensor, lambda, mu) result (stress_tensor)
!
! Author: Dr. John K. Prentice
! Affiliation: Quetzal Computational Associates, Inc.
! Dates: 28 November 1997
!
! Purpose: Apply the generalized Hooke's law for elasticity to the strain tensor
! (or strain rate tensor) to compute the stress tensor (or stress rate
! tensor)
!
!############################################################################################
!
! Input:
!
! strain_tensor [selected_real_kind(15,90), dimension(3,3)]
! stress tensor
!
! lambda [selected_real_kind(15,90)]
! Lame constant Lambda
!
! mu [selected_real_kind(15,90)]
! Lame constant mu
!
! Output:
!
! stress_tensor [selected_real_kind(15,90), dimension(3,3)]
! stress tensor
!
!############################################################################################
!
!
!=========== formal variables =============
!
real (kind = LONGreal), dimension(:,:), intent(in) :: strain_tensor
real (kind = LONGreal), intent(in) :: lambda, mu
real (kind = LONGreal), dimension(3,3) :: stress_tensor
!
!========== internal variables ============
!
real (kind = LONGreal), dimension(6) :: generalized_strain_vector, &
generalized_stress_vector
real (kind = LONGreal), dimension(6,6) :: generalized_constitutive_tensor
integer :: i
!
! construct the generalized constitutive tensor for elasticity
!
generalized_constitutive_tensor(:,:) = 0.0_LONGreal
generalized_constitutive_tensor(1,1) = lambda + 2.0_LONGreal * mu
generalized_constitutive_tensor(1,2) = lambda
generalized_constitutive_tensor(1,3) = lambda
generalized_constitutive_tensor(2,1) = lambda
generalized_constitutive_tensor(2,2) = lambda + 2.0_LONGreal * mu
generalized_constitutive_tensor(2,3) = lambda
generalized_constitutive_tensor(3,1) = lambda
generalized_constitutive_tensor(3,2) = lambda
generalized_constitutive_tensor(3,3) = lambda + 2.0_LONGreal * mu
generalized_constitutive_tensor(4,4) = mu
generalized_constitutive_tensor(5,5) = mu
generalized_constitutive_tensor(6,6) = mu
!
! construct the generalized strain vector (using double index notation)
!
generalized_strain_vector(1) = strain_tensor(1,1)
generalized_strain_vector(2) = strain_tensor(2,2)
generalized_strain_vector(3) = strain_tensor(3,3)
generalized_strain_vector(4) = strain_tensor(2,3)
generalized_strain_vector(5) = strain_tensor(1,3)
generalized_strain_vector(6) = strain_tensor(1,2)
!
! compute the generalized stress vector
!
do i = 1, 6
generalized_stress_vector(i) = dot_product(generalized_constitutive_tensor(i,:), &
generalized_strain_vector(:))
end do
!
! update the stress tensor
!
stress_tensor(1,1) = generalized_stress_vector(1)
stress_tensor(2,2) = generalized_stress_vector(2)
stress_tensor(3,3) = generalized_stress_vector(3)
stress_tensor(2,3) = generalized_stress_vector(4)
stress_tensor(1,3) = generalized_stress_vector(5)
stress_tensor(1,2) = generalized_stress_vector(6)
stress_tensor(3,2) = stress_tensor(2,3)
stress_tensor(3,1) = stress_tensor(1,3)
stress_tensor(2,1) = stress_tensor(1,2)
!
end function generalized_hookes_law
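To illustrate how little work the function does once its fixed-size loops are expanded, here is a hedged Python sketch (not from the report, and not a description of the code GCC actually emits). The first function transcribes the Fortran routine with 0-based indices. The second, closed_form, is a hand-derived equivalent that follows from the sparsity of the 6x6 constitutive tensor; its name and shape are assumptions made for this example.

```python
import math


def generalized_hookes_law(strain, lam, mu):
    """Transcription of the Fortran routine above, using 0-based nested lists."""
    # 6x6 constitutive tensor: lambda in the upper-left 3x3 block,
    # lambda + 2*mu on its diagonal, mu on the remaining diagonal entries.
    c = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            c[i][j] = lam
        c[i][i] = lam + 2.0 * mu
    for i in range(3, 6):
        c[i][i] = mu
    # Generalized strain vector: (1,1), (2,2), (3,3), (2,3), (1,3), (1,2).
    e = [strain[0][0], strain[1][1], strain[2][2],
         strain[1][2], strain[0][2], strain[0][1]]
    # Six dot products of length six, as in the Fortran do loop.
    s = [sum(c[i][k] * e[k] for k in range(6)) for i in range(6)]
    stress = [[0.0] * 3 for _ in range(3)]
    stress[0][0], stress[1][1], stress[2][2] = s[0], s[1], s[2]
    stress[1][2] = stress[2][1] = s[3]
    stress[0][2] = stress[2][0] = s[4]
    stress[0][1] = stress[1][0] = s[5]
    return stress


def closed_form(strain, lam, mu):
    """Hand-derived equivalent (an assumption of this sketch): no 6x6 tensor."""
    trace = strain[0][0] + strain[1][1] + strain[2][2]
    stress = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        stress[i][i] = lam * trace + 2.0 * mu * strain[i][i]
    for i, j in ((1, 2), (0, 2), (0, 1)):
        stress[i][j] = stress[j][i] = mu * strain[i][j]
    return stress


strain = [[0.10, 0.02, 0.03],
          [0.02, 0.20, 0.04],
          [0.03, 0.04, 0.30]]
a = generalized_hookes_law(strain, 1.5, 0.75)
b = closed_form(strain, 1.5, 0.75)
agree = all(math.isclose(a[i][j], b[i][j], abs_tol=1e-12)
            for i in range(3) for j in range(3))
print("transcription and closed form agree:", agree)
```

Both versions read only the diagonal and upper-triangle entries of the strain tensor, and the result is always symmetric.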