From: "hubicka at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/110713] New: Fatigue2 runs twice as fast with increased inlining limits
Date: Tue, 18 Jul 2023 11:03:02 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110713

            Bug ID: 110713
           Summary: Fatigue2 runs twice as fast with increased inlining limits
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

jh@ryzen3:~/pb11/lin/source> ~/trunk-histogram/bin/gfortran fatigue2.f90 -Ofast -march=native
-fdump-tree-all-details-blocks -fdump-rtl-all-details -fdump-ipa-all-details
--param max-inline-insns-auto=110 ; perf stat ./a.out >/dev/null

 Performance counter stats for './a.out':

          13937.07 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
               138      page-faults:u             #    9.902 /sec
       67489472294      cycles:u                  #    4.842 GHz                      (83.33%)
          38791427      stalled-cycles-frontend:u #    0.06% frontend cycles idle     (83.33%)
           2351353      stalled-cycles-backend:u  #    0.00% backend cycles idle      (83.33%)
      147268347462      instructions:u            #    2.18  insn per cycle
                                                  #    0.00  stalled cycles per insn  (83.33%)
        5705431257      branches:u                #  409.371 M/sec                    (83.35%)
          13638274      branch-misses:u           #    0.24% of all branches          (83.35%)

      13.941876147 seconds time elapsed

      13.933226000 seconds user
       0.003999000 seconds sys

jh@ryzen3:~/pb11/lin/source> ~/trunk-histogram/bin/gfortran fatigue2.f90 -Ofast -march=native
-fdump-tree-all-details-blocks -fdump-rtl-all-details -fdump-ipa-all-details ;
perf stat ./a.out >/dev/null

 Performance counter stats for './a.out':

          31300.68 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
               138      page-faults:u             #    4.409 /sec
      150619261261      cycles:u                  #    4.812 GHz                      (83.32%)
         779861463      stalled-cycles-frontend:u #    0.52% frontend cycles idle     (83.33%)
           4695025      stalled-cycles-backend:u  #    0.00% backend cycles idle      (83.34%)
      242822794319      instructions:u            #    1.61  insn per cycle
                                                  #    0.00  stalled cycles per insn  (83.34%)
       13542051898      branches:u                #  432.644 M/sec                    (83.34%)
          14587945      branch-misses:u           #    0.11% of all branches          (83.34%)

      31.301169341 seconds time elapsed

      31.296826000 seconds user
       0.003999000 seconds sys

The main difference is the inlining of generalized_hookes_law. While the
function looks quite big at release_ssa time, after vectorization it becomes
loopless, and inlining it is a big win.

      function generalized_hookes_law (strain_tensor, lambda, mu) result (stress_tensor)
!
!     Author:       Dr. John K. Prentice
!     Affiliation:  Quetzal Computational Associates, Inc.
!     Dates:        28 November 1997
!
!     Purpose:      Apply the generalized Hooke's law for elasticity to the strain tensor
!                   (or strain rate tensor) to compute the stress tensor (or stress rate
!                   tensor)
!
!############################################################################################
!
!     Input:
!
!       strain_tensor         [selected_real_kind(15,90), dimension(3,3)]
!                             stress tensor
!
!       lambda                [selected_real_kind(15,90)]
!                             Lame constant Lambda
!
!       mu                    [selected_real_kind(15,90)]
!                             Lame constant mu
!
!     Output:
!
!       stress_tensor         [selected_real_kind(15,90), dimension(3,3)]
!                             stress tensor
!
!############################################################################################
!
!
!=========== formal variables =============
!
      real (kind = LONGreal), dimension(:,:), intent(in) :: strain_tensor
      real (kind = LONGreal), intent(in) :: lambda, mu
      real (kind = LONGreal), dimension(3,3) :: stress_tensor
!
!========== internal variables ============
!
      real (kind = LONGreal), dimension(6) ::generalized_strain_vector,        &
                                             generalized_stress_vector
      real (kind = LONGreal), dimension(6,6) :: generalized_constitutive_tensor
      integer :: i
!
!     construct the generalized constitutive tensor for elasticity
!
      generalized_constitutive_tensor(:,:) = 0.0_LONGreal
      generalized_constitutive_tensor(1,1) = lambda + 2.0_LONGreal * mu
      generalized_constitutive_tensor(1,2) = lambda
      generalized_constitutive_tensor(1,3) = lambda
      generalized_constitutive_tensor(2,1) = lambda
      generalized_constitutive_tensor(2,2) = lambda + 2.0_LONGreal * mu
      generalized_constitutive_tensor(2,3) = lambda
      generalized_constitutive_tensor(3,1) = lambda
      generalized_constitutive_tensor(3,2) = lambda
      generalized_constitutive_tensor(3,3) = lambda + 2.0_LONGreal * mu
      generalized_constitutive_tensor(4,4) = mu
      generalized_constitutive_tensor(5,5) = mu
      generalized_constitutive_tensor(6,6) = mu
!
!     construct the generalized strain vector (using double index notation)
!
      generalized_strain_vector(1) = strain_tensor(1,1)
      generalized_strain_vector(2) = strain_tensor(2,2)
      generalized_strain_vector(3) = strain_tensor(3,3)
      generalized_strain_vector(4) = strain_tensor(2,3)
      generalized_strain_vector(5) = strain_tensor(1,3)
      generalized_strain_vector(6) = strain_tensor(1,2)
!
!     compute the generalized stress vector
!
      do i = 1, 6
          generalized_stress_vector(i) = dot_product(generalized_constitutive_tensor(i,:), &
                                                     generalized_strain_vector(:))
      end do
!
!     update the stress tensor
!
      stress_tensor(1,1) = generalized_stress_vector(1)
      stress_tensor(2,2) = generalized_stress_vector(2)
      stress_tensor(3,3) = generalized_stress_vector(3)
      stress_tensor(2,3) = generalized_stress_vector(4)
      stress_tensor(1,3) = generalized_stress_vector(5)
      stress_tensor(1,2) = generalized_stress_vector(6)
      stress_tensor(3,2) = stress_tensor(2,3)
      stress_tensor(3,1) = stress_tensor(1,3)
      stress_tensor(2,1) = stress_tensor(1,2)
!
      end function generalized_hookes_law
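
Written out, the quoted function amounts to the relation below. This is a
sketch read directly off the source above, not taken from the benchmark's
documentation. The symbols \varepsilon_{ij} (entries of strain_tensor) and
\sigma_{ij} (entries of stress_tensor) are notation assumed here for
illustration. Only 12 of the 36 entries of the constitutive matrix are
nonzero, so the six dot products reduce to a small amount of straight-line
arithmetic.

```latex
% Assumed notation: \varepsilon_{ij} = strain_tensor(i,j), \sigma_{ij} = stress_tensor(i,j).
\begin{pmatrix}
  \sigma_{11} \\ \sigma_{22} \\ \sigma_{33} \\ \sigma_{23} \\ \sigma_{13} \\ \sigma_{12}
\end{pmatrix}
=
\begin{pmatrix}
  \lambda + 2\mu & \lambda & \lambda & 0 & 0 & 0 \\
  \lambda & \lambda + 2\mu & \lambda & 0 & 0 & 0 \\
  \lambda & \lambda & \lambda + 2\mu & 0 & 0 & 0 \\
  0 & 0 & 0 & \mu & 0 & 0 \\
  0 & 0 & 0 & 0 & \mu & 0 \\
  0 & 0 & 0 & 0 & 0 & \mu
\end{pmatrix}
\begin{pmatrix}
  \varepsilon_{11} \\ \varepsilon_{22} \\ \varepsilon_{33} \\
  \varepsilon_{23} \\ \varepsilon_{13} \\ \varepsilon_{12}
\end{pmatrix}

% Component form, with the symmetric entries filled in as the code does:
\sigma_{ii} = \lambda\,(\varepsilon_{11} + \varepsilon_{22} + \varepsilon_{33})
              + 2\mu\,\varepsilon_{ii}
  \quad (i = 1, 2, 3,\ \text{no summation over } i),
\qquad
\sigma_{ij} = \sigma_{ji} = \mu\,\varepsilon_{ij}
  \quad (i < j).
```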