public inbox for gcc-bugs@sourceware.org
* [Bug c/113495] New: RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 1:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
Bug ID: 113495
Summary: RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: juzhe.zhong at rivai dot ai
Target Milestone: ---
riscv64-unknown-linux-gnu-gfortran -march=rv64gcv_zvl256b -O3 -S -ftime-report
real 63m18.771s
user 60m19.036s
sys 2m59.787s
The compilation takes more than 60 minutes.
After investigation, the time report shows that two passes are critical:
loop invariant motion : 2600.28 ( 72%) 1.68 ( 1%) 2602.12 ( 69%) 2617k ( 0%)
Loop invariant motion consumes most of the time (72%).
The other is the VSETVL pass:
vsetvl: earliest_fuse_vsetvl_info : 438.26 ( 12%) 79.82 ( 47%) 518.08 ( 14%) 221807M ( 75%)
vsetvl: pre_global_vsetvl_info : 135.98 ( 4%) 31.71 ( 19%) 167.69 ( 4%) 71950M ( 24%)
Phases 2 and 3 of the VSETVL pass together consume 16% of the time (12% + 4%)
and 99% of the memory (75% + 24%).
I will look into the VSETVL pass issue, but I am not able to take care of the
loop invariant motion issue.
* [Bug tree-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 1:36 UTC (permalink / raw)
--- Comment #1 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Created attachment 57149
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57149&action=edit
spec2017 wrf
* [Bug tree-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 1:38 UTC (permalink / raw)
--- Comment #2 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
To build the attached file, we need the following files from SPEC2017:
module_big_step_utilities_em.mod module_cumulus_driver.mod
module_fddagd_driver.mod module_model_constants.mod
module_shallowcu_driver.mod
module_comm_dm.mod module_dm.mod
module_first_rk_step_part1.mod module_pbl_driver.mod
module_state_description.mod
module_configure.mod module_domain.mod
module_force_scm.mod module_radiation_driver.mod
module_surface_driver.mod
module_convtrans_prep.mod module_em.mod
module_fr_fire_driver_wrf.mod module_scalar_tables.mod module_utility.mod
But I could not attach them because they are too big.
* [Bug tree-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 1:48 UTC (permalink / raw)
--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Ok. The reduced case:
# 1 "module_first_rk_step_part1.fppized.f90"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "module_first_rk_step_part1.fppized.f90"
!WRF:MEDIATION_LAYER:SOLVER
MODULE module_first_rk_step_part1
CONTAINS
SUBROUTINE first_rk_step_part1 ( grid , config_flags &
, moist , moist_tend &
, chem , chem_tend &
, tracer, tracer_tend &
, scalar , scalar_tend &
, fdda3d, fdda2d &
, aerod &
, ru_tendf, rv_tendf &
, rw_tendf, t_tendf &
, ph_tendf, mu_tendf &
, tke_tend &
, adapt_step_flag , curr_secs &
, psim , psih , wspd , gz1oz0 , br , chklowq &
, cu_act_flag , hol , th_phy &
, pi_phy , p_phy , t_phy &
, dz8w , p8w , t8w &
, ids, ide, jds, jde, kds, kde &
, ims, ime, jms, jme, kms, kme &
, ips, ipe, jps, jpe, kps, kpe &
, imsx,imex,jmsx,jmex,kmsx,kmex &
, ipsx,ipex,jpsx,jpex,kpsx,kpex &
, imsy,imey,jmsy,jmey,kmsy,kmey &
, ipsy,ipey,jpsy,jpey,kpsy,kpey &
, k_start , k_end &
, f_flux &
)
USE module_state_description
USE module_model_constants
USE module_domain, ONLY : domain, domain_clock_get, get_ijk_from_subgrid
USE module_configure, ONLY : grid_config_rec_type, model_config_rec
USE module_radiation_driver, ONLY : pre_radiation_driver, radiation_driver
USE module_surface_driver, ONLY : surface_driver
USE module_cumulus_driver, ONLY : cumulus_driver
USE module_shallowcu_driver, ONLY : shallowcu_driver
USE module_pbl_driver, ONLY : pbl_driver
USE module_fr_fire_driver_wrf, ONLY : fire_driver_em_step
USE module_fddagd_driver, ONLY : fddagd_driver
USE module_em, ONLY : init_zero_tendency
USE module_force_scm
USE module_convtrans_prep
USE module_big_step_utilities_em, ONLY : phy_prep
use module_scalar_tables
USE module_dm, ONLY : local_communicator, mytask, ntasks, ntasks_x, ntasks_y, local_communicator_periodic, wrf_dm_maxval
USE module_comm_dm, ONLY : halo_em_phys_a_sub,halo_em_fdda_sfc_sub,halo_pwp_sub,halo_em_chem_e_3_sub, &
halo_em_chem_e_5_sub, halo_em_hydro_noahmp_sub
USE module_utility
IMPLICIT NONE
TYPE ( domain ), INTENT(INOUT) :: grid
TYPE ( grid_config_rec_type ), INTENT(IN) :: config_flags
TYPE(WRFU_Time) :: currentTime
INTEGER, INTENT(IN) :: ids, ide, jds, jde, kds, kde, &
ims, ime, jms, jme, kms, kme, &
ips, ipe, jps, jpe, kps, kpe, &
imsx,imex,jmsx,jmex,kmsx,kmex, &
ipsx,ipex,jpsx,jpex,kpsx,kpex, &
imsy,imey,jmsy,jmey,kmsy,kmey, &
ipsy,ipey,jpsy,jpey,kpsy,kpey
LOGICAL ,INTENT(IN) :: adapt_step_flag
REAL, INTENT(IN) :: curr_secs
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_moist),INTENT(INOUT) :: moist
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_moist),INTENT(INOUT) :: moist_tend
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_chem),INTENT(INOUT) :: chem
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_chem),INTENT(INOUT) :: chem_tend
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_tracer),INTENT(INOUT) :: tracer
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_tracer),INTENT(INOUT) :: tracer_tend
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_scalar),INTENT(INOUT) :: scalar
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_scalar),INTENT(INOUT) :: scalar_tend
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_fdda3d),INTENT(INOUT) :: fdda3d
REAL ,DIMENSION(ims:ime,1:1,jms:jme,num_fdda2d),INTENT(INOUT) :: fdda2d
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_aerod),INTENT(INOUT) :: aerod
REAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: psim
REAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: psih
REAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: wspd
REAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: gz1oz0
REAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: br
REAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: chklowq
LOGICAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: cu_act_flag
REAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: hol
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: th_phy
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: pi_phy
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: p_phy
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: t_phy
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: dz8w
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: p8w
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: t8w
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: ru_tendf
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: rv_tendf
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: rw_tendf
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: ph_tendf
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: t_tendf
REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: tke_tend
REAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: mu_tendf
INTEGER, INTENT(IN) :: k_start, k_end
LOGICAL, INTENT(IN), OPTIONAL :: f_flux
! Local
real :: HYDRO_dt
REAL, DIMENSION( ims:ime, jms:jme ) :: exch_temf ! 1/7/09 WA
REAL, DIMENSION( ims:ime, jms:jme ) :: ht_loc, mixht
INTEGER :: ij
INTEGER num_roof_layers
INTEGER num_wall_layers
INTEGER num_road_layers
INTEGER iswater
LOGICAL :: l_flux
INTEGER :: isurban
INTEGER rk_step
INTEGER :: yr, month, day, hr, minute, sec, rc
CHARACTER(LEN=80) :: mesg
INTEGER :: sids , side , sjds , sjde , skds , skde , &
sims , sime , sjms , sjme , skms , skme , &
sips , sipe , sjps , sjpe , skps , skpe
CHARACTER (LEN=256) :: mminlu
CHARACTER (LEN=1000) :: message
CALL get_ijk_from_subgrid ( grid , &
sids, side, sjds, sjde, skds, skde, &
sims, sime, sjms, sjme, skms, skme, &
sips, sipe, sjps, sjpe, skps, skpe )
! initialize all tendencies to zero in order to update physics
! tendencies first (separate from dry dynamics).
l_flux=.FALSE.
if (present(f_flux)) l_flux=f_flux
rk_step = 1
DO ij = 1 , grid%num_tiles
CALL wrf_debug ( 200 , ' call init_zero_tendency' )
CALL init_zero_tendency ( ru_tendf, rv_tendf, rw_tendf, &
ph_tendf, t_tendf, tke_tend, &
mu_tendf, &
moist_tend,chem_tend,scalar_tend, &
tracer_tend,num_tracer, &
num_moist,num_chem,num_scalar, &
rk_step, &
ids, ide, jds, jde, kds, kde, &
ims, ime, jms, jme, kms, kme, &
grid%i_start(ij), grid%i_end(ij), &
grid%j_start(ij), grid%j_end(ij), &
k_start, k_end )
END DO
!STARTOFREGISTRYGENERATEDINCLUDE 'inc/HALO_EM_PHYS_A.inc'
!
! WARNING This file is generated automatically by use_registry
! using the data base in the file named Registry.
! Do not edit. Your changes to this file will be lost.
!
CALL HALO_EM_PHYS_A_sub ( grid, &
local_communicator, &
mytask, ntasks, ntasks_x, ntasks_y, &
ids, ide, jds, jde, kds, kde, &
ims, ime, jms, jme, kms, kme, &
ips, ipe, jps, jpe, kps, kpe )
!ENDOFREGISTRYGENERATEDINCLUDE
DO ij = 1 , grid%num_tiles
CALL wrf_debug ( 200 , ' call phy_prep' )
CALL phy_prep ( config_flags, &
grid%mut, grid%muu, grid%muv, grid%u_2, &
grid%v_2, grid%p, grid%pb, grid%alt, &
grid%ph_2, grid%phb, grid%t_2, grid%tsk, moist, num_moist, &
grid%rho,th_phy, p_phy, pi_phy, grid%u_phy, grid%v_phy, &
p8w, t_phy, t8w, grid%z, grid%z_at_w, dz8w, &
grid%p_hyd, grid%p_hyd_w, grid%dnw, &
grid%fnm, grid%fnp, grid%znw, grid%p_top, &
grid%rthraten, &
grid%rthblten, grid%rublten, grid%rvblten, &
grid%rqvblten, grid%rqcblten, grid%rqiblten, &
grid%rucuten, grid%rvcuten, grid%rthcuten, &
grid%rqvcuten, grid%rqccuten, grid%rqrcuten, &
grid%rqicuten, grid%rqscuten, &
grid%rushten, grid%rvshten, grid%rthshten, &
grid%rqvshten, grid%rqcshten, grid%rqrshten, &
grid%rqishten, grid%rqsshten, grid%rqgshten, &
grid%rthften, grid%rqvften, &
grid%RUNDGDTEN, grid%RVNDGDTEN, grid%RTHNDGDTEN, &
grid%RPHNDGDTEN,grid%RQVNDGDTEN, grid%RMUNDGDTEN,&
!jdf
grid%landmask,grid%xland, &
!jdf
ids, ide, jds, jde, kds, kde, &
ims, ime, jms, jme, kms, kme, &
grid%i_start(ij), grid%i_end(ij), &
grid%j_start(ij), grid%j_end(ij), &
k_start, k_end )
ENDDO
! radiation
CALL domain_clock_get( grid, current_time=currentTime, &
current_timestr=mesg )
CALL WRFU_TimeGet( currentTime, YY=yr, dayOfYear=day, H=hr, M=minute, S=sec, rc=rc)
IF( rc/= WRFU_SUCCESS)THEN
CALL wrf_error_fatal('WRFU_TimeGet failed')
ENDIF
! this driver is only needed to handle non-local shadowing effects
CALL pre_radiation_driver ( grid, config_flags &
& ,itimestep=grid%itimestep, ra_call_offset=grid%ra_call_offset &
& ,XLAT=grid%xlat, XLONG=grid%xlong, GMT=grid%gmt &
& ,julian=grid%julian, xtime=grid%xtime, RADT=grid%radt &
& ,STEPRA=grid%stepra &
& ,ht=grid%ht,dx=grid%dx,dy=grid%dy,sina=grid%sina,cosa=grid%cosa &
& ,shadowmask=grid%shadowmask,slope_rad=config_flags%slope_rad &
& ,topo_shading=config_flags%topo_shading &
& ,shadlen=config_flags%shadlen,ht_shad=grid%ht_shad,ht_loc=ht_loc &
& ,ht_shad_bxs=grid%ht_shad_bxs, ht_shad_bxe=grid%ht_shad_bxe &
& ,ht_shad_bys=grid%ht_shad_bys, ht_shad_bye=grid%ht_shad_bye &
& ,nested=config_flags%nested, min_ptchsz=grid%min_ptchsz &
& ,spec_bdy_width=config_flags%spec_bdy_width &
! indexes
& ,IDS=ids,IDE=ide, JDS=jds,JDE=jde, KDS=kds,KDE=kde &
& ,IMS=ims,IME=ime, JMS=jms,JME=jme, KMS=kms,KME=kme &
& ,IPS=ips,IPE=ipe, JPS=jps,JPE=jpe, KPS=kps,KPE=kpe &
& ,i_start=grid%i_start,i_end=min(grid%i_end, ide-1) &
& ,j_start=grid%j_start,j_end=min(grid%j_end, jde-1) &
& ,kts=k_start, kte=min(k_end,kde-1) &
& ,num_tiles=grid%num_tiles )
CALL wrf_debug ( 200 , ' call radiation_driver' )
CALL radiation_driver( &
& p_top=grid%p_top & !DJW 140312 added p_top for vertical nesting
& ,ACFRCV=grid%acfrcv ,ACFRST=grid%acfrst ,ALBEDO=grid%albedo &
& ,CFRACH=grid%cfrach ,CFRACL=grid%cfracl ,CFRACM=grid%cfracm &
& ,CUPPT=grid%cuppt ,CZMEAN=grid%czmean ,DT=grid%dt &
& ,DZ8W=dz8w ,EMISS=grid%emiss ,GLW=grid%glw &
& ,GMT=grid%gmt ,GSW=grid%gsw ,HBOT=grid%hbot &
& ,HTOP=grid%htop ,HBOTR=grid%hbotr ,HTOPR=grid%htopr &
& ,ICLOUD=config_flags%icloud &
& ,ITIMESTEP=grid%itimestep,JULDAY=grid%julday , JULIAN=grid%julian &
& ,JULYR=grid%julyr ,LW_PHYSICS=config_flags%ra_lw_physics &
& ,NCFRCV=grid%ncfrcv ,NCFRST=grid%ncfrst ,NPHS=1 &
& ,o3input=config_flags%o3input ,O3rad=grid%o3rad &
& ,aer_opt=config_flags%aer_opt ,aerod=aerod(:,:,:,P_ocarbon:P_upperaer) &
& ,swint_opt=config_flags%swint_opt &
& ,P8W=grid%p_hyd_w ,P=grid%p_hyd ,PI=pi_phy &
& ,RADT=grid%radt ,RA_CALL_OFFSET=grid%ra_call_offset &
& ,RHO=grid%rho ,RLWTOA=grid%rlwtoa &
& ,RSWTOA=grid%rswtoa ,RTHRATEN=grid%rthraten &
& ,RTHRATENLW=grid%rthratenlw ,RTHRATENSW=grid%rthratensw &
& ,SNOW=grid%snow ,STEPRA=grid%stepra ,SWDOWN=grid%swdown &
& ,SWDOWNC=grid%swdownc ,SW_PHYSICS=config_flags%ra_sw_physics &
& ,T8W=t8w ,T=grid%t_phy ,TAUCLDC=grid%taucldc &
& ,TAUCLDI=grid%taucldi ,TSK=grid%tsk ,VEGFRA=grid%vegfra &
& ,WARM_RAIN=grid%warm_rain ,XICE=grid%xice ,XLAND=grid%xland &
& ,XLAT=grid%xlat ,XLONG=grid%xlong ,YR=yr &
! SSiB LSM radiation components (fds 06/2010)
& ,ALSWVISDIR=grid%alswvisdir ,ALSWVISDIF=grid%alswvisdif &
!ssib
& ,ALSWNIRDIR=grid%alswnirdir ,ALSWNIRDIF=grid%alswnirdif &
!ssib
& ,SWVISDIR=grid%swvisdir ,SWVISDIF=grid%swvisdif &
!ssib
& ,SWNIRDIR=grid%swnirdir ,SWNIRDIF=grid%swnirdif &
!ssib
& ,SF_SURFACE_PHYSICS=config_flags%sf_surface_physics &
!ssib
! WRF-solar and aerosol variables from jararias 2013/8 and 2013/11
& ,SWDDIR=grid%swddir,SWDDNI=grid%swddni,SWDDIF=grid%swddif &
& ,Gx=grid%Gx,Bx=grid%Bx,gg=grid%gg,bb=grid%bb &
& ,swdown_ref=grid%swdown_ref,swddir_ref=grid%swddir_ref &
& ,coszen_ref=grid%coszen_ref &
& ,aer_type=config_flags%aer_type &
& ,aer_aod550_opt=config_flags%aer_aod550_opt,aer_aod550_val=config_flags%aer_aod550_val &
& ,aer_angexp_opt=config_flags%aer_angexp_opt,aer_angexp_val=config_flags%aer_angexp_val &
& ,aer_ssa_opt=config_flags%aer_ssa_opt,aer_ssa_val=config_flags%aer_ssa_val &
& ,aer_asy_opt=config_flags%aer_asy_opt,aer_asy_val=config_flags%aer_asy_val &
& ,aod5502d=grid%aod5502d,angexp2d=grid%angexp2d,aerssa2d=grid%aerssa2d &
& ,aerasy2d=grid%aerasy2d,aod5503d=grid%aod5503d &
!Optional solar variables
& ,DECLINX=grid%declin ,SOLCONX=grid%solcon ,COSZEN=grid%coszen ,HRANG=grid%hrang &
& , CEN_LAT=grid%cen_lat &
& ,Z=grid%z &
& ,ALEVSIZ=grid%alevsiz, no_src_types=grid%no_src_types &
& ,LEVSIZ=grid%levsiz, N_OZMIXM=num_ozmixm &
& ,N_AEROSOLC=num_aerosolc &
& ,PAERLEV=grid%paerlev ,ID=grid%id &
& ,CAM_ABS_DIM1=grid%cam_abs_dim1, CAM_ABS_DIM2=grid%cam_abs_dim2 &
& ,CAM_ABS_FREQ_S=grid%cam_abs_freq_s &
& ,XTIME=grid%xtime &
& ,CURR_SECS=curr_secs, ADAPT_STEP_FLAG=adapt_step_flag &
! indexes
& ,IDS=ids,IDE=ide, JDS=jds,JDE=jde, KDS=kds,KDE=kde &
& ,IMS=ims,IME=ime, JMS=jms,JME=jme, KMS=kms,KME=kme &
& ,i_start=grid%i_start,i_end=min(grid%i_end, ide-1) &
& ,j_start=grid%j_start,j_end=min(grid%j_end, jde-1) &
& ,kts=k_start, kte=min(k_end,kde-1) &
& ,num_tiles=grid%num_tiles &
! Optional
!JJS 20101020 vvvvv
& , TLWDN=grid%tlwdn, TLWUP=grid%tlwup & ! goddard schemes
& , SLWDN=grid%slwdn, SLWUP=grid%slwup & ! goddard schemes
& , TSWDN=grid%tswdn, TSWUP=grid%tswup & ! goddard schemes
& , SSWDN=grid%sswdn, SSWUP=grid%sswup & ! goddard schemes
!JJS 20101020 ^^^^^
& , CLDFRA=grid%cldfra, CLDFRA_MP_ALL=grid%cldfra_mp_all &
& , LRADIUS=grid%LRADIUS,IRADIUS=grid%IRADIUS &
!BSINGH(01/22/2014)
& , CLDFRA_DP=grid%cldfra_dp & ! ckay for subgrid cloud
& , CLDFRA_SH=grid%cldfra_sh &
& , re_cloud=grid%re_cloud, re_ice=grid%re_ice, re_snow=grid%re_snow & ! G. Thompson
& , has_reqc=grid%has_reqc, has_reqi=grid%has_reqi, has_reqs=grid%has_reqs & ! G. Thompson
& , PB=grid%pb &
& , F_ICE_PHY=grid%f_ice_phy,F_RAIN_PHY=grid%f_rain_phy &
& , QV=moist(ims,kms,jms,P_QV), F_QV=F_QV &
& , QC=moist(ims,kms,jms,P_QC), F_QC=F_QC &
& , QR=moist(ims,kms,jms,P_QR), F_QR=F_QR &
& , QI=moist(ims,kms,jms,P_QI), F_QI=F_QI &
& , QS=moist(ims,kms,jms,P_QS), F_QS=F_QS &
& , QG=moist(ims,kms,jms,P_QG), F_QG=F_QG &
& , QNDROP=scalar(ims,kms,jms,P_QNDROP), F_QNDROP=F_QNDROP &
& ,ACSWUPT=grid%acswupt ,ACSWUPTC=grid%acswuptc &
& ,ACSWDNT=grid%acswdnt ,ACSWDNTC=grid%acswdntc &
& ,ACSWUPB=grid%acswupb ,ACSWUPBC=grid%acswupbc &
& ,ACSWDNB=grid%acswdnb ,ACSWDNBC=grid%acswdnbc &
& ,ACLWUPT=grid%aclwupt ,ACLWUPTC=grid%aclwuptc &
& ,ACLWDNT=grid%aclwdnt ,ACLWDNTC=grid%aclwdntc &
& ,ACLWUPB=grid%aclwupb ,ACLWUPBC=grid%aclwupbc &
& ,ACLWDNB=grid%aclwdnb ,ACLWDNBC=grid%aclwdnbc &
& ,SWUPT=grid%swupt ,SWUPTC=grid%swuptc &
& ,SWDNT=grid%swdnt ,SWDNTC=grid%swdntc &
& ,SWUPB=grid%swupb ,SWUPBC=grid%swupbc &
& ,SWDNB=grid%swdnb ,SWDNBC=grid%swdnbc &
& ,LWUPT=grid%lwupt ,LWUPTC=grid%lwuptc &
& ,LWDNT=grid%lwdnt ,LWDNTC=grid%lwdntc &
& ,LWUPB=grid%lwupb ,LWUPBC=grid%lwupbc &
& ,LWDNB=grid%lwdnb ,LWDNBC=grid%lwdnbc &
& ,LWCF=grid%lwcf &
& ,SWCF=grid%swcf &
& ,OLR=grid%olr &
& ,AERODM=grid%aerodm, PINA=grid%pina, AODTOT=grid%aodtot &
& ,OZMIXM=grid%ozmixm, PIN=grid%pin &
& ,M_PS_1=grid%m_ps_1, M_PS_2=grid%m_ps_2, AEROSOLC_1=grid%aerosolc_1 &
& ,AEROSOLC_2=grid%aerosolc_2, M_HYBI0=grid%m_hybi &
& ,ABSTOT=grid%abstot, ABSNXT=grid%absnxt, EMSTOT=grid%emstot &
& ,RADTACTTIME=grid%radtacttime &
& ,ICLOUD_CU=config_flags%ICLOUD_CU &
& ,QC_CU=grid%QC_CU , QI_CU=grid%QI_CU &
& ,slope_rad=config_flags%slope_rad,topo_shading=config_flags%topo_shading &
& ,shadowmask=grid%shadowmask,ht=grid%ht,dx=grid%dx,dy=grid%dy &
& ,IS_CAMMGMP_USED = grid%is_CAMMGMP_used )
!********* Surface driver
! surface
!gmm halo of wtd and riverflow for leafhydro
IF ( config_flags%sf_surface_physics.eq.NOAHMPSCHEME ) THEN
IF ( config_flags%opt_run.eq.5.and.mod(grid%itimestep,grid%STEPWTD).eq.0 ) THEN
!STARTOFREGISTRYGENERATEDINCLUDE 'inc/HALO_EM_HYDRO_NOAHMP.inc'
!
! WARNING This file is generated automatically by use_registry
! using the data base in the file named Registry.
! Do not edit. Your changes to this file will be lost.
!
CALL HALO_EM_HYDRO_NOAHMP_sub ( grid, &
local_communicator, &
mytask, ntasks, ntasks_x, ntasks_y, &
ids, ide, jds, jde, kds, kde, &
ims, ime, jms, jme, kms, kme, &
ips, ipe, jps, jpe, kps, kpe )
!ENDOFREGISTRYGENERATEDINCLUDE
ENDIF
ENDIF
END SUBROUTINE first_rk_step_part1
END MODULE module_first_rk_step_part1
This reduced case makes the memory hog much easier to debug, since it does not
take long to compile:
real 0m22.924s
user 0m22.242s
sys 0m0.640s
But the memory hog is still visible in the time report:
machine dep reorg : 2.05 ( 9%) 0.33 ( 56%) 2.40 ( 10%) 939M ( 80%)
* [Bug tree-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 1:52 UTC (permalink / raw)
--- Comment #4 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Also, compiling the original file with -fno-move-loop-invariants reduces the
compile time from 60 minutes to 7 minutes:
real 7m12.528s
user 6m55.214s
sys 0m17.147s
machine dep reorg : 75.93 ( 18%) 14.23 ( 88%) 90.15 ( 21%) 33383M ( 95%)
The memory report is clear: machine dep reorg accounts for 95% of the memory.
So I believe the VSETVL pass is not the main cause of the compile-time hog;
that is the loop invariant pass.
The VSETVL pass is, however, the main cause of the memory hog.
I am not familiar with the loop invariant pass. Can anyone help debug its
compile-time hog? Or should we disable the loop invariant pass by default
for RISC-V?
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: pinskia at gcc dot gnu.org @ 2024-01-19 1:55 UTC (permalink / raw)
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|tree-optimization |rtl-optimization
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note "loop invariant motion" is the RTL based loop invariant motion pass.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 1:56 UTC (permalink / raw)
--- Comment #6 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Andrew Pinski from comment #5)
> Note "loop invariant motion" is the RTL based loop invariant motion pass.
So you mean it should still be a RISC-V issue, right?
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: patrick at rivosinc dot com @ 2024-01-19 3:08 UTC (permalink / raw)
Patrick O'Neill <patrick at rivosinc dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |patrick at rivosinc dot com
--- Comment #7 from Patrick O'Neill <patrick at rivosinc dot com> ---
I believe the memory hog is caused by this:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/riscv/riscv-vsetvl.cc;h=2067073185f8c0f398908b164a99b592948e6d2d;hb=565935f93a7da629da89b05812a3e8c43287598f#l2427
In the slightly reduced test program I was using to debug there were ~35k bb's
leading to num_expr being roughly 1 million. vsetvl then makes 35k bitmaps of
~1 million bits.
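A rough estimate of what those figures imply, assuming each bitmap is a dense bit vector with one bit per expression. The helper name dense_bitmap_bytes is hypothetical, and the round numbers 35,000 and 1,000,000 stand in for the approximate counts quoted above:

```cpp
#include <cstdint>

// Hypothetical helper (not a GCC function): total bytes needed for
// `num_bitmaps` dense bit vectors of `bits_per_bitmap` bits each,
// rounding every vector up to whole 64-bit words.
constexpr std::uint64_t
dense_bitmap_bytes (std::uint64_t num_bitmaps, std::uint64_t bits_per_bitmap)
{
  const std::uint64_t words_per_bitmap = (bits_per_bitmap + 63) / 64;
  return num_bitmaps * words_per_bitmap * sizeof (std::uint64_t);
}

// ~35k basic blocks and ~1 million expressions: 125,000 bytes per
// bitmap, about 4.4 GB for one bitmap per basic block.
static_assert (dense_bitmap_bytes (35000, 1000000) == 4375000000ULL,
               "35k dense bitmaps of 1M bits need roughly 4.4 GB");
```

A pass that keeps several such per-block bitmap arrays alive at once would multiply this figure accordingly.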
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: pinskia at gcc dot gnu.org @ 2024-01-19 3:12 UTC (permalink / raw)
--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Patrick O'Neill from comment #7)
> I believe the memory hog is caused by this:
> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/riscv/riscv-vsetvl.cc;h=2067073185f8c0f398908b164a99b592948e6d2d;hb=565935f93a7da629da89b05812a3e8c43287598f#l2427
>
> In the slightly reduced test program I was using to debug there were ~35k
> bb's leading to num_expr being roughly 1 million. vsetvl then makes 35k
> bitmaps of ~1 million bits.
How sparse will this bitmap be? bitmap should be used instead of sbitmap if
the bitmap is going to be sparse. sbitmap has a fixed size based on the number
of bits, while bitmap is better for sparse bitmaps because it is implemented
as a linked list.
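To illustrate the trade-off, here is a toy sketch of the two storage strategies. The types dense_bits and sparse_bits are hypothetical stand-ins written for this example; they are not GCC's sbitmap or bitmap and do not reflect their actual layout:

```cpp
#include <cstddef>
#include <cstdint>
#include <forward_list>
#include <vector>

// Toy dense bit set: storage is fixed by the universe size, no matter
// how few bits are actually set.
struct dense_bits
{
  std::vector<std::uint64_t> words;
  explicit dense_bits (std::size_t nbits) : words ((nbits + 63) / 64) {}
  void set (std::size_t i) { words[i / 64] |= std::uint64_t (1) << (i % 64); }
  bool test (std::size_t i) const { return (words[i / 64] >> (i % 64)) & 1; }
  std::size_t payload_bytes () const
  { return words.size () * sizeof (std::uint64_t); }
};

// Toy sparse bit set: a sorted linked list of 64-bit chunks; a chunk
// exists only if at least one of its bits has been set.
struct sparse_bits
{
  struct chunk { std::size_t index; std::uint64_t word; };
  std::forward_list<chunk> chunks;  // kept sorted by chunk index
  std::size_t nchunks = 0;

  void set (std::size_t i)
  {
    const std::size_t idx = i / 64;
    auto prev = chunks.before_begin ();
    auto it = chunks.begin ();
    while (it != chunks.end () && it->index < idx)
      { prev = it; ++it; }
    if (it == chunks.end () || it->index != idx)
      {
        it = chunks.insert_after (prev, chunk{idx, 0});
        ++nchunks;
      }
    it->word |= std::uint64_t (1) << (i % 64);
  }
  bool test (std::size_t i) const
  {
    for (const chunk &c : chunks)
      if (c.index == i / 64)
        return (c.word >> (i % 64)) & 1;
    return false;
  }
  // Counts chunk payload only; list node pointers are ignored.
  std::size_t payload_bytes () const { return nchunks * sizeof (chunk); }
};
```

With a universe of 1,000,000 bits and only a handful set, the dense set still occupies 125,000 bytes while the sparse one holds a few chunks. The price is that lookups in the sparse set walk the list instead of indexing directly.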
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: pinskia at gcc dot gnu.org @ 2024-01-19 3:14 UTC (permalink / raw)
--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #8)
> How sparse will this bitmap be? bitmap should be used instead of sbitmap if
> the bitmap is going to be sparse. sbitmap has a fixed size based on the
> number of bits, while bitmap is better for sparse bitmaps because it is
> implemented as a linked list.
Also, it seems that DF already has def_in/def_out info; how much of this
information is duplicated from there?
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 3:33 UTC (permalink / raw)
--- Comment #10 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
No, that is not the cause. I removed the whole function compute_avl_def_data
and the memory usage did not change.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 3:34 UTC (permalink / raw)
--- Comment #11 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
The cause should be compute_lcm_local_properties. Memory usage drops by 50%
after I remove this function. I am still investigating.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 3:44 UTC (permalink / raw)
--- Comment #12 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
OK. Here is a simple fix which gives some hints:
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 2067073185f..ede818140dc 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2719,10 +2719,11 @@ pre_vsetvl::compute_lcm_local_properties ()
for (int i = 0; i < num_exprs; i += 1)
{
const vsetvl_info &info = *m_exprs[i];
- if (!info.has_nonvlmax_reg_avl () && !info.has_vl ())
+ bool has_nonvlmax_reg_avl_p = info.has_nonvlmax_reg_avl ();
+ if (!has_nonvlmax_reg_avl_p && !info.has_vl ())
continue;
- if (info.has_nonvlmax_reg_avl ())
+ if (has_nonvlmax_reg_avl_p)
{
unsigned int regno;
sbitmap_iterator sbi;
@@ -3556,7 +3557,7 @@ const pass_data pass_data_vsetvl = {
RTL_PASS, /* type */
"vsetvl", /* name */
OPTGROUP_NONE, /* optinfo_flags */
- TV_NONE, /* tv_id */
+ TV_MACH_DEP, /* tv_id */
0, /* properties_required */
0, /* properties_provided */
0, /* properties_destroyed */
Memory usage goes from 931M to 781M, a significant reduction.
Note that I didn't change every call to has_nonvlmax_reg_avl; we call
has_nonvlmax_reg_avl in many places...
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 3:46 UTC (permalink / raw)
--- Comment #13 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
So I think we should investigate why calling has_nonvlmax_reg_avl costs so much
memory.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 3:52 UTC (permalink / raw)
--- Comment #14 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Oh, I know the reason now.
The issue is not in the RISC-V backend VSETVL pass.
I think it is a memory bug in rtx_equal_p.
We are calling rtx_equal_p, which is very costly.
For example, has_nonvlmax_reg_avl calls rtx_equal_p.
So I kept all the code unchanged and only replaced the comparison as follows:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 93a1238a5ab..1c85c8ee3c6 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4988,7 +4988,7 @@ nonvlmax_avl_type_p (rtx_insn *rinsn)
bool
vlmax_avl_p (rtx x)
{
- return x && rtx_equal_p (x, RVV_VLMAX);
+ return x && REG_P (x) && REGNO (x) == X0_REGNUM/*rtx_equal_p (x, RVV_VLMAX)*/;
}
This uses REGNO (x) == X0_REGNUM instead of rtx_equal_p.
The memory-hog issue is gone:
939M -> 725k.
So I am going to send a patch to work around the rtx_equal_p issue that causes
the memory hog.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: pinskia at gcc dot gnu.org @ 2024-01-19 3:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #15 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #14)
> Oh, I know the reason now.
>
> The issue is not RISC-V backend VSETVL PASS.
>
> It's memory bug of rtx_equal_p I think.
It is not rtx_equal_p but rather RVV_VLMAX which is defined as:
riscv-protos.h:#define RVV_VLMAX gen_rtx_REG (Pmode, X0_REGNUM)
Seems like you could cache that somewhere ...
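
To illustrate the point about the macro, here is a minimal standalone sketch
(not GCC code). Reg, make_reg, FRESH_VLMAX and cached_vlmax are hypothetical
stand-ins for rtx, gen_rtx_REG, the RVV_VLMAX macro and a cached rtx; the pool
only loosely imitates garbage-collected memory. It assumes nothing about how
the actual fix is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an rtx register node.
struct Reg { int mode; unsigned regno; };

// Keeps every allocated node alive, loosely imitating GC-managed memory.
static std::vector<std::unique_ptr<Reg>> pool;

// Stand-in for gen_rtx_REG: allocates a new node on every call.
static Reg *make_reg (int mode, unsigned regno)
{
  pool.push_back (std::make_unique<Reg> (Reg{mode, regno}));
  return pool.back ().get ();
}

constexpr int PMODE = 0;
constexpr unsigned X0_REGNUM = 0;

// Macro in the style described above: every expansion allocates a node.
#define FRESH_VLMAX make_reg (PMODE, X0_REGNUM)

// Structural comparison, standing in for rtx_equal_p.
static bool reg_equal_p (const Reg *a, const Reg *b)
{
  return a->mode == b->mode && a->regno == b->regno;
}

// Predicate that allocates one node per call through the macro.
static bool vlmax_p_fresh (const Reg *x)
{
  return x && reg_equal_p (x, FRESH_VLMAX);
}

// Cached variant: the node is created once and reused afterwards.
static Reg *cached_vlmax ()
{
  static Reg *r = make_reg (PMODE, X0_REGNUM);
  return r;
}

static bool vlmax_p_cached (const Reg *x)
{
  return x && reg_equal_p (x, cached_vlmax ());
}

// Counts how many nodes `calls` invocations of `pred` allocate.
static std::size_t allocations_for (bool (*pred) (const Reg *), int calls)
{
  Reg probe{PMODE, X0_REGNUM};
  std::size_t before = pool.size ();
  for (int i = 0; i < calls; ++i)
    pred (&probe);
  return pool.size () - before;
}
```

In this sketch the comparison itself is cheap in both variants; only the
operand construction differs, which matches the observation that the cost is
in the macro rather than in the comparison.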
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 3:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #16 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Andrew Pinski from comment #15)
> (In reply to JuzheZhong from comment #14)
> > Oh, I know the reason now.
> >
> > The issue is not RISC-V backend VSETVL PASS.
> >
> > It's memory bug of rtx_equal_p I think.
>
>
> It is not rtx_equal_p but rather RVV_VLMAX which is defined as:
> riscv-protos.h:#define RVV_VLMAX gen_rtx_REG (Pmode, X0_REGNUM)
>
> Seems like you could cache that somewhere ...
Oh, that makes sense to me. Thank you so much.
I think the memory-hog issue will be fixed soon.
But the compile-time hog in loop invariant motion is still not fixed.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 4:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #17 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
OK. Confirmed on the original test: 33383M -> 4796k now.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 8:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #18 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Hi, Robin.
I have a patch that fixes the memory hog:
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643418.html
I will commit it after testing.
But the compile-time hog still exists, in the loop invariant motion pass.
With -fno-move-loop-invariants, compilation becomes much faster.
Could you take a look at it?
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 8:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #19 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to JuzheZhong from comment #18)
> Hi, Robin.
>
> I have fixed patch for memory-hog:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643418.html
>
> I will commit it after the testing.
>
> But compile-time hog still exists which is loop invariant motion PASS.
>
> with -fno-move-loop-invariants, we become quite faster.
>
> Could you take a look at it ?
Note that with default -march=rv64gcv_zvl256b -O3:
real 63m18.771s
user 60m19.036s
sys 2m59.787s
But with -march=rv64gcv_zvl256b -O3 -fno-move-loop-invariants:
real 6m52.984s
user 6m42.473s
sys 0m10.375s
Roughly 9 times faster without loop invariant motion.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: rguenth at gcc dot gnu.org @ 2024-01-19 9:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111241,
| |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
IIRC there's a duplicate for this. It's df_analyze_loop calling
df_reorganize_refs_* which is doing O(function-size) work for each loop.
With -O3 and vectorization the number of loops tends to blow up, making the
issue worse.
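
As a rough illustration of why per-loop O(function-size) work scales badly,
here is a hypothetical cost model. The function names and the notion of a
"work unit" are assumptions made for this sketch; it does not measure
df_analyze_loop itself.

```cpp
#include <cassert>
#include <cstdint>

// Work units spent when every loop triggers a pass over the whole function.
std::uint64_t per_loop_whole_function_work (std::uint64_t num_loops,
                                            std::uint64_t function_insns)
{
  return num_loops * function_insns;
}

// Work units spent when every loop only touches its own instructions.
std::uint64_t per_loop_local_work (std::uint64_t num_loops,
                                   std::uint64_t avg_loop_insns)
{
  return num_loops * avg_loop_insns;
}
```

With invented numbers, say 1000 loops in a function of 100000 instructions
where a loop averages 50 instructions, the first model yields 100 million work
units and the second 50 thousand. The gap grows with every additional loop the
vectorizer creates.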
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: rguenth at gcc dot gnu.org @ 2024-01-19 9:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
I once tried to avoid df_reorganize_refs and/or optimize this with the blocks
involved but failed.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 9:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #22 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Richard Biener from comment #21)
> I once tried to avoid df_reorganize_refs and/or optimize this with the
> blocks involved but failed.
I am wondering whether we should disable LICM for RISC-V by default if vector
is enabled, since a roughly 10x increase in compile time is really horrible.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: kito at gcc dot gnu.org @ 2024-01-19 9:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #23 from Kito Cheng <kito at gcc dot gnu.org> ---
> I am considering whether we should disable LICM for RISC-V by default if vector is enabled ?
That will cause regressions for other programs, and may also hurt programs
that are not vectorized but benefit from LICM.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: cvs-commit at gcc dot gnu.org @ 2024-01-19 10:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #24 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Robin Dapp <rdapp@gcc.gnu.org>:
https://gcc.gnu.org/g:01260a823073675e13dd1fc85cf2657a5396adf2
commit r14-8282-g01260a823073675e13dd1fc85cf2657a5396adf2
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date: Fri Jan 19 16:34:25 2024 +0800
RISC-V: Fix RVV_VLMAX
This patch fixes the memory hog found in the SPEC2017 wrf benchmark, which is
caused by RVV_VLMAX: the macro generates a brand new rtx via
gen_rtx_REG (Pmode, X0_REGNUM) every time it is used, so we keep generating
redundant garbage (reg:DI 0 zero) rtxes.
After this patch, the memory hog is gone.
Time variable usr sys wall GGC
machine dep reorg : 1.99 ( 9%) 0.35 ( 56%) 2.33 ( 10%) 939M ( 80%) [Before this patch]
machine dep reorg : 1.71 ( 6%) 0.16 ( 27%) 3.77 ( 6%) 659k ( 0%) [After this patch]
Time variable usr sys wall GGC
machine dep reorg : 75.93 ( 18%) 14.23 ( 88%) 90.15 ( 21%) 33383M ( 95%) [Before this patch]
machine dep reorg : 56.00 ( 14%) 7.92 ( 77%) 63.93 ( 15%) 4361k ( 0%) [After this patch]
Test is running. OK for trunk if the tests pass with no regression?
PR target/113495
gcc/ChangeLog:
* config/riscv/riscv-protos.h (RVV_VLMAX): Change to
regno_reg_rtx[X0_REGNUM].
(RVV_VUNDEF): Ditto.
* config/riscv/riscv-vsetvl.cc: Add timevar.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 10:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #25 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
The RISC-V backend memory-hog issue is fixed.
But the compile-time hog in LICM is still there, so keep this PR open.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: rguenther at suse dot de @ 2024-01-19 10:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #26 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 19 Jan 2024, juzhe.zhong at rivai dot ai wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
>
> --- Comment #22 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> (In reply to Richard Biener from comment #21)
> > I once tried to avoid df_reorganize_refs and/or optimize this with the
> > blocks involved but failed.
>
> I am considering whether we should disable LICM for RISC-V by default if vector
> is enabled ?
> Since the compile time explode 10 times is really horrible.
I think that's a bad idea. It only explodes for some degenerate cases.
The best would be to fix invariant motion to keep DF up-to-date so
it can stop using df_analyze_loop and instead analyze the whole function.
Or maybe change it to use the rtl-ssa framework instead.
There's already param_loop_invariant_max_bbs_in_loop:
  /* Process the loops, innermost first.  */
  for (auto loop : loops_list (cfun, LI_FROM_INNERMOST))
    {
      curr_loop = loop;
      /* move_single_loop_invariants for very large loops is time consuming
         and might need a lot of memory.  For -O1 only do loop invariant
         motion for very small loops.  */
      unsigned max_bbs = param_loop_invariant_max_bbs_in_loop;
      if (optimize < 2)
        max_bbs /= 10;
      if (loop->num_nodes <= max_bbs)
        move_single_loop_invariants (loop);
    }
It might be possible to restrict invariant motion to innermost loops
when the overall number of loops is too large (with a new param
for that). And when the number of innermost loops also exceeds
the limit, avoid even that? The above also misses an
optimize_loop_for_speed_p (loop) check (probably doesn't make
a difference, but you could try).
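
The restriction suggested above could be sketched roughly as follows. This is
a standalone model, not a patch: Loop, Params and select_loops are
hypothetical, max_loops stands for the proposed new param (which does not
exist), and max_bbs_in_loop only mirrors the role of
param_loop_invariant_max_bbs_in_loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of a loop.
struct Loop { unsigned num_nodes; bool innermost; };

struct Params {
  unsigned max_bbs_in_loop;  // mirrors param_loop_invariant_max_bbs_in_loop
  std::size_t max_loops;     // hypothetical new param limiting the loop count
};

// Returns the indices of the loops that would still get invariant motion.
std::vector<std::size_t> select_loops (const std::vector<Loop> &loops,
                                       const Params &p)
{
  std::size_t innermost_count = 0;
  for (const Loop &l : loops)
    if (l.innermost)
      ++innermost_count;

  const bool too_many_loops = loops.size () > p.max_loops;
  const bool too_many_innermost = innermost_count > p.max_loops;

  std::vector<std::size_t> selected;
  if (too_many_innermost)
    return selected;  // even the innermost loops exceed the limit: skip all

  for (std::size_t i = 0; i < loops.size (); ++i)
    {
      const Loop &l = loops[i];
      if (l.num_nodes > p.max_bbs_in_loop)
        continue;  // existing size limit
      if (too_many_loops && !l.innermost)
        continue;  // over the limit: innermost loops only
      selected.push_back (i);
    }
  return selected;
}
```

Whether a single limit should govern both the overall and the innermost count
is a design choice made only for this sketch; the comment above leaves that
open.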
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: rdapp at gcc dot gnu.org @ 2024-01-22 11:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #27 from Robin Dapp <rdapp at gcc dot gnu.org> ---
Following up on this:
I'm seeing the same thing Patrick does. We create a lot of large non-sparse
sbitmaps that amount to around 33G in total.
I did local experiments replacing all sbitmaps that are not needed for LCM by
regular bitmaps. Apart from output differences vs the original version the
testsuite is unchanged.
As expected, wrf now takes longer to compile, 8 mins vs 4ish mins before, and
we still use 2.7G of RAM for this single file (likely because of the remaining
sbitmaps) compared to a max of 1.2ish G that the rest of the compilation uses.
One possibility to get the best of both worlds would be to threshold based on
num_bbs * num_exprs. Once we exceed it switch to the bitmap pass, otherwise
keep sbitmaps for performance.
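
A minimal sketch of such a threshold check, under stated assumptions:
BitsetKind and choose_bitset_kind are hypothetical names, and the threshold
value would have to be tuned; nothing here is taken from the pass itself.

```cpp
#include <cassert>
#include <cstdint>

// Dense corresponds to sbitmap-style storage, Sparse to bitmap-style storage.
enum class BitsetKind { Dense, Sparse };

// Picks Sparse once num_bbs * num_exprs exceeds the threshold.
BitsetKind choose_bitset_kind (std::uint64_t num_bbs, std::uint64_t num_exprs,
                               std::uint64_t threshold)
{
  // Compare by division so the product cannot overflow 64 bits.
  if (num_exprs != 0 && num_bbs > threshold / num_exprs)
    return BitsetKind::Sparse;
  return BitsetKind::Dense;
}
```

The product is a natural measure here because a dense vector of bitmaps needs
one bit per (block, expression) pair regardless of how many bits are set.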
Messaging with Juzhe offline, his best guess for the LICM time is that he
enabled checking for dataflow which slows down this particular compilation by a
lot. Therefore it doesn't look like a generic problem.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-22 11:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #28 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Robin Dapp from comment #27)
> Following up on this:
>
> I'm seeing the same thing Patrick does. We create a lot of large non-sparse
> sbitmaps that amount to around 33G in total.
>
> I did local experiments replacing all sbitmaps that are not needed for LCM
> by regular bitmaps. Apart from output differences vs the original version
> the testsuite is unchanged.
>
> As expected, wrf now takes longer to compile, 8 mins vs 4ish mins before,
> and we still use 2.7G of RAM for this single file (likely because of the
> remaining sbitmaps) compared to a max of 1.2ish G that the rest of the
> compilation uses.
>
> One possibility to get the best of both worlds would be to threshold based
> on num_bbs * num_exprs. Once we exceed it switch to the bitmap pass,
> otherwise keep sbitmaps for performance.
>
> Messaging with Juzhe offline, his best guess for the LICM time is that he
> enabled checking for dataflow which slows down this particular compilation
> by a lot. Therefore it doesn't look like a generic problem.
Thanks. I don't think replacing sbitmap is the best solution.
Let me first disable the DF check and reproduce the 33G memory consumption on
my local machine.
I think the best way to reduce the memory consumption is to optimize the
VSETVL pass algorithm and code. I have an idea for that and am going to work
on it.
Thanks for reporting.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: rguenth at gcc dot gnu.org @ 2024-01-22 12:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #29 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #26)
> On Fri, 19 Jan 2024, juzhe.zhong at rivai dot ai wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
> >
> > --- Comment #22 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > (In reply to Richard Biener from comment #21)
> > > I once tried to avoid df_reorganize_refs and/or optimize this with the
> > > blocks involved but failed.
> >
> > I am considering whether we should disable LICM for RISC-V by default if vector
> > is enabled ?
> > Since the compile time explode 10 times is really horrible.
>
> I think that's a bad idea. It only explodes for some degenerate cases.
> The best would be to fix invariant motion to keep DF up-to-date so
> it can stop using df_analyze_loop and instead analyze the whole function.
> Or maybe change it to use the rtl-ssa framework instead.
>
> There's already param_loop_invariant_max_bbs_in_loop:
>
> /* Process the loops, innermost first. */
> for (auto loop : loops_list (cfun, LI_FROM_INNERMOST))
> {
> curr_loop = loop;
> /* move_single_loop_invariants for very large loops is time
> consuming
> and might need a lot of memory. For -O1 only do loop invariant
> motion for very small loops. */
> unsigned max_bbs = param_loop_invariant_max_bbs_in_loop;
> if (optimize < 2)
> max_bbs /= 10;
> if (loop->num_nodes <= max_bbs)
> move_single_loop_invariants (loop);
> }
>
> it might be possible to restrict invariant motion to innermost loops
> when the overall number of loops is too large (with a new param
> for that). And when the number of innermost loops also exceeds
> the limit avoid even that? The above also misses a
> optimize_loop_for_speed_p (loop) check (probably doesn't make
> a difference, but you could try).
Ah, sorry - I was mis-matching LICM to invariant motion above, still
invariant motion is the biggest offender (might be due to DF checking
if you enabled that).
As for sbitmap vs. bitmap it's a difficult call. When there are big
profile hits on individual bit operations (bitmap_bit_p, bitmap_set_bit)
it might pay off to use bitmap but with tree view. There's also
sparseset but that requires even more memory.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-22 13:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #30 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
OK. I believe m_avl_def_in and m_avl_def_out can be removed with a better
algorithm.
Then the memory hog should be fixed soon.
I am going to rewrite avl_vl_unmodified_between_p and run full-coverage
testing, since it is going to be a big change.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-22 15:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #31 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
machine dep reorg : 403.69 ( 56%) 23.48 ( 93%) 427.17 ( 57%) 5290k ( 0%)
Confirmed: after removing RTL DF checking, LICM is no longer a compile-time
hog.
The VSETVL pass accounts for 56% of compile time.
Even though I can't see the memory hog in the GGC column of -ftime-report, I
can see 33G of memory usage in htop.
Confirmed: both the compile-time hog and the memory hog are VSETVL pass issues.
I will work on optimizing both the compile time and the memory usage of the
VSETVL pass.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: cvs-commit at gcc dot gnu.org @ 2024-01-24 0:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #32 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:
https://gcc.gnu.org/g:3132d2d36b4705bb762e61b1c8ca4da7c78a8321
commit r14-8378-g3132d2d36b4705bb762e61b1c8ca4da7c78a8321
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date: Tue Jan 23 18:12:49 2024 +0800
RISC-V: Fix large memory usage of VSETVL PASS [PR113495]
The SPEC 2017 wrf benchmark exposes unreasonable memory usage in the VSETVL
pass: the pass consumes over 33 GB of memory, which makes it impossible
to compile SPEC 2017 wrf on a laptop.
The root cause is these memory-wasting variables:
unsigned num_exprs = num_bbs * num_regs;
sbitmap *avl_def_loc = sbitmap_vector_alloc (num_bbs, num_exprs);
sbitmap *m_kill = sbitmap_vector_alloc (num_bbs, num_exprs);
m_avl_def_in = sbitmap_vector_alloc (num_bbs, num_exprs);
m_avl_def_out = sbitmap_vector_alloc (num_bbs, num_exprs);
I find that compute_avl_def_data can be replaced by the RTL_SSA framework,
so this patch reimplements the code on top of RTL_SSA.
After this patch, the memory-hog issue is fixed.
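
For a feel of how the allocation pattern listed above grows, here is a small
arithmetic sketch. dense_vector_bytes is a hypothetical helper; it ignores
per-bitmap headers and word rounding, and the inputs used in the note below
are invented, not measured from wrf.

```cpp
#include <cassert>
#include <cstdint>

// Approximate bytes used by `count` dense bitmap vectors, each holding
// num_bbs bitmaps of num_exprs bits, where num_exprs = num_bbs * num_regs.
std::uint64_t dense_vector_bytes (std::uint64_t num_bbs,
                                  std::uint64_t num_regs,
                                  std::uint64_t count)
{
  const std::uint64_t num_exprs = num_bbs * num_regs;
  const std::uint64_t bits = count * num_bbs * num_exprs;
  return (bits + 7) / 8;  // round up to whole bytes
}
```

Because num_exprs itself scales with num_bbs, the total is quadratic in the
number of basic blocks: with four vectors and 100 registers, 1000 blocks need
about 50 MB in this model, while 2000 blocks need about 200 MB.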
Simple vsetvl memory usage
(valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out)
is 1.673 GB.
Lazy vsetvl memory usage
(valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out)
is 2.441 GB.
Tested on both RV32 and RV64, no regression.
gcc/ChangeLog:
PR target/113495
* config/riscv/riscv-vsetvl.cc (get_expr_id): Remove.
(get_regno): Ditto.
(get_bb_index): Ditto.
(pre_vsetvl::compute_avl_def_data): Ditto.
(pre_vsetvl::earliest_fuse_vsetvl_info): Fix large memory usage.
(pre_vsetvl::pre_global_vsetvl_info): Ditto.
gcc/testsuite/ChangeLog:
PR target/113495
* gcc.target/riscv/rvv/vsetvl/avl_single-107.c: Adapt test.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: cvs-commit at gcc dot gnu.org @ 2024-01-31 0:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #33 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:
https://gcc.gnu.org/g:9dd10de15b183f7b662905e1383fdc3a08755f2e
commit r14-8639-g9dd10de15b183f7b662905e1383fdc3a08755f2e
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date: Mon Jan 29 19:32:02 2024 +0800
RISC-V: Fix VSETLV PASS compile-time issue
The compile-time issue was discovered in SPEC 2017 wrf.
time and -ftime-report were used to profile the SPEC 2017 wrf compilation.
Before this patch (Lazy vsetvl):
scheduling : 121.89 ( 15%) 0.53 ( 11%) 122.72 ( 15%) 13M ( 1%)
machine dep reorg : 424.61 ( 53%) 1.84 ( 37%) 427.44 ( 53%) 5290k ( 0%)
real 13m27.074s
user 13m19.539s
sys 0m5.180s
Simple vsetvl:
machine dep reorg : 0.10 ( 0%) 0.00 ( 0%) 0.11 ( 0%) 4138k ( 0%)
real 6m5.780s
user 6m2.396s
sys 0m2.373s
The machine dep reorg entry is the compile time of the VSETVL pass (424
seconds), which accounts for 53% of the compilation time, much more than
scheduling.
After investigation, the critical path of the VSETVL pass is
compute_lcm_local_properties, which is called in every iteration of phase 2
(earliest fusion) and phase 3 (global lcm).
This patch optimizes the code of compute_lcm_local_properties to reduce
the compilation time.
After this patch:
scheduling : 117.51 ( 27%) 0.21 ( 6%) 118.04 ( 27%) 13M ( 1%)
machine dep reorg : 80.13 ( 18%) 0.91 ( 26%) 81.26 ( 18%) 5290k ( 0%)
real 7m25.374s
user 7m20.116s
sys 0m3.795s
The effect of this patch is clear: the lazy VSETVL pass goes from 424s (53%)
to 80s (18%), which is now less time than scheduling.
Tested on both RV32 and RV64 with no regression. OK for trunk?
PR target/113495
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (extract_single_source): Remove.
(pre_vsetvl::compute_vsetvl_def_data): Fix compile time issue.
(pre_vsetvl::compute_transparent): New function.
(pre_vsetvl::compute_lcm_local_properties): Fix compile time issue.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-31 1:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
JuzheZhong <juzhe.zhong at rivai dot ai> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution|--- |FIXED
--- Comment #34 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Fixed.
Thread overview: 35+ messages
2024-01-19 1:22 [Bug c/113495] New: RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark juzhe.zhong at rivai dot ai
2024-01-19 1:36 ` [Bug tree-optimization/113495] " juzhe.zhong at rivai dot ai
2024-01-19 1:38 ` juzhe.zhong at rivai dot ai
2024-01-19 1:48 ` juzhe.zhong at rivai dot ai
2024-01-19 1:52 ` juzhe.zhong at rivai dot ai
2024-01-19 1:55 ` [Bug rtl-optimization/113495] " pinskia at gcc dot gnu.org
2024-01-19 1:56 ` juzhe.zhong at rivai dot ai
2024-01-19 3:08 ` patrick at rivosinc dot com
2024-01-19 3:12 ` pinskia at gcc dot gnu.org
2024-01-19 3:14 ` pinskia at gcc dot gnu.org
2024-01-19 3:33 ` juzhe.zhong at rivai dot ai
2024-01-19 3:34 ` juzhe.zhong at rivai dot ai
2024-01-19 3:44 ` juzhe.zhong at rivai dot ai
2024-01-19 3:46 ` juzhe.zhong at rivai dot ai
2024-01-19 3:52 ` juzhe.zhong at rivai dot ai
2024-01-19 3:56 ` pinskia at gcc dot gnu.org
2024-01-19 3:58 ` juzhe.zhong at rivai dot ai
2024-01-19 4:00 ` juzhe.zhong at rivai dot ai
2024-01-19 8:23 ` juzhe.zhong at rivai dot ai
2024-01-19 8:41 ` juzhe.zhong at rivai dot ai
2024-01-19 9:23 ` rguenth at gcc dot gnu.org
2024-01-19 9:24 ` rguenth at gcc dot gnu.org
2024-01-19 9:28 ` juzhe.zhong at rivai dot ai
2024-01-19 9:35 ` kito at gcc dot gnu.org
2024-01-19 10:03 ` cvs-commit at gcc dot gnu.org
2024-01-19 10:05 ` juzhe.zhong at rivai dot ai
2024-01-19 10:22 ` rguenther at suse dot de
2024-01-22 11:42 ` rdapp at gcc dot gnu.org
2024-01-22 11:51 ` juzhe.zhong at rivai dot ai
2024-01-22 12:00 ` rguenth at gcc dot gnu.org
2024-01-22 13:21 ` juzhe.zhong at rivai dot ai
2024-01-22 15:04 ` juzhe.zhong at rivai dot ai
2024-01-24 0:30 ` cvs-commit at gcc dot gnu.org
2024-01-31 0:29 ` cvs-commit at gcc dot gnu.org
2024-01-31 1:25 ` juzhe.zhong at rivai dot ai