public inbox for gcc-bugs@sourceware.org
* [Bug c/113495] New: RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 1:22 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

            Bug ID: 113495
           Summary: RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

Compiling with

riscv64-unknown-linux-gnu-gfortran -march=rv64gcv_zvl256b -O3 -S -ftime-report

takes more than 60 minutes:

real    63m18.771s
user    60m19.036s
sys     2m59.787s

After investigation, the time report shows that two passes are critical. The first is loop invariant motion, which consumes most of the time (72%):

 loop invariant motion              :2600.28 ( 72%)   1.68 (  1%)2602.12 ( 69%)  2617k (  0%)

The other is the VSETVL pass:

 vsetvl: earliest_fuse_vsetvl_info  : 438.26 ( 12%)  79.82 ( 47%) 518.08 ( 14%)221807M ( 75%)
 vsetvl: pre_global_vsetvl_info     : 135.98 (  4%)  31.71 ( 19%) 167.69 (  4%) 71950M ( 24%)

Phases 2 and 3 of the VSETVL pass consume 16% of the time and 99% of the memory. I will look into the VSETVL pass issue, but I am not able to take care of the loop invariant issue.
* [Bug tree-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 1:36 UTC

--- Comment #1 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Created attachment 57149
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57149&action=edit
spec2017 wrf
* [Bug tree-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 1:38 UTC

--- Comment #2 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
To build the attached file, the following module files from SPEC2017 are needed:

module_big_step_utilities_em.mod
module_cumulus_driver.mod
module_fddagd_driver.mod
module_model_constants.mod
module_shallowcu_driver.mod
module_comm_dm.mod
module_dm.mod
module_first_rk_step_part1.mod
module_pbl_driver.mod
module_state_description.mod
module_configure.mod
module_domain.mod
module_force_scm.mod
module_radiation_driver.mod
module_surface_driver.mod
module_convtrans_prep.mod
module_em.mod
module_fr_fire_driver_wrf.mod
module_scalar_tables.mod
module_utility.mod

But I could not attach them because they are too big.
* [Bug tree-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 1:48 UTC

--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Ok. Here is the reduced case:

# 1 "module_first_rk_step_part1.fppized.f90"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "module_first_rk_step_part1.fppized.f90"
!WRF:MEDIATION_LAYER:SOLVER

MODULE module_first_rk_step_part1

CONTAINS

  SUBROUTINE first_rk_step_part1 ( grid , config_flags &
        , moist , moist_tend &
        , chem , chem_tend &
        , tracer, tracer_tend &
        , scalar , scalar_tend &
        , fdda3d, fdda2d &
        , aerod &
        , ru_tendf, rv_tendf &
        , rw_tendf, t_tendf &
        , ph_tendf, mu_tendf &
        , tke_tend &
        , adapt_step_flag , curr_secs &
        , psim , psih , wspd , gz1oz0 , br , chklowq &
        , cu_act_flag , hol , th_phy &
        , pi_phy , p_phy , t_phy &
        , dz8w , p8w , t8w &
        , ids, ide, jds, jde, kds, kde &
        , ims, ime, jms, jme, kms, kme &
        , ips, ipe, jps, jpe, kps, kpe &
        , imsx,imex,jmsx,jmex,kmsx,kmex &
        , ipsx,ipex,jpsx,jpex,kpsx,kpex &
        , imsy,imey,jmsy,jmey,kmsy,kmey &
        , ipsy,ipey,jpsy,jpey,kpsy,kpey &
        , k_start , k_end &
        , f_flux &
        )

    USE module_state_description
    USE module_model_constants
    USE module_domain, ONLY : domain, domain_clock_get, get_ijk_from_subgrid
    USE module_configure, ONLY : grid_config_rec_type, model_config_rec
    USE module_radiation_driver, ONLY : pre_radiation_driver, radiation_driver
    USE module_surface_driver, ONLY : surface_driver
    USE module_cumulus_driver, ONLY : cumulus_driver
    USE module_shallowcu_driver, ONLY : shallowcu_driver
    USE module_pbl_driver, ONLY : pbl_driver
    USE module_fr_fire_driver_wrf, ONLY : fire_driver_em_step
    USE module_fddagd_driver, ONLY : fddagd_driver
    USE module_em, ONLY : init_zero_tendency
    USE module_force_scm
    USE module_convtrans_prep
    USE module_big_step_utilities_em, ONLY : phy_prep
    use module_scalar_tables
    USE module_dm, ONLY : local_communicator, mytask, ntasks, ntasks_x, ntasks_y, local_communicator_periodic, wrf_dm_maxval
    USE module_comm_dm, ONLY : halo_em_phys_a_sub,halo_em_fdda_sfc_sub,halo_pwp_sub,halo_em_chem_e_3_sub, &
        halo_em_chem_e_5_sub, halo_em_hydro_noahmp_sub
    USE module_utility

    IMPLICIT NONE

    TYPE ( domain ), INTENT(INOUT) :: grid
    TYPE ( grid_config_rec_type ), INTENT(IN) :: config_flags
    TYPE(WRFU_Time) :: currentTime

    INTEGER, INTENT(IN) :: ids, ide, jds, jde, kds, kde, &
        ims, ime, jms, jme, kms, kme, &
        ips, ipe, jps, jpe, kps, kpe, &
        imsx,imex,jmsx,jmex,kmsx,kmex, &
        ipsx,ipex,jpsx,jpex,kpsx,kpex, &
        imsy,imey,jmsy,jmey,kmsy,kmey, &
        ipsy,ipey,jpsy,jpey,kpsy,kpey

    LOGICAL ,INTENT(IN) :: adapt_step_flag
    REAL, INTENT(IN) :: curr_secs

    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_moist),INTENT(INOUT) :: moist
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_moist),INTENT(INOUT) :: moist_tend
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_chem),INTENT(INOUT) :: chem
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_chem),INTENT(INOUT) :: chem_tend
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_tracer),INTENT(INOUT) :: tracer
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_tracer),INTENT(INOUT) :: tracer_tend
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_scalar),INTENT(INOUT) :: scalar
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_scalar),INTENT(INOUT) :: scalar_tend
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_fdda3d),INTENT(INOUT) :: fdda3d
    REAL ,DIMENSION(ims:ime,1:1,jms:jme,num_fdda2d),INTENT(INOUT) :: fdda2d
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme,num_aerod),INTENT(INOUT) :: aerod

    REAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: psim
    REAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: psih
    REAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: wspd
    REAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: gz1oz0
    REAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: br
    REAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: chklowq
    LOGICAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: cu_act_flag
    REAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: hol

    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: th_phy
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: pi_phy
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: p_phy
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: t_phy
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: dz8w
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: p8w
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: t8w

    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: ru_tendf
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: rv_tendf
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: rw_tendf
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: ph_tendf
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: t_tendf
    REAL ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: tke_tend
    REAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: mu_tendf

    INTEGER, INTENT(IN) :: k_start, k_end
    LOGICAL, INTENT(IN), OPTIONAL :: f_flux

    ! Local
    real :: HYDRO_dt
    REAL, DIMENSION( ims:ime, jms:jme ) :: exch_temf ! 1/7/09 WA
    REAL, DIMENSION( ims:ime, jms:jme ) :: ht_loc, mixht
    INTEGER :: ij
    INTEGER num_roof_layers
    INTEGER num_wall_layers
    INTEGER num_road_layers
    INTEGER iswater
    LOGICAL :: l_flux
    INTEGER :: isurban
    INTEGER rk_step
    INTEGER :: yr, month, day, hr, minute, sec, rc
    CHARACTER(LEN=80) :: mesg

    INTEGER :: sids , side , sjds , sjde , skds , skde , &
        sims , sime , sjms , sjme , skms , skme , &
        sips , sipe , sjps , sjpe , skps , skpe
    CHARACTER (LEN=256) :: mminlu
    CHARACTER (LEN=1000) :: message

    CALL get_ijk_from_subgrid ( grid , &
        sids, side, sjds, sjde, skds, skde, &
        sims, sime, sjms, sjme, skms, skme, &
        sips, sipe, sjps, sjpe, skps, skpe )

    ! initialize all tendencies to zero in order to update physics
    ! tendencies first (separate from dry dynamics).
    l_flux=.FALSE.
    if (present(f_flux)) l_flux=f_flux

    rk_step = 1

    DO ij = 1 , grid%num_tiles
      CALL wrf_debug ( 200 , ' call init_zero_tendency' )
      CALL init_zero_tendency ( ru_tendf, rv_tendf, rw_tendf, &
          ph_tendf, t_tendf, tke_tend, &
          mu_tendf, &
          moist_tend,chem_tend,scalar_tend, &
          tracer_tend,num_tracer, &
          num_moist,num_chem,num_scalar, &
          rk_step, &
          ids, ide, jds, jde, kds, kde, &
          ims, ime, jms, jme, kms, kme, &
          grid%i_start(ij), grid%i_end(ij), &
          grid%j_start(ij), grid%j_end(ij), &
          k_start, k_end )
    END DO

!STARTOFREGISTRYGENERATEDINCLUDE 'inc/HALO_EM_PHYS_A.inc'
!
! WARNING This file is generated automatically by use_registry
! using the data base in the file named Registry.
! Do not edit. Your changes to this file will be lost.
!
CALL HALO_EM_PHYS_A_sub ( grid, &
  local_communicator, &
  mytask, ntasks, ntasks_x, ntasks_y, &
  ids, ide, jds, jde, kds, kde, &
  ims, ime, jms, jme, kms, kme, &
  ips, ipe, jps, jpe, kps, kpe )
!ENDOFREGISTRYGENERATEDINCLUDE

    DO ij = 1 , grid%num_tiles
      CALL wrf_debug ( 200 , ' call phy_prep' )
      CALL phy_prep ( config_flags, &
          grid%mut, grid%muu, grid%muv, grid%u_2, &
          grid%v_2, grid%p, grid%pb, grid%alt, &
          grid%ph_2, grid%phb, grid%t_2, grid%tsk, moist, num_moist, &
          grid%rho,th_phy, p_phy, pi_phy, grid%u_phy, grid%v_phy, &
          p8w, t_phy, t8w, grid%z, grid%z_at_w, dz8w, &
          grid%p_hyd, grid%p_hyd_w, grid%dnw, &
          grid%fnm, grid%fnp, grid%znw, grid%p_top, &
          grid%rthraten, &
          grid%rthblten, grid%rublten, grid%rvblten, &
          grid%rqvblten, grid%rqcblten, grid%rqiblten, &
          grid%rucuten, grid%rvcuten, grid%rthcuten, &
          grid%rqvcuten, grid%rqccuten, grid%rqrcuten, &
          grid%rqicuten, grid%rqscuten, &
          grid%rushten, grid%rvshten, grid%rthshten, &
          grid%rqvshten, grid%rqcshten, grid%rqrshten, &
          grid%rqishten, grid%rqsshten, grid%rqgshten, &
          grid%rthften, grid%rqvften, &
          grid%RUNDGDTEN, grid%RVNDGDTEN, grid%RTHNDGDTEN, &
          grid%RPHNDGDTEN,grid%RQVNDGDTEN, grid%RMUNDGDTEN,& !jdf
          grid%landmask,grid%xland, & !jdf
          ids, ide, jds, jde, kds, kde, &
          ims, ime, jms, jme, kms, kme, &
          grid%i_start(ij), grid%i_end(ij), &
          grid%j_start(ij), grid%j_end(ij), &
          k_start, k_end )
    ENDDO

    ! radiation

    CALL domain_clock_get( grid, current_time=currentTime, &
        current_timestr=mesg )
    CALL WRFU_TimeGet( currentTime, YY=yr, dayOfYear=day, H=hr, M=minute, S=sec, rc=rc)
    IF( rc/= WRFU_SUCCESS)THEN
      CALL wrf_error_fatal('WRFU_TimeGet failed')
    ENDIF

    ! this driver is only needed to handle non-local shadowing effects
    CALL pre_radiation_driver ( grid, config_flags &
        & ,itimestep=grid%itimestep, ra_call_offset=grid%ra_call_offset &
        & ,XLAT=grid%xlat, XLONG=grid%xlong, GMT=grid%gmt &
        & ,julian=grid%julian, xtime=grid%xtime, RADT=grid%radt &
        & ,STEPRA=grid%stepra &
        & ,ht=grid%ht,dx=grid%dx,dy=grid%dy,sina=grid%sina,cosa=grid%cosa &
        & ,shadowmask=grid%shadowmask,slope_rad=config_flags%slope_rad &
        & ,topo_shading=config_flags%topo_shading &
        & ,shadlen=config_flags%shadlen,ht_shad=grid%ht_shad,ht_loc=ht_loc &
        & ,ht_shad_bxs=grid%ht_shad_bxs, ht_shad_bxe=grid%ht_shad_bxe &
        & ,ht_shad_bys=grid%ht_shad_bys, ht_shad_bye=grid%ht_shad_bye &
        & ,nested=config_flags%nested, min_ptchsz=grid%min_ptchsz &
        & ,spec_bdy_width=config_flags%spec_bdy_width &
        ! indexes
        & ,IDS=ids,IDE=ide, JDS=jds,JDE=jde, KDS=kds,KDE=kde &
        & ,IMS=ims,IME=ime, JMS=jms,JME=jme, KMS=kms,KME=kme &
        & ,IPS=ips,IPE=ipe, JPS=jps,JPE=jpe, KPS=kps,KPE=kpe &
        & ,i_start=grid%i_start,i_end=min(grid%i_end, ide-1) &
        & ,j_start=grid%j_start,j_end=min(grid%j_end, jde-1) &
        & ,kts=k_start, kte=min(k_end,kde-1) &
        & ,num_tiles=grid%num_tiles )

    CALL wrf_debug ( 200 , ' call radiation_driver' )

    CALL radiation_driver( &
        & p_top=grid%p_top & !DJW 140312 added p_top for vertical nesting
        & ,ACFRCV=grid%acfrcv ,ACFRST=grid%acfrst ,ALBEDO=grid%albedo &
        & ,CFRACH=grid%cfrach ,CFRACL=grid%cfracl ,CFRACM=grid%cfracm &
        & ,CUPPT=grid%cuppt ,CZMEAN=grid%czmean ,DT=grid%dt &
        & ,DZ8W=dz8w ,EMISS=grid%emiss ,GLW=grid%glw &
        & ,GMT=grid%gmt ,GSW=grid%gsw ,HBOT=grid%hbot &
        & ,HTOP=grid%htop ,HBOTR=grid%hbotr ,HTOPR=grid%htopr &
        & ,ICLOUD=config_flags%icloud &
        & ,ITIMESTEP=grid%itimestep,JULDAY=grid%julday , JULIAN=grid%julian &
        & ,JULYR=grid%julyr ,LW_PHYSICS=config_flags%ra_lw_physics &
        & ,NCFRCV=grid%ncfrcv ,NCFRST=grid%ncfrst ,NPHS=1 &
        & ,o3input=config_flags%o3input ,O3rad=grid%o3rad &
        & ,aer_opt=config_flags%aer_opt ,aerod=aerod(:,:,:,P_ocarbon:P_upperaer) &
        & ,swint_opt=config_flags%swint_opt &
        & ,P8W=grid%p_hyd_w ,P=grid%p_hyd ,PI=pi_phy &
        & ,RADT=grid%radt ,RA_CALL_OFFSET=grid%ra_call_offset &
        & ,RHO=grid%rho ,RLWTOA=grid%rlwtoa &
        & ,RSWTOA=grid%rswtoa ,RTHRATEN=grid%rthraten &
        & ,RTHRATENLW=grid%rthratenlw ,RTHRATENSW=grid%rthratensw &
        & ,SNOW=grid%snow ,STEPRA=grid%stepra ,SWDOWN=grid%swdown &
        & ,SWDOWNC=grid%swdownc ,SW_PHYSICS=config_flags%ra_sw_physics &
        & ,T8W=t8w ,T=grid%t_phy ,TAUCLDC=grid%taucldc &
        & ,TAUCLDI=grid%taucldi ,TSK=grid%tsk ,VEGFRA=grid%vegfra &
        & ,WARM_RAIN=grid%warm_rain ,XICE=grid%xice ,XLAND=grid%xland &
        & ,XLAT=grid%xlat ,XLONG=grid%xlong ,YR=yr &
        ! SSiB LSM radiation components (fds 06/2010)
        & ,ALSWVISDIR=grid%alswvisdir ,ALSWVISDIF=grid%alswvisdif & !ssib
        & ,ALSWNIRDIR=grid%alswnirdir ,ALSWNIRDIF=grid%alswnirdif & !ssib
        & ,SWVISDIR=grid%swvisdir ,SWVISDIF=grid%swvisdif & !ssib
        & ,SWNIRDIR=grid%swnirdir ,SWNIRDIF=grid%swnirdif & !ssib
        & ,SF_SURFACE_PHYSICS=config_flags%sf_surface_physics & !ssib
        ! WRF-solar and aerosol variables from jararias 2013/8 and 2013/11
        & ,SWDDIR=grid%swddir,SWDDNI=grid%swddni,SWDDIF=grid%swddif &
        & ,Gx=grid%Gx,Bx=grid%Bx,gg=grid%gg,bb=grid%bb &
        & ,swdown_ref=grid%swdown_ref,swddir_ref=grid%swddir_ref &
        & ,coszen_ref=grid%coszen_ref &
        & ,aer_type=config_flags%aer_type &
        & ,aer_aod550_opt=config_flags%aer_aod550_opt,aer_aod550_val=config_flags%aer_aod550_val &
        & ,aer_angexp_opt=config_flags%aer_angexp_opt,aer_angexp_val=config_flags%aer_angexp_val &
        & ,aer_ssa_opt=config_flags%aer_ssa_opt,aer_ssa_val=config_flags%aer_ssa_val &
        & ,aer_asy_opt=config_flags%aer_asy_opt,aer_asy_val=config_flags%aer_asy_val &
        & ,aod5502d=grid%aod5502d,angexp2d=grid%angexp2d,aerssa2d=grid%aerssa2d &
        & ,aerasy2d=grid%aerasy2d,aod5503d=grid%aod5503d &
        !Optional solar variables
        & ,DECLINX=grid%declin ,SOLCONX=grid%solcon ,COSZEN=grid%coszen ,HRANG=grid%hrang &
        & , CEN_LAT=grid%cen_lat &
        & ,Z=grid%z &
        & ,ALEVSIZ=grid%alevsiz, no_src_types=grid%no_src_types &
        & ,LEVSIZ=grid%levsiz, N_OZMIXM=num_ozmixm &
        & ,N_AEROSOLC=num_aerosolc &
        & ,PAERLEV=grid%paerlev ,ID=grid%id &
        & ,CAM_ABS_DIM1=grid%cam_abs_dim1, CAM_ABS_DIM2=grid%cam_abs_dim2 &
        & ,CAM_ABS_FREQ_S=grid%cam_abs_freq_s &
        & ,XTIME=grid%xtime &
        ,CURR_SECS=curr_secs, ADAPT_STEP_FLAG=adapt_step_flag &
        ! indexes
        & ,IDS=ids,IDE=ide, JDS=jds,JDE=jde, KDS=kds,KDE=kde &
        & ,IMS=ims,IME=ime, JMS=jms,JME=jme, KMS=kms,KME=kme &
        & ,i_start=grid%i_start,i_end=min(grid%i_end, ide-1) &
        & ,j_start=grid%j_start,j_end=min(grid%j_end, jde-1) &
        & ,kts=k_start, kte=min(k_end,kde-1) &
        & ,num_tiles=grid%num_tiles &
        ! Optional
        !JJS 20101020 vvvvv
        & , TLWDN=grid%tlwdn, TLWUP=grid%tlwup & ! goddard schemes
        & , SLWDN=grid%slwdn, SLWUP=grid%slwup & ! goddard schemes
        & , TSWDN=grid%tswdn, TSWUP=grid%tswup & ! goddard schemes
        & , SSWDN=grid%sswdn, SSWUP=grid%sswup & ! goddard schemes
        !JJS 20101020 ^^^^^
        & , CLDFRA=grid%cldfra, CLDFRA_MP_ALL=grid%cldfra_mp_all &
        & , LRADIUS=grid%LRADIUS,IRADIUS=grid%IRADIUS & !BSINGH(01/22/2014)
        & , CLDFRA_DP=grid%cldfra_dp & ! ckay for subgrid cloud
        & , CLDFRA_SH=grid%cldfra_sh &
        & , re_cloud=grid%re_cloud, re_ice=grid%re_ice, re_snow=grid%re_snow & ! G. Thompson
        & , has_reqc=grid%has_reqc, has_reqi=grid%has_reqi, has_reqs=grid%has_reqs & ! G. Thompson
        & , PB=grid%pb &
        & , F_ICE_PHY=grid%f_ice_phy,F_RAIN_PHY=grid%f_rain_phy &
        & , QV=moist(ims,kms,jms,P_QV), F_QV=F_QV &
        & , QC=moist(ims,kms,jms,P_QC), F_QC=F_QC &
        & , QR=moist(ims,kms,jms,P_QR), F_QR=F_QR &
        & , QI=moist(ims,kms,jms,P_QI), F_QI=F_QI &
        & , QS=moist(ims,kms,jms,P_QS), F_QS=F_QS &
        & , QG=moist(ims,kms,jms,P_QG), F_QG=F_QG &
        & , QNDROP=scalar(ims,kms,jms,P_QNDROP), F_QNDROP=F_QNDROP &
        & ,ACSWUPT=grid%acswupt ,ACSWUPTC=grid%acswuptc &
        & ,ACSWDNT=grid%acswdnt ,ACSWDNTC=grid%acswdntc &
        & ,ACSWUPB=grid%acswupb ,ACSWUPBC=grid%acswupbc &
        & ,ACSWDNB=grid%acswdnb ,ACSWDNBC=grid%acswdnbc &
        & ,ACLWUPT=grid%aclwupt ,ACLWUPTC=grid%aclwuptc &
        & ,ACLWDNT=grid%aclwdnt ,ACLWDNTC=grid%aclwdntc &
        & ,ACLWUPB=grid%aclwupb ,ACLWUPBC=grid%aclwupbc &
        & ,ACLWDNB=grid%aclwdnb ,ACLWDNBC=grid%aclwdnbc &
        & ,SWUPT=grid%swupt ,SWUPTC=grid%swuptc &
        & ,SWDNT=grid%swdnt ,SWDNTC=grid%swdntc &
        & ,SWUPB=grid%swupb ,SWUPBC=grid%swupbc &
        & ,SWDNB=grid%swdnb ,SWDNBC=grid%swdnbc &
        & ,LWUPT=grid%lwupt ,LWUPTC=grid%lwuptc &
        & ,LWDNT=grid%lwdnt ,LWDNTC=grid%lwdntc &
        & ,LWUPB=grid%lwupb ,LWUPBC=grid%lwupbc &
        & ,LWDNB=grid%lwdnb ,LWDNBC=grid%lwdnbc &
        & ,LWCF=grid%lwcf &
        & ,SWCF=grid%swcf &
        & ,OLR=grid%olr &
        & ,AERODM=grid%aerodm, PINA=grid%pina, AODTOT=grid%aodtot &
        & ,OZMIXM=grid%ozmixm, PIN=grid%pin &
        & ,M_PS_1=grid%m_ps_1, M_PS_2=grid%m_ps_2, AEROSOLC_1=grid%aerosolc_1 &
        & ,AEROSOLC_2=grid%aerosolc_2, M_HYBI0=grid%m_hybi &
        & ,ABSTOT=grid%abstot, ABSNXT=grid%absnxt, EMSTOT=grid%emstot &
        & ,RADTACTTIME=grid%radtacttime &
        & ,ICLOUD_CU=config_flags%ICLOUD_CU &
        & ,QC_CU=grid%QC_CU , QI_CU=grid%QI_CU &
        & ,slope_rad=config_flags%slope_rad,topo_shading=config_flags%topo_shading &
        & ,shadowmask=grid%shadowmask,ht=grid%ht,dx=grid%dx,dy=grid%dy &
        & ,IS_CAMMGMP_USED = grid%is_CAMMGMP_used )

    !********* Surface driver
    ! surface

    !gmm halo of wtd and riverflow for leafhydro
    IF ( config_flags%sf_surface_physics.eq.NOAHMPSCHEME ) THEN
      IF ( config_flags%opt_run.eq.5.and.mod(grid%itimestep,grid%STEPWTD).eq.0 ) THEN
!STARTOFREGISTRYGENERATEDINCLUDE 'inc/HALO_EM_HYDRO_NOAHMP.inc'
!
! WARNING This file is generated automatically by use_registry
! using the data base in the file named Registry.
! Do not edit. Your changes to this file will be lost.
!
CALL HALO_EM_HYDRO_NOAHMP_sub ( grid, &
  local_communicator, &
  mytask, ntasks, ntasks_x, ntasks_y, &
  ids, ide, jds, jde, kds, kde, &
  ims, ime, jms, jme, kms, kme, &
  ips, ipe, jps, jpe, kps, kpe )
!ENDOFREGISTRYGENERATEDINCLUDE
      ENDIF
    ENDIF

  END SUBROUTINE first_rk_step_part1

END MODULE module_first_rk_step_part1

This reduced case makes the memory hog easy to debug, since it does not take long to compile:

real    0m22.924s
user    0m22.242s
sys     0m0.640s

The memory hog is still visible in the report:

 machine dep reorg                  :   2.05 (  9%)   0.33 ( 56%)   2.40 ( 10%)   939M ( 80%)
* [Bug tree-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 1:52 UTC

--- Comment #4 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Also, compiling the original file with -fno-move-loop-invariants reduces the compile time from 60 minutes to 7 minutes:

real    7m12.528s
user    6m55.214s
sys     0m17.147s

 machine dep reorg                  :  75.93 ( 18%)  14.23 ( 88%)  90.15 ( 21%) 33383M ( 95%)

The memory report is quite clear: this pass consumes 95% of the memory. So I believe the VSETVL pass is not the main cause of the compile-time hog; the loop invariant pass is. The VSETVL pass is, however, the main cause of the memory hog.

I am not familiar with the loop invariant pass. Can anyone help debug the compile-time hog in the loop invariant pass? Or should we disable the loop invariant pass by default for RISC-V?
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: pinskia at gcc dot gnu.org @ 2024-01-19 1:55 UTC

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |rtl-optimization

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note "loop invariant motion" is the RTL based loop invariant motion pass.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 1:56 UTC

--- Comment #6 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Andrew Pinski from comment #5)
> Note "loop invariant motion" is the RTL based loop invariant motion pass.

So you mean it is still a RISC-V issue, right?
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: patrick at rivosinc dot com @ 2024-01-19 3:08 UTC

Patrick O'Neill <patrick at rivosinc dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |patrick at rivosinc dot com

--- Comment #7 from Patrick O'Neill <patrick at rivosinc dot com> ---
I believe the memory hog is caused by this:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/riscv/riscv-vsetvl.cc;h=2067073185f8c0f398908b164a99b592948e6d2d;hb=565935f93a7da629da89b05812a3e8c43287598f#l2427

In the slightly reduced test program I was using to debug, there were ~35k basic blocks, leading to num_expr being roughly 1 million. vsetvl then makes 35k bitmaps of ~1 million bits each.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: pinskia at gcc dot gnu.org @ 2024-01-19 3:12 UTC

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Patrick O'Neill from comment #7)
> I believe the memory hog is caused by this:
> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/riscv/riscv-vsetvl.cc;
> h=2067073185f8c0f398908b164a99b592948e6d2d;
> hb=565935f93a7da629da89b05812a3e8c43287598f#l2427
> 
> In the slightly reduced test program I was using to debug there were ~35k
> bb's leading to num_expr being roughly 1 million. vsetvl then makes 35k
> bitmaps of ~1 million bits.

How sparse will this bitmap be? bitmap instead of sbitmap should be used if the bitmap is going to be sparse. sbitmap has a fixed size determined by the number of bits, while bitmap is better for sparse bitmaps because it is implemented as a linked list.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: pinskia at gcc dot gnu.org @ 2024-01-19 3:14 UTC

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #8)
> How sparse is this bitmap will be? bitmap instead of sbitmap should be used
> if the bitmap is going to be sparse. sbitmap is a fixed sized based on the
> bitmap size while bitmap is better for sparse bitmaps as it is implemented
> as linked list.

Also, it seems like DF already has def_in/def_out info; how much of this information duplicates what is already there?
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 3:33 UTC

--- Comment #10 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
No, it is not caused by this. I removed the whole compute_avl_def_data function and the memory usage did not change.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 3:34 UTC

--- Comment #11 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
It should be compute_lcm_local_properties: memory usage drops by 50% after I remove this function. I am still investigating.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 3:44 UTC

--- Comment #12 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Ok. Here is a simple fix which gives some hints:

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 2067073185f..ede818140dc 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2719,10 +2719,11 @@ pre_vsetvl::compute_lcm_local_properties ()
   for (int i = 0; i < num_exprs; i += 1)
     {
       const vsetvl_info &info = *m_exprs[i];
-      if (!info.has_nonvlmax_reg_avl () && !info.has_vl ())
+      bool has_nonvlmax_reg_avl_p = info.has_nonvlmax_reg_avl ();
+      if (!has_nonvlmax_reg_avl_p && !info.has_vl ())
         continue;
 
-      if (info.has_nonvlmax_reg_avl ())
+      if (has_nonvlmax_reg_avl_p)
         {
           unsigned int regno;
           sbitmap_iterator sbi;
@@ -3556,7 +3557,7 @@ const pass_data pass_data_vsetvl = {
   RTL_PASS,      /* type */
   "vsetvl",      /* name */
   OPTGROUP_NONE, /* optinfo_flags */
-  TV_NONE,       /* tv_id */
+  TV_MACH_DEP,   /* tv_id */
   0,             /* properties_required */
   0,             /* properties_provided */
   0,             /* properties_destroyed */

Memory usage drops from 931M to 781M, a significant reduction.

Note that I did not change every call to has_nonvlmax_reg_avl; there are many places that call it...
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 3:46 UTC

--- Comment #13 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
So I think we should investigate why calling has_nonvlmax_reg_avl costs so much memory.
* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 3:52 UTC

--- Comment #14 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Oh. I know the reason now.

The issue is not the RISC-V backend VSETVL pass.

I think it is a memory bug in rtx_equal_p.

We are calling rtx_equal_p, which is very costly. For example, has_nonvlmax_reg_avl calls rtx_equal_p.

So I kept all the code unchanged and only replaced the comparison as follows:

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 93a1238a5ab..1c85c8ee3c6 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4988,7 +4988,7 @@ nonvlmax_avl_type_p (rtx_insn *rinsn)
 bool
 vlmax_avl_p (rtx x)
 {
-  return x && rtx_equal_p (x, RVV_VLMAX);
+  return x && REG_P (x) && REGNO (x) == X0_REGNUM/*rtx_equal_p (x, RVV_VLMAX)*/;
 }

Using REGNO (x) == X0_REGNUM instead of rtx_equal_p, the memory hog is gone: 939M -> 725k.

So I am going to send a patch to work around the rtx_equal_p issue that causes the memory hog.
From: pinskia at gcc dot gnu.org @ 2024-01-19 3:56 UTC

--- Comment #15 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #14)
> Oh, I know the reason now.
>
> The issue is not the RISC-V backend VSETVL PASS.
>
> It's a memory bug of rtx_equal_p, I think.

It is not rtx_equal_p but rather RVV_VLMAX, which is defined as:
riscv-protos.h:#define RVV_VLMAX gen_rtx_REG (Pmode, X0_REGNUM)

Seems like you could cache that somewhere ...
From: juzhe.zhong at rivai dot ai @ 2024-01-19 3:58 UTC

--- Comment #16 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Andrew Pinski from comment #15)
> (In reply to JuzheZhong from comment #14)
> > Oh, I know the reason now.
> >
> > The issue is not the RISC-V backend VSETVL PASS.
> >
> > It's a memory bug of rtx_equal_p, I think.
>
>
> It is not rtx_equal_p but rather RVV_VLMAX, which is defined as:
> riscv-protos.h:#define RVV_VLMAX gen_rtx_REG (Pmode, X0_REGNUM)
>
> Seems like you could cache that somewhere ...

Oh. Makes sense to me. Thank you so much.

I think the memory-hog issue will be fixed soon.

But the compile-time hog in loop invariant motion is still not fixed.
From: juzhe.zhong at rivai dot ai @ 2024-01-19 4:00 UTC

--- Comment #17 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Ok. Confirmed on the original test: 33383M -> 4796k now.
From: juzhe.zhong at rivai dot ai @ 2024-01-19 8:23 UTC

--- Comment #18 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Hi, Robin.

I have a patch fixing the memory hog:
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643418.html

I will commit it after testing.

But the compile-time hog in the loop invariant motion PASS still exists.

With -fno-move-loop-invariants, compilation becomes much faster.

Could you take a look at it?
From: juzhe.zhong at rivai dot ai @ 2024-01-19 8:41 UTC

--- Comment #19 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to JuzheZhong from comment #18)
> Hi, Robin.
>
> I have a patch fixing the memory hog:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643418.html
>
> I will commit it after testing.
>
> But the compile-time hog in the loop invariant motion PASS still exists.
>
> With -fno-move-loop-invariants, compilation becomes much faster.
>
> Could you take a look at it?

Note that with the default -march=rv64gcv_zvl256b -O3:

real 63m18.771s
user 60m19.036s
sys 2m59.787s

But with -march=rv64gcv_zvl256b -O3 -fno-move-loop-invariants:

real 6m52.984s
user 6m42.473s
sys 0m10.375s

Without loop invariant motion it is roughly 9 times faster (about 3799 s vs. 413 s wall-clock).
From: rguenth at gcc dot gnu.org @ 2024-01-19 9:23 UTC

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111241,
                   |                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
IIRC there's a duplicate for this. It's df_analyze_loop calling df_reorganize_refs_*, which does O(function-size) work for each loop. With -O3 and vectorization the number of loops tends to blow up, making the issue worse.
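A toy cost model makes the scaling in comment #20 concrete. It assumes, as an illustration rather than a measurement, that each per-loop dataflow re-analysis visits every insn in the function. Total work is then (number of loops) × (function size). When vectorization grows both the loop count and the function, that product grows roughly quadratically. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy cost model (an assumption): each per-loop dataflow re-analysis
   visits all NUM_INSNS insns of the whole function.  */
static uint64_t toy_per_loop_df_work (uint64_t num_loops, uint64_t num_insns)
{
  assert (num_insns > 0);
  uint64_t work = 0;
  for (uint64_t loop = 0; loop < num_loops; loop++)
    work += num_insns;  /* one O(function-size) rescan per loop */
  return work;
}

/* The same function analyzed once, e.g. if DF were kept up to date.  */
static uint64_t toy_whole_function_df_work (uint64_t num_insns)
{
  assert (num_insns > 0);
  return num_insns;
}
```

With made-up numbers of 1000 loops in a 50000-insn function, the per-loop scheme does 1000 times the work of a single whole-function pass. Doubling both the loop count and the function size quadruples it.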
From: rguenth at gcc dot gnu.org @ 2024-01-19 9:24 UTC

--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
I once tried to avoid df_reorganize_refs and/or optimize this with the blocks involved but failed.
From: juzhe.zhong at rivai dot ai @ 2024-01-19 9:28 UTC

--- Comment #22 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Richard Biener from comment #21)
> I once tried to avoid df_reorganize_refs and/or optimize this with the
> blocks involved but failed.

I am considering whether we should disable LICM for RISC-V by default if vector is enabled?
The ~10x compile-time explosion is really horrible.
From: kito at gcc dot gnu.org @ 2024-01-19 9:35 UTC

--- Comment #23 from Kito Cheng <kito at gcc dot gnu.org> ---
> I am considering whether we should disable LICM for RISC-V by default if vector is enabled?

That will cause regressions for other programs, and may also hurt programs that are not vectorized but benefit from LICM.
From: cvs-commit at gcc dot gnu.org @ 2024-01-19 10:03 UTC

--- Comment #24 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Robin Dapp <rdapp@gcc.gnu.org>:

https://gcc.gnu.org/g:01260a823073675e13dd1fc85cf2657a5396adf2

commit r14-8282-g01260a823073675e13dd1fc85cf2657a5396adf2
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Fri Jan 19 16:34:25 2024 +0800

    RISC-V: Fix RVV_VLMAX

    This patch fixes the memory hog found in the SPEC2017 wrf benchmark, which is
    caused by RVV_VLMAX: RVV_VLMAX generates a brand-new rtx via
    gen_rtx_REG (Pmode, X0_REGNUM) every time we use it, that is, we are always
    generating garbage, redundant (reg:DI 0 zero) rtxes.

    After this patch, the memory hog is gone.

    Time variable             usr           sys          wall           GGC
    machine dep reorg :   1.99 (  9%)   0.35 ( 56%)   2.33 ( 10%)   939M ( 80%) [Before this patch]
    machine dep reorg :   1.71 (  6%)   0.16 ( 27%)   3.77 (  6%)   659k (  0%) [After this patch]

    Time variable             usr           sys          wall           GGC
    machine dep reorg :  75.93 ( 18%)  14.23 ( 88%)  90.15 ( 21%) 33383M ( 95%) [Before this patch]
    machine dep reorg :  56.00 ( 14%)   7.92 ( 77%)  63.93 ( 15%)  4361k (  0%) [After this patch]

    Test is running. Ok for trunk if I passed the test with no regression?

            PR target/113495

    gcc/ChangeLog:

            * config/riscv/riscv-protos.h (RVV_VLMAX): Change to regno_reg_rtx[X0_REGNUM].
            (RVV_VUNDEF): Ditto.
            * config/riscv/riscv-vsetvl.cc: Add timevar.
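The commit replaces the allocating RVV_VLMAX with regno_reg_rtx[X0_REGNUM], a lookup into a table of register rtxes built once. The following C sketch shows that caching idea in isolation. The table, macros and helpers (toy_regno_reg, TOY_VLMAX_OLD, TOY_VLMAX_NEW) are hypothetical analogues, not GCC code.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy register object; NOT GCC's rtx.  */
struct toy_reg { unsigned regno; };

#define TOY_NUM_REGS 32u
#define TOY_X0_REGNUM 0u

static unsigned long toy_allocs;

static struct toy_reg *toy_new_reg (unsigned regno)
{
  struct toy_reg *r = malloc (sizeof *r);
  assert (r != NULL);
  r->regno = regno;
  toy_allocs++;
  return r;
}

/* Hypothetical analogue of regno_reg_rtx: one shared object per
   register, built once up front.  */
static struct toy_reg *toy_regno_reg[TOY_NUM_REGS];

static void toy_init_regs (void)
{
  for (unsigned i = 0; i < TOY_NUM_REGS; i++)
    toy_regno_reg[i] = toy_new_reg (i);
}

/* Before: every expansion of the macro allocates a new object.  */
#define TOY_VLMAX_OLD toy_new_reg (TOY_X0_REGNUM)
/* After: every expansion yields the same cached object.  */
#define TOY_VLMAX_NEW (toy_regno_reg[TOY_X0_REGNUM])
```

The cached form costs a fixed, one-time allocation per register, however many times the macro is used. A side benefit is that two uses return the same pointer.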
From: juzhe.zhong at rivai dot ai @ 2024-01-19 10:05 UTC

--- Comment #25 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
The RISC-V backend memory-hog issue is fixed.

But the compile-time hog in LICM is still there, so keep this PR open.
From: rguenther at suse dot de @ 2024-01-19 10:22 UTC

--- Comment #26 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 19 Jan 2024, juzhe.zhong at rivai dot ai wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
>
> --- Comment #22 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> (In reply to Richard Biener from comment #21)
> > I once tried to avoid df_reorganize_refs and/or optimize this with the
> > blocks involved but failed.
>
> I am considering whether we should disable LICM for RISC-V by default if vector
> is enabled?
> The ~10x compile-time explosion is really horrible.

I think that's a bad idea. It only explodes for some degenerate cases.
The best would be to fix invariant motion to keep DF up-to-date so
it can stop using df_analyze_loop and instead analyze the whole function.
Or maybe change it to use the rtl-ssa framework instead.

There's already param_loop_invariant_max_bbs_in_loop:

      /* Process the loops, innermost first.  */
      for (auto loop : loops_list (cfun, LI_FROM_INNERMOST))
        {
          curr_loop = loop;
          /* move_single_loop_invariants for very large loops is time consuming
             and might need a lot of memory.  For -O1 only do loop invariant
             motion for very small loops.  */
          unsigned max_bbs = param_loop_invariant_max_bbs_in_loop;
          if (optimize < 2)
            max_bbs /= 10;
          if (loop->num_nodes <= max_bbs)
            move_single_loop_invariants (loop);
        }

It might be possible to restrict invariant motion to innermost loops
when the overall number of loops is too large (with a new param
for that). And when the number of innermost loops also exceeds
the limit, avoid even that? The above also misses an
optimize_loop_for_speed_p (loop) check (probably doesn't make
a difference, but you could try).
From: rdapp at gcc dot gnu.org @ 2024-01-22 11:42 UTC

--- Comment #27 from Robin Dapp <rdapp at gcc dot gnu.org> ---
Following up on this:

I'm seeing the same thing Patrick does. We create a lot of large non-sparse sbitmaps that amount to around 33G in total.

I did local experiments replacing all sbitmaps that are not needed for LCM with regular bitmaps. Apart from output differences vs. the original version, the testsuite is unchanged.

As expected, wrf now takes longer to compile, 8 mins vs. 4ish mins before, and we still use 2.7G of RAM for this single file (likely because of the remaining sbitmaps) compared to a max of 1.2ish G that the rest of the compilation uses.

One possibility to get the best of both worlds would be to threshold based on num_bbs * num_exprs. Once we exceed it, switch to the bitmap pass; otherwise keep sbitmaps for performance.

Messaging with Juzhe offline, his best guess for the LICM time is that he enabled checking for dataflow, which slows down this particular compilation by a lot. Therefore it doesn't look like a generic problem.
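Robin's "best of both worlds" idea amounts to a size-based switch between representations. The following is a minimal sketch, assuming the decision is made once per problem from num_bbs * num_exprs. The enum, the function name and the threshold are hypothetical, not an existing GCC param. The product is compared via a division so that it cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

enum toy_bitmap_kind { TOY_DENSE_SBITMAP, TOY_SPARSE_BITMAP };

/* Hypothetical selector: the dense layout needs num_bbs * num_exprs bits;
   above THRESHOLD_BITS, fall back to a sparse bitmap.  Testing
   num_bbs > threshold_bits / num_exprs is equivalent to
   num_bbs * num_exprs > threshold_bits for unsigned integers,
   but cannot overflow.  */
static enum toy_bitmap_kind
toy_choose_bitmap_kind (uint64_t num_bbs, uint64_t num_exprs,
                        uint64_t threshold_bits)
{
  assert (threshold_bits > 0);
  if (num_exprs != 0 && num_bbs > threshold_bits / num_exprs)
    return TOY_SPARSE_BITMAP;
  return TOY_DENSE_SBITMAP;
}
```

Small functions keep the faster dense sbitmaps. Only pathological sizes pay the slower sparse-bitmap operations, which matches the trade-off measured above: slower, but much less memory.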
From: juzhe.zhong at rivai dot ai @ 2024-01-22 11:51 UTC

--- Comment #28 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Robin Dapp from comment #27)
> Following up on this:
>
> I'm seeing the same thing Patrick does. We create a lot of large non-sparse
> sbitmaps that amount to around 33G in total.
>
> I did local experiments replacing all sbitmaps that are not needed for LCM
> with regular bitmaps. Apart from output differences vs. the original version,
> the testsuite is unchanged.
>
> As expected, wrf now takes longer to compile, 8 mins vs. 4ish mins before,
> and we still use 2.7G of RAM for this single file (likely because of the
> remaining sbitmaps) compared to a max of 1.2ish G that the rest of the
> compilation uses.
>
> One possibility to get the best of both worlds would be to threshold based
> on num_bbs * num_exprs. Once we exceed it, switch to the bitmap pass;
> otherwise keep sbitmaps for performance.
>
> Messaging with Juzhe offline, his best guess for the LICM time is that he
> enabled checking for dataflow, which slows down this particular compilation
> by a lot. Therefore it doesn't look like a generic problem.

Thanks. I don't think replacing sbitmap is the best solution.
Let me first disable DF checking and reproduce the 33G memory consumption on my local machine.

I think the best way to reduce the memory consumption is to optimize the VSETVL PASS algorithm and code.

I have an idea for the optimization. I am gonna work on it. Thanks for reporting.
From: rguenth at gcc dot gnu.org @ 2024-01-22 12:00 UTC

--- Comment #29 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #26)
> On Fri, 19 Jan 2024, juzhe.zhong at rivai dot ai wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
> >
> > --- Comment #22 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > (In reply to Richard Biener from comment #21)
> > > I once tried to avoid df_reorganize_refs and/or optimize this with the
> > > blocks involved but failed.
> >
> > I am considering whether we should disable LICM for RISC-V by default if vector
> > is enabled?
> > The ~10x compile-time explosion is really horrible.
>
> I think that's a bad idea. It only explodes for some degenerate cases.
> The best would be to fix invariant motion to keep DF up-to-date so
> it can stop using df_analyze_loop and instead analyze the whole function.
> Or maybe change it to use the rtl-ssa framework instead.
>
> There's already param_loop_invariant_max_bbs_in_loop:
>
>       /* Process the loops, innermost first.  */
>       for (auto loop : loops_list (cfun, LI_FROM_INNERMOST))
>         {
>           curr_loop = loop;
>           /* move_single_loop_invariants for very large loops is time consuming
>              and might need a lot of memory.  For -O1 only do loop invariant
>              motion for very small loops.  */
>           unsigned max_bbs = param_loop_invariant_max_bbs_in_loop;
>           if (optimize < 2)
>             max_bbs /= 10;
>           if (loop->num_nodes <= max_bbs)
>             move_single_loop_invariants (loop);
>         }
>
> It might be possible to restrict invariant motion to innermost loops
> when the overall number of loops is too large (with a new param
> for that). And when the number of innermost loops also exceeds
> the limit, avoid even that? The above also misses an
> optimize_loop_for_speed_p (loop) check (probably doesn't make
> a difference, but you could try).

Ah, sorry, I was mis-matching LICM to invariant motion above; still, invariant motion is the biggest offender (might be due to DF checking if you enabled that).

As for sbitmap vs. bitmap, it's a difficult call. When there are big profile hits on individual bit operations (bitmap_bit_p, bitmap_set_bit), it might pay off to use bitmap but with tree view. There's also sparseset, but that requires even more memory.
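The remark that sparseset "requires even more memory" follows from how sparse sets work. The following is a minimal sketch of the classic two-array sparse set (Briggs and Torczon), written as a toy and not as GCC's sparseset.h API. Membership, insertion, removal and clearing are all O(1). The cost is two integer arrays the size of the universe, versus one bit per element for a bit vector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy sparse set over the universe [0, size); NOT GCC's sparseset.h.  */
struct toy_sparseset
{
  unsigned size;     /* universe size */
  unsigned members;  /* current number of elements */
  unsigned *dense;   /* dense[0..members) holds the elements */
  unsigned *sparse;  /* sparse[e] is e's index in dense, if present */
};

static struct toy_sparseset *toy_sparseset_alloc (unsigned size)
{
  struct toy_sparseset *s = malloc (sizeof *s);
  assert (s != NULL);
  s->size = size;
  s->members = 0;
  s->dense = malloc (size * sizeof *s->dense);
  /* The classic trick leaves SPARSE uninitialized; calloc keeps this
     sketch free of reads from uninitialized memory.  */
  s->sparse = calloc (size, sizeof *s->sparse);
  assert (s->dense != NULL && s->sparse != NULL);
  return s;
}

static bool toy_sparseset_bit_p (const struct toy_sparseset *s, unsigned e)
{
  return e < s->size && s->sparse[e] < s->members
         && s->dense[s->sparse[e]] == e;
}

static void toy_sparseset_set_bit (struct toy_sparseset *s, unsigned e)
{
  assert (e < s->size);
  if (!toy_sparseset_bit_p (s, e))
    {
      s->dense[s->members] = e;
      s->sparse[e] = s->members++;
    }
}

/* Move the last element into the removed element's slot.  */
static void toy_sparseset_clear_bit (struct toy_sparseset *s, unsigned e)
{
  if (toy_sparseset_bit_p (s, e))
    {
      unsigned idx = s->sparse[e];
      unsigned last = s->dense[--s->members];
      s->dense[idx] = last;
      s->sparse[last] = idx;
    }
}

/* O(1): stale entries are ignored by the membership test.  */
static void toy_sparseset_clear (struct toy_sparseset *s)
{
  s->members = 0;
}

static void toy_sparseset_free (struct toy_sparseset *s)
{
  free (s->dense);
  free (s->sparse);
  free (s);
}

/* Footprints over the same universe.  */
static size_t toy_sparseset_bytes (unsigned size)
{
  return 2u * (size_t) size * sizeof (unsigned);
}

static size_t toy_bitvector_bytes (unsigned size)
{
  return ((size_t) size + 7) / 8;
}
```

With 4-byte unsigned, a 1024-element universe needs 8192 bytes as a sparse set but only 128 bytes as a bit vector. That 64x ratio is why sparseset does not help a memory-bound pass like this one.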
From: juzhe.zhong at rivai dot ai @ 2024-01-22 13:21 UTC

--- Comment #30 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Ok. I believe m_avl_def_in && m_avl_def_out can be removed with a better algorithm. Then the memory hog should be fixed soon. I am gonna rewrite avl_vl_unmodified_between_p and trigger full-coverage testing, since it's going to be a big change there.
From: juzhe.zhong at rivai dot ai @ 2024-01-22 15:04 UTC

--- Comment #31 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
machine dep reorg  : 403.69 ( 56%)  23.48 ( 93%) 427.17 ( 57%)  5290k (  0%)

Confirmed: with RTL DF checking removed, LICM is no longer a compile-time hog. The VSETVL PASS accounts for 56% of the compile time.

Even though I can't see the memory hog in the GGC column of -ftime-report, I can see 33G of memory usage in htop.

Confirmed: both the compile-time hog and the memory hog are VSETVL PASS issues.

I will work on optimizing the compile time as well as the memory usage of the VSETVL PASS.
From: cvs-commit at gcc dot gnu.org @ 2024-01-24 0:30 UTC

--- Comment #32 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:3132d2d36b4705bb762e61b1c8ca4da7c78a8321

commit r14-8378-g3132d2d36b4705bb762e61b1c8ca4da7c78a8321
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Tue Jan 23 18:12:49 2024 +0800

    RISC-V: Fix large memory usage of VSETVL PASS [PR113495]

    The SPEC 2017 wrf benchmark exposes unreasonable memory usage in the VSETVL PASS:
    the pass consumes over 33 GB of memory, which makes it impossible for us to
    compile SPEC 2017 wrf on a laptop.

    The root cause is these memory-wasting variables:

    unsigned num_exprs = num_bbs * num_regs;
    sbitmap *avl_def_loc = sbitmap_vector_alloc (num_bbs, num_exprs);
    sbitmap *m_kill = sbitmap_vector_alloc (num_bbs, num_exprs);
    m_avl_def_in = sbitmap_vector_alloc (num_bbs, num_exprs);
    m_avl_def_out = sbitmap_vector_alloc (num_bbs, num_exprs);

    I find that compute_avl_def_data can be achieved with the RTL_SSA framework,
    so this patch reimplements it on top of RTL_SSA.

    After this patch, the memory-hog issue is fixed.

    simple vsetvl memory usage (valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out)
    is 1.673 GB.

    lazy vsetvl memory usage (valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out)
    is 2.441 GB.

    Tested on both RV32 and RV64, no regression.

    gcc/ChangeLog:

            PR target/113495
            * config/riscv/riscv-vsetvl.cc (get_expr_id): Remove.
            (get_regno): Ditto.
            (get_bb_index): Ditto.
            (pre_vsetvl::compute_avl_def_data): Ditto.
            (pre_vsetvl::earliest_fuse_vsetvl_info): Fix large memory usage.
            (pre_vsetvl::pre_global_vsetvl_info): Ditto.

    gcc/testsuite/ChangeLog:

            PR target/113495
            * gcc.target/riscv/rvv/vsetvl/avl_single-107.c: Adapt test.
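The four allocations quoted in the commit explain the blow-up. num_exprs is itself num_bbs * num_regs, so each of the four vectors holds num_bbs bitmaps of num_bbs * num_regs bits, and the footprint grows with the square of the basic-block count. The following back-of-the-envelope estimator assumes 64-bit sbitmap words and ignores per-bitmap header overhead. The inputs in the checks are made-up round numbers, not wrf measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Rough footprint of avl_def_loc, m_kill, m_avl_def_in and m_avl_def_out:
   four vectors of num_bbs bitmaps, each num_exprs = num_bbs * num_regs
   bits wide.  Assumes 64-bit words; ignores per-bitmap headers.  */
static uint64_t toy_avl_def_bytes (uint64_t num_bbs, uint64_t num_regs)
{
  assert (num_bbs > 0 && num_regs > 0);
  const uint64_t word_bits = 64;
  uint64_t num_exprs = num_bbs * num_regs;
  uint64_t words_per_bitmap = (num_exprs + word_bits - 1) / word_bits;
  uint64_t bytes_per_vector = num_bbs * words_per_bitmap * (word_bits / 8);
  return 4 * bytes_per_vector;
}
```

With 1000 blocks and 32 registers this gives 16 MB. Doubling to 2000 blocks gives 64 MB, four times as much. A function with tens of thousands of blocks therefore quickly reaches the tens of gigabytes reported above.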
From: cvs-commit at gcc dot gnu.org @ 2024-01-31 0:29 UTC

--- Comment #33 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:9dd10de15b183f7b662905e1383fdc3a08755f2e

commit r14-8639-g9dd10de15b183f7b662905e1383fdc3a08755f2e
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Mon Jan 29 19:32:02 2024 +0800

    RISC-V: Fix VSETLV PASS compile-time issue

    The compile-time issue was discovered in SPEC 2017 wrf:

    Use time and -ftime-report to analyze the profile data of the SPEC 2017 wrf compilation.

    Before this patch (Lazy vsetvl):

    scheduling        : 121.89 ( 15%)   0.53 ( 11%) 122.72 ( 15%)    13M (  1%)
    machine dep reorg : 424.61 ( 53%)   1.84 ( 37%) 427.44 ( 53%)  5290k (  0%)
    real    13m27.074s
    user    13m19.539s
    sys     0m5.180s

    Simple vsetvl:

    machine dep reorg :   0.10 (  0%)   0.00 (  0%)   0.11 (  0%)  4138k (  0%)
    real    6m5.780s
    user    6m2.396s
    sys     0m2.373s

    The machine dep reorg time is the compile time of the VSETVL PASS (424 seconds),
    which accounts for 53% of the compilation time, much more than scheduling.

    After investigation, the critical path of the VSETVL pass is
    compute_lcm_local_properties, which is called in every iteration of phase 2
    (earliest fusion) and phase 3 (global LCM).

    This patch optimizes compute_lcm_local_properties to reduce the compilation time.

    After this patch:

    scheduling        : 117.51 ( 27%)   0.21 (  6%) 118.04 ( 27%)    13M (  1%)
    machine dep reorg :  80.13 ( 18%)   0.91 ( 26%)  81.26 ( 18%)  5290k (  0%)
    real    7m25.374s
    user    7m20.116s
    sys     0m3.795s

    The improvement from this patch is very clear: the lazy VSETVL PASS goes from
    424s (53%) to 80s (18%), which is now less time than scheduling.

    Tested on both RV32 and RV64, no regression. Ok for trunk?

            PR target/113495

    gcc/ChangeLog:

            * config/riscv/riscv-vsetvl.cc (extract_single_source): Remove.
            (pre_vsetvl::compute_vsetvl_def_data): Fix compile time issue.
            (pre_vsetvl::compute_transparent): New function.
            (pre_vsetvl::compute_lcm_local_properties): Fix compile time issue.
From: juzhe.zhong at rivai dot ai @ 2024-01-31 1:25 UTC

JuzheZhong <juzhe.zhong at rivai dot ai> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #34 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Fixed.