[Bug c/113495] New: RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark

public inbox for gcc-bugs@sourceware.org

* [Bug c/113495] New: RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
@ 2024-01-19  1:22 juzhe.zhong at rivai dot ai
  2024-01-19  1:36 ` [Bug tree-optimization/113495] " juzhe.zhong at rivai dot ai
                   ` (33 more replies)
  0 siblings, 34 replies; 35+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2024-01-19  1:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

            Bug ID: 113495
           Summary: RISC-V: Time and memory awful consumption of SPEC2017
                    wrf benchmark
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

riscv64-unknown-linux-gnu-gfortran -march=rv64gcv_zvl256b -O3 -S -ftime-report

real    63m18.771s
user    60m19.036s
sys     2m59.787s

60+ minutes.

After investigation, the time report shows that two passes are critical:

loop invariant motion              :2600.28 ( 72%)   1.68 (  1%)2602.12 ( 69%)  2617k (  0%)

Loop invariant motion consumes most of the time (72%).

The other is the VSETVL pass:

vsetvl: earliest_fuse_vsetvl_info  : 438.26 ( 12%)  79.82 ( 47%) 518.08 ( 14%) 221807M ( 75%)
vsetvl: pre_global_vsetvl_info     : 135.98 (  4%)  31.71 ( 19%) 167.69 (  4%)  71950M ( 24%)

Phase 2 and phase 3 of the VSETVL pass consume 16% of the time and 99% of the memory.
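
The 16% and 99% figures come from adding the two VSETVL rows. A minimal Python sketch of that arithmetic, assuming the usual -ftime-report column order (usr, sys, wall, GGC memory); the dictionary layout and the names `vsetvl_rows` and `column_total` are only for illustration:

```python
# Percentages copied from the two VSETVL rows of the -ftime-report output above.
# The column meaning (usr, sys, wall, GGC memory) is an assumption based on
# GCC's usual -ftime-report layout.
vsetvl_rows = {
    "earliest_fuse_vsetvl_info": {"usr": 12, "sys": 47, "wall": 14, "ggc": 75},
    "pre_global_vsetvl_info": {"usr": 4, "sys": 19, "wall": 4, "ggc": 24},
}


def column_total(rows, column):
    """Sum one percentage column over all rows."""
    return sum(row[column] for row in rows.values())


usr_share = column_total(vsetvl_rows, "usr")  # share of user time
ggc_share = column_total(vsetvl_rows, "ggc")  # share of GGC-allocated memory
print(f"VSETVL phases: {usr_share}% user time, {ggc_share}% GGC memory")
```

The sum uses the user-time column; the wall-clock column gives a slightly different share because it also includes system time.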

I will look into the VSETVL pass issue, but I am not able to take care of the
loop invariant motion issue.


* [Bug tree-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19  1:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #1 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Created attachment 57149
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57149&action=edit
spec2017 wrf

spec2017 wrf


* [Bug tree-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19  1:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #2 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
To build the attached file, we need the following files from SPEC2017:

module_big_step_utilities_em.mod    module_force_scm.mod
module_comm_dm.mod                  module_fr_fire_driver_wrf.mod
module_configure.mod                module_model_constants.mod
module_convtrans_prep.mod           module_pbl_driver.mod
module_cumulus_driver.mod           module_radiation_driver.mod
module_dm.mod                       module_scalar_tables.mod
module_domain.mod                   module_shallowcu_driver.mod
module_em.mod                       module_state_description.mod
module_fddagd_driver.mod            module_surface_driver.mod
module_first_rk_step_part1.mod      module_utility.mod

But I failed to attach them because they are too big.


* [Bug tree-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19  1:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Ok. The reduced case:

# 1 "module_first_rk_step_part1.fppized.f90"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "module_first_rk_step_part1.fppized.f90"
!WRF:MEDIATION_LAYER:SOLVER


MODULE module_first_rk_step_part1

CONTAINS

  SUBROUTINE first_rk_step_part1 (   grid , config_flags              &
                             , moist , moist_tend               &
                             , chem  , chem_tend                &
                             , tracer, tracer_tend              &
                             , scalar , scalar_tend             &
                             , fdda3d, fdda2d                   &
                             , aerod                            &
                             , ru_tendf, rv_tendf               &
                             , rw_tendf, t_tendf                &
                             , ph_tendf, mu_tendf               &
                             , tke_tend                         &
                             , adapt_step_flag , curr_secs      &
                             , psim , psih , wspd , gz1oz0 , br , chklowq &
                             , cu_act_flag , hol , th_phy       &
                             , pi_phy , p_phy , t_phy           &
                             , dz8w , p8w , t8w                 &
                             , ids, ide, jds, jde, kds, kde     &
                             , ims, ime, jms, jme, kms, kme     &
                             , ips, ipe, jps, jpe, kps, kpe     &
                             , imsx,imex,jmsx,jmex,kmsx,kmex    &
                             , ipsx,ipex,jpsx,jpex,kpsx,kpex    &
                             , imsy,imey,jmsy,jmey,kmsy,kmey    &
                             , ipsy,ipey,jpsy,jpey,kpsy,kpey    &
                             , k_start , k_end                  &
                             , f_flux                           &
                            )
    USE module_state_description
    USE module_model_constants
    USE module_domain, ONLY : domain, domain_clock_get, get_ijk_from_subgrid
    USE module_configure, ONLY : grid_config_rec_type, model_config_rec
    USE module_radiation_driver, ONLY : pre_radiation_driver, radiation_driver
    USE module_surface_driver, ONLY : surface_driver
    USE module_cumulus_driver, ONLY : cumulus_driver
    USE module_shallowcu_driver, ONLY : shallowcu_driver
    USE module_pbl_driver, ONLY : pbl_driver
    USE module_fr_fire_driver_wrf, ONLY : fire_driver_em_step
    USE module_fddagd_driver, ONLY : fddagd_driver
    USE module_em, ONLY : init_zero_tendency
    USE module_force_scm
    USE module_convtrans_prep
    USE module_big_step_utilities_em, ONLY : phy_prep
use module_scalar_tables
    USE module_dm, ONLY : local_communicator, mytask, ntasks, ntasks_x, ntasks_y, local_communicator_periodic, wrf_dm_maxval
    USE module_comm_dm, ONLY : halo_em_phys_a_sub,halo_em_fdda_sfc_sub,halo_pwp_sub,halo_em_chem_e_3_sub, &
    halo_em_chem_e_5_sub, halo_em_hydro_noahmp_sub
    USE module_utility
    IMPLICIT NONE

    TYPE ( domain ), INTENT(INOUT) :: grid
    TYPE ( grid_config_rec_type ), INTENT(IN) :: config_flags
    TYPE(WRFU_Time)                :: currentTime

    INTEGER, INTENT(IN) :: ids, ide, jds, jde, kds, kde,     &
                           ims, ime, jms, jme, kms, kme,     &
                           ips, ipe, jps, jpe, kps, kpe,     &
                           imsx,imex,jmsx,jmex,kmsx,kmex,    &
                           ipsx,ipex,jpsx,jpex,kpsx,kpex,    &
                           imsy,imey,jmsy,jmey,kmsy,kmey,    &
                           ipsy,ipey,jpsy,jpey,kpsy,kpey


    LOGICAL ,INTENT(IN)                        :: adapt_step_flag
    REAL, INTENT(IN)                           :: curr_secs

    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme,num_moist),INTENT(INOUT)   :: moist
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme,num_moist),INTENT(INOUT)   :: moist_tend
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme,num_chem),INTENT(INOUT)   :: chem
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme,num_chem),INTENT(INOUT)   :: chem_tend
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme,num_tracer),INTENT(INOUT)   :: tracer
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme,num_tracer),INTENT(INOUT)   :: tracer_tend
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme,num_scalar),INTENT(INOUT)   :: scalar
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme,num_scalar),INTENT(INOUT)   :: scalar_tend
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme,num_fdda3d),INTENT(INOUT)  :: fdda3d
    REAL    ,DIMENSION(ims:ime,1:1,jms:jme,num_fdda2d),INTENT(INOUT)      :: fdda2d
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme,num_aerod),INTENT(INOUT)   :: aerod
    REAL    ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT)         :: psim
    REAL    ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT)         :: psih
    REAL    ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT)         :: wspd
    REAL    ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT)         :: gz1oz0
    REAL    ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT)         :: br
    REAL    ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT)         :: chklowq
    LOGICAL ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT)         :: cu_act_flag
    REAL    ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT)         :: hol

    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: th_phy
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: pi_phy
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: p_phy
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: t_phy
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: dz8w
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: p8w
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: t8w

    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: ru_tendf
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: rv_tendf
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: rw_tendf
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: ph_tendf
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: t_tendf
    REAL    ,DIMENSION(ims:ime,kms:kme,jms:jme), INTENT(INOUT) :: tke_tend

    REAL    ,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: mu_tendf

    INTEGER, INTENT(IN)                           ::  k_start, k_end
    LOGICAL, INTENT(IN), OPTIONAL                 ::  f_flux

! Local
    real :: HYDRO_dt
    REAL, DIMENSION( ims:ime, jms:jme ) :: exch_temf  ! 1/7/09 WA

    REAL, DIMENSION( ims:ime, jms:jme ) :: ht_loc, mixht
    INTEGER                             :: ij
    INTEGER  num_roof_layers
    INTEGER  num_wall_layers
    INTEGER  num_road_layers
    INTEGER  iswater
    LOGICAL  :: l_flux
    INTEGER  :: isurban
    INTEGER  rk_step
    INTEGER                         :: yr, month, day, hr, minute, sec, rc
    CHARACTER(LEN=80)                    :: mesg

   INTEGER                         :: sids , side , sjds , sjde , skds , skde , &
                                      sims , sime , sjms , sjme , skms , skme , &
                                      sips , sipe , sjps , sjpe , skps , skpe

   CHARACTER (LEN=256) :: mminlu
   CHARACTER (LEN=1000) :: message


  CALL get_ijk_from_subgrid (  grid ,                   &
                            sids, side, sjds, sjde, skds, skde,    &
                            sims, sime, sjms, sjme, skms, skme,    &
                            sips, sipe, sjps, sjpe, skps, skpe    )

 ! initialize all tendencies to zero in order to update physics
 ! tendencies first (separate from dry dynamics).

   l_flux=.FALSE.
   if (present(f_flux)) l_flux=f_flux

    rk_step = 1



       DO ij = 1 , grid%num_tiles

         CALL wrf_debug ( 200 , ' call init_zero_tendency' )
         CALL init_zero_tendency ( ru_tendf, rv_tendf, rw_tendf,     &
                                   ph_tendf, t_tendf, tke_tend,      &
                                   mu_tendf,                         &
                                   moist_tend,chem_tend,scalar_tend, &
                                   tracer_tend,num_tracer,           &
                                   num_moist,num_chem,num_scalar,    &
                                   rk_step,                          &
                                   ids, ide, jds, jde, kds, kde,     &
                                   ims, ime, jms, jme, kms, kme,     &
                                   grid%i_start(ij), grid%i_end(ij), &
                                   grid%j_start(ij), grid%j_end(ij), &
                                   k_start, k_end                   )

       END DO


!STARTOFREGISTRYGENERATEDINCLUDE 'inc/HALO_EM_PHYS_A.inc'
!
! WARNING This file is generated automatically by use_registry
! using the data base in the file named Registry.
! Do not edit.  Your changes to this file will be lost.
!
CALL HALO_EM_PHYS_A_sub ( grid, &
  local_communicator, &
  mytask, ntasks, ntasks_x, ntasks_y, &
  ids, ide, jds, jde, kds, kde,       &
  ims, ime, jms, jme, kms, kme,       &
  ips, ipe, jps, jpe, kps, kpe )
!ENDOFREGISTRYGENERATEDINCLUDE

      DO ij = 1 , grid%num_tiles

        CALL wrf_debug ( 200 , ' call phy_prep' )
        CALL phy_prep ( config_flags,                                    &
                        grid%mut, grid%muu, grid%muv, grid%u_2,          &
                        grid%v_2, grid%p, grid%pb, grid%alt,             &
                        grid%ph_2, grid%phb, grid%t_2, grid%tsk, moist, num_moist,   &
                        grid%rho,th_phy, p_phy, pi_phy, grid%u_phy, grid%v_phy,      &
                        p8w, t_phy, t8w, grid%z, grid%z_at_w, dz8w,      &
                        grid%p_hyd, grid%p_hyd_w, grid%dnw,              &
                        grid%fnm, grid%fnp, grid%znw, grid%p_top,        &
                        grid%rthraten,                                   &
                        grid%rthblten, grid%rublten, grid%rvblten,       &
                        grid%rqvblten, grid%rqcblten, grid%rqiblten,     &
                        grid%rucuten,  grid%rvcuten,  grid%rthcuten,     &
                        grid%rqvcuten, grid%rqccuten, grid%rqrcuten,     &
                        grid%rqicuten, grid%rqscuten,                    &
                        grid%rushten,  grid%rvshten,  grid%rthshten,     &
                        grid%rqvshten, grid%rqcshten, grid%rqrshten,     &
                        grid%rqishten, grid%rqsshten, grid%rqgshten,     &
                        grid%rthften,  grid%rqvften,                     &
                        grid%RUNDGDTEN, grid%RVNDGDTEN, grid%RTHNDGDTEN, &
                        grid%RPHNDGDTEN,grid%RQVNDGDTEN, grid%RMUNDGDTEN,&
!jdf
                        grid%landmask,grid%xland,                 &
!jdf
                        ids, ide, jds, jde, kds, kde,                    &
                        ims, ime, jms, jme, kms, kme,                    &
                        grid%i_start(ij), grid%i_end(ij),                &
                        grid%j_start(ij), grid%j_end(ij),                &
                        k_start, k_end                                   )
      ENDDO



! radiation
     CALL domain_clock_get( grid, current_time=currentTime, &
                            current_timestr=mesg )
     CALL WRFU_TimeGet( currentTime, YY=yr, dayOfYear=day, H=hr, M=minute, S=sec, rc=rc)
         IF( rc/= WRFU_SUCCESS)THEN
         CALL wrf_error_fatal('WRFU_TimeGet failed')
         ENDIF

! this driver is only needed to handle non-local shadowing effects
      CALL pre_radiation_driver ( grid, config_flags                        &
     &        ,itimestep=grid%itimestep, ra_call_offset=grid%ra_call_offset    &
     &        ,XLAT=grid%xlat, XLONG=grid%xlong, GMT=grid%gmt                  &
     &        ,julian=grid%julian, xtime=grid%xtime, RADT=grid%radt            &
     &        ,STEPRA=grid%stepra                                              &
     &        ,ht=grid%ht,dx=grid%dx,dy=grid%dy,sina=grid%sina,cosa=grid%cosa  &
     &        ,shadowmask=grid%shadowmask,slope_rad=config_flags%slope_rad     &
     &        ,topo_shading=config_flags%topo_shading                          &
     &        ,shadlen=config_flags%shadlen,ht_shad=grid%ht_shad,ht_loc=ht_loc &
     &        ,ht_shad_bxs=grid%ht_shad_bxs, ht_shad_bxe=grid%ht_shad_bxe      &
     &        ,ht_shad_bys=grid%ht_shad_bys, ht_shad_bye=grid%ht_shad_bye      &
     &        ,nested=config_flags%nested, min_ptchsz=grid%min_ptchsz          &
     &        ,spec_bdy_width=config_flags%spec_bdy_width                      &
            ! indexes
     &        ,IDS=ids,IDE=ide, JDS=jds,JDE=jde, KDS=kds,KDE=kde          &
     &        ,IMS=ims,IME=ime, JMS=jms,JME=jme, KMS=kms,KME=kme          &
     &        ,IPS=ips,IPE=ipe, JPS=jps,JPE=jpe, KPS=kps,KPE=kpe          &
     &        ,i_start=grid%i_start,i_end=min(grid%i_end, ide-1)          &
     &        ,j_start=grid%j_start,j_end=min(grid%j_end, jde-1)          &
     &        ,kts=k_start, kte=min(k_end,kde-1)                          &
     &        ,num_tiles=grid%num_tiles                                   )

      CALL wrf_debug ( 200 , ' call radiation_driver' )


      CALL radiation_driver(                                                  &
     &         p_top=grid%p_top & !DJW 140312 added p_top for vertical nesting
     &        ,ACFRCV=grid%acfrcv      ,ACFRST=grid%acfrst      ,ALBEDO=grid%albedo  &
     &        ,CFRACH=grid%cfrach      ,CFRACL=grid%cfracl      ,CFRACM=grid%cfracm  &
     &        ,CUPPT=grid%cuppt        ,CZMEAN=grid%czmean      ,DT=grid%dt          &
     &        ,DZ8W=dz8w               ,EMISS=grid%emiss        ,GLW=grid%glw        &
     &        ,GMT=grid%gmt            ,GSW=grid%gsw            ,HBOT=grid%hbot      &
     &        ,HTOP=grid%htop          ,HBOTR=grid%hbotr        ,HTOPR=grid%htopr    &
     &        ,ICLOUD=config_flags%icloud                                            &
     &        ,ITIMESTEP=grid%itimestep,JULDAY=grid%julday      , JULIAN=grid%julian &
     &        ,JULYR=grid%julyr        ,LW_PHYSICS=config_flags%ra_lw_physics        &
     &        ,NCFRCV=grid%ncfrcv      ,NCFRST=grid%ncfrst      ,NPHS=1              &
     &        ,o3input=config_flags%o3input     ,O3rad=grid%o3rad                    &
     &        ,aer_opt=config_flags%aer_opt ,aerod=aerod(:,:,:,P_ocarbon:P_upperaer) &
     &        ,swint_opt=config_flags%swint_opt                                      &
     &        ,P8W=grid%p_hyd_w        ,P=grid%p_hyd            ,PI=pi_phy           &
     &        ,RADT=grid%radt          ,RA_CALL_OFFSET=grid%ra_call_offset           &
     &        ,RHO=grid%rho            ,RLWTOA=grid%rlwtoa                           &
     &        ,RSWTOA=grid%rswtoa      ,RTHRATEN=grid%rthraten                       &
     &        ,RTHRATENLW=grid%rthratenlw       ,RTHRATENSW=grid%rthratensw          &
     &        ,SNOW=grid%snow          ,STEPRA=grid%stepra      ,SWDOWN=grid%swdown  &
     &        ,SWDOWNC=grid%swdownc    ,SW_PHYSICS=config_flags%ra_sw_physics        &
     &        ,T8W=t8w                 ,T=grid%t_phy           ,TAUCLDC=grid%taucldc &
     &        ,TAUCLDI=grid%taucldi    ,TSK=grid%tsk            ,VEGFRA=grid%vegfra  &
     &        ,WARM_RAIN=grid%warm_rain ,XICE=grid%xice         ,XLAND=grid%xland    &
     &        ,XLAT=grid%xlat          ,XLONG=grid%xlong        ,YR=yr               &
           ! SSiB LSM radiation components (fds 06/2010)
     &        ,ALSWVISDIR=grid%alswvisdir ,ALSWVISDIF=grid%alswvisdif      & !ssib
     &        ,ALSWNIRDIR=grid%alswnirdir ,ALSWNIRDIF=grid%alswnirdif      & !ssib
     &        ,SWVISDIR=grid%swvisdir ,SWVISDIF=grid%swvisdif              & !ssib
     &        ,SWNIRDIR=grid%swnirdir ,SWNIRDIF=grid%swnirdif              & !ssib
     &        ,SF_SURFACE_PHYSICS=config_flags%sf_surface_physics          & !ssib
! WRF-solar and aerosol variables from jararias 2013/8 and 2013/11
     &       ,SWDDIR=grid%swddir,SWDDNI=grid%swddni,SWDDIF=grid%swddif                              &
     &       ,Gx=grid%Gx,Bx=grid%Bx,gg=grid%gg,bb=grid%bb                                           &
     &       ,swdown_ref=grid%swdown_ref,swddir_ref=grid%swddir_ref                                 &
     &       ,coszen_ref=grid%coszen_ref                                                            &
     &       ,aer_type=config_flags%aer_type                                                        &
     &      ,aer_aod550_opt=config_flags%aer_aod550_opt,aer_aod550_val=config_flags%aer_aod550_val &
     &      ,aer_angexp_opt=config_flags%aer_angexp_opt,aer_angexp_val=config_flags%aer_angexp_val &
     &      ,aer_ssa_opt=config_flags%aer_ssa_opt,aer_ssa_val=config_flags%aer_ssa_val             &
     &      ,aer_asy_opt=config_flags%aer_asy_opt,aer_asy_val=config_flags%aer_asy_val             &
     &      ,aod5502d=grid%aod5502d,angexp2d=grid%angexp2d,aerssa2d=grid%aerssa2d                  &
     &       ,aerasy2d=grid%aerasy2d,aod5503d=grid%aod5503d                                         &
!Optional solar variables
     &        ,DECLINX=grid%declin ,SOLCONX=grid%solcon ,COSZEN=grid%coszen ,HRANG=grid%hrang    &
     &        , CEN_LAT=grid%cen_lat                                      &
     &        ,Z=grid%z                                                   &
     &        ,ALEVSIZ=grid%alevsiz, no_src_types=grid%no_src_types       &
     &        ,LEVSIZ=grid%levsiz, N_OZMIXM=num_ozmixm                    &
     &        ,N_AEROSOLC=num_aerosolc                                    &
     &        ,PAERLEV=grid%paerlev   ,ID=grid%id                         &
     &        ,CAM_ABS_DIM1=grid%cam_abs_dim1, CAM_ABS_DIM2=grid%cam_abs_dim2 &
     &        ,CAM_ABS_FREQ_S=grid%cam_abs_freq_s                         &
     &        ,XTIME=grid%xtime                                           &
              ,CURR_SECS=curr_secs, ADAPT_STEP_FLAG=adapt_step_flag       &
            ! indexes
     &        ,IDS=ids,IDE=ide, JDS=jds,JDE=jde, KDS=kds,KDE=kde          &
     &        ,IMS=ims,IME=ime, JMS=jms,JME=jme, KMS=kms,KME=kme          &
     &        ,i_start=grid%i_start,i_end=min(grid%i_end, ide-1)          &
     &        ,j_start=grid%j_start,j_end=min(grid%j_end, jde-1)          &
     &        ,kts=k_start, kte=min(k_end,kde-1)                          &
     &        ,num_tiles=grid%num_tiles                                   &
            ! Optional
!JJS 20101020 vvvvv
     &        , TLWDN=grid%tlwdn, TLWUP=grid%tlwup                        & ! goddard schemes
     &        , SLWDN=grid%slwdn, SLWUP=grid%slwup                        & ! goddard schemes
     &        , TSWDN=grid%tswdn, TSWUP=grid%tswup                        & ! goddard schemes
     &        , SSWDN=grid%sswdn, SSWUP=grid%sswup                        & ! goddard schemes
!JJS 20101020 ^^^^^
     &        , CLDFRA=grid%cldfra, CLDFRA_MP_ALL=grid%cldfra_mp_all      &
     &        , LRADIUS=grid%LRADIUS,IRADIUS=grid%IRADIUS                 &
!BSINGH(01/22/2014)
     &        , CLDFRA_DP=grid%cldfra_dp                                  & ! ckay for subgrid cloud
     &        , CLDFRA_SH=grid%cldfra_sh                                  &
     &        , re_cloud=grid%re_cloud, re_ice=grid%re_ice, re_snow=grid%re_snow & ! G. Thompson
     &        , has_reqc=grid%has_reqc, has_reqi=grid%has_reqi, has_reqs=grid%has_reqs & ! G. Thompson
     &        , PB=grid%pb                                                &
     &        , F_ICE_PHY=grid%f_ice_phy,F_RAIN_PHY=grid%f_rain_phy       &
     &        , QV=moist(ims,kms,jms,P_QV), F_QV=F_QV                     &
     &        , QC=moist(ims,kms,jms,P_QC), F_QC=F_QC                     &
     &        , QR=moist(ims,kms,jms,P_QR), F_QR=F_QR                     &
     &        , QI=moist(ims,kms,jms,P_QI), F_QI=F_QI                     &
     &        , QS=moist(ims,kms,jms,P_QS), F_QS=F_QS                     &
     &        , QG=moist(ims,kms,jms,P_QG), F_QG=F_QG                     &
     &        , QNDROP=scalar(ims,kms,jms,P_QNDROP), F_QNDROP=F_QNDROP    &
     &        ,ACSWUPT=grid%acswupt    ,ACSWUPTC=grid%acswuptc            &
     &        ,ACSWDNT=grid%acswdnt    ,ACSWDNTC=grid%acswdntc            &
     &        ,ACSWUPB=grid%acswupb    ,ACSWUPBC=grid%acswupbc            &
     &        ,ACSWDNB=grid%acswdnb    ,ACSWDNBC=grid%acswdnbc            &
     &        ,ACLWUPT=grid%aclwupt    ,ACLWUPTC=grid%aclwuptc            &
     &        ,ACLWDNT=grid%aclwdnt    ,ACLWDNTC=grid%aclwdntc            &
     &        ,ACLWUPB=grid%aclwupb    ,ACLWUPBC=grid%aclwupbc            &
     &        ,ACLWDNB=grid%aclwdnb    ,ACLWDNBC=grid%aclwdnbc            &
     &        ,SWUPT=grid%swupt    ,SWUPTC=grid%swuptc                    &
     &        ,SWDNT=grid%swdnt    ,SWDNTC=grid%swdntc                    &
     &        ,SWUPB=grid%swupb    ,SWUPBC=grid%swupbc                    &
     &        ,SWDNB=grid%swdnb    ,SWDNBC=grid%swdnbc                    &
     &        ,LWUPT=grid%lwupt    ,LWUPTC=grid%lwuptc                    &
     &        ,LWDNT=grid%lwdnt    ,LWDNTC=grid%lwdntc                    &
     &        ,LWUPB=grid%lwupb    ,LWUPBC=grid%lwupbc                    &
     &        ,LWDNB=grid%lwdnb    ,LWDNBC=grid%lwdnbc                    &
     &        ,LWCF=grid%lwcf                                                         &
     &        ,SWCF=grid%swcf                                                         &
     &        ,OLR=grid%olr                                                           &
     &        ,AERODM=grid%aerodm, PINA=grid%pina, AODTOT=grid%aodtot                 &
     &        ,OZMIXM=grid%ozmixm, PIN=grid%pin                                       &
     &        ,M_PS_1=grid%m_ps_1, M_PS_2=grid%m_ps_2, AEROSOLC_1=grid%aerosolc_1     &
     &        ,AEROSOLC_2=grid%aerosolc_2, M_HYBI0=grid%m_hybi                        &
     &        ,ABSTOT=grid%abstot, ABSNXT=grid%absnxt, EMSTOT=grid%emstot             &
     &        ,RADTACTTIME=grid%radtacttime                                           &
     &        ,ICLOUD_CU=config_flags%ICLOUD_CU                            &
     &        ,QC_CU=grid%QC_CU , QI_CU=grid%QI_CU                         &
     &        ,slope_rad=config_flags%slope_rad,topo_shading=config_flags%topo_shading     &
     &         ,shadowmask=grid%shadowmask,ht=grid%ht,dx=grid%dx,dy=grid%dy &   
     &         ,IS_CAMMGMP_USED = grid%is_CAMMGMP_used    )



!********* Surface driver
! surface



!gmm halo of wtd and riverflow for leafhydro
  IF ( config_flags%sf_surface_physics.eq.NOAHMPSCHEME ) THEN
       IF ( config_flags%opt_run.eq.5.and.mod(grid%itimestep,grid%STEPWTD).eq.0 )  THEN
!STARTOFREGISTRYGENERATEDINCLUDE 'inc/HALO_EM_HYDRO_NOAHMP.inc'
!
! WARNING This file is generated automatically by use_registry
! using the data base in the file named Registry.
! Do not edit.  Your changes to this file will be lost.
!
CALL HALO_EM_HYDRO_NOAHMP_sub ( grid, &
  local_communicator, &
  mytask, ntasks, ntasks_x, ntasks_y, &
  ids, ide, jds, jde, kds, kde,       &
  ims, ime, jms, jme, kms, kme,       &
  ips, ipe, jps, jpe, kps, kpe )
!ENDOFREGISTRYGENERATEDINCLUDE
       ENDIF
  ENDIF

  END SUBROUTINE first_rk_step_part1

END MODULE module_first_rk_step_part1


This makes the memory hog much easier to debug, since it does not take long
to compile:

real    0m22.924s
user    0m22.242s
sys     0m0.640s

But the memory hog is still visible in the report:

machine dep reorg                  :   2.05 (  9%)   0.33 ( 56%)   2.40 ( 10%)   939M ( 80%)
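
For anyone comparing many such rows, here is a rough Python sketch that splits one -ftime-report row into fields. It assumes the columns are usr, sys, wall and GGC memory, each followed by a percentage; the regular expression and the name `ROW` are illustrative, not part of GCC:

```python
import re

# One -ftime-report row: "<name> : usr (p%) sys (p%) wall (p%) ggc (p%)".
# The column order is an assumption based on GCC's usual report layout.
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:"
    r"\s*(?P<usr>[\d.]+)\s*\(\s*(?P<usr_pct>\d+)%\)"
    r"\s*(?P<sys>[\d.]+)\s*\(\s*(?P<sys_pct>\d+)%\)"
    r"\s*(?P<wall>[\d.]+)\s*\(\s*(?P<wall_pct>\d+)%\)"
    r"\s*(?P<ggc>\d+[kMG]?)\s*\(\s*(?P<ggc_pct>\d+)%\)"
)

# The row from the reduced test case, joined onto a single line.
line = ("machine dep reorg                  :   2.05 (  9%)   0.33 ( 56%)"
        "   2.40 ( 10%)   939M ( 80%)")

match = ROW.match(line)
fields = match.groupdict()
print(fields["name"], fields["ggc"], fields["ggc_pct"] + "%")
```

Rows that were wrapped by the mail client need to be joined onto one line before matching.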


* [Bug tree-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19  1:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #4 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Also, compiling the original file with -fno-move-loop-invariants reduces the
compile time from 60 minutes to 7 minutes:

real    7m12.528s
user    6m55.214s
sys     0m17.147s
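
As a quick check of the improvement, a small Python sketch that converts the two `real` timings quoted in this thread to seconds and takes their ratio. The helper name `to_seconds` is hypothetical:

```python
def to_seconds(stamp):
    """Convert a `time`-style duration such as '63m18.771s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


baseline = to_seconds("63m18.771s")  # -O3, from the original report
no_licm = to_seconds("7m12.528s")    # -O3 -fno-move-loop-invariants
speedup = baseline / no_licm
print(f"{baseline:.1f}s -> {no_licm:.1f}s, about {speedup:.1f}x faster")
```

This compares wall-clock (`real`) time only; the `user` and `sys` figures can be compared the same way.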


machine dep reorg                  :  75.93 ( 18%)  14.23 ( 88%)  90.15 ( 21%) 33383M ( 95%)

The memory report is clear: this pass consumes 95% of the memory.

So I believe the VSETVL pass is not the main cause of the compile-time hog;
that is the loop invariant pass.

But the VSETVL pass is the main cause of the memory hog.

I am not familiar with the loop invariant pass. Can anyone help debug its
compile-time hog? Or should we disable the loop invariant pass by default
for RISC-V?


* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: pinskia at gcc dot gnu.org @ 2024-01-19  1:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |rtl-optimization

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note "loop invariant motion" is the RTL based loop invariant motion pass.


* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19  1:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #6 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Andrew Pinski from comment #5)
> Note that "loop invariant motion" is the RTL-based loop invariant motion pass.

So you mean it should still be a RISC-V issue, right?

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: patrick at rivosinc dot com @ 2024-01-19  3:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

Patrick O'Neill <patrick at rivosinc dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |patrick at rivosinc dot com

--- Comment #7 from Patrick O'Neill <patrick at rivosinc dot com> ---
I believe the memory hog is caused by this:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/riscv/riscv-vsetvl.cc;h=2067073185f8c0f398908b164a99b592948e6d2d;hb=565935f93a7da629da89b05812a3e8c43287598f#l2427

In the slightly reduced test program I was using to debug there were ~35k bb's
leading to num_expr being roughly 1 million. vsetvl then makes 35k bitmaps of
~1 million bits.
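
As a rough back-of-the-envelope check of those figures (a sketch only: the counts are the approximate ones quoted above, and the helper name dense_bitmap_bytes is made up for illustration, not something from riscv-vsetvl.cc), the storage for one set of dense per-block bitmaps can be estimated like this:

```cpp
#include <cstdint>

// Bytes needed for `num_bitmaps` dense bitmaps of `bits_per_bitmap` bits each,
// rounding every bitmap up to whole 64-bit words.
constexpr std::uint64_t dense_bitmap_bytes(std::uint64_t num_bitmaps,
                                           std::uint64_t bits_per_bitmap) {
  return num_bitmaps * ((bits_per_bitmap + 63) / 64) * 8;
}

// ~35k basic blocks, ~1M expressions (approximate numbers from the comment).
static_assert(dense_bitmap_bytes(35000, 1000000) == 4375000000ULL,
              "about 4.07 GiB for a single set of per-block bitmaps");
```

Since a data-flow problem usually keeps several such sets per block, the total can plausibly be a multiple of this figure.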

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: pinskia at gcc dot gnu.org @ 2024-01-19  3:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Patrick O'Neill from comment #7)
> I believe the memory hog is caused by this:
> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/riscv/riscv-vsetvl.cc;h=2067073185f8c0f398908b164a99b592948e6d2d;hb=565935f93a7da629da89b05812a3e8c43287598f#l2427
> 
> In the slightly reduced test program I was using to debug there were ~35k
> bb's leading to num_expr being roughly 1 million. vsetvl then makes 35k
> bitmaps of ~1 million bits.

How sparse will this bitmap be?  A bitmap should be used instead of an sbitmap
if the bitmap is going to be sparse.  An sbitmap has a fixed size determined by
the bitmap size, while a bitmap is better for sparse sets because it is
implemented as a linked list.
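
To make that trade-off concrete, here is a minimal sketch. It is not GCC's implementation: DenseBits and SparseBits are hypothetical names, and std::map stands in for the linked list of elements. It contrasts a fixed-size dense set with one that stores only the chunks that contain a set bit:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Dense set: storage is fixed by the universe size (the sbitmap-like case).
struct DenseBits {
  std::vector<std::uint64_t> words;
  explicit DenseBits(std::size_t nbits) : words((nbits + 63) / 64, 0) {}
  void set(std::size_t i) { words[i / 64] |= std::uint64_t{1} << (i % 64); }
  bool test(std::size_t i) const { return (words[i / 64] >> (i % 64)) & 1; }
  std::size_t payload_bytes() const {
    return words.size() * sizeof(std::uint64_t);
  }
};

// Sparse set: only 64-bit chunks that contain a set bit are stored
// (the bitmap-like case; std::map is an assumption made for brevity).
struct SparseBits {
  std::map<std::size_t, std::uint64_t> chunks;
  void set(std::size_t i) { chunks[i / 64] |= std::uint64_t{1} << (i % 64); }
  bool test(std::size_t i) const {
    auto it = chunks.find(i / 64);
    return it != chunks.end() && ((it->second >> (i % 64)) & 1);
  }
  // Counts key + word only; real per-node overhead is implementation-defined.
  std::size_t payload_bytes() const {
    return chunks.size() * (sizeof(std::size_t) + sizeof(std::uint64_t));
  }
};
```

With a universe of one million bits and only a handful set, the dense form always pays for 125000 bytes, while the sparse form pays per occupied chunk. The price is slower individual bit operations, which is why the choice depends on how sparse the data really is.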

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: pinskia at gcc dot gnu.org @ 2024-01-19  3:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #8)
> How sparse will this bitmap be?  A bitmap should be used instead of an sbitmap
> if the bitmap is going to be sparse.  An sbitmap has a fixed size determined by
> the bitmap size, while a bitmap is better for sparse sets because it is
> implemented as a linked list.

Also, it seems like DF already has def_in/def_out info; how much of this
duplicates the information from there?

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19  3:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #10 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
No, it is not caused there. I removed the whole compute_avl_def_data function.

The memory usage did not change.

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19  3:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #11 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
It should be compute_lcm_local_properties. The memory usage drops by 50% after I
remove this function. I am still investigating.

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19  3:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #12 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Ok. Here is a simple fix which gives some hints:


diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 2067073185f..ede818140dc 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2719,10 +2719,11 @@ pre_vsetvl::compute_lcm_local_properties ()
          for (int i = 0; i < num_exprs; i += 1)
            {
              const vsetvl_info &info = *m_exprs[i];
-             if (!info.has_nonvlmax_reg_avl () && !info.has_vl ())
+             bool has_nonvlmax_reg_avl_p = info.has_nonvlmax_reg_avl ();
+             if (!has_nonvlmax_reg_avl_p && !info.has_vl ())
                continue;

-             if (info.has_nonvlmax_reg_avl ())
+             if (has_nonvlmax_reg_avl_p)
                {
                  unsigned int regno;
                  sbitmap_iterator sbi;
@@ -3556,7 +3557,7 @@ const pass_data pass_data_vsetvl = {
   RTL_PASS,     /* type */
   "vsetvl",     /* name */
   OPTGROUP_NONE, /* optinfo_flags */
-  TV_NONE,      /* tv_id */
+  TV_MACH_DEP,  /* tv_id */
   0,            /* properties_required */
   0,            /* properties_provided */
   0,            /* properties_destroyed */


Memory usage goes from 931M to 781M, a significant reduction.

Note that I didn't change all calls to has_nonvlmax_reg_avl. We have so many
places calling has_nonvlmax_reg_avl...

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19  3:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #13 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
So I think we should investigate why calling has_nonvlmax_reg_avl costs so much
memory.

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19  3:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #14 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Oh, I know the reason now.

The issue is not in the RISC-V backend VSETVL PASS.

I think it is a memory bug in rtx_equal_p.

We are calling rtx_equal_p, which is very costly.

For example, has_nonvlmax_reg_avl calls rtx_equal_p.

So I kept all the code unchanged and only replaced the comparison as follows:

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 93a1238a5ab..1c85c8ee3c6 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4988,7 +4988,7 @@ nonvlmax_avl_type_p (rtx_insn *rinsn)
 bool
 vlmax_avl_p (rtx x)
 {
-  return x && rtx_equal_p (x, RVV_VLMAX);
+  return x && REG_P (x) && REGNO (x) == X0_REGNUM/*rtx_equal_p (x, RVV_VLMAX)*/;
 }

This uses REGNO (x) == X0_REGNUM instead of rtx_equal_p.

The memory-hog issue is gone:

939M -> 725k.

So I am going to send a patch to work around the rtx_equal_p issue that causes
the memory hog.

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: pinskia at gcc dot gnu.org @ 2024-01-19  3:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #15 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #14)
> Oh, I know the reason now.
> 
> The issue is not in the RISC-V backend VSETVL PASS.
> 
> I think it is a memory bug in rtx_equal_p.


It is not rtx_equal_p but rather RVV_VLMAX which is defined as:
riscv-protos.h:#define RVV_VLMAX gen_rtx_REG (Pmode, X0_REGNUM)

Seems like you could cache that somewhere ...
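
To illustrate why a macro that expands to an allocating call is costly when it is used inside a hot comparison, here is a toy sketch. It does not use GCC's rtx machinery: ToyReg, ToyArena, kX0 and both helper functions are hypothetical stand-ins that only mimic the allocation pattern described above.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-in for a register expression; not GCC's real rtx type.
struct ToyReg {
  int regno;
};

// Toy arena that never frees, loosely mimicking garbage that piles up
// until a collection happens.
struct ToyArena {
  std::vector<std::unique_ptr<ToyReg>> nodes;
  ToyReg *make_reg(int regno) {
    nodes.push_back(std::make_unique<ToyReg>(ToyReg{regno}));
    return nodes.back().get();
  }
};

constexpr int kX0 = 0;  // hypothetical register number for the zero register

// Pattern A: build a fresh node for every comparison, as an allocating
// macro would do each time it is expanded.
inline bool is_vlmax_allocating(ToyArena &arena, const ToyReg *x) {
  const ToyReg *vlmax = arena.make_reg(kX0);
  return x && x->regno == vlmax->regno;
}

// Pattern B: compare against a single node created once up front.
inline bool is_vlmax_cached(const ToyReg *cached_vlmax, const ToyReg *x) {
  return x && x->regno == cached_vlmax->regno;
}
```

In pattern A the arena grows by one node per call, so a check that runs for every (block, expression) pair produces garbage proportional to the amount of work. Pattern B gives the same answer with no allocation at all.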

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19  3:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #16 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Andrew Pinski from comment #15)
> (In reply to JuzheZhong from comment #14)
> > Oh, I know the reason now.
> > 
> > The issue is not in the RISC-V backend VSETVL PASS.
> > 
> > I think it is a memory bug in rtx_equal_p.
> 
> 
> It is not rtx_equal_p but rather RVV_VLMAX which is defined as:
> riscv-protos.h:#define RVV_VLMAX gen_rtx_REG (Pmode, X0_REGNUM)
> 
> Seems like you could cache that somewhere ...

Oh, that makes sense to me. Thank you so much.
I think the memory-hog issue will be fixed soon.

But the compile-time hog in loop invariant motion is still not fixed.

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19  4:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #17 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Ok. Confirmed: the original test now goes from 33383M to 4796k.

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19  8:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #18 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Hi, Robin.

I have a patch that fixes the memory hog:
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643418.html

I will commit it after testing.

But the compile-time hog still exists, and it is in the loop invariant motion PASS.

With -fno-move-loop-invariants, compilation becomes much faster.

Could you take a look at it?

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19  8:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #19 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to JuzheZhong from comment #18)
> Hi, Robin.
> 
> I have a patch that fixes the memory hog:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643418.html
> 
> I will commit it after testing.
> 
> But the compile-time hog still exists, and it is in the loop invariant motion PASS.
> 
> With -fno-move-loop-invariants, compilation becomes much faster.
> 
> Could you take a look at it?

Note that with default -march=rv64gcv_zvl256b  -O3:
real    63m18.771s
user    60m19.036s
sys     2m59.787s

But with -march=rv64gcv_zvl256b -O3 -fno-move-loop-invariants:
real    6m52.984s
user    6m42.473s
sys     0m10.375s

Roughly 9 times faster without loop invariant motion (63m19s vs. 6m53s real time).

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: rguenth at gcc dot gnu.org @ 2024-01-19  9:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111241,
                   |                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
IIRC there's a duplicate for this.  It's df_analyze_loop calling
df_reorganize_refs_* which is doing O(function-size) work for each loop.

With -O3 and vectorization the number of loops tends to blow up, making the
issue worse.
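
A simple cost model shows why per-loop O(function-size) work hurts once the loop count grows. This is only an illustrative sketch: both function names and the sample numbers are assumptions, not measurements from wrf or code from df.

```cpp
#include <cstdint>

// Work units if every loop triggers a pass over the whole function.
constexpr std::uint64_t per_loop_rescan_cost(std::uint64_t num_loops,
                                             std::uint64_t function_insns) {
  return num_loops * function_insns;
}

// Work units if the function is scanned once and each loop then only
// pays for its own body.
constexpr std::uint64_t single_scan_cost(std::uint64_t num_loops,
                                         std::uint64_t function_insns,
                                         std::uint64_t avg_loop_insns) {
  return function_insns + num_loops * avg_loop_insns;
}

// Hypothetical function: 10,000 loops, 1,000,000 insns, 100 insns per loop.
static_assert(per_loop_rescan_cost(10000, 1000000) == 10000000000ULL,
              "rescanning per loop is quadratic-like in practice");
static_assert(single_scan_cost(10000, 1000000, 100) == 2000000ULL,
              "a single scan stays linear in the function size");
```

Under these made-up numbers the per-loop rescan does 5000 times more work, which matches the intuition that more loops after vectorization make the problem worse.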

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: rguenth at gcc dot gnu.org @ 2024-01-19  9:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
I once tried to avoid df_reorganize_refs and/or optimize this with the blocks
involved but failed.

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19  9:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #22 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Richard Biener from comment #21)
> I once tried to avoid df_reorganize_refs and/or optimize this with the
> blocks involved but failed.

I am considering whether we should disable LICM for RISC-V by default if vector
is enabled.
The 10x explosion in compile time is really horrible.

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: kito at gcc dot gnu.org @ 2024-01-19  9:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #23 from Kito Cheng <kito at gcc dot gnu.org> ---
> I am considering whether we should disable LICM for RISC-V by default if vector is enabled.

That will cause regressions for other programs, and may also hurt programs that
are not vectorized but benefit from LICM.

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: cvs-commit at gcc dot gnu.org @ 2024-01-19 10:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #24 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Robin Dapp <rdapp@gcc.gnu.org>:

https://gcc.gnu.org/g:01260a823073675e13dd1fc85cf2657a5396adf2

commit r14-8282-g01260a823073675e13dd1fc85cf2657a5396adf2
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Fri Jan 19 16:34:25 2024 +0800

    RISC-V: Fix RVV_VLMAX

    This patch fixes the memory hog found in the SPEC2017 wrf benchmark, which is
    caused by RVV_VLMAX: RVV_VLMAX generates a brand new rtx with
    gen_rtx_REG (Pmode, X0_REGNUM) every time it is used, that is, we are always
    generating garbage and redundant (reg:DI 0 zero) rtx.

    After this patch, the memory hog is gone.

    Time variable                                   usr           sys          wall           GGC
     machine dep reorg                  :   1.99 (  9%)   0.35 ( 56%)   2.33 ( 10%)   939M ( 80%) [Before this patch]
     machine dep reorg                  :   1.71 (  6%)   0.16 ( 27%)   3.77 (  6%)   659k (  0%) [After this patch]

    Time variable                                   usr           sys          wall           GGC
     machine dep reorg                  :  75.93 ( 18%)  14.23 ( 88%)  90.15 ( 21%) 33383M ( 95%) [Before this patch]
     machine dep reorg                  :  56.00 ( 14%)   7.92 ( 77%)  63.93 ( 15%)  4361k (  0%) [After this patch]

    Test is running. Ok for trunk if the test passes with no regression?

            PR target/113495

    gcc/ChangeLog:

            * config/riscv/riscv-protos.h (RVV_VLMAX): Change to regno_reg_rtx[X0_REGNUM].
            (RVV_VUNDEF): Ditto.
            * config/riscv/riscv-vsetvl.cc: Add timevar.

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-19 10:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #25 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
The RISC-V backend memory-hog issue is fixed.
But the compile-time hog in LICM is still there, so keep this PR open.

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: rguenther at suse dot de @ 2024-01-19 10:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #26 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 19 Jan 2024, juzhe.zhong at rivai dot ai wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
> 
> --- Comment #22 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> (In reply to Richard Biener from comment #21)
> > I once tried to avoid df_reorganize_refs and/or optimize this with the
> > blocks involved but failed.
> 
> I am considering whether we should disable LICM for RISC-V by default if vector
> is enabled.
> The 10x explosion in compile time is really horrible.

I think that's a bad idea.  It only explodes for some degenerate cases.
The best would be to fix invariant motion to keep DF up-to-date so
it can stop using df_analyze_loop and instead analyze the whole function.
Or maybe change it to use the rtl-ssa framework instead.

There's already param_loop_invariant_max_bbs_in_loop:

  /* Process the loops, innermost first.  */
  for (auto loop : loops_list (cfun, LI_FROM_INNERMOST))
    {
      curr_loop = loop;
      /* move_single_loop_invariants for very large loops is time consuming
         and might need a lot of memory.  For -O1 only do loop invariant
         motion for very small loops.  */
      unsigned max_bbs = param_loop_invariant_max_bbs_in_loop;
      if (optimize < 2)
        max_bbs /= 10;
      if (loop->num_nodes <= max_bbs)
        move_single_loop_invariants (loop);
    }

it might be possible to restrict invariant motion to innermost loops
when the overall number of loops is too large (with a new param
for that).  And when the number of innermost loops also exceeds
the limit avoid even that?  The above also misses an
optimize_loop_for_speed_p (loop) check (probably doesn't make
a difference, but you could try).
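
The limiting idea above could look roughly like the following sketch. It is deliberately detached from GCC: ToyLoop, ToyLimits and select_loops are hypothetical, and max_loops stands in for the "new param" that the text proposes without naming.

```cpp
#include <cstddef>
#include <vector>

// Toy loop description; not GCC's loop structure.
struct ToyLoop {
  unsigned num_nodes;  // basic blocks in the loop
  bool innermost;      // true if the loop contains no other loop
};

// Hypothetical limits. max_bbs_in_loop plays the role of the existing
// per-loop size limit; max_loops is an assumed new limit.
struct ToyLimits {
  unsigned max_bbs_in_loop;
  std::size_t max_loops;
};

// Returns the indices of the loops that would still be processed.
inline std::vector<std::size_t> select_loops(const std::vector<ToyLoop> &loops,
                                             const ToyLimits &lim) {
  std::size_t innermost_count = 0;
  for (const ToyLoop &l : loops)
    innermost_count += l.innermost ? 1 : 0;

  const bool too_many_loops = loops.size() > lim.max_loops;
  const bool too_many_innermost = innermost_count > lim.max_loops;

  std::vector<std::size_t> selected;
  if (too_many_innermost)
    return selected;  // even the innermost loops exceed the limit: skip all
  for (std::size_t i = 0; i < loops.size(); ++i) {
    if (loops[i].num_nodes > lim.max_bbs_in_loop)
      continue;  // existing style of per-loop size cutoff
    if (too_many_loops && !loops[i].innermost)
      continue;  // over the limit: restrict to innermost loops
    selected.push_back(i);
  }
  return selected;
}
```

The point of the two-level cutoff is that typical functions are unaffected, and only degenerate inputs with a very large number of loops lose some or all of the optimization.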

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: rdapp at gcc dot gnu.org @ 2024-01-22 11:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #27 from Robin Dapp <rdapp at gcc dot gnu.org> ---
Following up on this:

I'm seeing the same thing Patrick does.  We create a lot of large non-sparse
sbitmaps that amount to around 33G in total.

I did local experiments replacing all sbitmaps that are not needed for LCM by
regular bitmaps.  Apart from output differences vs the original version the
testsuite is unchanged.

As expected, wrf now takes longer to compile, 8 mins vs 4ish mins before, and
we still use 2.7G of RAM for this single file (likely because of the remaining
sbitmaps) compared to a max of 1.2ish G that the rest of the compilation uses.

One possibility to get the best of both worlds would be to threshold based on
num_bbs * num_exprs.  Once we exceed it switch to the bitmap pass, otherwise
keep sbitmaps for performance. 

Messaging with Juzhe offline, his best guess for the LICM time is that he
enabled checking for dataflow which slows down this particular compilation by a
lot.  Therefore it doesn't look like a generic problem.
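
The thresholding idea mentioned above (switch representation once num_bbs * num_exprs gets too large) can be sketched as follows. BitsetKind, choose_bitset and the cutoff value are assumptions for illustration; no such helper exists in the pass as far as this thread shows.

```cpp
#include <cstdint>

enum class BitsetKind { Dense, Sparse };

// Picks the dense representation while the total number of bits is at or
// below the cutoff, and the sparse one above it.  `max_dense_bits` is a
// hypothetical tuning knob.
constexpr BitsetKind choose_bitset(std::uint64_t num_bbs,
                                   std::uint64_t num_exprs,
                                   std::uint64_t max_dense_bits) {
  return num_bbs * num_exprs > max_dense_bits ? BitsetKind::Sparse
                                              : BitsetKind::Dense;
}

// With a cutoff of 2^30 bits (128 MiB per bitmap set), the ~35k x ~1M case
// from earlier in the thread would switch to the sparse path.
static_assert(choose_bitset(35000, 1000000, std::uint64_t{1} << 30) ==
                  BitsetKind::Sparse,
              "large functions fall back to the sparse representation");
static_assert(choose_bitset(100, 200, std::uint64_t{1} << 30) ==
                  BitsetKind::Dense,
              "small functions keep the faster dense representation");
```

Small functions would keep the faster dense bit operations, and only the rare huge ones would trade speed for memory.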

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: juzhe.zhong at rivai dot ai @ 2024-01-22 11:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #28 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Robin Dapp from comment #27)
> Following up on this:
> 
> I'm seeing the same thing Patrick does.  We create a lot of large non-sparse
> sbitmaps that amount to around 33G in total.
> 
> I did local experiments replacing all sbitmaps that are not needed for LCM
> by regular bitmaps.  Apart from output differences vs the original version
> the testsuite is unchanged.
> 
> As expected, wrf now takes longer to compile, 8 mins vs 4ish mins before,
> and we still use 2.7G of RAM for this single file (likely because of the
> remaining sbitmaps) compared to a max of 1.2ish G that the rest of the
> compilation uses.
> 
> One possibility to get the best of both worlds would be to threshold based
> on num_bbs * num_exprs.  Once we exceed it switch to the bitmap pass,
> otherwise keep sbitmaps for performance. 
> 
> Messaging with Juzhe offline, his best guess for the LICM time is that he
> enabled checking for dataflow which slows down this particular compilation
> by a lot.  Therefore it doesn't look like a generic problem.

Thanks. I don't think replacing sbitmap is the best solution.
Let me first disable the DF check and reproduce the 33G memory consumption on my
local machine.

I think the best way to reduce the memory consumption is to optimize the
VSETVL PASS algorithm and code. I have an idea for how to do that and
I am going to work on it.

Thanks for reporting.

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
From: rguenth at gcc dot gnu.org @ 2024-01-22 12:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #29 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #26)
> On Fri, 19 Jan 2024, juzhe.zhong at rivai dot ai wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
> > 
> > --- Comment #22 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> > (In reply to Richard Biener from comment #21)
> > > I once tried to avoid df_reorganize_refs and/or optimize this with the
> > > blocks involved but failed.
> > 
> > I am considering whether we should disable LICM for RISC-V by default if vector
> > is enabled?
> > A 10x increase in compile time is really horrible.
> 
> I think that's a bad idea.  It only explodes for some degenerate cases.
> The best would be to fix invariant motion to keep DF up-to-date so
> it can stop using df_analyze_loop and instead analyze the whole function.
> Or maybe change it to use the rtl-ssa framework instead.
> 
> There's already param_loop_invariant_max_bbs_in_loop:
> 
>   /* Process the loops, innermost first.  */
>   for (auto loop : loops_list (cfun, LI_FROM_INNERMOST))
>     {
>       curr_loop = loop;
>       /* move_single_loop_invariants for very large loops is time consuming
>          and might need a lot of memory.  For -O1 only do loop invariant
>          motion for very small loops.  */
>       unsigned max_bbs = param_loop_invariant_max_bbs_in_loop;
>       if (optimize < 2)
>         max_bbs /= 10;
>       if (loop->num_nodes <= max_bbs)
>         move_single_loop_invariants (loop);
>     }
> 
> it might be possible to restrict invariant motion to innermost loops
> when the overall number of loops is too large (with a new param
> for that).  And when the number of innermost loops also exceeds
> the limit avoid even that?  The above also misses a
> optimize_loop_for_speed_p (loop) check (probably doesn't make
> a difference, but you could try).
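
The restriction suggested in the quoted text can be sketched as follows.
This is a simplified model under stated assumptions: Loop is a
hypothetical stand-in for GCC's loop structure, and max_loops plays the
role of the proposed new param, which does not exist in the source.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a loop; GCC's real loop structure is richer.
struct Loop
{
  unsigned num_nodes; // number of basic blocks in the loop
  bool innermost;     // true if the loop contains no other loop
};

// Return the indices of the loops that invariant motion would process.
// max_bbs mirrors param_loop_invariant_max_bbs_in_loop; max_loops is an
// assumed new limit on the number of loops.
std::vector<std::size_t>
select_loops (const std::vector<Loop> &loops, unsigned max_bbs,
              std::size_t max_loops)
{
  std::size_t innermost_count = 0;
  for (const Loop &l : loops)
    if (l.innermost)
      ++innermost_count;

  const bool too_many_loops = loops.size () > max_loops;
  std::vector<std::size_t> selected;

  // Even the innermost loops alone exceed the limit: process nothing.
  if (too_many_loops && innermost_count > max_loops)
    return selected;

  for (std::size_t i = 0; i < loops.size (); ++i)
    {
      // Over the limit: restrict the pass to innermost loops.
      if (too_many_loops && !loops[i].innermost)
        continue;
      // Existing per-loop size check.
      if (loops[i].num_nodes <= max_bbs)
        selected.push_back (i);
    }
  return selected;
}
```

The per-loop size check is kept unchanged, so the new limit only narrows
the set of loops that are considered.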

Ah, sorry - I was mis-matching LICM to invariant motion above, still
invariant motion is the biggest offender (might be due to DF checking
if you enabled that).

As for sbitmap vs. bitmap it's a difficult call.  When there are big
profile hits on individual bit operations (bitmap_bit_p, bitmap_set_bit)
it might pay off to use bitmap but with tree view.  There's also
sparseset but that requires even more memory.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
  2024-01-19  1:22 [Bug c/113495] New: RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark juzhe.zhong at rivai dot ai
                   ` (28 preceding siblings ...)
  2024-01-22 12:00 ` rguenth at gcc dot gnu.org
@ 2024-01-22 13:21 ` juzhe.zhong at rivai dot ai
  2024-01-22 15:04 ` juzhe.zhong at rivai dot ai
                   ` (3 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2024-01-22 13:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #30 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
OK. I believe m_avl_def_in and m_avl_def_out can be removed with a better
algorithm.

Then the memory hog should be fixed soon.

I am going to rewrite avl_vl_unmodified_between_p and trigger full coverage
testing, since it's going to be a big change there.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
  2024-01-19  1:22 [Bug c/113495] New: RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark juzhe.zhong at rivai dot ai
                   ` (29 preceding siblings ...)
  2024-01-22 13:21 ` juzhe.zhong at rivai dot ai
@ 2024-01-22 15:04 ` juzhe.zhong at rivai dot ai
  2024-01-24  0:30 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2024-01-22 15:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #31 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
machine dep reorg                  : 403.69 ( 56%)  23.48 ( 93%) 427.17 ( 57%)  5290k (  0%)

I confirm that with RTL DF checking removed, LICM is no longer a compile-time
hog.

The VSETVL pass accounts for 56% of the compile time.

Even though I can't see the memory hog in the GGC figures of -ftime-report,
I can see 33G of memory usage in htop.

I confirm that both the compile-time hog and the memory hog are VSETVL pass
issues.

I will work on optimizing the compile time as well as the memory usage of the
VSETVL pass.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
  2024-01-19  1:22 [Bug c/113495] New: RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark juzhe.zhong at rivai dot ai
                   ` (30 preceding siblings ...)
  2024-01-22 15:04 ` juzhe.zhong at rivai dot ai
@ 2024-01-24  0:30 ` cvs-commit at gcc dot gnu.org
  2024-01-31  0:29 ` cvs-commit at gcc dot gnu.org
  2024-01-31  1:25 ` juzhe.zhong at rivai dot ai
  33 siblings, 0 replies; 35+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-01-24  0:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #32 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:3132d2d36b4705bb762e61b1c8ca4da7c78a8321

commit r14-8378-g3132d2d36b4705bb762e61b1c8ca4da7c78a8321
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Tue Jan 23 18:12:49 2024 +0800

    RISC-V: Fix large memory usage of VSETVL PASS [PR113495]

    The SPEC 2017 wrf benchmark exposes unreasonable memory usage in the
    VSETVL pass: the pass consumes over 33 GB of memory, which makes it
    impossible to compile SPEC 2017 wrf on a laptop.

    The root cause is these memory-wasting variables:

    unsigned num_exprs = num_bbs * num_regs;
    sbitmap *avl_def_loc = sbitmap_vector_alloc (num_bbs, num_exprs);
    sbitmap *m_kill = sbitmap_vector_alloc (num_bbs, num_exprs);
    m_avl_def_in = sbitmap_vector_alloc (num_bbs, num_exprs);
    m_avl_def_out = sbitmap_vector_alloc (num_bbs, num_exprs);

    I found that what compute_avl_def_data computes can be obtained from the
    RTL_SSA framework.  Replace the implementation with one based on RTL_SSA.

    After this patch, the memory-hog issue is fixed.

    simple vsetvl memory usage (valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out)
    is 1.673 GB.

    lazy vsetvl memory usage (valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out)
    is 2.441 GB.

    Tested on both RV32 and RV64, no regression.

    gcc/ChangeLog:

            PR target/113495
            * config/riscv/riscv-vsetvl.cc (get_expr_id): Remove.
            (get_regno): Ditto.
            (get_bb_index): Ditto.
            (pre_vsetvl::compute_avl_def_data): Ditto.
            (pre_vsetvl::earliest_fuse_vsetvl_info): Fix large memory usage.
            (pre_vsetvl::pre_global_vsetvl_info): Ditto.

    gcc/testsuite/ChangeLog:

            PR target/113495
            * gcc.target/riscv/rvv/vsetvl/avl_single-107.c: Adapt test.
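
The growth behind the allocations listed in the commit message can be
estimated with a short calculation.  This is a rough sketch, not GCC code:
the helper names are hypothetical, and it counts only the bitmap payload,
ignoring sbitmap headers and word alignment.

```cpp
#include <cstdint>

// Payload bytes of one sbitmap vector holding `num_vectors` bitmaps of
// `bits` bits each (headers and word alignment ignored, an assumption).
constexpr std::uint64_t
sbitmap_vector_bytes (std::uint64_t num_vectors, std::uint64_t bits)
{
  return num_vectors * ((bits + 7) / 8);
}

// Total for the four vectors in the commit message: avl_def_loc, m_kill,
// m_avl_def_in and m_avl_def_out, each num_bbs x (num_bbs * num_regs) bits.
constexpr std::uint64_t
avl_def_bytes (std::uint64_t num_bbs, std::uint64_t num_regs)
{
  const std::uint64_t num_exprs = num_bbs * num_regs;
  return 4 * sbitmap_vector_bytes (num_bbs, num_exprs);
}
```

The total grows with the square of num_bbs.  For purely illustrative
inputs (not measured from wrf) of 20000 basic blocks and 100 registers,
avl_def_bytes gives 20,000,000,000 bytes, about 20 GB.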

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
  2024-01-19  1:22 [Bug c/113495] New: RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark juzhe.zhong at rivai dot ai
                   ` (31 preceding siblings ...)
  2024-01-24  0:30 ` cvs-commit at gcc dot gnu.org
@ 2024-01-31  0:29 ` cvs-commit at gcc dot gnu.org
  2024-01-31  1:25 ` juzhe.zhong at rivai dot ai
  33 siblings, 0 replies; 35+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-01-31  0:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

--- Comment #33 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:9dd10de15b183f7b662905e1383fdc3a08755f2e

commit r14-8639-g9dd10de15b183f7b662905e1383fdc3a08755f2e
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Mon Jan 29 19:32:02 2024 +0800

    RISC-V: Fix VSETVL PASS compile-time issue

    The compile time issue was discovered in SPEC 2017 wrf:

    Use time and -ftime-report to analyze the profile data of SPEC 2017 wrf
    compilation.

    Before this patch (Lazy vsetvl):

    scheduling                         : 121.89 ( 15%)   0.53 ( 11%) 122.72 ( 15%)    13M (  1%)
    machine dep reorg                  : 424.61 ( 53%)   1.84 ( 37%) 427.44 ( 53%)  5290k (  0%)
    real    13m27.074s
    user    13m19.539s
    sys     0m5.180s

    Simple vsetvl:

    machine dep reorg                  :   0.10 (  0%)   0.00 (  0%)   0.11 (  0%)  4138k (  0%)
    real    6m5.780s
    user    6m2.396s
    sys     0m2.373s

    The machine dep reorg entry is the compile time of the VSETVL pass
    (424 seconds), which accounts for 53% of the compilation time, much more
    than scheduling.

    After investigation, the critical path of the VSETVL pass is
    compute_lcm_local_properties, which is called in every iteration of
    phase 2 (earliest fusion) and phase 3 (global lcm).

    This patch optimizes the code of compute_lcm_local_properties to reduce
    the compilation time.

    After this patch:

    scheduling                         : 117.51 ( 27%)   0.21 (  6%) 118.04 ( 27%)    13M (  1%)
    machine dep reorg                  :  80.13 ( 18%)   0.91 ( 26%)  81.26 ( 18%)  5290k (  0%)
    real    7m25.374s
    user    7m20.116s
    sys     0m3.795s

    The improvement is clear for the lazy VSETVL pass: 424s (53%) -> 80s (18%),
    which is now less time than scheduling.

    Tested on both RV32 and RV64, no regression.  Ok for trunk?

            PR target/113495

    gcc/ChangeLog:

            * config/riscv/riscv-vsetvl.cc (extract_single_source): Remove.
            (pre_vsetvl::compute_vsetvl_def_data): Fix compile time issue.
            (pre_vsetvl::compute_transparent): New function.
            (pre_vsetvl::compute_lcm_local_properties): Fix compile time issue.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug rtl-optimization/113495] RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark
  2024-01-19  1:22 [Bug c/113495] New: RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark juzhe.zhong at rivai dot ai
                   ` (32 preceding siblings ...)
  2024-01-31  0:29 ` cvs-commit at gcc dot gnu.org
@ 2024-01-31  1:25 ` juzhe.zhong at rivai dot ai
  33 siblings, 0 replies; 35+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2024-01-31  1:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495

JuzheZhong <juzhe.zhong at rivai dot ai> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #34 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Fixed.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2024-01-31  1:25 UTC | newest]

Thread overview: 35+ messages
2024-01-19  1:22 [Bug c/113495] New: RISC-V: Time and memory awful consumption of SPEC2017 wrf benchmark juzhe.zhong at rivai dot ai
2024-01-19  1:36 ` [Bug tree-optimization/113495] " juzhe.zhong at rivai dot ai
2024-01-19  1:38 ` juzhe.zhong at rivai dot ai
2024-01-19  1:48 ` juzhe.zhong at rivai dot ai
2024-01-19  1:52 ` juzhe.zhong at rivai dot ai
2024-01-19  1:55 ` [Bug rtl-optimization/113495] " pinskia at gcc dot gnu.org
2024-01-19  1:56 ` juzhe.zhong at rivai dot ai
2024-01-19  3:08 ` patrick at rivosinc dot com
2024-01-19  3:12 ` pinskia at gcc dot gnu.org
2024-01-19  3:14 ` pinskia at gcc dot gnu.org
2024-01-19  3:33 ` juzhe.zhong at rivai dot ai
2024-01-19  3:34 ` juzhe.zhong at rivai dot ai
2024-01-19  3:44 ` juzhe.zhong at rivai dot ai
2024-01-19  3:46 ` juzhe.zhong at rivai dot ai
2024-01-19  3:52 ` juzhe.zhong at rivai dot ai
2024-01-19  3:56 ` pinskia at gcc dot gnu.org
2024-01-19  3:58 ` juzhe.zhong at rivai dot ai
2024-01-19  4:00 ` juzhe.zhong at rivai dot ai
2024-01-19  8:23 ` juzhe.zhong at rivai dot ai
2024-01-19  8:41 ` juzhe.zhong at rivai dot ai
2024-01-19  9:23 ` rguenth at gcc dot gnu.org
2024-01-19  9:24 ` rguenth at gcc dot gnu.org
2024-01-19  9:28 ` juzhe.zhong at rivai dot ai
2024-01-19  9:35 ` kito at gcc dot gnu.org
2024-01-19 10:03 ` cvs-commit at gcc dot gnu.org
2024-01-19 10:05 ` juzhe.zhong at rivai dot ai
2024-01-19 10:22 ` rguenther at suse dot de
2024-01-22 11:42 ` rdapp at gcc dot gnu.org
2024-01-22 11:51 ` juzhe.zhong at rivai dot ai
2024-01-22 12:00 ` rguenth at gcc dot gnu.org
2024-01-22 13:21 ` juzhe.zhong at rivai dot ai
2024-01-22 15:04 ` juzhe.zhong at rivai dot ai
2024-01-24  0:30 ` cvs-commit at gcc dot gnu.org
2024-01-31  0:29 ` cvs-commit at gcc dot gnu.org
2024-01-31  1:25 ` juzhe.zhong at rivai dot ai
