From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-x12e.google.com (mail-lf1-x12e.google.com [IPv6:2a00:1450:4864:20::12e]) by sourceware.org (Postfix) with ESMTPS id 4B0303858C2D for ; Mon, 20 Nov 2023 13:49:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4B0303858C2D Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 4B0303858C2D Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::12e ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700488148; cv=none; b=ZQl7z9wwmi0tVhuxf1lOtn+dr2ZkHdEt5lKz4HvkEVyjVTpjvcRU8xmRGnv9iOCUzwUp1QTrQUzW7uTXOhIvrTfid0Cn+kNX4PJPmBXnGsI1TrE3gYs0KiewrQWpNiek89dOtE6HfjXLsy3LQTyLo8ipzRFAcMSShYlzsxjSdqg= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700488148; c=relaxed/simple; bh=wLcdOLs2XrYSgVuidiPfcds98dDlLZU/umatsMfC8bE=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=EtS64SyhydeOKXHKbPwhme07wxwu7UJD/X8gmZcxZ1ktgUcAg8hcnR5vf9lYgSKn9z5oTr4BnMu2fjXATMNi3fQXlSyFhF7CFw89t4OiR1l9MxVExWLZxzdnCjRLBcz7GU7Er/AhDmUBCEn5BTmLLexzBMIQZoONAVI/Lz6PLtE= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lf1-x12e.google.com with SMTP id 2adb3069b0e04-507cd62472dso5752359e87.0 for ; Mon, 20 Nov 2023 05:49:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1700488144; x=1701092944; darn=gcc.gnu.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=wrtxZ9/OL/WhsK9PLZac+Gw1bsaawEIzbEL9aUB29mk=; b=IFYZNLmNG8dbYU/iWxhpm38TOoFQctoIiBDQviqssy4a9WiHPEvq8lkJmeubIv4OPS Vp6vJU6wHfZ/S+I+Pp9VKciq38Xt4s116a3pqaCaD7ryvuZdcPDGEnKJFyhLOYctk3Et frlNS/PKCA+r4s0kVK07+NzfxlIs9eaSjOSrBfNg/8I3SBM2SY0TWHxJOnW+dOK5gp2G lmxBuCCNRaIGN0J3SLj7z73oMoSUDTgpTAcW7tXzfs29Ny1vUBaUbTNYIiTUVK0QP1b5 S4bo4OaanQ8vGQEH4PkGGFGu86HavlJrGxEQms7FX9fII6guTcIqGzlYcc1FmEWF1UIh youw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700488144; x=1701092944; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=wrtxZ9/OL/WhsK9PLZac+Gw1bsaawEIzbEL9aUB29mk=; b=RXzPqIC6WhUTU8kD68g5IICZ5fFz7CvOPQi7tDnt6LjQ5RhAc76Qv2g0D5cfkjq5X1 BXs5xZy+YoGi7jkijL/weki3w7qZF851ANrPI9obGNOzdmwUZJhhs8E7qLEsG08iq/rK ss+6IBZ4XKNm//OywT8lZrmtrERxm5TPsa/wg8u/nqY2jkE8SYWJntK591CufRVRFzlp 1NMU+vR8bsX87Ejvod64fjLwgVmpJmoVfrsyvslEmjxDwBST4kL6UQaDADguF1UKdSeP ae8fA38Qpj+aPnkWdIb5pFNvKEELJQy0g0di5op5HkVGNZIk3jaZ7O6aeL3IcbCmnVzC MjMg== X-Gm-Message-State: AOJu0Yy3lEaK7qVgikBrS5oR6s8pJN2c6ghUtxa0hvTY/rZ+PbJX4XUq XmNbVTGXt7ZDah7kQm+XKO2gbK6CnRQ3Rwg+Nwo= X-Google-Smtp-Source: AGHT+IGjHDYZFiooXifiH68EdmBpu1GPSHbS/9neY3C+bHAPanL8yX/sfm5hX13Kq3gU1gDbuSltqMB3pmglBbz6t1c= X-Received: by 2002:ac2:515b:0:b0:503:2879:567 with SMTP id q27-20020ac2515b000000b0050328790567mr2187914lfd.28.1700488144453; Mon, 20 Nov 2023 05:49:04 -0800 (PST) MIME-Version: 1.0 References: <20231120120649.672893-1-maxim.kuvyrkov@linaro.org> <20231120120649.672893-2-maxim.kuvyrkov@linaro.org> <07EBD741-EFBC-4D91-8882-CF64E97F0A33@linaro.org> In-Reply-To: <07EBD741-EFBC-4D91-8882-CF64E97F0A33@linaro.org> From: Richard Biener Date: Mon, 20 Nov 2023 14:45:26 +0100 Message-ID: Subject: Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior To: Maxim Kuvyrkov Cc: GCC Patches , Bernd Schmidt , Vladimir Makarov , Jeff Law Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-7.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, Nov 20, 2023 at 2:42=E2=80=AFPM Maxim Kuvyrkov wrote: > > > On Nov 20, 2023, at 17:09, Richard Biener = wrote: > > > > On Mon, Nov 20, 2023 at 1:08=E2=80=AFPM Maxim Kuvyrkov > > wrote: > >> > >> This patch avoids sched-deps.cc:find_inc() creating exponential number > >> of dependencies, which become memory and compilation time hogs. > >> Consider example (simplified from PR96388) ... > >> =3D=3D=3D > >> sp=3Dsp-4 // sp_insnA > >> mem_insnA1[sp+A1] > >> ... > >> mem_insnAN[sp+AN] > >> sp=3Dsp-4 // sp_insnB > >> mem_insnB1[sp+B1] > >> ... > >> mem_insnBM[sp+BM] > >> =3D=3D=3D > >> ... in this example find_modifiable_mems() will arrange for mem_insnA* > >> to be able to pass sp_insnA, and, while doing this, will create > >> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB > >> is a consumer of sp_insnA. After this sp_insnB will have N new > >> backward dependencies. > >> Then find_modifiable_mems() gets to mem_insnB*s and starts to create > >> N new dependencies for _every_ mem_insnB*. This gets us N*M new > >> dependencies. > >> > >> In PR96833's testcase N and M are 10k-15k, which causes RAM usage of > >> 30GB and compilation time of 30 minutes, with sched2 accounting for > >> 95% of both metrics. After this patch the RAM usage is down to 1GB > >> and compilation time is down to 3-4 minutes, with sched2 no longer > >> standing out on -ftime-report or memory usage. > >> > >> gcc/ChangeLog: > >> > >> PR rtl-optimization/96388 > >> PR rtl-optimization/111554 > >> * sched-deps.cc (find_inc): Avoid exponential behavior. > >> --- > >> gcc/sched-deps.cc | 45 +++++++++++++++++++++++++++++++++++++++++---- > >> 1 file changed, 41 insertions(+), 4 deletions(-) > >> > >> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc > >> index c23218890f3..397bb9fd462 100644 > >> --- a/gcc/sched-deps.cc > >> +++ b/gcc/sched-deps.cc > >> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rt= x_insn *insn, bool before_mem) > >> /* Once a suitable mem reference has been found and the corresponding = data > >> in MII has been filled in, this function is called to find a suitab= le > >> add or inc insn involving the register we found in the memory > >> - reference. */ > >> + reference. > >> + If successful, this function will create additional dependencies b= etween > >> + - mii->inc_insn's producers and mii->mem_insn as a consumer (if ba= ckwards) > >> + - mii->inc_insn's consumers and mii->mem_insn as a producer (if !b= ackwards). > >> +*/ > >> > >> static bool > >> find_inc (struct mem_inc_info *mii, bool backwards) > >> { > >> sd_iterator_def sd_it; > >> dep_t dep; > >> + sd_list_types_def mem_deps =3D backwards ? SD_LIST_HARD_BACK : SD_L= IST_FORW; > >> + int n_mem_deps =3D sd_lists_size (mii->mem_insn, mem_deps); > >> > >> - sd_it =3D sd_iterator_start (mii->mem_insn, > >> - backwards ? SD_LIST_HARD_BACK : SD_LIST_F= ORW); > >> + sd_it =3D sd_iterator_start (mii->mem_insn, mem_deps); > >> while (sd_iterator_cond (&sd_it, &dep)) > >> { > >> dep_node_t node =3D DEP_LINK_NODE (*sd_it.linkp); > >> rtx_insn *pro =3D DEP_PRO (dep); > >> rtx_insn *con =3D DEP_CON (dep); > >> - rtx_insn *inc_cand =3D backwards ? pro : con; > >> + rtx_insn *inc_cand; > >> + int n_inc_deps; > >> + > >> + if (backwards) > >> + { > >> + inc_cand =3D pro; > >> + n_inc_deps =3D sd_lists_size (inc_cand, SD_LIST_BACK); > >> + } > >> + else > >> + { > >> + inc_cand =3D con; > >> + n_inc_deps =3D sd_lists_size (inc_cand, SD_LIST_FORW); > >> + } > >> + > >> + /* In the FOR_EACH_DEP loop below we will create additional n_i= nc_deps > >> + for mem_insn. This by itself is not a problem, since each me= m_insn > >> + will have only a few inc_insns associated with it. However, = if > >> + we consider that a single inc_insn may have a lot of mem_insn= s, AND, > >> + on top of that, a few other inc_insns associated with it -- > >> + those _other inc_insns_ will get (n_mem_deps * number of MEM = insns) > >> + dependencies created for them. This may cause an exponential > >> + growth of memory usage and scheduling time. > >> + See PR96388 for details. > >> + We [heuristically] use n_inc_deps as a proxy for the number o= f MEM > >> + insns, and drop opportunities for breaking modifiable_mem dep= endencies > >> + when dependency lists grow beyond reasonable size. */ > >> + if (n_mem_deps * n_inc_deps > >> + >=3D param_max_pending_list_length * param_max_pending_list_= length) > >> + goto next; > >> + > >> if (DEP_NONREG (dep) || DEP_MULTIPLE (dep)) > > > > it looks like this check is a lot cheaper than computing sd_lists_size = so > > can we keep that first? sd_lists_size might be even more expensive > > than parse_add_or_inc, > > sd_lists_size() is cheap, it doesn't walk or count dependencies; it just = adds 2-3 integers together. > > The reason why sd_lists_size() has a loop is to be able to transparently = handle dependencies split among sub-lists, e.g., sd_lists_size(SD_LIST_HARD= _BACK | SD_LIST_SPEC_BACK) will return total number of backward dependencie= s. I see. It still better to move the DEP_NONREG/DEP_MULTIPLE checks, they do not depend on any of the code above it, no? Richard. > -- > Maxim Kuvyrkov > https://www.linaro.org > >