From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by sourceware.org (Postfix) with ESMTPS id F15F73858D33 for ; Mon, 20 Nov 2023 19:00:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F15F73858D33 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org ARC-Filter: OpenARC Filter v1.0.0 sourceware.org F15F73858D33 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::42b ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700506811; cv=none; b=SYIn65BqStGUFNUT69sCcaiJVi4N1qTpOCc72emh4Rg2L9+4JUMyr+JLonYCSYjtoie5OagmKuwMx0qQMeQOFINahDMyLL6JdqP+B/BRUugmJhgpjp/O61InsGGTY/fPfNmtRUYd4JIOlVqMafXoEb+kQkG96CHU1/gg/Eg5c90= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700506811; c=relaxed/simple; bh=iRTxIehOLIMW3LtIeAsJiZp6o6/si8NYvuobtndDwxA=; h=DKIM-Signature:Mime-Version:Subject:From:Date:Message-Id:To; b=jKIVBsOiXRu895af8xpAZ54EnxoD+Kfm+DTiNckGfoyhyuvEwFFbUSUtlOwFswKXOg+WEAGED9NYRFoZmVdrARr3uOOElPcjRXBNhrvjoV271xaA5SXbwyafE5Aqwh4chtw6+OQoPhqWV+wD8C2SI5ZRv0+yNvETFTDIEPCPqUM= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wr1-x42b.google.com with SMTP id ffacd0b85a97d-3316db2c5c0so709925f8f.1 for ; Mon, 20 Nov 2023 11:00:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1700506807; x=1701111607; darn=gcc.gnu.org; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=iZNbkUT792Qni19eQ50mMCtt+cd+vqPVlgx6QfT3Mhk=; b=MIlVF6nXe5Ty5RMB2p9QZOPcaa9g6HhU8saGIK6Y47lmzy36yvoY+7742uibJPGwYj PbBqZQw44DZye6a74l1HJQzOIFJWGD/yWQLORy0zEE9pHHIafKl839bff0m6notlJco6 +gfaZCKSIkmuNAIQimsqy3vW0XUBFF/4I6n+LGwdL3kO2Nnu6M+0MrbL7GRcqRwv8S2/ yC3snzlOSQb45WuszyeV1gle6jIilaBWBI1JMPBW+1RvdOImsLhEb6RcNFLtLVV6xB+e 1tXf9I9ZqO4mh1mKrojY09330UHOYXgy2lXvR6PSSsYPPcNZ4PvtOQFY2kAlTrhGepyJ gVdg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700506807; x=1701111607; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=iZNbkUT792Qni19eQ50mMCtt+cd+vqPVlgx6QfT3Mhk=; b=h6fs5Jk3oPKQc5JjMbLLiMKyHzm5PdpwcGnqUiUi7sT8h5fEhC3kmWyK5isyU1ZNPe rz0Zbj78Vv5Kq3ZEd4/ZqdsiD7KkyYqbXMIQFOcxgBfLFwiVagj5PugMpSZ9BPrLekMo vbW7ffo5OIWBiRrACXTrEeOGJI1hJmy08g1eTgbBYjevoOqqY+nKcXQV6XMjcHuxBMyg z+iMponIRXeN/y6+v53Sy97j6DzcLXuMmm/B1Qt3DtrdWRgPXMpFIJ1ou1nTChB0KUuP C49/F7rfpwzLbMZbi72Nokd6n/qkZp6NgFtznksoXBLaMorKdiMDBSIPc3tDC1Y7VQIC F7gw== X-Gm-Message-State: AOJu0YzMjjOhnMEambtyiiQ/rSU/mY9QMarApeS1o7KnexQuYBFbrl/5 A1WuaaRKiEzhjQNVTMc7lJZw X-Google-Smtp-Source: AGHT+IEpWusicBn63lwe0mboKluxP+uzcAN2zSPCGbFRNgncJ2rH3HcBm4fZX56xY5aA1950LHPNAA== X-Received: by 2002:a5d:6c69:0:b0:332:c6ae:ea03 with SMTP id r9-20020a5d6c69000000b00332c6aeea03mr4020139wrz.1.1700506807429; Mon, 20 Nov 2023 11:00:07 -0800 (PST) Received: from smtpclient.apple ([194.35.123.130]) by smtp.gmail.com with ESMTPSA id b11-20020a05600010cb00b00332c0d256c5sm7096717wrx.80.2023.11.20.11.00.06 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 20 Nov 2023 11:00:06 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.700.6\)) Subject: Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior From: Maxim Kuvyrkov In-Reply-To: Date: Mon, 20 Nov 2023 22:59:54 +0400 Cc: GCC Patches , Bernd Schmidt , Vladimir Makarov , Jeff Law Content-Transfer-Encoding: quoted-printable Message-Id: References: <20231120120649.672893-1-maxim.kuvyrkov@linaro.org> <20231120120649.672893-2-maxim.kuvyrkov@linaro.org> <07EBD741-EFBC-4D91-8882-CF64E97F0A33@linaro.org> To: Richard Guenther X-Mailer: Apple Mail (2.3731.700.6) X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: > On Nov 20, 2023, at 17:45, Richard Biener = wrote: >=20 > On Mon, Nov 20, 2023 at 2:42=E2=80=AFPM Maxim Kuvyrkov > wrote: >>=20 >>> On Nov 20, 2023, at 17:09, Richard Biener = wrote: >>>=20 >>> On Mon, Nov 20, 2023 at 1:08=E2=80=AFPM Maxim Kuvyrkov >>> wrote: >>>>=20 >>>> This patch avoids sched-deps.cc:find_inc() creating exponential = number >>>> of dependencies, which become memory and compilation time hogs. >>>> Consider example (simplified from PR96388) ... >>>> =3D=3D=3D >>>> sp=3Dsp-4 // sp_insnA >>>> mem_insnA1[sp+A1] >>>> ... >>>> mem_insnAN[sp+AN] >>>> sp=3Dsp-4 // sp_insnB >>>> mem_insnB1[sp+B1] >>>> ... >>>> mem_insnBM[sp+BM] >>>> =3D=3D=3D >>>> ... in this example find_modifiable_mems() will arrange for = mem_insnA* >>>> to be able to pass sp_insnA, and, while doing this, will create >>>> dependencies between all mem_insnA*s and sp_insnB -- because = sp_insnB >>>> is a consumer of sp_insnA. After this sp_insnB will have N new >>>> backward dependencies. >>>> Then find_modifiable_mems() gets to mem_insnB*s and starts to = create >>>> N new dependencies for _every_ mem_insnB*. This gets us N*M new >>>> dependencies. >>>>=20 >>>> In PR96833's testcase N and M are 10k-15k, which causes RAM usage = of >>>> 30GB and compilation time of 30 minutes, with sched2 accounting for >>>> 95% of both metrics. After this patch the RAM usage is down to 1GB >>>> and compilation time is down to 3-4 minutes, with sched2 no longer >>>> standing out on -ftime-report or memory usage. >>>>=20 >>>> gcc/ChangeLog: >>>>=20 >>>> PR rtl-optimization/96388 >>>> PR rtl-optimization/111554 >>>> * sched-deps.cc (find_inc): Avoid exponential behavior. >>>> --- >>>> gcc/sched-deps.cc | 45 = +++++++++++++++++++++++++++++++++++++++++---- >>>> 1 file changed, 41 insertions(+), 4 deletions(-) >>>>=20 >>>> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc >>>> index c23218890f3..397bb9fd462 100644 >>>> --- a/gcc/sched-deps.cc >>>> +++ b/gcc/sched-deps.cc >>>> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, = rtx_insn *insn, bool before_mem) >>>> /* Once a suitable mem reference has been found and the = corresponding data >>>> in MII has been filled in, this function is called to find a = suitable >>>> add or inc insn involving the register we found in the memory >>>> - reference. */ >>>> + reference. >>>> + If successful, this function will create additional = dependencies between >>>> + - mii->inc_insn's producers and mii->mem_insn as a consumer (if = backwards) >>>> + - mii->inc_insn's consumers and mii->mem_insn as a producer (if = !backwards). >>>> +*/ >>>>=20 >>>> static bool >>>> find_inc (struct mem_inc_info *mii, bool backwards) >>>> { >>>> sd_iterator_def sd_it; >>>> dep_t dep; >>>> + sd_list_types_def mem_deps =3D backwards ? SD_LIST_HARD_BACK : = SD_LIST_FORW; >>>> + int n_mem_deps =3D sd_lists_size (mii->mem_insn, mem_deps); >>>>=20 >>>> - sd_it =3D sd_iterator_start (mii->mem_insn, >>>> - backwards ? SD_LIST_HARD_BACK : = SD_LIST_FORW); >>>> + sd_it =3D sd_iterator_start (mii->mem_insn, mem_deps); >>>> while (sd_iterator_cond (&sd_it, &dep)) >>>> { >>>> dep_node_t node =3D DEP_LINK_NODE (*sd_it.linkp); >>>> rtx_insn *pro =3D DEP_PRO (dep); >>>> rtx_insn *con =3D DEP_CON (dep); >>>> - rtx_insn *inc_cand =3D backwards ? pro : con; >>>> + rtx_insn *inc_cand; >>>> + int n_inc_deps; >>>> + >>>> + if (backwards) >>>> + { >>>> + inc_cand =3D pro; >>>> + n_inc_deps =3D sd_lists_size (inc_cand, SD_LIST_BACK); >>>> + } >>>> + else >>>> + { >>>> + inc_cand =3D con; >>>> + n_inc_deps =3D sd_lists_size (inc_cand, SD_LIST_FORW); >>>> + } >>>> + >>>> + /* In the FOR_EACH_DEP loop below we will create additional = n_inc_deps >>>> + for mem_insn. This by itself is not a problem, since each = mem_insn >>>> + will have only a few inc_insns associated with it. = However, if >>>> + we consider that a single inc_insn may have a lot of = mem_insns, AND, >>>> + on top of that, a few other inc_insns associated with it = -- >>>> + those _other inc_insns_ will get (n_mem_deps * number of = MEM insns) >>>> + dependencies created for them. This may cause an = exponential >>>> + growth of memory usage and scheduling time. >>>> + See PR96388 for details. >>>> + We [heuristically] use n_inc_deps as a proxy for the = number of MEM >>>> + insns, and drop opportunities for breaking modifiable_mem = dependencies >>>> + when dependency lists grow beyond reasonable size. */ >>>> + if (n_mem_deps * n_inc_deps >>>> + >=3D param_max_pending_list_length * = param_max_pending_list_length) >>>> + goto next; >>>> + >>>> if (DEP_NONREG (dep) || DEP_MULTIPLE (dep)) >>>=20 >>> it looks like this check is a lot cheaper than computing = sd_lists_size so >>> can we keep that first? sd_lists_size might be even more expensive >>> than parse_add_or_inc, >>=20 >> sd_lists_size() is cheap, it doesn't walk or count dependencies; it = just adds 2-3 integers together. >>=20 >> The reason why sd_lists_size() has a loop is to be able to = transparently handle dependencies split among sub-lists, e.g., = sd_lists_size(SD_LIST_HARD_BACK | SD_LIST_SPEC_BACK) will return total = number of backward dependencies. >=20 > I see. It still better to move the DEP_NONREG/DEP_MULTIPLE checks, > they do not depend on any > of the code above it, no? Yes. Adjusted in v2. Thanks, -- Maxim Kuvyrkov https://www.linaro.org