From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=dNNQ=HB=linaro.org=maxim.kuvyrkov@sourceware.org>
Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b])
	by sourceware.org (Postfix) with ESMTPS id F15F73858D33
	for <gcc-patches@gcc.gnu.org>; Mon, 20 Nov 2023 19:00:08 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F15F73858D33
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org F15F73858D33
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::42b
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700506811; cv=none;
	b=SYIn65BqStGUFNUT69sCcaiJVi4N1qTpOCc72emh4Rg2L9+4JUMyr+JLonYCSYjtoie5OagmKuwMx0qQMeQOFINahDMyLL6JdqP+B/BRUugmJhgpjp/O61InsGGTY/fPfNmtRUYd4JIOlVqMafXoEb+kQkG96CHU1/gg/Eg5c90=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1700506811; c=relaxed/simple;
	bh=iRTxIehOLIMW3LtIeAsJiZp6o6/si8NYvuobtndDwxA=;
	h=DKIM-Signature:Mime-Version:Subject:From:Date:Message-Id:To; b=jKIVBsOiXRu895af8xpAZ54EnxoD+Kfm+DTiNckGfoyhyuvEwFFbUSUtlOwFswKXOg+WEAGED9NYRFoZmVdrARr3uOOElPcjRXBNhrvjoV271xaA5SXbwyafE5Aqwh4chtw6+OQoPhqWV+wD8C2SI5ZRv0+yNvETFTDIEPCPqUM=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: by mail-wr1-x42b.google.com with SMTP id ffacd0b85a97d-3316db2c5c0so709925f8f.1
        for <gcc-patches@gcc.gnu.org>; Mon, 20 Nov 2023 11:00:08 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=linaro.org; s=google; t=1700506807; x=1701111607; darn=gcc.gnu.org;
        h=to:references:message-id:content-transfer-encoding:cc:date
         :in-reply-to:from:subject:mime-version:from:to:cc:subject:date
         :message-id:reply-to;
        bh=iZNbkUT792Qni19eQ50mMCtt+cd+vqPVlgx6QfT3Mhk=;
        b=MIlVF6nXe5Ty5RMB2p9QZOPcaa9g6HhU8saGIK6Y47lmzy36yvoY+7742uibJPGwYj
         PbBqZQw44DZye6a74l1HJQzOIFJWGD/yWQLORy0zEE9pHHIafKl839bff0m6notlJco6
         +gfaZCKSIkmuNAIQimsqy3vW0XUBFF/4I6n+LGwdL3kO2Nnu6M+0MrbL7GRcqRwv8S2/
         yC3snzlOSQb45WuszyeV1gle6jIilaBWBI1JMPBW+1RvdOImsLhEb6RcNFLtLVV6xB+e
         1tXf9I9ZqO4mh1mKrojY09330UHOYXgy2lXvR6PSSsYPPcNZ4PvtOQFY2kAlTrhGepyJ
         gVdg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1700506807; x=1701111607;
        h=to:references:message-id:content-transfer-encoding:cc:date
         :in-reply-to:from:subject:mime-version:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=iZNbkUT792Qni19eQ50mMCtt+cd+vqPVlgx6QfT3Mhk=;
        b=h6fs5Jk3oPKQc5JjMbLLiMKyHzm5PdpwcGnqUiUi7sT8h5fEhC3kmWyK5isyU1ZNPe
         rz0Zbj78Vv5Kq3ZEd4/ZqdsiD7KkyYqbXMIQFOcxgBfLFwiVagj5PugMpSZ9BPrLekMo
         vbW7ffo5OIWBiRrACXTrEeOGJI1hJmy08g1eTgbBYjevoOqqY+nKcXQV6XMjcHuxBMyg
         z+iMponIRXeN/y6+v53Sy97j6DzcLXuMmm/B1Qt3DtrdWRgPXMpFIJ1ou1nTChB0KUuP
         C49/F7rfpwzLbMZbi72Nokd6n/qkZp6NgFtznksoXBLaMorKdiMDBSIPc3tDC1Y7VQIC
         F7gw==
X-Gm-Message-State: AOJu0YzMjjOhnMEambtyiiQ/rSU/mY9QMarApeS1o7KnexQuYBFbrl/5
	A1WuaaRKiEzhjQNVTMc7lJZw
X-Google-Smtp-Source: AGHT+IEpWusicBn63lwe0mboKluxP+uzcAN2zSPCGbFRNgncJ2rH3HcBm4fZX56xY5aA1950LHPNAA==
X-Received: by 2002:a5d:6c69:0:b0:332:c6ae:ea03 with SMTP id r9-20020a5d6c69000000b00332c6aeea03mr4020139wrz.1.1700506807429;
        Mon, 20 Nov 2023 11:00:07 -0800 (PST)
Received: from smtpclient.apple ([194.35.123.130])
        by smtp.gmail.com with ESMTPSA id b11-20020a05600010cb00b00332c0d256c5sm7096717wrx.80.2023.11.20.11.00.06
        (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
        Mon, 20 Nov 2023 11:00:06 -0800 (PST)
Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.700.6\))
Subject: Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid
 exponential behavior
From: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
In-Reply-To: <CAFiYyc3Sa66W0jDqOdcBajMYhGURzphFG8Rqk6JLfKtO7Zj8PQ@mail.gmail.com>
Date: Mon, 20 Nov 2023 22:59:54 +0400
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
 Bernd Schmidt <bernds_cb1@t-online.de>,
 Vladimir Makarov <vmakarov@redhat.com>,
 Jeff Law <jeffreyalaw@gmail.com>
Content-Transfer-Encoding: quoted-printable
Message-Id: <FA240D42-8A4F-4504-B16C-5630DEB2FF0A@linaro.org>
References: <20231120120649.672893-1-maxim.kuvyrkov@linaro.org>
 <20231120120649.672893-2-maxim.kuvyrkov@linaro.org>
 <CAFiYyc04uW9eBGf=t4r29FZspeuMpnSyCPRAht1cyb-9aEYwEA@mail.gmail.com>
 <07EBD741-EFBC-4D91-8882-CF64E97F0A33@linaro.org>
 <CAFiYyc3Sa66W0jDqOdcBajMYhGURzphFG8Rqk6JLfKtO7Zj8PQ@mail.gmail.com>
To: Richard Guenther <richard.guenther@gmail.com>
X-Mailer: Apple Mail (2.3731.700.6)
X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

> On Nov 20, 2023, at 17:45, Richard Biener <richard.guenther@gmail.com> =
wrote:
>=20
> On Mon, Nov 20, 2023 at 2:42=E2=80=AFPM Maxim Kuvyrkov
> <maxim.kuvyrkov@linaro.org> wrote:
>>=20
>>> On Nov 20, 2023, at 17:09, Richard Biener =
<richard.guenther@gmail.com> wrote:
>>>=20
>>> On Mon, Nov 20, 2023 at 1:08=E2=80=AFPM Maxim Kuvyrkov
>>> <maxim.kuvyrkov@linaro.org> wrote:
>>>>=20
>>>> This patch avoids sched-deps.cc:find_inc() creating exponential =
number
>>>> of dependencies, which become memory and compilation time hogs.
>>>> Consider example (simplified from PR96388) ...
>>>> =3D=3D=3D
>>>> sp=3Dsp-4 // sp_insnA
>>>> mem_insnA1[sp+A1]
>>>> ...
>>>> mem_insnAN[sp+AN]
>>>> sp=3Dsp-4 // sp_insnB
>>>> mem_insnB1[sp+B1]
>>>> ...
>>>> mem_insnBM[sp+BM]
>>>> =3D=3D=3D
>>>> ... in this example find_modifiable_mems() will arrange for =
mem_insnA*
>>>> to be able to pass sp_insnA, and, while doing this, will create
>>>> dependencies between all mem_insnA*s and sp_insnB -- because =
sp_insnB
>>>> is a consumer of sp_insnA.  After this sp_insnB will have N new
>>>> backward dependencies.
>>>> Then find_modifiable_mems() gets to mem_insnB*s and starts to =
create
>>>> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
>>>> dependencies.
>>>>=20
>>>> In PR96833's testcase N and M are 10k-15k, which causes RAM usage =
of
>>>> 30GB and compilation time of 30 minutes, with sched2 accounting for
>>>> 95% of both metrics.  After this patch the RAM usage is down to 1GB
>>>> and compilation time is down to 3-4 minutes, with sched2 no longer
>>>> standing out on -ftime-report or memory usage.
>>>>=20
>>>> gcc/ChangeLog:
>>>>=20
>>>>       PR rtl-optimization/96388
>>>>       PR rtl-optimization/111554
>>>>       * sched-deps.cc (find_inc): Avoid exponential behavior.
>>>> ---
>>>> gcc/sched-deps.cc | 45 =
+++++++++++++++++++++++++++++++++++++++++----
>>>> 1 file changed, 41 insertions(+), 4 deletions(-)
>>>>=20
>>>> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
>>>> index c23218890f3..397bb9fd462 100644
>>>> --- a/gcc/sched-deps.cc
>>>> +++ b/gcc/sched-deps.cc
>>>> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, =
rtx_insn *insn, bool before_mem)
>>>> /* Once a suitable mem reference has been found and the =
corresponding data
>>>>   in MII has been filled in, this function is called to find a =
suitable
>>>>   add or inc insn involving the register we found in the memory
>>>> -   reference.  */
>>>> +   reference.
>>>> +   If successful, this function will create additional =
dependencies between
>>>> +   - mii->inc_insn's producers and mii->mem_insn as a consumer (if =
backwards)
>>>> +   - mii->inc_insn's consumers and mii->mem_insn as a producer (if =
!backwards).
>>>> +*/
>>>>=20
>>>> static bool
>>>> find_inc (struct mem_inc_info *mii, bool backwards)
>>>> {
>>>>  sd_iterator_def sd_it;
>>>>  dep_t dep;
>>>> +  sd_list_types_def mem_deps =3D backwards ? SD_LIST_HARD_BACK : =
SD_LIST_FORW;
>>>> +  int n_mem_deps =3D sd_lists_size (mii->mem_insn, mem_deps);
>>>>=20
>>>> -  sd_it =3D sd_iterator_start (mii->mem_insn,
>>>> -                            backwards ? SD_LIST_HARD_BACK : =
SD_LIST_FORW);
>>>> +  sd_it =3D sd_iterator_start (mii->mem_insn, mem_deps);
>>>>  while (sd_iterator_cond (&sd_it, &dep))
>>>>    {
>>>>      dep_node_t node =3D DEP_LINK_NODE (*sd_it.linkp);
>>>>      rtx_insn *pro =3D DEP_PRO (dep);
>>>>      rtx_insn *con =3D DEP_CON (dep);
>>>> -      rtx_insn *inc_cand =3D backwards ? pro : con;
>>>> +      rtx_insn *inc_cand;
>>>> +      int n_inc_deps;
>>>> +
>>>> +      if (backwards)
>>>> +       {
>>>> +         inc_cand =3D pro;
>>>> +         n_inc_deps =3D sd_lists_size (inc_cand, SD_LIST_BACK);
>>>> +       }
>>>> +      else
>>>> +       {
>>>> +         inc_cand =3D con;
>>>> +         n_inc_deps =3D sd_lists_size (inc_cand, SD_LIST_FORW);
>>>> +       }
>>>> +
>>>> +      /* In the FOR_EACH_DEP loop below we will create additional =
n_inc_deps
>>>> +        for mem_insn.  This by itself is not a problem, since each =
mem_insn
>>>> +        will have only a few inc_insns associated with it.  =
However, if
>>>> +        we consider that a single inc_insn may have a lot of =
mem_insns, AND,
>>>> +        on top of that, a few other inc_insns associated with it =
--
>>>> +        those _other inc_insns_ will get (n_mem_deps * number of =
MEM insns)
>>>> +        dependencies created for them.  This may cause an =
exponential
>>>> +        growth of memory usage and scheduling time.
>>>> +        See PR96388 for details.
>>>> +        We [heuristically] use n_inc_deps as a proxy for the =
number of MEM
>>>> +        insns, and drop opportunities for breaking modifiable_mem =
dependencies
>>>> +        when dependency lists grow beyond reasonable size.  */
>>>> +      if (n_mem_deps * n_inc_deps
>>>> +         >=3D param_max_pending_list_length * =
param_max_pending_list_length)
>>>> +       goto next;
>>>> +
>>>>      if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
>>>=20
>>> it looks like this check is a lot cheaper than computing =
sd_lists_size so
>>> can we keep that first?  sd_lists_size might be even more expensive
>>> than parse_add_or_inc,
>>=20
>> sd_lists_size() is cheap, it doesn't walk or count dependencies; it =
just adds 2-3 integers together.
>>=20
>> The reason why sd_lists_size() has a loop is to be able to =
transparently handle dependencies split among sub-lists, e.g., =
sd_lists_size(SD_LIST_HARD_BACK | SD_LIST_SPEC_BACK) will return total =
number of backward dependencies.
>=20
> I see.  It still better to move the DEP_NONREG/DEP_MULTIPLE checks,
> they do not depend on any
> of the code above it, no?

Yes.  Adjusted in v2.

Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org