From: Richard Biener
Date: Mon, 20 Nov 2023 14:09:10 +0100
Subject: Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
To: Maxim Kuvyrkov
Cc: gcc-patches@gcc.gnu.org, Bernd Schmidt, Vladimir Makarov, Jeff Law
On Mon, Nov 20, 2023 at 1:08 PM Maxim Kuvyrkov wrote:
>
> This patch avoids sched-deps.cc:find_inc() creating an exponential number
> of dependencies, which become memory and compilation time hogs.
> Consider this example (simplified from PR96388) ...
> ===
> sp=sp-4       // sp_insnA
> mem_insnA1[sp+A1]
> ...
> mem_insnAN[sp+AN]
> sp=sp-4       // sp_insnB
> mem_insnB1[sp+B1]
> ...
> mem_insnBM[sp+BM]
> ===
> ... in this example find_modifiable_mems() will arrange for mem_insnA*
> to be able to pass sp_insnA, and, while doing this, will create
> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
> is a consumer of sp_insnA.  After this sp_insnB will have N new
> backward dependencies.
> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
> dependencies.
>
> In PR96388's testcase N and M are 10k-15k, which causes RAM usage of
> 30GB and compilation time of 30 minutes, with sched2 accounting for
> 95% of both metrics.  After this patch the RAM usage is down to 1GB
> and compilation time is down to 3-4 minutes, with sched2 no longer
> standing out in -ftime-report or memory usage.
>
> gcc/ChangeLog:
>
>         PR rtl-optimization/96388
>         PR rtl-optimization/111554
>         * sched-deps.cc (find_inc): Avoid exponential behavior.
> ---
>  gcc/sched-deps.cc | 45 +++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 41 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index c23218890f3..397bb9fd462 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn *insn, bool before_mem)
>  /* Once a suitable mem reference has been found and the corresponding data
>     in MII has been filled in, this function is called to find a suitable
>     add or inc insn involving the register we found in the memory
> -   reference.  */
> +   reference.
> +   If successful, this function will create additional dependencies between
> +   - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards)
> +   - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards).
> +*/
>
>  static bool
>  find_inc (struct mem_inc_info *mii, bool backwards)
>  {
>    sd_iterator_def sd_it;
>    dep_t dep;
> +  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
> +  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
>
> -  sd_it = sd_iterator_start (mii->mem_insn,
> -			     backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
> +  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
>    while (sd_iterator_cond (&sd_it, &dep))
>      {
>        dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
>        rtx_insn *pro = DEP_PRO (dep);
>        rtx_insn *con = DEP_CON (dep);
> -      rtx_insn *inc_cand = backwards ? pro : con;
> +      rtx_insn *inc_cand;
> +      int n_inc_deps;
> +
> +      if (backwards)
> +	{
> +	  inc_cand = pro;
> +	  n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
> +	}
> +      else
> +	{
> +	  inc_cand = con;
> +	  n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
> +	}
> +
> +      /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
> +	 for mem_insn.  This by itself is not a problem, since each mem_insn
> +	 will have only a few inc_insns associated with it.  However, if
> +	 we consider that a single inc_insn may have a lot of mem_insns, AND,
> +	 on top of that, a few other inc_insns associated with it --
> +	 those _other inc_insns_ will get (n_mem_deps * number of MEM insns)
> +	 dependencies created for them.  This may cause an exponential
> +	 growth of memory usage and scheduling time.
> +	 See PR96388 for details.
> +	 We [heuristically] use n_inc_deps as a proxy for the number of MEM
> +	 insns, and drop opportunities for breaking modifiable_mem dependencies
> +	 when dependency lists grow beyond reasonable size.  */
> +      if (n_mem_deps * n_inc_deps
> +	  >= param_max_pending_list_length * param_max_pending_list_length)
> +	goto next;
> +
>        if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))

it looks like this check is a lot cheaper than computing sd_lists_size,
so can we keep that first?  sd_lists_size might be even more expensive
than parse_add_or_inc, and in principle you only need to count up to
param_max_pending_list_length * param_max_pending_list_length for
n_mem_deps, and that divided by n_mem_deps when counting n_inc_deps,
but I guess that's premature optimization at that point.

So OK from my side with moving the if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
before the list counting.

Richard.

>  	goto next;
> +
>        if (parse_add_or_inc (mii, inc_cand, backwards))
>  	{
>  	  struct dep_replacement *desc;
> @@ -4838,6 +4873,8 @@ find_inc (struct mem_inc_info *mii, bool backwards)
>  	  desc->insn = mii->mem_insn;
>  	  move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con),
>  			 INSN_SPEC_BACK_DEPS (con));
> +
> +	  gcc_assert (mii->inc_insn == inc_cand);
>  	  if (backwards)
>  	    {
>  	      FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
> --
> 2.34.1
>