From: Richard Biener
Date: Mon, 20 Nov 2023 14:09:10 +0100
Subject: Re: [PATCH 1/1] sched-deps.cc (find_modifiable_mems): Avoid exponential behavior
To: Maxim Kuvyrkov
Cc: gcc-patches@gcc.gnu.org, Bernd Schmidt, Vladimir Makarov, Jeff Law
On Mon, Nov 20, 2023 at 1:08 PM Maxim Kuvyrkov wrote:
>
> This patch avoids sched-deps.cc:find_inc() creating an exponential number
> of dependencies, which become memory and compilation time hogs.
> Consider this example (simplified from PR96388) ...
> ===
> sp=sp-4       // sp_insnA
> mem_insnA1[sp+A1]
> ...
> mem_insnAN[sp+AN]
> sp=sp-4       // sp_insnB
> mem_insnB1[sp+B1]
> ...
> mem_insnBM[sp+BM]
> ===
> ... in this example find_modifiable_mems() will arrange for mem_insnA*
> to be able to pass sp_insnA, and, while doing this, will create
> dependencies between all mem_insnA*s and sp_insnB -- because sp_insnB
> is a consumer of sp_insnA.  After this sp_insnB will have N new
> backward dependencies.
> Then find_modifiable_mems() gets to mem_insnB*s and starts to create
> N new dependencies for _every_ mem_insnB*.  This gets us N*M new
> dependencies.
>
> In PR96388's testcase N and M are 10k-15k, which causes RAM usage of
> 30GB and compilation time of 30 minutes, with sched2 accounting for
> 95% of both metrics.  After this patch the RAM usage is down to 1GB
> and compilation time is down to 3-4 minutes, with sched2 no longer
> standing out in -ftime-report or memory usage.
>
> gcc/ChangeLog:
>
>         PR rtl-optimization/96388
>         PR rtl-optimization/111554
>         * sched-deps.cc (find_inc): Avoid exponential behavior.
> ---
>  gcc/sched-deps.cc | 45 +++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 41 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index c23218890f3..397bb9fd462 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -4779,24 +4779,59 @@ parse_add_or_inc (struct mem_inc_info *mii, rtx_insn *insn, bool before_mem)
>  /* Once a suitable mem reference has been found and the corresponding data
>     in MII has been filled in, this function is called to find a suitable
>     add or inc insn involving the register we found in the memory
> -   reference.  */
> +   reference.
> +   If successful, this function will create additional dependencies between
> +   - mii->inc_insn's producers and mii->mem_insn as a consumer (if backwards)
> +   - mii->inc_insn's consumers and mii->mem_insn as a producer (if !backwards).
> +*/
>
>  static bool
>  find_inc (struct mem_inc_info *mii, bool backwards)
>  {
>    sd_iterator_def sd_it;
>    dep_t dep;
> +  sd_list_types_def mem_deps = backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW;
> +  int n_mem_deps = sd_lists_size (mii->mem_insn, mem_deps);
>
> -  sd_it = sd_iterator_start (mii->mem_insn,
> -			     backwards ? SD_LIST_HARD_BACK : SD_LIST_FORW);
> +  sd_it = sd_iterator_start (mii->mem_insn, mem_deps);
>    while (sd_iterator_cond (&sd_it, &dep))
>      {
>        dep_node_t node = DEP_LINK_NODE (*sd_it.linkp);
>        rtx_insn *pro = DEP_PRO (dep);
>        rtx_insn *con = DEP_CON (dep);
> -      rtx_insn *inc_cand = backwards ? pro : con;
> +      rtx_insn *inc_cand;
> +      int n_inc_deps;
> +
> +      if (backwards)
> +	{
> +	  inc_cand = pro;
> +	  n_inc_deps = sd_lists_size (inc_cand, SD_LIST_BACK);
> +	}
> +      else
> +	{
> +	  inc_cand = con;
> +	  n_inc_deps = sd_lists_size (inc_cand, SD_LIST_FORW);
> +	}
> +
> +      /* In the FOR_EACH_DEP loop below we will create additional n_inc_deps
> +	 for mem_insn.  This by itself is not a problem, since each mem_insn
> +	 will have only a few inc_insns associated with it.  However, if
> +	 we consider that a single inc_insn may have a lot of mem_insns, AND,
> +	 on top of that, a few other inc_insns associated with it --
> +	 those _other inc_insns_ will get (n_mem_deps * number of MEM insns)
> +	 dependencies created for them.  This may cause an exponential
> +	 growth of memory usage and scheduling time.
> +	 See PR96388 for details.
> +	 We [heuristically] use n_inc_deps as a proxy for the number of MEM
> +	 insns, and drop opportunities for breaking modifiable_mem dependencies
> +	 when dependency lists grow beyond reasonable size.  */
> +      if (n_mem_deps * n_inc_deps
> +	  >= param_max_pending_list_length * param_max_pending_list_length)
> +	goto next;
> +
>        if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))

it looks like this check is a lot cheaper than computing sd_lists_size,
so can we keep that first?  sd_lists_size might be even more expensive
than parse_add_or_inc, and in principle you only need to count up to
param_max_pending_list_length * param_max_pending_list_length for
n_mem_deps, and that divided by n_mem_deps when counting n_inc_deps,
but I guess that's premature optimization at that point.

So OK from my side with moving the if (DEP_NONREG (dep) || DEP_MULTIPLE (dep))
before the list counting.

Richard.

>  	goto next;
> +
>        if (parse_add_or_inc (mii, inc_cand, backwards))
>  	{
>  	  struct dep_replacement *desc;
> @@ -4838,6 +4873,8 @@ find_inc (struct mem_inc_info *mii, bool backwards)
>  	  desc->insn = mii->mem_insn;
>  	  move_dep_link (DEP_NODE_BACK (node), INSN_HARD_BACK_DEPS (con),
>  			 INSN_SPEC_BACK_DEPS (con));
> +
> +	  gcc_assert (mii->inc_insn == inc_cand);
>  	  if (backwards)
>  	    {
>  	      FOR_EACH_DEP (mii->inc_insn, SD_LIST_BACK, sd_it, dep)
> --
> 2.34.1
>