From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.guenther@gmail.com>
Received: from mail-ej1-x633.google.com (mail-ej1-x633.google.com
 [IPv6:2a00:1450:4864:20::633])
 by sourceware.org (Postfix) with ESMTPS id AE2523892475
 for <gcc@gcc.gnu.org>; Fri,  2 Jul 2021 08:07:12 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AE2523892475
Received: by mail-ej1-x633.google.com with SMTP id v20so14776651eji.10
 for <gcc@gcc.gnu.org>; Fri, 02 Jul 2021 01:07:12 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=tHig1hn2plb8pLWmkRAvzrw3DfoM6S8DUHJR+vz+8Xg=;
 b=KN40l0uoTgs/kbH+x+sbqmVMd5cqnWRYOlVdmaSGehQQst2FSYOwSw7CoU6wLHv4hZ
 SI6oYRUUyCXcQc1BNgBDhbXEPtyOrk+sRfQGPNJm9UM8zQTsjckLNXNe4KDXuaESQSGk
 Y8Mxm9uVXChN+OdFfWy/NBf5NNcQZPCDF/U2YkIRFcWIcB+wi2QPc7WJ+u962cInxoWA
 iuSICWkBZ7RgOyxkZFJek82DaHKLAGrjl5V2GKnHZhTAaLuvV+NRkK9PptRqbjkohzt6
 HkwAjLJ7NkT++67+17ZkpPfbMnSKwMyuwJVV07XL8ZrPP6/RKPRy5beA5PkpYfx1ZpkM
 M+cA==
X-Gm-Message-State: AOAM532PEevbswpHLV3PfKObwi5aVelkGIU5VZOF/2bswgcQlSfD6L8S
 BjD0Kk7S5bu8f+ixJw5wr3vuDWhJhlQOCUMqaQs=
X-Google-Smtp-Source: ABdhPJwFOdog8rziDjn9i1J6hyl6Ea4dMmOp3yiWNheZsgEe3EI+duxVL5LwEncioFxehZCBnRFRe5F38XYz7OWUxm0=
X-Received: by 2002:a17:907:a064:: with SMTP id
 ia4mr4158085ejc.482.1625213231721; 
 Fri, 02 Jul 2021 01:07:11 -0700 (PDT)
MIME-Version: 1.0
References: <1338ef7b-57f4-a376-5827-c85392ed53a8@linux.ibm.com>
In-Reply-To: <1338ef7b-57f4-a376-5827-c85392ed53a8@linux.ibm.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Fri, 2 Jul 2021 10:07:01 +0200
Message-ID: <CAFiYyc15i7ErH6K+Cptq4Z+23r3iqLW6pGstQvZLix6KnjWi5g@mail.gmail.com>
Subject: Re: Question on tree LIM
To: "Kewen.Lin" <linkw@linux.ibm.com>, 
 "Andre Vieira (lists)" <Andre.SimoesDiasVieira@arm.com>
Cc: GCC Development <gcc@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
X-Spam-Status: No, score=-2.9 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc mailing list <gcc.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <mailto:gcc-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Jul 2021 08:07:14 -0000

On Fri, Jul 2, 2021 at 5:34 AM Kewen.Lin via Gcc <gcc@gcc.gnu.org> wrote:
>
> Hi,
>
> I am investigating a degradation in SPEC2017 exchange2_r:
> with loop vectorization enabled at -O2, it degrades by 6%.  After some
> isolation, I found it isn't directly caused by vectorization itself
> but is exposed by it: some statements for the vectorization
> condition checks are hoisted out and increase the register
> pressure, which finally results in more spilling than before.  If I
> simply disable tree lim4, the gap becomes smaller (just 40%+ of
> the original); if I further disable rtl lim, it drops to 30% of
> the original.  This seems to indicate there is some room for
> improvement in both LIMs.
>
> From a quick scan of tree LIM, I noticed that there seems to be no
> consideration of register pressure.  Is that intentional?  I am
> wondering what the design philosophy behind it is.  Is it because
> it's hard to model register pressure well here?  If so, it seems to
> put the burden onto the later RA, which needs to have good
> rematerialization support.

Yes, it is "intentional" in that doing any kind of prioritization based
on register pressure is hard at the GIMPLE level, since most
high-level transforms try to expose follow-up transforms, which you'd
somehow have to anticipate.  Note that LIM's "cost model" (if you can
call it such...) is too simplistic to be a good basis for deciding which
10 of the 20 candidates you want to move (and I've repeatedly pondered
removing it completely).

As to putting the burden on RA - yes, that's one possibility.  The other
possibility is to use the register-pressure aware scheduler, though I'm
not sure if that will ever move things into loop bodies.
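
To make the trade-off concrete, here is a minimal C sketch (not GCC
code; the function names and the loop are made up for illustration, and
only the "10 - rnext" and "& ~3" arithmetic is borrowed from the dump
quoted below).  The first form is what LIM produces: the invariant is
computed once but occupies a register across the whole loop.  The
second is what rematerialization, or a scheduler sinking the statement
back into the loop, would effectively give you: a cheap recomputation
in exchange for a shorter live range.

```c
#include <stddef.h>

/* Hoisted form: `rounded` is computed once before the loop and stays
   live across every iteration, so it competes for a register with
   everything else in the loop body. */
long sum_hoisted(const long *a, size_t n, unsigned long rnext)
{
    unsigned long niters = 10 - rnext;       /* loop invariant */
    unsigned long rounded = niters & ~3UL;   /* loop invariant, long live range */
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i] + (long)rounded;
    return s;
}

/* Rematerialized form: the cheap invariant is recomputed at its use,
   so only `rnext` has to stay live across the loop. */
long sum_remat(const long *a, size_t n, unsigned long rnext)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned long rounded = (10 - rnext) & ~3UL;  /* recomputed each iteration */
        s += a[i] + (long)rounded;
    }
    return s;
}
```

Both functions compute the same value; which one is faster depends on
whether the extra live range causes a spill, which is exactly the
information that is not available at the GIMPLE level.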

> btw, the example loop is at line 1150 from src exchange2.fppized.f90
>
>    1150 block(rnext:9, 7, i7) = block(rnext:9, 7, i7) + 10
>
> The extra hoisted statements after the vectorization on this loop
> (cheap cost model btw) are:
>
>     _686 = (integer(kind=8)) rnext_679;
>     _1111 = (sizetype) _19;
>     _1112 = _1111 * 12;
>     _1927 = _1112 + 12;
>   * _1895 = _1927 - _2650;
>     _1113 = (unsigned long) rnext_679;
>   * niters.6220_1128 = 10 - _1113;
>   * _1021 = 9 - _1113;
>   * bnd.6221_940 = niters.6220_1128 >> 2;
>   * niters_vector_mult_vf.6222_939 = niters.6220_1128 & 18446744073709551612;
>     _144 = niters_vector_mult_vf.6222_939 + _1113;
>     tmp.6223_934 = (integer(kind=8)) _144;
>     S.823_1004 = _1021 <= 2 ? _686 : tmp.6223_934;
>   * ivtmp.6410_289 = (unsigned long) S.823_1004;
>
> PS: * marks a statement whose result has a long live interval.

Note that for the vectorizer-generated conditions there's quite some
room for improvement in reducing the amount of semi-redundant
computations.  I've pointed out some to Andre, in particular suggesting
to maintain a single "remaining scalar iterations" IV across all the
checks to avoid keeping 'niters' live and doing all the above masking &
shifting repeatedly before the prologue/main/vectorized epilogue/epilogue
loops.  Not sure how far he got with that idea.
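
A rough C sketch of the difference, assuming a vectorization factor of
4 and modelling the vector body as an inner scalar loop (the function
names and VF constant are hypothetical; this is not what the vectorizer
emits, only the shape of the bookkeeping):

```c
#include <stddef.h>

enum { VF = 4 };  /* assumed vectorization factor */

/* Current shape: several values derived from niters (bnd,
   niters_vector_mult_vf) are computed up front and kept live. */
void add10_separate(int *a, size_t niters)
{
    size_t bnd = niters >> 2;                                   /* vector iterations */
    size_t niters_vector_mult_vf = niters & ~(size_t)(VF - 1);  /* niters rounded down to VF */

    for (size_t v = 0; v < bnd; v++)          /* "vector" main loop */
        for (size_t k = 0; k < VF; k++)
            a[v * VF + k] += 10;

    for (size_t i = niters_vector_mult_vf; i < niters; i++)     /* scalar epilogue */
        a[i] += 10;
}

/* Suggested shape: one "remaining scalar iterations" IV is threaded
   through all loops and checks, so no masked or shifted copies of
   niters have to stay live. */
void add10_remaining(int *a, size_t niters)
{
    size_t remaining = niters;

    while (remaining >= VF) {                 /* "vector" main loop */
        for (size_t k = 0; k < VF; k++)
            a[k] += 10;
        a += VF;
        remaining -= VF;
    }

    while (remaining > 0) {                   /* scalar epilogue */
        *a++ += 10;
        remaining--;
    }
}
```

In the second variant each loop's entry check is just a compare of
'remaining' against a constant, which is the property I'd like the
generated checks to have.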

Richard.

>
> BR,
> Kewen