From: Richard Biener
Date: Fri, 2 Jul 2021 10:07:01 +0200
Subject: Re: Question on tree LIM
To: "Kewen.Lin", "Andre Vieira (lists)"
Cc: GCC Development
In-Reply-To: <1338ef7b-57f4-a376-5827-c85392ed53a8@linux.ibm.com>

On Fri, Jul 2, 2021 at 5:34 AM Kewen.Lin via Gcc wrote:
>
> Hi,
>
> I am investigating one degradation related to SPEC2017 exchange2_r,
> with loop vectorization on at -O2, it degraded by 6%. By some
> isolation, I found it isn't directly caused by vectorization itself,
> but exposed by vectorization, some stuffs for vectorization
> condition checks are hoisted out and they increase the register
> pressure, finally results in more spillings than before. If I simply
> disable tree lim4, I can see the gap becomes smaller (just 40%+ of
> the original), if further disable rtl lim, it just becomes to 30% of
> the original. It seems to indicate there is some room to improve in
> both LIMs.
>
> By quick scanning in tree LIM, I noticed that there seems no any
> considerations on register pressure, it looked intentional? I am
> wondering what's the design philosophy behind it? Is it because that
> it's hard to model register pressure well here? If so, it seems to
> put the burden onto late RA, which needs to have a good
> rematerialization support.

Yes, it is "intentional" in that doing any kind of prioritization
based on register pressure is hard on the GIMPLE level, since most
high-level transforms try to expose followup transforms which you'd
somehow have to anticipate. Note that LIM's "cost model" (if you can
call it such...) is too simplistic to be a good base for deciding
which 10 of the 20 candidates you want to move (and I've repeatedly
pondered removing it completely).

As to putting the burden on RA - yes, that's one possibility. The
other possibility is to use the register-pressure aware scheduler,
though I'm not sure if that will ever move things into loop bodies.

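To make the trade-off above concrete, here is a small source-level C sketch.
It is only an illustration under stated assumptions: the function names and
the particular invariant expressions are hypothetical, not taken from
exchange2 or from GCC, and an optimizing compiler is free to transform
either form into the other. The first function keeps two loop-invariant
values live across the whole loop (what hoisting produces); the second
recomputes them in the body (roughly what rematerialization would do),
trading extra arithmetic for shorter live ranges.

```c
#include <stddef.h>
#include <stdint.h>

/* Hoisted form: 'scale' and 'bias' are computed once before the loop and
   each occupies a register (or a spill slot) for the loop's duration. */
static uint64_t sum_hoisted(const uint64_t *a, size_t n,
                            uint64_t x, uint64_t y)
{
    uint64_t scale = x * 12 + 12;  /* loop-invariant */
    uint64_t bias = y >> 2;        /* loop-invariant */
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * scale + bias;
    return s;
}

/* Recomputed form: the invariants are rebuilt from x and y in every
   iteration, so only x and y need to stay live across the loop. */
static uint64_t sum_recomputed(const uint64_t *a, size_t n,
                               uint64_t x, uint64_t y)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t scale = x * 12 + 12;
        uint64_t bias = y >> 2;
        s += a[i] * scale + bias;
    }
    return s;
}
```

Both functions return the same value for the same inputs; which one is
faster depends on how many other values compete for registers in the loop.
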
> btw, the example loop is at line 1150 from src exchange2.fppized.f90
>
> 1150 block(rnext:9, 7, i7) = block(rnext:9, 7, i7) + 10
>
> The extra hoisted statements after the vectorization on this loop
> (cheap cost model btw) are:
>
>     _686 = (integer(kind=8)) rnext_679;
>     _1111 = (sizetype) _19;
>     _1112 = _1111 * 12;
>     _1927 = _1112 + 12;
>   * _1895 = _1927 - _2650;
>     _1113 = (unsigned long) rnext_679;
>   * niters.6220_1128 = 10 - _1113;
>   * _1021 = 9 - _1113;
>   * bnd.6221_940 = niters.6220_1128 >> 2;
>   * niters_vector_mult_vf.6222_939 = niters.6220_1128 & 18446744073709551612;
>     _144 = niters_vector_mult_vf.6222_939 + _1113;
>     tmp.6223_934 = (integer(kind=8)) _144;
>     S.823_1004 = _1021 <= 2 ? _686 : tmp.6223_934;
>   * ivtmp.6410_289 = (unsigned long) S.823_1004;
>
> PS: * indicates the one has a long live interval.

Note for the vectorizer generated conditions there's quite some room
for improvements to reduce the amount of semi-redundant computations.
I've pointed out some to Andre, in particular suggesting to maintain
a single "remaining scalar iterations" IV across all the checks to
avoid keeping 'niters' live and doing all the above masking & shifting
repeatedly before the prologue/main/vectorized epilogue/epilogue loops.
Not sure how far he got with that idea.

Richard.

>
> BR,
> Kewen
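
The iteration-count arithmetic in the quoted dump, and the "remaining
scalar iterations" idea from the reply, can be sketched in C as follows.
This is a hedged illustration, not GCC's implementation: the vectorization
factor of 4 is inferred from the `>> 2` and the `& 18446744073709551612`
mask in the dump, and `plan_t`, `plan_from_niters` and
`plan_from_remaining` are hypothetical names introduced here.

```c
#include <stdint.h>

enum { VF = 4 };  /* assumed vectorization factor */

typedef struct {
    uint64_t vector_iters;    /* iterations of the vector loop */
    uint64_t epilogue_start;  /* first index left to the scalar epilogue */
} plan_t;

/* Mirrors the hoisted statements: several values are derived from
   'niters' up front and all of them stay live until they are used. */
static plan_t plan_from_niters(uint64_t rnext)
{
    uint64_t niters = 10 - rnext;                        /* niters.6220 */
    uint64_t niters_m1 = 9 - rnext;                      /* _1021 */
    uint64_t bnd = niters >> 2;                          /* bnd.6221 */
    uint64_t vec_niters = niters & ~(uint64_t)(VF - 1);  /* niters_vector_mult_vf */
    plan_t p;
    p.vector_iters = bnd;
    /* S.823: skip the vector loop when fewer than VF iterations exist. */
    p.epilogue_start = niters_m1 <= 2 ? rnext : vec_niters + rnext;
    return p;
}

/* Alternative: carry one 'remaining' counter through the checks and
   decrement it as iterations are consumed, instead of keeping the
   shifted and masked copies of 'niters' around. */
static plan_t plan_from_remaining(uint64_t rnext)
{
    uint64_t remaining = 10 - rnext;
    plan_t p = { 0, rnext };
    while (remaining >= VF) {
        p.vector_iters++;
        p.epilogue_start += VF;
        remaining -= VF;
    }
    return p;
}
```

For the loop bounds in the example (rnext between 1 and 9) both functions
produce the same plan; the second needs only the one counter to be live
across the checks.
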