From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-507961-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 12194 invoked by alias); 29 Aug 2019 16:29:43 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 12175 invoked by uid 89); 29 Aug 2019 16:29:43 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-7.5 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.1 spammy=H*f:sk:D46C8D0, H*i:sk:D46C8D0
X-HELO: mail-wr1-f45.google.com
Received: from mail-wr1-f45.google.com (HELO mail-wr1-f45.google.com) (209.85.221.45) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 29 Aug 2019 16:29:41 +0000
Received: by mail-wr1-f45.google.com with SMTP id q12so4047369wrj.12        for <gcc-patches@gcc.gnu.org>; Thu, 29 Aug 2019 09:29:41 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=gmail.com; s=20161025;        h=date:user-agent:in-reply-to:references:mime-version         :content-transfer-encoding:subject:to:cc:from:message-id;        bh=N1O+6GVqiyvGWl8ZNPBgZwYH/IfyHNwaSPZhrapRVzY=;        b=L6ya765Xh8830U7N5mOyuhTvecqYB43Kp9KhqQZ5mkY+zo2bVa/XNafBsZirUG9qEw         a9CZY/ad7ChZW1RkLIW+68rYulJBXRRZxoYQDuwT6jAKChnWpMN8guhMwV1UIKwPZTQN         KDU6BJI+WHjCWgcmGAKRrJWfQVjn/JlaS+isQVP6uMCJD6wrGDiNxwKKAJ32e8Bfflgg         r6Uj4B8YJApD23utLnI4LgU6oMAcbBArjhI/z7xvrFGEfgryd8YlJWKZsJ4q7VGdGNSp         rKvUPDTeotyegftIiZIpV4H946aHNyl8uW5ir9esn4SSbaiEP/pmcKeaeIoynu+9Vc3j         btUQ==
Return-Path: <richard.guenther@gmail.com>
Received: from [192.168.178.32] (x4d04144c.dyn.telefonica.de. [77.4.20.76])        by smtp.gmail.com with ESMTPSA id 39sm8594401wrc.45.2019.08.29.09.29.37        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);        Thu, 29 Aug 2019 09:29:38 -0700 (PDT)
Date: Thu, 29 Aug 2019 17:34:00 -0000
User-Agent: K-9 Mail for Android
In-Reply-To: <D46C8D08-685F-41A7-8695-23BB65B74A87@linaro.org>
References: <D46C8D08-685F-41A7-8695-23BB65B74A87@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c
To: gcc-patches@gcc.gnu.org,Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>,GCC Patches <gcc-patches@gcc.gnu.org>
CC: Alexander Monakov <amonakov@ispras.ru>,Wilco Dijkstra <Wilco.Dijkstra@arm.com>
From: Richard Biener <richard.guenther@gmail.com>
Message-ID: <09F25146-8361-4FB0-AE6B-E13BF8CF332F@gmail.com>
X-IsSubscribed: yes
X-SW-Source: 2019-08/txt/msg02006.txt.bz2

On August 29, 2019 5:40:47 PM GMT+02:00, Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org> wrote:
>Hi,
>
>This patch tweaks autoprefetcher heuristic in scheduler to better group
>memory loads and stores together.
>
>From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598:
>
>There are two separate changes, both related to instruction scheduler,
>that cause the regression.  The first change in r253235 is responsible
>for 70% of the regression.
>===
>    haifa-sched: fix autopref_rank_for_schedule qsort comparator
>
> * haifa-sched.c (autopref_rank_for_schedule): Order 'irrelevant' insns
>            first, always call autopref_rank_data otherwise.
>
>git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253235
>138bc75d-0d04-0410-961f-82ee72b054a4
>===
>
>After this change instead of
>r1 = [rb + 0]
>r2 = [rb + 8]
>r3 = [rb + 16]
>r4 = <math with r1>
>r5 = <math with r2>
>r6 = <math with r3>
>
>we get
>r1 = [rb + 0]
><math with r1>
>r2 = [rb + 8]
><math with r2>
>r3 = [rb + 16]
><math with r3>
>
>which, apparently, cortex-a53 autoprefetcher doesn't recognize.  This
>schedule happens because r2= load gets lower priority than the
>"irrelevant" <math with r1> due to the above patch.
>
>If we think about it, the fact that "r1 = [rb + 0]" can be scheduled
>means that true dependencies of all similar base+offset loads are
>resolved.  Therefore, for autoprefetcher-friendly schedule we should
>prioritize memory reads before "irrelevant" instructions.

But isn't there also a maximum number of loads that can issue per fetch
window to consider?  If so, interleaving arithmetic with loads might be
profitable.

>On the other hand, following similar logic, we want to delay memory
>stores as much as possible to start scheduling them only after all
>potential producers are scheduled.  I.e., for autoprefetcher-friendly
>schedule we should prioritize "irrelevant" instructions before memory
>writes.
>
>Obvious patch to implement the above is attached.  It brings 70% of
>regressed performance on this testcase back.
>
>OK to commit?
>
>Regards,
>
>--
>Maxim Kuvyrkov
>www.linaro.org
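
For illustration, the ordering proposed in the quoted message (memory
reads first, then "irrelevant" insns, then memory writes) can be
sketched as a toy qsort comparator.  This is only a sketch under stated
assumptions, not the attached patch and not the code in haifa-sched.c:
the names toy_insn, insn_kind, kind_rank, toy_autopref_cmp and
toy_schedule are hypothetical, the array is assumed to hold only insns
whose dependencies are already resolved, and an ascending sort is
assumed to mean "issue earlier".  It also ignores any per-window limit
on load issue.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical instruction classes; these are not GCC data structures.  */
enum insn_kind { INSN_LOAD, INSN_IRRELEVANT, INSN_STORE };

struct toy_insn
{
  enum insn_kind kind;
  int offset;  /* base+offset for memory ops; ignored for other insns */
  int id;      /* original position, used only as a final tie-break */
};

/* Assumed ranking: loads first, "irrelevant" insns next, stores last.  */
static int
kind_rank (enum insn_kind k)
{
  switch (k)
    {
    case INSN_LOAD:
      return 0;
    case INSN_IRRELEVANT:
      return 1;
    default:
      return 2;
    }
}

/* qsort comparator: order by class, then by offset for memory ops,
   then by original position so the result is deterministic.  */
static int
toy_autopref_cmp (const void *pa, const void *pb)
{
  const struct toy_insn *a = pa;
  const struct toy_insn *b = pb;
  int ra = kind_rank (a->kind);
  int rb = kind_rank (b->kind);

  if (ra != rb)
    return ra - rb;
  if (a->kind != INSN_IRRELEVANT && a->offset != b->offset)
    return (a->offset > b->offset) - (a->offset < b->offset);
  return (a->id > b->id) - (a->id < b->id);
}

/* Sort a set of ready insns into the assumed issue order.  */
void
toy_schedule (struct toy_insn *insns, size_t n)
{
  assert (insns != NULL || n == 0);
  if (n > 1)
    qsort (insns, n, sizeof insns[0], toy_autopref_cmp);
}
```

With three ready loads at offsets 0, 8 and 16 mixed with arithmetic,
this groups the loads ahead of the arithmetic, as in the first schedule
quoted above.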