Date: Thu, 29 Aug 2019 18:18:00 -0000
From: Alexander Monakov
To: Maxim Kuvyrkov
Cc: Richard Guenther, gcc-patches@gcc.gnu.org, Wilco Dijkstra
Subject: Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c

On Thu, 29 Aug 2019, Maxim Kuvyrkov wrote:

> >> r1 = [rb + 0]
> >>
> >> r2 = [rb + 8]
> >>
> >> r3 = [rb + 16]
> >>
> >> which, apparently, the cortex-a53 autoprefetcher doesn't recognize.
> >> This schedule happens because the r2= load gets lower priority than
> >> the "irrelevant" instructions due to the above patch.
> >>
> >> If we think about it, the fact that "r1 = [rb + 0]" can be scheduled
> >> means that the true dependencies of all similar base+offset loads
> >> are resolved. Therefore, for an autoprefetcher-friendly schedule we
> >> should prioritize memory reads over "irrelevant" instructions.
> > But isn't there also a max number of load issues in a fetch window
> > to consider? So interleaving arithmetic with loads might be
> > profitable.
>
> It appears that cores with autoprefetcher hardware prefer loads and
> stores bundled together, not interspersed with other instructions
> occupying the rest of the CPU units.

Let me point out that the motivating example has a bigger effect in play:

  (1) r1 = [rb + 0]
  (2)
  (3) r2 = [rb + 8]
  (4)
  (5) r3 = [rb + 16]
  (6)

Here Cortex-A53, being an in-order core, cannot issue the load at (3)
until the load at (1) has completed, because the use at (2) depends on
it. The good schedule allows the three loads to issue in a pipelined
fashion.

So essentially the main issue is not a hardware peculiarity, but rather
that the bad schedule is simply wrong: it could only make sense if loads
had 1-cycle latency, which they do not.

I think this highlights that implementing this autoprefetch heuristic
via the dfa_lookahead_guard interface looks questionable in the first
place, but the patch itself makes sense to me.

Alexander
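To make the latency argument above concrete, here is a toy cycle-count
sketch of a single-issue in-order pipeline. The latencies are made up
for illustration (3 cycles per load, 1 per ALU op; real Cortex-A53
numbers differ), and the register names and helper are hypothetical,
not taken from the patch:

```python
# Toy single-issue, in-order pipeline model. Latencies are assumptions
# (load = 3 cycles, ALU = 1 cycle); real Cortex-A53 numbers differ.
# Each instruction is (dest, list-of-source-regs, latency).

LOAD, ALU = 3, 1

def total_cycles(schedule):
    ready = {}  # register -> cycle at which its value becomes available
    t = 0       # earliest cycle the next instruction may issue
    for dest, srcs, lat in schedule:
        # in-order issue: stall until every source operand is ready
        t = max([t] + [ready[s] for s in srcs])
        ready[dest] = t + lat
        t += 1  # single-issue: at most one instruction per cycle
    return max([t] + list(ready.values()))

# bad schedule: each load is immediately followed by its use,
# so every load's full latency is exposed
bad = [("r1", [], LOAD), ("a", ["r1"], ALU),
       ("r2", [], LOAD), ("b", ["r2"], ALU),
       ("r3", [], LOAD), ("c", ["r3"], ALU)]

# good schedule: loads bundled first, so their latencies overlap
good = [("r1", [], LOAD), ("r2", [], LOAD), ("r3", [], LOAD),
        ("a", ["r1"], ALU), ("b", ["r2"], ALU), ("c", ["r3"], ALU)]

print(total_cycles(bad), total_cycles(good))  # prints "12 6"
```

Under these assumed latencies the interleaved schedule takes twice as
many cycles as the bundled one, purely from exposed load latency, with
no autoprefetcher modeled at all.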