Date: Thu, 29 Aug 2019 17:34:00 -0000
Subject: Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c
To: gcc-patches@gcc.gnu.org, Maxim Kuvyrkov, GCC Patches
CC: Alexander Monakov, Wilco Dijkstra
From: Richard Biener
Message-ID: <09F25146-8361-4FB0-AE6B-E13BF8CF332F@gmail.com>

On August 29, 2019 5:40:47 PM GMT+02:00, Maxim Kuvyrkov wrote:
>Hi,
>
>This patch tweaks autoprefetcher heuristic in scheduler to better group
>memory loads and stores together.
>
>From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598:
>
>There are two separate changes, both related to instruction scheduler,
>that cause the regression. The first change in r253235 is responsible
>for 70% of the regression.
>===
>    haifa-sched: fix autopref_rank_for_schedule qsort comparator
>
>    * haifa-sched.c (autopref_rank_for_schedule): Order 'irrelevant' insns
>    first, always call autopref_rank_data otherwise.
>
>git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253235
>138bc75d-0d04-0410-961f-82ee72b054a4
>===
>
>After this change instead of
>r1 = [rb + 0]
>r2 = [rb + 8]
>r3 = [rb + 16]
>r4 =
>r5 =
>r6 =
>
>we get
>r1 = [rb + 0]
>
>r2 = [rb + 8]
>
>r3 = [rb + 16]
>
>
>which, apparently, cortex-a53 autoprefetcher doesn't recognize. This
>schedule happens because r2= load gets lower priority than the
>"irrelevant" due to the above patch.
>
>If we think about it, the fact that "r1 = [rb + 0]" can be scheduled
>means that true dependencies of all similar base+offset loads are
>resolved. Therefore, for autoprefetcher-friendly schedule we should
>prioritize memory reads before "irrelevant" instructions.

But isn't there also max number of load issues in a fetch window to consider?
So interleaving arithmetic with loads might be profitable.

>On the other hand, following similar logic, we want to delay memory
>stores as much as possible to start scheduling them only after all
>potential producers are scheduled. I.e., for autoprefetcher-friendly
>schedule we should prioritize "irrelevant" instructions before memory
>writes.
>
>Obvious patch to implement the above is attached. It brings 70% of
>regressed performance on this testcase back.
>
>OK to commit?
>
>Regards,
>
>--
>Maxim Kuvyrkov
>www.linaro.org
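
For illustration only, the ordering proposed in the quoted message (memory reads first, "irrelevant" instructions next, memory writes last) can be sketched as a qsort comparator. This is not the attached patch and not the code in haifa-sched.c: the insn_kind enum, the sketch_insn struct and all function names below are hypothetical, and the sketch assumes that elements sorted earlier are issued first. It also ignores the limit on load issues per fetch window raised in the reply above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical classification of a ready instruction.  GCC itself works on
   RTL insns; this enum exists only for the sketch.  */
enum insn_kind { INSN_LOAD, INSN_IRRELEVANT, INSN_STORE };

/* Hypothetical stand-in for an entry in the scheduler's ready list.  */
struct sketch_insn
{
  enum insn_kind kind;
  int offset; /* base+offset displacement; unused for INSN_IRRELEVANT */
  int id;     /* original position, used as a final tie-breaker */
};

/* Lower rank is issued earlier: loads, then "irrelevant", then stores.  */
static int
kind_rank (enum insn_kind kind)
{
  switch (kind)
    {
    case INSN_LOAD:
      return 0;
    case INSN_IRRELEVANT:
      return 1;
    default:
      return 2;
    }
}

/* qsort comparator implementing the proposed priority order.  Memory
   accesses of the same kind are ordered by increasing offset so that they
   form a run an autoprefetcher could recognize.  */
static int
autopref_sketch_cmp (const void *a, const void *b)
{
  const struct sketch_insn *x = (const struct sketch_insn *) a;
  const struct sketch_insn *y = (const struct sketch_insn *) b;
  int rx = kind_rank (x->kind);
  int ry = kind_rank (y->kind);

  if (rx != ry)
    return rx < ry ? -1 : 1;
  if (x->kind != INSN_IRRELEVANT && x->offset != y->offset)
    return x->offset < y->offset ? -1 : 1;
  /* Fall back to the original position so the order is total.  */
  return (x->id > y->id) - (x->id < y->id);
}

/* Sort a hypothetical ready list in place.  */
void
sort_ready_sketch (struct sketch_insn *ready, size_t n)
{
  assert (ready != NULL || n == 0);
  if (n > 1)
    qsort (ready, n, sizeof *ready, autopref_sketch_cmp);
}
```

Under these assumptions, a ready list holding one store, one "irrelevant" instruction and loads at offsets 8 and 0 would be sorted into load +0, load +8, "irrelevant", store.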