From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 51200 invoked by alias); 22 Jul 2019 15:55:19 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 51187 invoked by uid 89); 22 Jul 2019 15:55:19 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-27.6 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1 spammy=thousands, HTo:U*macro, proceeds, undertaking X-HELO: pio-pvt-msa2.bahnhof.se Received: from pio-pvt-msa2.bahnhof.se (HELO pio-pvt-msa2.bahnhof.se) (79.136.2.41) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 22 Jul 2019 15:55:17 +0000 Received: from localhost (localhost [127.0.0.1]) by pio-pvt-msa2.bahnhof.se (Postfix) with ESMTP id D02113FBA0; Mon, 22 Jul 2019 17:55:14 +0200 (CEST) X-Spam-Score: -2.899 Received: from pio-pvt-msa2.bahnhof.se ([127.0.0.1]) by localhost (pio-pvt-msa2.bahnhof.se [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ArGn68umIoJd; Mon, 22 Jul 2019 17:55:07 +0200 (CEST) Received: from localhost (h-41-252.A163.priv.bahnhof.se [46.59.41.252]) (Authenticated sender: mb547485) by pio-pvt-msa2.bahnhof.se (Postfix) with ESMTPA id 25BF43F2A5; Mon, 22 Jul 2019 17:55:07 +0200 (CEST) Date: Mon, 22 Jul 2019 15:56:00 -0000 From: Fredrik Noring To: "Maciej W. Rozycki" Cc: Paul Burton , Matthew Fortune , =?utf-8?Q?J=C3=BCrgen?= Urban , gcc-patches@gcc.gnu.org Subject: Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops Message-ID: <20190722155506.GA2726@sx9> References: <20190721171726.GA47580@sx9> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) X-SW-Source: 2019-07/txt/msg01423.txt.bz2 Hi Maciej, I'm glad to hear from you again! > I think that should be a GAS warning really (similarly to macros that > expand to multiple instructions in a delay slot) as people ought to be > allowed to do what they wish, and then `-Werror' can be used for code > quality enforcement (and possibly disabled on a case-by-case basis). How can one reasonably prevent the bug when compiling a whole Linux distribution with thousands of packages if GAS merely warns and proceeds in many cases? > > [ In theory, GAS could actually insert NOPs prior to the noreorder directive, > > to make the loop longer that six instructions, but GAS does not have that > > kind of capability. Another option that GCC prevents is to place a NOP in > > the delay slot. ] > > Well, GAS does have that capability, although of course it is not enabled > for `noreorder' code. Hmm? I'm unable to make sense of that, since such NOPs are unaffected by noreorder. This is what I meant: loop: addiu $5,$5,1 addiu $4,$4,1 lb $2,-1($5) nop /* These two NOPs would prevent the */ nop /* bug while preserving noreorder. */ .set noreorder .set nomacro bne $2,$3,loop sb $2,-1($4) .set macro .set reorder > For generated code I think however that usually it > will be cheaper performance-wise if a non-trivial delay-slot instruction > is never produced in the affected cases (i.e. a dummy NOP is always used). I suppose so too. One observation is that R5900 BogoMIPS went down from 584 to 195 when switching from the generic kernel variant .set noreorder 1: bnez a0,1b addiu a0,a0,-1 .set reorder that is undefined for R5900 in arch/mips/lib/delay.c, to a special variant beqz a0,2f 1: addiu a0,a0,-1 bnez a0,1b 2: for the R5900 where GAS will place a NOP in the branch delay slot. With loop unrolling BogoMIPS could be inflated again, of course. In general, unrolling is a bit better for the R5900 than general MIPS. Perhaps GCC could be informed about that, possibly via profile-guided optimisation. > > A reasonable fix for GCC is perhaps to update gcc/config/mips/mips.md to not > > make explicit use of the branch delay slot, as suggested by the patch below? > > Then GCC will emit > > > > loop: addiu $5,$5,1 > > addiu $4,$4,1 > > lb $2,-1($5) > > sb $2,-1($4) > > bne $2,$3,loop > > > > that GAS will adjust in the ELF object to > > > > 4: 24a50001 addiu a1,a1,1 > > 8: 24840001 addiu a0,a0,1 > > c: 80a2ffff lb v0,-1(a1) > > 10: a082ffff sb v0,-1(a0) > > 14: 1443fffb bne v0,v1,4 > > 18: 00000000 nop > > > > where a NOP is placed in the delay slot to avoid the bug. For longer loops, > > this relies on GAS making appropriate use of the delay slot. I'm not sure if > > GAS is good at that, though? > > I'm sort-of surprised that GCC has produced `reorder' code here, making > it rely on GAS for delay slot scheduling. Have you used an unusual set of > options that prevents GCC from making `noreorder' code, which as I recall > is the usual default (not for the MIPS16 mode IIRC, as well as some > obscure corner cases)? I used the suggested patch, and recompiled the kernel with it, and verified with the machine code tool that there were no undefined loops. Wouldn't that be enough? > > diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md > > index e17b1d522f0..acd31a8960c 100644 > > --- a/gcc/config/mips/mips.md > > +++ b/gcc/config/mips/mips.md > > @@ -1091,6 +1091,7 @@ > > > > (define_delay (and (eq_attr "type" "branch") > > (not (match_test "TARGET_MIPS16")) > > + (not (match_test "TARGET_FIX_R5900")) > > (eq_attr "branch_likely" "yes")) > > [(eq_attr "can_delay" "yes") > > (nil) > > @@ -1100,6 +1101,7 @@ > > ;; not annul on false. > > (define_delay (and (eq_attr "type" "branch") > > (not (match_test "TARGET_MIPS16")) > > + (not (match_test "TARGET_FIX_R5900")) > > (ior (match_test "TARGET_CB_NEVER") > > (and (eq_attr "compact_form" "maybe") > > (not (match_test "TARGET_CB_ALWAYS"))) > > > > I think you need to modify the default `can_delay' attribute definition > instead (in the same way). I tried to limit the case to branch delays only, which is a rough approximation. Call and jump delay slots are acceptable. Are you referring to this piece? ;; Can the instruction be put into a delay slot? (define_attr "can_delay" "no,yes" (if_then_else (and (eq_attr "type" "!branch,call,jump") (eq_attr "hazard" "none") (match_test "get_attr_insn_count (insn) == 1")) (const_string "yes") (const_string "no"))) > An improved future version might determine the > exact conditions as noted in your proposed commit description, however I'd > suggest making this simple change first. Learning the exact conditions, in a hardware sense, would be interesting indeed, as some short loops actually do appear to work despite being labeled as undefined. A fairly deep understanding of the R5900 implementation is essential for such an undertaking. :) Fredrik