From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 10038 invoked by alias); 12 Nov 2018 12:53:41 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 10026 invoked by uid 89); 12 Nov 2018 12:53:40 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-0.6 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_NONE,SPF_PASS,UNSUBSCRIBE_BODY autolearn=no version=3.3.2 spammy=USE, guessing, lone, genuine X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 12 Nov 2018 12:53:38 +0000 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-MBX-03.mgc.mentorg.com) by relay1.mentorg.com with esmtps (TLSv1.2:ECDHE-RSA-AES256-SHA384:256) id 1gMBii-00033k-4U from Andrew_Stubbs@mentor.com ; Mon, 12 Nov 2018 04:53:36 -0800 Received: from [172.30.90.69] (137.202.0.90) by SVR-IES-MBX-03.mgc.mentorg.com (139.181.222.3) with Microsoft SMTP Server (TLS) id 15.0.1320.4; Mon, 12 Nov 2018 12:53:31 +0000 Subject: Re: [PATCH 21/25] GCN Back-end (part 2/2). To: Jeff Law , References: <4c633833-1954-4b62-1a96-4f1c2cf541fd@codesourcery.com> From: Andrew Stubbs Message-ID: <4fe09f84-10d6-d30c-e458-27a4a3220eef@codesourcery.com> Date: Mon, 12 Nov 2018 12:53:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit X-SW-Source: 2018-11/txt/msg00942.txt.bz2 On 09/11/2018 19:39, Jeff Law wrote: >> + >> +/* Generate epilogue. Called from gen_epilogue during pro_and_epilogue pass. >> + >> + See gcn_expand_prologue for stack details. */ >> + >> +void >> +gcn_expand_epilogue (void) > You probably need a barrier in here to ensure that the scheduler doesn't > move an aliased memory reference into the local stack beyond the stack > adjustment. > > You're less likely to run into it because you eliminate frame pointers > fairly aggressively, but it's still the right thing to do. Sorry, I'm not sure I understand what the problem is? How can this happen? Surely the scheduler wouldn't change the logic of the code? >> + >> +/* Implement TARGET_LEGITIMATE_COMBINED_INSN. >> + >> + Return false if the instruction is not appropriate as a combination of two >> + or more instructions. */ >> + >> +bool >> +gcn_legitimate_combined_insn (rtx_insn *insn) >> +{ >> + rtx pat = PATTERN (insn); >> + >> + /* The combine pass tends to strip (use (exec)) patterns from insns. This >> + means it basically switches everything to use the *_scalar form of the >> + instructions, which is not helpful. So, this function disallows such >> + combinations. Unfortunately, this also disallows combinations of genuine >> + scalar-only patterns, but those only come from explicit expand code. >> + >> + Possible solutions: >> + - Invent TARGET_LEGITIMIZE_COMBINED_INSN. >> + - Remove all (use (EXEC)) and rely on md_reorg with "exec" attribute. >> + */ > This seems a bit hokey. Why specifically is combine removing the USE? I don't understand combine fully enough to explain it now, although at the time I wrote this, and in a GCC 7 code base, I had followed the code through and observed what it was doing. Basically, if you have two patterns that do the same operation, but one has a "parallel" with an additional "use", then combine will tend to prefer the one without the "use". That doesn't stop the code working, but it makes a premature (accidental) decision about instruction selection that we'd prefer to leave to the register allocator. I don't recall if it did this to lone instructions, but it would certainly do so when combining two (or more) instructions, and IIRC there are typically plenty of simple moves around that can be easily combined. >> + /* "Manually Inserted Wait States (NOPs)." >> + >> + GCN hardware detects most kinds of register dependencies, but there >> + are some exceptions documented in the ISA manual. This pass >> + detects the missed cases, and inserts the documented number of NOPs >> + required for correct execution. */ > How unpleasant :( But if it's what you need to do, so be it. I'll > assume the compiler is the right place to do this -- though some ports > handle this kind of stuff in the assembler or linker. We're using an LLVM assembler and linker, so we have tried to use them as is, rather than making parallel changes that would prevent GCC working with the last numbered release of LLVM (see the work around for assembler bugs in the BImode mode instruction). Expecting the assembler to fix this up would also throw off the compiler's offset calculations, and the near/far branch instructions have different register requirements it's better for the compiler to know about. The MIPS backend also inserts NOPs in a similar way. In future, I'd like to have the scheduler insert real instructions into these slots, but that's very much on the to-do list. >> +/* Disable the "current_vector_size" feature intended for >> + AVX<->SSE switching. */ > Guessing you just copied the comment, you probably want to update it to > not refer to AVX/SSE. Nope, that means exactly what it says. See the (unresolved) discussion around "[PATCH 13/25] Create TARGET_DISABLE_CURRENT_VECTOR_SIZE". I'll probably move that into a separate patch to commit after the main port. It'll suffer poor vectorization in some examples in the mean-time, but that patch is not going to be straight-forward. > You probably need to define the safe-speculation stuff > (TARGET_SPECULATION_SAFE_VALUE). Oh, OK. :-( I have no idea whether the architecture has those issues or not. >> +; "addptr" is the same as "add" except that it must not write to VCC or SCC >> +; as a side-effect. Unfortunately GCN3 does not have a suitable instruction >> +; for this, so we use a split to save and restore the condition code. >> +; This pattern must use "Sg" instead of "SD" to prevent the compiler >> +; assigning VCC as the destination. >> +; FIXME: Provide GCN5 implementation > I worry about the save/restore aspects of this. Haven't we discussed > this somewhere?!? I think this came up in the SPECIAL_REGNO_P patch discussion. We eventually found that the underlying problem was the way the save/restore reused pseudoregs. The "addptr" pattern has been rewritten in my draft V2 patchset. It still uses a fixed scratch register, but no longer does save/restore. > Generally I don't see major concerns. THere's some minor things to > fix. As far as the correctness of the code you're generating, well, I'm > going have to assume you've got that right and will own addressing bugs > in that space. Agreed. The bare-machine testsuite runs pretty well, as does the libgomp testsuite in an offloading toolchain. > My inclination would be to have this go forward into gcc-9 as the minor > issues are wrapped up. Thanks for your review, I'll have a V2 patch-set soonish. Andrew