Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
From: Richard Sandiford
To: Andrew Stubbs
Subject: Re: [PATCH 14/25] Disable inefficient vectorization of elementwise loads/stores.
References: <87lg804hb1.fsf@arm.com>
Date: Mon, 17 Sep 2018 12:40:00 -0000
In-Reply-To: (Andrew Stubbs's message of "Mon, 17 Sep 2018 10:39:47 +0100")
Message-ID: <87work2vt9.fsf@arm.com>

Andrew Stubbs writes:
> On 17/09/18 10:14, Richard Sandiford wrote:
>> writes:
>>> If the autovectorizer tries to load a GCN 64-lane vector elementwise then it
>>> blows away the register file and produces horrible code.
>>
>> Do all the registers really need to be live at once, or is it "just" bad
>> scheduling?  I'd have expected the initial rtl to load each element and
>> then insert it immediately, so that the number of insertions doesn't
>> directly affect register pressure.
>
> They don't need to be live at once, architecturally speaking, but that's
> the way it happened. No doubt there is another solution to fix it, but
> it's not a use case I believe we want to spend time optimizing.
>
> Actually, I've not tested what happens without this in GCC 9, so that's
> probably worth checking, but I'd still be concerned about it blowing up
> on real code somewhere.
>
>>> This patch simply disallows elementwise loads for such large vectors.
>>> Is there a better way to disable this in the middle-end?
>>
>> Do you ever want elementwise accesses for GCN?  If not, it might be
>> better to disable them in the target's cost model.
>
> The hardware is perfectly capable of extracting or setting vector
> elements, but given that it can do full gather/scatter from arbitrary
> addresses it's not something we want to do in general.
>
> A normal scalar load will use a vector register (lane 0). The value then
> has to be moved to a scalar register, and only then can v_writelane
> insert it into the final destination.

OK, sounds like the cost of vec_construct is too low then.  But looking
at the port, I see you have:

/* Implement TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST.  */

int
gcn_vectorization_cost (enum vect_cost_for_stmt ARG_UNUSED (type_of_cost),
                        tree ARG_UNUSED (vectype), int ARG_UNUSED (misalign))
{
  /* Always vectorize.  */
  return 1;
}

which short-circuits the cost-model altogether.  Isn't that part
of the problem?

Richard

>
> Alternatively you could use a mask_load to load the value directly to
> the correct lane, but I don't believe that's something GCC does.
>
> Andrew
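
As a rough illustration of the vec_construct point above, here is a
standalone C sketch of a cost function that charges more for elementwise
vector construction instead of returning 1 for everything.  It is not
the GCN port's code and does not use GCC's internal headers: the enum
sketch_cost_kind, the function sketch_vectorization_cost, the nunits
parameter and the per-element cost of 3 (one unit each for the scalar
load into lane 0, the move to a scalar register and the v_writelane
insert described in the quoted text) are all assumptions made for
illustration only.

```c
#include <assert.h>

/* Hypothetical, trimmed stand-in for GCC's enum vect_cost_for_stmt.
   The real enum has more members; these names are illustrative.  */
enum sketch_cost_kind
{
  SKETCH_SCALAR_STMT,
  SKETCH_VECTOR_STMT,
  SKETCH_VECTOR_LOAD,
  SKETCH_VEC_CONSTRUCT
};

/* Assumed cost of building one element: scalar load into lane 0,
   move to a scalar register, then a lane insert.  Illustrative value,
   not a measured one.  */
#define SKETCH_PER_ELEMENT_COST 3

/* Return an illustrative cost for a statement of kind KIND on a vector
   with NUNITS lanes.  Only elementwise construction scales with the
   lane count; every other kind keeps the flat cost of 1.  */
int
sketch_vectorization_cost (enum sketch_cost_kind kind, int nunits)
{
  assert (nunits > 0);

  switch (kind)
    {
    case SKETCH_VEC_CONSTRUCT:
      /* One load/move/insert sequence per lane.  */
      return nunits * SKETCH_PER_ELEMENT_COST;

    default:
      return 1;
    }
}
```

With these assumed numbers, sketch_vectorization_cost
(SKETCH_VEC_CONSTRUCT, 64) comes to 192, while an ordinary vector load
on the same 64-lane vector still costs 1, so a cost comparison would
steer away from elementwise construction without a separate special
case.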