From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-485762-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 80561 invoked by alias); 17 Sep 2018 11:43:52 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 80532 invoked by uid 89); 17 Sep 2018 11:43:51 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS autolearn=ham version=3.3.2 spammy=altogether
X-HELO: foss.arm.com
Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 17 Sep 2018 11:43:50 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5ACC918A;	Mon, 17 Sep 2018 04:43:48 -0700 (PDT)
Received: from localhost (unknown [10.32.99.101])	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id B26DD3F5BD;	Mon, 17 Sep 2018 04:43:47 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Andrew Stubbs <ams@codesourcery.com>
Mail-Followup-To: Andrew Stubbs <ams@codesourcery.com>,<gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Cc: <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH 14/25] Disable inefficient vectorization of elementwise loads/stores.
References: <cover.1536144068.git.ams@codesourcery.com>	<fb85f5cc96463b1a779cd4f874dff269960b40a3.1536144068.git.ams@codesourcery.com>	<87lg804hb1.fsf@arm.com>	<fd5ae529-8bdc-dea0-ff20-d8b6470e47cc@codesourcery.com>
Date: Mon, 17 Sep 2018 12:40:00 -0000
In-Reply-To: <fd5ae529-8bdc-dea0-ff20-d8b6470e47cc@codesourcery.com> (Andrew	Stubbs's message of "Mon, 17 Sep 2018 10:39:47 +0100")
Message-ID: <87work2vt9.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-SW-Source: 2018-09/txt/msg00865.txt.bz2

Andrew Stubbs <ams@codesourcery.com> writes:
> On 17/09/18 10:14, Richard Sandiford wrote:
>> <ams@codesourcery.com> writes:
>>> If the autovectorizer tries to load a GCN 64-lane vector elementwise then it
>>> blows away the register file and produces horrible code.
>> 
>> Do all the registers really need to be live at once, or is it "just" bad
>> scheduling?  I'd have expected the initial rtl to load each element and
>> then insert it immediately, so that the number of insertions doesn't
>> directly affect register pressure.
>
> They don't need to be live at once, architecturally speaking, but that's 
> the way it happened.  No doubt there is another solution to fix it, but 
> it's not a use case I believe we want to spend time optimizing.
>
> Actually, I've not tested what happens without this in GCC 9, so that's 
> probably worth checking, but I'd still be concerned about it blowing up 
> on real code somewhere.
>
>>> This patch simply disallows elementwise loads for such large vectors.
>>> Is there
>>> a better way to disable this in the middle-end?
>> 
>> Do you ever want elementwise accesses for GCN?  If not, it might be
>> better to disable them in the target's cost model.
>
> The hardware is perfectly capable of extracting or setting vector 
> elements, but given that it can do full gather/scatter from arbitrary 
> addresses it's not something we want to do in general.
>
> A normal scalar load will use a vector register (lane 0). The value then 
> has to be moved to a scalar register, and only then can v_writelane 
> insert it into the final destination.

OK, sounds like the cost of vec_construct is too low then.  But looking
at the port, I see you have:

/* Implement TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST.  */

int
gcn_vectorization_cost (enum vect_cost_for_stmt ARG_UNUSED (type_of_cost),
			tree ARG_UNUSED (vectype), int ARG_UNUSED (misalign))
{
  /* Always vectorize.  */
  return 1;
}

which returns the same cost for every kind of statement, vec_construct
included, and so short-circuits the cost model altogether.  Isn't that
part of the problem?
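
To make the point concrete, here is a rough standalone sketch of a cost
function that separates vec_construct from the other statement kinds.
It is not the GCN port's code and not a proposed patch: the enum and
the function name are stand-ins invented for illustration, and the lane
count is passed as a plain int rather than derived from a vectype.  The
"lanes minus one" charge mirrors what the generic default hook does for
vec_construct, as far as I remember; the right numbers for GCN are an
open question and would probably need to be higher, given the extra
moves described above.

```c
/* Standalone sketch: these names are hypothetical stand-ins, not GCC's
   real declarations.  */

enum sketch_cost_kind
{
  SKETCH_SCALAR_STMT,
  SKETCH_VECTOR_STMT,
  SKETCH_VECTOR_LOAD,
  SKETCH_VECTOR_STORE,
  SKETCH_UNALIGNED_LOAD,
  SKETCH_UNALIGNED_STORE,
  SKETCH_VEC_CONSTRUCT
};

/* Return a relative cost for a statement of kind KIND operating on a
   vector with NUNITS lanes.  */

int
sketch_vectorization_cost (enum sketch_cost_kind kind, int nunits)
{
  switch (kind)
    {
    case SKETCH_VEC_CONSTRUCT:
      /* Building a vector element by element takes about one insertion
	 per extra lane, so the cost grows with the vector width.  */
      return nunits - 1;

    case SKETCH_UNALIGNED_LOAD:
    case SKETCH_UNALIGNED_STORE:
      /* Assumed to be somewhat dearer than the aligned forms.  */
      return 2;

    default:
      /* Everything else keeps the flat unit cost.  */
      return 1;
    }
}
```

With 64 lanes this charges 63 for an elementwise construction against
1 for a plain vector load, so the vectorizer would have a reason to
prefer the other strategies.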

Richard

>
> Alternatively you could use a mask_load to load the value directly to 
> the correct lane, but I don't believe that's something GCC does.
>
> Andrew