From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-485745-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 117101 invoked by alias); 17 Sep 2018 09:40:07 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 117089 invoked by uid 89); 17 Sep 2018 09:40:06 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=capable, perfectly, doubt, lane
X-HELO: relay1.mentorg.com
Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 17 Sep 2018 09:40:05 +0000
Received: from nat-ies.mentorg.com ([192.94.31.2] helo=svr-ies-mbx-01.mgc.mentorg.com)	by relay1.mentorg.com with esmtps (TLSv1.2:ECDHE-RSA-AES256-SHA384:256)	id 1g1q0h-0001a1-Iv from Andrew_Stubbs@mentor.com ; Mon, 17 Sep 2018 02:40:03 -0700
Received: from [172.30.90.144] (137.202.0.90) by svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) with Microsoft SMTP Server (TLS) id 15.0.1320.4; Mon, 17 Sep 2018 10:39:55 +0100
Subject: Re: [PATCH 14/25] Disable inefficient vectorization of elementwise loads/stores.
To: <gcc-patches@gcc.gnu.org>, <richard.sandiford@arm.com>
References: <cover.1536144068.git.ams@codesourcery.com> <fb85f5cc96463b1a779cd4f874dff269960b40a3.1536144068.git.ams@codesourcery.com> <87lg804hb1.fsf@arm.com>
From: Andrew Stubbs <ams@codesourcery.com>
Message-ID: <fd5ae529-8bdc-dea0-ff20-d8b6470e47cc@codesourcery.com>
Date: Mon, 17 Sep 2018 09:54:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <87lg804hb1.fsf@arm.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
X-SW-Source: 2018-09/txt/msg00848.txt.bz2

On 17/09/18 10:14, Richard Sandiford wrote:
> <ams@codesourcery.com> writes:
>> If the autovectorizer tries to load a GCN 64-lane vector elementwise then it
>> blows away the register file and produces horrible code.
> 
> Do all the registers really need to be live at once, or is it "just" bad
> scheduling?  I'd have expected the initial rtl to load each element and
> then insert it immediately, so that the number of insertions doesn't
> directly affect register pressure.

They don't need to be live at once, architecturally speaking, but that's 
how the generated code turned out.  No doubt there is another way to fix 
it, but it's not a use case I believe we want to spend time optimizing.

Actually, I've not tested what happens without this in GCC 9, so that's 
probably worth checking, but I'd still be concerned about it blowing up 
on real code somewhere.

>> This patch simply disallows elementwise loads for such large vectors.  Is there
>> a better way to disable this in the middle-end?
> 
> Do you ever want elementwise accesses for GCN?  If not, it might be
> better to disable them in the target's cost model.

The hardware is perfectly capable of extracting or setting vector 
elements, but given that it can do full gather/scatter from arbitrary 
addresses it's not something we want to do in general.
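
To illustrate the difference, here is a minimal C model of the two 
strategies for a strided load.  It is only a sketch, not GCC or GCN code: 
LANES, load_elementwise and load_gather are hypothetical names, and the 
loop in load_gather stands in for what would be one gather instruction.

```c
#define LANES 64  /* GCN vector width, as discussed in this thread */

/* Elementwise: one scalar load and one lane insert per element,
   i.e. LANES separate load/insert pairs for a single vector.  */
static void load_elementwise(int dst[LANES], const int *base, int stride)
{
    for (int lane = 0; lane < LANES; lane++) {
        int tmp = base[lane * stride];  /* scalar load */
        dst[lane] = tmp;                /* insert into one lane */
    }
}

/* Gather: every lane supplies its own offset.  This loop models the
   per-lane behaviour of what would be a single vector instruction.  */
static void load_gather(int dst[LANES], const int *base,
                        const int offsets[LANES])
{
    for (int lane = 0; lane < LANES; lane++)
        dst[lane] = base[offsets[lane]];
}
```

Both functions produce the same vector when offsets[lane] == lane * stride; 
the point is only that the first form costs one load and one insert per 
lane, while the second maps onto one vector-wide operation.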

A normal scalar load places its result in lane 0 of a vector register. 
The value then has to be moved to a scalar register, and only then can 
v_writelane insert it into the final destination.
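
As a rough C model of that three-step sequence (a sketch only; vreg_t and 
insert_element are hypothetical names, and the struct merely stands in for 
a 64-lane vector register):

```c
#define LANES 64  /* GCN vector width, as discussed in this thread */

/* Stand-in for one 64-lane vector register.  */
typedef struct { int lane[LANES]; } vreg_t;

/* Insert the value at *addr into lane LANE of *dst, following the
   steps described above.  */
static void insert_element(vreg_t *dst, int lane, const int *addr)
{
    vreg_t tmp = { { 0 } };

    tmp.lane[0] = *addr;     /* 1: the load lands in lane 0 of a vector reg */
    int sreg = tmp.lane[0];  /* 2: move lane 0 to a scalar register */
    dst->lane[lane] = sreg;  /* 3: v_writelane into the destination lane */
}
```

Repeating this for all 64 lanes means 64 loads, 64 moves and 64 lane 
writes to build a single vector.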

Alternatively you could use a mask_load to load the value directly to 
the correct lane, but I don't believe that's something GCC does.

Andrew