Subject: Re: [PATCH] Fix PR84101, account for function ABI details in vectorization costs
From: Jeff Law
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org
Date: Thu, 28 Jun 2018 05:52:00 -0000

[ Returning to an old patch... ]

On 02/14/2018 04:52 AM, Richard Biener wrote:
> On Tue, 13 Feb 2018, Jeff Law wrote:
>
>> On 01/30/2018 02:59 AM, Richard Biener wrote:
>>>
>>> This patch tries to deal with the "easy" part of a function ABI,
>>> the return value location, in vectorization costing.  The testcase
>>> shows that if we vectorize the returned value but the function
>>> doesn't return in memory or in a vector register but as in this
>>> case in an integer register pair (reg:TI ax) (bah, ABI details
>>> exposed late? why's this not a parallel?) we end up spilling
>>> badly.
>> PARALLEL is used when the ABI mandates a value be returned in multiple
>> places.  Typically that happens when the value is returned in different
>> types of registers (integer, floating point, vector).
>>
>> Presumably it's not a PARALLEL in this case because the value is only
>> returned in %eax.
>
> It's returned in %eax and %rdx (TImode after all).  But maybe
> "standard register pairs" are not represented as PARALLEL ...

Register pairs and PARALLELs handle two different issues.  A register
pair such as eax/edx is used to hold a value that is larger than a
single register.  A PARALLEL is used to hold a return value that has to
appear in multiple places, for example a0/d0 on m68k in some
circumstances.

They can even be combined.  You might have eax/edx as the first entry
in a PARALLEL with xmm0 as the second entry.  That would indicate a
TImode value that is in eax/edx as well as in xmm0.

From the standpoint of costing the spills for a return value, we can
generate the return value into any object in the PARALLEL, but we then
have to copy it to all the other objects in the PARALLEL.
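To make the two shapes concrete, here's a rough sketch (illustrative
only, not from the patch) of how they would be built with GCC's
internal RTL constructors.  It assumes the GCC tree (rtl.h) and the x86
back end's register macros AX_REG and FIRST_SSE_REG from
config/i386/i386.h; the xmm0 entry is the hypothetical combined case
described above, not something the psABI actually mandates for TImode:

/* Illustrative only -- not part of the patch.  Assumes GCC's internal
   RTL constructors and the x86 back end's register macros.  */

/* Register pair: one TImode value in the consecutive hard registers
   ax/dx, i.e. the (reg:TI ax) from the PR.  */
rtx pair = gen_rtx_REG (TImode, AX_REG);

/* PARALLEL: the same TImode value that must also appear in xmm0 (the
   hypothetical combined case).  Each element is an
   (expr_list (reg ...) (const_int byte-offset)).  */
rtx par
  = gen_rtx_PARALLEL (TImode,
                      gen_rtvec (2,
                                 gen_rtx_EXPR_LIST (VOIDmode,
                                                    gen_rtx_REG (TImode,
                                                                 AX_REG),
                                                    const0_rtx),
                                 gen_rtx_EXPR_LIST (VOIDmode,
                                                    gen_rtx_REG (TImode,
                                                                 FIRST_SSE_REG),
                                                    const0_rtx)));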
So if none of the objects in the PARALLEL are suitable for vector
operations, then obviously we're going to have to copy from the vector
register to all the elements in the PARALLEL.  This is the most
expensive case.  If one of the objects in the PARALLEL is suitable for
vector ops, then that's where we want the result to end up.  We still
have to copy the result to the other elements in the PARALLEL, but it's
one less copy/spill than the previous case.  Note that copying might
have to go through memory on some targets if the registers in the
PARALLEL are in different register files.

CONCAT is (IIRC) not supposed to show up in the RTL chain at all.

Note that expr.c may do something stupid with vectors.  Just looking at
emit_group_load_1 makes me wonder whether everything is going to go
through memory when we've got PARALLELs and vectors.  That may actually
make your changes to vect_model_store_cost more correct.

Jeff