Subject: Re: [PATCH] Fix PR84101, account for function ABI details in vectorization costs
From: Jeff Law
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org
Date: Thu, 28 Jun 2018 05:52:00 -0000

[ Returning to an old patch... ]

On 02/14/2018 04:52 AM, Richard Biener wrote:
> On Tue, 13 Feb 2018, Jeff Law wrote:
>
>> On 01/30/2018 02:59 AM, Richard Biener wrote:
>>>
>>> This patch tries to deal with the "easy" part of a function ABI,
>>> the return value location, in vectorization costing.  The testcase
>>> shows that if we vectorize the returned value but the function
>>> doesn't return in memory or in a vector register but as in this
>>> case in an integer register pair (reg:TI ax) (bah, ABI details
>>> exposed late? why's this not a parallel?) we end up spilling
>>> badly.
>> PARALLEL is used when the ABI mandates a value be returned in multiple
>> places.  Typically that happens when the value is returned in different
>> types of registers (integer, floating point, vector).
>>
>> Presumably it's not a PARALLEL in this case because the value is only
>> returned in %eax.
>
> It's returned in %eax and %rdx (TImode after all).  But maybe
> "standard register pairs" are not represented as PARALLEL ...

Register pairs and PARALLELs handle two different issues.  A register
pair such as eax/edx is used to hold a value that is larger than a
single register.  A PARALLEL is used to hold a return value that has to
appear in multiple places, for example a0/d0 on m68k in some
circumstances.

They can even be combined.  You might have eax/edx as the first entry
in a PARALLEL with xmm0 as the second entry.  That would indicate a
TImode value that is in eax/edx as well as in xmm0.

From the standpoint of costing the spills for a return value, we can
generate the return value into any object in the PARALLEL, but we then
have to copy it to all the other objects in the PARALLEL.
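To make the two shapes concrete, here's a rough sketch (illustrative
only, not from the patch) of how they would be built with GCC's
internal RTL constructors.  It assumes the GCC tree (rtl.h) and the x86
back end's register macros AX_REG and FIRST_SSE_REG from
config/i386/i386.h; the xmm0 entry is the hypothetical combined case
described above, not something the psABI actually mandates for TImode:

/* Illustrative only -- not part of the patch.  Assumes GCC's internal
   RTL constructors and the x86 back end's register macros.  */

/* Register pair: one TImode value in the consecutive hard registers
   ax/dx, i.e. the (reg:TI ax) from the PR.  */
rtx pair = gen_rtx_REG (TImode, AX_REG);

/* PARALLEL: the same TImode value that must also appear in xmm0 (the
   hypothetical combined case).  Each element is an
   (expr_list (reg ...) (const_int byte-offset)).  */
rtx par
  = gen_rtx_PARALLEL (TImode,
                      gen_rtvec (2,
                                 gen_rtx_EXPR_LIST (VOIDmode,
                                                    gen_rtx_REG (TImode,
                                                                 AX_REG),
                                                    const0_rtx),
                                 gen_rtx_EXPR_LIST (VOIDmode,
                                                    gen_rtx_REG (TImode,
                                                                 FIRST_SSE_REG),
                                                    const0_rtx)));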
So if none of the objects in the PARALLEL are suitable for vector
operations, then obviously we're going to have to copy from the vector
register to all the elements in the PARALLEL.  This is the most
expensive case.  If one of the objects in the PARALLEL is suitable for
vector ops, then that's where we want the result to end up.  We still
have to copy the result to the other elements in the PARALLEL, but it's
one less copy/spill than the previous case.  Note that copying might
have to go through memory on some targets if the registers in the
PARALLEL are in different register files.

CONCAT is (IIRC) not supposed to show up in the RTL chain at all.

Note that expr.c may do something stupid with vectors.  Just looking at
emit_group_load_1 makes me wonder whether everything is going to go
through memory when we've got PARALLELs and vectors.  That may actually
make your changes to vect_model_store_cost more correct.

Jeff