From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 105614 invoked by alias); 3 Nov 2015 17:51:29 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 105596 invoked by uid 89); 3 Nov 2015 17:51:28 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=0.7 required=5.0 tests=AWL,BAYES_50,KAM_LAZY_DOMAIN_SECURITY,T_RP_MATCHES_RCVD autolearn=no version=3.3.2
X-HELO: foss.arm.com
Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 03 Nov 2015 17:51:27 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6579F4A; Tue, 3 Nov 2015 09:51:15 -0800 (PST)
Received: from [10.2.206.22] (e104437-lin.cambridge.arm.com [10.2.206.22]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 4FEC03F50D; Tue, 3 Nov 2015 09:51:24 -0800 (PST)
Subject: Re: [AArch64] Update comments on the usage of X30 in FIXED_REGISTERS and CALL_USED_REGISTERS
To: Richard Earnshaw
References: <56210B82.5060808@arm.com> <563750B7.10304@foss.arm.com> <56375DFB.7050504@foss.arm.com> <563778A3.1050301@foss.arm.com>
Cc: "gcc-patches@gcc.gnu.org", Marcus Shawcroft, James Greenhalgh
From: Jiong Wang
Message-ID: <5638F41A.9080202@foss.arm.com>
Date: Tue, 03 Nov 2015 17:51:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <563778A3.1050301@foss.arm.com>
Content-Type: text/plain; charset=iso-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2015-11/txt/msg00260.txt.bz2

On 02/11/15 14:52, Richard Earnshaw wrote:
> On 02/11/15 12:58, Jiong Wang wrote:
>>
>> On 02/11/15 12:01, Richard Earnshaw wrote:
>>> On 16/10/15 15:36, Jiong Wang
wrote:
>>>> The patch https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02654.html
>>>> from last year changed the definition of LR in CALL_USED_REGISTERS,
>>>> but didn't update the comment above the #define to reflect the new
>>>> usage.
>>>>
>>>> This patch bring the comment inline with the implementation.
>>>>
>>>> OK for trunk?
>>>>
>>>> Thanks.
>>>>
>>>> 2015-10-16 Jiong. Wang
>>>>
>>>> gcc/
>>>> * config/aarch64/aarch64.h: Update the comments on usage of X30.
>>>>
>>>>
>>>> fix-comment.patch
>>>>
>>>>
>>>> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
>>>> index 5a8db76..1eaaca0 100644
>>>> --- a/gcc/config/aarch64/aarch64.h
>>>> +++ b/gcc/config/aarch64/aarch64.h
>>>> @@ -210,14 +210,17 @@ extern unsigned aarch64_architecture_version;
>>>> significant bits. Unlike AArch32 S1 is not packed into D0,
>>>> etc. */
>>>> -/* Note that we don't mark X30 as a call-clobbered register. The
>>>> idea is
>>>> - that it's really the call instructions themselves which clobber X30.
>>>> - We don't care what the called function does with it afterwards.
>>>> -
>>>> - This approach makes it easier to implement sibcalls. Unlike normal
>>>> - calls, sibcalls don't clobber X30, so the register reaches the
>>>> - called function intact. EPILOGUE_USES says that X30 is useful
>>>> - to the called function. */
>>>> +/* We don't mark X30 as a fixed register while we mark it as a
>>>> caller-saved
>>>> + register. The idea is we want X30 to be allocable as a caller-saved
>>>> + register when possible.
>>>> +
>>>> + NOTE: although X30 is marked as caller-saved, it's callee-saved
>>>> at the same
>>>> + time. The caller-saved attribute makes sure if X30 is allocated
>>>> as free
>>>> + register to hold any temporary value then the value is saved
>>>> properly across
>>>> + function call. While on AArch64, the call instruction writes the
>>>> return
>>>> + address to LR.
If the called function is a non-leaf function, it
>>>> is the
>>>> + responsibility of the callee to save and restore LR appropriately
>>>> in it's
>>>> + prologue / epilogue. */
>>>>
>>> Sorry, but I find that just confusing.
>>>
>>> Wouldn't it be easier just to say:
>>>
>>> X30 is clobbered by call instructions, so must be treated as a
>>> caller-saved register.
>> Richard, thanks for the review, but I am not convinced by your change.
>>
>> "caller-saved" in gcc just means if the live range of the register is
>> across function call then the caller will make sure it will be saved and
>> restored properly. this is completely a calling convention concept and
>> have not relationship with how call instruction works.
>>
>> So, we mark X30 as caller-saved not because it will be clobbered by the
>> call instructions but because we relax it as free register and want it
>> to be saved by caller whenever it's allocated by register allocation and
>> life range is across function call.
>>
>> And from my understanding, if one register if clobbered by call
>> instruction, it's not must be treated as a caller-saved register,
>> instead it must be treated as a *callee-saved* register. Because the call
>> instruction is actually assigning a new value to the register then jump
>> to the callee in an atomic way thus there is no save/restore from the
>> caller of this "new value", callee is full responsible for this. The
>> "NOTE" part in the patch is trying to highlight this so following extra
>> check in aarch64_layout_frame with be eaiser to understand for others.
>>
>> /* ... and any callee saved register that dataflow says is live. */
>> for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++)
>> if (df_regs_ever_live_p (regno)
>> && (regno == R30_REGNUM <--- X30, a caller-save, is
>> callee-save as well.
>> || !call_used_regs[regno]))
>> cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
>>
>> Regards,
>> Jiong
> Right, I think I now understand what you are trying to say, but I still
> think the wording does not convey that.
>
> We have two statements of fact
> 1) On entry to a function LR contains the return address (by the
> architecture)
> 2) LR cannot retain values across a function call (it is a caller-saved
> register by the PCS)
>
> We then have an implementation perspective on how to use LR given these
> constraints: we treat the register as a callee-saved register and put
> explicit clobbers on all call instructions.
>
> So how does the following sound?
>
> /* Technically, LR should be treated as a caller-saved register (since
> it is modified during a subroutine call to contain the return address).
> However, from the compiler's perspective, it is best to treat it as a
> callee-saved register and then to put explicit clobber instructions on
> each call instruction to ensure that live values are not retained in it
> across call instructions. This allows us to use the register as a
> scratch register between function calls. */

Interesting...

I feel this new comment views the behavior of x30 from a different
perspective.

Reading just this comment, I would have thought the implementation on
AArch64 is:

  1. X30 is set to 0 in FIXED_REGISTERS
  2. X30 is set to 0 in CALL_USED_REGISTERS

so that it is treated as callee-saved by gcc, as you wrote: "from the
compiler's perspective, it is best to treat it as a callee-saved
register and then to put explicit clobber instructions on each call
instruction to ensure that live values are not retained in it across
call instructions".

But in the current AArch64 implementation, x30 is set to 1 instead of 0
in CALL_USED_REGISTERS, which means we let the register allocator treat
it as a caller-saved register instead of a callee-saved one.

My understanding is that either way will work correctly, but there will
be performance differences.
If x30 is treated as callee-saved, the "clobber x30" in the various call
patterns makes sure gcc is informed that whatever was originally kept in
x30 needs to be saved. If x30 is treated as caller-saved, the register
allocator takes care of that instead. I don't know the exact difference
between these two, but I do feel they take different code paths inside
gcc.

Register allocation is very sensitive to whether x30 is caller-saved or
callee-saved, and the choice causes cascading differences in later RTL
passes.

As a simple benchmark, I bootstrapped gcc on AArch64 with RTL dumps
enabled via

  make BOOT_CFLAGS="-fdump-rtl-ira -fdump-rtl-reload -O2"

If we set x30 as callee-saved, there is about 0.5% more spilling and
pushing than when x30 is treated as caller-saved, which is our current
implementation. The code size is also slightly bigger with x30 set as
callee-saved.

commands used
===
  grep "Pushing" gcc/*.ira | wc -l
  grep "Spill" gcc/*.ira | wc -l

Anyway, I think this new comment doesn't match our current
implementation.

Regards,
Jiong
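
The point that x30 can be caller-saved and callee-saved at the same time may be easier to see in a toy model. The following is only a sketch in plain C, not GCC code: `struct reg_model`, `init_model`, `caller_saves_across_call`, `needs_callee_save_slot` and `R19_REGNUM` are hypothetical names introduced for illustration, and the register split (x0-x18 call-used, x19-x29 not) is a simplification of what CALL_USED_REGISTERS really lists. Only `R0_REGNUM`, `R30_REGNUM` and the shape of the test are taken from the aarch64_layout_frame snippet quoted earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced numbering for x0..x30 (illustration only).  */
enum { R0_REGNUM = 0, R19_REGNUM = 19, R30_REGNUM = 30, NUM_GP_REGS = 31 };

/* Stand-in for the compiler state consulted by the quoted check.
   This is NOT the real GCC data structure.  */
struct reg_model
{
  bool call_used[NUM_GP_REGS];  /* true = caller-saved, as in CALL_USED_REGISTERS */
  bool ever_live[NUM_GP_REGS];  /* models df_regs_ever_live_p (regno) */
};

/* Simplified setup: x0-x18 call-used, x19-x29 not, nothing live yet.
   The entry for x30 is a parameter so both choices can be compared.  */
void
init_model (struct reg_model *m, bool x30_call_used)
{
  for (int r = R0_REGNUM; r <= R30_REGNUM; r++)
    {
      m->call_used[r] = (r < R19_REGNUM);
      m->ever_live[r] = false;
    }
  m->call_used[R30_REGNUM] = x30_call_used;
}

/* Caller side: a value living in REGNO across a call has to be saved
   and restored by the caller only if REGNO is call-used.  */
bool
caller_saves_across_call (const struct reg_model *m, int regno)
{
  assert (regno >= R0_REGNUM && regno <= R30_REGNUM);
  return m->call_used[regno];
}

/* Callee side: mirrors the quoted test.  A live register gets a save
   slot in the prologue if it is not call-used, or if it is x30.  */
bool
needs_callee_save_slot (const struct reg_model *m, int regno)
{
  assert (regno >= R0_REGNUM && regno <= R30_REGNUM);
  return m->ever_live[regno]
         && (regno == R30_REGNUM || !m->call_used[regno]);
}
```

With `init_model (&m, true)`, which models the current setting, a live x30 gets a prologue slot and is also handled by the caller across calls. With `init_model (&m, false)` only the prologue slot remains, and the explicit clobbers on the call patterns would have to do the rest. The sketch says nothing about the spill counts above; it only shows that the two properties are independent.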