To: gcc-patches, Uros Bizjak, Jan Hubicka, Sandra Loosemore
From: Daniel Santos
Subject: [RFC] [PATCH v3 0/8] [i386] Use out-of-line stubs for ms_abi pro/epilogues
Message-ID: <2fd14fe7-8d06-45ab-fb1e-96c9c8f4c03b@pobox.com>
Date: Tue, 07 Feb 2017 18:34:00 -0000

I apologize to
those of you who get this twice, but I accidentally posted to the wrong
list!

Uros or Jan,

Please take this as a ping, as I never bothered pinging after submitting
v2 since I found a few more issues with it. :) Although I realize this
would be a GCC 8 stage 1 item, I would like to try to get it finished up
and tentatively approved as soon as I can. I have tried to summarize this
patch set below as clearly and succinctly as possible. Thanks!

  * This patch set depends upon the "Use aligned SSE movs for re-aligned
    MS ABI pro/epilogues" patch set:
    https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01859.html
  * I have submitted a test program separately:
    https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00542.html

Summary
=======

When a 64-bit Microsoft function calls a System V function, ABI
differences require RSI, RDI and XMM6-15 to be considered clobbered.
Saving these registers inline can cost as much as 109 bytes, and
restoring them costs a similar amount. This patch set targets 64-bit
Wine and aims to mitigate some of these costs by adding ms/sysv save &
restore stubs to libgcc, which are called from pro/epilogues rather than
emitting the code inline. And since we're already tinkering with stubs,
they will also manage the save/restore of all remaining registers if
possible. Analysis of building Wine 64 demonstrates a reduction of .text
by around 20%, which also translates into a reduction of Wine's install
size by 34MiB.

As there will usually only be 3 stubs in memory at any time, I'm using
the larger mov instructions instead of push/pop to facilitate better
parallelization. The basic theory is that the combination of better
parallelization and reduced I-cache misses will offset the extra
instructions required for implementation, although I have not produced
actual performance data yet.
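
For illustration only (this is not code from the patch set), the
situation described in the summary can be sketched in C. The function
and macro names below are hypothetical; ms_abi and sysv_abi are GCC's
existing x86-64 function attributes. Compiling this for x86-64 and
looking at the assembly of ms_caller should show the inline saves of
RSI, RDI and XMM6-15 that the stubs are meant to replace.

```c
#include <assert.h>

/* The ABI attributes are only meaningful on x86-64 with a GCC-compatible
   compiler; elsewhere they expand to nothing so the sketch still builds. */
#if defined(__x86_64__) && defined(__GNUC__)
# define MS_ABI   __attribute__((ms_abi))
# define SYSV_ABI __attribute__((sysv_abi))
#else
# define MS_ABI
# define SYSV_ABI
#endif

#if defined(__GNUC__)
# define NOINLINE __attribute__((noinline))
#else
# define NOINLINE
#endif

/* Hypothetical System V callee: under its ABI it may freely clobber
   RSI, RDI and XMM6-15. */
SYSV_ABI NOINLINE long sysv_sum(long a, long b)
{
    return a + b;
}

/* Hypothetical Microsoft-ABI caller: those same registers are
   callee-saved in its ABI, so they must be preserved around the call. */
MS_ABI NOINLINE long ms_caller(long a, long b)
{
    return sysv_sum(a, b) * 2;
}

/* Small self-check that the call works across the ABI boundary. */
void check_cross_abi_call(void)
{
    assert(ms_caller(3, 4) == 14);
}
```

The noinline attribute is there only so that the cross-ABI call survives
optimization and the prologue/epilogue of ms_caller can be inspected.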

For now, I have called this feature -moutline-msabi-xlogues, but Sandra
Loosemore has this suggestion
(https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02670.html):

> Just as a suggestion (I'm not an i386 maintainer), I'd recommend
> spelling the name of this option -mno-inline-msabi-xlogues instead of
> -moutline-msabi-xlogues, and making the default -minline-msabi-xlogues.

When enabled, the feature is activated when an ms_abi function calls a
sysv_abi function and the following condition is true (evaluated in
ix86_compute_frame_layout):

    TARGET_SSE
    && !ix86_function_ms_hook_prologue (current_function_decl)
    && !SEH
    && !crtl->calls_eh_return
    && !ix86_static_chain_on_stack
    && !ix86_using_red_zone ()
    && !flag_split_stack

Support for some of these cases, like __builtin_eh_return, might be easy
to add, but I don't have a test for them.

Stack Layout
============

When active, registers are saved on the stack differently. Note that
when not active, stack layout is *unchanged*.

    [arguments]
                            <- ARG_POINTER
    saved pc

    saved frame pointer     if frame_pointer_needed
                            <- HARD_FRAME_POINTER
    [saved regs]            if not managed by stub (e.g. explicitly
                            clobbered)
                            <- reg_save_offset
    [padding0]
                            <- stack_realign_offset
                            <- Start of out-of-line, stub-managed regs
    XMM6-15
    RSI
    RDI
    [RBX]                   if RBX is clobbered
    [RBP]                   if RBP and RBX are clobbered and HFP not used
    [R12]                   if R12 and all previous regs are clobbered
    [R13]                   if R13 and all previous regs are clobbered
    [R14]                   if R14 and all previous regs are clobbered
    [R15]                   if R15 and all previous regs are clobbered
                            <- end of stub-saved/restored regs
    [padding1]
                            <- outlined_save_offset
                            <- sse_regs_save_offset
    [padding2]
                            <- FRAME_POINTER
    [va_arg registers]

    [frame]
    ... etc.

Stubs
=====

There are two sets of stubs: one for use with hard frame pointers and
one for use without. Each set has a save, a restore and a
restore-as-tail-call that performs the function's return. Each stub has
multiple entry points, one for each number of registers it can save. The
non-tail-call restore is used when a sibling call is the tail.
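
As a rough sketch (not the patch's code), the layout rule above can be
turned into a count of stub-managed registers for the case without a
hard frame pointer. All names below are hypothetical, and the rule is
only my reading of the layout list: XMM6-15, RSI and RDI are always
managed (12 registers), and each optional register joins only if it and
all registers before it in the list are clobbered.

```c
#include <assert.h>

/* Optional integer registers, in the order the stack layout lists them.
   Each value is a bit index into a hypothetical clobber mask. */
enum {
    REG_RBX, REG_RBP, REG_R12, REG_R13, REG_R14, REG_R15,
    NUM_OPTIONAL
};

/* XMM6-15 (10 registers) plus RSI and RDI are always stub-managed. */
enum { BASE_STUB_REGS = 12 };

/* Count the registers a stub would manage when no hard frame pointer
   is used.  Counting stops at the first optional register that is not
   clobbered, because later ones would then be out of order. */
int stub_reg_count(unsigned clobber_mask)
{
    /* Only the NUM_OPTIONAL low bits are meaningful. */
    assert((clobber_mask >> NUM_OPTIONAL) == 0);

    int count = BASE_STUB_REGS;
    for (int i = 0; i < NUM_OPTIONAL; ++i) {
        if (!(clobber_mask & (1u << i)))
            break;
        ++count;
    }
    return count;
}
```

With RBX, RBP and R12 clobbered the sketch yields 15; with only R15
clobbered it yields 12, since R15 is then out of order.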
If a normal register is explicitly clobbered out of the order in which
hard registers are usually assigned (e.g.,
__asm__ __volatile__ ("":::"r15")), then that register will be saved and
restored as normal and not by the stub.

Stub names:

    __savms64_(12-18)
    __resms64_(12-18)
    __resms64x_(12-18)

    __savms64f_(12-17)
    __resms64f_(12-17)
    __resms64fx_(12-17)

Save stubs use RAX as a base register and restore stubs use RSI, the
latter of which is overwritten before returning. Restore-as-tail-call
for the non-HFP case uses R10 to restore the stack pointer before
returning.

Samples
=======

Standard case with RBX, RBP and R12 also being used in the function:

Prologue:
    lea    -0x78(%rsp),%rax
    sub    $0x108,%rsp
    callq  5874b <__savms64_15>

Epilogue (r10 stores the value to restore the stack pointer to):
    lea    0x90(%rsp),%rsi
    lea    0x78(%rsi),%r10
    jmpq   587eb <__resms64x_15>

Stack pointer realignment case (same clobbers):

Prologue, stack realignment case:
    push   %rbp
    mov    %rsp,%rbp
    and    $0xfffffffffffffff0,%rsp
    lea    -0x70(%rsp),%rax
    sub    $0x100,%rsp
    callq  57fc7 <__savms64f_15>

Epilogue, stack realignment case:
    lea    0x90(%rsp),%rsi
    jmpq   58013 <__resms64fx_15>

Testing
=======

A comprehensive test program is submitted separately, and no additional
tests fail. I have also run Wine's tests with no additional failures
(although a few very minor tweaks have gone in since I last ran Wine's
tests). I have not run -flto tests on Wine as I haven't yet found a way
to get Wine to build with -flto; maybe I'm just doing it wrong.
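
Returning to the first sample under Samples, the offsets can be checked
with a little arithmetic. The sketch below is an illustration, not part
of the patch set: the struct and function names are hypothetical, and it
assumes the stack pointer in the epilogue equals its value at the end of
the prologue. It models only the lea/sub instructions shown, not the
stubs themselves.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t rax, rsp; } prologue_state;
typedef struct { uint64_t rsi, r10; } epilogue_state;

/* Register values after the sample prologue, given RSP on entry. */
prologue_state run_prologue(uint64_t entry_rsp)
{
    prologue_state s;
    s.rax = entry_rsp - 0x78;    /* lea -0x78(%rsp),%rax */
    s.rsp = entry_rsp - 0x108;   /* sub $0x108,%rsp      */
    return s;
}

/* Register values after the sample epilogue, given RSP in the body. */
epilogue_state run_epilogue(uint64_t body_rsp)
{
    epilogue_state s;
    s.rsi = body_rsp + 0x90;     /* lea 0x90(%rsp),%rsi  */
    s.r10 = s.rsi + 0x78;        /* lea 0x78(%rsi),%r10  */
    return s;
}

/* The restore base (RSI) should match the save base (RAX), and R10
   should hold the stack pointer value from function entry. */
void check_sample_offsets(uint64_t entry_rsp)
{
    prologue_state p = run_prologue(entry_rsp);
    epilogue_state e = run_epilogue(p.rsp);
    assert(e.rsi == p.rax);
    assert(e.r10 == entry_rsp);
}
```

Since 0x90 + 0x78 = 0x108, R10 ends up at the entry value of RSP, and
RSI lands on the same base address that RAX had when the save stub was
called.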

Daniel Santos

 gcc/config/i386/i386.c         | 700 ++++++++++++++++++++++++++++++++++++++---
 gcc/config/i386/i386.h         |  22 +-
 gcc/config/i386/i386.opt       |   5 +
 gcc/config/i386/predicates.md  | 155 +++++++++
 gcc/config/i386/sse.md         |  37 +++
 gcc/doc/invoke.texi            |  11 +-
 libgcc/config.host             |   2 +-
 libgcc/config/i386/i386-asm.h  |  82 +++++
 libgcc/config/i386/resms64.S   |  57 ++++
 libgcc/config/i386/resms64f.S  |  55 ++++
 libgcc/config/i386/resms64fx.S |  57 ++++
 libgcc/config/i386/resms64x.S  |  59 ++++
 libgcc/config/i386/savms64.S   |  57 ++++
 libgcc/config/i386/savms64f.S  |  55 ++++
 libgcc/config/i386/t-msabi     |   7 +
 15 files changed, 1323 insertions(+), 38 deletions(-)