From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 102581 invoked by alias); 23 Nov 2016 05:11:38 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 102549 invoked by uid 89); 23 Nov 2016 05:11:37 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-5.2 required=5.0
 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW,RP_MATCHES_RCVD,SPF_PASS
 autolearn=ham version=3.3.2 spammy=handful, stubs, wine, gcc540
X-HELO: sasl.smtp.pobox.com
Received: from pb-smtp1.pobox.com (HELO sasl.smtp.pobox.com) (64.147.108.70)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
 Wed, 23 Nov 2016 05:11:26 +0000
Received: from sasl.smtp.pobox.com (unknown [127.0.0.1])
 by pb-smtp1.pobox.com (Postfix) with ESMTP id F4012532E3;
 Wed, 23 Nov 2016 00:11:23 -0500 (EST)
Received: from pb-smtp1.nyi.icgroup.com (unknown [127.0.0.1])
 by pb-smtp1.pobox.com (Postfix) with ESMTP id EB5FE532E2;
 Wed, 23 Nov 2016 00:11:23 -0500 (EST)
Received: from [192.168.1.4] (unknown [76.215.41.237])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by pb-smtp1.pobox.com (Postfix) with ESMTPSA id 2C4F1532E1;
 Wed, 23 Nov 2016 00:11:22 -0500 (EST)
From: Daniel Santos
Subject: [PATCH v2 0/9] Add optimization -moutline-msabi-xlogues (for Wine 64)
Cc: Jan Hubicka, Uros Bizjak, Ian Lance Taylor
To: gcc-patches
Message-ID: <25abd41b-923b-2fea-dfc3-9051af632f44@pobox.com>
Date: Wed, 23 Nov 2016 05:11:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------EB144C7FBC0456CC9E3E0E98"
X-Pobox-Relay-ID: 46743F4A-B13B-11E6-BC1A-E98412518317-06139138!pb-smtp1.pobox.com
X-IsSubscribed: yes
X-SW-Source: 2016-11/txt/msg02293.txt.bz2

This is a multi-part message in MIME
format.
--------------EB144C7FBC0456CC9E3E0E98
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Content-length: 4950

Due to ABI differences, when a 64-bit Microsoft function calls a
System V function, it must consider RSI, RDI and XMM6-15 as clobbered.
Saving these registers can cost as much as 109 bytes, and restoring
them costs a similar amount. This patch set targets 64-bit Wine and
aims to mitigate some of these costs by adding ms-->sysv save & restore
stubs to libgcc, which are called from pro/epilogues rather than
emitting the code inline. And since we're already tinkering with
stubs, they also manage the save/restore of up to 6 additional
registers. Analysis of building Wine 64 demonstrates a reduction of
.text by around 20%.

The basic theory is that a reduction of I-cache misses will offset the
extra instructions required for implementation. And since there are
only a handful of stubs that will be in memory, I'm using the larger
mov instructions instead of push/pop to facilitate better
parallelization. I have not yet produced actual performance data.

Here is a sample of some generated code:

Prologue:
    23c20:  48 8d 44 24 88          lea    -0x78(%rsp),%rax
    23c25:  48 81 ec 08 01 00 00    sub    $0x108,%rsp
    23c2c:  e8 1a 4b 03 00          callq  5874b <__savms64_15>

Epilogue (r10 stores the value to restore the stack pointer to):
    23c7c:  48 8d b4 24 90 00 00    lea    0x90(%rsp),%rsi
    23c83:  00
    23c84:  4c 8d 56 78             lea    0x78(%rsi),%r10
    23c88:  e9 5e 4b 03 00          jmpq   587eb <__resms64x_15>

It would appear that forced stack realignment has become the new
normal for Wine 64, since there are many Windows programs that violate
the 16-byte alignment requirement, but just so *happen* to not crash on
Windows (and therefore claim that Wine should work as Windows happens
to behave given the UB).
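
To make the 16-byte alignment requirement concrete, here is a minimal
sketch in C (not part of the patch set). It rounds an address down to a
16-byte boundary with the mask 0xfffffffffffffff0. The helper names
align_down_16 and is_aligned_16 are made up for illustration, and the
sample addresses in the note below are arbitrary.

```c
#include <stdint.h>

/* Hypothetical helper: round an address down to a 16-byte boundary by
 * clearing its low four bits. The mask ~0xf is 0xfffffffffffffff0. */
static uint64_t align_down_16(uint64_t addr)
{
    return addr & ~(uint64_t)0xf;
}

/* Hypothetical helper: nonzero when addr is already 16-byte aligned. */
static int is_aligned_16(uint64_t addr)
{
    return (addr & (uint64_t)0xf) == 0;
}
```

For example, align_down_16(0x7ffc1238) yields 0x7ffc1230, while an
address that is already aligned is returned unchanged. Alignment
matters here because movaps faults on a memory operand that is not
16-byte aligned.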
Prologue, stack realignment case:
    23c20:  55                      push   %rbp
    23c21:  48 89 e5                mov    %rsp,%rbp
    23c24:  48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
    23c28:  48 8d 44 24 90          lea    -0x70(%rsp),%rax
    23c2d:  48 81 ec 00 01 00 00    sub    $0x100,%rsp
    23c34:  e8 8e 43 03 00          callq  57fc7 <__savms64f_15>

Epilogue, stack realignment case:
    23c86:  48 8d b4 24 90 00 00    lea    0x90(%rsp),%rsi
    23c8d:  00
    23c8e:  e9 80 43 03 00          jmpq   58013 <__resms64fx_15>

No additional regression tests fail with this patch set. I have tested
about 12 builds of Wine (with varying optimizations & options) and no
additional tests fail for those either. (Actually, there appears to be
some type of regression prior to this patch set, because it magically
fixes about 30 failed Wine tests that don't fail when building Wine
with gcc-5.4.0.)

Outstanding issues:

1. My x86 assembly expertise is limited, so I would appreciate
   examination of my stubs & emitted code!

2. Regression tests have only been run on my old Phenom. I have not
   yet tested on an AVX CPU (which should use vmovaps instead of
   movaps).

3. My test program is inadequate (and is not included in this patch
   set) and needs a lot of cleanup. During development it failed to
   produce many optimization errors that I got when building Wine.
   I've been building 64-bit Wine and running Wine's tests in the
   meantime.

4. It would help to write a benchmarking program/script.

5. I haven't yet figured out how to get Wine building with -flto, and
   thus I haven't yet tested how these changes affect it.

6. I'm not 100% certain yet, but the stubs __resms64f* (restore with
   hard frame pointer, but return to the function) don't appear to
   ever be used, because enabling hard frame pointers disables sibling
   calls, which is what they're intended to facilitate.
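
For anyone who wants to look at the emitted pro/epilogues themselves,
here is a minimal sketch in C (not taken from the patch set or from my
test program) of the kind of call these stubs serve: a function using
the Microsoft ABI that calls a System V function. It relies on GCC's
ms_abi and sysv_abi function attributes on x86-64. The function names
ms_entry and sysv_add and the MS_ABI/SYSV_ABI/NOINLINE macros are
hypothetical, and on other targets the macros expand to nothing so the
file still builds.

```c
#include <stdint.h>

/* Assumption: GCC or Clang targeting x86-64. Elsewhere the attributes
 * are dropped and both functions use the platform's default ABI. */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define MS_ABI   __attribute__((ms_abi))
#define SYSV_ABI __attribute__((sysv_abi))
#define NOINLINE __attribute__((noinline))
#else
#define MS_ABI
#define SYSV_ABI
#define NOINLINE
#endif

/* Hypothetical System V callee. Under the System V ABI it is free to
 * clobber RSI, RDI and XMM6-15. */
SYSV_ABI NOINLINE int64_t sysv_add(int64_t a, int64_t b)
{
    return a + b;
}

/* Hypothetical Microsoft-ABI function. Its own callers expect RSI, RDI
 * and XMM6-15 to be preserved, so in general the compiler has to save
 * them in the prologue and restore them in the epilogue because of the
 * call to sysv_add. */
MS_ABI NOINLINE int64_t ms_entry(int64_t a, int64_t b)
{
    return sysv_add(a, b) * 2;
}
```

Compiling a file like this to assembly and inspecting ms_entry should
show the save and restore code, although the optimizer may drop some of
it when it can prove that the callee does not touch those registers.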

 gcc/config/i386/i386.c         | 704 ++++++++++++++++++++++++++++++++++++++---
 gcc/config/i386/i386.h         |  22 +-
 gcc/config/i386/i386.opt       |   5 +
 gcc/config/i386/predicates.md  | 155 +++++++++
 gcc/config/i386/sse.md         |  46 +++
 gcc/doc/invoke.texi            |  11 +-
 libgcc/config.host             |   2 +-
 libgcc/config/i386/i386-asm.h  |  82 +++++
 libgcc/config/i386/resms64.S   |  63 ++++
 libgcc/config/i386/resms64f.S  |  59 ++++
 libgcc/config/i386/resms64fx.S |  61 ++++
 libgcc/config/i386/resms64x.S  |  65 ++++
 libgcc/config/i386/savms64.S   |  63 ++++
 libgcc/config/i386/savms64f.S  |  64 ++++
 libgcc/config/i386/t-msabi     |   7 +
 15 files changed, 1358 insertions(+), 51 deletions(-)

Changes in Version 2:

* Added ChangeLogs (attached).
* Changed option from -f to -m and moved from gcc/common.opt to
  gcc/config/i386/i386.opt.
* Solved problem with uncombined SP modifications.
* Optimization now works when hard frame pointers are used and stack
  realignment is not needed.
* Added documentation to gcc/doc/invoke.texi

Feedback and comments would be most appreciated!

Thanks,
Daniel

--------------EB144C7FBC0456CC9E3E0E98
Content-Type: text/plain; charset=UTF-8;
 name="ChangeLog-moutline-msabi-xlogues.gcc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="ChangeLog-moutline-msabi-xlogues.gcc"
Content-length: 2802

CSogY29uZmlnL2kzODYvaTM4Ni5vcHQ6IEFkZCBvcHRpb24gLW1vdXRsaW5l
LW1zYWJpLXhsb2d1ZXMuCgoJKiBjb25maWcvaTM4Ni9pMzg2LmgKCSh4ODZf
NjRfbXNfc3lzdl9leHRyYV9jbG9iYmVyZWRfcmVnaXN0ZXJzKTogQ2hhbmdl
IHR5cGUgdG8gdW5zaWduZWQuCgkoTlVNX1g4Nl82NF9NU19DTE9CQkVSRURf
UkVHUyk6IE5ldyBtYWNyby4KCShzdHJ1Y3QgbWFjaGluZV9mdW5jdGlvbik6
IEFkZCBuZXcgbWVtYmVycyBvdXRsaW5lX21zX3N5c3YsCglvdXRsaW5lX21z
X3N5c3ZfcGFkX2luLCBvdXRsaW5lX21zX3N5c3ZfcGFkX291dCBhbmQKCW91
dGxpbmVfbXNfc3lzdl9leHRyYV9yZWdzLgoKCSogY29uZmlnL2kzODYvaTM4
Ni5jCgkoZW51bSB4bG9ndWVfc3R1Yik6IE5ldyBlbnVtLgoJKGVudW0geGxv
Z3VlX3N0dWJfc2V0cyk6IE5ldyBlbnVtLgoJKGNsYXNzIHhsb2d1ZV9sYXlv
dXQpOiBOZXcgY2xhc3MuCgkoc3RydWN0IGl4ODZfZnJhbWUpOiBBZGQgb3V0
bGluZWRfc2F2ZV9vZmZzZXQgbWVtYmVyLCBtb2RpZnkgY29tbWVudHMKCXRv
IGRldGFpbCBzdGFjayBsYXlvdXQgd2hlbiB1c2luZyBvdXQtb2YtbGluZSBz
dHVicy4KCShpeDg2X3RhcmdldF9zdHJpbmcpOiBBZGQgLW1vdXRsaW5lLW1z
YWJpLXhsb2d1ZXMgb3B0aW9uLgoKCShzdHViX21hbmFnZWRfcmVncyk6IE5l
dyBzdGF0aWMgdmFyaWFibGUuCgkoaXg4Nl9zYXZlX3JlZyk6IEFkZCBuZXcg
cGFyYW1ldGVyIGlnbm9yZV9vdXRsaW5lZCB0byBvcHRpb25hbGx5IG9taXQK
CXJlZ2lzdGVycyBtYW5hZ2VkIGJ5IG91dC1vZi1saW5lIHN0dWIuCgkoaXg4
Nl9uc2F2ZWRfcmVncyk6IE1vZGlmeSB0byBhY2NvbW1vZGF0ZSBjaGFuZ2Vz
IHRvIGl4ODZfc2F2ZV9yZWcuCgkoaXg4Nl9uc2F2ZWRfc3NlcmVncyk6IExp
a2V3aXNlLgoJKGl4ODZfZW1pdF9zYXZlX3JlZ3MpOiBMaWtld2lzZS4KCShp
eDg2X2VtaXRfc2F2ZV9yZWdzX3VzaW5nX21vdik6IExpa2V3aXNlLgoJKGl4
ODZfZW1pdF9zYXZlX3NzZV9yZWdzX3VzaW5nX21vdik6IExpa2V3aXNlLgoJ
KGdldF9zY3JhdGNoX3JlZ2lzdGVyX29uX2VudHJ5KTogTGlrZXdpc2UuCgko
aXg4Nl9jb21wdXRlX2ZyYW1lX2xheW91dCk6IE1vZGlmeSB0byBkaXNhYmxl
IG0tPm91dGxpbmVfbXNfc3lzdiB3aGVuCglhcHByb3ByaWF0ZSBhbmQgY29t
cHV0ZSBmcmFtZSBsYXlvdXQgZm9yIG91dC1vZi1saW5lIHN0dWJzLgoJKGdl
bl9mcmFtZV9zZXQpOiBOZXcgZnVuY3Rpb24uCgkoZ2VuX2ZyYW1lX2xvYWQp
OiBMaWtld2lzZS4KCShnZW5fZnJhbWVfc3RvcmUpOiBMaWtld2lzZS4KCShl
bWl0X21zYWJpX291dGxpbmVkX3NhdmUpOiBMaWtld2lzZS4KCShpeDg2X2V4
cGFuZF9wcm9sb2d1ZSk6IE1vZGlmeSB0byBjYWxsIGVtaXRfbXNhYmlfb3V0
bGluZWRfc2F2ZSB3aGVuCglhcHByb3ByaWF0ZS4KCShpeDg2X2VtaXRfbGVh
dmUpOiBBZGQgcGFyYW1ldGVyIHJ0eF9pbnNuICppbnNuLCBhbGxvd2luZyBp
dCB0byBiZSB1c2VkCgl0byBvbmx5IGdlbmVyYXRlIHRoZSBub3Rlcy4KCShl
bWl0X21zYWJpX291dGxpbmVkX3Jlc3RvcmUpOiBOZXcgZnVuY3Rpb24uCgko
aXg4Nl9leHBhbmRfZXBpbG9ndWUpOiBNb2RpZnkgdG8gY2FsbCBlbWl0X21z
YWJpX291dGxpbmVkX3Jlc3RvcmUgd2hlbgoJYXBwcm9wcmlhdGUuCgkoaXg4
Nl9leHBhbmRfY2FsbCk6IE1vZGlmeSB0byBlbmFibGUgbS0+b3V0bGluZV9t
c19zeXN2IHdoZW4KCWFwcHJvcHJpYXRlLgoKCSogY29uZmlnL2kzODYvcHJl
ZGljYXRlcy5tZAoJKHNhdmVfbXVsdGlwbGUpOiBOZXcgcHJlZGljYXRlLgoJ
KHJlc3RvcmVfbXVsdGlwbGUpOiBMaWtld2lzZS4KCSogY29uZmlnL2kzODYv
c3NlLm1kCgkoc2F2ZV9tdWx0aXBsZTxtb2RlPik6IE5ldyBwYXR0ZXJuLgoJ
KHNhdmVfbXVsdGlwbGVfcmVhbGlnbjxtb2RlPik6IExpa2V3aXNlLgoJKHJl
c3RvcmVfbXVsdGlwbGU8bW9kZT4pOiBMaWtld2lzZS4KCShyZXN0b3JlX211
bHRpcGxlX2FuZF9yZXR1cm48bW9kZT4pOiBMaWtld2lzZS4KCShyZXN0b3Jl
X211bHRpcGxlX2xlYXZlX3JldHVybjxtb2RlPik6IExpa2V3aXNlLgo=
--------------EB144C7FBC0456CC9E3E0E98
Content-Type: text/plain; charset=UTF-8;
 name="ChangeLog-moutline-msabi-xlogues.libgcc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="ChangeLog-moutline-msabi-xlogues.libgcc"
Content-length: 480

CSogY29uZmlnLmhvc3Q6IEFkZCBpMzg2L3QtbXNhYmkgdG8gaTM4Ni90LWxp
bnV4IGZpbGUgbGlzdC4KCSogY29uZmlnL2kzODYvaTM4Ni1hc20uaDogTmV3
IGZpbGUuCgkqIGNvbmZpZy9pMzg2L3Jlc21zNjQuUzogTmV3IGZpbGUuCgkq
IGNvbmZpZy9pMzg2L3Jlc21zNjRmLlM6IE5ldyBmaWxlLgoJKiBjb25maWcv
aTM4Ni9yZXNtczY0ZnguUzogTmV3IGZpbGUuCgkqIGNvbmZpZy9pMzg2L3Jl
c21zNjR4LlM6IE5ldyBmaWxlLgoJKiBjb25maWcvaTM4Ni9zYXZtczY0LlM6
IE5ldyBmaWxlLgoJKiBjb25maWcvaTM4Ni9zYXZtczY0Zi5TOiBOZXcgZmls
ZS4KCSogY29uZmlnL2kzODYvdC1tc2FiaTogTmV3IGZpbGUuCg==
--------------EB144C7FBC0456CC9E3E0E98--