public inbox for gcc-patches@gcc.gnu.org
From: Daniel Santos <daniel.santos@pobox.com>
To: gcc-patches <gcc-patches@gcc.gnu.org>,
	Uros Bizjak <ubizjak@gmail.com>, Jan Hubicka <hubicka@ucw.cz>,
	Sandra Loosemore <sandra@codesourcery.com>
Subject: [RFC] [PATCH v3 0/8] [i386] Use out-of-line stubs for ms_abi pro/epilogues
Date: Tue, 07 Feb 2017 18:34:00 -0000
Message-ID: <2fd14fe7-8d06-45ab-fb1e-96c9c8f4c03b@pobox.com>

I apologize to those of you who get this twice, but I accidentally 
posted to the wrong list!

Uros or Jan,
Please take this as a ping, as I never bothered pinging after submitting 
v2 since I found a few more issues with it. :) Although I realize this 
would be a GCC 8 stage 1 item, I would like to try to get it finished up 
and tentatively approved as soon as I can.  I have tried to summarize 
this patch set below as clearly and succinctly as possible.  Thanks!

  * This patch set depends upon the "Use aligned SSE movs for re-aligned
    MS ABI pro/epilogues" patch set:
https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01859.html
  * I have submitted a test program separately:
https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00542.html


Summary
=======

When a 64-bit Microsoft function calls a System V function, ABI 
differences require RSI, RDI and XMM6-15 to be considered as 
clobbered.  Saving these registers inline can cost as much as 109 bytes 
and a similar amount for restoring. This patch set targets 64-bit Wine 
and aims to mitigate some of these costs by adding ms/sysv save & 
restore stubs to libgcc, which are called from pro/epilogues rather than 
emitting the code inline.  And since we're already tinkering with stubs, 
they will also manage the save/restore of all remaining registers if 
possible.  Analysis of building Wine 64 demonstrates a reduction of 
.text by around 20%, which also translates into a reduction of Wine's 
install size by 34MiB.
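
To make the situation concrete, here is a minimal sketch of the kind of 
code this affects, assuming GCC or Clang on x86-64.  The function names 
(host_add, win_entry) and the macros are hypothetical and not part of 
the patch set; on other targets the attributes are simply dropped so the 
code still builds.

```c
#include <stdint.h>

/* ms_abi/sysv_abi are real GCC attributes on x86-64; the macro names
   below are made up for this illustration.  */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
# define MS_ABI   __attribute__((ms_abi))
# define SYSV_ABI __attribute__((sysv_abi, noinline))
#else
# define MS_ABI
# define SYSV_ABI
#endif

/* Hypothetical System V callee.  Under the System V ABI it is free to
   clobber RSI, RDI and XMM6-15 without saving them.  */
SYSV_ABI int64_t host_add (int64_t a, int64_t b)
{
  return a + b;
}

/* Hypothetical Microsoft-ABI function.  The MS ABI treats RSI, RDI and
   XMM6-15 as callee-saved, so because this function calls a sysv_abi
   function, its prologue/epilogue must preserve those registers on
   behalf of its own caller.  */
MS_ABI int64_t win_entry (int64_t a, int64_t b)
{
  return host_add (a, b) * 2;
}
```

Compiling something like this with -O2 -S and looking at win_entry's 
prologue and epilogue shows the inline saves and restores that the 
stubs are meant to replace.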

As there will usually only be 3 stubs in memory at any time, I'm using 
the larger mov instructions instead of push/pop to facilitate better 
parallelization. The basic theory is that the combination of better 
parallelization and reduced I-cache misses will offset the extra 
instructions required for implementation, although I have not produced 
actual performance data yet.

For now, I have called this feature -moutline-msabi-xlogues, but Sandra 
Loosemore has this suggestion: 
(https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02670.html)

> Just as a suggestion (I'm not an i386 maintainer), I'd recommend
> spelling the name of this option -mno-inline-msabi-xlogues instead of
> -moutline-msabi-xlogues, and making the default -minline-msabi-xlogues.

When enabled, the feature is activated when an ms_abi function calls a 
sysv_abi function if all of the following are true (evaluated in 
ix86_compute_frame_layout):

     TARGET_SSE
     && !ix86_function_ms_hook_prologue (current_function_decl)
     && !SEH
     && !crtl->calls_eh_return
     && !ix86_static_chain_on_stack
     && !ix86_using_red_zone ()
     && !flag_split_stack
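
As a standalone restatement of the condition above, the following sketch 
models each term as a boolean field.  The struct fn_traits and the 
function can_use_stubs are hypothetical stand-ins for illustration only; 
the real check lives in ix86_compute_frame_layout and reads GCC's own 
state.

```c
#include <stdbool.h>

/* Hypothetical model of the per-function state consulted by the
   condition; the field names only mirror the terms in the list.  */
struct fn_traits
{
  bool target_sse;             /* TARGET_SSE */
  bool ms_hook_prologue;       /* ix86_function_ms_hook_prologue (...) */
  bool seh;                    /* SEH */
  bool calls_eh_return;        /* crtl->calls_eh_return */
  bool static_chain_on_stack;  /* ix86_static_chain_on_stack */
  bool using_red_zone;         /* ix86_using_red_zone () */
  bool split_stack;            /* flag_split_stack */
};

/* True only when SSE is available and none of the excluded cases
   applies, matching the conjunction listed above.  */
static bool can_use_stubs (const struct fn_traits *t)
{
  return t->target_sse
         && !t->ms_hook_prologue
         && !t->seh
         && !t->calls_eh_return
         && !t->static_chain_on_stack
         && !t->using_red_zone
         && !t->split_stack;
}
```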

Support for some of these cases, such as __builtin_eh_return, might be 
easy to add, but I don't have a test for them.


Stack Layout
============

When active, registers are saved on the stack differently. Note that 
when not active, stack layout is *unchanged*.

     [arguments]
                             <- ARG_POINTER
     saved pc

     saved frame pointer     if frame_pointer_needed
                             <- HARD_FRAME_POINTER
     [saved regs]            if not managed by stub (e.g., explicitly clobbered)
                             <- reg_save_offset
     [padding0]
                             <- stack_realign_offset
                             <- Start of out-of-line, stub-managed regs
     XMM6-15
     RSI
     RDI
     [RBX]                   if RBX is clobbered
     [RBP]                   if RBP and RBX are clobbered and HFP not used.
     [R12]                   if R12 and all previous regs are clobbered
     [R13]                   if R13 and all previous regs are clobbered
     [R14]                   if R14 and all previous regs are clobbered
     [R15]                   if R15 and all previous regs are clobbered
                             <- end of stub-saved/restored regs
     [padding1]
                             <- outlined_save_offset
                             <- sse_regs_save_offset
     [padding2]
                             <- FRAME_POINTER
     [va_arg registers]

     [frame]
     ... etc.
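
The "and all previous regs are clobbered" rule in the layout above can 
be sketched as a simple prefix count.  This is only a model of the 
non-HFP case as described here, not the code from the patch: the enum, 
count_stub_regs and stub_area_bytes are hypothetical names, and the byte 
count ignores all padding.

```c
/* Optional registers in the order they appear in the layout above.  */
enum
{
  CLOB_RBX = 1u << 0,
  CLOB_RBP = 1u << 1,
  CLOB_R12 = 1u << 2,
  CLOB_R13 = 1u << 3,
  CLOB_R14 = 1u << 4,
  CLOB_R15 = 1u << 5
};

/* XMM6-15 (10 registers) plus RSI and RDI are always stub-managed.  */
#define MIN_STUB_REGS 12u
#define NUM_OPTIONAL_REGS 6u

/* Count stub-managed registers when no hard frame pointer is used.
   An optional register is included only if it and every register
   before it in the list is clobbered; the first gap ends the run.  */
unsigned count_stub_regs (unsigned clobbered)
{
  unsigned count = MIN_STUB_REGS;
  for (unsigned bit = 0; bit < NUM_OPTIONAL_REGS; ++bit)
    {
      if (!(clobbered & (1u << bit)))
        break;
      ++count;
    }
  return count;
}

/* Bytes occupied by the stub-managed registers themselves:
   16 bytes per XMM register, 8 bytes per general-purpose register.  */
unsigned stub_area_bytes (unsigned count)
{
  return 10u * 16u + (count - 10u) * 8u;
}
```

With RBX, RBP and R12 clobbered this gives 15, which lines up with the 
__savms64_15 entry point in the first sample below.  A function that 
clobbers only RBX and R15 gets 13, with R15 left to the normal save 
path.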


Stubs
=====

There are two sets of stubs for use with and without hard frame 
pointers.  Each set has a save, a restore and a restore-as-tail-call 
that performs the function's return.  Each stub has entry points for the 
number of registers it's saving. The non-tail-call restore is used when 
a sibling call is the tail.  If a normal register is explicitly 
clobbered out of the order that hard registers are usually assigned in 
(e.g., __asm__ __volatile__ ("":::"r15")), then that register will be 
saved and restored as normal and not by the stub.

Stub names:
__savms64_(12-18)
__resms64_(12-18)
__resms64x_(12-18)

__savms64f_(12-17)
__resms64f_(12-17)
__resms64fx_(12-17)
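
The naming scheme can be summarized with a small helper.  This is a 
sketch that only reproduces the names and ranges listed above; the enum 
stub_kind and the function stub_name are hypothetical and do not come 
from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical classification of the three stubs in each set.  */
enum stub_kind
{
  STUB_SAVE,          /* __savms64[f]_N  */
  STUB_RESTORE,       /* __resms64[f]_N  */
  STUB_RESTORE_TAIL   /* __resms64[f]x_N, also performs the return */
};

/* Write the stub symbol name for NREGS registers into BUF.
   Returns the name length, or -1 if NREGS is outside the range
   listed for that set (12-18 without HFP, 12-17 with HFP).  */
int stub_name (char *buf, size_t len, enum stub_kind kind,
               bool hfp, unsigned nregs)
{
  unsigned max_regs = hfp ? 17u : 18u;
  if (nregs < 12u || nregs > max_regs)
    return -1;

  return snprintf (buf, len, "__%sms64%s%s_%u",
                   kind == STUB_SAVE ? "sav" : "res",
                   hfp ? "f" : "",
                   kind == STUB_RESTORE_TAIL ? "x" : "",
                   nregs);
}
```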

Save stubs use RAX as a base register and restore stubs use RSI, the 
latter of which is overwritten before returning. Restore-as-tail-call for 
the non-HFP case uses R10 to restore the stack pointer before returning.

Samples
=======

Standard case with RBX, RBP and R12 also being used in function:

   Prologue:
     lea    -0x78(%rsp),%rax
     sub    $0x108,%rsp
     callq  5874b <__savms64_15>

   Epilogue (r10 stores the value to restore the stack pointer to):
     lea    0x90(%rsp),%rsi
     lea    0x78(%rsi),%r10
     jmpq   587eb <__resms64x_15>
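
As a sanity check on the offsets in this sample, the arithmetic can be 
written out.  This sketch assumes the stack pointer is not adjusted 
between the prologue's sub and the epilogue, and that the call to the 
save stub is balanced by its own return; the function names are 
hypothetical.

```c
#include <stdint.h>

/* Stack pointer inside the function body, after "sub $0x108,%rsp".  */
uint64_t body_rsp (uint64_t rsp_at_entry)
{
  return rsp_at_entry - 0x108;
}

/* Value left in R10 by the two epilogue lea instructions.  */
uint64_t epilogue_r10 (uint64_t rsp_in_body)
{
  uint64_t rsi = rsp_in_body + 0x90;  /* lea 0x90(%rsp),%rsi */
  uint64_t r10 = rsi + 0x78;          /* lea 0x78(%rsi),%r10 */
  return r10;
}
```

Since 0x90 + 0x78 = 0x108, R10 ends up holding the stack pointer value 
from function entry, which is what the restore-as-tail-call stub needs 
in order to reset RSP before returning.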

Stack pointer realignment case (same clobbers):

   Prologue, stack realignment case:
     push   %rbp
     mov    %rsp,%rbp
     and    $0xfffffffffffffff0,%rsp
     lea    -0x70(%rsp),%rax
     sub    $0x100,%rsp
     callq  57fc7 <__savms64f_15>

   Epilogue, stack realignment case:
     lea    0x90(%rsp),%rsi
     jmpq   58013 <__resms64fx_15>


Testing
=======

A comprehensive test program is submitted separately with no additional 
tests failing.  I have also run Wine's tests with no additional failures 
(although a few very minor tweaks have gone in since I last ran Wine's 
tests).  I have not run -flto tests on Wine as I haven't yet found a way 
to get Wine to build with -flto; maybe I'm just doing it wrong.

Daniel Santos

  gcc/config/i386/i386.c         | 700 ++++++++++++++++++++++++++++++++++++++---
  gcc/config/i386/i386.h         |  22 +-
  gcc/config/i386/i386.opt       |   5 +
  gcc/config/i386/predicates.md  | 155 +++++++++
  gcc/config/i386/sse.md         |  37 +++
  gcc/doc/invoke.texi            |  11 +-
  libgcc/config.host             |   2 +-
  libgcc/config/i386/i386-asm.h  |  82 +++++
  libgcc/config/i386/resms64.S   |  57 ++++
  libgcc/config/i386/resms64f.S  |  55 ++++
  libgcc/config/i386/resms64fx.S |  57 ++++
  libgcc/config/i386/resms64x.S  |  59 ++++
  libgcc/config/i386/savms64.S   |  57 ++++
  libgcc/config/i386/savms64f.S  |  55 ++++
  libgcc/config/i386/t-msabi     |   7 +
  15 files changed, 1323 insertions(+), 38 deletions(-)
