public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2 0/9] Add optimization -moutline-msabi-xlogues (for Wine 64)
@ 2016-11-23  5:11 Daniel Santos
  2016-11-23  5:16 ` [PATCH 1/9] Change type of x86_64_ms_sysv_extra_clobbered_registers Daniel Santos
                   ` (8 more replies)
  0 siblings, 9 replies; 12+ messages in thread
From: Daniel Santos @ 2016-11-23  5:11 UTC
  To: gcc-patches; +Cc: Jan Hubicka, Uros Bizjak, Ian Lance Taylor

[-- Attachment #1: Type: text/plain, Size: 4950 bytes --]

Due to ABI differences, when a 64-bit Microsoft function calls a
System V function, it must consider RSI, RDI and XMM6-15 as clobbered.
Saving these registers can cost as much as 109 bytes and a similar
amount for restoring. This patch set targets 64-bit Wine and aims to
mitigate some of these costs by adding ms-->sysv save & restore stubs to
libgcc, which are called from pro/epilogues rather than emitting the
code inline.  And since we're already tinkering with stubs, they will
also manage the save/restore of up to 6 additional registers. Analysis
of building Wine 64 demonstrates a reduction of .text by around 20%.
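
As a rough illustration of the situation described above (not taken from the patch set), the following C sketch has a Microsoft-ABI function calling a System V function. The function names and the MS_ABI, SYSV_ABI and NOINLINE macros are made up for this example; the ms_abi and sysv_abi attributes themselves are existing GCC attributes on x86-64, and the macros expand to nothing elsewhere so the sketch still compiles.

```c
#include <assert.h>

/* ms_abi / sysv_abi are GCC function attributes on x86-64.  On other
   targets or compilers the macros expand to nothing (assumption made
   only so that this sketch stays portable).  */
#if defined(__x86_64__) && defined(__GNUC__)
# define MS_ABI   __attribute__((ms_abi))
# define SYSV_ABI __attribute__((sysv_abi))
# define NOINLINE __attribute__((noinline))
#else
# define MS_ABI
# define SYSV_ABI
# define NOINLINE
#endif

/* Hypothetical System V callee: under its ABI it is free to clobber
   RSI, RDI and XMM6-15.  */
SYSV_ABI NOINLINE double sysv_scale(double x, double k)
{
    return x * k;
}

/* Hypothetical Microsoft-ABI caller: RSI, RDI and XMM6-15 are
   callee-saved for its own callers, so the compiler has to preserve
   them around the call into System V code.  */
MS_ABI NOINLINE double ms_entry(double x)
{
    return sysv_scale(x, 2.0) + 1.0;
}

/* Tiny sanity check: the result does not depend on the ABI mix.  */
void check_ms_to_sysv(void)
{
    assert(ms_entry(3.0) == 7.0);
}
```

Compiling something like this with gcc -O2 -S on x86-64 and reading the prologue and epilogue of ms_entry should show the inline saves and restores that the stubs are meant to replace.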

The basic theory is that a reduction of I-cache misses will offset the 
extra instructions required for implementation. And since there are only 
a handful of stubs that will be in memory, I'm using the larger mov 
instructions instead of push/pop to facilitate better parallelization. I 
have not yet produced actual performance data.

Here is a sample of some generated code:

Prologue:
    23c20:       48 8d 44 24 88          lea    -0x78(%rsp),%rax
    23c25:       48 81 ec 08 01 00 00    sub    $0x108,%rsp
    23c2c:       e8 1a 4b 03 00          callq  5874b <__savms64_15>

Epilogue (r10 stores the value to restore the stack pointer to):
    23c7c:       48 8d b4 24 90 00 00    lea    0x90(%rsp),%rsi
    23c83:       00
    23c84:       4c 8d 56 78             lea    0x78(%rsi),%r10
    23c88:       e9 5e 4b 03 00          jmpq   587eb <__resms64x_15>
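
The offsets in the two listings above can be cross-checked with a little arithmetic. The sketch below is only a model of the four address computations, with hypothetical function names, and it assumes RSP is unchanged between the end of the prologue and the start of the epilogue. Under that assumption the base pointer handed to the save stub in RAX equals the one handed to the restore stub in RSI, and R10 ends up at the entry value of RSP (0x90 + 0x78 = 0x108).

```c
#include <assert.h>
#include <stdint.h>

/* Prologue: lea -0x78(%rsp),%rax */
uint64_t prologue_rax(uint64_t entry_rsp) { return entry_rsp - 0x78; }

/* Prologue: sub $0x108,%rsp */
uint64_t prologue_rsp(uint64_t entry_rsp) { return entry_rsp - 0x108; }

/* Epilogue: lea 0x90(%rsp),%rsi */
uint64_t epilogue_rsi(uint64_t body_rsp) { return body_rsp + 0x90; }

/* Epilogue: lea 0x78(%rsi),%r10 */
uint64_t epilogue_r10(uint64_t rsi) { return rsi + 0x78; }

/* Model one trip through prologue and epilogue for a given entry RSP.  */
void check_frame_round_trip(uint64_t entry_rsp)
{
    uint64_t body_rsp = prologue_rsp(entry_rsp);
    uint64_t rsi = epilogue_rsi(body_rsp);

    /* Save and restore stubs are given the same base address.  */
    assert(prologue_rax(entry_rsp) == rsi);
    /* r10 is the value the stack pointer is restored to.  */
    assert(epilogue_r10(rsi) == entry_rsp);
}
```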

It would appear that forced stack realignment has become the new normal
for Wine 64, since there are many Windows programs that violate the
16-byte alignment requirement, but just so *happen* to not crash on
Windows (and it is therefore argued that Wine should behave as Windows
happens to, given the UB).

Prologue, stack realignment case:
    23c20:       55                      push   %rbp
    23c21:       48 89 e5                mov    %rsp,%rbp
    23c24:       48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
    23c28:       48 8d 44 24 90          lea    -0x70(%rsp),%rax
    23c2d:       48 81 ec 00 01 00 00    sub    $0x100,%rsp
    23c34:       e8 8e 43 03 00          callq  57fc7 <__savms64f_15>

Epilogue, stack realignment case:
    23c86:       48 8d b4 24 90 00 00    lea    0x90(%rsp),%rsi
    23c8d:       00
    23c8e:       e9 80 43 03 00          jmpq   58013 <__resms64fx_15>
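
For readers less familiar with the realignment step, the `and $0xfffffffffffffff0,%rsp` in the prologue above rounds the stack pointer down to a 16-byte boundary. A minimal C sketch of that arithmetic follows; the helper name align_down is hypothetical and is not something from the patch set.

```c
#include <assert.h>
#include <stdint.h>

/* Round sp down to a multiple of align, which must be a power of two.
   With align == 16 this matches the effect of
   `and $0xfffffffffffffff0,%rsp`.  */
uint64_t align_down(uint64_t sp, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return sp & ~(align - 1);
}
```

An already aligned value is left unchanged, so the realignment costs nothing in stack space when the caller did honour the 16-byte requirement.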

No additional regression tests fail with this patch set. I have tested
about 12 builds of Wine (with varying optimizations & options) and no
additional tests fail for that either. (Actually, there appears to be
some type of regression prior to this patch set because it magically
fixes about 30 failed Wine tests that don't fail when building Wine
with gcc-5.4.0.)

Outstanding issues:

 1. My x86 assembly expertise is limited, so I would appreciate
    examination of my stubs & emitted code!
 2. Regression tests have only been run on my old Phenom. I have not yet
    tested on an AVX CPU (which should use vmovaps instead of movaps).
 3. My test program is inadequate (and is not included in this patch
    set) and needs a lot of cleanup.  During development it failed to
    reproduce many of the optimization errors that I got when building
    Wine.  I've been building 64-bit Wine and running Wine's tests in
    the meantime.
 4. It would help to write a benchmarking program/script.
 5. I haven't yet figured out how to get Wine building with -flto and I
    thus haven't tested how these changes affect it yet.
 6. I'm not 100% certain yet, but the stubs __resms64f* (restore with
    hard frame pointer, but return to the function) don't appear to
    ever be used because enabling hard frame pointers disables sibling
    calls, which is what they're intended to facilitate.


  gcc/config/i386/i386.c         | 704 ++++++++++++++++++++++++++++++++++++++---
  gcc/config/i386/i386.h         |  22 +-
  gcc/config/i386/i386.opt       |   5 +
  gcc/config/i386/predicates.md  | 155 +++++++++
  gcc/config/i386/sse.md         |  46 +++
  gcc/doc/invoke.texi            |  11 +-
  libgcc/config.host             |   2 +-
  libgcc/config/i386/i386-asm.h  |  82 +++++
  libgcc/config/i386/resms64.S   |  63 ++++
  libgcc/config/i386/resms64f.S  |  59 ++++
  libgcc/config/i386/resms64fx.S |  61 ++++
  libgcc/config/i386/resms64x.S  |  65 ++++
  libgcc/config/i386/savms64.S   |  63 ++++
  libgcc/config/i386/savms64f.S  |  64 ++++
  libgcc/config/i386/t-msabi     |   7 +
  15 files changed, 1358 insertions(+), 51 deletions(-)


Changes in Version 2:

  * Added ChangeLogs (attached).
  * Changed option from -f to -m and moved from gcc/common.opt to
    gcc/config/i386/i386.opt.
  * Solved problem with uncombined SP modifications.
  * Optimization now works when hard frame pointers are used and stack
    realignment is not needed.
  * Added documentation to gcc/doc/invoke.texi.

Feedback and comments would be most appreciated!

Thanks,
Daniel






[-- Attachment #2: ChangeLog-moutline-msabi-xlogues.gcc --]
[-- Type: text/plain, Size: 2066 bytes --]

	* config/i386/i386.opt: Add option -moutline-msabi-xlogues.

	* config/i386/i386.h
	(x86_64_ms_sysv_extra_clobbered_registers): Change type to unsigned.
	(NUM_X86_64_MS_CLOBBERED_REGS): New macro.
	(struct machine_function): Add new members outline_ms_sysv,
	outline_ms_sysv_pad_in, outline_ms_sysv_pad_out and
	outline_ms_sysv_extra_regs.

	* config/i386/i386.c
	(enum xlogue_stub): New enum.
	(enum xlogue_stub_sets): New enum.
	(class xlogue_layout): New class.
	(struct ix86_frame): Add outlined_save_offset member, modify comments
	to detail stack layout when using out-of-line stubs.
	(ix86_target_string): Add -moutline-msabi-xlogues option.

	(stub_managed_regs): New static variable.
	(ix86_save_reg): Add new parameter ignore_outlined to optionally omit
	registers managed by out-of-line stub.
	(ix86_nsaved_regs): Modify to accommodate changes to ix86_save_reg.
	(ix86_nsaved_sseregs): Likewise.
	(ix86_emit_save_regs): Likewise.
	(ix86_emit_save_regs_using_mov): Likewise.
	(ix86_emit_save_sse_regs_using_mov): Likewise.
	(get_scratch_register_on_entry): Likewise.
	(ix86_compute_frame_layout): Modify to disable m->outline_ms_sysv when
	appropriate and compute frame layout for out-of-line stubs.
	(gen_frame_set): New function.
	(gen_frame_load): Likewise.
	(gen_frame_store): Likewise.
	(emit_msabi_outlined_save): Likewise.
	(ix86_expand_prologue): Modify to call emit_msabi_outlined_save when
	appropriate.
	(ix86_emit_leave): Add parameter rtx_insn *insn, allowing it to be used
	to only generate the notes.
	(emit_msabi_outlined_restore): New function.
	(ix86_expand_epilogue): Modify to call emit_msabi_outlined_restore when
	appropriate.
	(ix86_expand_call): Modify to enable m->outline_ms_sysv when
	appropriate.

	* config/i386/predicates.md
	(save_multiple): New predicate.
	(restore_multiple): Likewise.
	* config/i386/sse.md
	(save_multiple<mode>): New pattern.
	(save_multiple_realign<mode>): Likewise.
	(restore_multiple<mode>): Likewise.
	(restore_multiple_and_return<mode>): Likewise.
	(restore_multiple_leave_return<mode>): Likewise.

[-- Attachment #3: ChangeLog-moutline-msabi-xlogues.libgcc --]
[-- Type: text/plain, Size: 352 bytes --]

	* config.host: Add i386/t-msabi to i386/t-linux file list.
	* config/i386/i386-asm.h: New file.
	* config/i386/resms64.S: New file.
	* config/i386/resms64f.S: New file.
	* config/i386/resms64fx.S: New file.
	* config/i386/resms64x.S: New file.
	* config/i386/savms64.S: New file.
	* config/i386/savms64f.S: New file.
	* config/i386/t-msabi: New file.


Thread overview: 12+ messages
2016-11-23  5:11 [PATCH v2 0/9] Add optimization -moutline-msabi-xlogues (for Wine 64) Daniel Santos
2016-11-23  5:16 ` [PATCH 1/9] Change type of x86_64_ms_sysv_extra_clobbered_registers Daniel Santos
2016-11-23  5:16 ` [PATCH 7/9] Add patterns and predicates foutline-msabi-xlogues Daniel Santos
2016-11-23  5:16 ` [PATCH 4/9] Adds class xlogue_layout and new fields to struct machine_function Daniel Santos
2016-11-23  5:16 ` [PATCH 5/9] Modify ix86_save_reg to optionally omit stub-managed registers Daniel Santos
2016-11-23  5:16 ` [PATCH 6/9] Modify ix86_compute_frame_layout for foutline-msabi-xlogues Daniel Santos
2016-11-23  5:16 ` [PATCH 9/9] Add remainder of moutline-msabi-xlogues implementation Daniel Santos
2016-11-23  5:16 ` [PATCH 2/9] Minor refactor in ix86_compute_frame_layout Daniel Santos
2016-11-23  5:16 ` [PATCH 3/9] Add option -moutline-msabi-xlogues Daniel Santos
2016-11-25 23:51   ` Sandra Loosemore
2016-11-26  1:27     ` Daniel Santos
2016-11-23  5:16 ` [PATCH 8/9] Add msabi pro/epilogue stubs to libgcc Daniel Santos
