Re: Scheduling x86 dispatch windows

From: "H.J. Lu" <hjl.tools@gmail.com>
To: Quentin Neill <quentin.neill.gnu@gmail.com>
Cc: Jeff Law <law@redhat.com>, binutils@sourceware.org, gcc@gcc.gnu.org
Subject: Re: Scheduling x86 dispatch windows
Date: Thu, 10 Jun 2010 22:48:00 -0000
Message-ID: <AANLkTimteGwx6Amus4SodLsfKaz56v80N-aJzUmWCB9u@mail.gmail.com>
In-Reply-To: <AANLkTik1eMgCXq8J_Vo_mWTdwx0pZo2mnG2dCtG1jsh8@mail.gmail.com>

On Thu, Jun 10, 2010 at 3:09 PM, Quentin Neill
<quentin.neill.gnu@gmail.com> wrote:
> On Thu, Jun 10, 2010 at 4:08 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Thu, Jun 10, 2010 at 1:59 PM, Quentin Neill
>> <quentin.neill.gnu@gmail.com> wrote:
>>> On Thu, Jun 10, 2010 at 3:03 PM, Jeff Law <law@redhat.com> wrote:
>>>> On 06/10/10 13:52, H.J. Lu wrote:
>>>>> On Thu, Jun 10, 2010 at 11:05 AM, Quentin Neill
>>>>> <quentin.neill.gnu@gmail.com>  wrote:
>>>>>> Cross-posting Reza's call for feedback to the binutils list since it
>>>>>> is relevant - see the last few paragraphs regarding how to
>>>>>> "solve the alignment problem".
>>>>>>
>>>>>> Original thread: http://gcc.gnu.org/ml/gcc/2010-06/threads.html#00402
>>>>>>
>>>>>> On Thu, Jun 10, 2010 at 12:20 PM, reza yazdani<yazdani_reza@yahoo.com>
>>>>>>  wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We are in the process of adding a feature to GCC to take advantage
>>>>>>> of a new hardware feature in the latest AMD microprocessor. This
>>>>>>> feature requires a certain mix, ordering and alignments in
>>>>>>> instruction sequences to obtain the expected hardware performance.
>>>>>>>
>>>>>>> I am asking the community to review this high level implementation
>>>>>>> design and give me direction or advice.
>>>>>>>
>>>>>>> The new hardware issues two N-byte windows of
>>>>>>> instructions in every cycle. It goes into accelerated mode if the
>>>>>>> windows have the right combination of instructions or alignments. Our
>>>>>>> goal is to maximize the IPC by proper instruction scheduling and
>>>>>>> alignments.
>>>>>>>
>>>>>>> Here is a summary of the most important requirements:
>>>>>>>
>>>>>>> a) Maximum of N instructions per window.
>>>>>>> b) An instruction may cross the first window.
>>>>>>> c) Each window can have a maximum of x memory loads and y memory
>>>>>>>    stores.
>>>>>>> d) The total number of immediate constants in the instructions
>>>>>>>    of a window should not exceed k.
>>>>>>> e) The first window must be aligned on a 16-byte boundary.
>>>>>>> f) A Window set terminates when a branch exists in a window.
>>>>>>> g) The number of allowed prefixes varies for instructions.
>>>>>>> h) A window set needs to be padded by prefixes in instructions
>>>>>>>    or terminated by nops to ensure adherence to the rules.
>>>>>>>
>>>>>>> We have the following implementation plan for GCC:
>>>>>>>
>>>>>>> 1) Modify the Haifa scheduler to make the desired arrangement of
>>>>>>>    instructions for the two dispatch windows. The scheduler is called
>>>>>>>    once before and once after register allocation as usual. In both
>>>>>>>    cases it performs dispatch scheduling along with its normal job of
>>>>>>>    instruction scheduling.
>>>>>>>
>>>>>>> The advantage of doing it before register allocation is avoiding
>>>>>>> extra dependencies caused by register allocation which may become
>>>>>>> an obstacle to movement of instructions.  The advantage of doing
>>>>>>> it after register allocation is that it can account for spill code
>>>>>>> which may be generated by the register allocator.
>>>>>>>
>>>>>>> The algorithm we use is:
>>>>>>>
>>>>>>> a) Considering the current dispatch window set, choose the first
>>>>>>>    instruction from ready queue that does not violate dispatch rules.
>>>>>>> b) When an instruction is selected and scheduled, inform the
>>>>>>>    dispatcher code about the instruction. This step keeps track of the
>>>>>>>    instruction content of windows for future evaluation. It also manages
>>>>>>>    the window set by closing and opening new virtual dispatch windows.
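
The selection loop in a) and b) above can be illustrated with a minimal Python sketch. It is not the Haifa scheduler code: the names Insn, Window, pick_next and schedule are hypothetical, the limits are placeholder values standing in for N, x, y and k, and the sketch ignores data dependencies, branches, and the rule that an instruction may cross the first window.

```python
from dataclasses import dataclass, field

@dataclass
class Insn:
    name: str
    size: int        # encoded length in bytes (assumed known for this sketch)
    loads: int = 0
    stores: int = 0
    imms: int = 0

@dataclass
class Window:
    # Placeholder limits; the real values of N, x, y and k are not given here.
    max_bytes: int
    max_insns: int
    max_loads: int
    max_stores: int
    max_imms: int
    insns: list = field(default_factory=list)

    def fits(self, insn):
        """True if adding insn keeps every per-window limit satisfied."""
        return (len(self.insns) + 1 <= self.max_insns
                and sum(i.size for i in self.insns) + insn.size <= self.max_bytes
                and sum(i.loads for i in self.insns) + insn.loads <= self.max_loads
                and sum(i.stores for i in self.insns) + insn.stores <= self.max_stores
                and sum(i.imms for i in self.insns) + insn.imms <= self.max_imms)

def pick_next(ready, window):
    """Step a): first instruction in the ready queue that violates no rule."""
    for insn in ready:
        if window.fits(insn):
            return insn
    return None

def schedule(ready, make_window):
    """Step b): record each choice, closing and opening windows as needed."""
    ready = list(ready)
    windows = [make_window()]
    while ready:
        insn = pick_next(ready, windows[-1])
        if insn is None:
            if not windows[-1].insns:
                raise ValueError(f"{ready[0].name} does not fit an empty window")
            windows.append(make_window())   # close current window, open a new one
            continue
        ready.remove(insn)
        windows[-1].insns.append(insn)
    return windows
```

With a limit of one load per window, a ready queue of two loads followed by an add places the add next to the first load and defers the second load to a new window, which is the reordering effect the dispatch rules are meant to drive.
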
>>>>>>>
>>>>>>> 2) Insertion of alignment code.
>>>>>>>
>>>>>>> In x86 alignment is done by inserting prefixes or by generating
>>>>>>> nops. As the object code is generated by the assembler in GCC, some
>>>>>>> information such as sizes of branches are unknown until assembly or
>>>>>>> link time. To do alignments related to dispatch correctly in GCC,
>>>>>>> we need to iteratively compute prefixes and branch sizes until
>>>>>>> they converge. This pass currently does not exist in GCC, but it
>>>>>>> exists in the assembler.
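
To show why this has to iterate, here is a minimal sketch of branch relaxation in Python. It assumes a toy input format (tuples of "insn", "label" and "jmp") and only unconditional jumps, using the 2-byte rel8 and 5-byte rel32 encodings of x86 jmp. The function name relax is hypothetical and this is not the algorithm gas implements; it only demonstrates the fixed-point idea.

```python
SHORT_JMP = 2   # jmp rel8: opcode + 8-bit displacement
LONG_JMP = 5    # jmp rel32: opcode + 32-bit displacement

def relax(items):
    """items: list of ("insn", size), ("label", name) or ("jmp", target).

    Returns ({index of jmp: chosen size}, total size in bytes).
    """
    # Optimistic start: every jump is assumed to be short.
    jmp_size = {i: SHORT_JMP for i, it in enumerate(items) if it[0] == "jmp"}
    while True:
        # Pass 1: lay out everything using the current size guesses.
        offset, labels, starts = 0, {}, {}
        for i, (kind, arg) in enumerate(items):
            starts[i] = offset
            if kind == "label":
                labels[arg] = offset
            elif kind == "jmp":
                offset += jmp_size[i]
            else:
                offset += arg
        # Pass 2: widen any short jump whose displacement no longer fits.
        changed = False
        for i, size in jmp_size.items():
            # Displacement is relative to the end of the jump instruction.
            disp = labels[items[i][1]] - (starts[i] + size)
            if size == SHORT_JMP and not -128 <= disp <= 127:
                jmp_size[i] = LONG_JMP
                changed = True
        if not changed:          # fixed point reached
            return jmp_size, offset
```

Jumps only ever grow, so the loop terminates. Widening one jump can push another jump's target out of rel8 range, which is why a single pass is not enough; any padding inserted for dispatch alignment would shift offsets in the same way and would have to be folded into the same loop.
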
>>>>>>>
>>>>>>> There are two possible approaches to solve the alignment problem.
>>>>>>>
>>>>>>> a)  Let the assembler perform the alignments and padding needed
>>>>>>>     to adhere to the new machine dispatching rules and avoid an extra
>>>>>>>     pass in GCC.
>>>>>>> b)  Add a new pass to mimic what the assembler does before generating
>>>>>>>     the assembly listing in GCC and insert the required alignments.
>>>>>>>
>>>>>>> I appreciate your comments on the proposed implementation procedure
>>>>>>> and the choices a or b above.
>>>>>>>
>>>>>
>>>>> I don't think this should be done in assembler. Assembler should just assemble
>>>>> the assembly input.
>>>>
>>>> That adds quite a bit of complication to the compiler though -- getting the
>>>> instruction lengths right (and thus proper packing & alignment) can be
>>>> extremely difficult.  I did some experiments with this on a target with
>>>> *fixed* instruction lengths a while back and even though the port tried hard
>>>> to get lengths right, it would routinely miss something.  Ultimately I
>>>> decided that forcing the compiler to know instruction lengths with a very
>>>> high degree of accuracy wasn't a sane thing to do.  Dealing with variable
>>>> instruction lengths just adds yet another complexity to the situation.  Then
>>>> add the complication of needing to add specific prefixes or nops and it just
>>>> gets downright ugly.
>>>>
>>>> I'd probably approach this by having the compiler emit a directive which
>>>> states what the desired alignment at a particular point should be, then
>>>> allow the assembler to select the best method to get the desired alignment.
>>>
>>> Jeff,
>>>
>>> This is exactly part of our binutils side of the proposal, which I'll
>>> outline now
>>>
>>> 1. Allow multiple prefixes for ADDR and DS (and possibly others)
>>> a) multiple prefixes are benign in certain modes and are thus chosen for padding
>>> b) although ".byte" works, the "ds" and "addr" prefix mnemonics are
>>> more explicit (and they don't trigger a call to
>>> md_flush_pending_output)
>>>
>>> 2. Add new pseudo-op to delineate alignment boundaries.  This is
>>> needed to signal any dispatch engine (below) to pad.  Here are my top
>>> two candidates, any feedback is appreciated:
>>> a) ".flush" new pseudo-op plumbed directly to "md_flush_pending_output()"
>>> b) ".padalign" which calls a new "md_pad_align()"
>>>
>>> 3. Add dispatch optimization infrastructure which
>>> a) is guarded by -mtune flag (and possibly other -f style flags)
>>> b) tracks assembled instruction attributes and their fragments
>>> c) can pad (insert benign prefixes) into previously assembled fragments
>>> d) maintains dispatch engine state (according to some subset of Reza's rules)
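
The padding step in 3c) can be sketched in Python as follows. The function plan_padding and the per-instruction "room" values are hypothetical: how many extra prefixes a given instruction can absorb depends on rules not spelled out in this thread, so the numbers in the usage note are made up for illustration.

```python
def plan_padding(offset, insn_room, align=16):
    """Decide how to reach the next `align`-byte boundary.

    offset:    current byte offset of the end of the window set
    insn_room: for each previously assembled instruction, how many extra
               benign prefix bytes it can absorb (assumed inputs)
    Returns (prefix bytes to add per instruction, leftover bytes to emit as nops).
    """
    need = -offset % align          # bytes missing to the next boundary
    prefixes = []
    for room in insn_room:
        take = min(room, need)      # greedily use prefix room first
        prefixes.append(take)
        need -= take
    return prefixes, need           # whatever is left must become nops
```

For example, at offset 27 with three instructions able to take 2, 1 and 4 extra prefix bytes, the 5 missing bytes are absorbed as 2, 1 and 2 prefix bytes with no nops; at offset 20 with room for only 1 byte in each of two instructions, 10 bytes are left over for nops.
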
>>>
>>> Discussion:
>>>
>>> The flags in 3a) should guard against these changes affecting current behavior.
>>>
>>> The assembly tracking in 3b) is for bookkeeping only; the padding in
>>> 3c) would only occur when a compiler uses the pseudo-op in 2) or when
>>> the dispatch engine in 3d) signals.
>>>
>>> For compilers that know exactly how to pad for the new processor, the
>>> ability to pad explicitly using 1), 2), and .align/.balign/.p2align
>>> should be enough.
>>>
>>> For assembly programs and/or compilers that don't choose to do any
>>> dispatch optimization, it's anticipated that the engine in 3d) would
>>> be useful for optimizing for -mtune=bdver1
>>>
>>> I'll post patches for these soon.
>>
>> Can you do it with directives only?
>
> In theory, if the compiler knows all sizes and offsets, yes (given
> some way to add multiple prefixes).
>
> However in practice, no.
>
> Getting GCC to know all of that would require replicating most assembler
> functionality in GCC, including parsing, assembling, and sizing
> (parts of output_insn() and its child output_*() functions).  We
> considered exposing one-line assembly as a library but you have to
> provide (or reuse) the segment/frchain/fragment context, and I don't
> think a GCC->binutils dependency (other than runtime) would be easy
> to get accepted by the community.
>
> This wouldn't cover the assembly language case either.
>
> And remember, even if you have all the directives (and the
> programmer/compiler knows all), the assembler must remember potential
> padding locations until the decision (and knowledge about how) to pad
> arrives.
>

The x86 assembler isn't an optimizing assembler. -mtune only does
instruction selection.  What you are proposing sounds like an optimizing
assembler to me. Are we going to support scheduling, macros, ...?


-- 
H.J.
