From: "H.J. Lu" <hjl.tools@gmail.com>
To: Quentin Neill <quentin.neill.gnu@gmail.com>
Cc: Jeff Law <law@redhat.com>, binutils@sourceware.org, gcc@gcc.gnu.org
Subject: Re: Scheduling x86 dispatch windows
Date: Thu, 10 Jun 2010 22:48:00 -0000 [thread overview]
Message-ID: <AANLkTimteGwx6Amus4SodLsfKaz56v80N-aJzUmWCB9u@mail.gmail.com> (raw)
In-Reply-To: <AANLkTik1eMgCXq8J_Vo_mWTdwx0pZo2mnG2dCtG1jsh8@mail.gmail.com>
On Thu, Jun 10, 2010 at 3:09 PM, Quentin Neill
<quentin.neill.gnu@gmail.com> wrote:
> On Thu, Jun 10, 2010 at 4:08 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Thu, Jun 10, 2010 at 1:59 PM, Quentin Neill
>> <quentin.neill.gnu@gmail.com> wrote:
>>> On Thu, Jun 10, 2010 at 3:03 PM, Jeff Law <law@redhat.com> wrote:
>>>> On 06/10/10 13:52, H.J. Lu wrote:
>>>>> On Thu, Jun 10, 2010 at 11:05 AM, Quentin Neill
>>>>> <quentin.neill.gnu@gmail.com> wrote:
>>>>>> Cross-posting Reza's call for feedback to the binutils list since it
>>>>>> is relevant - s ee the last few paragraphs regarding how to
>>>>>> "solve the alignment problem".
>>>>>>
>>>>>> Original thread: http://gcc.gnu.org/ml/gcc/2010-06/threads.html#00402
>>>>>>
>>>>>> On Thu, Jun 10, 2010 at 12:20 PM, reza yazdani<yazdani_reza@yahoo.com>
>>>>>> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We are in the process of adding a feature to GCC to take advantage
>>>>>>> of a new hardware feature in the latest AMD micro processor. This
>>>>>>> feature requires a certain mix, ordering and alignments in
>>>>>>> instruction sequences to obtain the expected hardware performance.
>>>>>>>
>>>>>>> I am asking the community to review this high level implementation
>>>>>>> design and give me direction or advice.
>>>>>>>
>>>>>>> The new hardware issues two windows of the size N bytes of
>>>>>>> instructions in every cycle. It goes into accelerate mode if the
>>>>>>> windows have the right combination of instructions or alignments. Our
>>>>>>> goal is to maximize the IPC by proper instruction scheduling and
>>>>>>> alignments.
>>>>>>>
>>>>>>> Here is a summary of the most important requirements:
>>>>>>>
>>>>>>> a) Maximum of N instructions per window.
>>>>>>> b) An instruction may cross the first window.
>>>>>>> c) Each window can have maximum of x memory loads and y memory
>>>>>>> stores .
>>>>>>> d) The total number of immediate constants in the instructions
>>>>>>> of a window should not exceed k.
>>>>>>> e) The first window must be aligned on 16 byte boundary.
>>>>>>> f) A Window set terminates when a branch exists in a window.
>>>>>>> g) The number of allowed prefixes varies for instructions.
>>>>>>> h) A window set needs to be padded by prefixes in instructions
>>>>>>> or terminated by nops to ensure adherence to the rules.
>>>>>>>
>>>>>>> We have the following implementation plan for GCC:
>>>>>>>
>>>>>>> 1) Modify the Haifa scheduler to make the desired arrangement of
>>>>>>> instructions for the two dispatch windows. The scheduler is called
>>>>>>> once before and once after register allocation as usual. In both
>>>>>>> cases it performs dispatch scheduling along with its normal job of
>>>>>>> instruction scheduling.
>>>>>>>
>>>>>>> The advantage of doing it before register allocation is avoiding
>>>>>>> extra dependencies caused by register allocation which may become
>>>>>>> an obstacle to movement of instructions. The advantage of doing
>>>>>>> it after register allocation is a consideration for spilling code
>>>>>>> which may be generated by the register allocator.
>>>>>>>
>>>>>>> The algorithm we use is:
>>>>>>>
>>>>>>> a) Considering the current dispatch window set, choose the first
>>>>>>> instruction from ready queue that does not violate dispatch rules.
>>>>>>> b) When an instruction is selected and scheduled, inform the
>>>>>>> dispatcher code about the instruction. This step keeps track of the
>>>>>>> instruction content of windows for future evaluation. It also manages
>>>>>>> the window set by closing and opening new virtual dispatch windows.
>>>>>>>
>>>>>>> 2) Insertion of alignment code.
>>>>>>>
>>>>>>> In x86 alignment is done by inserting prefixes or by generating
>>>>>>> nops. As the object code is generated by the assembler in GCC, some
>>>>>>> information such as sizes of branches are unknown until assembly or
>>>>>>> link time. To do alignments related to dispatch correctly in GCC,
>>>>>>> we need to iteratively compute prefixes and branch sizes until
>>>>>>> its convergence. This pass currently does not exist in GCC, but it
>>>>>>> exists in the assembler.
>>>>>>>
>>>>>>> There are two possible approaches to solve alignment problem.
>>>>>>>
>>>>>>> a) Let the assembler performs the alignments and padding needed
>>>>>>> to adhere with the new machine dispatching rules and avoid an extra
>>>>>>> pass in GCC.
>>>>>>> b) Add a new pass to mimic what assembler does before generating
>>>>>>> the assembly listing in GCC and insert the required alignments.
>>>>>>>
>>>>>>> I appreciate your comments on the proposed implementation procedure
>>>>>>> and the choices a or b above.
>>>>>>>
>>>>>
>>>>> I don't this should be done in assembler. Assembler should just assemble
>>>>> the assembly input.
>>>>
>>>> That adds quite a bit of complication to the compiler though -- getting the
>>>> instruction lengths right (and thus proper packing & alignment) can be
>>>> extremely difficult. I did some experiments with this on a target with
>>>> *fixed* instruction lengths a while back and even though the port tried hard
>>>> to get lengths right, it would routinely miss something. Ultimately I
>>>> decided that it forcing the compiler to know instruction lengths with a very
>>>> high degree of accuracy wasn't a sane thing to do. Dealing with variable
>>>> instruction lengths just adds yet another complexity to the situation. Then
>>>> add the complication of needing to add specific prefixes or nops and it just
>>>> gets downright ugly.
>>>>
>>>> I'd probably approach this by having the compiler emit a directive which
>>>> states what the desired alignment at a particular point should be, then
>>>> allow the assembler to select the best method to get the desired alignment.
>>>
>>> Jeff,
>>>
>>> This is exactly part of our binutils side of the proposal, which I'll
>>> outline now
>>>
>>> 1. Allow multiple prefixes for ADDR and DS (and possibly others)
>>> a) multiple prefixes are benign in certain modes and are thus chosen for padding
>>> b) although ".byte" works, the "ds" and "addr" prefix mnemonics are
>>> more explicit (and they don't trigger a call to
>>> md_flush_pending_output)
>>>
>>> 2. Add new pseudo-op to delineate alignment boundaries. This is
>>> needed to signal any dispatch engine (below) to pad. Here are my top
>>> two candidates, any feedback is appreciated:
>>> a) ".flush" new psuedo op plumbed directly to "md_flush_pending_output()"
>>> b) ".padalign" which calla a new "md_pad_align()"
>>>
>>> 3. Add dispatch optimization infrastructure which
>>> a) is guarded by -mtune flag (and possibly other -f style flags)
>>> b) tracks assembled instruction attributes and their fragments
>>> c) can pad (insert benign prefixes) into previously assembled fragments
>>> d) maintains dispatch engine state (according to some subset of Reza's rules)
>>>
>>> Discussion:
>>>
>>> The flags in 3a) should guard against these changes affecting current behavior.
>>>
>>> The assembly tracking in 3b) is for bookkeeping only; the padding in
>>> 3c) would only occur when a compiler uses the pseudo-op in 2) or when
>>> the dispatch engine in 3d) signals.
>>>
>>> For compilers that know exactly how to pad for the new processor, the
>>> ability to
>>> pad explicitly using 1), 2), and .align/.balign/.p2align should be enough.
>>>
>>> For assembly programs and/or compilers that don't choose to do any
>>> dispatch optimization, it's anticipated that the engine in 3d) would
>>> be useful for optimizing for -mtune=bdver1
>>>
>>> I'll post patches for these soon.
>>
>> Can you do it with directives only?
>
> In theory, if the compiler knows all sizes and offsets, yes (given
> some way to add multiple prefixes).
>
> However in practice, no.
>
> To get GCC to know all would require replicating most assembler
> functionality in GCC, including parsing, assembling, and sizing
> (parts of output_insn() and its child output_*() functions). We
> considered exposing one-line assembly as a library but you have to
> provide (or reuse) the segment/frchain/fragment context, and I don't
> think introducing a GCC->binutils dependency (other than runtime)
> would be easy to introduce into the community.
>
> This wouldn't cover the assembly language case either.
>
> And remember, even if you have all the directives (and the
> programmer/compiler knows all), the assembler must remember potential
> padding locations until the decision (and knowledge about how) to pad
> arrives.
>
x86 assembler isn't an optimizing assembler. -mtune only does
instruction selection. What you are proposing sounds like an optimizing
assembler to me. Are we going to support scheduling, macro, ...?
--
H.J.
next prev parent reply other threads:[~2010-06-10 22:23 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-06-10 17:24 reza yazdani
2010-06-10 19:52 ` Quentin Neill
2010-06-10 20:12 ` H.J. Lu
2010-06-10 20:25 ` Jeff Law
2010-06-10 20:38 ` Joern Rennecke
2010-06-10 21:54 ` Quentin Neill
2010-06-10 22:09 ` H.J. Lu
2010-06-10 22:40 ` Quentin Neill
2010-06-10 22:48 ` H.J. Lu [this message]
2010-06-11 23:36 ` Quentin Neill
2010-06-12 18:54 ` H.J. Lu
2010-06-13 21:54 ` H.J. Lu
2010-06-14 13:37 ` Michael Matz
2010-06-14 16:06 ` Jakub Jelinek
2010-06-11 0:44 ` Daniel Jacobowitz
2010-06-11 5:58 ` Quentin Neill
2010-06-11 16:46 ` Daniel Jacobowitz
2010-06-11 19:21 ` Quentin Neill
2010-06-11 19:41 ` H.J. Lu
2010-06-11 19:50 ` Jakub Jelinek
2010-06-12 11:25 ` Andi Kleen
2010-06-12 22:45 ` Ian Lance Taylor
2010-06-13 12:35 ` Andi Kleen
2010-06-13 13:09 ` Joern Rennecke
2010-06-13 14:36 ` Andi Kleen
2010-06-13 15:02 ` Joern Rennecke
2010-06-13 19:56 ` Chris Lattner
2010-06-13 16:28 ` Frank Ch. Eigler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=AANLkTimteGwx6Amus4SodLsfKaz56v80N-aJzUmWCB9u@mail.gmail.com \
--to=hjl.tools@gmail.com \
--cc=binutils@sourceware.org \
--cc=gcc@gcc.gnu.org \
--cc=law@redhat.com \
--cc=quentin.neill.gnu@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).