Re: [RFC, PR target/65105] Use vector instructions for scalar 64bit computations on 32bit target

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jeff Law <law@redhat.com>
To: Ilya Enkovich <enkovich.gnu@gmail.com>, gcc-patches@gcc.gnu.org
Subject: Re: [RFC, PR target/65105] Use vector instructions for scalar 64bit computations on 32bit target
Date: Mon, 03 Aug 2015 20:52:00 -0000	[thread overview]
Message-ID: <55BFD483.9020107@redhat.com> (raw)
In-Reply-To: <20150619132130.GA15263@msticlxl57.ims.intel.com>

On 06/19/2015 07:21 AM, Ilya Enkovich wrote:
> Hi,
>
> This patch tries to improve 64bit integer computations on 32bit
> target.  This is achieved by an additional i386 target pass which
> searches for all conversion candidates and tries to transform them
> into vector mode when profitable.
Presumably you're building a chain of related operations that could 
possibly run in the vector unit, then if the costing model says ok, then 
you convert the whole chain.

Note that there's costing issues outside the model that can be expressed 
in GCC.  For example, you can get a significant latency spike in the AVX 
unit if you're not feeding it work regularly.

Our of curiosity, what does LLVM do here in terms of costing models?

>
> Initial problem discussion had several assumptions that this
> optimization should be done in RA.  But implementation of this in RA
> seems really complex.  I don't believe it can be done in a
> reasonalble time.  And taking into account quite narrow performance
> impact, I believe a separate conversion pass is a better solution.
The advantage of doing it in RA is probably the ability to accurately 
know if we've run out of general purpose registers and have vector 
registers to spare in the right spots.  BUt with the amount of rewriting 
going on, it may be excessively complex to do in the allocator.

>
> Here is shortly a list of changes:
>
> 1. Add insn templates for 64bit and/ior/xor/zext for 32bit target to
> avoid split on expand 2. Add new pass to convert scalar computation
> into vector.  The flow of the pass is following: a. Find all
> instructions we may convert b. Split them into chains of dependant
> instructions c. Estimate if chain conversion is profitable d. Convert
> chain if profitable 3. Add splits for not converted insns
Seems to make reasonable sense.

>
> Current cost model uses processor_costs table to estimate how much
> gain somes from a single instruction usage vs pair of instruction +
> estimate cost of scalar->vector and back conversions.  Cost
> estimation doesn't actually use CFG and have a (lot of) room for
> improvement.  The problem here is a lack of workloads to be used for
> tuning.
Right.  I'd think the tuning is probably one of the harder problems 
here.  ISTM one of the metrics you'd want to be looking at is the 
register pressure for both register files across the lifetime of the 
chain of dependent instructions.

Note there are mechanisms to get register pressure estimates so that you 
can use them to help drive this kind of transformation.

 From a correctness standpoint, one of the interesting tests would be to 
turn off all tuning -- ie, always convert if it's supposed to be 
possible.  Then throw as much code as possible at it and see if anything 
breaks.  Also a good time to instrument so that you can then build 
testcases from real-world code.

Also note that with a new pass, you may need to do some compile-time 
benchmarking.

>
> Added DI insns and splits for 32bit target delay insns split until
> reload_completed.  It is a potential degradation for cases when
> conversion doesn't happen. Pass probably may be moved before spli1
> pass to allow early split of not converted insns.  Or new pass itself
> may split not converted chains.
>
> I also had to modify register constraint of movdi for sse->mem
> alternative.  I understand we don't like this alternative for 64bit
> target but for 32bit it is more useful.  E.g. I see mem->mem copies
> go through xmm instead of GPR pair with this change.  May we have
> separate xmm register alternatives for 32bit and bit targets in
> movdi?
The patch as a whole is ultimately Uros's call since it's implemented 
entirely in the x86_64 backend.

A few implementation notes.

Don't use const0_rtx, use CONST0_RTX (mode) whenever possible.  The vast 
majority of the time the right mode is available in some other operand.

For convertible_comparison_p, please include the rtx form of what you're 
looking for in the function comment.  It looks like you're searching for

(set (Z) (compare (ior (subreg (X) (subreg Y)) (const_int 0)

Where the subregs are extracting a SImode value out of a DImode X & Y 
respectively.

Note that you don't seem to be checking for a high vs low word, is that 
intentional?

For has_non_address_hard_reg, the name of the function is somewhat odd 
-- what does "address" in the function name have to do with the 
implementation which doesn't seem to do anything with addresses or 
address registers.

Does that routine DTRT for a value that is an input, but clobbered?

s/registerss/register (in comment before remove_non_convertible_regs)

remove_non_convertible_regs needs to document its parameter CANDIDATES. 
  I figured out it's a bitmap of insn UIDs, but that should be called 
out in the function comment.

It also seems that routine assumes that anything set in CANDIDATES must 
be a single_set?  If so, where is that enforced?

I don't see anything that jumps out as painfully wrong.  Uros really 
needs to review the code as a whole though.

jeff

next prev parent reply	other threads:[~2015-08-03 20:52 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-19 13:45 Ilya Enkovich
2015-07-14 14:38 ` Ilya Enkovich
2015-08-03 20:52 ` Jeff Law [this message]
2015-08-21 13:55   ` Ilya Enkovich
2015-08-21 16:40     ` Jeff Law
2015-09-08 15:53       ` Ilya Enkovich
2015-09-09  8:20         ` Uros Bizjak
2015-09-09  8:31           ` Uros Bizjak
2015-09-14 12:45             ` Ilya Enkovich
2015-09-14 16:50               ` Uros Bizjak
2015-09-23 10:37                 ` Ilya Enkovich
2015-09-23 10:46                   ` Uros Bizjak
2015-09-29 12:51                     ` H.J. Lu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55BFD483.9020107@redhat.com \
    --to=law@redhat.com \
    --cc=enkovich.gnu@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).