From: "Maciej W. Rozycki" <macro@codesourcery.com>
To: Richard Sandiford <rdsandiford@googlemail.com>
Cc: Sandra Loosemore <sandra@codesourcery.com>, <gcc-patches@gcc.gnu.org>
Subject: Re: PING Re: [PATCH, MIPS] add new peephole for 74k dspr2
Date: Tue, 25 Sep 2012 18:06:00 -0000
Message-ID: <alpine.DEB.1.10.1209251843250.28358@tp.orcam.me.uk>
In-Reply-To: <87k3vits62.fsf@talisman.home>

On Tue, 25 Sep 2012, Richard Sandiford wrote:

> >>  According to my sources the R4650 has a 4-cycle MULT latency (MAD is 3-4 
> >> cycles on that processor).  An MTHI/MTLO pair will take 2 cycles; 
> >> obviously the resulting larger code may adversely affect cache performance 
> >> in some scenarios.
> >
> > That's not how the 4650 DFA models it though.
> >
> > (define_insn_reservation "generic_hilo" 1
> >   (eq_attr "type" "mfhi,mflo,mthi,mtlo")
> >   "imuldiv*3")
> >
> > (define_insn_reservation "r4650_imul" 4
> >   (and (eq_attr "cpu" "r4650")
> >        (eq_attr "type" "imul,imul3,imadd"))
> >   "imuldiv*4")
> >
> > So if we believed the DFA, MTLO + MTHI would occupy the muldiv unit for 6
> > rather than 4 cycles.  Any attempt to use the DFA would still favour MULT.

 I can't track down a reference for the R4650 MTHI/MTLO latency; I'd be
happy to learn of one, as otherwise I wonder where the delay is coming
from.  Also a small update: apparently MULT takes only 3 clocks on the
R4650 when the operands are 16 bits wide (I'm unsure whether it is enough
if only one of them is; for a zero-by-zero multiplication it surely does
not matter though).  So I think using a MULT here is at least reasonable.

> Although I see the 4kp with its 32-cycle MULTs and MADDs is one where
> MULT $0,$0 would be a really bad choice.  Sigh.  The amount of effort
> required for this optimisation is getting a bit ridiculous.

 I have double-checked some documentation, and in fact many MIPS cores, 
including the current ones, have a configuration option to include either 
a high-performance or an area-efficient MD unit.  Take the M14Kc for 
example -- its high-performance unit has a one-cycle latency/issue rate
for 16-bit multiplication (two cycles for the full 32 bits; here the width
of rt is explicitly named), whereas the area-efficient unit has a 32-cycle
latency/issue rate regardless of the operand size (obviously iterating
over addition one bit at a time).  The latency of MTHI/MTLO is 1 cycle
with either unit.
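
 To make the comparison concrete, here is a rough sketch in C of how the
two ways of clearing HI/LO stack up on the two M14Kc MD unit options.  The
struct, the function names and the cost model (latencies simply add up, no
overlap with other instructions, code size ignored) are my own
assumptions for illustration only; none of this is taken from GCC.

```c
/* Hypothetical description of an MD unit.  The cycle counts used below
   are the M14Kc figures quoted in this message; the type and the names
   are illustrative only and do not exist in GCC.  */
struct md_unit {
    const char *name;
    int mult16_latency;   /* cycles for MULT with 16-bit operands */
    int mthilo_latency;   /* cycles for a single MTHI or MTLO */
};

/* The two M14Kc configuration options described above.  */
static const struct md_unit m14kc_units[] = {
    { "M14Kc high-performance", 1, 1 },
    { "M14Kc area-efficient",   32, 1 },
};

/* Cycles to clear HI/LO with one MULT $0,$0.  Zero operands fit in
   16 bits, so the 16-bit latency applies.  */
static int clear_with_mult(const struct md_unit *u)
{
    return u->mult16_latency;
}

/* Cycles to clear HI/LO with MTHI $0 followed by MTLO $0, assuming the
   two instructions do not overlap.  */
static int clear_with_mthilo(const struct md_unit *u)
{
    return 2 * u->mthilo_latency;
}

/* Nonzero if MULT $0,$0 is no slower than the MTHI/MTLO pair.  Ties go
   to MULT because it is a single instruction.  */
static int prefer_mult(const struct md_unit *u)
{
    return clear_with_mult(u) <= clear_with_mthilo(u);
}
```

 Under these assumptions prefer_mult() is true for the high-performance
unit (1 cycle against 2) and false for the area-efficient one (32 cycles
against 2), which is the split described above.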

 So I think this can't really be selected automatically for all cores;
some human-supplied knowledge about the MD unit used is required.  That
obviously affects other operations too, e.g. a multiplication by a
constant may be cheaper to do either directly or with a sequence of
additions, depending on the MD unit present (unless optimising for size,
of course).
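
 As a rough illustration of the constant-multiplication point, the C
sketch below estimates the cost of a naive shift-and-add expansion and
compares it with a given multiply latency.  The function names and the
cost model (one cycle per shift or add, one shift per set bit other than
bit 0, no smarter factorisation) are assumptions of mine, and this is
considerably cruder than what a real compiler does.

```c
/* Count the set bits in X (Kernighan's method).  */
static int popcount32(unsigned int x)
{
    int n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Estimated cycles to multiply by constant C with a naive shift-and-add
   sequence: one shift per set bit other than bit 0, plus one add to
   combine each term after the first.  Assumes single-cycle shifts and
   adds; this is an illustrative model only.  */
static int shift_add_cost(unsigned int c)
{
    int bits, shifts, adds;

    if (c == 0)
        return 1;               /* just a move from $0 */
    bits = popcount32(c);
    shifts = bits - (int)(c & 1u);
    adds = bits - 1;
    return shifts + adds;
}

/* Nonzero if a multiply with latency MULT_LATENCY is estimated to be
   strictly faster than the shift-and-add sequence for constant C.  */
static int use_multiply(unsigned int c, int mult_latency)
{
    return mult_latency < shift_add_cost(c);
}
```

 For example, multiplying by 10 is costed at 3 cycles (two shifts and one
add), so with this model a 2-cycle multiplier wins whereas a 32-cycle one
does not.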

  Maciej
