Signed divide by 2 pessimization

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Signed divide by 2 pessimization
@ 2000-09-09 12:41 Marek Michalkiewicz
  2000-09-09 13:26 ` Geoff Keating
  0 siblings, 1 reply; 7+ messages in thread
From: Marek Michalkiewicz @ 2000-09-09 12:41 UTC (permalink / raw)
  To: gcc; +Cc: denisc, marekm

Hello,

a few months ago I reported a problem with inefficient code
generated for signed divide by 2 on the AVR target (possibly
some other small targets as well).  The problem is still there,
in expmed.c (expand_divmod), line 3240 in today's CVS:

		    if (abs_d != 2 && BRANCH_COST < 3)

This test decides which one of the two possible algorithms
should be used for signed divide by a +/- power of 2.
On the AVR, branches are cheaper than shifts (especially
by many bits), so making the test always true makes MUCH
better code for this case.  But there is currently no way
to make that test true also for divide by 2 and -2.

I submitted a small patch for expmed.c that adds a new
macro to give the target some control over this test, but
it was rejected.  While I agree that ideally, rtx costs
should be used instead of adding new macros, that hasn't
happened, and doing it would require a lot of testing on
many different machines to make sure it doesn't make things
worse.  (I tried once, and it made x86 code slower, because
its BRANCH_COST seems to be defined lower than it usually is.
I can't tell what happens on other machines.)

I have a workaround for the AVR port, but you don't want
to see it :-) (not yet in CVS, I still hope it will not be
necessary).  What it does is to handle division with a
define_expand, generate rtl for divide by +/- power of 2
(as if the test in expmed.c was always true) ourselves,
and FAIL in other cases.  Yet it only works for divide
by 2, and divide by -2 is still done the inefficient way
(doesn't happen that often though).  So it's a lot of
work in one place (avr.c, avr.md) to avoid a 2-line change
in another place (expmed.c).

After a few months, the ideal solution (using rtx costs)
hasn't come yet.  So, PLEASE consider adding a new macro
in the meantime (I can re-submit the patch, first sent
on 12 May).  I know it is not elegant, but it works for
targets that need it, and doesn't change anything for
other targets.  Please...

Thanks,
Marek

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Signed divide by 2 pessimization
  2000-09-09 12:41 Signed divide by 2 pessimization Marek Michalkiewicz
@ 2000-09-09 13:26 ` Geoff Keating
  2000-09-10  4:22   ` Marek Michalkiewicz
  0 siblings, 1 reply; 7+ messages in thread
From: Geoff Keating @ 2000-09-09 13:26 UTC (permalink / raw)
  To: marekm; +Cc: gcc

Marek Michalkiewicz <marekm@linux.org.pl> writes:

> I submitted a small patch for expmed.c that adds a new
> macro to give the target some control over this test, but
> it was rejected.  While I agree that ideally, rtx costs
> should be used instead of adding new macros, that hasn't
> happened, and doing it would require a lot of testing on
> many different machines to make sure it doesn't make things
> worse.  (I tried once, and it made x86 code slower, because
> its BRANCH_COST seems to be defined lower than it usually is.
> I can't tell what happens on other machines.)

This is a sign that x86 should be fixed.  It's bad to work around
problems with a backend in the frontend.  Why is the x86 BRANCH_COST
set too low?

-- 
- Geoffrey Keating <geoffk@cygnus.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Signed divide by 2 pessimization
  2000-09-09 13:26 ` Geoff Keating
@ 2000-09-10  4:22   ` Marek Michalkiewicz
  2000-09-10  6:17     ` Geoff Keating
  0 siblings, 1 reply; 7+ messages in thread
From: Marek Michalkiewicz @ 2000-09-10  4:22 UTC (permalink / raw)
  To: Geoff Keating; +Cc: marekm, gcc

Geoff Keating <geoffk@cygnus.com> wrote:
> This is a sign that x86 should be fixed.  It's bad to work around
> problems with a backend in the frontend.  Why is the x86 BRANCH_COST
> set too low?

That was just my conclusion after some simple tests I did on a
Pentium III.  But the x86 is a complicated beast, and probably
there is no single value of BRANCH_COST that is best in all cases.
(This was already discussed in May.)

What I did was to change the test to look like this:

  (BRANCH_COST < shift_cost[size - 1] * (abs_d != 2) + shift_cost[size - lgup])

(compare the cost of a branch with the cost of one or two shifts)
which does the right thing on the AVR, but x86 BRANCH_COST is also 1
by default, which makes the condition always true, resulting in
smaller but slower code in my simple tests (which by no means should
be considered good benchmarks - things like branch prediction make
it difficult to benchmark).  Changing x86 BRANCH_COST to 2 makes the
fastest code for this case, but I don't know what other (bad) effects
it might have, so I prefer to leave it alone.

The existing (abs_d != 2 && BRANCH_COST < 3) test just happens to do
the right thing for the x86.  On the AVR, it makes the code for divide
by 2 both slower and larger (about twice as big - and that's a little
8-bit micro where code size is very important).  I don't even know what
would be the effect of this change on dozens of other GCC targets, or
non-Intel x86 chips - I just can't test that.

Because it's all so complicated, I'd like a simple and safe change
which can be made now - make (abs_d != 2 && BRANCH_COST < 3) the
default, but allow the target to override it.  Let's do it now, let
the interested targets decide, and worry about improving it later.

The problem potentially affects many small embedded targets, where
shifts (especially by many bits - if a single machine instruction
can only shift by 1) are often much more expensive than branches
(fast because there is no big prefetch queue to flush).

I understand we already have many target macros, so one more is not
very welcome.  On the positive side, that's just fine-tuning (only
affects code size and speed, not correctness), so most targets need
not define it and will get a (sometimes) reasonable default.  People
writing new ports don't need to worry about it either, until they
get the port working and start fine-tuning it.

The patch (rejected) I sent in May, as well as links to related
discussion, is here:

http://gcc.gnu.org/ml/gcc-patches/2000-05/msg00714.html

Please consider this change again.  Thanks!

Marek

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Signed divide by 2 pessimization
  2000-09-10  4:22   ` Marek Michalkiewicz
@ 2000-09-10  6:17     ` Geoff Keating
  2000-09-10  9:44       ` Marek Michalkiewicz
  0 siblings, 1 reply; 7+ messages in thread
From: Geoff Keating @ 2000-09-10  6:17 UTC (permalink / raw)
  To: marekm; +Cc: marekm, gcc

> From: Marek Michalkiewicz <marekm@linux.org.pl>
> Date: Sun, 10 Sep 2000 13:01:16 +0200 (CEST)
> CC: marekm@linux.org.pl, gcc@gcc.gnu.org
> 
> Geoff Keating <geoffk@cygnus.com> wrote:
> > This is a sign that x86 should be fixed.  It's bad to work around
> > problems with a backend in the frontend.  Why is the x86 BRANCH_COST
> > set too low?
> 
> That was just my conclusion after some simple tests I did on a
> Pentium III.  But the x86 is a complicated beast, and probably
> there is no single value of BRANCH_COST that is best in all cases.
> (This was already discussed in May.)
> 
> What I did was to change the test to look like this:
> 
>   (BRANCH_COST < shift_cost[size - 1] * (abs_d != 2) + shift_cost[size - lgup])
> 
> (compare the cost of a branch with the cost of one or two shifts)

BRANCH_COST and shift_cost are in different units.  shift_cost is in
the units defined by rtx_cost, in which one register-register insn is
approximately 4 units.  BRANCH_COST is 1 by default, meaning a branch
is equivalent to about one other insn.

Probably the right thing to do is enhance rtx_cost so that it can
provide the cost of a branch insn itself (by default this would be
COSTS_N_INSNS (BRANCH_COST) ), and use that.

> I don't even know what would be the effect of this change on dozens
> of other GCC targets, or non-Intel x86 chips - I just can't test
> that.

No-one can test a change like this on every platform.  That's why you
have to assume that the backends have the correct cost definitions.

> Let's do it now, let the interested targets decide, and worry about
> improving it later.

Alternatively, how about we do it right the first time?

I really don't like the idea that we put in some hack now, and
"someone" will improve it later.  Unless you can provide that
"someone" you have to assume that it will never be fixed.

-- 
- Geoffrey Keating <geoffk@cygnus.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Signed divide by 2 pessimization
  2000-09-10  6:17     ` Geoff Keating
@ 2000-09-10  9:44       ` Marek Michalkiewicz
  2000-09-10 14:41         ` Geoff Keating
  0 siblings, 1 reply; 7+ messages in thread
From: Marek Michalkiewicz @ 2000-09-10  9:44 UTC (permalink / raw)
  To: Geoff Keating; +Cc: marekm, gcc

Geoff Keating <geoffk@cygnus.com> wrote:
> BRANCH_COST and shift_cost are in different units.  shift_cost is in
> the units defined by rtx_cost, in which one register-register insn is
> approximately 4 units.  BRANCH_COST is 1 by default, meaning a branch
> is equivalent to about one other insn.
> 
> Probably the right thing to do is enhance rtx_cost so that it can
> provide the cost of a branch insn itself (by default this would be
> COSTS_N_INSNS (BRANCH_COST) ), and use that.

Thanks for explaining this - these different cost units are a bit
confusing.

COSTS_N_INSNS(N) is now defined as ((N) * 2), and that seems to be
consistent with what I noticed (better code for signed divide by 2
after changing x86 BRANCH_COST from 1 to 2).

The COSTS_N_INSNS macro is only defined in cse.c - would it be OK
to move it to rtl.h (where other functions from cse.c are already
declared) or some other include file, so it can be used elsewhere?
I'm not quite sure how to change rtx_cost just yet, so it seems
easier to just use COSTS_N_INSNS (BRANCH_COST) directly.

I see that at least config/arm/arm.c needs it, and defines it itself
differently (probably should be updated after the 2000-09-06 change
in cse.c).  If COSTS_N_INSNS is made available in expmed.c, it could
be used to scale BRANCH_COST in the test which I'd like to change:

  if (COSTS_N_INSNS (BRANCH_COST) < shift_cost[size - 1] * (abs_d != 2)
				    + shift_cost[size - lgup])
    {
      /* branch and fewer shifts */
    }
  else
    {
      /* no branch, more shifts */
    }

Would something like this be OK?  Should do the right thing on x86
as COSTS_N_INSNS (BRANCH_COST) is 2 there, and I can play with shift
costs on the AVR to make it do the right thing here too.

(As a side note, this place in expmed.c is indented very far to the
right - wouldn't it be a good idea to split expand_divmod in a few
smaller functions?  Or is this function deliberately so big, for speed?)

> Alternatively, how about we do it right the first time?
> 
> I really don't like the idea that we put in some hack now, and
> "someone" will improve it later.  Unless you can provide that
> "someone" you have to assume that it will never be fixed.

If we do it right the first time soon - certainly.

But I'm worried that it is not easy to fix it properly.  If it never
happens because "someone" doesn't exist (not enough motivation because
the important targets happen to work well), I'd like to at least have
the hack now - it already exists, and is better than nothing...

Thanks,
Marek

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Signed divide by 2 pessimization
  2000-09-10  9:44       ` Marek Michalkiewicz
@ 2000-09-10 14:41         ` Geoff Keating
  2000-09-16 12:53           ` Marek Michalkiewicz
  0 siblings, 1 reply; 7+ messages in thread
From: Geoff Keating @ 2000-09-10 14:41 UTC (permalink / raw)
  To: marekm; +Cc: marekm, gcc

> From: Marek Michalkiewicz <marekm@linux.org.pl>
> Date: Sun, 10 Sep 2000 18:07:55 +0200 (CEST)
> CC: marekm@linux.org.pl, gcc@gcc.gnu.org

> Would something like this be OK?  Should do the right thing on x86
> as COSTS_N_INSNS (BRANCH_COST) is 2 there, and I can play with shift
> costs on the AVR to make it do the right thing here too.

Just make rtx_cost deal with branch instructions.  It's not very hard;
a branch looks like (set (pc) (if_then_else (eq ...) (label_ref ...) (pc)))
so detect the if_then_else and the (pc) and return the appropriate
value.  It need not work for every case, so long as this is documented.

> (As a side note, this place in expmed.c is indented very far to the
> right - wouldn't it be a good idea to split expand_divmod in a few
> smaller functions?  Or is this function deliberately so big, for speed?)

No, it's the incremental impact of many small patches.  Feel free to
tidy up the code.

> But I'm worried that it is not easy to fix it properly.  If it never
> happens because "someone" doesn't exist (not enough motivation because
> the important targets happen to work well), I'd like to at least have
> the hack now - it already exists, and is better than nothing...

The problem is that the hack itself has a cost.  Someone will
eventually have to do the work properly anyway; and with the hack,
that someone will then have to try to work out what this hack is
doing, why it was needed, and what each port does with it, and then
when the hack goes away each port which used it will have to be
checked to see that it got the costs right.

I'm not saying that such hacks don't have a place; it's just that
their place is in your local tree, not in the master repository.

-- 
- Geoffrey Keating <geoffk@cygnus.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Signed divide by 2 pessimization
  2000-09-10 14:41         ` Geoff Keating
@ 2000-09-16 12:53           ` Marek Michalkiewicz
  0 siblings, 0 replies; 7+ messages in thread
From: Marek Michalkiewicz @ 2000-09-16 12:53 UTC (permalink / raw)
  To: Geoff Keating; +Cc: marekm, gcc, denisc

Geoff Keating <geoffk@cygnus.com> wrote:
> Just make rtx_cost deal with branch instructions.  It's not very hard;
> a branch looks like (set (pc) (if_then_else (eq ...) (label_ref ...) (pc)))
> so detect the if_then_else and the (pc) and return the appropriate
> value.  It need not work for every case, so long as this is documented.

Now that the COSTS_N_INSNS define has been moved to rtl.h, I think using
COSTS_N_INSNS (BRANCH_COST) directly (at least for now) would be much
simpler, so I would prefer that for now.  (I can try to enhance rtx_cost
and use that later, but I think that would be a separate logical change.)

Testing my change a bit more, I noticed a problem with using shift_cost[]
(possibly not just in my change, but also in other places - I'm not sure) -
this array has MAX_BITS_PER_WORD elements, which defaults to BITS_PER_WORD,
which is defined as 8 in the AVR port.  But the shift counts can be larger
(size can be 32), resulting in bogus cost values (array index out of range).

Now, what would be the best way to get around this?  (AVR has 32 general
8-bit registers, but QI/HI/SI/DI modes are as usual, 8/16/32/64 bits.)
Should the AVR port define MAX_BITS_PER_WORD as 32 (even if BITS_PER_WORD
is always 8), or will that cause problems in other places?  Denis?

> The problem is that the hack itself has a cost.  Someone will
> eventually have to do the work properly anyway; and with the hack,
> that someone will then have to try to work out what this hack is
> doing, why it was needed, and what each port does with it, and then
> when the hack goes away each port which used it will have to be
> checked to see that it got the costs right.

I understand that problem in general, but have you seen the hack?
It is very small, I can document it better in the comments, and
I will be happy to explain it to that someone if necessary.

> I'm not saying that such hacks don't have a place; it's just that
> their place is in your local tree, not in the master repository.

The problem is, there may be a few users of this port who don't have
my local tree.  I can distribute my changes as unofficial patches,
but that is just moving the cost to the users.

Don't get me wrong.  I too would prefer to do everything right from
the start, but if that is hard for some reason, I think a temporary
hack which makes better code is better (for the users) than nothing.
Please...

Thanks,
Marek

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2000-09-16 12:53 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-09-09 12:41 Signed divide by 2 pessimization Marek Michalkiewicz
2000-09-09 13:26 ` Geoff Keating
2000-09-10  4:22   ` Marek Michalkiewicz
2000-09-10  6:17     ` Geoff Keating
2000-09-10  9:44       ` Marek Michalkiewicz
2000-09-10 14:41         ` Geoff Keating
2000-09-16 12:53           ` Marek Michalkiewicz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).