public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* gcc/1532: Redundant compare instructions on i386
@ 2004-01-20 21:42 Ian Lance Taylor
  2004-01-21  0:47 ` Richard Henderson
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Ian Lance Taylor @ 2004-01-20 21:42 UTC (permalink / raw)
  To: gcc; +Cc: gcc-bugzilla

I've been looking at PR gcc/1532.  This PR is about some poor code
generation for the i386.  It is a regression from 2.95.3.

For a fairly simple test case, found in the PR, gcc generates code
which looks like this:

	cmpl	$1, %eax
	je	.L1
	cmpl	$1, %eax
	jle	.L15

The second cmpl instruction is useless.  The condition flags will not
have changed since the first one.

The i386 backend used to use cc0.  It now generates compares which are
assigned explicit to the flags register, and jumps which explicitly
look at the flags register.  It uses several different comparison
modes, to permit various optimizations.

The RTL for the above instructions looks like this in the 01.rtl file:

(insn 93 90 94 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg/v:SI 66 [ <anonymous> ])
            (const_int 1 [0x1]))) -1 (nil)
    (nil))

(jump_insn 94 93 95 (set (pc)
        (if_then_else (eq (reg:CCZ 17 flags)
                (const_int 0 [0x0]))
            (label_ref 24)
            (pc))) -1 (nil)
    (nil))

(insn 95 94 96 (set (reg:CCGC 17 flags)
        (compare:CCGC (reg/v:SI 66 [ <anonymous> ])
            (const_int 1 [0x1]))) -1 (nil)
    (nil))

(jump_insn 96 95 97 (set (pc)
        (if_then_else (gt (reg:CCGC 17 flags)
                (const_int 0 [0x0]))
            (label_ref 101)
            (pc))) -1 (nil)
    (nil))

Note that the comparisons are done in different modes.

My first thought was to fix CSE so that, when it sees comparisons
which are equivalent except for being in different modes, it converts
them both to the superset mode (this is machine dependent, and
typically results in CCmode), and then applies CSE.  I implemented
this in the CSE pass, but it didn't help because the above compare
instructions were in different basic blocks (obvious in retrospect).

So I implemented the same idea in GCSE.  There it solved this test
case nicely, but failed on more complex cases because it led gcc to
want to copy CCmode values around.  That can be done on the i386, but
(as far as I know) there is no simple way to do it--it has to go
through memory.  In fact, it is typically more efficient to redo the
comparison than it is to save and restore the condition codes--
certainly more efficient in a case like the above, which compares a
single register with a constant.  I didn't get this case fully
working--I would have had to implement secondary reloads and so
forth--but it didn't seem like it was going to help.  (Also, I had to
hack the GCSE pass even more than expected, because it didn't want to
work on hard registers like register 17 above.)

So I punted and wrote a quick pass in machine dependent reorg to
eliminate obvious duplicate comparison cases.  It works fine on this
test case--the resulting code is the same except that the redudant
comparison is eliminated.  But it kind of sucks because it doesn't do
any real dataflow analysis, so it will miss all but the most obvious
cases.  I append that patch below, for reference.

Since the i386 is a reasonably popular platform, I think it would be
nice if gcc avoided generating obviously lousy code.  At this point,
though, I'd like to hear any suggestions for a reasonable approach to
this problem.  The key points as I see them are:

1) Comparisons are represented with different modes for purposes of
   permitting future optimization, but are otherwise implemented by
   the same underlying instruction.

2) All comparisons set the hard register 17 (the EFLAGS register).

3) We want the compiler to eliminate obvious duplication in setting
   the comparison flags.  That is, the compiler should recognize when
   one compare instruction will produce the same result as a previous
   compare instruction.
   a) It should do this despite any mode differences.
   b) It should do this even though the result goes to a hard
      register.
   c) It should not do this in the usual GCSE way, which involves
      copying the value to a pseudo-register, because if that
      pseudo-register is used after eflags has been clobbered the
      resulting code will almost certainly be worse.

There is some other awful code in this test case which I haven't even
looked at yet, a useless jump to the next instruction:

	je	.L1
.L2:
.L1:

I assume that is a separate problem, probably related to inlining.

Ian

This is the patch I wrote which solves the problem in a poor manner.
This is not yet a serious patch submission, just an example.

Index: i386.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
retrieving revision 1.635
diff -p -u -r1.635 i386.c
--- i386.c	16 Jan 2004 18:53:44 -0000	1.635
+++ i386.c	20 Jan 2004 21:19:08 -0000
@@ -15689,6 +15689,69 @@ k8_avoid_jump_misspredicts (void)
     }
 }
 
+/* Remove duplicate setting of the condition codes.  */
+
+static void
+ix86_remove_dup_cc (void)
+{
+  rtx cc_reg;
+  rtx last_cc_insn;
+  rtx last_cc_src;
+  rtx insn;
+  rtx next;
+
+  cc_reg = gen_rtx_REG (CCmode, FLAGS_REG);
+  last_cc_insn = NULL_RTX;
+  last_cc_src = NULL_RTX;
+
+  for (insn = get_insns (); insn; insn = next)
+    {
+      rtx set;
+
+      next = NEXT_INSN (insn);
+
+      if (GET_CODE (insn) == CODE_LABEL
+	  || GET_CODE (insn) == CALL_INSN)
+	{
+	  last_cc_insn = NULL_RTX;
+	  continue;
+	}
+
+      if (! INSN_P (insn))
+	continue;
+
+      set = single_set (insn);
+      if (set
+	  && GET_CODE (SET_DEST (set)) == REG
+	  && REGNO (SET_DEST (set)) == FLAGS_REG)
+	{
+	  if (last_cc_insn
+	      && (rtx_equal_p (last_cc_src, SET_SRC (set))
+		  || (GET_CODE (SET_SRC (set)) == COMPARE
+		      && GET_CODE (last_cc_src) == COMPARE
+		      && GET_MODE (SET_SRC (set)) != GET_MODE (last_cc_src)
+		      && GET_MODE_CLASS (GET_MODE (SET_SRC (set))) == MODE_CC
+		      && GET_MODE_CLASS (GET_MODE (last_cc_src)) == MODE_CC
+		      && rtx_equal_p (XEXP (SET_SRC (set), 0),
+				      XEXP (last_cc_src, 0))
+		      && rtx_equal_p (XEXP (SET_SRC (set), 1),
+				      XEXP (last_cc_src, 1))))
+	      && ! modified_between_p (SET_SRC (set), last_cc_insn, insn))
+	    delete_insn (insn);
+	  else
+	    {
+	      last_cc_insn = insn;
+	      last_cc_src = SET_SRC (set);
+	    }
+	}
+      else
+	{
+	  if (reg_set_p (cc_reg, insn))
+	    last_cc_insn = NULL_RTX;
+	}
+    }
+}
+
 /* Implement machine specific optimizations.
    At the moment we implement single transformation: AMD Athlon works faster
    when RET is not destination of conditional jump or directly preceded
@@ -15698,6 +15761,9 @@ static void
 ix86_reorg (void)
 {
   edge e;
+
+  if (optimize)
+    ix86_remove_dup_cc ();
 
   if (!TARGET_ATHLON_K8 || !optimize || optimize_size)
     return;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: gcc/1532: Redundant compare instructions on i386
  2004-01-20 21:42 gcc/1532: Redundant compare instructions on i386 Ian Lance Taylor
@ 2004-01-21  0:47 ` Richard Henderson
  2004-01-21  3:24   ` Ian Lance Taylor
  2004-01-22  3:39 ` Jamie Lokier
  2004-01-22  6:27 ` Segher Boessenkool
  2 siblings, 1 reply; 9+ messages in thread
From: Richard Henderson @ 2004-01-21  0:47 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc, gcc-bugzilla

On Tue, Jan 20, 2004 at 04:42:31PM -0500, Ian Lance Taylor wrote:
> So I punted and wrote a quick pass in machine dependent reorg to
> eliminate obvious duplicate comparison cases.  It works fine on this
> test case--the resulting code is the same except that the redudant
> comparison is eliminated.  But it kind of sucks because it doesn't do
> any real dataflow analysis, so it will miss all but the most obvious
> cases.

This is actually not bad, I think, simply because almost everything
clobbers the flags, and therefore data can't flow very far at all.

About the only thing you could do better than what you did is to
follow extended basic blocks.  I.e. follow jumps forward through
jumps to blocks with only one incoming edge.

> 1) Comparisons are represented with different modes for purposes of
>    permitting future optimization, but are otherwise implemented by
>    the same underlying instruction.

This isn't really correct.  Lots of instructions set *some* flags
bits, but not all instructions set all flags bits in useful ways.
The different modes describe how the flags will be used, and thus
constrain which insns may be used to generate the flags.  This is
most apparent when you look at the combination patterns that do
add+compare and the like.

> 3) We want the compiler to eliminate obvious duplication in setting
>    the comparison flags.  That is, the compiler should recognize when
>    one compare instruction will produce the same result as a previous
>    compare instruction.
>    a) It should do this despite any mode differences.
>    b) It should do this even though the result goes to a hard
>       register.
>    c) It should not do this in the usual GCSE way, which involves
>       copying the value to a pseudo-register, because if that
>       pseudo-register is used after eflags has been clobbered the
>       resulting code will almost certainly be worse.

Agreed.

> There is some other awful code in this test case which I haven't even
> looked at yet, a useless jump to the next instruction:
> 
> 	je	.L1
> .L2:
> .L1:
> 
> I assume that is a separate problem, probably related to inlining.

I would assume.  Not sure exactly what would cause that...
It's possible there's something in the rtl between L2 and L1
seen by the optimizers but which produces no assembly.  If
so, there's probably not much to fix there.


r~

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: gcc/1532: Redundant compare instructions on i386
  2004-01-21  0:47 ` Richard Henderson
@ 2004-01-21  3:24   ` Ian Lance Taylor
  2004-01-21  5:34     ` law
  2004-01-21  6:32     ` law
  0 siblings, 2 replies; 9+ messages in thread
From: Ian Lance Taylor @ 2004-01-21  3:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc

Richard Henderson <rth@redhat.com> writes:

> On Tue, Jan 20, 2004 at 04:42:31PM -0500, Ian Lance Taylor wrote:
> > So I punted and wrote a quick pass in machine dependent reorg to
> > eliminate obvious duplicate comparison cases.  It works fine on this
> > test case--the resulting code is the same except that the redudant
> > comparison is eliminated.  But it kind of sucks because it doesn't do
> > any real dataflow analysis, so it will miss all but the most obvious
> > cases.
> 
> This is actually not bad, I think, simply because almost everything
> clobbers the flags, and therefore data can't flow very far at all.
> 
> About the only thing you could do better than what you did is to
> follow extended basic blocks.  I.e. follow jumps forward through
> jumps to blocks with only one incoming edge.

Hmmm, that's true, but also the machine dependent reorg pass is so
late in the process.  It would be nice to do this before reload, at
least, since it can potentially free up a register which won't be
needed for the second compare.  Admittedly you can't do any math,
since that would clobber the flags register, but you can at least move
stuff around.  I dunno.  Maybe it won't make any difference in
practice.

Ian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: gcc/1532: Redundant compare instructions on i386
  2004-01-21  3:24   ` Ian Lance Taylor
@ 2004-01-21  5:34     ` law
  2004-01-21  6:32     ` law
  1 sibling, 0 replies; 9+ messages in thread
From: law @ 2004-01-21  5:34 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Richard Henderson, gcc

In message <m3ektuknh6.fsf@gossamer.airs.com>, Ian Lance Taylor writes:
 >Richard Henderson <rth@redhat.com> writes:
 >
 >> On Tue, Jan 20, 2004 at 04:42:31PM -0500, Ian Lance Taylor wrote:
 >> > So I punted and wrote a quick pass in machine dependent reorg to
 >> > eliminate obvious duplicate comparison cases.  It works fine on this
 >> > test case--the resulting code is the same except that the redudant
 >> > comparison is eliminated.  But it kind of sucks because it doesn't do
 >> > any real dataflow analysis, so it will miss all but the most obvious
 >> > cases.
 >> 
 >> This is actually not bad, I think, simply because almost everything
 >> clobbers the flags, and therefore data can't flow very far at all.
 >> 
 >> About the only thing you could do better than what you did is to
 >> follow extended basic blocks.  I.e. follow jumps forward through
 >> jumps to blocks with only one incoming edge.
 >
 >Hmmm, that's true, but also the machine dependent reorg pass is so
 >late in the process.  It would be nice to do this before reload, at
 >least, since it can potentially free up a register which won't be
 >needed for the second compare.  Admittedly you can't do any math,
 >since that would clobber the flags register, but you can at least move
 >stuff around.  I dunno.  Maybe it won't make any difference in
 >practice.
FWIW, even when you fix the redundant compares, the compiler really ought
to thread all the paths to the switch statement to their ultimate
destinations -- we know at compile time which case will be taken for
each path to the switch statement.

From looking at the tree-ssa dumps, we do manage to determine the switch's
value for each incoming path, but the threading code doesn't thread
through blocks with switch statements (I simply haven't written that
tiny bit of code yet):

  # BLOCK 3
  # PRED: 1 [50.0%]  (false,exec) 0 [29.0%]  (true,exec) 2 [100.0%]  
(fallthru,exec)
  # <D1081>_1 = PHI <2(1), 1(2), 0(0)>;
<L4>:;
  switch (<D1081>_1)
    {
      case 0: goto <L5>;
      case 1: goto <L6>;
      case 2: goto <L7>;
      default : goto <L8>;
    }
  # SUCC: 7 [25.0%]  (exec) 6 [25.0%]  (exec) 5 [25.0%]  (exec) 4 [25.0%]  
(exec)


So once I add that code to the jump threader you should get something like
this pseudo C out of the tree-ssa optimizers:

  if (i == j) goto L1; else goto L2;
L2:

  if (i > j) goto L3; else goto L4;

L4:
  goto L5;

L1:
  return i + j;

L3:
  return i;

L5:
  return j;



Anyway, I'll probably use this as a testcase when I cobble together
the code to thread jumps through switch statements.

jeff



  
























^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: gcc/1532: Redundant compare instructions on i386
  2004-01-21  3:24   ` Ian Lance Taylor
  2004-01-21  5:34     ` law
@ 2004-01-21  6:32     ` law
  2004-01-21  7:48       ` Richard Henderson
  1 sibling, 1 reply; 9+ messages in thread
From: law @ 2004-01-21  6:32 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Richard Henderson, gcc

In message <m3ektuknh6.fsf@gossamer.airs.com>, Ian Lance Taylor writes:
 >Hmmm, that's true, but also the machine dependent reorg pass is so
 >late in the process.  It would be nice to do this before reload, at
 >least, since it can potentially free up a register which won't be
 >needed for the second compare.  Admittedly you can't do any math,
 >since that would clobber the flags register, but you can at least move
 >stuff around.  I dunno.  Maybe it won't make any difference in
 >practice.
FYI -- here's what I get out of the tree-ssa optimizers for that testcase
with a trivial tweak to the jump threading code:


foo (i, j)
{
  int <D1081>;
                                                                               

  # BLOCK 0
  # PRED: ENTRY [100.0%]  (fallthru,exec)
  if (i == j) goto <L9>; else goto <L1>;
  # SUCC: 2 [71.0%]  (false,exec) 1 [29.0%]  (true,exec)
                                                                               

  # BLOCK 1
  # PRED: 0 [29.0%]  (true,exec)
<L9>:;
  return j + j;
  # SUCC: EXIT [100.0%]
                                                                               

  # BLOCK 2
  # PRED: 0 [71.0%]  (false,exec)
<L1>:;
  if (i > j) goto <L2>; else goto <L10>;
  # SUCC: 3 [50.0%]  (false,exec) 4 [50.0%]  (true,exec)
                                                                               

  # BLOCK 3
  # PRED: 2 [50.0%]  (false,exec)
<L10>:;
  return j;
  # SUCC: EXIT [100.0%]
                                                                               

  # BLOCK 4
  # PRED: 2 [50.0%]  (true,exec)
<L2>:;
  return i;
  # SUCC: EXIT [100.0%]
                                                                               

}


Note how the switch statement has been eliminated.

I won't be able to run through a regression test tonight (already got
something running), but I'll do it tomorrow.  I don't expect any problems.

jeff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: gcc/1532: Redundant compare instructions on i386
  2004-01-21  6:32     ` law
@ 2004-01-21  7:48       ` Richard Henderson
  2004-01-21  8:47         ` law
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Henderson @ 2004-01-21  7:48 UTC (permalink / raw)
  To: law; +Cc: Ian Lance Taylor, gcc

On Tue, Jan 20, 2004 at 11:28:42PM -0700, law@redhat.com wrote:
> I won't be able to run through a regression test tonight (already got
> something running), but I'll do it tomorrow.  I don't expect any problems.

Note that this doesn't directly address the PR, since 

    if (i == j) goto <L9>; else goto <L1>;
  <L1>:;
    if (i > j) goto <L2>; else goto <L10>;

can be done with just one compare insn.  So there's
still room for something like what Ian's proposing.


r~

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: gcc/1532: Redundant compare instructions on i386
  2004-01-21  7:48       ` Richard Henderson
@ 2004-01-21  8:47         ` law
  0 siblings, 0 replies; 9+ messages in thread
From: law @ 2004-01-21  8:47 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Ian Lance Taylor, gcc

In message <20040121074430.GA11819@redhat.com>, Richard Henderson writes:
 >On Tue, Jan 20, 2004 at 11:28:42PM -0700, law@redhat.com wrote:
 >> I won't be able to run through a regression test tonight (already got
 >> something running), but I'll do it tomorrow.  I don't expect any problems.
 >
 >Note that this doesn't directly address the PR, since 
 >
 >    if (i == j) goto <L9>; else goto <L1>;
 >  <L1>:;
 >    if (i > j) goto <L2>; else goto <L10>;
 >
 >can be done with just one compare insn.  So there's
 >still room for something like what Ian's proposing.
Certainly -- I was just pointing out that there was some obvious 
lameness in how this code was handled by the tree-ssa optimizers.

My changes certainly won't fix the problem Ian is trying to resolve.
They merely make the switch statement disappear before we drop from
trees down to RTL.  Ian's changes (or something similar) are still
necessary to kill the redundant compare.


jeff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: gcc/1532: Redundant compare instructions on i386
  2004-01-20 21:42 gcc/1532: Redundant compare instructions on i386 Ian Lance Taylor
  2004-01-21  0:47 ` Richard Henderson
@ 2004-01-22  3:39 ` Jamie Lokier
  2004-01-22  6:27 ` Segher Boessenkool
  2 siblings, 0 replies; 9+ messages in thread
From: Jamie Lokier @ 2004-01-22  3:39 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc, gcc-bugzilla

Ian Lance Taylor wrote:
> For a fairly simple test case, found in the PR, gcc generates code
> which looks like this:
> 
> 	cmpl	$1, %eax
> 	je	.L1
> 	cmpl	$1, %eax
> 	jle	.L15
> 
> The second cmpl instruction is useless.  The condition flags will not
> have changed since the first one.

Note that on the original Pentium, the Branch Target Buffer (which is
used to predict branches) cannot store the predicted target of two
branch instructions which are in the same 4 byte address range.

Or something like that.

This means that if "je .L1" is a short branch, the BTB will trash as
it passes through these two instructions, and fail to predict the
branches efficiently.  If so, the code is faster with the second cmpl
instruction, although a nop or two is probably better.

-- Jamie

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: gcc/1532: Redundant compare instructions on i386
  2004-01-20 21:42 gcc/1532: Redundant compare instructions on i386 Ian Lance Taylor
  2004-01-21  0:47 ` Richard Henderson
  2004-01-22  3:39 ` Jamie Lokier
@ 2004-01-22  6:27 ` Segher Boessenkool
  2 siblings, 0 replies; 9+ messages in thread
From: Segher Boessenkool @ 2004-01-22  6:27 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc, gcc-bugzilla

> 3) We want the compiler to eliminate obvious duplication in setting
>    the comparison flags.  That is, the compiler should recognize when
>    one compare instruction will produce the same result as a previous
>    compare instruction.

...or the other way around.  While recently looking at some PowerPC
binary code (2.95 generated I think, so it might not actually be like
this anymore) for something that was a switch statement, I noticed
code like this:

	cmplwi 7,5,0    # compare the unsigned word in r5 to imm zero,
	                # store the result in condition code #7
	... some branch on it
	cmpwi 7,5,0     # compare the _signed_ word in r5 to imm zero
	... more branches

Changing the first compare to be a signed compare, would eliminate
the second compare insn.

Again, this was most likely a 2.95 compiler, so I don't _know_ if this
problem still exists today for PowerPC; but it does make a case for 
Ian's
suggested optimi[sz]ation as a generic optimisation, not an x86-specific
one.


Segher


p.s.  These instructions might read easier as  cmplwi cr7,r5,0x0  etc.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2004-01-22  6:27 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-01-20 21:42 gcc/1532: Redundant compare instructions on i386 Ian Lance Taylor
2004-01-21  0:47 ` Richard Henderson
2004-01-21  3:24   ` Ian Lance Taylor
2004-01-21  5:34     ` law
2004-01-21  6:32     ` law
2004-01-21  7:48       ` Richard Henderson
2004-01-21  8:47         ` law
2004-01-22  3:39 ` Jamie Lokier
2004-01-22  6:27 ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).