public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* minor code-quality regression vs. 2.95
@ 2000-04-11 21:45 Zack Weinberg
  2000-04-12 14:09 ` Clinton Popetz
  0 siblings, 1 reply; 15+ messages in thread
From: Zack Weinberg @ 2000-04-11 21:45 UTC (permalink / raw)
  To: gcc

This function:

long long blocks (long long bytes) { return bytes / 512; }

compiles as follows with 2.95:

blocks:
	movl 4(%esp),%eax
	movl 8(%esp),%edx
	testl %edx,%edx
	jge .L3
	addl $511,%eax
	adcl $0,%edx
.L3:
	shrdl $9,%edx,%eax
	sarl $9,%edx
	ret

and with the 20000408 CVS tree:

blocks:
	movl	8(%esp), %edx
	movl	4(%esp), %eax
	cmpl	$0, %edx
	jg	.L3
	testl	%edx, %edx
	jns	.L3
	addl	$511, %eax
	adcl	$0, %edx
.L3:
	shrdl	$9, %edx, %eax
	sarl	$9, %edx
	ret

Notice the additional compare and test in the 2.96 code.  

I can't make any sense at all of the RTL.

zw

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: minor code-quality regression vs. 2.95
  2000-04-11 21:45 minor code-quality regression vs. 2.95 Zack Weinberg
@ 2000-04-12 14:09 ` Clinton Popetz
  2000-04-12 15:08   ` Zack Weinberg
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Clinton Popetz @ 2000-04-12 14:09 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: gcc

On Tue, Apr 11, 2000 at 09:45:41PM -0700, Zack Weinberg wrote:
> This function:
> 
> long long blocks (long long bytes) { return bytes / 512; }
> 
> compiles as follows with 2.95:
> 
> blocks:
> 	movl 4(%esp),%eax
> 	movl 8(%esp),%edx
> 	testl %edx,%edx
> 	jge .L3
> 	addl $511,%eax
> 	adcl $0,%edx
> L3:
> 	shrdl $9,%edx,%eax
> 	sarl $9,%edx
> 	ret
> 
> and with the 20000408 CVS tree:
> 
> blocks:
> 	movl	8(%esp), %edx
> 	movl	4(%esp), %eax
> 	cmpl	$0, %edx
> 	jg	.L3
> 	testl	%edx, %edx
> 	jns	.L3
> 	addl	$511, %eax
> 	adcl	$0, %edx
> L3:
> 	shrdl	$9, %edx, %eax
> 	sarl	$9, %edx
> 	ret
> 
> Notice the additional compare and test in the 2.96 code.  
> 
> I can't make any sense at all of the RTL.

ix86_expand_branch is emitting the long long compare like this:

         * a < b =>
         *    if (hi(a) < hi(b)) goto true; 
         *    if (hi(a) > hi(b)) goto false;
         *    if (lo(a) < lo(b)) goto true;

In the above case, the third jump can be turned into an uncoditional jump by
cse, which means the first test/jump could be eliminated _if_ it were emitted
directly before the last one (i.e. swap the first two statements above.)

Out of curiousity I made that change:

Index: config/i386/i386.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/config/i386/i386.c,v
retrieving revision 1.154
diff -c -2 -p -r1.154 i386.c
*** i386.c	2000/04/09 20:26:42	1.154
--- i386.c	2000/04/12 20:16:38
*************** ix86_expand_branch (code, label)
*** 4942,4949 ****
  	ix86_compare_op1 = hi[1];
  
- 	if (code1 != NIL)
- 	  ix86_expand_branch (code1, label);
  	if (code2 != NIL)
  	  ix86_expand_branch (code2, label2);
  
  	ix86_compare_op0 = lo[0];
--- 4942,4949 ----
  	ix86_compare_op1 = hi[1];
  
  	if (code2 != NIL)
  	  ix86_expand_branch (code2, label2);
+ 	if (code1 != NIL)
+ 	  ix86_expand_branch (code1, label);
  
  	ix86_compare_op0 = lo[0];


which unfortunately only yielded:

movl	8(%esp), %edx
	movl	4(%esp), %eax
	testl	%edx, %edx
	js	.L4
	cmpl	$0, %edx
	jmp	.L3
	.p2align 4,,7
.L4:
	addl	$511, %eax
	adcl	$0, %edx
.L3:
	shrdl	$9, %edx, %eax
	sarl	$9, %edx
	ret

where the cmpl is obviously useless, and the jmp could be eliminated if the
initial js were a jns .L3.  The rtl in question after cse is:

(insn 12 11 13 (set (reg:CCNO 17 flags)
        (compare:CCNO (subreg:SI (reg:DI 28) 1)
            (const_int 0 [0x0]))) 5 {cmpsi_ccno_1} (nil)
    (nil))

(jump_insn 13 12 33 (set (pc)
        (if_then_else (lt (reg:CCNO 17 flags)
                (const_int 0 [0x0]))
            (label_ref 18)
            (pc))) 459 {*jcc_1} (nil)
    (nil))

(insn 14 33 15 (set (reg:CC 17 flags)
        (compare:CC (subreg:SI (reg:DI 28) 1)
            (const_int 0 [0x0]))) 6 {cmpsi_1} (nil)
    (nil))

(jump_insn 15 14 34 (set (pc)
        (if_then_else (gt (reg:CC 17 flags)
                (const_int 0 [0x0]))
            (label_ref 20)
            (pc))) 461 {*jcc_3} (nil)
    (nil))

(insn 16 34 17 (set (reg:CCNO 17 flags)
        (compare:CCNO (subreg:SI (reg:DI 28) 0)
            (const_int 0 [0x0]))) 5 {cmpsi_ccno_1} (nil)
    (nil))

(jump_insn 17 16 38 (set (pc)
        (label_ref 20)) -1 (nil)
    (nil))

(barrier 38 17 18)

(code_label 18 38 35 4 "" "" [num uses: 1])


(insn 19 35 20 (parallel[ 
            (set (reg:DI 28)
                (plus:DI (reg:DI 28)
                    (const_int 511 [0x1ff])))
            (clobber (reg:CC 17 flags))
        ] ) 179 {adddi3} (nil)
    (nil))

(code_label 20 19 36 3 "" "" [num uses: 2])


In 2.95.2, insn 16 was deleted by cse, because i386 was a cc0 port and cse_insn
knew cc0 wasn't preserved accross insns.  This allowed the jump pass after cse
to delete jump 15 as well.  In the current compiler, neither is deleted
(because we don't have flow to know that 16 is dead.)

That would be ok, because flow deletes 16 later, and jump2 deletes jump 15.
But we can't delete insn 14 when we delete jump 15, because we can't rely on
the REG_DEAD notes after scheduling (see the comment in delete_computation.)
Since insn 14 can't go, we can't delete jump 17 and invert jump 13.

So all for naught :)  I don't think we can run jump_optimize at the end of
flow2.  Besides, I don't know if my change to i386.c would hurt jump
optimization oppurtunities for other situations.   

It's really a pain that we can't depend on lifetime information throughout the
compiler.  Is there an estimate on how much work it would take to make
the scheduler maintain REG_DEAD notes?

				-Clint

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: minor code-quality regression vs. 2.95
  2000-04-12 14:09 ` Clinton Popetz
@ 2000-04-12 15:08   ` Zack Weinberg
  2000-04-13 14:17     ` Clinton Popetz
  2000-04-13  0:29   ` Richard Henderson
  2000-05-03  9:39   ` Jeffrey A Law
  2 siblings, 1 reply; 15+ messages in thread
From: Zack Weinberg @ 2000-04-12 15:08 UTC (permalink / raw)
  To: Clinton Popetz; +Cc: gcc

On Wed, Apr 12, 2000 at 04:08:23PM -0500, Clinton Popetz wrote:
> On Tue, Apr 11, 2000 at 09:45:41PM -0700, Zack Weinberg wrote:
> > This function:
> > 
> > long long blocks (long long bytes) { return bytes / 512; }
...
> ix86_expand_branch is emitting the long long compare like this:
> 
>          * a < b =>
>          *    if (hi(a) < hi(b)) goto true; 
>          *    if (hi(a) > hi(b)) goto false;
>          *    if (lo(a) < lo(b)) goto true;
> 
> In the above case, the third jump can be turned into an uncoditional jump by
> cse, which means the first test/jump could be eliminated _if_ it were emitted
> directly before the last one (i.e. swap the first two statements above.)
...
> In 2.95.2, insn 16 was deleted by cse, because i386 was a cc0 port and
> cse_insn knew cc0 wasn't preserved accross insns.  This allowed the
> jump pass after cse to delete jump 15 as well.  In the current compiler,
> neither is deleted (because we don't have flow to know that 16 is dead.)
>
> That would be ok, because flow deletes 16 later, and jump2 deletes jump 15.
> But we can't delete insn 14 when we delete jump 15, because we can't rely on
> the REG_DEAD notes after scheduling (see the comment in delete_computation.)
> Since insn 14 can't go, we can't delete jump 17 and invert jump 13.

I do agree that it would be nice if life info remained valid all the way
through the optimizer.  However, in this case I'm wondering why we go for
the full comparison when all we want to know is if the value is negative.
a < 0 trivially optimizes to hi(a) < 0, because the low 32 bits can't
affect the sign.  We could do the same for a > 0, and similar for <=
and >= (have to test low == 0 as well).  That would be easy to do at
RTL generation time.

I also can't figure out how we get from 'val / 512' to ix86_expand_branch.
There's no divdi pattern, and the ashrdi patterns don't generate the
branch and addition.  It must be hiding somewhere in the machine
independent code, but I don't know where.

zw

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: minor code-quality regression vs. 2.95
  2000-04-12 14:09 ` Clinton Popetz
  2000-04-12 15:08   ` Zack Weinberg
@ 2000-04-13  0:29   ` Richard Henderson
  2000-04-13  6:50     ` Clinton Popetz
  2000-05-03  9:39   ` Jeffrey A Law
  2 siblings, 1 reply; 15+ messages in thread
From: Richard Henderson @ 2000-04-13  0:29 UTC (permalink / raw)
  To: Clinton Popetz; +Cc: Zack Weinberg, gcc

On Wed, Apr 12, 2000 at 04:08:23PM -0500, Clinton Popetz wrote:
> It's really a pain that we can't depend on lifetime information throughout the
> compiler.  Is there an estimate on how much work it would take to make
> the scheduler maintain REG_DEAD notes?

I'm fairly certain that commentary is out of date as of

   Wed Oct 20 06:26:58 1999  Richard Henderson  <rth@cygnus.com>
 
or so.


r~

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: minor code-quality regression vs. 2.95
  2000-04-13  0:29   ` Richard Henderson
@ 2000-04-13  6:50     ` Clinton Popetz
  2000-04-17  8:15       ` Jeffrey A Law
  0 siblings, 1 reply; 15+ messages in thread
From: Clinton Popetz @ 2000-04-13  6:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Zack Weinberg, gcc

On Thu, Apr 13, 2000 at 12:29:27AM -0700, Richard Henderson wrote:
> On Wed, Apr 12, 2000 at 04:08:23PM -0500, Clinton Popetz wrote:
> > It's really a pain that we can't depend on lifetime information throughout the
> > compiler.  Is there an estimate on how much work it would take to make
> > the scheduler maintain REG_DEAD notes?
> 
> I'm fairly certain that commentary is out of date as of
> 
>    Wed Oct 20 06:26:58 1999  Richard Henderson  <rth@cygnus.com>
>  
> or so.

Ok, I'll remove this from delete_computation:

#ifdef INSN_SCHEDULING
  /* ?!? The schedulers do not keep REG_DEAD notes accurate after
     reload has completed.  The schedulers need to be fixed.  Until
     they are, we must not rely on the death notes here.  */
  if (reload_completed && flag_schedule_insns_after_reload)
    {
      delete_insn (insn);
      return;
    }
#endif


and see if we fail regression anywhere.

				-Clint

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: minor code-quality regression vs. 2.95
  2000-04-12 15:08   ` Zack Weinberg
@ 2000-04-13 14:17     ` Clinton Popetz
  2000-04-13 23:29       ` Zack Weinberg
  0 siblings, 1 reply; 15+ messages in thread
From: Clinton Popetz @ 2000-04-13 14:17 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: gcc

On Wed, Apr 12, 2000 at 03:08:38PM -0700, Zack Weinberg wrote:
 
> I also can't figure out how we get from 'val / 512' to ix86_expand_branch.
> There's no divdi pattern, and the ashrdi patterns don't generate the
> branch and addition.  It must be hiding somewhere in the machine
> independent code, but I don't know where.

expand_divmod does this when branches are cheap.  

				-Clint 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: minor code-quality regression vs. 2.95
  2000-04-13 14:17     ` Clinton Popetz
@ 2000-04-13 23:29       ` Zack Weinberg
  2000-04-14 15:15         ` Zack Weinberg
  2000-04-23 13:48         ` Jeffrey A Law
  0 siblings, 2 replies; 15+ messages in thread
From: Zack Weinberg @ 2000-04-13 23:29 UTC (permalink / raw)
  To: Clinton Popetz; +Cc: gcc

On Thu, Apr 13, 2000 at 04:16:47PM -0500, Clinton Popetz wrote:
> On Wed, Apr 12, 2000 at 03:08:38PM -0700, Zack Weinberg wrote:
>  
> > I also can't figure out how we get from 'val / 512' to ix86_expand_branch.
> > There's no divdi pattern, and the ashrdi patterns don't generate the
> > branch and addition.  It must be hiding somewhere in the machine
> > independent code, but I don't know where.
> 
> expand_divmod does this when branches are cheap.  

Hmm... expand_divmod calls do_cmp_and_jump, which looks like it would
be the appropriate place to teach GCC that DImode < 0 or >= 0 can be
done by looking only at the high word.  But when I do that (see patch
below), it doesn't help, because i386 has a cmpdi pattern.  This
seems silly to me... why not let the generic code synthesize it?

With the patch below _and_ cmpdi commented out of i386.md, I get the
same assembly as 2.95 produced.  (The patch is overly long due to
indentation changes.)

zw

===================================================================
Index: expmed.c
--- expmed.c	2000/04/10 11:51:53	1.57
+++ expmed.c	2000/04/14 06:24:47
@@ -4563,58 +4563,84 @@ do_cmp_and_jump (arg1, arg2, op, mode, l
      enum rtx_code op;
      enum machine_mode mode;
 {
-  /* If this mode is an integer too wide to compare properly,
-     compare word by word.  Rely on cse to optimize constant cases.  */
+  rtx label2, word0;
 
-  if (GET_MODE_CLASS (mode) == MODE_INT
-      && ! can_compare_p (op, mode, ccp_jump))
+  if (GET_MODE_CLASS (mode) != MODE_INT
+      || can_compare_p (op, mode, ccp_jump))
     {
-      rtx label2 = gen_label_rtx ();
-
+      emit_cmp_and_jump_insns (arg1, arg2, op, NULL_RTX, mode, 0, 0, label);
+      return;
+    }
+    
+  /* Special case some word-by-word compares against zero.  */
+  if (arg2 == const0_rtx)
+    {
       switch (op)
 	{
-	case LTU:
-	  do_jump_by_parts_greater_rtx (mode, 1, arg2, arg1, label2, label);
-	  break;
-
-	case LEU:
-	  do_jump_by_parts_greater_rtx (mode, 1, arg1, arg2, label, label2);
-	  break;
-
-	case LT:
-	  do_jump_by_parts_greater_rtx (mode, 0, arg2, arg1, label2, label);
-	  break;
-
-	case GT:
-	  do_jump_by_parts_greater_rtx (mode, 0, arg1, arg2, label2, label);
-	  break;
+	case LTU:  return;			/* always false */
+	case GEU:  emit_jump (label); return;	/* always true */
+	case LEU:  op = EQ;  break;		/* equivalent */
+	case GTU:  op = NE;  break;
 
+	case LT:	/* a < 0 or a >= 0 depend only on the high word.  */
 	case GE:
-	  do_jump_by_parts_greater_rtx (mode, 0, arg2, arg1, label, label2);
-	  break;
+	  if (WORDS_BIG_ENDIAN)
+	    word0 = operand_subword_force (arg1, 0, mode);
+	  else
+	    word0 = operand_subword_force (arg1, (GET_MODE_SIZE (mode)
+						  / UNITS_PER_WORD) - 1, mode);
+		
+	  emit_cmp_and_jump_insns (word0, arg2, op, NULL_RTX,
+				   word_mode, 0, 0, label);
+	  return;
 
-	  /* do_jump_by_parts_equality_rtx compares with zero.  Luckily
-	     that's the only equality operations we do */
-	case EQ:
-	  if (arg2 != const0_rtx || mode != GET_MODE(arg1))
-	    abort();
-	  do_jump_by_parts_equality_rtx (arg1, label2, label);
-	  break;
-
-	case NE:
-	  if (arg2 != const0_rtx || mode != GET_MODE(arg1))
-	    abort();
-	  do_jump_by_parts_equality_rtx (arg1, label, label2);
-	  break;
-
-	default:
-	  abort();
+	default: break;
 	}
-
-      emit_label (label2);
     }
-  else
+
+  /* If this mode is an integer too wide to compare properly,
+     compare word by word.  Rely on cse to optimize constant cases.  */
+
+  label2  = gen_label_rtx ();
+  switch (op)
     {
-      emit_cmp_and_jump_insns (arg1, arg2, op, NULL_RTX, mode, 0, 0, label);
+    case LTU:
+      do_jump_by_parts_greater_rtx (mode, 1, arg2, arg1, label2, label);
+      break;
+
+    case LEU:
+      do_jump_by_parts_greater_rtx (mode, 1, arg1, arg2, label, label2);
+      break;
+
+    case LT:
+      do_jump_by_parts_greater_rtx (mode, 0, arg2, arg1, label2, label);
+      break;
+
+    case GT:
+      do_jump_by_parts_greater_rtx (mode, 0, arg1, arg2, label2, label);
+      break;
+
+    case GE:
+      do_jump_by_parts_greater_rtx (mode, 0, arg2, arg1, label, label2);
+      break;
+
+      /* do_jump_by_parts_equality_rtx compares with zero.  Luckily
+	 that's the only equality operations we do */
+    case EQ:
+      if (arg2 != const0_rtx || mode != GET_MODE(arg1))
+	abort();
+      do_jump_by_parts_equality_rtx (arg1, label2, label);
+      break;
+
+    case NE:
+      if (arg2 != const0_rtx || mode != GET_MODE(arg1))
+	abort();
+      do_jump_by_parts_equality_rtx (arg1, label, label2);
+      break;
+
+    default:
+      abort();
     }
+
+  emit_label (label2);
 }

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: minor code-quality regression vs. 2.95
  2000-04-13 23:29       ` Zack Weinberg
@ 2000-04-14 15:15         ` Zack Weinberg
  2000-04-23 13:48         ` Jeffrey A Law
  1 sibling, 0 replies; 15+ messages in thread
From: Zack Weinberg @ 2000-04-14 15:15 UTC (permalink / raw)
  To: Clinton Popetz; +Cc: gcc

On Thu, Apr 13, 2000 at 11:29:18PM -0700, Zack Weinberg wrote:
> On Thu, Apr 13, 2000 at 04:16:47PM -0500, Clinton Popetz wrote:
> > On Wed, Apr 12, 2000 at 03:08:38PM -0700, Zack Weinberg wrote:
> >  
> > > I also can't figure out how we get from 'val / 512' to ix86_expand_branch.
> > > There's no divdi pattern, and the ashrdi patterns don't generate the
> > > branch and addition.  It must be hiding somewhere in the machine
> > > independent code, but I don't know where.
> > 
> > expand_divmod does this when branches are cheap.  
> 
> Hmm... expand_divmod calls do_cmp_and_jump, which looks like it would
> be the appropriate place to teach GCC that DImode < 0 or >= 0 can be
> done by looking only at the high word.  But when I do that (see patch
> below), it doesn't help, because i386 has a cmpdi pattern.  This
> seems silly to me... why not let the generic code synthesize it?
> 
> With the patch below _and_ cmpdi commented out of i386.md, I get the
> same assembly as 2.95 produced.  (The patch is overly long due to
> indentation changes.)

... but stage1 miscompiles cpp and we die horribly while trying to
build libstdc++.  Bugger.

zw

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: minor code-quality regression vs. 2.95
  2000-04-13  6:50     ` Clinton Popetz
@ 2000-04-17  8:15       ` Jeffrey A Law
  0 siblings, 0 replies; 15+ messages in thread
From: Jeffrey A Law @ 2000-04-17  8:15 UTC (permalink / raw)
  To: Clinton Popetz; +Cc: Richard Henderson, Zack Weinberg, gcc

  In message < 20000413085015.A15135@cpopetz.com >you write:
  > Ok, I'll remove this from delete_computation:
  > 
  > #ifdef INSN_SCHEDULING
  >   /* ?!? The schedulers do not keep REG_DEAD notes accurate after
  >      reload has completed.  The schedulers need to be fixed.  Until
  >      they are, we must not rely on the death notes here.  */
  >   if (reload_completed && flag_schedule_insns_after_reload)
  >     {
  >       delete_insn (insn);
  >       return;
  >     }
  > #endif
  > 
  > 
  > and see if we fail regression anywhere.
I discussed this with Clint this morning, but wanted to get the
info straight for archival purposes.

the code needs to change since we now have accurate life info after
sched2, but cross jumping, reg-stack & reorg can confuse things.

jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: minor code-quality regression vs. 2.95
  2000-04-13 23:29       ` Zack Weinberg
  2000-04-14 15:15         ` Zack Weinberg
@ 2000-04-23 13:48         ` Jeffrey A Law
  2000-04-23 13:56           ` Zack Weinberg
  2000-04-23 21:33           ` Philippe De Muyter
  1 sibling, 2 replies; 15+ messages in thread
From: Jeffrey A Law @ 2000-04-23 13:48 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Clinton Popetz, gcc

  In message < 20000413232918.J9184@wolery.cumb.org >you write:
  > On Thu, Apr 13, 2000 at 04:16:47PM -0500, Clinton Popetz wrote:
  > > On Wed, Apr 12, 2000 at 03:08:38PM -0700, Zack Weinberg wrote:
  > >  
  > > > I also can't figure out how we get from 'val / 512' to ix86_expand_bran
  > ch.
  > > > There's no divdi pattern, and the ashrdi patterns don't generate the
  > > > branch and addition.  It must be hiding somewhere in the machine
  > > > independent code, but I don't know where.
  > > 
  > > expand_divmod does this when branches are cheap.  
  > 
  > Hmm... expand_divmod calls do_cmp_and_jump, which looks like it would
  > be the appropriate place to teach GCC that DImode < 0 or >= 0 can be
  > done by looking only at the high word.  But when I do that (see patch
  > below), it doesn't help, because i386 has a cmpdi pattern.  This
  > seems silly to me... why not let the generic code synthesize it?
I believe Richard Kenner checked in a case to optimize this stuff a couple
days ago, possibly even based on your change.

jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: minor code-quality regression vs. 2.95
  2000-04-23 13:48         ` Jeffrey A Law
@ 2000-04-23 13:56           ` Zack Weinberg
  2000-04-23 21:33           ` Philippe De Muyter
  1 sibling, 0 replies; 15+ messages in thread
From: Zack Weinberg @ 2000-04-23 13:56 UTC (permalink / raw)
  To: Jeffrey A Law; +Cc: Clinton Popetz, gcc

On Sun, Apr 23, 2000 at 01:40:20PM -0600, Jeffrey A Law wrote:
> 
>   In message < 20000413232918.J9184@wolery.cumb.org >you write:
>   > On Thu, Apr 13, 2000 at 04:16:47PM -0500, Clinton Popetz wrote:
>   > > On Wed, Apr 12, 2000 at 03:08:38PM -0700, Zack Weinberg wrote:
>   > >  
>   > > > I also can't figure out how we get from 'val / 512' to ix86_expand_bran
>   > ch.
>   > > > There's no divdi pattern, and the ashrdi patterns don't generate the
>   > > > branch and addition.  It must be hiding somewhere in the machine
>   > > > independent code, but I don't know where.
>   > > 
>   > > expand_divmod does this when branches are cheap.  
>   > 
>   > Hmm... expand_divmod calls do_cmp_and_jump, which looks like it would
>   > be the appropriate place to teach GCC that DImode < 0 or >= 0 can be
>   > done by looking only at the high word.  But when I do that (see patch
>   > below), it doesn't help, because i386 has a cmpdi pattern.  This
>   > seems silly to me... why not let the generic code synthesize it?
> I believe Richard Kenner checked in a case to optimize this stuff a couple
> days ago, possibly even based on your change.

This one?

Tue Apr 18 14:16:47 2000  Richard Kenner  <kenner@vlsi1.ultra.nyu.edu>

        * expmed.c (emit_store_flag): If comparing two-word integer
        with zero, can optimize NE, EQ, GE, and LT.

I think he was trying to fix a related problem on Alphas.  i386.md has
a cmpdi expander, and duplicates (nearly) the expmed.c code to synthesize
DImode comparisons in ix86_expand_branch.  We never go anywhere near
this part of the code.

zw

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: minor code-quality regression vs. 2.95
  2000-04-23 13:48         ` Jeffrey A Law
  2000-04-23 13:56           ` Zack Weinberg
@ 2000-04-23 21:33           ` Philippe De Muyter
  1 sibling, 0 replies; 15+ messages in thread
From: Philippe De Muyter @ 2000-04-23 21:33 UTC (permalink / raw)
  To: law; +Cc: zack, cpopetz, gcc

Zack Weinberg wrote :
> Hmm... expand_divmod calls do_cmp_and_jump, which looks like it would
> be the appropriate place to teach GCC that DImode < 0 or >= 0 can be
> done by looking only at the high word.  But when I do that (see patch
> below), it doesn't help, because i386 has a cmpdi pattern.  This
> seems silly to me... why not let the generic code synthesize it?

I had similar problems with DImode for m68k.  If one wants to provide
`adddi3' and `subdi3' because the cpu can do it fast, one ends to provide
also `cmpdi', `anddi', `xordi', `iordi' and maybe others, because
`combine' would not combine anything of the CPU-DImode world with something
of the expmed-DImode world.

`expmed' is actually called too early.  It should probably not split cmpdi,
anddi, xordi and iordi if the md-file provides other real DImode patterns,
and let a stage after combine split them using cmpsi, andsi, xorsi and iorsi.

Another way would be to let gcc know that there are `add-with-carry' and
`substract-with-borrow' insns, and keep track of the carry/borrow flag,
but that's probably not a simple task, and that would only solve the problems
with the adddi3 and subdi3 patterns, not with muldi or divdi.

Philippe

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: minor code-quality regression vs. 2.95
  2000-04-12 14:09 ` Clinton Popetz
  2000-04-12 15:08   ` Zack Weinberg
  2000-04-13  0:29   ` Richard Henderson
@ 2000-05-03  9:39   ` Jeffrey A Law
  2 siblings, 0 replies; 15+ messages in thread
From: Jeffrey A Law @ 2000-05-03  9:39 UTC (permalink / raw)
  To: Clinton Popetz; +Cc: Zack Weinberg, gcc

  In message <20000412160823.A13160@cpopetz.com>you write:
  > On Tue, Apr 11, 2000 at 09:45:41PM -0700, Zack Weinberg wrote:
  > > This function:
  > > 
  > > long long blocks (long long bytes) { return bytes / 512; }
  > > 
  > > compiles as follows with 2.95:
  > > 
  > > blocks:
  > > 	movl 4(%esp),%eax
  > > 	movl 8(%esp),%edx
  > > 	testl %edx,%edx
  > > 	jge .L3
  > > 	addl $511,%eax
  > > 	adcl $0,%edx
  > > L3:
  > > 	shrdl $9,%edx,%eax
  > > 	sarl $9,%edx
  > > 	ret
  > > 
  > > and with the 20000408 CVS tree:
  > > 
  > > blocks:
  > > 	movl	8(%esp), %edx
  > > 	movl	4(%esp), %eax
  > > 	cmpl	$0, %edx
  > > 	jg	.L3
  > > 	testl	%edx, %edx
  > > 	jns	.L3
  > > 	addl	$511, %eax
  > > 	adcl	$0, %edx
  > > L3:
  > > 	shrdl	$9, %edx, %eax
  > > 	sarl	$9, %edx
  > > 	ret
  > > 
  > > Notice the additional compare and test in the 2.96 code.  
  > > 
  > > I can't make any sense at all of the RTL.
  > 
  > ix86_expand_branch is emitting the long long compare like this:
  > 
  >          * a < b =>
  >          *    if (hi(a) < hi(b)) goto true; 
  >          *    if (hi(a) > hi(b)) goto false;
  >          *    if (lo(a) < lo(b)) goto true;
  > 
  > In the above case, the third jump can be turned into an uncoditional jump b
  > y
  > cse, which means the first test/jump could be eliminated _if_ it were emitt
  > ed
  > directly before the last one (i.e. swap the first two statements above.)
Maybe we should instead look at how we could improve our existing optimizers
to handle this situation.

For example, it might be possible to have thread_jumps detect this case.

Or somehow have life_analysis be aware that its actions can simplify the cfg
and account for those changes (by modifying the cfg and updating life info
appropriately).

jeff


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: minor code-quality regression vs. 2.95
@ 2000-04-23 14:01 Richard Kenner
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Kenner @ 2000-04-23 14:01 UTC (permalink / raw)
  To: zack; +Cc: gcc

    Tue Apr 18 14:16:47 2000  Richard Kenner  <kenner@vlsi1.ultra.nyu.edu>

        * expmed.c (emit_store_flag): If comparing two-word integer
        with zero, can optimize NE, EQ, GE, and LT.

    I think he was trying to fix a related problem on Alphas.  i386.md has
    a cmpdi expander, and duplicates (nearly) the expmed.c code to
    synthesize DImode comparisons in ix86_expand_branch.  We never go
    anywhere near this part of the code.

Actually, what I was doing was very specific to emit_store_flag: it
was being called on Alpha for TImode and emitting called to __cmpti2
even when it was just a comparison against zero.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: minor code-quality regression vs. 2.95
@ 2000-04-12 14:26 John Wehle
  0 siblings, 0 replies; 15+ messages in thread
From: John Wehle @ 2000-04-12 14:26 UTC (permalink / raw)
  To: cpopetz; +Cc: gcc, zack

> It's really a pain that we can't depend on lifetime information throughout the
> compiler.  Is there an estimate on how much work it would take to make
> the scheduler maintain REG_DEAD notes?

A while back I put together a patch for the old scheduler to
address this (then Jeff went and deleted the scheduler on me :-).
I have it on my short list to do for the current scheduler, however
you're welcome to beat me to the punch.  The changes for the old
scheduler to preserve the REG_DEAD notes were pretty minor.

-- John
-------------------------------------------------------------------------
|   Feith Systems  |   Voice: 1-215-646-8000  |  Email: john@feith.com  |
|    John Wehle    |     Fax: 1-215-540-5495  |                         |
-------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2000-05-03  9:39 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-04-11 21:45 minor code-quality regression vs. 2.95 Zack Weinberg
2000-04-12 14:09 ` Clinton Popetz
2000-04-12 15:08   ` Zack Weinberg
2000-04-13 14:17     ` Clinton Popetz
2000-04-13 23:29       ` Zack Weinberg
2000-04-14 15:15         ` Zack Weinberg
2000-04-23 13:48         ` Jeffrey A Law
2000-04-23 13:56           ` Zack Weinberg
2000-04-23 21:33           ` Philippe De Muyter
2000-04-13  0:29   ` Richard Henderson
2000-04-13  6:50     ` Clinton Popetz
2000-04-17  8:15       ` Jeffrey A Law
2000-05-03  9:39   ` Jeffrey A Law
2000-04-12 14:26 John Wehle
2000-04-23 14:01 Richard Kenner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).