[Bug rtl-optimization/21827] New: unroll misses simple elimination

* [Bug rtl-optimization/21827] New: unroll misses simple elimination - works with manual unroll
@ 2005-05-30 18:45 tlm at daimi dot au dot dk
  2005-05-30 18:55 ` [Bug rtl-optimization/21827] " pinskia at gcc dot gnu dot org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: tlm at daimi dot au dot dk @ 2005-05-30 18:45 UTC (permalink / raw)
  To: gcc-bugs

Using gentoo gcc 3.4.3

This may look like http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11707
(and they might be the same; however, I think I had the problem with 3.3.4 too).

I have also seen this problem in other, older versions, and in two projects I
have worked on it has been really annoying. I think that if a loop is unrolled
and the loop variable is eliminated, it should be replaced with a constant in
each copy of the body (and ifs that are then always false should be removed).

That is not the case:
int test(int v)
{
  int x = 0;
  for (int u=0;u<2;u++)
  {
    if (u>v)  // v is an input argument; the compiler can't decide this at compile time
    {
      if (u%2==1) // can only be true for u==1, so the other iterations do
        x++;      // nothing. I hoped gcc would notice this when unrolling.
    }
  }  
  return x;
}

g++ -O3 -funroll-loops -S simple_test.cpp

gives me the following code:
	.text
	.align 2
	.p2align 4,,15
.globl _Z4testi
	.type	_Z4testi, @function
_Z4testi:
.LFB2:
	pushl	%ebp
.LCFI0:
	xorl	%edx, %edx
	movl	%esp, %ebp
.LCFI1:
	xorl	%eax, %eax
	incl	%eax
	cmpl	8(%ebp), %eax
	jle	.L4
	testb	$1, %al
	setne	%cl
	movzbl	%cl, %eax
	addl	%eax, %edx
.L4:
	popl	%ebp
	movl	%edx, %eax
	ret
.LFE2:
	.size	_Z4testi, .-_Z4testi
	.section	.note.GNU-stack,"",@progbits
	.ident	"GCC: (GNU) 3.4.3-20050110 (Gentoo 3.4.3.20050110-r2,
ssp-3.4.3.20050110-0, pie-8.7.7)"

If I manually unroll it like this:

int test(int v)
{
  int x = 0;

  if (0>v)
  {
    if (0%2==1)
      x++;
  }
  if (1>v)
  {
    if (1%2==1)
      x++;
  }
  if (2>v)
  {
    if (2%2==1)
      x++;
  }  
  
  return x;
}

and then compile with just -O3, I get the much nicer:
	.text
	.align 2
	.p2align 4,,15
.globl _Z4testi
	.type	_Z4testi, @function
_Z4testi:
.LFB2:
	pushl	%ebp
.LCFI0:
	xorl	%eax, %eax
	movl	%esp, %ebp
.LCFI1:
	cmpl	$0, 8(%ebp)
	popl	%ebp
	setle	%al
	ret
.LFE2:
	.size	_Z4testi, .-_Z4testi
	.section	.note.GNU-stack,"",@progbits
	.ident	"GCC: (GNU) 3.4.3-20050110 (Gentoo 3.4.3.20050110-r2,
ssp-3.4.3.20050110-0, pie-8.7.7)"
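
For reference, the loop and the hand-unrolled version compute the same value. The sketch below is only an illustration; the names test_loop, test_manual and test_folded are hypothetical and do not appear in the report. Since u == 1 is the only odd iteration, the result is 1 exactly when 1 > v, that is, when v <= 0, which appears to be what the cmpl $0 / setle pair in the shorter listing computes.

```cpp
// Loop form from the report.
int test_loop(int v)
{
  int x = 0;
  for (int u = 0; u < 2; u++)
    if (u > v && u % 2 == 1)
      x++;
  return x;
}

// Hand-unrolled form from the report (iterations 0, 1 and 2 written out).
int test_manual(int v)
{
  int x = 0;
  if (0 > v) { if (0 % 2 == 1) x++; }
  if (1 > v) { if (1 % 2 == 1) x++; }
  if (2 > v) { if (2 % 2 == 1) x++; }
  return x;
}

// What is left after constant folding: only u == 1 can increment x.
int test_folded(int v)
{
  return v <= 0 ? 1 : 0;
}
```

All three functions should agree for every v; the extra iteration for 2 in the manual version never increments x because 2 is even.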

I have had two cases where this optimization is very important. One is when you
program a chessboard "from within", so to speak. The other was a raytracer I
wrote with a friend. In that situation we had to settle for a not-that-fast
switch (since we did not want to pollute our code with a manual unroll).

The chessboard example (a simple case here: how many knight moves does white
have? We do not consider check, pins, or pieces that are in the way):

int knight_square_count(unsigned char* board)
{
  int count = 0;
  for (int bp=0;bp<64;bp++)
  {
    if (board[bp]==WHITE_KNIGHT)
    {
      if (bp%8>1 && bp/8>0) count++;
      if (bp%8>0 && bp/8>1) count++;
      if (bp%8<6 && bp/8>0) count++;
      if (bp%8<7 && bp/8>1) count++;
      if (bp%8>1 && bp/8<7) count++;
      if (bp%8>0 && bp/8<6) count++;
      if (bp%8<6 && bp/8<7) count++;
      if (bp%8<7 && bp/8<6) count++;
    }
  }
  return count;
}

In the above situation a manually unrolled version (with -O3) is more than 400%
faster (I have timed it, and it is close to 500%). I thought that one of the
main ideas of unrolling loops was to let every iteration become "its own" piece
of code (without making the source ugly).
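
One way to get the same effect without writing 64 copies by hand is a table of per-square move counts filled in at compile time. This is a swapped-in technique, not the loop unrolling the report asks for, and it assumes modern C++ (C++14 or later), which postdates the compilers discussed here. The names knight_moves_from, MoveTable, make_move_table, kMoves and knight_square_count_table are hypothetical; WHITE_KNIGHT = 5 is the value given later in the thread.

```cpp
#define WHITE_KNIGHT 5  // piece code given later in this thread

// Number of on-board knight moves from square bp (0..63): the same eight
// tests as in knight_square_count, written as one constant expression.
constexpr int knight_moves_from(int bp)
{
  return (bp%8>1 && bp/8>0) + (bp%8>0 && bp/8>1)
       + (bp%8<6 && bp/8>0) + (bp%8<7 && bp/8>1)
       + (bp%8>1 && bp/8<7) + (bp%8>0 && bp/8<6)
       + (bp%8<6 && bp/8<7) + (bp%8<7 && bp/8<6);
}

struct MoveTable { int n[64]; };

// Filled in during compilation (the loop needs C++14 or later).
constexpr MoveTable make_move_table()
{
  MoveTable t{};
  for (int bp = 0; bp < 64; ++bp)
    t.n[bp] = knight_moves_from(bp);
  return t;
}

constexpr MoveTable kMoves = make_move_table();
static_assert(kMoves.n[0] == 2, "a corner knight has two moves");

// Same result as knight_square_count, but the per-square arithmetic
// has already been done by the compiler.
int knight_square_count_table(const unsigned char* board)
{
  int count = 0;
  for (int bp = 0; bp < 64; ++bp)
    if (board[bp] == WHITE_KNIGHT)
      count += kMoves.n[bp];
  return count;
}
```

The run-time loop then only compares each square with WHITE_KNIGHT and adds a constant, which is roughly what the hand-unrolled code reduces to.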

regards and thanks for the best (free) compiler
Bsc Computer Science 
Thorbjørn Martsum

PS: There might also be a reason for things being as they are. If so, I just
don't understand why; please explain.

-- 
           Summary: unroll misses simple elimination - works with manual
                    unroll
           Product: gcc
           Version: 3.4.3
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tlm at daimi dot au dot dk
                CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21827

* [Bug rtl-optimization/21827] unroll misses simple elimination - works with manual unroll
From: pinskia at gcc dot gnu dot org @ 2005-05-30 18:55 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-05-30 18:53 -------
The first testcase is fixed in 4.0.0 (though there is a regression on the
mainline). I have not looked into the full testcase.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
           Keywords|                            |missed-optimization
         Resolution|                            |FIXED



* [Bug rtl-optimization/21827] unroll misses simple elimination - works with manual unroll
From: pinskia at gcc dot gnu dot org @ 2005-05-30 19:06 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-05-30 18:56 -------
I was not going to close this; it was an accident.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|FIXED                       |



* [Bug rtl-optimization/21827] unroll misses simple elimination - works with manual unroll
From: tlm at daimi dot au dot dk @ 2005-05-31  7:38 UTC (permalink / raw)
  To: gcc-bugs

------- Additional Comments From tlm at daimi dot au dot dk  2005-05-31 05:38 -------

(In reply to comment #1)
> The first testcase is fixed in 4.0.0.  (Though there is a regression on the
> mainline).  I have not looked into the full testcase.

(In reply to comment #2)
> I was not going to close this; it was an accident.

Well, if it works in 4.0 I guess you can close it if you want.
I made the first testcase from the code below to give a simple example.
I am not sure, but I guess a fix for the simple example would also give the
same huge improvement on the "chess-knight-code".
(I wrote 400-500%, but this of course depends on how many knights there
are on the board. I had 5 (a bit unrealistic), with the function returning 17.)

Thanks for the information ... (will download version 4.0 soon)


* [Bug rtl-optimization/21827] unroll misses simple elimination - works with manual unroll
From: tlm at daimi dot au dot dk @ 2005-05-31 20:49 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From tlm at daimi dot au dot dk  2005-05-31 20:45 -------
(In reply to comment #1)
> The first testcase is fixed in 4.0.0.   I have not looked 
> into the full testcase.

Installed gcc 4.0.0 (a bit hard with the current version).
OK, I was wrong before (so please do not close this).
The simple situation is fixed; however, there are still the same problems
with the knight example.

int auto_unrolled_knight_count(unsigned char* board)
{
  int count = 0;
  for (int bp=0;bp<2;bp++) // reduced to 2 just for the example
  {
    if (board[bp]==WHITE_KNIGHT)
    {
      if (bp%8>1 && bp/8>0) count++;
      if (bp%8>0 && bp/8>1) count++;
      if (bp%8<6 && bp/8>0) count++;
      if (bp%8<7 && bp/8>1) count++;
      if (bp%8>1 && bp/8<7) count++;
      if (bp%8>0 && bp/8<6) count++;
      if (bp%8<6 && bp/8<7) count++;
      if (bp%8<7 && bp/8<6) count++;
    }
  }
  return count;
}

is compiled to 
	.text
	.align 2
	.p2align 4,,15
.globl _Z26auto_unrolled_knight_countPh
	.type	_Z26auto_unrolled_knight_countPh, @function
_Z26auto_unrolled_knight_countPh:
.LFB2:
	pushl	%ebp
.LCFI0:
	xorl	%eax, %eax
	movl	%esp, %ebp
.LCFI1:
	movl	8(%ebp), %edx
	cmpb	$5, (%edx)
	je	.L10
.L6:
	cmpb	$5, 1(%edx)
	je	.L11
	popl	%ebp
	ret
	.p2align 4,,7
.L11:
	popl	%ebp
	addl	$3, %eax
	.p2align 4,,6
	ret
	.p2align 4,,7
.L10:
	movl	$2, %eax
	.p2align 4,,7
	jmp	.L6
.LFE2:
	.size	_Z26auto_unrolled_knight_countPh, .-_Z26auto_unrolled_knight_countPh
	.ident	"GCC: (GNU) 4.0.0"
	.section	.note.GNU-stack,"",@progbits

Now if I manually expand the loop before compiling:

int unrolled_knight_count(unsigned char* board)
{
  int count = 0;
  // for (int bp=0;bp<64;bp++) is expanded by hand (only 2 iterations, as before)
    if (board[0]==WHITE_KNIGHT)
    {
      if (0%8>1 && 0/8>0) count++;
      if (0%8>0 && 0/8>1) count++;
      if (0%8<6 && 0/8>0) count++;
      if (0%8<7 && 0/8>1) count++;
      if (0%8>1 && 0/8<7) count++;
      if (0%8>0 && 0/8<6) count++;
      if (0%8<6 && 0/8<7) count++;
      if (0%8<7 && 0/8<6) count++;
    }
    if (board[1]==WHITE_KNIGHT)
    {
      if (1%8>1 && 1/8>0) count++;
      if (1%8>0 && 1/8>1) count++;
      if (1%8<6 && 1/8>0) count++;
      if (1%8<7 && 1/8>1) count++;
      if (1%8>1 && 1/8<7) count++;
      if (1%8>0 && 1/8<6) count++;
      if (1%8<6 && 1/8<7) count++;
      if (1%8<7 && 1/8<6) count++;
    }
  return count;
}

The result is much better. (Not that I know assembler code.)

I have WHITE_KNIGHT = 5 (as you might have seen from the assembler code),
and when I timed it I had knights on positions 24, 44, 55 and 56. The code is
400-500% faster, so it will really improve the speed ...

	.text
	.align 2
	.p2align 4,,15
.globl _Z21unrolled_knight_countPh
	.type	_Z21unrolled_knight_countPh, @function
_Z21unrolled_knight_countPh:
.LFB2:
	pushl	%ebp
.LCFI0:
	xorl	%eax, %eax
	movl	%esp, %ebp
.LCFI1:
	movl	8(%ebp), %edx
	cmpb	$5, (%edx)
	sete	%al
	addl	%eax, %eax
	cmpb	$5, 1(%edx)
	je	.L9
	popl	%ebp
	ret
	.p2align 4,,7
.L9:
	popl	%ebp
	addl	$3, %eax
	ret
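
Read side by side, both 4.0.0 listings above seem to reduce to the same arithmetic: 2 is added if square 0 holds a knight (movl $2 in the loop version, sete followed by addl %eax, %eax in the hand-expanded one) and 3 if square 1 does. The sketch below spells that out in source form. It is an interpretation of the assembly, not something stated by the posters, and the function name knight_count_first_two is hypothetical.

```cpp
#define WHITE_KNIGHT 5  // value stated in this message

// Folded form of the two-iteration knight loop:
// square 0 (a corner) has 2 on-board knight moves, square 1 has 3.
int knight_count_first_two(const unsigned char* board)
{
  int count = 0;
  if (board[0] == WHITE_KNIGHT) count += 2;
  if (board[1] == WHITE_KNIGHT) count += 3;
  return count;
}
```

Under this reading the remaining difference between the two listings is a taken branch versus a sete for square 0, not whether the inner conditions were folded.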

Again thanks. I do not want to sound like an unhappy gcc-user 
(I admire the work you are doing). 




* [Bug rtl-optimization/21827] unroll misses simple elimination - works with manual unroll
From: tlm at daimi dot au dot dk @ 2005-07-19 17:34 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From tlm at daimi dot au dot dk  2005-07-19 17:02 -------
(In reply to comment #1)
> The first testcase is fixed in 4.0.0.  (Though there is a regression on the
> mainline).  I have not looked into the full testcase.

There have been no more reactions to this bug/request, so I am giving a bit more
information (and hopefully motivation) to move towards a solution.

I have written the code below. auto_unrolled_knight_count8 and
t_auto_unrolled_knight_count9 differ in only one way: the first loop runs to 8,
the second to 9. (If I manually unroll, meaning replacing bp with constants up
to 64, since it is a chess problem, the code is exactly like the code
generated in the up-to-8 example.)

The code generated for the 9 example is in my opinion quite bad.
(It does work, but I consider the finest task of unrolling to be eliminating
what is easily known at compile time to be impossible.) The code is normally at
least 4-5 times slower than the up-to-8 code!


The source is like this:

#define WHITE_KNIGHT 5

int auto_unrolled_knight_count8(unsigned char* board)
{
  int count = 0;
  for (int bp=0;bp<8;++bp)
  {
    if (board[bp]==WHITE_KNIGHT)
    {
      if (bp%8>1 && bp/8>0) count++;
      if (bp%8>0 && bp/8>1) count++;
      if (bp%8<6 && bp/8>0) count++;
      if (bp%8<7 && bp/8>1) count++;
      if (bp%8>1 && bp/8<7) count++;
      if (bp%8>0 && bp/8<6) count++;
      if (bp%8<6 && bp/8<7) count++;
      if (bp%8<7 && bp/8<6) count++;
    }
  }
  return count;
}

int t_auto_unrolled_knight_count9(unsigned char* board)
{
  int count = 0;
  for (int bp=0;bp<9;++bp)
  {
    if (board[bp]==WHITE_KNIGHT)
    {
      if (bp%8>1 && bp/8>0) count++;
      if (bp%8>0 && bp/8>1) count++;
      if (bp%8<6 && bp/8>0) count++;
      if (bp%8<7 && bp/8>1) count++;
      if (bp%8>1 && bp/8<7) count++;
      if (bp%8>0 && bp/8<6) count++;
      if (bp%8<6 && bp/8<7) count++;
      if (bp%8<7 && bp/8<6) count++;
    }
  }
  return count;
}

Assembly (compiled with -O3 and -funroll-loops):

	.file	"all_in_one.cpp"
	.text
	.align 2
	.p2align 4,,15
.globl _Z27auto_unrolled_knight_count8Ph
	.type	_Z27auto_unrolled_knight_count8Ph, @function
_Z27auto_unrolled_knight_count8Ph:
.LFB2:
	pushl	%ebp
.LCFI0:
	xorl	%eax, %eax
	movl	%esp, %ebp
.LCFI1:
	movl	8(%ebp), %edx
	cmpb	$5, (%edx)
	je	.L22
.L6:
	cmpb	$5, 1(%edx)
	je	.L23
.L8:
	cmpb	$5, 2(%edx)
	je	.L24
.L10:
	cmpb	$5, 3(%edx)
	.p2align 4,,5
	je	.L25
.L12:
	cmpb	$5, 4(%edx)
	.p2align 4,,5
	je	.L26
.L14:
	cmpb	$5, 5(%edx)
	.p2align 4,,5
	je	.L27
.L16:
	cmpb	$5, 6(%edx)
	.p2align 4,,5
	je	.L28
.L18:
	cmpb	$5, 7(%edx)
	.p2align 4,,5
	je	.L29
	popl	%ebp
	.p2align 4,,6
	ret
	.p2align 4,,7
.L29:
	popl	%ebp
	addl	$2, %eax
	.p2align 4,,6
	ret
	.p2align 4,,7
.L28:
	addl	$3, %eax
	.p2align 4,,7
	jmp	.L18
	.p2align 4,,7
.L27:
	addl	$4, %eax
	.p2align 4,,5
	jmp	.L16
	.p2align 4,,7
.L26:
	addl	$4, %eax
	.p2align 4,,5
	jmp	.L14
	.p2align 4,,7
.L25:
	addl	$4, %eax
	.p2align 4,,5
	jmp	.L12
	.p2align 4,,7
.L24:
	addl	$4, %eax
	.p2align 4,,5
	jmp	.L10
	.p2align 4,,7
.L23:
	addl	$3, %eax
	.p2align 4,,5
	jmp	.L8
	.p2align 4,,7
.L22:
	movl	$2, %eax
	.p2align 4,,5
	jmp	.L6
.LFE2:
	.size	_Z27auto_unrolled_knight_count8Ph, .-_Z27auto_unrolled_knight_count8Ph

----------------------- End of "nice" code ----------------------

	.align 2
	.p2align 4,,15
.globl _Z29t_auto_unrolled_knight_count9Ph
	.type	_Z29t_auto_unrolled_knight_count9Ph, @function
_Z29t_auto_unrolled_knight_count9Ph:
.LFB3:
	pushl	%ebp
.LCFI2:
	movl	%esp, %ebp
.LCFI3:
	pushl	%edi
.LCFI4:
	xorl	%edi, %edi
	pushl	%esi
.LCFI5:
	xorl	%esi, %esi
	pushl	%ebx
.LCFI6:
	subl	$8, %esp
.LCFI7:
	jmp	.L31
	.p2align 4,,7
.L32:
	incl	%esi
	movl	%esi, -20(%ebp)
	cmpb	$5, (%eax,%esi)
	je	.L64
.L52:
	incl	%esi
	cmpb	$5, (%eax,%esi)
	je	.L60
.L54:
	movl	-20(%ebp), %esi
	addl	$2, %esi
	cmpl	$9, %esi
	je	.L65
.L31:
	movl	8(%ebp), %eax
	cmpb	$5, (%eax,%esi)
	jne	.L32
	movl	%esi, %eax
	cltd
	shrl	$29, %edx
	leal	(%esi,%edx), %ecx
	andl	$7, %ecx
	subl	%edx, %ecx
	cmpl	$1, %ecx
	setg	-15(%ebp)
	cmpl	$7, %esi
	movzbl	-15(%ebp), %edx
	setg	%bl
	andb	%bl, %dl
	cmpb	$1, %dl
	sbbl	$-1, %edi
	testl	%ecx, %ecx
	setg	-14(%ebp)
	cmpl	$15, %esi
	movzbl	-14(%ebp), %edx
	setg	%al
	andb	%al, %dl
	cmpb	$1, %dl
	sbbl	$-1, %edi
	cmpl	$5, %ecx
	setle	-13(%ebp)
	andb	-13(%ebp), %bl
	cmpb	$1, %bl
	sbbl	$-1, %edi
	cmpl	$6, %ecx
	setle	%bl
	andb	%bl, %al
	cmpb	$1, %al
	movl	8(%ebp), %eax
	sbbl	$-1, %edi
	cmpl	$55, %esi
	setle	%cl
	andb	%cl, -15(%ebp)
	cmpb	$1, -15(%ebp)
	sbbl	$-1, %edi
	cmpl	$47, %esi
	setle	%dl
	andb	%dl, -14(%ebp)
	cmpb	$1, -14(%ebp)
	sbbl	$-1, %edi
	andb	%cl, -13(%ebp)
	cmpb	$1, -13(%ebp)
	sbbl	$-1, %edi
	andb	%dl, %bl
	cmpb	$1, %bl
	sbbl	$-1, %edi
	incl	%esi
	movl	%esi, -20(%ebp)
	cmpb	$5, (%eax,%esi)
	jne	.L52
.L64:
	movl	%esi, %eax
	cltd
	shrl	$29, %edx
	leal	(%esi,%edx), %ecx
	andl	$7, %ecx
	subl	%edx, %ecx
	cmpl	$1, %ecx
	setg	-15(%ebp)
	cmpl	$7, %esi
	movzbl	-15(%ebp), %edx
	setg	%bl
	andb	%bl, %dl
	cmpb	$1, %dl
	sbbl	$-1, %edi
	testl	%ecx, %ecx
	setg	-14(%ebp)
	cmpl	$15, %esi
	movzbl	-14(%ebp), %edx
	setg	%al
	andb	%al, %dl
	cmpb	$1, %dl
	sbbl	$-1, %edi
	cmpl	$5, %ecx
	setle	-13(%ebp)
	andb	-13(%ebp), %bl
	cmpb	$1, %bl
	sbbl	$-1, %edi
	cmpl	$6, %ecx
	setle	%bl
	andb	%bl, %al
	cmpb	$1, %al
	movl	8(%ebp), %eax
	sbbl	$-1, %edi
	cmpl	$55, %esi
	setle	%cl
	andb	%cl, -15(%ebp)
	cmpb	$1, -15(%ebp)
	sbbl	$-1, %edi
	cmpl	$47, %esi
	setle	%dl
	andb	%dl, -14(%ebp)
	cmpb	$1, -14(%ebp)
	sbbl	$-1, %edi
	andb	%cl, -13(%ebp)
	cmpb	$1, -13(%ebp)
	sbbl	$-1, %edi
	andb	%dl, %bl
	cmpb	$1, %bl
	sbbl	$-1, %edi
	incl	%esi
	cmpb	$5, (%eax,%esi)
	jne	.L54
.L60:
	movl	%esi, %eax
	cltd
	shrl	$29, %edx
	leal	(%esi,%edx), %ecx
	andl	$7, %ecx
	subl	%edx, %ecx
	cmpl	$1, %ecx
	setg	-15(%ebp)
	cmpl	$7, %esi
	movzbl	-15(%ebp), %edx
	setg	%bl
	andb	%bl, %dl
	cmpb	$1, %dl
	sbbl	$-1, %edi
	testl	%ecx, %ecx
	setg	-14(%ebp)
	cmpl	$15, %esi
	movzbl	-14(%ebp), %edx
	setg	%al
	andb	%al, %dl
	cmpb	$1, %dl
	sbbl	$-1, %edi
	cmpl	$5, %ecx
	setle	-13(%ebp)
	andb	-13(%ebp), %bl
	cmpb	$1, %bl
	sbbl	$-1, %edi
	cmpl	$6, %ecx
	setle	%bl
	andb	%bl, %al
	cmpb	$1, %al
	sbbl	$-1, %edi
	cmpl	$55, %esi
	setle	%cl
	andb	%cl, -15(%ebp)
	cmpb	$1, -15(%ebp)
	sbbl	$-1, %edi
	cmpl	$47, %esi
	movl	-20(%ebp), %esi
	setle	%dl
	andb	%dl, -14(%ebp)
	cmpb	$1, -14(%ebp)
	sbbl	$-1, %edi
	andb	%cl, -13(%ebp)
	cmpb	$1, -13(%ebp)
	sbbl	$-1, %edi
	andb	%dl, %bl
	cmpb	$1, %bl
	sbbl	$-1, %edi
	addl	$2, %esi
	cmpl	$9, %esi
	jne	.L31
.L65:
	addl	$8, %esp
	movl	%edi, %eax
	popl	%ebx
	popl	%esi
	popl	%edi
	popl	%ebp
	ret
.LFE3:
	.size	_Z29t_auto_unrolled_knight_count9Ph, .-_Z29t_auto_unrolled_knight_count9Ph
	.ident	"GCC: (GNU) 4.0.0"
	.section	.note.GNU-stack,"",@progbits
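
Until the compiler does this on its own, one possible source-level workaround is to make the square index a template parameter, so that every copy of the body sees it as a compile-time constant. This is only a sketch under stated assumptions: knight_moves_if_present, KnightUnroll and template_unrolled_knight_count9 are hypothetical names, and whether a given compiler version inlines and folds everything is not guaranteed.

```cpp
#define WHITE_KNIGHT 5

// One "iteration" with the square index BP known at compile time, so each
// of the eight tests below is a constant expression.
template <int BP>
inline int knight_moves_if_present(const unsigned char* board)
{
  if (board[BP] != WHITE_KNIGHT) return 0;
  int count = 0;
  if (BP%8>1 && BP/8>0) count++;
  if (BP%8>0 && BP/8>1) count++;
  if (BP%8<6 && BP/8>0) count++;
  if (BP%8<7 && BP/8>1) count++;
  if (BP%8>1 && BP/8<7) count++;
  if (BP%8>0 && BP/8<6) count++;
  if (BP%8<6 && BP/8<7) count++;
  if (BP%8<7 && BP/8<6) count++;
  return count;
}

// Recursive replacement for: for (int bp = BP; bp < END; ++bp)
template <int BP, int END>
struct KnightUnroll {
  static int run(const unsigned char* board)
  {
    return knight_moves_if_present<BP>(board)
         + KnightUnroll<BP + 1, END>::run(board);
  }
};

// End of recursion: an empty range contributes nothing.
template <int END>
struct KnightUnroll<END, END> {
  static int run(const unsigned char*) { return 0; }
};

// Counterpart of t_auto_unrolled_knight_count9 (squares 0..8).
int template_unrolled_knight_count9(const unsigned char* board)
{
  return KnightUnroll<0, 9>::run(board);
}
```

Changing the 9 to 64 covers the whole board, at the cost of 64 instantiations.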

I hope you will confirm the problem (so it can be solved). It would really
improve gcc.

Regards Thorbjørn




* [Bug rtl-optimization/21827] unroll misses simple elimination - works with manual unroll
From: pinskia at gcc dot gnu dot org @ 2005-07-21 18:08 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-07-21 18:07 -------
Confirmed.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2005-07-21 18:07:14
               date|                            |

