public inbox for gcc-bugs@sourceware.org
* [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
@ 2004-11-11  2:35 schlie at comcast dot net
  2004-11-11  2:48 ` [Bug middle-end/18424] " pinskia at gcc dot gnu dot org
                   ` (31 more replies)
  0 siblings, 32 replies; 33+ messages in thread
From: schlie at comcast dot net @ 2004-11-11  2:35 UTC (permalink / raw)
  To: gcc-bugs

3.4.3 generates code which may be ~6x+ slower and larger than 3.3.1 at -Os,
as it apparently no longer evaluates constant expression trees in anything
other than simple expressions. This may result in serious performance and
code size regressions, which should really be fixed if possible.

// 000000c6 <foo>:
int foo ( int a ){
	if (a & (1L << 23))
//  c6:	aa 27       	eor	r26, r26
//  c8:	97 fd       	sbrc	r25, 7
//  ca:	a0 95       	com	r26
//  cc:	ba 2f       	mov	r27, r26
//  ce:	27 e1       	ldi	r18, 0x17	; 23
//  d0:	b6 95       	lsr	r27
//  d2:	a7 95       	ror	r26
//  d4:	97 95       	ror	r25
//  d6:	87 95       	ror	r24
//  d8:	2a 95       	dec	r18
//  da:	d1 f7       	brne	.-12     	; 0xd0
//  dc:	81 70       	andi	r24, 0x01	; 1
//  de:	90 70       	andi	r25, 0x00	; 0
//  e0:	89 2b       	or	r24, r25
//  e2:	19 f0       	breq	.+6      	; 0xea
		return 1; 
//  e4:	81 e0       	ldi	r24, 0x01	; 1
//  e6:	90 e0       	ldi	r25, 0x00	; 0
//  e8:	08 95       	ret
	  else
		return 2 ;
//  ea:	82 e0       	ldi	r24, 0x02	; 2
//  ec:	90 e0       	ldi	r25, 0x00	; 0
}
//  ee:	08 95       	ret
//  f0:	08 95       	ret
// where the second return is odd as well?

vs GCC 3.3.1 @ -Os

// 000000c6 <foo>:
int foo2 ( int a ){
	if (a & (1L << 23))
		return 1; 
	  else
		return 2 ;
}
//  c6:	82 e0       	ldi	r24, 0x02	; 2
//  c8:	90 e0       	ldi	r25, 0x00	; 0
//  ca:	08 95       	ret

(where, for reference, int on the avr target is 16 bits wide,
 but the above problem is not likely target sensitive.)

-- 
           Summary: 3.4.3 ~6x+ performance regression vs 3.3.1, constant
                    trees not being computed.
           Product: gcc
           Version: 3.4.3
            Status: UNCONFIRMED
          Severity: critical
          Priority: P1
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: schlie at comcast dot net
                CC: dmixm at marine dot febras dot ru,ericw at evcohs dot
                    com,gcc-bugs at gcc dot gnu dot org
 GCC build triplet: any
  GCC host triplet: any
GCC target triplet: avr-unknown-none


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
@ 2004-11-11  2:48 ` pinskia at gcc dot gnu dot org
  2004-11-11  2:49 ` pinskia at gcc dot gnu dot org
                   ` (30 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-11-11  2:48 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-11 02:48 -------
Here is an example for PPC:
int foo2 ( short a ){
        if (a & (1 << 23))
                return 1;
          else
                return 2 ;
}

but it is not a regression on PPC with the above example

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|critical                    |minor
          Component|c                           |middle-end
           Keywords|                            |missed-optimization


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
  2004-11-11  2:48 ` [Bug middle-end/18424] " pinskia at gcc dot gnu dot org
@ 2004-11-11  2:49 ` pinskia at gcc dot gnu dot org
  2004-11-11  2:52 ` pinskia at gcc dot gnu dot org
                   ` (29 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-11-11  2:49 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-11 02:49 -------
I almost think the size of long changed to 32 bits for avr in 3.4.0.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
  2004-11-11  2:48 ` [Bug middle-end/18424] " pinskia at gcc dot gnu dot org
  2004-11-11  2:49 ` pinskia at gcc dot gnu dot org
@ 2004-11-11  2:52 ` pinskia at gcc dot gnu dot org
  2004-11-11  3:15 ` schlie at comcast dot net
                   ` (28 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-11-11  2:52 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-11 02:52 -------
or the default for -mint8 changed.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (2 preceding siblings ...)
  2004-11-11  2:52 ` pinskia at gcc dot gnu dot org
@ 2004-11-11  3:15 ` schlie at comcast dot net
  2004-11-11  3:18 ` pinskia at gcc dot gnu dot org
                   ` (27 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: schlie at comcast dot net @ 2004-11-11  3:15 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From schlie at comcast dot net  2004-11-11 03:15 -------
Subject: Re:  3.4.3 ~6x+ performance regression vs
 3.3.1, constant trees not being computed.

Yes, but regardless of the size of the constant 1:

if (a & (1L << 24)) or (a & (1 << 24)) yields the same results,
and further, if a simpler expression containing the same constant
sub-expression is used, it properly computes the constant expression:

int foo (int a) {

  a = a + (1 << 24);       // apparently simple enough to compute (1 << 24)

  if (a & (1 << 24))      // then utilized here, which yields 0
      return 1;
    else
      return 2;
}

=> return 2;

This implies that it is not a back-end issue, but that (1 << 24)
is not being calculated as a constant expression if nested within
a more complex expression for some reason, I believe?

The PPC may have optimized it away in other ways, as it has
a more powerful shift instruction, but this should not be a back-end thing,
as the embedded constant expression was properly computed and applied
in 3.3.x.

> From: pinskia at gcc dot gnu dot org <gcc-bugzilla@gcc.gnu.org>
> Reply-To: <gcc-bugzilla@gcc.gnu.org>
> Date: 11 Nov 2004 02:49:28 -0000
> To: <schlie@comcast.net>
> Subject: [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1,
> constant trees not being computed.
> 
> 
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-11
> 02:49 -------
> I almost think the size of long changed for 3.4.0 for avr to 32bits.
> 
> -- 
> 
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424
> 




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (3 preceding siblings ...)
  2004-11-11  3:15 ` schlie at comcast dot net
@ 2004-11-11  3:18 ` pinskia at gcc dot gnu dot org
  2004-11-11  3:56 ` schlie at comcast dot net
                   ` (26 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-11-11  3:18 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-11 03:17 -------
When I did 1 << 24 I got a warning (at least on the mainline on a cross to avr) about 24 being greater 
than the size of int, so it was going to be 0.  Again, in real terms there is something going on here, but I really 
doubt it is a regression and you did not use -mint8 for the 3.3 build and not for the 3.4 build.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (4 preceding siblings ...)
  2004-11-11  3:18 ` pinskia at gcc dot gnu dot org
@ 2004-11-11  3:56 ` schlie at comcast dot net
  2004-11-11  4:33 ` schlie at comcast dot net
                   ` (25 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: schlie at comcast dot net @ 2004-11-11  3:56 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From schlie at comcast dot net  2004-11-11 03:55 -------
Subject: Re:  3.4.3 ~6x+ performance regression vs
 3.3.1, constant trees not being computed.

> pinskia at gcc dot gnu dot org <gcc-bugzilla@gcc.gnu.org> wrote:
> When I did 1 << 24 I got a warning (at least on the mainline on a cross to
> avr) about 24 being greater
> than the size of int so it was going to be 0.  Again in real terms there is
> something on here but I really
> doubt it is a regression and you did not use -mint8 for the 3.3 build and not
> for the 3.4 build.

Yes, you're correct, (1 << 24) generates warnings (not forcing -mint8):

main.c: In function `foo1':
main.c:3: warning: left shift count >= width of type
main.c: In function `foo2':
main.c:12: warning: left shift count >= width of type
main.c: At top level:
main.c:21: warning: return type of 'main' is not `int'

and produces the anticipated correct code, sorry for the confusion.

---

However, as reported, with (1L << 24) it does not. It should not yield
different results, but it appears to produce the expected code only if
the same constant expression occurs in a simpler expression in the same
basic block?

File: main.lss:

main.elf:     file format elf32-avr

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .data         00000000  00800100  0000011e  000001b2  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  1 .text         0000011e  00000000  00000000  00000094  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .bss          00000000  00800100  0000011e  000001b2  2**0
                  ALLOC
  3 .noinit       00000000  00800100  00800100  000001b2  2**0
                  CONTENTS
  4 .eeprom       00000000  00810000  00810000  000001b2  2**0
                  CONTENTS
  5 .stab         000004c8  00000000  00000000  000001b4  2**2
                  CONTENTS, READONLY, DEBUGGING
  6 .stabstr      0000044e  00000000  00000000  0000067c  2**0
                  CONTENTS, READONLY, DEBUGGING
Disassembly of section .text:

00000000 <__vectors>:
   0:    0c 94 46 00     jmp    0x8c
   4:    0c 94 61 00     jmp    0xc2
   8:    0c 94 61 00     jmp    0xc2
   c:    0c 94 61 00     jmp    0xc2
  10:    0c 94 61 00     jmp    0xc2
  14:    0c 94 61 00     jmp    0xc2
  18:    0c 94 61 00     jmp    0xc2
  1c:    0c 94 61 00     jmp    0xc2
  20:    0c 94 61 00     jmp    0xc2
  24:    0c 94 61 00     jmp    0xc2
  28:    0c 94 61 00     jmp    0xc2
  2c:    0c 94 61 00     jmp    0xc2
  30:    0c 94 61 00     jmp    0xc2
  34:    0c 94 61 00     jmp    0xc2
  38:    0c 94 61 00     jmp    0xc2
  3c:    0c 94 61 00     jmp    0xc2
  40:    0c 94 61 00     jmp    0xc2
  44:    0c 94 61 00     jmp    0xc2
  48:    0c 94 61 00     jmp    0xc2
  4c:    0c 94 61 00     jmp    0xc2
  50:    0c 94 61 00     jmp    0xc2
  54:    0c 94 61 00     jmp    0xc2
  58:    0c 94 61 00     jmp    0xc2
  5c:    0c 94 61 00     jmp    0xc2
  60:    0c 94 61 00     jmp    0xc2
  64:    0c 94 61 00     jmp    0xc2
  68:    0c 94 61 00     jmp    0xc2
  6c:    0c 94 61 00     jmp    0xc2
  70:    0c 94 61 00     jmp    0xc2
  74:    0c 94 61 00     jmp    0xc2
  78:    0c 94 61 00     jmp    0xc2
  7c:    0c 94 61 00     jmp    0xc2
  80:    0c 94 61 00     jmp    0xc2
  84:    0c 94 61 00     jmp    0xc2
  88:    0c 94 61 00     jmp    0xc2

0000008c <__ctors_end>:
  8c:    11 24           eor    r1, r1
  8e:    1f be           out    0x3f, r1    ; 63
  90:    cf ef           ldi    r28, 0xFF    ; 255
  92:    d0 e1           ldi    r29, 0x10    ; 16
  94:    de bf           out    0x3e, r29    ; 62
  96:    cd bf           out    0x3d, r28    ; 61

00000098 <__do_copy_data>:
  98:    11 e0           ldi    r17, 0x01    ; 1
  9a:    a0 e0           ldi    r26, 0x00    ; 0
  9c:    b1 e0           ldi    r27, 0x01    ; 1
  9e:    ee e1           ldi    r30, 0x1E    ; 30
  a0:    f1 e0           ldi    r31, 0x01    ; 1
  a2:    02 c0           rjmp    .+4          ; 0xa8

000000a4 <.do_copy_data_loop>:
  a4:    05 90           lpm    r0, Z+
  a6:    0d 92           st    X+, r0

000000a8 <.do_copy_data_start>:
  a8:    a0 30           cpi    r26, 0x00    ; 0
  aa:    b1 07           cpc    r27, r17
  ac:    d9 f7           brne    .-10         ; 0xa4

000000ae <__do_clear_bss>:
  ae:    11 e0           ldi    r17, 0x01    ; 1
  b0:    a0 e0           ldi    r26, 0x00    ; 0
  b2:    b1 e0           ldi    r27, 0x01    ; 1
  b4:    01 c0           rjmp    .+2          ; 0xb8

000000b6 <.do_clear_bss_loop>:
  b6:    1d 92           st    X+, r1

000000b8 <.do_clear_bss_start>:
  b8:    a0 30           cpi    r26, 0x00    ; 0
  ba:    b1 07           cpc    r27, r17
  bc:    e1 f7           brne    .-8          ; 0xb6
  be:    0c 94 7c 00     jmp    0xf8

000000c2 <__bad_interrupt>:
  c2:    0c 94 00 00     jmp    0x0

000000c6 <foo1>:
int foo1 ( int a ){



    if (a & (1L << 23))

  c6:    aa 27           eor    r26, r26
  c8:    97 fd           sbrc    r25, 7
  ca:    a0 95           com    r26
  cc:    ba 2f           mov    r27, r26
  ce:    27 e1           ldi    r18, 0x17    ; 23
  d0:    b6 95           lsr    r27
  d2:    a7 95           ror    r26
  d4:    97 95           ror    r25
  d6:    87 95           ror    r24
  d8:    2a 95           dec    r18
  da:    d1 f7           brne    .-12         ; 0xd0
  dc:    81 70           andi    r24, 0x01    ; 1
  de:    90 70           andi    r25, 0x00    ; 0
  e0:    89 2b           or    r24, r25
  e2:    19 f0           breq    .+6          ; 0xea
        return 1; 

  e4:    81 e0           ldi    r24, 0x01    ; 1
  e6:    90 e0           ldi    r25, 0x00    ; 0
  e8:    08 95           ret
      else

        return 2 ;

  ea:    82 e0           ldi    r24, 0x02    ; 2
  ec:    90 e0           ldi    r25, 0x00    ; 0
        

}

  ee:    08 95           ret
  f0:    08 95           ret

000000f2 <foo2>:


int foo2 ( int a ){



    a = (a & (1L << 23));



    if (a & (1L << 23))

        return 1; 

      else

        return 2 ;

        

}

  f2:    82 e0           ldi    r24, 0x02    ; 2
  f4:    90 e0           ldi    r25, 0x00    ; 0
  f6:    08 95           ret

000000f8 <main>:


void main( void ){

  f8:    cd ef           ldi    r28, 0xFD    ; 253
  fa:    d0 e1           ldi    r29, 0x10    ; 16
  fc:    de bf           out    0x3e, r29    ; 62
  fe:    cd bf           out    0x3d, r28    ; 61
    

    volatile int a;

    

    a = foo1 ( a );

 100:    89 81           ldd    r24, Y+1    ; 0x01
 102:    9a 81           ldd    r25, Y+2    ; 0x02
 104:    0e 94 63 00     call    0xc6
 108:    89 83           std    Y+1, r24    ; 0x01
 10a:    9a 83           std    Y+2, r25    ; 0x02
    a = foo2 ( a );

 10c:    89 81           ldd    r24, Y+1    ; 0x01
 10e:    9a 81           ldd    r25, Y+2    ; 0x02
 110:    0e 94 79 00     call    0xf2
 114:    89 83           std    Y+1, r24    ; 0x01
 116:    9a 83           std    Y+2, r25    ; 0x02
 118:    0c 94 8e 00     jmp    0x11c

0000011c <_exit>:
 11c:    ff cf           rjmp    .-2          ; 0x11c




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (5 preceding siblings ...)
  2004-11-11  3:56 ` schlie at comcast dot net
@ 2004-11-11  4:33 ` schlie at comcast dot net
  2004-11-11  4:41 ` pinskia at gcc dot gnu dot org
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: schlie at comcast dot net @ 2004-11-11  4:33 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From schlie at comcast dot net  2004-11-11 04:33 -------
Subject: Re:  3.4.3 ~6x+ performance regression vs.
 3.3.1, constant trees not being computed.

As implied earlier, this problem may be related to (in my words) an overly
complicated and error-prone process of having to demote operations and
operands after their often needless initial promotion, which is done before
determining whether it is needed.

Longer term this should really be reconsidered, as it is not required to
literally follow C's documented evaluation semantics to yield logically
equivalent, optimal results. Operands need only be promoted if their
operation requires it, as determined by its target's need. Expression trees
could also be simplified if integer nodes were tagged with their sign in
addition to their size, rather than requiring a cast node to be added to the
tree, which would likely require less tree memory, fewer collections, and
simpler rtl target instruction descriptions.

The reason that PPC may not see the problem is that the ppc is both an
int-wide machine and has a multi-bit shift instruction, so the shift may be
optimized away through a different mechanism than the one failing in this
circumstance.

But regardless, the problem exposed in this circumstance is a regression
from whatever middle-end mechanism enabled the proper evaluation of the
constant expression in 3.3.1, which enabled the static analysis of the
logical result of the expression at hand; it therefore shouldn't be
considered a target-specific problem. (Something broke in 3.3 -> 3.4, which
isn't insignificant for less-than-int-wide targets without instruction sets
similar to ppc.)

Thanks again for time and hopeful consideration, -paul-

> From: Paul Schlie <schlie@comcast.net>
> Date: Wed, 10 Nov 2004 22:55:42 -0500
> To: <gcc-bugzilla@gcc.gnu.org>
> Subject: Re: [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs
> 3.3.1, constant trees not being computed.
> 
>> pinskia at gcc dot gnu dot org <gcc-bugzilla@gcc.gnu.org> wrote:
>> When I did 1 << 24 I got a warning (at least on the mainline on a cross to
>> avr) about 24 being greater
>> than the size of int so it was going to be 0.  Again in real terms there is
>> something on here but I really
>> doubt it is a regression and you did not use -mint8 for the 3.3 build and not
>> for the 3.4 build.
> 
> Yes, you're correct, (1 << 24) generates warnings (not forcing -mint8):
> 
> main.c: In function `foo1':
> main.c:3: warning: left shift count >= width of type
> main.c: In function `foo2':
> main.c:12: warning: left shift count >= width of type
> main.c: At top level:
> main.c:21: warning: return type of 'main' is not `int'
> 
> and produces the anticipated correct code, sorry for the confusion.
> 
> ---
> 
> However, as reported, with (1L << 24) it does not. It should not yield
> different results, but it appears to produce the expected code only if
> the same constant expression occurs in a simpler expression in the same
> basic block?




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (6 preceding siblings ...)
  2004-11-11  4:33 ` schlie at comcast dot net
@ 2004-11-11  4:41 ` pinskia at gcc dot gnu dot org
  2004-11-11  4:59 ` schlie at comcast dot net
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-11-11  4:41 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-11 04:41 -------
Actually what you said is not true for this testcase as you have int & long and not int & int.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (7 preceding siblings ...)
  2004-11-11  4:41 ` pinskia at gcc dot gnu dot org
@ 2004-11-11  4:59 ` schlie at comcast dot net
  2004-11-11  5:04 ` pinskia at gcc dot gnu dot org
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: schlie at comcast dot net @ 2004-11-11  4:59 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From schlie at comcast dot net  2004-11-11 04:59 -------
Subject: Re:  3.4.3 ~6x+ performance regression vs
 3.3.1, constant trees not being computed.

> From: pinskia at gcc dot gnu dot org <gcc-bugzilla@gcc.gnu.org>
> 
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-11
> 04:41 -------
> Actually what you said is not true for this testcase as you have int & long
> and not int & int.

Sorry, I don't understand. It's fairly apparent to me, and apparently to 3.3
and to 3.4 (once it actually does compute (1L << 23) in an earlier
sub-expression), that:

 <16-bit-wide variable> = (<16-bit-wide variable> & 0x00800000) = 0

???

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424
> 




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (8 preceding siblings ...)
  2004-11-11  4:59 ` schlie at comcast dot net
@ 2004-11-11  5:04 ` pinskia at gcc dot gnu dot org
  2004-11-11 15:52 ` schlie at comcast dot net
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-11-11  5:04 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-11 05:04 -------
int_val & long_val == (long)(int_val) & long_val, by what I had quoted in the other bug where you were 
talking about this.

Also, that simplification comes from combine and knowing that ((int_val & long_val) >> (log2 
long_val)) & 1 (where long_val > MAX_INT) is false (nothing else).  The problem on PPC and this one is 
the same.  Anyways, as I had said before, this is not a regression. 
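
The conversion rule quoted above can be modeled on a host compiler. This is
only a minimal sketch: it assumes int16_t and int32_t stand in for avr's
16-bit int and 32-bit long, and the function names are hypothetical. Under
this model the narrower operand is widened with its value preserved, so a
negative argument is sign-extended and bit 23 of the widened value is set.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model (an assumption for illustration): int16_t plays the role
   of avr's 16-bit int, int32_t the role of its 32-bit long. */
int test_bit23(int16_t a)
{
    /* The narrower operand is converted to the wider type first, keeping
       its value, so negative values are sign-extended before the AND. */
    int32_t widened = (int32_t)a;
    return (widened & (INT32_C(1) << 23)) ? 1 : 2;
}

/* Spot checks of the model at the edges of the 16-bit range. */
void check_conversion(void)
{
    assert(test_bit23(0) == 2);
    assert(test_bit23(INT16_MAX) == 2);
    assert(test_bit23(-1) == 1);        /* sign extension sets bit 23 */
    assert(test_bit23(INT16_MIN) == 1);
}
```

Calling check_conversion() from a test harness exercises both the
non-negative and the negative cases.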

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|minor                       |enhancement
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2004-11-11 05:04:34
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (9 preceding siblings ...)
  2004-11-11  5:04 ` pinskia at gcc dot gnu dot org
@ 2004-11-11 15:52 ` schlie at comcast dot net
  2004-11-11 16:22 ` joseph at codesourcery dot com
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: schlie at comcast dot net @ 2004-11-11 15:52 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From schlie at comcast dot net  2004-11-11 15:51 -------
Subject: Re:  3.4.3 ~6x+ performance regression vs.
 3.3.1, constant trees not being computed.

> From: pinskia at gcc dot gnu dot org <gcc-bugzilla@gcc.gnu.org>
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-11
> 05:04 -------
> int_val & long_val == (long)(int_val) & long_val by what I had quoted in the
> other bug which you were
> talking about this.
> 
> Also, that simplification comes from combine and knowning that ((int_val &
> long_val) >> (log2
> long_val)) & 1 (where long_val > MAX_INT) is false (nothing else).  The
> problem on PPC and this one is
> the same.  Anyways as I had said before this is not a regression.

For what it's worth, comparable code for ppc (as << 23 is within int range):

int foo ( int a ) {

  if (a & (1L << 55))
      return 1;
    else
      return 2;

}

Which I suspect will show a difference, although due to ppc's multi-bit
shift capabilities, its literal representation may be detectable during
instruction-generation/peep-hole-optimization, but not on machines where
multi-bit shifts are not as compactly represented, which therefore rely on
earlier recognition and optimization as appears to be present in 3.3.4.

Regardless of ppc relative code quality, 3.4.3 is generating substantially
slower & larger code than 3.3.4 for avr with no corresponding target machine
description changes; I would therefore think that 3.4.3 is clearly suffering
from an avr target code performance/quality regression, yes?

As another less dramatic example:

int foo ( long a ) {

  if (a & (1L << 23)) // where again (1L << 55) for ppc
      return 1;
    else
      return 2;

}

This also demonstrates 3.4.3's generation of inferior avr code relative to
3.3.4, with no corresponding changes to its target description; this
seems to be due to the constant expression (1L << 23) not being computed if
nested within a more complex expression, which 3.3.4 was capable of
detecting, but 3.4.3 isn't.






-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (10 preceding siblings ...)
  2004-11-11 15:52 ` schlie at comcast dot net
@ 2004-11-11 16:22 ` joseph at codesourcery dot com
  2004-11-11 16:30 ` ericw at evcohs dot com
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: joseph at codesourcery dot com @ 2004-11-11 16:22 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From joseph at codesourcery dot com  2004-11-11 16:22 -------
Subject: Re:  3.4.3 ~6x+ performance regression vs
 3.3.1, constant trees not being computed.

Have you actually tried compiling code identical to that you test but with 
8388608L in place of (1L << 23) before making claims about what is done 
with constant expressions?

Your example may suggest a regression, provided no type sizes changed for 
your target between the versions compared, but you really shouldn't report 
conjectures about the cause of a bug without clear evidence to 
substantiate them, which in this case would involve substituting the value 
of the constant expression in the testcase.
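
The substitution suggested above might look like the following pair. This is
only a sketch: the names foo_shift and foo_literal are hypothetical, and the
intent is to compile both with the same compiler version and flags and then
compare the generated assembly. If both functions compile to the same code,
the evaluation of (1L << 23) itself is not what differs between versions.

```c
#include <assert.h>

/* The reported test case, with the constant written as a shift. */
int foo_shift(int a)
{
    if (a & (1L << 23))
        return 1;
    else
        return 2;
}

/* The same test with the value of the constant expression substituted. */
int foo_literal(int a)
{
    if (a & 8388608L)
        return 1;
    else
        return 2;
}

/* Run-time sanity check: the two spellings must agree for every input
   that fits in a 16-bit int, whatever code the compiler generates. */
void check_same_result(void)
{
    long i;
    for (i = -32767; i <= 32767; i++)
        assert(foo_shift((int)i) == foo_literal((int)i));
}
```

The run-time check only confirms that the two spellings mean the same thing;
the comparison that matters for this report is between the two assembly
listings.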



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (11 preceding siblings ...)
  2004-11-11 16:22 ` joseph at codesourcery dot com
@ 2004-11-11 16:30 ` ericw at evcohs dot com
  2004-11-11 17:19 ` schlie at comcast dot net
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: ericw at evcohs dot com @ 2004-11-11 16:30 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From ericw at evcohs dot com  2004-11-11 16:29 -------
Subject: Re:  3.4.3 ~6x+ performance regression vs 3.3.1,
 constant trees not being computed.

pinskia at gcc dot gnu dot org wrote:

>------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-11 02:52 -------
>or the default for -mint8 changed.
>
>  
>
The size of long has always stayed at 32 bits.

The default sizes for -mint8 *might* have changed. I know that Svein 
Seldal corrected a problem with the size of long with -mint8, but I 
thought that was on HEAD, i.e. for 4.0.0, not necessarily on 3.4.3. But 
I could be wrong. For reference:
Normal:
char = 8 bits
int = 16 bits
long = 32 bits

With -mint8 (on 4.0.0 IIRC)
char = 8 bits
int = 8 bits
long = 16 bits
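
To confirm which sizes a given build actually uses, a small check along
these lines could be compiled for the target. It is a sketch; the helper
names are hypothetical, and it reports storage widths via sizeof and
CHAR_BIT. The assertions encode the ISO C minimum widths, which the -mint8
sizes listed above would not meet.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Storage width in bits of a type, as seen by whichever compiler and
   options build this file. */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)

size_t char_bits(void) { return BITS_OF(char); }
size_t int_bits(void)  { return BITS_OF(int); }
size_t long_bits(void) { return BITS_OF(long); }

/* ISO C requires at least 8, 16 and 32 bits for char, int and long. */
void check_minimum_widths(void)
{
    assert(char_bits() >= 8);
    assert(int_bits() >= 16);
    assert(long_bits() >= 32);
}
```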

Eric


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (12 preceding siblings ...)
  2004-11-11 16:30 ` ericw at evcohs dot com
@ 2004-11-11 17:19 ` schlie at comcast dot net
  2004-11-11 20:29 ` schlie at comcast dot net
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: schlie at comcast dot net @ 2004-11-11 17:19 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From schlie at comcast dot net  2004-11-11 17:19 -------
Subject: Re:  3.4.3 ~6x+ performance regression vs
 3.3.1, constant trees not being computed.

> From: joseph at codesourcery dot com <gcc-bugzilla@gcc.gnu.org>
> ------- Additional Comments From joseph at codesourcery dot com  2004-11-11
> 16:22 -------
> Subject: Re:  3.4.3 ~6x+ performance regression vs
>  3.3.1, constant trees not being computed.
> 
> Have you actually tried compiling code identical to that you test but with
> 8388608L in place of (1L << 23) before making claims about what is done
> with constant expressions?
> 
> Your example may suggest a regression, provided no type sizes changed for
> your target between the versions compared, but you really shouldn't report
> conjectures about the cause of a bug without clear evidence to
> substantiate them, which in this case would involve substituting the value
> of the constant expression in the testcase.

Good point, will do with both 0x800000L and (1L << 23).




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (13 preceding siblings ...)
  2004-11-11 17:19 ` schlie at comcast dot net
@ 2004-11-11 20:29 ` schlie at comcast dot net
  2004-11-16 23:58 ` dmixm at marine dot febras dot ru
                   ` (16 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: schlie at comcast dot net @ 2004-11-11 20:29 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From schlie at comcast dot net  2004-11-11 20:28 -------
Subject: Re:  3.4.3 ~6x+ performance regression vs
 3.3.1, constant trees not being computed.

> From: joseph at codesourcery dot com <gcc-bugzilla@gcc.gnu.org>
> ------- Additional Comments From joseph at codesourcery dot com  2004-11-11
> 16:22 -------
> Subject: Re:  3.4.3 ~6x+ performance regression vs
>  3.3.1, constant trees not being computed.
> 
> Have you actually tried compiling code identical to that you test but with
> 8388608L in place of (1L << 23) before making claims about what is done
> with constant expressions?
> 
> Your example may suggest a regression, provided no type sizes changed for
> your target between the versions compared, but you really shouldn't report
> conjectures about the cause of a bug without clear evidence to
> substantiate them, which in this case would involve substituting the value
> of the constant expression in the testcase.

You were correct: the problem wasn't that 3.4.3 failed to compute the
constant expression values, but that it was oddly transforming tests against
constant values into run-time computed expressions, such that 3.4.3 converted:

(a & 0x800000L) => (((long)a >> 23) & 1), which doesn't quite seem sensible.
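
For reference, the two forms can be compared on a host machine with a minimal C sketch. It assumes the AVR's 16-bit int and 32-bit long can be modelled by int16_t and int32_t; the names test_mask, test_shift and check_mask_vs_shift are made up for illustration and appear nowhere in GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of the AVR types: int is 16 bits, long is 32 bits. */
typedef int16_t avr_int;
typedef int32_t avr_long;

/* What the source asks for: mask bit 23 of the sign-extended value. */
int test_mask(avr_int a)
{
    return ((avr_long)a & 0x800000L) != 0;
}

/* The transformed form: shift bit 23 down to bit 0, then mask with 1.
   Right-shifting a negative value is implementation-defined in C; GCC
   shifts arithmetically, and bit 23 lands in bit 0 either way. */
int test_shift(avr_int a)
{
    return (((avr_long)a >> 23) & 1) != 0;
}

/* Assert that both forms agree for every 16-bit input. */
void check_mask_vs_shift(void)
{
    for (int32_t v = INT16_MIN; v <= INT16_MAX; v++)
        assert(test_mask((avr_int)v) == test_shift((avr_int)v));
}
```

Both forms give the same answer, so the transformation is valid; the cost is in how the shift is expanded on a target with no multi-bit shift instruction.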

The following are the results for both 3.4.3 and 3.3.1, where 3.4.3 shows a
performance regression of more than 100x and a size regression of ~4x
relative to 3.3.1:

----

The source:

/*Compiling: main.c using (for the sake of argument)

avr-gcc -c -mmcu=atmega64 -I. -g   -Os -funsigned-char -funsigned-bitfields
-fpack-struct
-fshort-enums -Wall -Wstrict-prototypes -Wa,-adhlns=main.lst
-I/usr/local/avr/include
-std=gnu99 -funsafe-math-optimizations
-Wp,-M,-MP,-MT,main.o,-MF,.dep/main.o.d main.c
-o main.o 

Linking: main.elf (again for the sake of argument)
avr-gcc -mmcu=atmega64 -I. -g   -Os -funsigned-char -funsigned-bitfields
-fpack-struct
-fshort-enums -Wall -Wstrict-prototypes -Wa,-adhlns=main.o
-I/usr/local/avr/include
-std=gnu99 -funsafe-math-optimizations
-Wp,-M,-MP,-MT,main.o,-MF,.dep/main.elf.d main.o
--output main.elf -Wl,-Map=main.map,--cref    -lm

File: main.c

*/

int foo0 ( int a ){

    if (a & 0x800000L)
        return 1; 
      else
        return 2 ;
    
}

int foo1 ( int a ){

    if (a & (1L << 23))
        return 1; 
      else
        return 2 ;
    
}

int foo2 ( long a ){

    if (a & 0x800000L)
        return 1; 
      else
        return 2 ;
    
}

int foo3 ( long a ){

    if (a & (1L << 23))
        return 1; 
      else
        return 2 ;
    
}

int main( void ){
    
    volatile int a;
    
    a = foo0 ( a );
    a = foo1 ( a );
    a = foo2 ( a );
    a = foo3 ( a );
    
    return 0;
}

----

Listing for 3.4.3

main.elf:     file format elf32-avr

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .data         00000000  00800100  000001c8  0000025c  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  1 .text         000001c8  00000000  00000000  00000094  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .bss          00000000  00800100  000001c8  0000025c  2**0
                  ALLOC
  3 .noinit       00000000  00800100  00800100  0000025c  2**0
                  CONTENTS
  4 .eeprom       00000000  00810000  00810000  0000025c  2**0
                  CONTENTS
  5 .stab         000005d0  00000000  00000000  0000025c  2**2
                  CONTENTS, READONLY, DEBUGGING
  6 .stabstr      0000046e  00000000  00000000  0000082c  2**0
                  CONTENTS, READONLY, DEBUGGING
Disassembly of section .text:

00000000 <__vectors>:
   0:    0c 94 46 00     jmp    0x8c
   4:    0c 94 61 00     jmp    0xc2
   8:    0c 94 61 00     jmp    0xc2
   c:    0c 94 61 00     jmp    0xc2
  10:    0c 94 61 00     jmp    0xc2
  14:    0c 94 61 00     jmp    0xc2
  18:    0c 94 61 00     jmp    0xc2
  1c:    0c 94 61 00     jmp    0xc2
  20:    0c 94 61 00     jmp    0xc2
  24:    0c 94 61 00     jmp    0xc2
  28:    0c 94 61 00     jmp    0xc2
  2c:    0c 94 61 00     jmp    0xc2
  30:    0c 94 61 00     jmp    0xc2
  34:    0c 94 61 00     jmp    0xc2
  38:    0c 94 61 00     jmp    0xc2
  3c:    0c 94 61 00     jmp    0xc2
  40:    0c 94 61 00     jmp    0xc2
  44:    0c 94 61 00     jmp    0xc2
  48:    0c 94 61 00     jmp    0xc2
  4c:    0c 94 61 00     jmp    0xc2
  50:    0c 94 61 00     jmp    0xc2
  54:    0c 94 61 00     jmp    0xc2
  58:    0c 94 61 00     jmp    0xc2
  5c:    0c 94 61 00     jmp    0xc2
  60:    0c 94 61 00     jmp    0xc2
  64:    0c 94 61 00     jmp    0xc2
  68:    0c 94 61 00     jmp    0xc2
  6c:    0c 94 61 00     jmp    0xc2
  70:    0c 94 61 00     jmp    0xc2
  74:    0c 94 61 00     jmp    0xc2
  78:    0c 94 61 00     jmp    0xc2
  7c:    0c 94 61 00     jmp    0xc2
  80:    0c 94 61 00     jmp    0xc2
  84:    0c 94 61 00     jmp    0xc2
  88:    0c 94 61 00     jmp    0xc2

0000008c <__ctors_end>:
  8c:    11 24           eor    r1, r1
  8e:    1f be           out    0x3f, r1    ; 63
  90:    cf ef           ldi    r28, 0xFF    ; 255
  92:    d0 e1           ldi    r29, 0x10    ; 16
  94:    de bf           out    0x3e, r29    ; 62
  96:    cd bf           out    0x3d, r28    ; 61

00000098 <__do_copy_data>:
  98:    11 e0           ldi    r17, 0x01    ; 1
  9a:    a0 e0           ldi    r26, 0x00    ; 0
  9c:    b1 e0           ldi    r27, 0x01    ; 1
  9e:    e8 ec           ldi    r30, 0xC8    ; 200
  a0:    f1 e0           ldi    r31, 0x01    ; 1
  a2:    02 c0           rjmp    .+4          ; 0xa8

000000a4 <.do_copy_data_loop>:
  a4:    05 90           lpm    r0, Z+
  a6:    0d 92           st    X+, r0

000000a8 <.do_copy_data_start>:
  a8:    a0 30           cpi    r26, 0x00    ; 0
  aa:    b1 07           cpc    r27, r17
  ac:    d9 f7           brne    .-10         ; 0xa4

000000ae <__do_clear_bss>:
  ae:    11 e0           ldi    r17, 0x01    ; 1
  b0:    a0 e0           ldi    r26, 0x00    ; 0
  b2:    b1 e0           ldi    r27, 0x01    ; 1
  b4:    01 c0           rjmp    .+2          ; 0xb8

000000b6 <.do_clear_bss_loop>:
  b6:    1d 92           st    X+, r1

000000b8 <.do_clear_bss_start>:
  b8:    a0 30           cpi    r26, 0x00    ; 0
  ba:    b1 07           cpc    r27, r17
  bc:    e1 f7           brne    .-8          ; 0xb6
  be:    0c 94 b7 00     jmp    0x16e

000000c2 <__bad_interrupt>:
  c2:    0c 94 00 00     jmp    0x0

000000c6 <foo0>:
*/



int foo0 ( int a ){



    if (a & 0x800000L)

  c6:    aa 27           eor    r26, r26
  c8:    97 fd           sbrc    r25, 7
  ca:    a0 95           com    r26
  cc:    ba 2f           mov    r27, r26
  ce:    27 e1           ldi    r18, 0x17    ; 23
  d0:    b6 95           lsr    r27
  d2:    a7 95           ror    r26
  d4:    97 95           ror    r25
  d6:    87 95           ror    r24
  d8:    2a 95           dec    r18
  da:    d1 f7           brne    .-12         ; 0xd0
  dc:    81 70           andi    r24, 0x01    ; 1
  de:    90 70           andi    r25, 0x00    ; 0
  e0:    89 2b           or    r24, r25
  e2:    19 f0           breq    .+6          ; 0xea
        return 1; 

  e4:    81 e0           ldi    r24, 0x01    ; 1
  e6:    90 e0           ldi    r25, 0x00    ; 0
  e8:    08 95           ret
      else

        return 2 ;

  ea:    82 e0           ldi    r24, 0x02    ; 2
  ec:    90 e0           ldi    r25, 0x00    ; 0
    

}

  ee:    08 95           ret
  f0:    08 95           ret

000000f2 <foo1>:


int foo1 ( int a ){



    if (a & (1L << 23))

  f2:    aa 27           eor    r26, r26
  f4:    97 fd           sbrc    r25, 7
  f6:    a0 95           com    r26
  f8:    ba 2f           mov    r27, r26
  fa:    37 e1           ldi    r19, 0x17    ; 23
  fc:    b6 95           lsr    r27
  fe:    a7 95           ror    r26
 100:    97 95           ror    r25
 102:    87 95           ror    r24
 104:    3a 95           dec    r19
 106:    d1 f7           brne    .-12         ; 0xfc
 108:    81 70           andi    r24, 0x01    ; 1
 10a:    90 70           andi    r25, 0x00    ; 0
 10c:    89 2b           or    r24, r25
 10e:    19 f0           breq    .+6          ; 0x116
        return 1; 

 110:    81 e0           ldi    r24, 0x01    ; 1
 112:    90 e0           ldi    r25, 0x00    ; 0
 114:    08 95           ret
      else

        return 2 ;

 116:    82 e0           ldi    r24, 0x02    ; 2
 118:    90 e0           ldi    r25, 0x00    ; 0
    

}

 11a:    08 95           ret
 11c:    08 95           ret

0000011e <foo2>:


int foo2 ( long a ){

 11e:    dc 01           movw    r26, r24
 120:    cb 01           movw    r24, r22


    if (a & 0x800000L)

 122:    47 e1           ldi    r20, 0x17    ; 23
 124:    b6 95           lsr    r27
 126:    a7 95           ror    r26
 128:    97 95           ror    r25
 12a:    87 95           ror    r24
 12c:    4a 95           dec    r20
 12e:    d1 f7           brne    .-12         ; 0x124
 130:    81 70           andi    r24, 0x01    ; 1
 132:    90 70           andi    r25, 0x00    ; 0
 134:    89 2b           or    r24, r25
 136:    19 f0           breq    .+6          ; 0x13e
        return 1; 

 138:    81 e0           ldi    r24, 0x01    ; 1
 13a:    90 e0           ldi    r25, 0x00    ; 0
 13c:    08 95           ret
      else

        return 2 ;

 13e:    82 e0           ldi    r24, 0x02    ; 2
 140:    90 e0           ldi    r25, 0x00    ; 0
    

}

 142:    08 95           ret
 144:    08 95           ret

00000146 <foo3>:


int foo3 ( long a ){

 146:    dc 01           movw    r26, r24
 148:    cb 01           movw    r24, r22


    if (a & (1L << 23))

 14a:    57 e1           ldi    r21, 0x17    ; 23
 14c:    b6 95           lsr    r27
 14e:    a7 95           ror    r26
 150:    97 95           ror    r25
 152:    87 95           ror    r24
 154:    5a 95           dec    r21
 156:    d1 f7           brne    .-12         ; 0x14c
 158:    81 70           andi    r24, 0x01    ; 1
 15a:    90 70           andi    r25, 0x00    ; 0
 15c:    89 2b           or    r24, r25
 15e:    19 f0           breq    .+6          ; 0x166
        return 1; 

 160:    81 e0           ldi    r24, 0x01    ; 1
 162:    90 e0           ldi    r25, 0x00    ; 0
 164:    08 95           ret
      else

        return 2 ;

 166:    82 e0           ldi    r24, 0x02    ; 2
 168:    90 e0           ldi    r25, 0x00    ; 0
    

}

 16a:    08 95           ret
 16c:    08 95           ret

0000016e <main>:


int main( void ){

 16e:    cd ef           ldi    r28, 0xFD    ; 253
 170:    d0 e1           ldi    r29, 0x10    ; 16
 172:    de bf           out    0x3e, r29    ; 62
 174:    cd bf           out    0x3d, r28    ; 61
    

    volatile int a;

    

    a = foo0 ( a );

 176:    89 81           ldd    r24, Y+1    ; 0x01
 178:    9a 81           ldd    r25, Y+2    ; 0x02
 17a:    0e 94 63 00     call    0xc6
 17e:    89 83           std    Y+1, r24    ; 0x01
 180:    9a 83           std    Y+2, r25    ; 0x02
    a = foo1 ( a );

 182:    89 81           ldd    r24, Y+1    ; 0x01
 184:    9a 81           ldd    r25, Y+2    ; 0x02
 186:    0e 94 79 00     call    0xf2
 18a:    89 83           std    Y+1, r24    ; 0x01
 18c:    9a 83           std    Y+2, r25    ; 0x02
    a = foo2 ( a );

 18e:    89 81           ldd    r24, Y+1    ; 0x01
 190:    9a 81           ldd    r25, Y+2    ; 0x02
 192:    aa 27           eor    r26, r26
 194:    97 fd           sbrc    r25, 7
 196:    a0 95           com    r26
 198:    ba 2f           mov    r27, r26
 19a:    bc 01           movw    r22, r24
 19c:    cd 01           movw    r24, r26
 19e:    0e 94 8f 00     call    0x11e
 1a2:    89 83           std    Y+1, r24    ; 0x01
 1a4:    9a 83           std    Y+2, r25    ; 0x02
    a = foo3 ( a );

 1a6:    89 81           ldd    r24, Y+1    ; 0x01
 1a8:    9a 81           ldd    r25, Y+2    ; 0x02
 1aa:    aa 27           eor    r26, r26
 1ac:    97 fd           sbrc    r25, 7
 1ae:    a0 95           com    r26
 1b0:    ba 2f           mov    r27, r26
 1b2:    bc 01           movw    r22, r24
 1b4:    cd 01           movw    r24, r26
 1b6:    0e 94 a3 00     call    0x146
 1ba:    89 83           std    Y+1, r24    ; 0x01
 1bc:    9a 83           std    Y+2, r25    ; 0x02
    

    return 0;

}

 1be:    80 e0           ldi    r24, 0x00    ; 0
 1c0:    90 e0           ldi    r25, 0x00    ; 0
 1c2:    0c 94 e3 00     jmp    0x1c6

000001c6 <_exit>:
 1c6:    ff cf           rjmp    .-2          ; 0x1c6


---------

The listing with avr-gcc (GCC) 3.3.1:


   1                       .file    "main.c"
   2                       .arch atmega64
   3                   __SREG__ = 0x3f
   4                   __SP_H__ = 0x3e
   5                   __SP_L__ = 0x3d
   6                   __tmp_reg__ = 0
   7                   __zero_reg__ = 1
   8                       .global __do_copy_data
   9                       .global __do_clear_bss
  12                       .text
  13                   .Ltext0:
  38                   .global    foo0
  40                   foo0:
   1:main.c        **** /*Compiling: main.c using (for the sake of argument)
   2:main.c        ****
   3:main.c        **** avr-gcc -c -mmcu=atmega64 -I. -g   -Os -funsigned-char -funsigned-bitfields
   4:main.c        **** -fpack-struct
   5:main.c        **** -fshort-enums -Wall -Wstrict-prototypes -Wa,-adhlns=main.lst
   6:main.c        **** -I/usr/local/avr/include
   7:main.c        **** -std=gnu99 -funsafe-math-optimizations
   8:main.c        **** -Wp,-M,-MP,-MT,main.o,-MF,.dep/main.o.d main.c
   9:main.c        **** -o main.o
  10:main.c        ****
  11:main.c        **** Linking: main.elf (again for the sake of argument)
  12:main.c        **** avr-gcc -mmcu=atmega64 -I. -g   -Os -funsigned-char -funsigned-bitfields
  13:main.c        **** -fpack-struct
  14:main.c        **** -fshort-enums -Wall -Wstrict-prototypes -Wa,-adhlns=main.o
  15:main.c        **** -I/usr/local/avr/include
  16:main.c        **** -std=gnu99 -funsafe-math-optimizations
  17:main.c        **** -Wp,-M,-MP,-MT,main.o,-MF,.dep/main.elf.d main.o
  18:main.c        **** --output main.elf -Wl,-Map=main.map,--cref    -lm
  19:main.c        ****
  20:main.c        **** File: main.c
  21:main.c        ****
  22:main.c        **** */
  23:main.c        ****
  24:main.c        **** int foo0 ( int a ){
  42                   .LM1:
  43                   /* prologue: frame size=0 */
  44                   /* prologue end (size=0) */
  25:main.c        ****
  26:main.c        ****     if (a & 0x800000L)
  46                   .LM2:
  47 0000 AA27              clr r26
  48 0002 97FD              sbrc r25,7
  49 0004 A095              com r26
  50 0006 BA2F              mov r27,r26
  51 0008 A7FF              sbrs r26,7
  52 000a 03C0              rjmp .L2
  27:main.c        ****         return 1;
  54                   .LM3:
  55 000c 81E0              ldi r24,lo8(1)
  56 000e 90E0              ldi r25,hi8(1)
  28:main.c        ****       else
  29:main.c        ****         return 2 ;
  30:main.c        ****
  31:main.c        **** }
  58                   .LM4:
  59 0010 0895              ret
  60                   .L2:
  62                   .LM5:
  63 0012 82E0              ldi r24,lo8(2)
  64 0014 90E0              ldi r25,hi8(2)
  66                   .LM6:
  67 0016 0895              ret
  68                   /* epilogue: frame size=0 */
  69 0018 0895              ret
  70                   /* epilogue end (size=1) */
  71                   /* function foo0 size 13 (12) */
  73                   .Lscope0:
  77                   .global    foo1
  79                   foo1:
  32:main.c        ****
  33:main.c        **** int foo1 ( int a ){
  81                   .LM7:
  82                   /* prologue: frame size=0 */
  83                   /* prologue end (size=0) */
  34:main.c        ****
  35:main.c        ****     if (a & (1L << 23))
  85                   .LM8:
  86 001a AA27              clr r26
  87 001c 97FD              sbrc r25,7
  88 001e A095              com r26
  89 0020 BA2F              mov r27,r26
  90 0022 A7FF              sbrs r26,7
  91 0024 03C0              rjmp .L5
  36:main.c        ****         return 1;
  93                   .LM9:
  94 0026 81E0              ldi r24,lo8(1)
  95 0028 90E0              ldi r25,hi8(1)
  37:main.c        ****       else
  38:main.c        ****         return 2 ;
  39:main.c        ****
  40:main.c        **** }
  97                   .LM10:
  98 002a 0895              ret
  99                   .L5:
 101                   .LM11:
 102 002c 82E0              ldi r24,lo8(2)
 103 002e 90E0              ldi r25,hi8(2)
 105                   .LM12:
 106 0030 0895              ret
 107                   /* epilogue: frame size=0 */
 108 0032 0895              ret
 109                   /* epilogue end (size=1) */
 110                   /* function foo1 size 13 (12) */
 112                   .Lscope1:
 116                   .global    foo2
 118                   foo2:
  41:main.c        ****
  42:main.c        **** int foo2 ( long a ){
 120                   .LM13:
 121                   /* prologue: frame size=0 */
 122                   /* prologue end (size=0) */
 123 0034 DC01              movw r26,r24
 124 0036 CB01              movw r24,r22
  43:main.c        ****
  44:main.c        ****     if (a & 0x800000L)
 126                   .LM14:
 127 0038 A7FF              sbrs r26,7
 128 003a 03C0              rjmp .L8
  45:main.c        ****         return 1;
 130                   .LM15:
 131 003c 81E0              ldi r24,lo8(1)
 132 003e 90E0              ldi r25,hi8(1)
  46:main.c        ****       else
  47:main.c        ****         return 2 ;
  48:main.c        ****
  49:main.c        **** }
 134                   .LM16:
 135 0040 0895              ret
 136                   .L8:
 138                   .LM17:
 139 0042 82E0              ldi r24,lo8(2)
 140 0044 90E0              ldi r25,hi8(2)
 142                   .LM18:
 143 0046 0895              ret
 144                   /* epilogue: frame size=0 */
 145 0048 0895              ret
 146                   /* epilogue end (size=1) */
 147                   /* function foo2 size 11 (10) */
 149                   .Lscope2:
 153                   .global    foo3
 155                   foo3:
  50:main.c        ****
  51:main.c        **** int foo3 ( long a ){
 157                   .LM19:
 158                   /* prologue: frame size=0 */
 159                   /* prologue end (size=0) */
 160 004a DC01              movw r26,r24
 161 004c CB01              movw r24,r22
  52:main.c        ****
  53:main.c        ****     if (a & (1L << 23))
 163                   .LM20:
 164 004e A7FF              sbrs r26,7
 165 0050 03C0              rjmp .L11
  54:main.c        ****         return 1;
 167                   .LM21:
 168 0052 81E0              ldi r24,lo8(1)
 169 0054 90E0              ldi r25,hi8(1)
  55:main.c        ****       else
  56:main.c        ****         return 2 ;
  57:main.c        ****
  58:main.c        **** }
 171                   .LM22:
 172 0056 0895              ret
 173                   .L11:
 175                   .LM23:
 176 0058 82E0              ldi r24,lo8(2)
 177 005a 90E0              ldi r25,hi8(2)
 179                   .LM24:
 180 005c 0895              ret
 181                   /* epilogue: frame size=0 */
 182 005e 0895              ret
 183                   /* epilogue end (size=1) */
 184                   /* function foo3 size 11 (10) */
 186                   .Lscope3:
 189                   .global    main
 191                   main:
  59:main.c        ****
  60:main.c        **** int main( void ){
 193                   .LM25:
 194                   /* prologue: frame size=2 */
 195 0060 C0E0              ldi r28,lo8(__stack - 2)
 196 0062 D0E0              ldi r29,hi8(__stack - 2)
 197 0064 DEBF              out __SP_H__,r29
 198 0066 CDBF              out __SP_L__,r28
 199                   /* prologue end (size=4) */
  61:main.c        ****
  62:main.c        ****     volatile int a;
  63:main.c        ****
  64:main.c        ****     a = foo0 ( a );
 201                   .LM26:
 202                   .LBB2:
 203 0068 8981              ldd r24,Y+1
 204 006a 9A81              ldd r25,Y+2
 205 006c 0E94 0000         call foo0
 206 0070 8983              std Y+1,r24
 207 0072 9A83              std Y+2,r25
  65:main.c        ****     a = foo1 ( a );
 209                   .LM27:
 210 0074 8981              ldd r24,Y+1
 211 0076 9A81              ldd r25,Y+2
 212 0078 0E94 0000         call foo1
 213 007c 8983              std Y+1,r24
 214 007e 9A83              std Y+2,r25
  66:main.c        ****     a = foo2 ( a );
 216                   .LM28:
 217 0080 8981              ldd r24,Y+1
 218 0082 9A81              ldd r25,Y+2
 219 0084 AA27              clr r26
 220 0086 97FD              sbrc r25,7
 221 0088 A095              com r26
 222 008a BA2F              mov r27,r26
 223 008c BC01              movw r22,r24
 224 008e CD01              movw r24,r26
 225 0090 0E94 0000         call foo2
 226 0094 8983              std Y+1,r24
 227 0096 9A83              std Y+2,r25
  67:main.c        ****     a = foo3 ( a );
 229                   .LM29:
 230 0098 8981              ldd r24,Y+1
 231 009a 9A81              ldd r25,Y+2
 232 009c AA27              clr r26
 233 009e 97FD              sbrc r25,7
 234 00a0 A095              com r26
 235 00a2 BA2F              mov r27,r26
 236 00a4 BC01              movw r22,r24
 237 00a6 CD01              movw r24,r26
 238 00a8 0E94 0000         call foo3
 239 00ac 8983              std Y+1,r24
 240 00ae 9A83              std Y+2,r25
  68:main.c        ****
  69:main.c        ****     return 0;
  70:main.c        **** }
 242                   .LM30:
 243                   .LBE2:
 244 00b0 80E0              ldi r24,lo8(0)
 245 00b2 90E0              ldi r25,hi8(0)
 246                   /* epilogue: frame size=2 */
 247 00b4 0C94 0000         jmp exit
 248                   /* epilogue end (size=2) */
 249                   /* function main size 44 (38) */
 254                   .Lscope4:
 256                       .text
 258                   Letext:
 259                   /* File "main.c": code   92 = 0x005c (  82), prologues   4, epilogues   6 */
DEFINED SYMBOLS
                            *ABS*:00000000 main.c
                            *ABS*:0000003f __SREG__
                            *ABS*:0000003e __SP_H__
                            *ABS*:0000003d __SP_L__
                            *ABS*:00000000 __tmp_reg__
                            *ABS*:00000001 __zero_reg__
     /tmp/ccS6dcXA.s:40     .text:00000000 foo0
     /tmp/ccS6dcXA.s:79     .text:0000001a foo1
     /tmp/ccS6dcXA.s:118    .text:00000034 foo2
     /tmp/ccS6dcXA.s:155    .text:0000004a foo3
     /tmp/ccS6dcXA.s:191    .text:00000060 main
     /tmp/ccS6dcXA.s:258    .text:000000b8 Letext

UNDEFINED SYMBOLS
__do_copy_data
__do_clear_bss
__stack
exit







-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (14 preceding siblings ...)
  2004-11-11 20:29 ` schlie at comcast dot net
@ 2004-11-16 23:58 ` dmixm at marine dot febras dot ru
  2004-11-17  7:22 ` schlie at comcast dot net
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: dmixm at marine dot febras dot ru @ 2004-11-16 23:58 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From dmixm at marine dot febras dot ru  2004-11-16 23:58 -------
In March 2004 Richard Sandiford proposed a patch to eliminate this 
problem. See: http://gcc.gnu.org/ml/gcc/2004-03/msg01456.html  The patch 
modifies the function do_jump (in the file dojump.c). The change is now 
present on the 4.0 branch, but did not go into 3.4.x. I tried applying 
the patch to 3.4.3. It worked only after changing the line 
    if (TREE_CODE (TREE_OPERAND (exp, 0)) == RSHIFT_EXPR 
    ... 
to 
    tree arg_nops = TREE_OPERAND (exp, 0);  /* and below the same subst. */ 
    STRIP_NOPS (arg_nops); 
    if (TREE_CODE (arg_nops) == RSHIFT_EXPR 
    ... 
and then only for the function foo_i. 
 
foo_i() before patch: 
        ... 
        mov r24,r25 
        ldi r25,6 
1:      lsr r24 
        dec r25 
        brne 1b 
        sbrs r24,0 
        rjmp .L2 
 
foo_i() after patch: 
        ... 
        sbrs r25,6 
        rjmp .L8 
 
But foo_ll (a shift loop with count 62!) and foo_l are still compiled the 
old way, by shifting the left argument. 
 
Source file: 
~~~~~~~~~~~ 
int foo_ll (long long x) 
{ 
    return (x & 0x4000000000000000LL) ? 1 : 3; 
} 
 
int foo_l (long x) 
{ 
    return (x & 0x40000000) ? 5 : 7; 
} 
 
int foo_i (int x) 
{ 
    return (x & 0x4000) ? 9 : 11; 
} 
 
int foo_c (char x) 
{ 
    return (x & 0x40) ? 13 : 15; 
} 
 
P.S. The code for foo_c was, and remains, good thanks to the work done in 
`gcc/combine.c'. 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (15 preceding siblings ...)
  2004-11-16 23:58 ` dmixm at marine dot febras dot ru
@ 2004-11-17  7:22 ` schlie at comcast dot net
  2004-12-09 12:51 ` [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, " giovannibajo at libero dot it
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: schlie at comcast dot net @ 2004-11-17  7:22 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From schlie at comcast dot net  2004-11-17 07:21 -------
Subject: Re:  3.4.3 ~6x+ performance regression vs
 3.3.1

> From: dmixm at marine dot febras dot ru <gcc-bugzilla@gcc.gnu.org>
> But foo_ll (shift loop with count 62!) and foo_l have remained on old -
> through shift of the left argument.

It would seem in general that if GCC in addition to transforming:

  ((size)x & (size)<pow-of-2-value>) => (size)(x >> <log2-value>)

it also then transformed all RSHIFT expressions of the form:

  (size)(x >> <value>)
 =>
  (size-lsw)((subreg:(size-lsw) x:size lsw) >> (<value>-(word-size*lsw)))

[where lsw is the least significant pow-of-2 word which remains significant,
 i.e. lsw = 0 implies all words remain significant, lsw = 1 implies the
 precision of the result may be reduced by one word, etc.]

It would enable for a hypothetical 8-bit word-wide machine:

  (4-word-type)(X & 0x04000000)
=>
  (4-word-type)(X >> 26)
=>
  (1-word-type)((subreg:1-word-type X:4-word-type 3) >> 2)

Or for a 16-bit word wide machine: (or analogously for even wider machines)

  (2-word-type)(X & 0x04000000)
=>
  (2-word-type)(X >> 26)
=>
  (1-word-type)((subreg:1-word-type X:2-word-type 1) >> 10)

This both allows the resulting demoted-precision value to be expressed more
optimally as an argument in subsequent expressions, and allows more optimal
code generation on less-than-word-wide target machines.

Furthermore, the subreg representation of demoted expression values could
also enable further optimization of sign/zero extended values, by allowing
the detection of cases where only sign- or zero-extended bits remain
significant: 

  (4-word-type)((char)X & (long)0x04000000)
=>
  (4-word-type)((long<-char)X >> 26)
=>
  (1-word-type)((subreg:char ((long<-char)x):long 3) >> 2)
=>
  (1-word-type)((subreg:char (sign x):char 3) >> 2)
=>
  (1-word-type)(sign x)

Or something like that possibly.
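
The 8-bit and 16-bit word cases above can be sketched in plain C to show that the narrowed shift reads the same bit as the full-width one. This is only a host-side illustration: the function names are made up, the "& 1" that the notation above leaves implicit is written out, and the byte/word extraction via a shift merely stands in for what a subreg would select directly on the target.

```c
#include <assert.h>
#include <stdint.h>

/* Full-width form: shift the whole 32-bit value right by 26. */
int bit26_wide(uint32_t x)
{
    return (int)((x >> 26) & 1u);
}

/* 8-bit words: byte 3 holds bits 24..31, so the shift shrinks to
   26 - 8*3 = 2. */
int bit26_byte(uint32_t x)
{
    uint8_t byte3 = (uint8_t)(x >> 24);   /* stands in for the subreg of X */
    return (byte3 >> 2) & 1;
}

/* 16-bit words: word 1 holds bits 16..31, so the shift shrinks to
   26 - 16*1 = 10. */
int bit26_word(uint32_t x)
{
    uint16_t word1 = (uint16_t)(x >> 16); /* stands in for the subreg of X */
    return (word1 >> 10) & 1;
}

/* Assert agreement on every single-bit pattern and its complement. */
void check_narrowing(void)
{
    for (int bit = 0; bit < 32; bit++) {
        uint32_t one = UINT32_C(1) << bit;
        uint32_t patterns[2] = { one, ~one };
        for (int i = 0; i < 2; i++) {
            int wide = bit26_wide(patterns[i]);
            assert(wide == bit26_byte(patterns[i]));
            assert(wide == bit26_word(patterns[i]));
        }
    }
}
```

On an 8-bit target the byte form needs no multi-byte shift at all, which is the saving being argued for.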




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (16 preceding siblings ...)
  2004-11-17  7:22 ` schlie at comcast dot net
@ 2004-12-09 12:51 ` giovannibajo at libero dot it
  2004-12-09 15:00 ` roger at eyesopen dot com
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: giovannibajo at libero dot it @ 2004-12-09 12:51 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From giovannibajo at libero dot it  2004-12-09 12:51 -------
Proposed (partial) patch:
http://gcc.gnu.org/ml/gcc-patches/2004-12/msg00655.html

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |giovannibajo at libero dot
                   |                            |it
         AssignedTo|unassigned at gcc dot gnu   |sayle at gcc dot gnu dot org
                   |dot org                     |
           Severity|enhancement                 |critical
             Status|NEW                         |ASSIGNED
           Keywords|                            |patch
      Known to fail|                            |3.4.0 4.0.0
      Known to work|                            |3.3.1
            Summary|3.4.3 ~6x+ performance      |[3.4/4.0 Regression] ~6x+
                   |regression vs 3.3.1,        |performance regression,
                   |constant trees not being    |constant trees not being
                   |computed.                   |computed.
   Target Milestone|---                         |4.0.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424



* [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (17 preceding siblings ...)
  2004-12-09 12:51 ` [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, " giovannibajo at libero dot it
@ 2004-12-09 15:00 ` roger at eyesopen dot com
  2004-12-09 15:23 ` schlie at comcast dot net
                   ` (12 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: roger at eyesopen dot com @ 2004-12-09 15:00 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From roger at eyesopen dot com  2004-12-09 14:59 -------
The patch is a "partial" fix, as there will still be a performance regression in
the generated code vs. gcc 3.3.1.  The reason is that 3.3.1 generated
incorrect code for the test program in this PR.

int foo(int a) { return (a & (1L<<23)) ? 1 : 2; }

is supposed to sign-extend "a" from a 16-bit integer to a 32-bit long, to match
the (long) constant operand.  This sign-extension may end up setting bit 23, so
the result of the function is dependent upon "a".  i.e. the best we can do is:

int foo(int a) { return (a < 0) ? 1 : 2; }

Unfortunately, we are not quite that efficient for avr-elf, and still
produce the sign-extension and an AND by a constant.  The patch fixes the
regression aspects of this PR, but we still have a missed optimization.  The
code we generate is:

        clr r26
        sbrc r25,7
        com r26
        mov r27,r26
        sbrs r26,7
        rjmp .L2
        ldi r24,lo8(1)
        ldi r25,hi8(1)
        ret
.L2:
        ldi r24,lo8(2)
        ldi r25,hi8(2)
        ret

Perhaps once the patch is committed, we should close this PR, and open a
separate "enhancement" request PR to catch this missed AND(SIGN_EXTEND(..))
opportunity on AVR.
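
The sign-extension argument above can be checked on a host machine.  The
following is a minimal sketch, assuming int16_t and int32_t stand in for AVR's
16-bit int and 32-bit long; the function names are made up for illustration and
do not come from the PR.

```c
#include <assert.h>
#include <stdint.h>

/* Models foo() from the PR as it behaves on AVR: the 16-bit int is
   converted to a 32-bit long (sign-extended) before the AND with 1L << 23. */
int foo_as_written(int16_t a)
{
    int32_t widened = (int32_t)a;              /* sign extension */
    return (widened & (INT32_C(1) << 23)) ? 1 : 2;
}

/* The simplified form: bit 23 of the widened value is a copy of the
   sign bit of "a", so the test reduces to a sign check. */
int foo_simplified(int16_t a)
{
    return (a < 0) ? 1 : 2;
}

/* Exhaustively compares the two forms over every 16-bit input. */
void check_foo_equivalence(void)
{
    for (int32_t v = INT16_MIN; v <= INT16_MAX; v++)
        assert(foo_as_written((int16_t)v) == foo_simplified((int16_t)v));
}
```

Calling check_foo_equivalence() passes for all 65536 inputs.  It also shows why
a constant result of 2 is only right for non-negative "a".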


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (18 preceding siblings ...)
  2004-12-09 15:00 ` roger at eyesopen dot com
@ 2004-12-09 15:23 ` schlie at comcast dot net
  2004-12-09 15:52 ` schlie at comcast dot net
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: schlie at comcast dot net @ 2004-12-09 15:23 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From schlie at comcast dot net  2004-12-09 15:23 -------
Subject: Re:  [3.4/4.0 Regression] ~6x+ performance
 regression, constant trees not being computed.

Few thoughts:

- I believe avr's back end does know how to convert:

 ((char)x & <pow2-const>) => bit-test x <log2-const>

 which I believe it had relied upon previously.

- might it be possible to recognize, possibly through some sub-reg
  mechanism as avr's word-size is 1 (byte), that any part of a right-shift
  which is a word-size multiple may be removed from the logically shifted
  value via higher-order subword selection (effectively demoting the operand),
  with the const-shift value correspondingly reduced by N*word-bit-size?

   i.e. 

  (((long)x & (1L << 28)) == 0) => (((sub-word (long)x 3) >> 4) == 0)
   => (((byte)x >> 4) == 0)

  (then possibly => (((byte)x & 0x10) == 0) which I believe avr's
   back-end knows how to transform into a [bit-test x 4] )

Might this be possible, as it would seem generally useful as well?
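
The first point above, turning an AND with a power-of-two constant into a test
of one bit, can be sketched in C as follows.  This is only an illustration of
the identity, not the avr back end's code; single_bit_index and bit_test are
hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the bit index (log2) of a mask with exactly one bit set,
   or -1 if the mask is zero or has more than one bit set. */
int single_bit_index(uint8_t mask)
{
    if (mask == 0 || (mask & (mask - 1)) != 0)
        return -1;
    int index = 0;
    while ((mask >>= 1) != 0)
        index++;
    return index;
}

/* Tests one bit of a byte, counting from 0 at the least significant bit. */
int bit_test(uint8_t x, int bit)
{
    return (x >> bit) & 1;
}

/* Checks that (x & mask) != 0 matches bit_test(x, log2(mask)) for
   every byte value and every single-bit mask. */
void check_mask_vs_bit_test(void)
{
    for (int bit = 0; bit < 8; bit++) {
        uint8_t mask = (uint8_t)(1u << bit);
        assert(single_bit_index(mask) == bit);
        for (int x = 0; x <= 0xFF; x++)
            assert(((x & mask) != 0) == bit_test((uint8_t)x, bit));
    }
}
```

With bits counted from 0, the mask 0x10 corresponds to bit index 4.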

Thanks again, -paul-

> From: roger at eyesopen dot com <gcc-bugzilla@gcc.gnu.org>
> ------- Additional Comments From roger at eyesopen dot com  2004-12-09 14:59
> -------
> The patch is a "partial" fix as there will still be a performance regression
> for
> the code generated vs. gcc 3.3.1.  The reason being that 3.3.1 generated
> incorrect code for test program in this PR.
> 
> int foo(int a) { return (a & (1L<<23)) ? 1 : 2; }
> 
> is supposed to sign-extend "a" from a 16-bit integer to a 32-bit long, to
> match
> the (long) constant operand.  This sign-extension may end setting bit 23, so
> the
> result of the function is dependent upon "a".  i.e. the best we can do is:
> 
> int foo(int a) { return (a < 0) ? 1 : 2; }
> 
> Unfortunately, we don't quite get that efficient for avr-elf, instead still
> producing the sign-extension and an AND by constant.  This fixes the
> regression
> aspects of this patch, but we still have a missed optimization.  The code we
> generate is:




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (19 preceding siblings ...)
  2004-12-09 15:23 ` schlie at comcast dot net
@ 2004-12-09 15:52 ` schlie at comcast dot net
  2004-12-11  1:49 ` cvs-commit at gcc dot gnu dot org
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: schlie at comcast dot net @ 2004-12-09 15:52 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From schlie at comcast dot net  2004-12-09 15:52 -------
Subject: Re:  [3.4/4.0 Regression] ~6x+ performance
 regression, constant trees not being computed.

Sorry, I lost the fact that only a single bit needs to remain significant in
the resulting transform:


   (((long)x & (1L << 28)) == 0)
=>
   ((((byte)(sub-word (long)x 3) >> 4) & 1) == 0)
=>
   ((((byte)x' >> 4) & 1) == 0) :: (((byte)x' & 0x10) == 0)
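
A small host-side sketch of the transform above, assuming int32_t models AVR's
32-bit long and that "sub-word n" means byte n counted from the least
significant byte.  The helper sub_word is hypothetical and only mirrors the
notation used in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Selects byte n (0 = least significant) of a 32-bit value. */
uint8_t sub_word(int32_t x, unsigned n)
{
    return (uint8_t)((uint32_t)x >> (8u * n));
}

/* The wide form: a 32-bit AND against bit 28. */
int bit28_clear_wide(int32_t x)
{
    return (x & (INT32_C(1) << 28)) == 0;
}

/* The narrow form: only the top byte is examined, and bit 28 of the
   long becomes bit 4 (mask 0x10) of that byte. */
int bit28_clear_narrow(int32_t x)
{
    return (sub_word(x, 3) & 0x10) == 0;
}

/* Compares both forms for every possible value of the top byte. */
void check_bit28_forms(void)
{
    for (int32_t hi = -128; hi <= 127; hi++) {
        int32_t x = hi * 16777216 + 0x00C0FFEE;   /* hi * 2^24 + low bytes */
        assert(bit28_clear_wide(x) == bit28_clear_narrow(x));
    }
}
```

The lower three bytes never affect the result, which is what would let an
8-bit target drop them.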

> From: schlie at comcast dot net <gcc-bugzilla@gcc.gnu.org>
>    i.e. 
> 
>   (((long)x & (1 << 28)) == 0) => (((sub-word (long)x 3) >> 4) == 0)
>    => (((byte)x >> 4) == 0)
> 
>   (then possibly => (((byte)x & 0x10) == 0) which I believe avr's
>    back-end knows how to transform into a [bit-test x 5] )
> 
> Might this be possible, as it would seem generally useful as well?
> 
> Thanks again, -paul-




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (20 preceding siblings ...)
  2004-12-09 15:52 ` schlie at comcast dot net
@ 2004-12-11  1:49 ` cvs-commit at gcc dot gnu dot org
  2004-12-14  1:47 ` cvs-commit at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: cvs-commit at gcc dot gnu dot org @ 2004-12-11  1:49 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From cvs-commit at gcc dot gnu dot org  2004-12-11 01:49 -------
Subject: Bug 18424

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	sayle@gcc.gnu.org	2004-12-11 01:49:06

Modified files:
	gcc            : ChangeLog dojump.c 

Log message:
	PR target/18002
	PR middle-end/18424
	* dojump.c (do_jump): When attempting to reverse the effects of
	fold_single_bit_test, we need to STRIP_NOPS and narrowing type
	conversions, and handle BIT_XOR_EXPR that's used to invert the
	sense of the single bit test.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.6778&r2=2.6779
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/dojump.c.diff?cvsroot=gcc&r1=1.32&r2=1.33



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (21 preceding siblings ...)
  2004-12-11  1:49 ` cvs-commit at gcc dot gnu dot org
@ 2004-12-14  1:47 ` cvs-commit at gcc dot gnu dot org
  2004-12-14  2:11 ` pinskia at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: cvs-commit at gcc dot gnu dot org @ 2004-12-14  1:47 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From cvs-commit at gcc dot gnu dot org  2004-12-14 01:47 -------
Subject: Bug 18424

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	gcc-3_4-branch
Changes by:	sayle@gcc.gnu.org	2004-12-14 01:47:35

Modified files:
	gcc            : ChangeLog dojump.c Makefile.in 

Log message:
	PR target/18002
	PR middle-end/18424
	Backport from mainline
	
	2004-03-20  Richard Sandiford  <rsandifo@redhat.com>
	* Makefile.in (dojump.o): Depend on $(GGC_H) and dojump.h.
	(GTFILES): Add $(srcdir)/dojump.h.
	(gt-dojump.h): New dependency.
	* dojump.c (and_reg, and_test, shift_test): New static variables.
	(prefer_and_bit_test): New function.
	(do_jump): Use it to choose between (X & (1 << C)) and (X >> C) & 1.
	
	2004-03-21  Andrew Pinski  <pinskia@gcc.gnu.org>
	* dojump.c (prefer_and_bit_test): Fix which part of
	the and_test is replaced.
	
	2004-12-10  Roger Sayle  <roger@eyesopen.com>
	* dojump.c (do_jump): When attempting to reverse the effects of
	fold_single_bit_test, we need to STRIP_NOPS and narrowing type
	conversions, and handle BIT_XOR_EXPR that's used to invert the
	sense of the single bit test.
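
The log message refers to two spellings of a single-bit test, (X & (1 << C))
and (X >> C) & 1, and to an XOR used to invert the sense of the test.  The
sketch below only demonstrates these source-level identities on 16-bit values;
it is not the dojump.c code, and reading the XOR form as ((X >> C) & 1) ^ 1 is
an assumption made for illustration.  All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Single-bit test written as an AND with a shifted constant. */
int test_by_mask(uint16_t x, unsigned c)
{
    return (x & (1u << c)) != 0;
}

/* The same test written as a shift followed by an AND with 1. */
int test_by_shift(uint16_t x, unsigned c)
{
    return (x >> c) & 1;
}

/* The inverted test: XOR with 1 flips the extracted bit. */
int inverted_test(uint16_t x, unsigned c)
{
    return ((x >> c) & 1) ^ 1;
}

/* Exhaustive check over all 16-bit values and all bit positions. */
void check_single_bit_forms(void)
{
    for (unsigned c = 0; c < 16; c++) {
        for (uint32_t x = 0; x <= 0xFFFF; x++) {
            uint16_t v = (uint16_t)x;
            assert(test_by_mask(v, c) == test_by_shift(v, c));
            assert(inverted_test(v, c) == !test_by_shift(v, c));
        }
    }
}
```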

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=2.2326.2.730&r2=2.2326.2.731
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/dojump.c.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.9.4.1&r2=1.9.4.2
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/Makefile.in.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.1223.2.20&r2=1.1223.2.21



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (22 preceding siblings ...)
  2004-12-14  1:47 ` cvs-commit at gcc dot gnu dot org
@ 2004-12-14  2:11 ` pinskia at gcc dot gnu dot org
  2004-12-14  2:13 ` schlie at comcast dot net
                   ` (7 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-12-14  2:11 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-12-14 02:11 -------
Fixed also.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|4.0.0                       |3.4.4


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (23 preceding siblings ...)
  2004-12-14  2:11 ` pinskia at gcc dot gnu dot org
@ 2004-12-14  2:13 ` schlie at comcast dot net
  2004-12-14  5:04 ` ericw at evcohs dot com
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: schlie at comcast dot net @ 2004-12-14  2:13 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From schlie at comcast dot net  2004-12-14 02:13 -------

Thank you all; I would like to try to verify on 4.0 as well,
once we can figure out how to get the avr target to reliably build.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (24 preceding siblings ...)
  2004-12-14  2:13 ` schlie at comcast dot net
@ 2004-12-14  5:04 ` ericw at evcohs dot com
  2004-12-14 12:33 ` schlie at comcast dot net
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: ericw at evcohs dot com @ 2004-12-14  5:04 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From ericw at evcohs dot com  2004-12-14 05:03 -------
Subject: Re:  [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.

On 14 Dec 2004 at 2:13, schlie at comcast dot net wrote:

> 
> ------- Additional Comments From schlie at comcast dot net  2004-12-14 02:13 -------
> 
> Thank you all; and would like to try to verfiy on 4.0 as well
> once we can figure out now to get the avr target to reliably build.
> 
> 

AFAIK, the avr target is supposed to build now for HEAD. I haven't tried a snapshot recently, 
though...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (25 preceding siblings ...)
  2004-12-14  5:04 ` ericw at evcohs dot com
@ 2004-12-14 12:33 ` schlie at comcast dot net
  2004-12-14 14:18 ` ericw at evcohs dot com
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: schlie at comcast dot net @ 2004-12-14 12:33 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From schlie at comcast dot net  2004-12-14 12:33 -------
Subject: Re:  [3.4/4.0 Regression] ~6x+ performance
 regression, constant trees not being computed.

Nope, unfortunately not as of yesterday, since reload.c was tweaked last
week.




-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (26 preceding siblings ...)
  2004-12-14 12:33 ` schlie at comcast dot net
@ 2004-12-14 14:18 ` ericw at evcohs dot com
  2004-12-14 23:20 ` giovannibajo at libero dot it
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: ericw at evcohs dot com @ 2004-12-14 14:18 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From ericw at evcohs dot com  2004-12-14 14:18 -------
Subject: Re:  [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.

On 14 Dec 2004 at 12:33, schlie at comcast dot net wrote:

> 
> ------- Additional Comments From schlie at comcast dot net  2004-12-14 12:33 -------
> Subject: Re:  [3.4/4.0 Regression] ~6x+ performance
>  regression, constant trees not being computed.
> 
> Nope, unfortunately not as of yesterday, since reload.c was tweaked last
> week.

Please file a separate bug report about this ASAP.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (27 preceding siblings ...)
  2004-12-14 14:18 ` ericw at evcohs dot com
@ 2004-12-14 23:20 ` giovannibajo at libero dot it
  2004-12-21  7:59 ` schlie at comcast dot net
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: giovannibajo at libero dot it @ 2004-12-14 23:20 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From giovannibajo at libero dot it  2004-12-14 23:20 -------
Subject: Re:  [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.

ericw at evcohs dot com <gcc-bugzilla@gcc.gnu.org> wrote:

>> Nope, unfortunately not as of yesterday, since reload.c was tweaked
>> last week.
> 
> Please file a separate bug report about this ASAP.


It was already opened: PR 18887

Giovanni Bajo



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (28 preceding siblings ...)
  2004-12-14 23:20 ` giovannibajo at libero dot it
@ 2004-12-21  7:59 ` schlie at comcast dot net
  2004-12-21  8:02 ` schlie at comcast dot net
  2004-12-21 16:02 ` pinskia at gcc dot gnu dot org
  31 siblings, 0 replies; 33+ messages in thread
From: schlie at comcast dot net @ 2004-12-21  7:59 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From schlie at comcast dot net  2004-12-21 07:59 -------
Problems with 4.0 avr test results (some good, some bad, some odd):

000000c6 <main>:
int main (void){
  c6:	c8 ef       	ldi	r28, 0xF8	; 248
  c8:	d0 e1       	ldi	r29, 0x10	; 16
  ca:	de bf       	out	0x3e, r29	; 62
  cc:	cd bf       	out	0x3d, r28	; 61

volatile char c;
volatile int i;
volatile long l;

/* char tests */

c = (c & (1 << 4));
  ce:	89 81       	ldd	r24, Y+1	; 0x01
  d0:	80 71       	andi	r24, 0x10	; 16                                            ; good.
  d2:	89 83       	std	Y+1, r24	; 0x01

if (c & (1 << 4))
  d4:	89 81       	ldd	r24, Y+1	; 0x01
  d6:	84 ff       	sbrs	r24, 4                                                         ; good single bit test & branch
  d8:	03 c0       	rjmp	.+6      	; 0xe0 <main+0x1a>
  c = 1;
  da:	81 e0       	ldi	r24, 0x01	; 1
  dc:	89 83       	std	Y+1, r24	; 0x01
  de:	01 c0       	rjmp	.+2      	; 0xe2 <main+0x1c>
else
  c = 0;
  e0:	19 82       	std	Y+1, r1	; 0x01

c = (c & (1 << 4)) ? 1 : 0;
  e2:	89 81       	ldd	r24, Y+1	; 0x01
  e4:	99 27       	eor	r25, r25
  e6:	44 e0       	ldi	r20, 0x04	; 4                         ; bad, using shift loop. although same as above.
  e8:	96 95       	lsr	r25
  ea:	87 95       	ror	r24
  ec:	4a 95       	dec	r20
  ee:	e1 f7       	brne	.-8      	; 0xe8 <main+0x22>
  f0:	81 70       	andi	r24, 0x01	; 1
  f2:	89 83       	std	Y+1, r24	; 0x01

c = sizeof(char);
  f4:	81 e0       	ldi	r24, 0x01	; 1
  f6:	89 83       	std	Y+1, r24	; 0x01

/* int tests */

i = i & (1 << 10);
  f8:	8a 81       	ldd	r24, Y+2	; 0x02
  fa:	9b 81       	ldd	r25, Y+3	; 0x03
  fc:	80 70       	andi	r24, 0x00	; 0                 ; ok, but nicer if recognized only highest byte significant.
  fe:	94 70       	andi	r25, 0x04	; 4
 100:	8a 83       	std	Y+2, r24	; 0x02
 102:	9b 83       	std	Y+3, r25	; 0x03

if (i & (1 << 10))
 104:	8a 81       	ldd	r24, Y+2	; 0x02
 106:	9b 81       	ldd	r25, Y+3	; 0x03
 108:	9c 01       	movw	r18, r24                           ; shouldn't be moving the operand? rest as above.
 10a:	20 70       	andi	r18, 0x00	; 0
 10c:	34 70       	andi	r19, 0x04	; 4
 10e:	92 ff       	sbrs	r25, 2
 110:	05 c0       	rjmp	.+10     	; 0x11c <main+0x56>
  i = 1;
 112:	81 e0       	ldi	r24, 0x01	; 1
 114:	90 e0       	ldi	r25, 0x00	; 0                        ; r1 = 0 already?, 
 116:	8a 83       	std	Y+2, r24	; 0x02
 118:	9b 83       	std	Y+3, r25	; 0x03
 11a:	02 c0       	rjmp	.+4      	; 0x120 <main+0x5a>
else
  i = 0;
 11c:	2a 83       	std	Y+2, r18	; 0x02
 11e:	3b 83       	std	Y+3, r19	; 0x03                ; wrong, r19 = andi r19, 0x04 ?

i = (i & (1 << 10)) ? 1 : 0;
 120:	8a 81       	ldd	r24, Y+2	; 0x02
 122:	9b 81       	ldd	r25, Y+3	; 0x03
 124:	89 2f       	mov	r24, r25                     ; nice, shifts by 8 first.
 126:	99 27       	eor	r25, r25
 128:	86 95       	lsr	r24
 12a:	86 95       	lsr	r24
 12c:	81 70       	andi	r24, 0x01	; 1
 12e:	90 70       	andi	r25, 0x00	; 0              ; but then forgets it already set r25 to 0?
 130:	8a 83       	std	Y+2, r24	; 0x02
 132:	9b 83       	std	Y+3, r25	; 0x03

i = sizeof(int);
 134:	82 e0       	ldi	r24, 0x02	; 2
 136:	90 e0       	ldi	r25, 0x00	; 0
 138:	8a 83       	std	Y+2, r24	; 0x02
 13a:	9b 83       	std	Y+3, r25	; 0x03

/* long tests */

l = (l & ((long)1 << 26));
 13c:	8c 81       	ldd	r24, Y+4	; 0x04
 13e:	9d 81       	ldd	r25, Y+5	; 0x05
 140:	ae 81       	ldd	r26, Y+6	; 0x06
 142:	bf 81       	ldd	r27, Y+7	; 0x07
 144:	80 70       	andi	r24, 0x00	; 0                    ; ok.
 146:	90 70       	andi	r25, 0x00	; 0
 148:	a0 70       	andi	r26, 0x00	; 0
 14a:	b4 70       	andi	r27, 0x04	; 4
 14c:	8c 83       	std	Y+4, r24	; 0x04
 14e:	9d 83       	std	Y+5, r25	; 0x05
 150:	ae 83       	std	Y+6, r26	; 0x06
 152:	bf 83       	std	Y+7, r27	; 0x07

if (l & ((long)1 << 26))
 154:	8c 81       	ldd	r24, Y+4	; 0x04
 156:	9d 81       	ldd	r25, Y+5	; 0x05
 158:	ae 81       	ldd	r26, Y+6	; 0x06
 15a:	bf 81       	ldd	r27, Y+7	; 0x07
 15c:	9c 01       	movw	r18, r24                    ; again unnecessarily moving things?
 15e:	ad 01       	movw	r20, r26
 160:	20 70       	andi	r18, 0x00	; 0                    ; very odd, both &'s
 162:	30 70       	andi	r19, 0x00	; 0
 164:	40 70       	andi	r20, 0x00	; 0
 166:	54 70       	andi	r21, 0x04	; 4
 168:	b2 ff       	sbrs	r27, 2                               ; and tests the significant bit, didn't need to do above.
 16a:	09 c0       	rjmp	.+18     	; 0x17e <main+0xb8>
  l = 1;
 16c:	81 e0       	ldi	r24, 0x01	; 1
 16e:	90 e0       	ldi	r25, 0x00	; 0                 ; again r1 = 0 already?
 170:	a0 e0       	ldi	r26, 0x00	; 0
 172:	b0 e0       	ldi	r27, 0x00	; 0
 174:	8c 83       	std	Y+4, r24	; 0x04
 176:	9d 83       	std	Y+5, r25	; 0x05
 178:	ae 83       	std	Y+6, r26	; 0x06
 17a:	bf 83       	std	Y+7, r27	; 0x07
 17c:	04 c0       	rjmp	.+8      	; 0x186 <main+0xc0>
else
  l = 0;
 17e:	2c 83       	std	Y+4, r18	; 0x04
 180:	3d 83       	std	Y+5, r19	; 0x05
 182:	4e 83       	std	Y+6, r20	; 0x06
 184:	5f 83       	std	Y+7, r21	; 0x07     ; wrong r21 = andi	r21, 0x04
  
l = (l & ((long)1 << 26)) ? 1 : 0;
 186:	8c 81       	ldd	r24, Y+4	; 0x04
 188:	9d 81       	ldd	r25, Y+5	; 0x05
 18a:	ae 81       	ldd	r26, Y+6	; 0x06
 18c:	bf 81       	ldd	r27, Y+7	; 0x07
 18e:	2a e1       	ldi	r18, 0x1A	; 26
 190:	b6 95       	lsr	r27                        ; not good, big shift loop, should & or just test most sig. byte.
 192:	a7 95       	ror	r26
 194:	97 95       	ror	r25
 196:	87 95       	ror	r24
 198:	2a 95       	dec	r18
 19a:	d1 f7       	brne	.-12     	; 0x190 <main+0xca>
 19c:	aa 27       	eor	r26, r26
 19e:	97 fd       	sbrc	r25, 7
 1a0:	a0 95       	com	r26
 1a2:	ba 2f       	mov	r27, r26
 1a4:	81 70       	andi	r24, 0x01	; 1
 1a6:	90 70       	andi	r25, 0x00	; 0
 1a8:	a0 70       	andi	r26, 0x00	; 0
 1aa:	b0 70       	andi	r27, 0x00	; 0
 1ac:	8c 83       	std	Y+4, r24	; 0x04
 1ae:	9d 83       	std	Y+5, r25	; 0x05
 1b0:	ae 83       	std	Y+6, r26	; 0x06
 1b2:	bf 83       	std	Y+7, r27	; 0x07

l = sizeof(long);
 1a8:	84 e0       	ldi	r24, 0x04	; 4
 1aa:	90 e0       	ldi	r25, 0x00	; 0
 1ac:	a0 e0       	ldi	r26, 0x00	; 0
 1ae:	b0 e0       	ldi	r27, 0x00	; 0
 1b0:	8c 83       	std	Y+4, r24	; 0x04
 1b2:	9d 83       	std	Y+5, r25	; 0x05
 1b4:	ae 83       	std	Y+6, r26	; 0x06
 1b6:	bf 83       	std	Y+7, r27	; 0x07

return 0;

}
 1b8:	80 e0       	ldi	r24, 0x00	; 0
 1ba:	90 e0       	ldi	r25, 0x00	; 0
 1bc:	0c 94 e0 00 	jmp	0x1c0 <_exit>
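
The long tests in the listing above use an if/else form and a ?: form of the
same bit-26 test, and the annotations suggest testing only the most
significant byte.  The sketch below, assuming int32_t models AVR's 32-bit long,
checks on a host that all three give the same answer.  The function names are
hypothetical and the code is not taken from the test program.

```c
#include <assert.h>
#include <stdint.h>

/* if/else form, as in "if (l & ((long)1 << 26)) l = 1; else l = 0;" */
int32_t long_test_if(int32_t l)
{
    if (l & (INT32_C(1) << 26))
        return 1;
    else
        return 0;
}

/* Conditional-expression form of the same test. */
int32_t long_test_ternary(int32_t l)
{
    return (l & (INT32_C(1) << 26)) ? 1 : 0;
}

/* Top-byte form: bit 26 of the long is bit 2 of its most significant
   byte, so a one-byte shift by 2 is enough. */
int32_t long_test_top_byte(int32_t l)
{
    uint8_t top = (uint8_t)((uint32_t)l >> 24);
    return (top >> 2) & 1;
}

/* Compares the three forms for every possible value of the top byte. */
void check_long_forms(void)
{
    for (int32_t hi = -128; hi <= 127; hi++) {
        int32_t l = hi * 16777216 + 0x00C0FFEE;   /* hi * 2^24 + low bytes */
        assert(long_test_if(l) == long_test_ternary(l));
        assert(long_test_if(l) == long_test_top_byte(l));
    }
}
```

Since the three forms agree, the 26-iteration shift loop emitted for the ?:
form is a code-quality difference only, not a difference in meaning.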


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (29 preceding siblings ...)
  2004-12-21  7:59 ` schlie at comcast dot net
@ 2004-12-21  8:02 ` schlie at comcast dot net
  2004-12-21 16:02 ` pinskia at gcc dot gnu dot org
  31 siblings, 0 replies; 33+ messages in thread
From: schlie at comcast dot net @ 2004-12-21  8:02 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From schlie at comcast dot net  2004-12-21 08:02 -------
Problems with 4.0 avr test results (some good, some bad, some odd):

000000c6 <main>:
int main (void){
  c6:	c8 ef       	ldi	r28, 0xF8	; 248
  c8:	d0 e1       	ldi	r29, 0x10	; 16
  ca:	de bf       	out	0x3e, r29	; 62
  cc:	cd bf       	out	0x3d, r28	; 61

volatile char c;
volatile int i;
volatile long l;

/* char tests */

c = (c & (1 << 4));
  ce:	89 81       	ldd	r24, Y+1	; 0x01
  d0:	80 71       	andi	r24, 0x10	; 16                                            ; good.
  d2:	89 83       	std	Y+1, r24	; 0x01

if (c & (1 << 4))
  d4:	89 81       	ldd	r24, Y+1	; 0x01
  d6:	84 ff       	sbrs	r24, 4                                                         ; good single bit test & branch
  d8:	03 c0       	rjmp	.+6      	; 0xe0 <main+0x1a>
  c = 1;
  da:	81 e0       	ldi	r24, 0x01	; 1
  dc:	89 83       	std	Y+1, r24	; 0x01
  de:	01 c0       	rjmp	.+2      	; 0xe2 <main+0x1c>
else
  c = 0;
  e0:	19 82       	std	Y+1, r1	; 0x01

c = (c & (1 << 4)) ? 1 : 0;
  e2:	89 81       	ldd	r24, Y+1	; 0x01
  e4:	99 27       	eor	r25, r25
  e6:	44 e0       	ldi	r20, 0x04	; 4                         ; bad, using shift loop. although same as above.
  e8:	96 95       	lsr	r25
  ea:	87 95       	ror	r24
  ec:	4a 95       	dec	r20
  ee:	e1 f7       	brne	.-8      	; 0xe8 <main+0x22>
  f0:	81 70       	andi	r24, 0x01	; 1
  f2:	89 83       	std	Y+1, r24	; 0x01

c = sizeof(char);
  f4:	81 e0       	ldi	r24, 0x01	; 1
  f6:	89 83       	std	Y+1, r24	; 0x01

/* int tests */

i = i & (1 << 10);
  f8:	8a 81       	ldd	r24, Y+2	; 0x02
  fa:	9b 81       	ldd	r25, Y+3	; 0x03
  fc:	80 70       	andi	r24, 0x00	; 0                 ; ok, but nicer if recognized only highest byte significant.
  fe:	94 70       	andi	r25, 0x04	; 4
 100:	8a 83       	std	Y+2, r24	; 0x02
 102:	9b 83       	std	Y+3, r25	; 0x03

if (i & (1 << 10))
 104:	8a 81       	ldd	r24, Y+2	; 0x02
 106:	9b 81       	ldd	r25, Y+3	; 0x03
 108:	9c 01       	movw	r18, r24                           ; shouldn't be moving the operand? rest as above.
 10a:	20 70       	andi	r18, 0x00	; 0
 10c:	34 70       	andi	r19, 0x04	; 4
 10e:	92 ff       	sbrs	r25, 2
 110:	05 c0       	rjmp	.+10     	; 0x11c <main+0x56>
  i = 1;
 112:	81 e0       	ldi	r24, 0x01	; 1
 114:	90 e0       	ldi	r25, 0x00	; 0                        ; r1 = 0 already?, 
 116:	8a 83       	std	Y+2, r24	; 0x02
 118:	9b 83       	std	Y+3, r25	; 0x03
 11a:	02 c0       	rjmp	.+4      	; 0x120 <main+0x5a>
else
  i = 0;
 11c:	2a 83       	std	Y+2, r18	; 0x02
 11e:	3b 83       	std	Y+3, r19	; 0x03                ; wrong, r19 = andi r19, 0x04 ?

i = (i & (1 << 10)) ? 1 : 0;
 120:	8a 81       	ldd	r24, Y+2	; 0x02
 122:	9b 81       	ldd	r25, Y+3	; 0x03
 124:	89 2f       	mov	r24, r25                     ; nice, shifts by 8 first.
 126:	99 27       	eor	r25, r25
 128:	86 95       	lsr	r24
 12a:	86 95       	lsr	r24
 12c:	81 70       	andi	r24, 0x01	; 1
 12e:	90 70       	andi	r25, 0x00	; 0              ; but then forgets it already set r25 to 0?
 130:	8a 83       	std	Y+2, r24	; 0x02
 132:	9b 83       	std	Y+3, r25	; 0x03

i = sizeof(int);
 134:	82 e0       	ldi	r24, 0x02	; 2
 136:	90 e0       	ldi	r25, 0x00	; 0
 138:	8a 83       	std	Y+2, r24	; 0x02
 13a:	9b 83       	std	Y+3, r25	; 0x03

/* long tests */

l = (l & ((long)1 << 26));
 13c:	8c 81       	ldd	r24, Y+4	; 0x04
 13e:	9d 81       	ldd	r25, Y+5	; 0x05
 140:	ae 81       	ldd	r26, Y+6	; 0x06
 142:	bf 81       	ldd	r27, Y+7	; 0x07
 144:	80 70       	andi	r24, 0x00	; 0                    ; ok.
 146:	90 70       	andi	r25, 0x00	; 0
 148:	a0 70       	andi	r26, 0x00	; 0
 14a:	b4 70       	andi	r27, 0x04	; 4
 14c:	8c 83       	std	Y+4, r24	; 0x04
 14e:	9d 83       	std	Y+5, r25	; 0x05
 150:	ae 83       	std	Y+6, r26	; 0x06
 152:	bf 83       	std	Y+7, r27	; 0x07

if (l & ((long)1 << 26))
 154:	8c 81       	ldd	r24, Y+4	; 0x04
 156:	9d 81       	ldd	r25, Y+5	; 0x05
 158:	ae 81       	ldd	r26, Y+6	; 0x06
 15a:	bf 81       	ldd	r27, Y+7	; 0x07
 15c:	9c 01       	movw	r18, r24                    ; again unnecessarily moving things?
 15e:	ad 01       	movw	r20, r26
 160:	20 70       	andi	r18, 0x00	; 0                    ; very odd, both &'s
 162:	30 70       	andi	r19, 0x00	; 0
 164:	40 70       	andi	r20, 0x00	; 0
 166:	54 70       	andi	r21, 0x04	; 4
 168:	b2 ff       	sbrs	r27, 2                               ; and tests the significant bit, didn't need to do above.
 16a:	09 c0       	rjmp	.+18     	; 0x17e <main+0xb8>
  l = 1;
 16c:	81 e0       	ldi	r24, 0x01	; 1
 16e:	90 e0       	ldi	r25, 0x00	; 0                 ; again r1 = 0 already?
 170:	a0 e0       	ldi	r26, 0x00	; 0
 172:	b0 e0       	ldi	r27, 0x00	; 0
 174:	8c 83       	std	Y+4, r24	; 0x04
 176:	9d 83       	std	Y+5, r25	; 0x05
 178:	ae 83       	std	Y+6, r26	; 0x06
 17a:	bf 83       	std	Y+7, r27	; 0x07
 17c:	04 c0       	rjmp	.+8      	; 0x186 <main+0xc0>
else
  l = 0;
 17e:	2c 83       	std	Y+4, r18	; 0x04
 180:	3d 83       	std	Y+5, r19	; 0x05
 182:	4e 83       	std	Y+6, r20	; 0x06
 184:	5f 83       	std	Y+7, r21	; 0x07     ; wrong r21 = andi	r21, 0x04
  
l = (l & ((long)1 << 26)) ? 1 : 0;
 186:	8c 81       	ldd	r24, Y+4	; 0x04
 188:	9d 81       	ldd	r25, Y+5	; 0x05
 18a:	ae 81       	ldd	r26, Y+6	; 0x06
 18c:	bf 81       	ldd	r27, Y+7	; 0x07
 18e:	2a e1       	ldi	r18, 0x1A	; 26
 190:	b6 95       	lsr	r27                        ; not good, big shift loop, should & or just test most sig. byte.
 192:	a7 95       	ror	r26
 194:	97 95       	ror	r25
 196:	87 95       	ror	r24
 198:	2a 95       	dec	r18
 19a:	d1 f7       	brne	.-12     	; 0x190 <main+0xca>
 19c:	aa 27       	eor	r26, r26
 19e:	97 fd       	sbrc	r25, 7
 1a0:	a0 95       	com	r26
 1a2:	ba 2f       	mov	r27, r26
 1a4:	81 70       	andi	r24, 0x01	; 1
 1a6:	90 70       	andi	r25, 0x00	; 0
 1a8:	a0 70       	andi	r26, 0x00	; 0
 1aa:	b0 70       	andi	r27, 0x00	; 0
 1ac:	8c 83       	std	Y+4, r24	; 0x04
 1ae:	9d 83       	std	Y+5, r25	; 0x05
 1b0:	ae 83       	std	Y+6, r26	; 0x06
 1b2:	bf 83       	std	Y+7, r27	; 0x07

l = sizeof(long);
 1a8:	84 e0       	ldi	r24, 0x04	; 4
 1aa:	90 e0       	ldi	r25, 0x00	; 0
 1ac:	a0 e0       	ldi	r26, 0x00	; 0
 1ae:	b0 e0       	ldi	r27, 0x00	; 0
 1b0:	8c 83       	std	Y+4, r24	; 0x04
 1b2:	9d 83       	std	Y+5, r25	; 0x05
 1b4:	ae 83       	std	Y+6, r26	; 0x06
 1b6:	bf 83       	std	Y+7, r27	; 0x07

return 0;

}
 1b8:	80 e0       	ldi	r24, 0x00	; 0
 1ba:	90 e0       	ldi	r25, 0x00	; 0
 1bc:	0c 94 e0 00 	jmp	0x1c0 <_exit>

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, constant trees not being computed.
  2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
                   ` (30 preceding siblings ...)
  2004-12-21  8:02 ` schlie at comcast dot net
@ 2004-12-21 16:02 ` pinskia at gcc dot gnu dot org
  31 siblings, 0 replies; 33+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-12-21 16:02 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-12-21 16:02 -------
No, the original problem was fixed; please open a new bug about the new problem.  I would not be
surprised if the new problem is not a regression.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18424




Thread overview: 33+ messages
2004-11-11  2:35 [Bug c/18424] New: 3.4.3 ~6x+ performance regression vs 3.3.1, constant trees not being computed schlie at comcast dot net
2004-11-11  2:48 ` [Bug middle-end/18424] " pinskia at gcc dot gnu dot org
2004-11-11  2:49 ` pinskia at gcc dot gnu dot org
2004-11-11  2:52 ` pinskia at gcc dot gnu dot org
2004-11-11  3:15 ` schlie at comcast dot net
2004-11-11  3:18 ` pinskia at gcc dot gnu dot org
2004-11-11  3:56 ` schlie at comcast dot net
2004-11-11  4:33 ` schlie at comcast dot net
2004-11-11  4:41 ` pinskia at gcc dot gnu dot org
2004-11-11  4:59 ` schlie at comcast dot net
2004-11-11  5:04 ` pinskia at gcc dot gnu dot org
2004-11-11 15:52 ` schlie at comcast dot net
2004-11-11 16:22 ` joseph at codesourcery dot com
2004-11-11 16:30 ` ericw at evcohs dot com
2004-11-11 17:19 ` schlie at comcast dot net
2004-11-11 20:29 ` schlie at comcast dot net
2004-11-16 23:58 ` dmixm at marine dot febras dot ru
2004-11-17  7:22 ` schlie at comcast dot net
2004-12-09 12:51 ` [Bug middle-end/18424] [3.4/4.0 Regression] ~6x+ performance regression, " giovannibajo at libero dot it
2004-12-09 15:00 ` roger at eyesopen dot com
2004-12-09 15:23 ` schlie at comcast dot net
2004-12-09 15:52 ` schlie at comcast dot net
2004-12-11  1:49 ` cvs-commit at gcc dot gnu dot org
2004-12-14  1:47 ` cvs-commit at gcc dot gnu dot org
2004-12-14  2:11 ` pinskia at gcc dot gnu dot org
2004-12-14  2:13 ` schlie at comcast dot net
2004-12-14  5:04 ` ericw at evcohs dot com
2004-12-14 12:33 ` schlie at comcast dot net
2004-12-14 14:18 ` ericw at evcohs dot com
2004-12-14 23:20 ` giovannibajo at libero dot it
2004-12-21  7:59 ` schlie at comcast dot net
2004-12-21  8:02 ` schlie at comcast dot net
2004-12-21 16:02 ` pinskia at gcc dot gnu dot org
