public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/100622] New: Conversion to smaller unsigned type in loop
@ 2021-05-16  8:37 tkoenig at gcc dot gnu.org
  2021-05-17  8:46 ` [Bug rtl-optimization/100622] " tkoenig at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2021-05-16  8:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622

            Bug ID: 100622
           Summary: Conversion to smaller unsigned type in loop
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tkoenig at gcc dot gnu.org
  Target Milestone: ---

Consider

unsigned int foo(unsigned int *a, int n)
{
  int i;
  unsigned int res = 0;
  for (i=0; i<n; i++)
    res += a[i];

  return res;
}

unsigned int foo2 (unsigned int *a, int n)
{
  int i;
  unsigned long res = 0;
  for (i=0; i<n; i++)
    res += a[i];

  return res;
}

Given modular 2^n arithmetic, these two functions are identical in effect.
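
A minimal sketch for checking this claim on a host machine. The two
functions are copied from the report; the helper sums_agree_on_overflow
and its input values are hypothetical additions, chosen so that the sum
exceeds UINT_MAX. It only assumes that unsigned long is at least as wide
as unsigned int, which the C standard guarantees.

```c
#include <assert.h>
#include <limits.h>

/* foo and foo2 exactly as in the report. */
unsigned int foo(unsigned int *a, int n)
{
  int i;
  unsigned int res = 0;
  for (i = 0; i < n; i++)
    res += a[i];

  return res;
}

unsigned int foo2(unsigned int *a, int n)
{
  int i;
  unsigned long res = 0;
  for (i = 0; i < n; i++)
    res += a[i];

  return res;   /* implicit conversion reduces modulo UINT_MAX + 1 */
}

/* Hypothetical helper, not part of the report: feeds both functions an
   input whose mathematical sum does not fit in unsigned int and returns
   1 if they agree.  */
static int sums_agree_on_overflow(void)
{
  unsigned int a[] = { UINT_MAX, UINT_MAX, 5u, 0x80000000u };
  int n = (int)(sizeof a / sizeof a[0]);

  assert(n == 4);
  return foo(a, n) == foo2(a, n);
}
```

Calling sums_agree_on_overflow() from a test driver should return 1 on
any conforming implementation, since both the per-iteration wraparound in
foo and the final narrowing conversion in foo2 reduce modulo the same
power of two.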

On POWER, a reasonably recent trunk at -O1 compiles this to the code
below (-O1 is used to avoid loop unrolling, which keeps the output readable):

$ gcc -O1 -c add.c && objdump --disassemble add.o

add.o:     file format elf64-powerpcle


Disassembly of section .text:

0000000000000000 <foo>:
   0:   00 00 04 2c     cmpwi   r4,0
   4:   30 00 81 40     ble     34 <foo+0x34>
   8:   20 00 89 78     clrldi  r9,r4,32
   c:   fc ff 43 39     addi    r10,r3,-4
  10:   00 00 60 38     li      r3,0
  14:   a6 03 29 7d     mtctr   r9
  18:   04 00 0a 85     lwzu    r8,4(r10)
  1c:   14 1a 68 7c     add     r3,r8,r3
  20:   20 00 63 78     clrldi  r3,r3,32
  24:   ff ff 29 39     addi    r9,r9,-1
  28:   20 00 29 79     clrldi  r9,r9,32
  2c:   ec ff 00 42     bdnz    18 <foo+0x18>
  30:   20 00 80 4e     blr
  34:   00 00 60 38     li      r3,0
  38:   20 00 80 4e     blr
        ...

0000000000000048 <foo2>:
  48:   00 00 04 2c     cmpwi   r4,0
  4c:   30 00 81 40     ble     7c <foo2+0x34>
  50:   20 00 89 78     clrldi  r9,r4,32
  54:   fc ff 43 39     addi    r10,r3,-4
  58:   00 00 60 38     li      r3,0
  5c:   a6 03 29 7d     mtctr   r9
  60:   04 00 0a 85     lwzu    r8,4(r10)
  64:   14 42 63 7c     add     r3,r3,r8
  68:   ff ff 29 39     addi    r9,r9,-1
  6c:   20 00 29 79     clrldi  r9,r9,32
  70:   f0 ff 00 42     bdnz    60 <foo2+0x18>
  74:   20 00 63 78     clrldi  r3,r3,32
  78:   20 00 80 4e     blr
  7c:   00 00 60 38     li      r3,0
  80:   f4 ff ff 4b     b       74 <foo2+0x2c>

so there is an extra instruction to mask the result of the
addition in foo.  This should not be needed.


* [Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop
From: tkoenig at gcc dot gnu.org @ 2021-05-17  8:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |SUSPENDED
   Last reconfirmed|                            |2021-05-17
     Ever confirmed|0                           |1

--- Comment #1 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Hm, I just reran this with a much more recent version of trunk,
and the result is totally different.

So, please hold back on investigating this - this may have been
resolved in the meantime.


* [Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop
From: tkoenig at gcc dot gnu.org @ 2021-05-17  9:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|SUSPENDED                   |UNCONFIRMED
     Ever confirmed|1                           |0

--- Comment #2 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
So, with recent trunk now.

[tkoenig@gcc135 tmp]$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/home/tkoenig/libexec/gcc/powerpc64le-unknown-linux-gnu/12.0.0/lto-wrapper
Target: powerpc64le-unknown-linux-gnu
Configured with: ../trunk/configure --prefix=/home/tkoenig
--enable-languages=c,c++,fortran
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20210516 (experimental) (GCC)
[tkoenig@gcc135 tmp]$ cat add.c

unsigned int foo(unsigned int *a, int n)
{
  int i;
  unsigned int res = 0;
  for (i=0; i<n; i++)
    res += a[i];

  return res;
}

unsigned int foo2 (unsigned int *a, int n)
{
  int i;
  unsigned long res = 0;
  for (i=0; i<n; i++)
    res += a[i];

  return res;
}
[tkoenig@gcc135 tmp]$ gcc -O -c add.c  && objdump --disassemble add.o

add.o:     file format elf64-powerpcle


Disassembly of section .text:

0000000000000000 <foo>:
   0:   00 00 04 2c     cmpwi   r4,0
   4:   30 00 81 40     ble     34 <foo+0x34>
   8:   20 00 89 78     clrldi  r9,r4,32
   c:   fc ff 43 39     addi    r10,r3,-4
  10:   00 00 60 38     li      r3,0
  14:   a6 03 29 7d     mtctr   r9
  18:   04 00 0a 85     lwzu    r8,4(r10)
  1c:   14 1a 68 7c     add     r3,r8,r3
  20:   20 00 63 78     clrldi  r3,r3,32
  24:   ff ff 29 39     addi    r9,r9,-1
  28:   20 00 29 79     clrldi  r9,r9,32
  2c:   ec ff 00 42     bdnz    18 <foo+0x18>
  30:   20 00 80 4e     blr
  34:   00 00 60 38     li      r3,0
  38:   20 00 80 4e     blr
        ...

0000000000000048 <foo2>:
  48:   00 00 04 2c     cmpwi   r4,0
  4c:   30 00 81 40     ble     7c <foo2+0x34>
  50:   20 00 89 78     clrldi  r9,r4,32
  54:   fc ff 43 39     addi    r10,r3,-4
  58:   00 00 60 38     li      r3,0
  5c:   a6 03 29 7d     mtctr   r9
  60:   04 00 0a 85     lwzu    r8,4(r10)
  64:   14 42 63 7c     add     r3,r3,r8
  68:   ff ff 29 39     addi    r9,r9,-1
  6c:   20 00 29 79     clrldi  r9,r9,32
  70:   f0 ff 00 42     bdnz    60 <foo2+0x18>
  74:   20 00 63 78     clrldi  r3,r3,32
  78:   20 00 80 4e     blr
  7c:   00 00 60 38     li      r3,0
  80:   f4 ff ff 4b     b       74 <foo2+0x2c>


With -O2 -fno-unroll-loops (foo only):

0000000000000000 <foo>:
   0:   00 00 04 2c     cmpwi   r4,0
   4:   3c 00 81 40     ble     40 <foo+0x40>
   8:   20 00 8a 78     clrldi  r10,r4,32
   c:   fc ff 23 39     addi    r9,r3,-4
  10:   a6 03 49 7d     mtctr   r10
  14:   00 00 60 38     li      r3,0
  18:   00 00 00 60     nop
  1c:   00 00 42 60     ori     r2,r2,0
  20:   04 00 49 85     lwzu    r10,4(r9)
  24:   14 1a 6a 7c     add     r3,r10,r3
  28:   20 00 63 78     clrldi  r3,r3,32
  2c:   f4 ff 00 42     bdnz    20 <foo+0x20>
  30:   20 00 80 4e     blr
  34:   00 00 00 60     nop
  38:   00 00 00 60     nop
  3c:   00 00 42 60     ori     r2,r2,0
  40:   00 00 60 38     li      r3,0
  44:   20 00 80 4e     blr
        ...
  54:   00 00 00 60     nop
  58:   00 00 00 60     nop
  5c:   00 00 42 60     ori     r2,r2,0

The clrldi is still in there.


* [Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop
From: rguenth at gcc dot gnu.org @ 2021-05-17 12:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Why is masking not needed?  It looks like it is present in both cases: once
before the return, and once after the add (where it could be sunk).


* [Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop
From: tkoenig at gcc dot gnu.org @ 2021-05-17 13:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622

--- Comment #4 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Yes, the masking should be only performed at the end.

However, the inner loop could be further simplified to

label:
    lwzu r8,4(r10)
    add r3,r8,r3
    bdnz label

without the need to do anything with r9, so this is probably
more than one topic in one test case.
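
The point that masking once at the end is sufficient can be sketched in
C by modelling r3 as a 64-bit value and clrldi r3,r3,32 as an AND with
0xFFFFFFFF. This is only an illustration of the arithmetic, not of what
GCC does internally; all function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models the loop in foo: the upper 32 bits are cleared after every
   add, as the clrldi inside the loop does.  */
static uint64_t sum_mask_each_iteration(const uint32_t *a, int n)
{
  uint64_t r3 = 0;
  for (int i = 0; i < n; i++)
    r3 = (r3 + a[i]) & UINT64_C(0xFFFFFFFF);
  return r3;
}

/* Models the loop in foo2: accumulate in the full 64-bit register and
   clear the upper 32 bits once, after the loop.  */
static uint64_t sum_mask_at_end(const uint32_t *a, int n)
{
  uint64_t r3 = 0;
  for (int i = 0; i < n; i++)
    r3 += a[i];
  return r3 & UINT64_C(0xFFFFFFFF);
}

/* Both variants must agree, because the low 32 bits of a sum depend
   only on the low 32 bits of its operands.  */
static void check_masking_can_be_sunk(void)
{
  const uint32_t a[] = { 0xFFFFFFFFu, 0xFFFFFFFFu, 7u, 0x80000001u };
  assert(sum_mask_each_iteration(a, 4) == sum_mask_at_end(a, 4));
}
```

The design choice is to keep the two loops textually parallel so that the
only difference is where the mask is applied, mirroring the position of
the clrldi in the two disassemblies.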


* [Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop
From: segher at gcc dot gnu.org @ 2021-05-17 22:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622

Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #5 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Thomas Koenig from comment #4)
> Yes, the masking should be only performed at the end.
> 
> However, the inner loop could be further simplified to
> 
> label:
>     lwzu r8,4(r10)
>     add r3,r8,r3
>     bdnz label
> 
> without the need to do anything with r9, so this is probably
> more than one topic in one test case.

Please use -O2 instead, no one will care much about -O1.  You can use
-fno-unroll-loops to make it easier to read.

The core for foo is

.L3:
        lwzu 10,4(9)
        add 3,10,3
        rldicl 3,3,0,32
        bdnz .L3

and for foo2 is

.L10:
        lwzu 10,4(9)
        add 3,3,10
        bdnz .L10

It is already this way in GIMPLE: the IV is DImode, while it would
be better as SImode.  That is the root of the problem here.  Sinking
extensions could well help, but the IV should not be DImode in the first
place!

Confirmed.


* [Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop
From: guojiufu at gcc dot gnu.org @ 2021-06-08  6:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622

Jiu Fu Guo <guojiufu at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED
                 CC|                            |guojiufu at gcc dot gnu.org

--- Comment #6 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
I tested this; the issue has been fixed on trunk by r12-1202.


* [Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop
From: segher at gcc dot gnu.org @ 2021-06-08 12:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622

--- Comment #7 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Nice :-)


* [Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop
From: pinskia at gcc dot gnu.org @ 2021-09-17  6:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0
