public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/100622] New: Conversion to smaller unsigned type in loop
From: tkoenig at gcc dot gnu.org @ 2021-05-16 8:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622
Bug ID: 100622
Summary: Conversion to smaller unsigned type in loop
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: tkoenig at gcc dot gnu.org
Target Milestone: ---
Consider
unsigned int foo(unsigned int *a, int n)
{
int i;
unsigned int res = 0;
for (i=0; i<n; i++)
res += a[i];
return res;
}
unsigned int foo2 (unsigned int *a, int n)
{
int i;
unsigned long res = 0;
for (i=0; i<n; i++)
res += a[i];
return res;
}
Given modular 2^n arithmetic, these two functions are identical in effect.
On POWER, a reasonably recent trunk compiles these two functions as follows
(using -O1 to avoid loop unrolling, for better visibility):
$ gcc -O1 -c add.c && objdump --disassemble add.o
add.o: file format elf64-powerpcle
Disassembly of section .text:
0000000000000000 <foo>:
0: 00 00 04 2c cmpwi r4,0
4: 30 00 81 40 ble 34 <foo+0x34>
8: 20 00 89 78 clrldi r9,r4,32
c: fc ff 43 39 addi r10,r3,-4
10: 00 00 60 38 li r3,0
14: a6 03 29 7d mtctr r9
18: 04 00 0a 85 lwzu r8,4(r10)
1c: 14 1a 68 7c add r3,r8,r3
20: 20 00 63 78 clrldi r3,r3,32
24: ff ff 29 39 addi r9,r9,-1
28: 20 00 29 79 clrldi r9,r9,32
2c: ec ff 00 42 bdnz 18 <foo+0x18>
30: 20 00 80 4e blr
34: 00 00 60 38 li r3,0
38: 20 00 80 4e blr
...
0000000000000048 <foo2>:
48: 00 00 04 2c cmpwi r4,0
4c: 30 00 81 40 ble 7c <foo2+0x34>
50: 20 00 89 78 clrldi r9,r4,32
54: fc ff 43 39 addi r10,r3,-4
58: 00 00 60 38 li r3,0
5c: a6 03 29 7d mtctr r9
60: 04 00 0a 85 lwzu r8,4(r10)
64: 14 42 63 7c add r3,r3,r8
68: ff ff 29 39 addi r9,r9,-1
6c: 20 00 29 79 clrldi r9,r9,32
70: f0 ff 00 42 bdnz 60 <foo2+0x18>
74: 20 00 63 78 clrldi r3,r3,32
78: 20 00 80 4e blr
7c: 00 00 60 38 li r3,0
80: f4 ff ff 4b b 74 <foo2+0x2c>
So foo has an extra instruction inside the loop (the clrldi at offset 0x20)
that masks the result of the addition. This should not be needed.
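The claim above that foo and foo2 are identical in effect can be checked with a small sketch. The two functions are copied from the report; same_result and check_wraparound are hypothetical helpers added only for illustration, and the expected value assumes a 32-bit unsigned int.

```c
#include <assert.h>
#include <limits.h>

/* Accumulates in unsigned int: every add wraps modulo 2^32. */
unsigned int foo(unsigned int *a, int n)
{
    int i;
    unsigned int res = 0;
    for (i = 0; i < n; i++)
        res += a[i];
    return res;
}

/* Accumulates in unsigned long: truncated only by the return conversion. */
unsigned int foo2(unsigned int *a, int n)
{
    int i;
    unsigned long res = 0;
    for (i = 0; i < n; i++)
        res += a[i];
    return res;
}

/* Hypothetical helper: 1 when both variants agree on the given input. */
int same_result(unsigned int *a, int n)
{
    return foo(a, n) == foo2(a, n);
}

/* Hypothetical self-check with inputs that overflow 32 bits several times.
   Assumes unsigned int is 32 bits wide. */
void check_wraparound(void)
{
    unsigned int a[] = { UINT_MAX, UINT_MAX, 7u, 0x80000000u };
    assert(same_result(a, 4));
    /* (2^32-1) + (2^32-1) + 7 + 2^31 = 2^33 + 2^31 + 5, i.e. 0x80000005 mod 2^32 */
    assert(foo(a, 4) == 0x80000005u);
}
```

Because truncation to 32 bits commutes with addition, it does not matter whether the wraparound happens on every iteration or once at the return.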
* [Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop
From: tkoenig at gcc dot gnu.org @ 2021-05-17 8:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622
Thomas Koenig <tkoenig at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |SUSPENDED
Last reconfirmed| |2021-05-17
Ever confirmed|0 |1
--- Comment #1 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Hm, I just reran this with a much more recent version of trunk,
and the result is totally different.
So, please hold back on investigating this - this may have been
resolved in the meantime.
* [Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop
From: tkoenig at gcc dot gnu.org @ 2021-05-17 9:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622
Thomas Koenig <tkoenig at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|SUSPENDED |UNCONFIRMED
Ever confirmed|1 |0
--- Comment #2 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
So, with recent trunk now.
[tkoenig@gcc135 tmp]$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/home/tkoenig/libexec/gcc/powerpc64le-unknown-linux-gnu/12.0.0/lto-wrapper
Target: powerpc64le-unknown-linux-gnu
Configured with: ../trunk/configure --prefix=/home/tkoenig
--enable-languages=c,c++,fortran
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20210516 (experimental) (GCC)
[tkoenig@gcc135 tmp]$ cat add.c
unsigned int foo(unsigned int *a, int n)
{
int i;
unsigned int res = 0;
for (i=0; i<n; i++)
res += a[i];
return res;
}
unsigned int foo2 (unsigned int *a, int n)
{
int i;
unsigned long res = 0;
for (i=0; i<n; i++)
res += a[i];
return res;
}
[tkoenig@gcc135 tmp]$ gcc -O -c add.c && objdump --disassemble add.o
add.o: file format elf64-powerpcle
Disassembly of section .text:
0000000000000000 <foo>:
0: 00 00 04 2c cmpwi r4,0
4: 30 00 81 40 ble 34 <foo+0x34>
8: 20 00 89 78 clrldi r9,r4,32
c: fc ff 43 39 addi r10,r3,-4
10: 00 00 60 38 li r3,0
14: a6 03 29 7d mtctr r9
18: 04 00 0a 85 lwzu r8,4(r10)
1c: 14 1a 68 7c add r3,r8,r3
20: 20 00 63 78 clrldi r3,r3,32
24: ff ff 29 39 addi r9,r9,-1
28: 20 00 29 79 clrldi r9,r9,32
2c: ec ff 00 42 bdnz 18 <foo+0x18>
30: 20 00 80 4e blr
34: 00 00 60 38 li r3,0
38: 20 00 80 4e blr
...
0000000000000048 <foo2>:
48: 00 00 04 2c cmpwi r4,0
4c: 30 00 81 40 ble 7c <foo2+0x34>
50: 20 00 89 78 clrldi r9,r4,32
54: fc ff 43 39 addi r10,r3,-4
58: 00 00 60 38 li r3,0
5c: a6 03 29 7d mtctr r9
60: 04 00 0a 85 lwzu r8,4(r10)
64: 14 42 63 7c add r3,r3,r8
68: ff ff 29 39 addi r9,r9,-1
6c: 20 00 29 79 clrldi r9,r9,32
70: f0 ff 00 42 bdnz 60 <foo2+0x18>
74: 20 00 63 78 clrldi r3,r3,32
78: 20 00 80 4e blr
7c: 00 00 60 38 li r3,0
80: f4 ff ff 4b b 74 <foo2+0x2c>
With -O2 -fno-unroll-loops (foo only):
0000000000000000 <foo>:
0: 00 00 04 2c cmpwi r4,0
4: 3c 00 81 40 ble 40 <foo+0x40>
8: 20 00 8a 78 clrldi r10,r4,32
c: fc ff 23 39 addi r9,r3,-4
10: a6 03 49 7d mtctr r10
14: 00 00 60 38 li r3,0
18: 00 00 00 60 nop
1c: 00 00 42 60 ori r2,r2,0
20: 04 00 49 85 lwzu r10,4(r9)
24: 14 1a 6a 7c add r3,r10,r3
28: 20 00 63 78 clrldi r3,r3,32
2c: f4 ff 00 42 bdnz 20 <foo+0x20>
30: 20 00 80 4e blr
34: 00 00 00 60 nop
38: 00 00 00 60 nop
3c: 00 00 42 60 ori r2,r2,0
40: 00 00 60 38 li r3,0
44: 20 00 80 4e blr
...
54: 00 00 00 60 nop
58: 00 00 00 60 nop
5c: 00 00 42 60 ori r2,r2,0
The clrldi is still in there.
* [Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop
From: rguenth at gcc dot gnu.org @ 2021-05-17 12:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Why is masking not needed? It looks like it is present in both cases, once
before the return and once after the add (that could be sunk).
* [Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop
From: tkoenig at gcc dot gnu.org @ 2021-05-17 13:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622
--- Comment #4 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Yes, the masking should be only performed at the end.
However, the inner loop could be further simplified to
label:
lwzu r8,4(r10)
add r3,r8,r3
bdnz label
without the need to do anything with r9, so this is probably
more than one topic in one test case.
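As a rough register-level sketch of why the mask can be sunk out of the loop, the following models r3 as a 64-bit value. It assumes clrldi rX,rX,32 simply clears the upper 32 bits, and that lwzu zero-extends the loaded word. The function names are hypothetical and not taken from GCC.

```c
#include <stdint.h>

/* Model of `clrldi rX,rX,32`: clear the upper 32 bits of a 64-bit register. */
uint64_t clear_high32(uint64_t reg)
{
    return reg & UINT64_C(0xFFFFFFFF);
}

/* Loop shape currently generated for foo: mask after every add. */
uint64_t sum_mask_each(const uint32_t *a, int n)
{
    uint64_t r3 = 0;
    for (int i = 0; i < n; i++)
        r3 = clear_high32(r3 + a[i]);   /* lwzu; add; clrldi */
    return r3;
}

/* Desired loop shape: plain 64-bit adds, one mask before returning. */
uint64_t sum_mask_once(const uint32_t *a, int n)
{
    uint64_t r3 = 0;
    for (int i = 0; i < n; i++)
        r3 += a[i];                     /* lwzu; add */
    return clear_high32(r3);            /* single clrldi after the loop */
}
```

Both functions return the same value for every input, since the low 32 bits of a sum depend only on the low 32 bits of its operands.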
* [Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop
From: segher at gcc dot gnu.org @ 2021-05-17 22:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622
Segher Boessenkool <segher at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
--- Comment #5 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Thomas Koenig from comment #4)
> Yes, the masking should be only performed at the end.
>
> However, the inner loop could be further simplified to
>
> label:
> lwzu r8,4(r10)
> add r3,r8,r3
> bdnz label
>
> without the need to do anything with r9, so this is probably
> more than one topic in one test case.
Please use -O2 instead, no one will care much about -O1. You can use
-fno-unroll-loops to make it easier to read.
The core for foo is
.L3:
lwzu 10,4(9)
add 3,10,3
rldicl 3,3,0,32
bdnz .L3
and for foo2 is
.L10:
lwzu 10,4(9)
add 3,3,10
bdnz .L10
It is already this way in GIMPLE: the IV is DImode, while it would
be better as SImode. That is the root of the problem here. Sinking
extensions could well help, but the IV should not be DImode in the first
place!
Confirmed.
* [Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop
From: guojiufu at gcc dot gnu.org @ 2021-06-08 6:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622
Jiu Fu Guo <guojiufu at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
CC| |guojiufu at gcc dot gnu.org
--- Comment #6 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
I tested this; the issue has been fixed on trunk by r12-1202.
* [Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop
From: segher at gcc dot gnu.org @ 2021-06-08 12:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622
--- Comment #7 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Nice :-)
* [Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop
From: pinskia at gcc dot gnu.org @ 2021-09-17 6:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |12.0