public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/65335] New: Potential optimization issue with 'tree-loop-vectorize'
@ 2015-03-06 16:16 anwilli5 at ncsu dot edu
From: anwilli5 at ncsu dot edu @ 2015-03-06 16:16 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65335
Bug ID: 65335
Summary: Potential optimization issue with 'tree-loop-vectorize'
Product: gcc
Version: 4.9.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: anwilli5 at ncsu dot edu
When I enable the tree-loop-vectorize optimization, I see some behavior that I
don't understand.
Here is a minimized test case that highlights the scenario:
typedef long unsigned int size_t;
extern void *malloc (size_t __size);
int main(){
  unsigned int a = 2;
  unsigned int *buffer = malloc(10000 * sizeof(*buffer));
  for (int i = 0; i < 10000; i++){
    if ((i % 1000) == 0){
      a = a * a * a * a * a;
    }
    buffer[i] = a;
  }
  return buffer[999];
}
When compiled with the option disabled (xgcc -save-temps -m32 -Wall -Wextra
-std=c99 -O3 -fno-tree-loop-vectorize -S -masm=intel test.c), GCC produces the
following code:
        mov ebx, 2 ; a = 2
        mov edi, 274877907
        sub esp, 20
        push 40000
        call malloc
        add esp, 16
        mov esi, eax ; buffer = malloc(...)
        xor ecx, ecx ; i = 0
        .p2align 4,,10
        .p2align 3
.L3:
        mov eax, ecx
        imul edi
        mov eax, ecx
        sar eax, 31
        sar edx, 6
        sub edx, eax
        imul edx, edx, 1000
        cmp ecx, edx
        jne .L2 ; if ((i % 1000) == 0) {
        mov eax, ebx
        imul eax, ebx
        imul eax, eax
        imul ebx, eax ; a = a * a * a * a * a; }
.L2:
        mov DWORD PTR [esi+ecx*4], ebx ; buffer[i] = a
        add ecx, 1 ; i++
        cmp ecx, 10000
        jne .L3 ; continue if i < 10000
        ...
With the option enabled (xgcc -save-temps -m32 -Wall -Wextra -std=c99 -O3
-ftree-loop-vectorize test.c -S -masm=intel), however, GCC generates the
following code:
        mov esi, 2 ; a = 2
        sub esp, 20
        push 40000
        call malloc
        add esp, 16
        mov edi, eax ; buffer = malloc(...)
        xor ecx, ecx ; i = 0
        .p2align 4,,10
        .p2align 3
.L2:
        mov ebx, esi
        mov eax, 274877907
        imul ecx
        mov eax, ecx
        imul ebx, esi
        sar eax, 31
        sar edx, 6
        imul ebx, ebx
        sub edx, eax
        imul edx, edx, 1000
        imul ebx, esi ; a = a * a * a * a * a;
        cmp ecx, edx
        cmove esi, ebx ; move new value if ((i % 1000) == 0)
        mov DWORD PTR [edi+ecx*4], esi ; buffer[i] = a
        add ecx, 1 ; i++
        cmp ecx, 10000
        jne .L2 ; continue if i < 10000
The main difference is that the 'a * a * a * a * a' calculation is now done on
every loop iteration instead of every 1000th, while a is only assigned the new
value every 1000th time, via the conditional move (cmove) instruction. This
looks inefficient, and in basic testing the code compiled without the
tree-loop-vectorize optimization appears to run faster on my machine.
The "real" code that I derived this from is worse: it uses 64-bit data types in
a 32-bit binary, so each logical multiplication in the code needs several
multiply instructions, some intermediate values are spilled to the stack, and
after computing everything into registers, the code simply replaces those
values with ones stored on the stack from previous iterations whenever the
modulus condition is not met. :(
GCC version:
Using built-in specs.
COLLECT_GCC=xgcc
Target: x86_64-unknown-linux-gnu
Configured with: ./configure
Thread model: posix
gcc version 4.9.2 (GCC)
I've also reproduced the issue with gcc 4.9.2 20141224 (prerelease) on an Arch
Linux distro and with gcc 4.5.2-8ubuntu4 on an Ubuntu distro.
I'm happy to provide any other information needed. Thanks!
Thread overview: 4+ messages
2015-03-06 16:16 [Bug tree-optimization/65335] New: Potential optimization issue with 'tree-loop-vectorize' anwilli5 at ncsu dot edu
2015-03-09 12:06 ` [Bug tree-optimization/65335] " rguenth at gcc dot gnu.org
2021-08-21 18:59 ` pinskia at gcc dot gnu.org
2021-08-23 6:44 ` rguenth at gcc dot gnu.org