public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/66278] New: Missed auto-vectorization of an array subtraction
From: marxin at gcc dot gnu.org @ 2015-05-25 13:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66278
Bug ID: 66278
Summary: Missed auto-vectorization of an array subtraction
Product: gcc
Version: 5.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: marxin at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-linux-gnu
Hello.
In the following test case, GCC does not vectorize the loop: the generated
assembly contains no vector instructions.
$ cat vector.c
#include <stdint.h>

#define N 101

int main(int argc, char **argv)
{
  uint32_t array[N][N][N];
  const unsigned int next = argc == 3 ? 0 : 1;

  for (unsigned i = next; i < N; i++)
    array[3][3][i] = array[3][3][i] - 10;

  return array[3][3][argc];
}
gcc 5.1.0 (same for GCC 4.8) with -O3 (http://goo.gl/zA7LMy):
main:
xorl %eax, %eax
subq $4121104, %rsp
cmpl $3, %edi
setne %al
.L2:
movl %eax, %edx
addl $1, %eax
subl $10, 123504(%rsp,%rdx,4)
cmpl $101, %eax
jne .L2
movslq %edi, %rdi
movl 123504(%rsp,%rdi,4), %eax
addq $4121104, %rsp
ret
icc 13.0.1 with -O3 (http://goo.gl/xzlz2C):
L__routine_start_main_0:
main:
pushq %rbp #6.1
movq %rsp, %rbp #6.1
andq $-128, %rsp #6.1
pushq %r12 #6.1
subq $4121208, %rsp #6.1
movl %edi, %r12d #6.1
movl $3, %edi #6.1
call __intel_new_proc_init #6.1
stmxcsr (%rsp) #6.1
movslq %r12d, %r12 #6.1
xorl %edi, %edi #9.37
movl $1, %esi #9.37
cmpq $3, %r12 #9.37
cmove %edi, %esi #9.37
orl $32832, (%rsp) #6.1
ldmxcsr (%rsp) #6.1
movl %esi, %ecx #11.3
negl %ecx #11.3
addl $101, %ecx #11.3
lea 123624(%rsp,%rsi,4), %rax #11.3
andq $15, %rax #11.3
movl %eax, %edx #11.3
negl %edx #11.3
addl $16, %edx #11.3
shrl $2, %edx #11.3
testl %eax, %eax #11.3
cmovne %edx, %eax #11.3
lea 4(%rax), %r8d #11.3
cmpl %r8d, %ecx #11.3
jl ..B1.16 # Prob 10% #11.3
movl %ecx, %edx #11.3
subl %eax, %edx #11.3
andl $3, %edx #11.3
negl %edx #11.3
addl %ecx, %edx #11.3
testl %eax, %eax #11.3
jbe ..B1.8 # Prob 10% #11.3
..B1.6: # Preds ..B1.4 ..B1.6
lea (%rsi,%rdi), %r8d #12.22
incl %edi #11.3
addl $-10, 123624(%rsp,%r8,4) #12.39
cmpl %eax, %edi #11.3
jb ..B1.6 # Prob 99% #11.3
..B1.8: # Preds ..B1.6 ..B1.4
movdqa .L_2il0floatpacket.2(%rip), %xmm0 #12.39
..B1.9: # Preds ..B1.9 ..B1.8
lea (%rsi,%rax), %edi #12.22
addl $4, %eax #11.3
cmpl %edx, %eax #11.3
movdqa 123624(%rsp,%rdi,4), %xmm1 #12.39
paddd %xmm0, %xmm1 #12.39
movdqa %xmm1, 123624(%rsp,%rdi,4) #12.5
jb ..B1.9 # Prob 99% #11.3
..B1.11: # Preds ..B1.9 ..B1.16
cmpl %ecx, %edx #11.3
jae ..B1.15 # Prob 10% #11.3
..B1.13: # Preds ..B1.11 ..B1.13
lea (%rsi,%rdx), %eax #12.22
incl %edx #11.3
addl $-10, 123624(%rsp,%rax,4) #12.39
cmpl %ecx, %edx #11.3
jb ..B1.13 # Prob 99% #11.3
..B1.15: # Preds ..B1.13 ..B1.11
movl 123624(%rsp,%r12,4), %eax #14.10
addq $4121208, %rsp #14.10
popq %r12 #14.10
movq %rbp, %rsp #14.10
popq %rbp #14.10
ret #14.10
..B1.16: # Preds ..B1.3 # Infreq
movl %edi, %edx #11.3
jmp ..B1.11 # Prob 100% #11.3
.L_2il0floatpacket.2:
.long 0xfffffff6,0xfffffff6,0xfffffff6,0xfffffff6
Thanks,
Martin
* [Bug tree-optimization/66278] Missed auto-vectorization of an array subtraction
From: jakub at gcc dot gnu.org @ 2015-05-25 17:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66278
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Seems to be overly conservative tree-chrec.c code.
chrec_fold_plus is called on
(sizetype) {i_5, +, 1}_1 and 30906
where loop->nb_iterations_upper_bound is 100 and
loop->nb_iterations is
(unsigned int) _4 + 1 <= 101 ? 100 - (unsigned int) _4 : 0
It is a pity we don't use range info on _4 when simplifying
loop->nb_iterations; here it is [0, 1], so we could at least easily find out
that (unsigned int) _4 + 1 <= 101 is always true.
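
The range argument above can be checked with a small stand-alone sketch. This is not GCC code: the function names are hypothetical, `next` plays the role of the SSA name _4, and it assumes loop->nb_iterations counts latch executions, that is, one fewer than the number of times the loop body runs.

```c
#include <assert.h>

#define N 101u

/* The guarded expression quoted above, with `next` standing in for _4. */
static unsigned niter_guarded(unsigned next)
{
    return next + 1 <= N ? (N - 1) - next : 0;
}

/* What the expression reduces to once the guard is known to be true. */
static unsigned niter_simplified(unsigned next)
{
    assert(next <= 1);      /* the range [0, 1] recorded for _4 */
    return (N - 1) - next;
}

/* Count latch executions of the test case's loop directly. */
static unsigned count_latch(unsigned next)
{
    unsigned body = 0;
    for (unsigned i = next; i < N; i++)
        body++;
    return body - 1;        /* the latch runs one time fewer than the body */
}
```

For both values in the range, the guarded and simplified forms agree with the directly counted value, so under these assumptions the conditional carries no information once the range of _4 is taken into account.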
Anyway, chrec_fold_plus just gives up on:
    CASE_CONVERT:
      if (tree_contains_chrecs (op0, NULL))
        return chrec_dont_know;
even though, from the loop bounds in this case, it could prove that for all loop
iterations the chrec always fits into the narrower type, including the addition,
and thus it could safely move the addition into the chrec's op0.
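
To make the numbers concrete, here is a minimal sketch (hypothetical helper names, not GCC internals) of where the constant 30906 comes from and why folding it into the narrower type is safe here. The constant is the flattened element index of array[3][3][0]; multiplied by 4 bytes it gives 123624, which is consistent with the displacement in the icc listing. The sketch treats the narrower type as a 32-bit unsigned and sizetype as size_t, both of which are assumptions about the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 101u

/* Flattened element index of array[a][b][i] for uint32_t array[N][N][N]. */
static uint32_t flat_index(uint32_t a, uint32_t b, uint32_t i)
{
    assert(a < N && b < N && i < N);
    return (a * N + b) * N + i;
}

/* Nonzero if adding c to every value in [0, max_i] stays within 32 bits,
   so the addition may be performed before the widening conversion. */
static int add_fits_narrow(uint32_t max_i, uint32_t c)
{
    return max_i <= UINT32_MAX - c;
}

/* Add after widening: (sizetype) i + c. */
static size_t widen_then_add(uint32_t i, uint32_t c)
{
    return (size_t)i + c;
}

/* Add in the narrow type, then widen: (sizetype) (i + c). */
static size_t add_then_widen(uint32_t i, uint32_t c)
{
    return (size_t)(uint32_t)(i + c);
}
```

With i bounded by 100, add_fits_narrow(100, 30906) holds, and the two orders of evaluation give the same result for every iteration. That is the property the loop bounds would let the chrec code establish instead of returning chrec_dont_know.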
If I change [3][3] to [0][0], then the problem is in chrec_fold_multiply
instead (again, we have (sizetype) i_16 * 4 and give up because of the cast).
If the testcase is modified so that i and next have type unsigned long,
then it is vectorized fine.
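
A sketch of that type change, for illustration only. The function names are hypothetical, and the loop is factored into a helper that works on a single row passed by pointer rather than on array[3][3] on the stack, so it may not reproduce the missed vectorization exactly. It only shows which declarations differ. On an LP64 target unsigned long has the same width as sizetype, so the index needs no widening conversion.

```c
#include <assert.h>
#include <stdint.h>

#define N 101

/* Variant with an unsigned long induction variable and start value. */
static void sub10_ulong(uint32_t row[N], unsigned long next)
{
    assert(next <= N);
    for (unsigned long i = next; i < N; i++)
        row[i] = row[i] - 10;
}

/* The original form, indexed with a 32-bit unsigned, for comparison. */
static void sub10_uint(uint32_t row[N], unsigned next)
{
    assert(next <= N);
    for (unsigned i = next; i < N; i++)
        row[i] = row[i] - 10;
}
```

Both functions compute the same result. Compiling each with -O3 -fopt-info-vec is one way to see which loops a given GCC version reports as vectorized.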