public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt
@ 2013-07-09 7:55 vincenzo.innocente at cern dot ch
2013-07-09 9:44 ` [Bug tree-optimization/57858] " jakub at gcc dot gnu.org
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2013-07-09 7:55 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858
Bug ID: 57858
Summary: AVX2: ymm used for div, not for sqrt
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
In the following example, div uses ymm registers while sqr uses only xmm ones.
gcc version 4.9.0 20130630 (experimental) [trunk revision 200570] (GCC)
cat avx2sqrt.cc
#include<math.h>
double div() {
double s=0;
for (int i=0; i!=1024; ++i) s+=1./(i+1);
return s;
}
double sqr() {
double s=0;
for (int i=0; i!=1024; ++i) s+=sqrt(i+1);
return s;
}
c++ -std=c++11 -Ofast -S avx2sqrt.cc -march=corei7-avx -mavx2 -ftree-vectorizer-verbose=1 -Wall ; cat avx2sqrt.s
_Z3divv:
.LFB3:
.cfi_startproc
vmovdqa .LC1(%rip), %ymm6
xorl %eax, %eax
vxorpd %xmm1, %xmm1, %xmm1
vmovdqa .LC0(%rip), %ymm0
vmovdqa .LC2(%rip), %ymm5
vmovapd .LC3(%rip), %ymm2
jmp .L2
.p2align 4,,10
.p2align 3
.L3:
vmovdqa %ymm4, %ymm0
.L2:
vpaddd %ymm6, %ymm0, %ymm4
vpaddd %ymm5, %ymm0, %ymm0
addl $1, %eax
vextracti128 $0x1, %ymm0, %xmm3
vcvtdq2pd %xmm0, %ymm0
vcvtdq2pd %xmm3, %ymm3
vdivpd %ymm0, %ymm2, %ymm0
vdivpd %ymm3, %ymm2, %ymm3
vaddpd %ymm0, %ymm3, %ymm0
cmpl $128, %eax
vaddpd %ymm0, %ymm1, %ymm1
jne .L3
vhaddpd %ymm1, %ymm1, %ymm1
vperm2f128 $1, %ymm1, %ymm1, %ymm0
vaddpd %ymm0, %ymm1, %ymm0
vzeroupper
ret
.cfi_endproc
.LFE3:
.size _Z3divv, .-_Z3divv
.p2align 4,,15
.globl _Z3sqrv
.type _Z3sqrv, @function
_Z3sqrv:
.LFB4:
.cfi_startproc
movl $1, %eax
vmovsd .LC4(%rip), %xmm1
vxorpd %xmm0, %xmm0, %xmm0
jmp .L6
.p2align 4,,10
.p2align 3
.L7:
vcvtsi2sd %eax, %xmm1, %xmm1
vsqrtsd %xmm1, %xmm1, %xmm1
.L6:
addl $1, %eax
vaddsd %xmm1, %xmm0, %xmm0
cmpl $1025, %eax
jne .L7
rep; ret
.cfi_endproc
.LFE4:
.size _Z3sqrv, .-_Z3sqrv
* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
2013-07-09 7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
@ 2013-07-09 9:44 ` jakub at gcc dot gnu.org
2013-07-09 13:49 ` vincenzo.innocente at cern dot ch
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-07-09 9:44 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I'll look at this.
* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
2013-07-09 7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
2013-07-09 9:44 ` [Bug tree-optimization/57858] " jakub at gcc dot gnu.org
@ 2013-07-09 13:49 ` vincenzo.innocente at cern dot ch
2013-07-09 15:33 ` glisse at gcc dot gnu.org
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2013-07-09 13:49 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858
--- Comment #2 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
Actually, the code for div and sqr already differs with plain SSE:
c++ -std=c++11 -Ofast -S avx2sqrt.cc -ftree-vectorizer-verbose=1 -Wall ; cat avx2sqrt.s
.L2:
movdqa %xmm0, %xmm1
addl $1, %eax
movdqa %xmm0, %xmm4
cmpl $256, %eax
paddd %xmm5, %xmm1
pshufd $238, %xmm1, %xmm0
cvtdq2pd %xmm1, %xmm1
movapd %xmm3, %xmm7
paddd %xmm6, %xmm4
cvtdq2pd %xmm0, %xmm0
divpd %xmm0, %xmm7
movapd %xmm7, %xmm0
movapd %xmm3, %xmm7
divpd %xmm1, %xmm7
addpd %xmm7, %xmm0
addpd %xmm0, %xmm2
jne .L3
movapd %xmm2, -24(%rsp)
movsd -16(%rsp), %xmm0
addsd %xmm2, %xmm0
ret
.cfi_endproc
.LFE3:
.size _Z3divv, .-_Z3divv
.p2align 4,,15
.globl _Z3sqrv
.type _Z3sqrv, @function
_Z3sqrv:
.LFB4:
.cfi_startproc
movl $1, %eax
movsd .LC4(%rip), %xmm1
xorpd %xmm0, %xmm0
jmp .L6
.p2align 4,,10
.p2align 3
.L7:
cvtsi2sd %eax, %xmm1
sqrtsd %xmm1, %xmm1
.L6:
addl $1, %eax
addsd %xmm1, %xmm0
cmpl $1025, %eax
jne .L7
rep; ret
.cfi_endproc
* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
2013-07-09 7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
2013-07-09 9:44 ` [Bug tree-optimization/57858] " jakub at gcc dot gnu.org
2013-07-09 13:49 ` vincenzo.innocente at cern dot ch
@ 2013-07-09 15:33 ` glisse at gcc dot gnu.org
2013-07-10 6:37 ` jakub at gcc dot gnu.org
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-07-09 15:33 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858
--- Comment #3 from Marc Glisse <glisse at gcc dot gnu.org> ---
-fno-tree-pre lets it vectorize sqr as well. PRE creates a jump to the middle
of the loop body, which is nice but prevents vectorization.
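
For a single-function experiment, the same effect can probably be approximated without changing the global command line. This is a minimal sketch, assuming GCC's `optimize` function attribute accepts `no-tree-pre` as the per-function spelling of `-fno-tree-pre` (the thread itself only confirms the command-line flag). The names `NO_PRE`, `sqr_reference`, `sqr_no_pre` and `check_no_pre` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper macro: GCC's optimize attribute on GCC, nothing elsewhere.
#if defined(__GNUC__) && !defined(__clang__)
#define NO_PRE __attribute__((optimize("no-tree-pre")))
#else
#define NO_PRE
#endif

// The reporter's sqr() loop, compiled with whatever flags are on the command line.
double sqr_reference() {
    double s = 0;
    for (int i = 0; i != 1024; ++i) s += std::sqrt(static_cast<double>(i + 1));
    return s;
}

// Same loop, with PRE requested off for this function only (assumption, see above).
NO_PRE double sqr_no_pre() {
    double s = 0;
    for (int i = 0; i != 1024; ++i) s += std::sqrt(static_cast<double>(i + 1));
    return s;
}

// Both variants must agree up to rounding, whatever the optimizer does.
void check_no_pre() {
    double a = sqr_reference();
    double b = sqr_no_pre();
    assert(std::fabs(a - b) <= 1e-9 * std::fabs(a));
}
```

Whether the attribute really changes the generated code has to be confirmed by inspecting the assembly for `vsqrtpd` versus `vsqrtsd`; the check above only guards against a wrong result.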
* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
2013-07-09 7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
` (2 preceding siblings ...)
2013-07-09 15:33 ` glisse at gcc dot gnu.org
@ 2013-07-10 6:37 ` jakub at gcc dot gnu.org
2013-07-10 9:51 ` vincenzo.innocente at cern dot ch
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-07-10 6:37 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Actually, it isn't vectorized at all, because PRE attempts to be smart, figures
out that for the first iteration of the loop it can avoid computing the sqrt
because the result will be one, and thus moves the sqrt call into the latch,
but we can't vectorize any loops that have non-empty latches.
So either the vectorizer would need to undo this transformation, or PRE would
have to not do it at all, or do it only after vectorization. Richard,
any thoughts on this?
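
The transformation described above can be pictured at source level. This is a hand-written approximation that mirrors the scalar assembly from the report, not the compiler's actual intermediate representation; the names `sqr_source`, `sqr_after_pre` and `check_pre_shape` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// The loop as written in the report.
double sqr_source() {
    double s = 0;
    for (int i = 0; i != 1024; ++i) s += std::sqrt(static_cast<double>(i + 1));
    return s;
}

// Approximate shape after PRE: the first sqrt is folded to a constant and
// every later sqrt is computed on the back edge (the "latch") of the loop.
double sqr_after_pre() {
    double s = 0;
    double t = 1.0;  // sqrt(0 + 1), known at compile time
    int i = 0;
    for (;;) {
        s += t;                // body: add the value computed before entry or in the latch
        ++i;
        if (i == 1024) break;  // exit test
        t = std::sqrt(static_cast<double>(i + 1));  // non-empty latch
    }
    return s;
}

// Both shapes perform the same 1024 additions in the same order.
void check_pre_shape() {
    double a = sqr_source();
    double b = sqr_after_pre();
    assert(std::fabs(a - b) <= 1e-9 * std::fabs(a));
}
```

The work left on the back edge is what makes the latch non-empty, which is the property the vectorizer rejected at the time.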
* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
2013-07-09 7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
` (3 preceding siblings ...)
2013-07-10 6:37 ` jakub at gcc dot gnu.org
@ 2013-07-10 9:51 ` vincenzo.innocente at cern dot ch
2021-07-30 6:06 ` pinskia at gcc dot gnu.org
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2013-07-10 9:51 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858
--- Comment #5 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
I remember something similar in the past:
--param max-completely-peel-times=1
sort of fixes it (by the way, why does PRE not recognize that 1/(1+0) == 1?).
Of course it is just a benchmark (and I can modify it to avoid the loop peeling),
still
* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
2013-07-09 7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
` (4 preceding siblings ...)
2013-07-10 9:51 ` vincenzo.innocente at cern dot ch
@ 2021-07-30 6:06 ` pinskia at gcc dot gnu.org
2021-07-30 9:16 ` rguenth at gcc dot gnu.org
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-07-30 6:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So this was fixed in GCC 8 but I cannot tell by what. ch_vect has been there
since 2014 which should have done the copying of the header but did not until
GCC 8. There is not enough debug output to tell what changed either.
* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
2013-07-09 7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
` (5 preceding siblings ...)
2021-07-30 6:06 ` pinskia at gcc dot gnu.org
@ 2021-07-30 9:16 ` rguenth at gcc dot gnu.org
2021-07-30 22:56 ` pinskia at gcc dot gnu.org
2021-09-11 6:37 ` pinskia at gcc dot gnu.org
8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-07-30 9:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu.org
Status|UNCONFIRMED |RESOLVED
Blocks| |53947
Resolution|--- |FIXED
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
It was fixed by adding another loop header copying pass before vectorization,
aka ch_vect. Of course it means we peel one iteration, which might not be 100%
optimal. Optimally we'd teach PRE that those loop carried dependences are
bad(TM) just like we do for loads and extend that to cover calls. The peeling
means we need an epilogue, so we didn't really save a sqrt call.
That said, the situation is somewhat mitigated now and I'd declare it fixed
anyway, the testcase is somewhat artificial (resolvable at compile time).
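
The remark about the epilogue can be made concrete with a little arithmetic: peeling one of the 1024 iterations leaves 1023, which is not a multiple of the vector width. The sketch below assumes four doubles per ymm register and emulates the vector lanes with a plain array; it illustrates the iteration counts only and is not the code GCC emits. The names `sqr_plain`, `sqr_peeled` and `check_peeled` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Straightforward scalar sum, used as the reference.
double sqr_plain() {
    double s = 0;
    for (int i = 0; i != 1024; ++i) s += std::sqrt(static_cast<double>(i + 1));
    return s;
}

double sqr_peeled() {
    const int total = 1024;
    const int width = 4;                      // assumed: four doubles per ymm register
    double s = 1.0;                           // peeled iteration i == 0: sqrt(1) == 1
    const int vec_iters = (total - 1) / width;  // 1023 / 4 == 255 full vector steps
    double lane[4] = {0, 0, 0, 0};            // emulated vector accumulator
    int i = 1;
    for (int v = 0; v < vec_iters; ++v, i += width)
        for (int l = 0; l < width; ++l)
            lane[l] += std::sqrt(static_cast<double>(i + l + 1));
    s += (lane[0] + lane[1]) + (lane[2] + lane[3]);  // horizontal reduction
    for (; i != total; ++i)                   // scalar epilogue: 1023 % 4 == 3 iterations
        s += std::sqrt(static_cast<double>(i + 1));
    return s;
}

// The reassociated sum differs from the plain one only by rounding.
void check_peeled() {
    double a = sqr_plain();
    double b = sqr_peeled();
    assert(std::fabs(a - b) <= 1e-9 * std::fabs(a));
}
```

Under these assumptions the constant-folded first sqrt is paid for with three scalar sqrt calls in the epilogue, which is the sense in which no call is really saved.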
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
2013-07-09 7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
` (6 preceding siblings ...)
2021-07-30 9:16 ` rguenth at gcc dot gnu.org
@ 2021-07-30 22:56 ` pinskia at gcc dot gnu.org
2021-09-11 6:37 ` pinskia at gcc dot gnu.org
8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-07-30 22:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858
--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> It was fixed by adding another loop header copying pass before
> vectorization, aka ch_vect.
But that went in way back in GCC 6 (r6-1951), yet the loop header copying was
not happening until GCC 8.
* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
2013-07-09 7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
` (7 preceding siblings ...)
2021-07-30 22:56 ` pinskia at gcc dot gnu.org
@ 2021-09-11 6:37 ` pinskia at gcc dot gnu.org
8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-11 6:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |8.0
end of thread, other threads:[~2021-09-11 6:37 UTC | newest]
Thread overview: 10+ messages
2013-07-09 7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
2013-07-09 9:44 ` [Bug tree-optimization/57858] " jakub at gcc dot gnu.org
2013-07-09 13:49 ` vincenzo.innocente at cern dot ch
2013-07-09 15:33 ` glisse at gcc dot gnu.org
2013-07-10 6:37 ` jakub at gcc dot gnu.org
2013-07-10 9:51 ` vincenzo.innocente at cern dot ch
2021-07-30 6:06 ` pinskia at gcc dot gnu.org
2021-07-30 9:16 ` rguenth at gcc dot gnu.org
2021-07-30 22:56 ` pinskia at gcc dot gnu.org
2021-09-11 6:37 ` pinskia at gcc dot gnu.org