[Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt
@ 2013-07-09  7:55 vincenzo.innocente at cern dot ch
  2013-07-09  9:44 ` [Bug tree-optimization/57858] " jakub at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2013-07-09  7:55 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

            Bug ID: 57858
           Summary: AVX2: ymm used for div, not for sqrt
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vincenzo.innocente at cern dot ch

In the following example, div uses ymm registers while sqr uses only xmm ones.
gcc version 4.9.0 20130630 (experimental) [trunk revision 200570] (GCC) 

cat avx2sqrt.cc
#include<math.h>
double div() {
   double s=0;
   for (int i=0; i!=1024; ++i) s+=1./(i+1);
   return s;
}


double sqr() {
   double s=0;
   for (int i=0; i!=1024; ++i) s+=sqrt(i+1);
   return s;
}

c++ -std=c++11 -Ofast -S avx2sqrt.cc -march=corei7-avx -mavx2
-ftree-vectorizer-verbose=1 -Wall ; cat avx2sqrt.s

_Z3divv:
.LFB3:
    .cfi_startproc
    vmovdqa    .LC1(%rip), %ymm6
    xorl    %eax, %eax
    vxorpd    %xmm1, %xmm1, %xmm1
    vmovdqa    .LC0(%rip), %ymm0
    vmovdqa    .LC2(%rip), %ymm5
    vmovapd    .LC3(%rip), %ymm2
    jmp    .L2
    .p2align 4,,10
    .p2align 3
.L3:
    vmovdqa    %ymm4, %ymm0
.L2:
    vpaddd    %ymm6, %ymm0, %ymm4
    vpaddd    %ymm5, %ymm0, %ymm0
    addl    $1, %eax
    vextracti128    $0x1, %ymm0, %xmm3
    vcvtdq2pd    %xmm0, %ymm0
    vcvtdq2pd    %xmm3, %ymm3
    vdivpd    %ymm0, %ymm2, %ymm0
    vdivpd    %ymm3, %ymm2, %ymm3
    vaddpd    %ymm0, %ymm3, %ymm0
    cmpl    $128, %eax
    vaddpd    %ymm0, %ymm1, %ymm1
    jne    .L3
    vhaddpd    %ymm1, %ymm1, %ymm1
    vperm2f128    $1, %ymm1, %ymm1, %ymm0
    vaddpd    %ymm0, %ymm1, %ymm0
    vzeroupper
    ret
    .cfi_endproc
.LFE3:
    .size    _Z3divv, .-_Z3divv
    .p2align 4,,15
    .globl    _Z3sqrv
    .type    _Z3sqrv, @function
_Z3sqrv:
.LFB4:
    .cfi_startproc
    movl    $1, %eax
    vmovsd    .LC4(%rip), %xmm1
    vxorpd    %xmm0, %xmm0, %xmm0
    jmp    .L6
    .p2align 4,,10
    .p2align 3
.L7:
    vcvtsi2sd    %eax, %xmm1, %xmm1
    vsqrtsd    %xmm1, %xmm1, %xmm1
.L6:
    addl    $1, %eax
    vaddsd    %xmm1, %xmm0, %xmm0
    cmpl    $1025, %eax
    jne    .L7
    rep; ret
    .cfi_endproc
.LFE4:
    .size    _Z3sqrv, .-_Z3sqrv



* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
  2013-07-09  7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
@ 2013-07-09  9:44 ` jakub at gcc dot gnu.org
  2013-07-09 13:49 ` vincenzo.innocente at cern dot ch
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-07-09  9:44 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I'll look at this.



* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
  2013-07-09  7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
  2013-07-09  9:44 ` [Bug tree-optimization/57858] " jakub at gcc dot gnu.org
@ 2013-07-09 13:49 ` vincenzo.innocente at cern dot ch
  2013-07-09 15:33 ` glisse at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2013-07-09 13:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

--- Comment #2 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
Actually, the code for div and sqr already differs with standard SSE:
c++ -std=c++11 -Ofast -S avx2sqrt.cc -ftree-vectorizer-verbose=1 -Wall ; cat
avx2sqrt.s

.L2:
    movdqa    %xmm0, %xmm1
    addl    $1, %eax
    movdqa    %xmm0, %xmm4
    cmpl    $256, %eax
    paddd    %xmm5, %xmm1
    pshufd    $238, %xmm1, %xmm0
    cvtdq2pd    %xmm1, %xmm1
    movapd    %xmm3, %xmm7
    paddd    %xmm6, %xmm4
    cvtdq2pd    %xmm0, %xmm0
    divpd    %xmm0, %xmm7
    movapd    %xmm7, %xmm0
    movapd    %xmm3, %xmm7
    divpd    %xmm1, %xmm7
    addpd    %xmm7, %xmm0
    addpd    %xmm0, %xmm2
    jne    .L3
    movapd    %xmm2, -24(%rsp)
    movsd    -16(%rsp), %xmm0
    addsd    %xmm2, %xmm0
    ret
    .cfi_endproc
.LFE3:
    .size    _Z3divv, .-_Z3divv
    .p2align 4,,15
    .globl    _Z3sqrv
    .type    _Z3sqrv, @function
_Z3sqrv:
.LFB4:
    .cfi_startproc
    movl    $1, %eax
    movsd    .LC4(%rip), %xmm1
    xorpd    %xmm0, %xmm0
    jmp    .L6
    .p2align 4,,10
    .p2align 3
.L7:
    cvtsi2sd    %eax, %xmm1
    sqrtsd    %xmm1, %xmm1
.L6:
    addl    $1, %eax
    addsd    %xmm1, %xmm0
    cmpl    $1025, %eax
    jne    .L7
    rep; ret
    .cfi_endproc



* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
  2013-07-09  7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
  2013-07-09  9:44 ` [Bug tree-optimization/57858] " jakub at gcc dot gnu.org
  2013-07-09 13:49 ` vincenzo.innocente at cern dot ch
@ 2013-07-09 15:33 ` glisse at gcc dot gnu.org
  2013-07-10  6:37 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-07-09 15:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

--- Comment #3 from Marc Glisse <glisse at gcc dot gnu.org> ---
-fno-tree-pre lets it vectorize sqr as well. PRE creates a jump to the middle
of the loop body, which is nice but prevents vectorization.



* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
  2013-07-09  7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
                   ` (2 preceding siblings ...)
  2013-07-09 15:33 ` glisse at gcc dot gnu.org
@ 2013-07-10  6:37 ` jakub at gcc dot gnu.org
  2013-07-10  9:51 ` vincenzo.innocente at cern dot ch
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-07-10  6:37 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Actually, it isn't vectorized at all, because PRE attempts to be smart, figures
out that for the first iteration of the loop it can avoid computing the sqrt
because the result will be one, and thus moves the sqrt call into the latch,
but we can't vectorize any loops that have non-empty latches.
So, either the vectorizer would need to undo this transformation, or PRE should
not do it at all, or it should be arranged to happen only after vectorization.
Richard, any thoughts on this?
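
For illustration, the loop shape described above can be approximated at the
source level. This is a hand-written sketch, not GCC output: the names
sqr_original and sqr_after_pre are made up here, and the real transformation
happens on GIMPLE rather than on C++ source.

```cpp
#include <cmath>

// The loop as written in the report: sqrt is computed in the loop body.
double sqr_original() {
    double s = 0;
    for (int i = 0; i != 1024; ++i) s += std::sqrt(i + 1);
    return s;
}

// Approximation of the shape after PRE: sqrt(0 + 1) is known to be 1.0,
// so the first value is preloaded and the sqrt call for every later
// iteration ends up after the exit test, i.e. in the latch.
double sqr_after_pre() {
    double s = 0;
    double r = 1.0;            // value for the first iteration, no sqrt needed
    int i = 0;
    for (;;) {
        s += r;                // loop body: only the accumulation remains
        ++i;
        if (i == 1024) break;  // exit test
        r = std::sqrt(i + 1);  // latch: compute the value for the next iteration
    }
    return s;
}
```

Both functions add the same 1024 values in the same order. The second form is
consistent with the `jmp .L6` into the middle of the loop in the assembly from
the report, where the sqrt sits on the back edge rather than in the body.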



* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
  2013-07-09  7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
                   ` (3 preceding siblings ...)
  2013-07-10  6:37 ` jakub at gcc dot gnu.org
@ 2013-07-10  9:51 ` vincenzo.innocente at cern dot ch
  2021-07-30  6:06 ` pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2013-07-10  9:51 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

--- Comment #5 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
I remember something similar in the past;
--param max-completely-peel-times=1
sort of fixes it... (Why doesn't PRE recognize that 1/(1+0) == 1, by the way?)

Of course it is just a benchmark (and I can modify it to avoid the loop
peeling),
still


* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
  2013-07-09  7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
                   ` (4 preceding siblings ...)
  2013-07-10  9:51 ` vincenzo.innocente at cern dot ch
@ 2021-07-30  6:06 ` pinskia at gcc dot gnu.org
  2021-07-30  9:16 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-07-30  6:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So this was fixed in GCC 8, but I cannot tell by what.  ch_vect, which should
have done the copying of the header, has been there since 2014 but did not do
so until GCC 8.  There is not enough debug output to tell what changed either.


* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
  2013-07-09  7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
                   ` (5 preceding siblings ...)
  2021-07-30  6:06 ` pinskia at gcc dot gnu.org
@ 2021-07-30  9:16 ` rguenth at gcc dot gnu.org
  2021-07-30 22:56 ` pinskia at gcc dot gnu.org
  2021-09-11  6:37 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-07-30  9:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |RESOLVED
             Blocks|                            |53947
         Resolution|---                         |FIXED

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
It was fixed by adding another loop header copying pass before vectorization,
aka ch_vect.  Of course it means we peel one iteration, which might not be 100%
optimal.  Optimally we'd teach PRE that those loop carried dependences are
bad(TM) just like we do for loads and extend that to cover calls.  The peeling
means we need an epilogue, so we didn't really save a sqrt call.

That said, the situation is somewhat mitigated now and I'd declare it fixed
anyway; the testcase is somewhat artificial (resolvable at compile time).
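
As a rough source-level picture of what copying the loop header does here
(hand-written, not GCC output; the function names are made up and the exact
GIMPLE after ch_vect may differ), the first iteration is peeled off and the
remaining loop has its sqrt back in the body with an empty latch:

```cpp
#include <cmath>

// Reference: the loop from the report.
double sqr_reference() {
    double s = 0;
    for (int i = 0; i != 1024; ++i) s += std::sqrt(i + 1);
    return s;
}

// Approximation of the loop after header copying is applied to the
// PRE-transformed loop.
double sqr_after_header_copy() {
    double s = 0;
    int i = 0;
    // Copied header: the first iteration, with sqrt(1) folded to 1.0.
    s += 1.0;
    ++i;
    if (i != 1024) {
        // Remaining 1023 iterations: sqrt is in the body again and nothing
        // is left between the exit test and the back edge.
        do {
            s += std::sqrt(i + 1);
            ++i;
        } while (i != 1024);
    }
    return s;
}
```

The remaining loop runs 1023 times, which is not a multiple of the vector
length, so a vectorized version needs a scalar epilogue for the leftover
iterations. That is the cost of the peeling mentioned above.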


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations


* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
  2013-07-09  7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
                   ` (6 preceding siblings ...)
  2021-07-30  9:16 ` rguenth at gcc dot gnu.org
@ 2021-07-30 22:56 ` pinskia at gcc dot gnu.org
  2021-09-11  6:37 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-07-30 22:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> It was fixed by adding another loop header copying pass before
> vectorization, aka ch_vect. 

But that went in back in GCC 6 (r6-1951), yet the loop header copying was not
happening until GCC 8.


* [Bug tree-optimization/57858] AVX2: ymm used for div, not for sqrt
  2013-07-09  7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
                   ` (7 preceding siblings ...)
  2021-07-30 22:56 ` pinskia at gcc dot gnu.org
@ 2021-09-11  6:37 ` pinskia at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-11  6:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57858

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |8.0


end of thread, other threads:[~2021-09-11  6:37 UTC | newest]

Thread overview: 10+ messages
2013-07-09  7:55 [Bug tree-optimization/57858] New: AVX2: ymm used for div, not for sqrt vincenzo.innocente at cern dot ch
2013-07-09  9:44 ` [Bug tree-optimization/57858] " jakub at gcc dot gnu.org
2013-07-09 13:49 ` vincenzo.innocente at cern dot ch
2013-07-09 15:33 ` glisse at gcc dot gnu.org
2013-07-10  6:37 ` jakub at gcc dot gnu.org
2013-07-10  9:51 ` vincenzo.innocente at cern dot ch
2021-07-30  6:06 ` pinskia at gcc dot gnu.org
2021-07-30  9:16 ` rguenth at gcc dot gnu.org
2021-07-30 22:56 ` pinskia at gcc dot gnu.org
2021-09-11  6:37 ` pinskia at gcc dot gnu.org
