public inbox for gcc-bugs@sourceware.org
* [Bug target/57954] New: AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code
@ 2013-07-22 14:50 vincenzo.innocente at cern dot ch
  2013-07-26  0:08 ` [Bug target/57954] " hjl.tools at gmail dot com
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2013-07-22 14:50 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954

            Bug ID: 57954
           Summary: AVX missing vxorps (zeroing) before vcvtsi2s %edx,
                    slow down AVX code
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vincenzo.innocente at cern dot ch

In the following benchmark, performance without vectorization is poor with
respect to expectations.
I found that this is due to a register not being zeroed before it is used.

c++ -O2 -S polyAVX.cpp -mavx
 as -v --64 -o polyAVX.o polyAVX.s
GNU assembler version 2.23.1 (x86_64-redhat-linux-gnu) using BFD version (GNU
Binutils) 2.23.1
c++ -O2 polyAVX.o -march=corei7-avx ; time ./a.out
53896530759
15.418u 0.000s 0:15.43 99.8%    0+0k 0+0io 1pf+0w
patch polyAVX.s
49a50
>         vxorps          %xmm0,%xmm0,%xmm0
patching file polyAVX.s
as -v --64 -o polyAVX.o polyAVX.s
GNU assembler version 2.23.1 (x86_64-redhat-linux-gnu) using BFD version (GNU
Binutils) 2.23.1
c++ -O2 polyAVX.o -march=corei7-avx ; time ./a.out
10340756863
2.958u 0.000s 0:02.96 99.6%    0+0k 0+0io 1pf+0w

I am sure there are many other cases like this.
gcc version 4.9.0 20130718 (experimental) [trunk revision 201034] (GCC) 
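
For background, a minimal sketch of why the missing zeroing matters, modelling the register as four float lanes. The scalar conversion writes only the low lane and carries the remaining lanes over from its source register, so its result depends on whatever last wrote that register; the vxorps idiom supplies a source with no history. The type `Xmm` and the functions `cvtsi2ss_model` and `demo` are illustrative names only, not part of the benchmark or of GCC.

```cpp
#include <array>
#include <cassert>

// Illustrative model of a 128-bit SSE register as four float lanes.
using Xmm = std::array<float, 4>;

// Models the merging behaviour of (v)cvtsi2ss: only lane 0 is written,
// lanes 1..3 are copied from the source register `old`.
inline Xmm cvtsi2ss_model(const Xmm& old, int value) {
    Xmm result = old;                       // upper lanes are inherited ...
    result[0] = static_cast<float>(value);  // ... only the low lane is new
    return result;
}

// Shows that the full result depends on the previous register contents,
// while a zeroed source gives a result that is independent of history.
inline void demo() {
    const Xmm stale{1.0f, 2.0f, 3.0f, 4.0f};
    const Xmm zero{};  // what vxorps %xmm0,%xmm0,%xmm0 produces

    const Xmm merged = cvtsi2ss_model(stale, 7);
    const Xmm clean = cvtsi2ss_model(zero, 7);

    assert(merged[0] == 7.0f && clean[0] == 7.0f);  // low lane agrees
    assert(merged[1] == 2.0f);                      // stale data carried over
    assert(clean[1] == 0.0f);                       // zeroed source: nothing carried
}
```

On real hardware the dependency shows up as scheduling rather than as a visible value: the conversion has to wait for the previous writer of the register, which can chain otherwise independent loop iterations together.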

cat polyAVX.cpp 
//template<typename T>
typedef float T;
inline T polyHorner(T y) {
  return  T(0x2.p0) + y * (T(0x2.p0) + y * (T(0x1.p0) + y * (T(0x5.55523p-4) +
y * (T(0x1.5554dcp-4) + y * (T(0x4.48f41p-8) + y * T(0xb.6ad4p-12)))))) ;
}

#include <x86intrin.h>
#include<iostream>

volatile unsigned long long rdtsc() {
    unsigned int taux=0;
    return __rdtscp(&taux);
  }

int main() {


  long long t=0;

    bool ret=true;
    float s =0;
    for (int k=0; k!=100; ++k) {
      float c =   1.f/10000000.f;
      t -=rdtsc();
      for (int i=1; i<10000001; ++i) s+= polyHorner((float(i)+float(k))*c);
      t    +=rdtsc();
    }
    ret &= s!=0;

  std::cout << t <<std::endl;

  return ret ? 0 : -1;


}


* [Bug target/57954] AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code
From: hjl.tools at gmail dot com @ 2013-07-26  0:08 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dushistov at mail dot ru

--- Comment #1 from H.J. Lu <hjl.tools at gmail dot com> ---
*** Bug 57988 has been marked as a duplicate of this bug. ***


* [Bug target/57954] AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code
From: hjl.tools at gmail dot com @ 2013-07-26 19:06 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com

--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> ---
Created attachment 30560
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30560&action=edit
A patch

This patch adds X86_TUNE_SSE_PARTIAL_REG_STALL.


* [Bug target/57954] AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code
From: dushistov at mail dot ru @ 2013-07-26 19:59 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954

--- Comment #3 from Evgeniy Dushistov <dushistov at mail dot ru> ---
Great, I tested the patch; the pi calculation is now as fast as with "icc", and
two times faster than with clang 3.3.


* [Bug target/57954] AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code
From: hjl.tools at gmail dot com @ 2013-07-26 20:02 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #30560|0                           |1
        is obsolete|                            |

--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> ---
Created attachment 30561
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30561&action=edit
An updated patch

This patch uses existing TARGET_SSE_PARTIAL_REG_DEPENDENCY instead.


* [Bug target/57954] AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code
From: vincenzo.innocente at cern dot ch @ 2013-07-27 17:21 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954

--- Comment #5 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
confirmed that the patch fixes the issue
c++ -O2 -march=corei7-avx polyAVX.cpp
time ./a.out
10358474048
2.965u 0.001s 0:02.97 99.6%    0+0k 0+0io 146pf+0w


* [Bug target/57954] AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code
From: ubizjak at gmail dot com @ 2013-07-29 11:24 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86
             Status|UNCONFIRMED                 |RESOLVED
                URL|                            |http://gcc.gnu.org/ml/gcc-p
                   |                            |atches/2013-07/msg01368.htm
                   |                            |l
         Resolution|---                         |FIXED
   Target Milestone|---                         |4.9.0

--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
Author: uros
Date: Mon Jul 29 11:17:51 2013
New Revision: 201308

URL: http://gcc.gnu.org/viewcvs?rev=201308&root=gcc&view=rev
Log:
2013-07-29  Uros Bizjak  <ubizjak@gmail.com>

    * config/i386/i386.md (float post-reload splitters): Do not check
    for subregs of SSE registers.

2013-07-29  Uros Bizjak  <ubizjak@gmail.com>
        H.J. Lu  <hongjiu.lu@intel.com>

    PR target/57954
    PR target/57988
    * config/i386/i386.md (post-reload splitter
    to avoid partial SSE reg dependency stalls): New pattern.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md

* [Bug target/57954] AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code
From: ubizjak at gmail dot com @ 2013-07-29 11:24 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954

--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
Fixed.

* [Bug target/57954] AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code
From: vincenzo.innocente at cern dot ch @ 2013-07-29 11:30 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954

--- Comment #8 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
Thanks for getting this into trunk.
Would it be possible to backport it to at least 4.8?
(This issue goes back as far as 4.4!)


* [Bug target/57954] AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code
From: ysrumyan at gmail dot com @ 2013-07-29 12:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954

Yuri Rumyantsev <ysrumyan at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ysrumyan at gmail dot com

--- Comment #9 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
Uros,

I assume that this fix is not good and must be reverted - I will prepare
another fix for your review. There are at least 2 problems:

1. The new split for int --> fp conversions is done under TARGET_SSE2 and
TARGET_SSE_PARTIAL_REG_DEPENDENCY, which includes both Atom chips - SLT and SLM.
I checked that zeroing the xmm register before the conversion leads to a
performance slowdown on SLM (-5%) for the provided test-case. I assume that
TARGET_AVX must be used instead of TARGET_SSE2.
2. This zeroing can be redundant and should not be inserted then, e.g. for the
following simple test-case:

void foo (float* p, int n)
{
  int i;
  for (i=0; i<n; i++)
    p[i] = (float) i;
}

With H.J.'s patch we get the following assembly (I compiled it for slm, but it
does not matter):

.L3:
    xorps    %xmm0, %xmm0
    cvtsi2ss    %eax, %xmm0
    movss    %xmm0, (%ecx,%eax,4)
    addl    $1, %eax
    cmpl    %edx, %eax
    jne    .L3

It is clear that the zeroing is redundant here and must be removed.
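
The xorps / cvtsi2ss pair in the listing above can also be written at the source level. A minimal sketch, assuming an x86 target with SSE and the standard `<xmmintrin.h>` intrinsics; the names `int_to_float_zeroed` and `foo_zeroed` are hypothetical, and whether a given compiler keeps or drops the explicit zeroing is up to its optimizer:

```cpp
#include <cassert>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>

// Hypothetical helper: convert with an explicitly zeroed merge source,
// i.e. the source-level counterpart of xorps followed by cvtsi2ss.
inline float int_to_float_zeroed(int i) {
    const __m128 zero = _mm_setzero_ps();              // xorps
    const __m128 converted = _mm_cvtsi32_ss(zero, i);  // cvtsi2ss into low lane
    return _mm_cvtss_f32(converted);                   // read the low lane
}
#else
// Fallback for non-x86 targets so the sketch still compiles.
inline float int_to_float_zeroed(int i) { return static_cast<float>(i); }
#endif

// Same loop as the test-case above, using the helper.
inline void foo_zeroed(float* p, int n) {
    assert(p != nullptr || n <= 0);
    for (int i = 0; i < n; ++i)
        p[i] = int_to_float_zeroed(i);
}
```

Only the low lane is ever stored, so the zeroing cannot change the values written to `p`; the open question in this comment is purely its cost on a given microarchitecture.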


* [Bug target/57954] AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code
From: ubizjak at gmail dot com @ 2013-07-29 15:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954

--- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Yuri Rumyantsev from comment #9)

> I assume that this fix is not good and must be reverted - I will prepare
> another fix for your reviewing. There are at least 2 problems:
> 
> 1. New split for int --> fp conversions is done under TARGET_SSE2 and
> TARGET_SSE_PARTIAL_REG_DEPENDENCY which include both Atom chips - SLT and
> SLM.
> I checked that zeroing of xmm register before conversion leads to
> performance slowdown on SLM (-5%) for provided test-case. I assume that
> TARGET_AVX must be used instead of TARGET_SSE2.

The patch is effective for my target (IvyBridge), but I see no problem with
fine-tuning the split condition for other targets. Perhaps Atom should be taken
out of TARGET_SSE_PARTIAL_REG_DEPENDENCY?

> 2. This zeroing can be redundant and should not be inserted, e.g. for the
> following simple test-case:
> 
> void foo (float* p, int n)
> {
>   int i;
>   for (i=0; i<n; i++)
>     p[i] = (float) i;
> }
> 
> with H.J patch we got the following assembly (I compiled it for slm but it
> does not matter):
> 
> .L3:
> 	xorps	%xmm0, %xmm0
> 	cvtsi2ss	%eax, %xmm0
> 	movss	%xmm0, (%ecx,%eax,4)
> 	addl	$1, %eax
> 	cmpl	%edx, %eax
> 	jne	.L3
> 
> It is clear that zeroing is redundant for it and must be deleted.

Hm, it is not that clear. If the stall is happening in cvtsi2ss, then the
following movss shouldn't matter, or at least it shouldn't make things any
worse. Of course, you have much more information at hand, so instead of
reverting the patch (the patch *is* effective for certain targets), I suggest
submitting a follow-up patch that fine-tunes the split condition.

* [Bug rtl-optimization/51041] g++ strange optimisation behaviour
From: vmakarov at redhat dot com @ 2013-07-29 15:53 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51041

Vladimir Makarov <vmakarov at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at redhat dot com

--- Comment #3 from Vladimir Makarov <vmakarov at redhat dot com> ---
I guess RA is doing the right thing.  Pseudo 84, corresponding to variable sum
when the second printf is uncommented, lives through an insn throwing an
exception.  The code affecting p84 allocation (putting it into memory
as SSE_REGS have no caller-saved regs) is

ira-lives.c::process_bb_node_lives:

                  if (can_throw_internal (insn))
                    {
                      IOR_HARD_REG_SET (OBJECT_CONFLICT_HARD_REGS (obj),
                                        call_used_reg_set);
                      IOR_HARD_REG_SET (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj),
                                        call_used_reg_set);
                    }
Where insn is:

(call_insn 141 140 142 22 (call (mem:QI (symbol_ref:DI ("_ZdlPv") [flags 0x41]
<function_decl 0x7ffff1ae2400 operator delete>) [0 operator delete S1 A8])
        (const_int 0 [0]))
/usr/lib/gcc/x86_64-redhat-linux/4.7.2/../../../../include/c++/4.7.2/ext/new_allocator.h:100
648 {*call}
     (expr_list:REG_DEAD (reg:DI 5 di)
        (expr_list:REG_EH_REGION (const_int 0 [0])
            (nil)))
    (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 5 di))
        (nil)))

it is a destructor in new_allocator.h:

      void
      deallocate(pointer __p, size_type)
      { ::operator delete(__p); }

The problem could be solved by p84 live range splitting.  By default IRA does
live range splitting only when the register pressure is high.  This is not the
case for the test where max pressure for GENERAL_REGS and SSE_REGS is only 4.

We can modify semantics -fira-region=all to form a region for any loop on which
border live range splitting is done.  I tried that and with -fira-region=all
the same speed is achieved for the test.  Unfortunately, with the new semantics
permitting too aggressive spilling, the generated code is about 0.5% worse on
SPEC2000 for x86-64.

I guess we should pay more attention in optimizations to dealing with code with
EH regions, as C++ code has a lot of such code.

I'll think about what more I can do about the problem.


* [Bug target/57954] AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code
From: dushistov at mail dot ru @ 2013-07-29 18:48 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954

--- Comment #11 from Evgeniy Dushistov <dushistov at mail dot ru> ---
(In reply to Yuri Rumyantsev from comment #9)
> I checked that zeroing of xmm register before conversion leads to
> performance slowdown on SLM (-5%) for provided test-case. I assume that
> 
> with H.J patch we got the following assembly (I compiled it for slm but it
> does not matter):
> 
> .L3:
> 	xorps	%xmm0, %xmm0
> 	cvtsi2ss	%eax, %xmm0
> 	movss	%xmm0, (%ecx,%eax,4)
> 	addl	$1, %eax
> 	cmpl	%edx, %eax
> 	jne	.L3
> 

By the way, I tried compiling my sample
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57988) for Atom; icc (13.1.3
20130607) produces:


xorps  %xmm2,%xmm2
cvtsi2sd %rax,%xmm2

Could the 5% be measurement error?
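
One way to judge whether a 5% difference is outside measurement noise is to repeat the timing several times and compare the run-to-run spread with the difference. A minimal sketch using `std::chrono`; the names `time_runs` and `spread_percent`, as well as the run count and array size a caller would pick, are arbitrary choices for illustration and not taken from either test-case:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// The int -> float loop from comment #9.
inline void foo(float* p, int n) {
    for (int i = 0; i < n; ++i)
        p[i] = static_cast<float>(i);
}

// Times `runs` executions of foo over `n` elements; returns seconds per run.
inline std::vector<double> time_runs(int runs, int n) {
    assert(runs > 0 && n > 0);
    std::vector<float> data(static_cast<std::size_t>(n));
    std::vector<double> seconds;
    seconds.reserve(static_cast<std::size_t>(runs));
    for (int r = 0; r < runs; ++r) {
        const auto start = std::chrono::steady_clock::now();
        foo(data.data(), n);
        const auto stop = std::chrono::steady_clock::now();
        // Read one element back so the loop is not trivially dead code.
        volatile float sink = data[static_cast<std::size_t>(n - 1)];
        (void)sink;
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}

// Relative spread (max - min) / min in percent; a rough noise estimate.
inline double spread_percent(const std::vector<double>& seconds) {
    if (seconds.empty()) return 0.0;
    const auto mm = std::minmax_element(seconds.begin(), seconds.end());
    const double lo = *mm.first;
    const double hi = *mm.second;
    return lo > 0.0 ? 100.0 * (hi - lo) / lo : 0.0;
}
```

If `spread_percent(time_runs(...))` on the same binary is already close to 5%, the reported slowdown cannot be separated from noise without a more careful measurement.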

* [Bug fortran/57800] Wasted work in gfc_match_call()
From: law at redhat dot com @ 2013-07-29 19:08 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57800

Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |law at redhat dot com
         Resolution|---                         |FIXED

--- Comment #3 from Jeffrey A. Law <law at redhat dot com> ---
Patch installed on trunk.


* [Bug target/57954] AVX missing vxorps (zeroing) before vcvtsi2s %edx, slow down AVX code
From: glisse at gcc dot gnu.org @ 2013-12-31 14:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57954

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ylow at graphlab dot com

--- Comment #12 from Marc Glisse <glisse at gcc dot gnu.org> ---
*** Bug 58450 has been marked as a duplicate of this bug. ***

