From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-472472-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 28686 invoked by alias); 8 Jan 2015 14:03:04 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 28585 invoked by uid 48); 8 Jan 2015 14:03:00 -0000
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers
Date: Thu, 08 Jan 2015 14:03:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c++
X-Bugzilla-Version: 4.9.2
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: keywords bug_status blocked assigned_to
Message-ID: <bug-64410-4-0JfYP6PeH4@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-64410-4@http.gcc.gnu.org/bugzilla/>
References: <bug-64410-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-01/txt/msg00466.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
------------------------------------------------------------------------------
           Keywords|                              |missed-optimization
             Status|NEW                           |ASSIGNED
             Blocks|                              |53947
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #5)
> (In reply to Marc Glisse from comment #1)
> > There are a number of things that make it complicated.
> > 1) gcc doesn't like to vectorize when the number of iterations is not known
> > at compile time.
> 
> Not an issue, we know it here (it's symbolic)
> 
> > 2) gcc doesn't vectorize anything already involving complex or vector
> > operations.
> 
> Indeed - here the issue is that we have C++ 'complex' aggregate
> load / store operations:
> 
>   _67 = MEM[(const struct complex &)_75];
>   __r$_M_value = _67;
> ...
>   _51 = REALPART_EXPR <__r$_M_value>;
>   REALPART_EXPR <__r$_M_value> = _104;
> ...
>   IMAGPART_EXPR <__r$_M_value> = _107;
>   _108 = __r$_M_value;
>   MEM[(struct cx_double *)_72] = _108;
> 
> which SRA for some reason didn't decompose as they are not aggregate
> (well, they are COMPLEX_TYPE).  They are not in SSA form either because
> they are partly written to.

And the partial writes force __r$_M_value to be TREE_ADDRESSABLE, which
means update-address-taken might be a better candidate to fix this.
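
To illustrate what "partly written to" means at the source level, here is a
minimal sketch. It is not taken from the testcase, and the function names
add_by_parts and add_by_value are made up. The first function updates the real
and imaginary parts separately, which is the shape that shows up as
REALPART_EXPR / IMAGPART_EXPR stores in the dump above. The second builds the
whole value in one go. Whether a given GCC version produces exactly those
forms depends on inlining and the library implementation, so treat the mapping
as an assumption.

```cpp
#include <complex>

// Partial-write form: the two components of r are updated one at a time,
// so the complex object is only ever partly written by each statement.
std::complex<double> add_by_parts(std::complex<double> r,
                                  const std::complex<double>& z) {
    r.real(r.real() + z.real());  // writes only the real part
    r.imag(r.imag() + z.imag());  // writes only the imaginary part
    return r;
}

// Whole-value form: both components are computed first and the result
// is constructed as a single complex value.
std::complex<double> add_by_value(const std::complex<double>& r,
                                  const std::complex<double>& z) {
    return std::complex<double>(r.real() + z.real(), r.imag() + z.imag());
}
```

Both functions compute the same sum; only the way the result is assembled
differs.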

Note that this will still run into the issue that the vectorizer does not
like complex types (in loads), nor does it like building complex
registers via COMPLEX_EXPR.  After fixing update-address-taken we have:

  __r$_M_value_70 = MEM[(const struct complex &)_78];
  _66 = MEM[(const double &)_77];
  _54 = REALPART_EXPR <__r$_M_value_70>;
  _105 = _54 + _66;
  _135 = IMAGPART_EXPR <__r$_M_value_70>;
  _106 = MEM[(const double &)_77 + 8];
  _107 = _106 + _135;
  __r$_M_value_180 = COMPLEX_EXPR <_105, _107>;
  MEM[(struct cx_double *)_76] = __r$_M_value_180;

which we ideally would have converted to piecewise loading / storing,
but the vectorizer may also be able to recover here with some twists.
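
As a source-level picture of "piecewise loading / storing", here is a hedged
sketch. It is not the reporter's testcase, and the names add_whole_objects and
add_piecewise are invented for illustration. The first loop loads and stores
whole complex objects; the second treats the same arrays as flat arrays of
double, which the C++ standard permits for arrays of std::complex<double>. The
assumption is that the flat form is closer to what the vectorizer handles
without help.

```cpp
#include <complex>
#include <cstddef>

// Whole-object form: each iteration loads two complex values and
// stores one complex value.
void add_whole_objects(const std::complex<double>* a,
                       const std::complex<double>* b,
                       std::complex<double>* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// Piecewise form: the arrays are viewed as 2*n doubles, with element 2*i
// the real part and element 2*i+1 the imaginary part of complex element i.
// Like the loop above, this assumes out does not partially overlap a or b.
void add_piecewise(const std::complex<double>* a,
                   const std::complex<double>* b,
                   std::complex<double>* out, std::size_t n) {
    const double* pa = reinterpret_cast<const double*>(a);
    const double* pb = reinterpret_cast<const double*>(b);
    double* po = reinterpret_cast<double*>(out);
    for (std::size_t i = 0; i < 2 * n; ++i)
        po[i] = pa[i] + pb[i];
}
```

Complex addition is component-wise, so the two loops produce identical results;
this rewriting would not carry over unchanged to multiplication.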

> In this case it would have been profitable
> to SRA __r$_M_value.  Eventually this should have been complex lowering's
> job (but it doesn't try to decompose complex assignments).
> 
> > 3) the ABI for complex uses 2 separate double instead of a vector of 2
> > double.
> 
> I think that's unrelated.
> 
> > I believe there are dups at least for 2).