From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 28686 invoked by alias); 8 Jan 2015 14:03:04 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 28585 invoked by uid 48); 8 Jan 2015 14:03:00 -0000
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/64410] gcc 25% slower than clang 3.5 for adding complex numbers
Date: Thu, 08 Jan 2015 14:03:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c++
X-Bugzilla-Version: 4.9.2
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: keywords bug_status blocked assigned_to
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-01/txt/msg00466.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

Richard Biener changed:

           What    |Removed                      |Added
----------------------------------------------------------------------------
           Keywords|                             |missed-optimization
             Status|NEW                          |ASSIGNED
             Blocks|                             |53947
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #6 from Richard Biener ---
(In reply to Richard Biener from comment #5)
> (In reply to Marc Glisse from comment #1)
> > There are a number of things that make it complicated.
> > 1) gcc doesn't like to vectorize when the number of iterations is not known
> > at compile time.
>
> Not an issue, we know it here (it's symbolic)
>
> > 2) gcc doesn't vectorize anything already involving complex or vector
> > operations.
>
> Indeed - here the issue is that we have C++ 'complex' aggregate
> load / store operations:
>
>   _67 = MEM[(const struct complex &)_75];
>   __r$_M_value = _67;
> ...
>   _51 = REALPART_EXPR <__r$_M_value>;
>   REALPART_EXPR <__r$_M_value> = _104;
> ...
>   IMAGPART_EXPR <__r$_M_value> = _107;
>   _108 = __r$_M_value;
>   MEM[(struct cx_double *)_72] = _108;
>
> which SRA for some reason didn't decompose as they are not aggregate
> (well, they are COMPLEX_TYPE).  They are not in SSA form either because
> they are partly written to.

And this forces it to be TREE_ADDRESSABLE.  Which means update-address-taken
might be a better candidate to fix this.  Note that it will still run into
the issue that the vectorizer does not like complex types (in loads), nor
does it like building complex registers via COMPLEX_EXPR.

After fixing update-address-taken we have

  __r$_M_value_70 = MEM[(const struct complex &)_78];
  _66 = MEM[(const double &)_77];
  _54 = REALPART_EXPR <__r$_M_value_70>;
  _105 = _54 + _66;
  _135 = IMAGPART_EXPR <__r$_M_value_70>;
  _106 = MEM[(const double &)_77 + 8];
  _107 = _106 + _135;
  __r$_M_value_180 = COMPLEX_EXPR <_105, _107>;
  MEM[(struct cx_double *)_76] = __r$_M_value_180;

which we ideally would have converted to piecewise loading / storing,
but the vectorizer may also be able to recover here with some twists.

> In this case it would have been profitable
> to SRA __r$_M_value.  Eventually this should have been complex lowerings
> job (but it doesn't try to decompose complex assignments).
>
> > 3) the ABI for complex uses 2 separate double instead of a vector of 2
> > double.
>
> I think that's unrelated.
>
> > I believe there are dups at least for 2).
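
For readers following along, here is a minimal source-level sketch of the
difference between whole-complex load / store and the "piecewise loading /
storing" mentioned above. It is not the testcase from this bug: the names
cx, add_whole and add_piecewise are made up for illustration, and it makes
no claim about what any particular GCC version does with either loop. The
piecewise variant relies on the array-oriented access that the C++ standard
guarantees for std::complex (an array of complex<double> may be read as
pairs of double).

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cx = std::complex<double>;  // hypothetical alias, not from the bug report

// Whole-value form: every iteration loads, adds and stores a complex value,
// roughly the shape the MEM[...] / COMPLEX_EXPR statements above describe.
void add_whole(const std::vector<cx>& a, const std::vector<cx>& b,
               std::vector<cx>& out) {
    assert(a.size() == out.size() && b.size() == out.size());
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = a[i] + b[i];
}

// Piecewise form: treat each array as 2*n doubles (real, imag, real, ...)
// so the loop body is a plain double load, add and store.
void add_piecewise(const std::vector<cx>& a, const std::vector<cx>& b,
                   std::vector<cx>& out) {
    assert(a.size() == out.size() && b.size() == out.size());
    const double* pa = reinterpret_cast<const double*>(a.data());
    const double* pb = reinterpret_cast<const double*>(b.data());
    double* po = reinterpret_cast<double*>(out.data());
    for (std::size_t i = 0; i < 2 * out.size(); ++i)
        po[i] = pa[i] + pb[i];
}
```

Both functions compute the same result; the second only spells out by hand
the decomposition into real and imaginary parts that the comment would
like the compiler to perform on its own.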