From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-415167-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 21560 invoked by alias); 13 Feb 2013 15:58:56 -0000
Received: (qmail 21518 invoked by uid 48); 13 Feb 2013 15:58:35 -0000
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/37021] Fortran Complex reduction / multiplication not vectorized
Date: Wed, 13 Feb 2013 15:58:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Keywords: alias, missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID: <bug-37021-4-VGjh8rc1zM@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-37021-4@http.gcc.gnu.org/bugzilla/>
References: <bug-37021-4@http.gcc.gnu.org/bugzilla/>
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2013-02/txt/msg01332.txt.bz2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37021
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-13 15:58:31 UTC ---
The following testcase shows the issue well:

_Complex double self[1024];
_Complex double a[1024][1024];
_Complex double b[1024];

void foo (void)
{
  int i, j;
  for (i = 0; i < 1024; i+=3)
    for (j = 0; j < 1024; j+=3)
      self[i] = self[i] + a[i][j]*b[j];
}

we have to get the complex multiplication pattern recognized by SLP
which looks like (without PRE):

  <bb 3>:

  <bb 4>:
  # j_21 = PHI <j_13(3), 0(7)>
  # self_I_RE_lsm.2_12 = PHI <_26(3), self_I_RE_lsm.2_7(7)>
  # self_I_IM_lsm.3_28 = PHI <_27(3), self_I_IM_lsm.3_8(7)>
  # ivtmp_16 = PHI <ivtmp_1(3), 342(7)>
  _2 = REALPART_EXPR <a[i_20][j_21]>;
  _18 = IMAGPART_EXPR <a[i_20][j_21]>;
  _19 = REALPART_EXPR <b[j_21]>;
  _17 = IMAGPART_EXPR <b[j_21]>;
  _4 = _19 * _2;
  _3 = _18 * _17;
  _6 = _17 * _2;
  _23 = _19 * _18;
  _24 = _4 - _3;
  _25 = _23 + _6;
  _26 = _24 + self_I_RE_lsm.2_12;
  _27 = _25 + self_I_IM_lsm.3_28;
  j_13 = j_21 + 3;
  ivtmp_1 = ivtmp_16 - 1;
  if (ivtmp_1 != 0)
    goto <bb 3>;

we fail to build the SLP tree for _25 = _23 + _6 because the matching
stmt is _24 = _4 - _3 which has a different operation (SSE4 addsub
would support vectorizing this).  I don't see how we can easily
make this supported with the current pattern support ... the
support doesn't allow tieing together two SLP group members.
Simply allowing analysis to proceeed here reveals the fact that
the interleaving has a gap of 6 which makes the analysis fail.
Allowing it to proceed for ncopies == 1 (thus no actual interleaving
required) reveals the next check is slightly bogus in that case.
Fixing that ends us with

t.c:9: note: Load permutation 0 0 1 0 1 1 0 1
t.c:9: note: Build SLP failed: unsupported load permutation _27 = _25 +
self_I_IM_lsm.3_28;

... (to be continued)