From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 21560 invoked by alias); 13 Feb 2013 15:58:56 -0000
Received: (qmail 21518 invoked by uid 48); 13 Feb 2013 15:58:35 -0000
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/37021] Fortran Complex reduction / multiplication not vectorized
Date: Wed, 13 Feb 2013 15:58:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Keywords: alias, missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2013-02/txt/msg01332.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37021

--- Comment #14 from Richard Biener 2013-02-13 15:58:31 UTC ---
The following testcase shows the issue well:

_Complex double self[1024];
_Complex double a[1024][1024];
_Complex double b[1024];

void foo (void)
{
  int i, j;
  for (i = 0; i < 1024; i+=3)
    for (j = 0; j < 1024; j+=3)
      self[i] = self[i] + a[i][j]*b[j];
}

We have to get the complex multiplication pattern recognized by SLP, which
looks like (without PRE):

  :

  :
  # j_21 = PHI
  # self_I_RE_lsm.2_12 = PHI <_26(3), self_I_RE_lsm.2_7(7)>
  # self_I_IM_lsm.3_28 = PHI <_27(3), self_I_IM_lsm.3_8(7)>
  # ivtmp_16 = PHI
  _2 = REALPART_EXPR ;
  _18 = IMAGPART_EXPR ;
  _19 = REALPART_EXPR ;
  _17 = IMAGPART_EXPR ;
  _4 = _19 * _2;
  _3 = _18 * _17;
  _6 = _17 * _2;
  _23 = _19 * _18;
  _24 = _4 - _3;
  _25 = _23 + _6;
  _26 = _24 + self_I_RE_lsm.2_12;
  _27 = _25 + self_I_IM_lsm.3_28;
  j_13 = j_21 + 3;
  ivtmp_1 = ivtmp_16 - 1;
  if (ivtmp_1 != 0)
    goto ;

We fail to build the SLP tree for _25 = _23 + _6 because the matching stmt
is _24 = _4 - _3, which has a different operation (SSE3 addsub would support
vectorizing this).  I don't see how we can easily make this supported
with the current pattern support ... the support doesn't allow tying
together two SLP group members.

Simply allowing analysis to proceed here reveals the fact that the
interleaving has a gap of 6, which makes the analysis fail.  Allowing it
to proceed for ncopies == 1 (thus no actual interleaving required) reveals
that the next check is slightly bogus in that case.  Fixing that leaves us with

t.c:9: note: Load permutation 0 0 1 0 1 1 0 1
t.c:9: note: Build SLP failed: unsupported load permutation _27 = _25 + self_I_IM_lsm.3_28;

...

(to be continued)
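
For illustration only, here is a minimal C sketch (not GCC code) of why the two group members do not match: the real lane needs a subtraction while the imaginary lane needs an addition, which is what a two-lane addsub operation provides. The names addsub2 and cmul_acc_lanes are hypothetical, and the mapping of x and y onto the SSA names in the dump is an assumption, since the operands of the REALPART_EXPR/IMAGPART_EXPR loads are not visible above.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical scalar model of a two-lane addsub:
   lane 0 subtracts, lane 1 adds.  */
void addsub2(const double p[2], const double q[2], double out[2])
{
    out[0] = p[0] - q[0];
    out[1] = p[1] + q[1];
}

/* Compute acc + x*y as a {real, imag} lane pair, following the shape of
   the dump.  Assumed mapping: _2/_18 are the parts of x, _19/_17 the
   parts of y.  */
double complex cmul_acc_lanes(double complex acc, double complex x,
                              double complex y)
{
    double xr = creal(x), xi = cimag(x);
    double yr = creal(y), yi = cimag(y);

    /* Lane 0 mirrors _4 = _19 * _2, lane 1 mirrors _23 = _19 * _18.  */
    double first[2] = { yr * xr, yr * xi };
    /* Lane 0 mirrors _3 = _18 * _17, lane 1 mirrors _6 = _17 * _2;
       note that the parts of x appear swapped here.  */
    double second[2] = { xi * yi, yi * xr };

    /* Mixed operation: mirrors _24 (minus) and _25 (plus).  */
    double prod[2];
    addsub2(first, second, prod);

    /* The accumulation uses the same operation in both lanes,
       mirroring _26 and _27.  */
    return (creal(acc) + prod[0]) + (cimag(acc) + prod[1]) * I;
}
```

With acc = 0.5 + 0.25i, x = 1 + 2i and y = 3 - i, cmul_acc_lanes returns 5.5 + 5.25i, the same value as acc + x*y. Only the addsub2 step mixes operations across lanes; the multiplications and the final accumulation are uniform, which is why that single statement pair is where the SLP build gives up.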