From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 8FD4A3858005; Mon, 17 Oct 2022 10:37:02 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 8FD4A3858005
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1666003022;
	bh=wgvy4SHHGAAMSveSYkTUbtvo0eN60NVIbcKTH3F/IUs=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=WFSTGZ+FtyOV0Y/Lwk+XbWlrn7n44x/Fl0Ybb/3bQtSaMWj7Hk/vkaUj2LhZjG348
	 3ClGfo/Nq5jdabAE+QR/GN7lWVVQyjjWBal3F2S4KFP/kMWRX9QRH+mdRaWj2Pnrns
	 yN2ukFtcwcXJCjx61kfmZZNDIlicGhCM+zJrUzwQ=
From: "cvs-commit at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/99409] s252 benchmark of TSVC is vectorized by clang and not by gcc
Date: Mon, 17 Oct 2022 10:36:59 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99409

--- Comment #2 from CVS Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:46a8e017d048ec3271bbb898942e3b166c4e8ff3

commit r13-3327-g46a8e017d048ec3271bbb898942e3b166c4e8ff3
Author: Richard Biener
Date:   Thu Oct 6 13:56:09 2022 +0200

    Vectorization of first-order recurrences

    The following picks up the prototype by Ju-Zhe Zhong for vectorizing
    first order recurrences.
    That solves two TSVC missed optimization PRs.

    There's a new scalar cycle def kind, vect_first_order_recurrence,
    and its handling of the backedge value vectorization is complicated
    by the fact that the vectorized value isn't the PHI but instead a
    (series of) permute(s) shifting in the recurring value from the
    previous iteration.  I've implemented this by creating both the
    single vectorized PHI and the series of permutes when vectorizing
    the scalar PHI but leave the backedge values in both unassigned.
    The backedge values are (for the testcases) computed by a load which
    is also the place after which the permutes are inserted.  That
    placement also restricts the cases we can handle (without resorting
    to code motion).

    I added both costing and SLP handling, though SLP handling is
    restricted to the case where a single vectorized PHI is enough.

    Missing is epilogue handling - while prologue peeling would be
    handled transparently by adjusting iv_phi_p, the epilogue case
    doesn't work with just inserting a scalar LC PHI since that a) keeps
    the scalar load live and b) that load is the wrong one; it has to be
    the last, much like when we'd vectorize the LC PHI as a live
    operation.  Unfortunately LIVE compute/analysis happens too early,
    before we decide on peeling.  When using fully masked loop
    vectorization, vect-recurr-6.c works as expected though.

    I have tested this on x86_64 for now, but since epilogue handling is
    missing there are probably no practical cases.  My prototype
    WHILE_ULT AVX512 patch can handle vect-recurr-6.c just fine, but I
    didn't feel like running SPEC within SDE, nor is the WHILE_ULT patch
    complete enough.

            PR tree-optimization/99409
            PR tree-optimization/99394
            * tree-vectorizer.h (vect_def_type::vect_first_order_recurrence): Add.
            (stmt_vec_info_type::recurr_info_type): Likewise.
            (vectorizable_recurr): New function.
            * tree-vect-loop.cc (vect_phi_first_order_recurrence_p): New
            function.
            (vect_analyze_scalar_cycles_1): Look for first order
            recurrences.
            (vect_analyze_loop_operations): Handle them.
            (vect_transform_loop): Likewise.
            (vectorizable_recurr): New function.
            (maybe_set_vectorized_backedge_value): Handle the backedge value
            setting in the first order recurrence PHI and the permutes.
            * tree-vect-stmts.cc (vect_analyze_stmt): Handle first order
            recurrences.
            (vect_transform_stmt): Likewise.
            (vect_is_simple_use): Likewise.
            (vect_is_simple_use): Likewise.
            * tree-vect-slp.cc (vect_get_and_check_slp_defs): Likewise.
            (vect_build_slp_tree_2): Likewise.
            (vect_schedule_scc): Handle the backedge value setting in the
            first order recurrence PHI and the permutes.

            * gcc.dg/vect/vect-recurr-1.c: New testcase.
            * gcc.dg/vect/vect-recurr-2.c: Likewise.
            * gcc.dg/vect/vect-recurr-3.c: Likewise.
            * gcc.dg/vect/vect-recurr-4.c: Likewise.
            * gcc.dg/vect/vect-recurr-5.c: Likewise.
            * gcc.dg/vect/vect-recurr-6.c: Likewise.
            * gcc.dg/vect/tsvc/vect-tsvc-s252.c: Un-XFAIL.
            * gcc.dg/vect/tsvc/vect-tsvc-s254.c: Likewise.
            * gcc.dg/vect/tsvc/vect-tsvc-s291.c: Likewise.

    Co-authored-by: Ju-Zhe Zhong
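
For readers unfamiliar with the term, the following is a minimal sketch of what a first-order recurrence looks like and how the "permute shifting in the recurring value from the previous iteration" described in the commit message can be pictured. It is not taken from the patch or from the vect-recurr testcases: the function names, the loop body, and the vectorization factor `VF` are assumptions chosen for illustration, and the "vector" operations are emulated with small arrays rather than real vector instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative vectorization factor; an assumption, not from the commit.  */
enum { VF = 4 };

/* Scalar loop with a first-order recurrence: t carries the value loaded in
   the previous iteration across the loop backedge.  */
void recurr_scalar(int *out, const int *in, size_t n, int t)
{
    assert(n == 0 || (out && in));
    for (size_t i = 0; i < n; i++) {
        int s = in[i];
        out[i] = s - t;
        t = s;
    }
}

/* Hand-written emulation of the vectorized shape: each block of VF lanes is
   combined with a copy of itself shifted by one lane, where lane 0 is filled
   from the last lane of the previous block.  */
void recurr_blocked(int *out, const int *in, size_t n, int t)
{
    int prev[VF] = {0};
    int cur[VF], shifted[VF];
    size_t i = 0;

    assert(n == 0 || (out && in));
    prev[VF - 1] = t;                 /* only the last lane is ever read */

    for (; i + VF <= n; i += VF) {
        for (size_t l = 0; l < VF; l++)
            cur[l] = in[i + l];       /* the "vector load" */

        /* The permute: shift in the recurring value from the previous block.  */
        shifted[0] = prev[VF - 1];
        for (size_t l = 1; l < VF; l++)
            shifted[l] = cur[l - 1];

        for (size_t l = 0; l < VF; l++)
            out[i + l] = cur[l] - shifted[l];

        for (size_t l = 0; l < VF; l++)
            prev[l] = cur[l];         /* backedge value for the next block */
    }

    /* Scalar tail: it must continue from the last lane of the last block.  */
    t = prev[VF - 1];
    for (; i < n; i++) {
        int s = in[i];
        out[i] = s - t;
        t = s;
    }
}
```

Calling both functions on the same input and initial `t` should produce identical output arrays. The scalar tail in this sketch picks up the last lane of the final block, which is the same "it has to be the last" requirement the commit message gives as the reason epilogue handling is not yet covered.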