From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/99395] s116 benchmark of TSVC is vectorized by clang and not by gcc
Date: Wed, 31 Jan 2024 07:59:37 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99395

--- Comment #13 from Richard Biener ---
(In reply to JuzheZhong from comment #12)
> OK. It seems it has data dependency issue:
>
> missed: not vectorized, possible dependence between data-refs a[i_15] and
> a[_4]
>
> a[i_15] = _3;   STMT 1
> _4 = i_15 + 2;
> _5 = a[_4];     STMT 2
>
> STMT2 should not depend on STMT1.
>
> It's recognized as dependency in vect_analyze_data_ref_dependence.
>
> Is it reasonable to fix it in vect_analyze_data_ref_dependence ?

t2.c:4:21: note: dependence distance = 1.
t2.c:7:12: missed: not vectorized, possible dependence between data-refs a[i_15] and a[_4]
t2.c:4:21: missed: bad data dependence.

so there's a cross iteration dependence with distance 1 - that's

(compute_affine_dependence
  ref_a: a[i_15], stmt_a: a[i_15] = _3;
  ref_b: a[_4], stmt_b: _5 = a[_4];
(analyze_overlapping_iterations
  (chrec_a = {0, +, 2}_1)
  (chrec_b = {2, +, 2}_1)
(analyze_siv_subscript
(analyze_subscript_affine_affine
  (overlaps_a = [1 + 1 * x_1])
  (overlaps_b = [0 + 1 * x_1]))
)
  (overlap_iterations_a = [1 + 1 * x_1])
  (overlap_iterations_b = [0 + 1 * x_1]))
(build_classic_dist_vector
  dist_vector = (1
  )
)
)

a read-after-write of a[i+2] after storing to a[i+1] in program order.
This would be fine with a VF of 1 only, but we are not really
considering that (a pure SLP vectorization w/o unrolling). Instead we
start with the assumption of classical vectorization using interleaving,
which has a minimal VF of the number of lanes of the vector type with
the largest number of lanes as determined by vect_analyze_data_refs.

We can delay this all a bit but then the SLP build will fail anyway:

t2.c:4:21: missed: Build SLP failed: different interleaving chains in one node _5 = a[_4];

which is because we do

t2.c:4:21: note: === vect_analyze_data_ref_accesses ===
t2.c:4:21: note: Detected interleaving load a[i_15] and a[_1]
t2.c:4:21: note: Detected interleaving store a[i_15] and a[_1]
t2.c:4:21: note: Detected interleaving load of size 2
t2.c:4:21: note:   _2 = a[i_15];
t2.c:4:21: note:   tem_10 = a[_1];
t2.c:4:21: note: Detected single element interleaving a[_4] step 16

that is, we are splitting the chain because of the intermediate store
(that's kind-of OK-ish, heuristically it works for more cases).
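To make the distance of 1 in the dependence dump concrete, here is a minimal sketch in C. It is not GCC code: same_step_distance is a hypothetical helper that only handles the simple case of two affine subscripts with the same step, which is what chrec_a = {0, +, 2}_1 and chrec_b = {2, +, 2}_1 are.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of GCC.  Access A touches element
   base_a + step * x in iteration x, access B touches base_b + step * y
   in iteration y.  If the two can ever touch the same element, store
   the iteration distance x - y in *dist and return true; otherwise
   return false.  */
bool
same_step_distance (long base_a, long base_b, long step, long *dist)
{
  long diff = base_b - base_a;

  /* base_a + step * x == base_b + step * y only has a solution when
     the base difference is a multiple of the step.  */
  if (step == 0 || diff % step != 0)
    return false;

  *dist = diff / step;
  return true;
}
```

With the values from the dump, same_step_distance (0, 2, 2, &d) returns true and sets d to 1, that is x = y + 1, which agrees with overlaps_a = [1 + 1 * x_1] and overlaps_b = [0 + 1 * x_1].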

We'd usually handle the VF == 1 cases also during BB vectorization on
the loop body, but we're only doing that when there was if-conversion,
and the later stand-alone BB vectorization is after predictive commoning,
which wrecks the loop. We should move predcom after BB vect for that.

That said, this PR is quite elaborate and it will touch some key design
issues in the vectorizer. I'd rather finally finish getting us to work
on the SLP representation only before touching all these delicate things.

The following allows the analysis to proceed a bit longer with VF == 1.
Not adjusting min_vf early might have issues, but the change might work
as-is and possibly allow some cases to be loop vectorized with SLP and a
low VF that we now fail to.

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index f592aeb8028..b16b4664e7b 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -589,7 +589,7 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
        }

       unsigned int abs_dist = abs (dist);
-      if (abs_dist >= 2 && abs_dist < *max_vf)
+      if (abs_dist >= 1 && abs_dist < *max_vf)
        {
          /* The dependence distance requires reduction of the maximal
             vectorization factor.  */
@@ -4946,7 +4955,7 @@ vect_analyze_data_refs (vec_info *vinfo, poly_uint64 *min_vf, bool *fatal)
       /* Adjust the minimal vectorization factor according to the
         vector type.  */
       vf = TYPE_VECTOR_SUBPARTS (vectype);
-      *min_vf = upper_bound (*min_vf, vf);
+      //*min_vf = upper_bound (*min_vf, vf);

       /* Leave the BB vectorizer to pick the vector type later, based on
         the final dataref group size and SLP node size.  */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 30b90d99925..7eab3d4bebc 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2719,7 +2719,7 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal,
   opt_result ok = opt_result::success ();
   int res;
   unsigned int max_vf = MAX_VECTORIZATION_FACTOR;
-  poly_uint64 min_vf = 2;
+  poly_uint64 min_vf = 1;
   loop_vec_info orig_loop_vinfo = NULL;

   /* If we are dealing with an epilogue then orig_loop_vinfo points to the