From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id B843F3858C62; Thu, 22 Feb 2024 09:37:16 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B843F3858C62
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1708594636;
	bh=JPzyf4QtDxKKVEFfIa9oiHfSu2mhFCEzQAPBj1iYdO4=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=tWx7MQSxUt18lYuXfppaYZfKzpI+eIQMvzV9yNLf8n4Z+AowGsc0xkztz6ferFMaz
	 xkx85uKpEIvwI2HTiV5YpRz5P16CkkcQkSdPdoYGtbt2fu4UedahTb3nfd92H/+AjL
	 C2aj1lI8Q7CIBcfaAiqzh/hb+rVdgLxT46leElBQ=
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114027] [11/12/13/14 Regression] miscompile at `-O3 -fno-vect-cost-model -msse4.2`
Date: Thu, 22 Feb 2024 09:37:15 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: needs-bisection, wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

--- Comment #13 from Richard Biener ---
We're detecting a COND_REDUCTION with a chain.  It seems to work (and
vectorize) with -march=znver4 using AVX sized vectors (but AVX512 style
masking).

I think what goes wrong is treating the COND_REDUCTION as MAX reduction by
only checking the last COND which looks like

  c_lsm.9_18 = _76 ? prephitmp_26 : 0;

but the previous one is

  prephitmp_26 = _69 ?
    c_lsm.9_30 : -3;

I'm not too familiar with the condition reduction code, the reduction is
classified as cond_reduc_dt == vect_constant_def and so we run into

  else if (cond_reduc_dt == vect_constant_def)
    {
      enum vect_def_type cond_initial_dt;
      tree cond_initial_val = vect_phi_initial_value (reduc_def_phi);
      vect_is_simple_use (cond_initial_val, loop_vinfo, &cond_initial_dt);
      if (cond_initial_dt == vect_constant_def
          && types_compatible_p (TREE_TYPE (cond_initial_val),
                                 TREE_TYPE (cond_reduc_val)))
        {
          tree e = fold_binary (LE_EXPR, boolean_type_node,
                                cond_initial_val, cond_reduc_val);
          if (e && (integer_onep (e) || integer_zerop (e)))
            {
              if (dump_enabled_p ())
                dump_printf_loc (MSG_NOTE, vect_location,
                                 "condition expression based on "
                                 "compile time constant.\n");
              /* Record reduction code at analysis stage.  */
              STMT_VINFO_REDUC_CODE (reduc_info)
                = integer_onep (e) ? MAX_EXPR : MIN_EXPR;
              STMT_VINFO_REDUC_TYPE (reduc_info) = CONST_COND_REDUCTION;
            }

and the loop classifying and computing cond_reduc_val just looks at the
first chain element ...

This should possibly be merged with the loop going over all chain stmts
but a more conservative fix for the latent(?)
issue might be the following (but that also cuts out conversions in the
chain):

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 5a5865c42fc..e19896eef79 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7762,14 +7762,16 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       if (op.code == COND_EXPR)
        {
          /* Record how the non-reduction-def value of COND_EXPR is defined.  */
-         if (dt == vect_constant_def)
+         if (reduc_chain_length != 1)
+           ;
+         else if (dt == vect_constant_def)
            {
              cond_reduc_dt = dt;
              cond_reduc_val = op.ops[i];
            }
-         if (dt == vect_induction_def
-             && def_stmt_info
-             && is_nonwrapping_integer_induction (def_stmt_info, loop))
+         else if (dt == vect_induction_def
+                  && def_stmt_info
+                  && is_nonwrapping_integer_induction (def_stmt_info, loop))
            {
              cond_reduc_dt = dt;
              cond_stmt_vinfo = def_stmt_info;

I think it's latent even before the bisected rev.
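
For illustration only, here is a toy C model of why a MAX epilogue may be
unsafe for a COND chain that selects between two different constants.  This
is not the testcase from this PR and not what the vectorizer literally
emits: the function names, the arrays a[] and b[], and the vectorization
factor VF are all made up, and the lane handling is a simplified assumption
about a CONST_COND_REDUCTION-style scheme (independent lanes, MAX across
lanes at the end).

```c
enum { VF = 4 };  /* hypothetical vectorization factor */

/* Scalar reference: the two chained COND_EXPRs from the comment above,
   i.e.  t = a[i] ? c : -3;  c = b[i] ? t : 0;  */
int chain_scalar (const int *a, const int *b, int n, int init)
{
  int c = init;
  for (int i = 0; i < n; i++)
    {
      int t = a[i] ? c : -3;   /* like prephitmp_26 = _69 ? c_lsm.9_30 : -3 */
      c = b[i] ? t : 0;        /* like c_lsm.9_18 = _76 ? prephitmp_26 : 0 */
    }
  return c;
}

/* Simplified model (assumption): each lane runs the same chain on its own
   elements starting from init, and the epilogue takes the MAX over all
   lanes.  n must be a multiple of VF.  */
int chain_max_model (const int *a, const int *b, int n, int init)
{
  int lane[VF];
  for (int j = 0; j < VF; j++)
    lane[j] = init;

  for (int i = 0; i < n; i += VF)
    for (int j = 0; j < VF; j++)
      {
        int t = a[i + j] ? lane[j] : -3;
        lane[j] = b[i + j] ? t : 0;
      }

  /* MAX epilogue: only sound if every lane holds either init or one
     single constant, which a two-constant chain violates.  */
  int r = lane[0];
  for (int j = 1; j < VF; j++)
    if (lane[j] > r)
      r = lane[j];
  return r;
}
```

With a = {1, 1, 1, 0}, b = {0, 1, 1, 1} and init = 0, chain_scalar returns
-3 (the last iteration selects -3), while chain_max_model returns 0 because
the MAX over the lanes prefers the 0 written by an earlier element.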