From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114027] [11/12/13/14 Regression] miscompile
 at `-O3 -fno-vect-cost-model -msse4.2`
Date: Thu, 22 Feb 2024 09:37:15 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: needs-bisection, wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-114027-4-owEhPZxreT@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-114027-4@http.gcc.gnu.org/bugzilla/>
References: <bug-114027-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
We're detecting a COND_REDUCTION with a chain.  It seems to work (and
vectorize) with -march=znver4, using AVX-sized vectors (but AVX512-style
masking).

I think what goes wrong is that we treat the COND_REDUCTION as a MAX
reduction after checking only the last COND, which looks like

  c_lsm.9_18 = _76 ? prephitmp_26 : 0;

but the previous one is

  prephitmp_26 = _69 ? c_lsm.9_30 : -3;

I'm not too familiar with the condition reduction code, but the reduction
is classified with cond_reduc_dt == vect_constant_def, so we run into

      else if (cond_reduc_dt == vect_constant_def)
        {
          enum vect_def_type cond_initial_dt;
          tree cond_initial_val = vect_phi_initial_value (reduc_def_phi);
          vect_is_simple_use (cond_initial_val, loop_vinfo, &cond_initial_dt);
          if (cond_initial_dt == vect_constant_def
              && types_compatible_p (TREE_TYPE (cond_initial_val),
                                     TREE_TYPE (cond_reduc_val)))
            {
              tree e = fold_binary (LE_EXPR, boolean_type_node,
                                    cond_initial_val, cond_reduc_val);
              if (e && (integer_onep (e) || integer_zerop (e)))
                {
                  if (dump_enabled_p ())
                    dump_printf_loc (MSG_NOTE, vect_location,
                                     "condition expression based on "
                                     "compile time constant.\n");
                  /* Record reduction code at analysis stage.  */
                  STMT_VINFO_REDUC_CODE (reduc_info)
                    = integer_onep (e) ? MAX_EXPR : MIN_EXPR;
                  STMT_VINFO_REDUC_TYPE (reduc_info) = CONST_COND_REDUCTION;
                }

and the loop that classifies the operand (setting cond_reduc_dt) and
computes cond_reduc_val only looks at the first chain element.  This
should possibly be merged with the loop that goes over all chain stmts,
but a more conservative fix for the latent(?) issue might be the following
(though that also cuts out conversions in the chain):
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 5a5865c42fc..e19896eef79 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7762,14 +7762,16 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       if (op.code == COND_EXPR)
        {
          /* Record how the non-reduction-def value of COND_EXPR is defined.  */
-         if (dt == vect_constant_def)
+         if (reduc_chain_length != 1)
+           ;
+         else if (dt == vect_constant_def)
            {
              cond_reduc_dt = dt;
              cond_reduc_val = op.ops[i];
            }
-         if (dt == vect_induction_def
-             && def_stmt_info
-             && is_nonwrapping_integer_induction (def_stmt_info, loop))
+         else if (dt == vect_induction_def
+                  && def_stmt_info
+                  && is_nonwrapping_integer_induction (def_stmt_info, loop))
            {
              cond_reduc_dt = dt;
              cond_stmt_vinfo = def_stmt_info;


I think it's latent even before the bisected revision.