From: Richard Biener
Date: Fri, 28 Jun 2024 14:59:50 +0200
Subject: Re: [PATCH 4/8] vect: Determine input vectype for multiple lane-reducing
To: Feng Xue OS
Cc: "gcc-patches@gcc.gnu.org"

On Wed, Jun 26, 2024 at 4:48 PM Feng
Xue OS wrote:
>
> Updated the patches based on comments.
>
> The input vectype of reduction PHI statement must be determined before
> vect cost computation for the reduction. Since lane-reducing operation has
> different input vectype from normal one, so we need to traverse all reduction
> statements to find out the input vectype with the least lanes, and set that to
> the PHI statement.

OK

> ---
>  gcc/tree-vect-loop.cc | 79 ++++++++++++++++++++++++++++++-------------
>  1 file changed, 56 insertions(+), 23 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 347dac97e49..419f4b08d2b 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -7643,7 +7643,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>      {
>        stmt_vec_info def = loop_vinfo->lookup_def (reduc_def);
>        stmt_vec_info vdef = vect_stmt_to_vectorize (def);
> -      if (STMT_VINFO_REDUC_IDX (vdef) == -1)
> +      int reduc_idx = STMT_VINFO_REDUC_IDX (vdef);
> +
> +      if (reduc_idx == -1)
>          {
>            if (dump_enabled_p ())
>              dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -7686,10 +7688,57 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>                return false;
>              }
>          }
> -      else if (!stmt_info)
> -        /* First non-conversion stmt.  */
> -        stmt_info = vdef;
> -      reduc_def = op.ops[STMT_VINFO_REDUC_IDX (vdef)];
> +      else
> +        {
> +          /* First non-conversion stmt.  */
> +          if (!stmt_info)
> +            stmt_info = vdef;
> +
> +          if (lane_reducing_op_p (op.code))
> +            {
> +              enum vect_def_type dt;
> +              tree vectype_op;
> +
> +              /* The last operand of lane-reducing operation is for
> +                 reduction.  */
> +              gcc_assert (reduc_idx > 0 && reduc_idx == (int) op.num_ops - 1);
> +
> +              if (!vect_is_simple_use (op.ops[0], loop_vinfo, &dt, &vectype_op))
> +                return false;
> +
> +              tree type_op = TREE_TYPE (op.ops[0]);
> +
> +              if (!vectype_op)
> +                {
> +                  vectype_op = get_vectype_for_scalar_type (loop_vinfo,
> +                                                            type_op);
> +                  if (!vectype_op)
> +                    return false;
> +                }
> +
> +              /* For lane-reducing operation vectorizable analysis needs the
> +                 reduction PHI information */
> +              STMT_VINFO_REDUC_DEF (def) = phi_info;
> +
> +              /* Each lane-reducing operation has its own input vectype, while
> +                 reduction PHI will record the input vectype with the least
> +                 lanes.  */
> +              STMT_VINFO_REDUC_VECTYPE_IN (vdef) = vectype_op;
> +
> +              /* To accommodate lane-reducing operations of mixed input
> +                 vectypes, choose input vectype with the least lanes for the
> +                 reduction PHI statement, which would result in the most
> +                 ncopies for vectorized reduction results.  */
> +              if (!vectype_in
> +                  || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
> +                       < GET_MODE_SIZE (SCALAR_TYPE_MODE (type_op))))
> +                vectype_in = vectype_op;
> +            }
> +          else
> +            vectype_in = STMT_VINFO_VECTYPE (phi_info);
> +        }
> +
> +      reduc_def = op.ops[reduc_idx];
>        reduc_chain_length++;
>        if (!stmt_info && slp_node)
>          slp_for_stmt_info = SLP_TREE_CHILDREN (slp_for_stmt_info)[0];
> @@ -7747,6 +7796,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>
>    tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
>    STMT_VINFO_REDUC_VECTYPE (reduc_info) = vectype_out;
> +  STMT_VINFO_REDUC_VECTYPE_IN (reduc_info) = vectype_in;
> +
>    gimple_match_op op;
>    if (!gimple_extract_op (stmt_info->stmt, &op))
>      gcc_unreachable ();
> @@ -7831,16 +7882,6 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>            = get_vectype_for_scalar_type (loop_vinfo,
>                                           TREE_TYPE (op.ops[i]), slp_op[i]);
>
> -      /* To properly compute ncopies we are interested in the widest
> -         non-reduction input type in case we're looking at a widening
> -         accumulation that we later handle in vect_transform_reduction.  */
> -      if (lane_reducing
> -          && vectype_op[i]
> -          && (!vectype_in
> -              || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
> -                  < GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_op[i]))))))
> -        vectype_in = vectype_op[i];
> -
>        /* Record how the non-reduction-def value of COND_EXPR is defined.
>           ???  For a chain of multiple CONDs we'd have to match them up all.  */
>        if (op.code == COND_EXPR && reduc_chain_length == 1)
> @@ -7859,14 +7900,6 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>              }
>          }
>      }
> -  if (!vectype_in)
> -    vectype_in = STMT_VINFO_VECTYPE (phi_info);
> -  STMT_VINFO_REDUC_VECTYPE_IN (reduc_info) = vectype_in;
> -
> -  /* Each lane-reducing operation has its own input vectype, while reduction
> -     PHI records the input vectype with least lanes.  */
> -  if (lane_reducing)
> -    STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in;
>
>    enum vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (phi_info);
>    STMT_VINFO_REDUC_TYPE (reduc_info) = reduction_type;
> --
> 2.17.1
>
>
> ________________________________________
> From: Feng Xue OS
> Sent: Thursday, June 20, 2024 1:47 PM
> To: Richard Biener
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 4/8] vect: Determine input vectype for multiple lane-reducing
>
> >> +         if (lane_reducing_op_p (op.code))
> >> +           {
> >> +             unsigned group_size = slp_node ? SLP_TREE_LANES (slp_node) : 0;
> >> +             tree op_type = TREE_TYPE (op.ops[0]);
> >> +             tree new_vectype_in = get_vectype_for_scalar_type (loop_vinfo,
> >> +                                                                op_type,
> >> +                                                                group_size);
> >
> > I think doing it this way does not adhere to the vector type size constraint
> > with loop vectorization.  You should use vect_is_simple_use like the
> > original code did as the actual vector definition determines the vector type
> > used.
>
> OK, though this might be wordy.
>
> Actually, STMT_VINFO_REDUC_VECTYPE_IN is logically equivalent to nunits_vectype
> that is determined in vect_determine_vf_for_stmt_1(). So how about setting the type
> in this function?
>
> >
> > You are always using op.ops[0] here - I think that works because
> > reduc_idx is the last operand of all lane-reducing ops.  But then
> > we should assert reduc_idx != 0 here and add a comment.
>
> Already added in the following assertion.
>
> >> +
> >> +             /* The last operand of lane-reducing operation is for
> >> +                reduction.  */
> >> +             gcc_assert (reduc_idx > 0 && reduc_idx == (int) op.num_ops - 1);
> >                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >> +
> >> +             /* For lane-reducing operation vectorizable analysis needs the
> >> +                reduction PHI information */
> >> +             STMT_VINFO_REDUC_DEF (def) = phi_info;
> >> +
> >> +             if (!new_vectype_in)
> >> +               return false;
> >> +
> >> +             /* Each lane-reducing operation has its own input vectype, while
> >> +                reduction PHI will record the input vectype with the least
> >> +                lanes.  */
> >> +             STMT_VINFO_REDUC_VECTYPE_IN (vdef) = new_vectype_in;
> >> +
> >> +             /* To accommodate lane-reducing operations of mixed input
> >> +                vectypes, choose input vectype with the least lanes for the
> >> +                reduction PHI statement, which would result in the most
> >> +                ncopies for vectorized reduction results.  */
> >> +             if (!vectype_in
> >> +                 || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
> >> +                      < GET_MODE_SIZE (SCALAR_TYPE_MODE (op_type))))
> >> +               vectype_in = new_vectype_in;
> >
> > I know this is a fragile area but I always wonder since the accumulating operand
> > is the largest (all lane-reducing ops are widening), and that will be
> > equal to the
> > type of the PHI node, how this condition can be ever true.
>
> In the original code, accumulating operand is skipped! While it is correctly, we
> should not count the operand, this is why we call operation lane-reducing.
>
> >
> > ncopies is determined by the VF, so the comment is at least misleading.
> >
> >> +           }
> >> +         else
> >> +           vectype_in = STMT_VINFO_VECTYPE (phi_info);
> >
> > Please initialize vectype_in from phi_info before the loop (that
> > should never be NULL).
> >
>
> May not, as the below explanation.
>
> > I'll note that with your patch it seems we'd initialize vectype_in to
> > the biggest
> > non-accumulation vector type involved in lane-reducing ops but the accumulating
> > type might still be larger.  Why, when we have multiple lane-reducing
> > ops, would
> > we chose the largest input here?  I see we eventually do
> >
> >   if (slp_node)
> >     ncopies = 1;
> >   else
> >     ncopies = vect_get_num_copies (loop_vinfo, vectype_in);
> >
> > but then IIRC we always force a single cycle def for lane-reducing ops(?).
> >
> > In particular for vect_transform_reduction and SLP we rely on
> > SLP_TREE_NUMBER_OF_VEC_STMTS while non-SLP uses
> > STMT_VINFO_REDUC_VECTYPE_IN.
> >
> > So I wonder what breaks when we set vectype_in = vector type of PHI?
> >
>
> Yes. It is right, nothing is broken. Suppose that a loop contains three dot_prods,
> two are <16 * char>, one is <8 * short>, and choose <4 * int> as vectype_in:
>
> With the patch #7, we get:
>
>     vector<4> int sum_v0 = { 0, 0, 0, 0 };
>     vector<4> int sum_v1 = { 0, 0, 0, 0 };
>     vector<4> int sum_v2 = { 0, 0, 0, 0 };
>     vector<4> int sum_v3 = { 0, 0, 0, 0 };
>
>     loop () {
>         sum_v0 = dot_prod<16 * char>(char_a0, char_a1, sum_v0);
>
>         sum_v0 = dot_prod<16 * char>(char_b0, char_b1, sum_v0);
>
>         sum_v0 = dot_prod<8 * short>(short_c0_lo, short_c1_lo, sum_v0);
>         sum_v1 = dot_prod<8 * short>(short_c0_hi, short_c1_hi, sum_v1);
>
>         sum_v2 = sum_v2;
>         sum_v3 = sum_v3;
>     }
>
> The def/use cycles (sum_v2 and sum_v3) would be optimized away finally.
> Then this gets same result as setting vectype_in to <8 * short>.
>
> With the patch #8, we get:
>
>     vector<4> int sum_v0 = { 0, 0, 0, 0 };
>     vector<4> int sum_v1 = { 0, 0, 0, 0 };
>     vector<4> int sum_v2 = { 0, 0, 0, 0 };
>     vector<4> int sum_v3 = { 0, 0, 0, 0 };
>
>     loop () {
>         sum_v0 = dot_prod<16 * char>(char_a0, char_a1, sum_v0);
>
>         sum_v1 = dot_prod<16 * char>(char_b0, char_b1, sum_v1);
>
>         sum_v2 = dot_prod<8 * short>(short_c0_lo, short_c1_lo, sum_v2);
>         sum_v3 = dot_prod<8 * short>(short_c0_hi, short_c1_hi, sum_v3);
>     }
>
> All dot_prods are assigned to separate def/use cycles, and no
> dependency. More def/use cycles, higher instruction parallelism,
> but there need extra cost in epilogue to combine the result.
>
> So we consider a somewhat compact def/use layout similar to
> single-defuse-cycle, in which two <16 * char> dot_prods are independent,
> and cycle 2 and 3 are not used, and this is better than the 1st scheme.
>
>     vector<4> int sum_v0 = { 0, 0, 0, 0 };
>     vector<4> int sum_v1 = { 0, 0, 0, 0 };
>
>     loop () {
>         sum_v0 = dot_prod<16 * char>(char_a0, char_a1, sum_v0);
>
>         sum_v1 = dot_prod<16 * char>(char_b0, char_b1, sum_v1);
>
>         sum_v0 = dot_prod<8 * short>(short_c0_lo, short_c1_lo, sum_v0);
>         sum_v1 = dot_prod<8 * short>(short_c0_hi, short_c1_hi, sum_v1);
>     }
>
> For this purpose, we need to track the vectype_in that results in
> the most ncopies, for this case, the type is <8 * short>.
>
> BTW: would you please also take a look at patch #7 and #8?
>
> Thanks,
> Feng
>
> ________________________________________
> From: Richard Biener
> Sent: Wednesday, June 19, 2024 9:01 PM
> To: Feng Xue OS
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 4/8] vect: Determine input vectype for multiple lane-reducing
>
> On Sun, Jun 16, 2024 at 9:25 AM Feng Xue OS wrote:
> >
> > The input vectype of reduction PHI statement must be determined before
> > vect cost computation for the reduction. Since lane-reducing operation has
> > different input vectype from normal one, so we need to traverse all reduction
> > statements to find out the input vectype with the least lanes, and set that to
> > the PHI statement.
> >
> > Thanks,
> > Feng
> >
> > ---
> > gcc/
> >         * tree-vect-loop.cc (vectorizable_reduction): Determine input vectype
> >         during traversal of reduction statements.
> > ---
> >  gcc/tree-vect-loop.cc | 72 +++++++++++++++++++++++++++++--------------
> >  1 file changed, 49 insertions(+), 23 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 0f7b125e72d..39aa5cb1197 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -7643,7 +7643,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> >      {
> >        stmt_vec_info def = loop_vinfo->lookup_def (reduc_def);
> >        stmt_vec_info vdef = vect_stmt_to_vectorize (def);
> > -      if (STMT_VINFO_REDUC_IDX (vdef) == -1)
> > +      int reduc_idx = STMT_VINFO_REDUC_IDX (vdef);
> > +
> > +      if (reduc_idx == -1)
> >          {
> >            if (dump_enabled_p ())
> >              dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > @@ -7686,10 +7688,50 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> >                return false;
> >              }
> >          }
> > -      else if (!stmt_info)
> > -        /* First non-conversion stmt.  */
> > -        stmt_info = vdef;
> > -      reduc_def = op.ops[STMT_VINFO_REDUC_IDX (vdef)];
> > +      else
> > +        {
> > +          /* First non-conversion stmt.  */
> > +          if (!stmt_info)
> > +            stmt_info = vdef;
> > +
> > +          if (lane_reducing_op_p (op.code))
> > +            {
> > +              unsigned group_size = slp_node ? SLP_TREE_LANES (slp_node) : 0;
> > +              tree op_type = TREE_TYPE (op.ops[0]);
> > +              tree new_vectype_in = get_vectype_for_scalar_type (loop_vinfo,
> > +                                                                 op_type,
> > +                                                                 group_size);
>
> I think doing it this way does not adhere to the vector type size constraint
> with loop vectorization.  You should use vect_is_simple_use like the
> original code did as the actual vector definition determines the vector type
> used.
>
> You are always using op.ops[0] here - I think that works because
> reduc_idx is the last operand of all lane-reducing ops.  But then
> we should assert reduc_idx != 0 here and add a comment.
>
> > +
> > +              /* The last operand of lane-reducing operation is for
> > +                 reduction.  */
> > +              gcc_assert (reduc_idx > 0 && reduc_idx == (int) op.num_ops - 1);
> > +
> > +              /* For lane-reducing operation vectorizable analysis needs the
> > +                 reduction PHI information */
> > +              STMT_VINFO_REDUC_DEF (def) = phi_info;
> > +
> > +              if (!new_vectype_in)
> > +                return false;
> > +
> > +              /* Each lane-reducing operation has its own input vectype, while
> > +                 reduction PHI will record the input vectype with the least
> > +                 lanes.  */
> > +              STMT_VINFO_REDUC_VECTYPE_IN (vdef) = new_vectype_in;
> > +
> > +              /* To accommodate lane-reducing operations of mixed input
> > +                 vectypes, choose input vectype with the least lanes for the
> > +                 reduction PHI statement, which would result in the most
> > +                 ncopies for vectorized reduction results.  */
> > +              if (!vectype_in
> > +                  || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
> > +                       < GET_MODE_SIZE (SCALAR_TYPE_MODE (op_type))))
> > +                vectype_in = new_vectype_in;
>
> I know this is a fragile area but I always wonder since the accumulating operand
> is the largest (all lane-reducing ops are widening), and that will be
> equal to the
> type of the PHI node, how this condition can be ever true.
>
> ncopies is determined by the VF, so the comment is at least misleading.
>
> > +            }
> > +          else
> > +            vectype_in = STMT_VINFO_VECTYPE (phi_info);
>
> Please initialize vectype_in from phi_info before the loop (that
> should never be NULL).
>
> I'll note that with your patch it seems we'd initialize vectype_in to
> the biggest
> non-accumulation vector type involved in lane-reducing ops but the accumulating
> type might still be larger.  Why, when we have multiple lane-reducing
> ops, would
> we chose the largest input here?  I see we eventually do
>
>   if (slp_node)
>     ncopies = 1;
>   else
>     ncopies = vect_get_num_copies (loop_vinfo, vectype_in);
>
> but then IIRC we always force a single cycle def for lane-reducing ops(?).
> In particular for vect_transform_reduction and SLP we rely on
> SLP_TREE_NUMBER_OF_VEC_STMTS while non-SLP uses
> STMT_VINFO_REDUC_VECTYPE_IN.
>
> So I wonder what breaks when we set vectype_in = vector type of PHI?
>
> Richard.
>
> > +        }
> > +
> > +      reduc_def = op.ops[reduc_idx];
> >        reduc_chain_length++;
> >        if (!stmt_info && slp_node)
> >          slp_for_stmt_info = SLP_TREE_CHILDREN (slp_for_stmt_info)[0];
> > @@ -7747,6 +7789,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> >
> >    tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
> >    STMT_VINFO_REDUC_VECTYPE (reduc_info) = vectype_out;
> > +  STMT_VINFO_REDUC_VECTYPE_IN (reduc_info) = vectype_in;
> > +
> >    gimple_match_op op;
> >    if (!gimple_extract_op (stmt_info->stmt, &op))
> >      gcc_unreachable ();
> > @@ -7831,16 +7875,6 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> >            = get_vectype_for_scalar_type (loop_vinfo,
> >                                           TREE_TYPE (op.ops[i]), slp_op[i]);
> >
> > -      /* To properly compute ncopies we are interested in the widest
> > -         non-reduction input type in case we're looking at a widening
> > -         accumulation that we later handle in vect_transform_reduction.  */
> > -      if (lane_reducing
> > -          && vectype_op[i]
> > -          && (!vectype_in
> > -              || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
> > -                  < GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_op[i]))))))
> > -        vectype_in = vectype_op[i];
> > -
> >        /* Record how the non-reduction-def value of COND_EXPR is defined.
> >           ???  For a chain of multiple CONDs we'd have to match them up all.  */
> >        if (op.code == COND_EXPR && reduc_chain_length == 1)
> > @@ -7859,14 +7893,6 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> >              }
> >          }
> >      }
> > -  if (!vectype_in)
> > -    vectype_in = STMT_VINFO_VECTYPE (phi_info);
> > -  STMT_VINFO_REDUC_VECTYPE_IN (reduc_info) = vectype_in;
> > -
> > -  /* Each lane-reducing operation has its own input vectype, while reduction
> > -     PHI records the input vectype with least lanes.  */
> > -  if (lane_reducing)
> > -    STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in;
> >
> >    enum vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (phi_info);
> >    STMT_VINFO_REDUC_TYPE (reduc_info) = reduction_type;
> > --
> > 2.17.1
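
For readers following the thread, the selection rule under discussion can be
sketched in a few lines of standalone C++. This is only an illustration, not
GCC code: VecType, pick_vectype_in and num_copies are hypothetical stand-ins,
and num_copies assumes the copy count is simply VF divided by the lane count,
as in the <16 * char> / <8 * short> / <4 * int> example quoted above.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for a vector type: scalar element size in bytes
// and number of lanes (e.g. <8 * short> is {2, 8}).
struct VecType {
  unsigned elem_bytes;
  unsigned lanes;
};

// Mirrors the comparison in the patch: the current choice is replaced when
// its element type is narrower than the candidate's, so the input vectype
// with the widest elements (fewest lanes) wins.
std::optional<VecType> pick_vectype_in(const std::vector<VecType>& inputs) {
  std::optional<VecType> chosen;
  for (const VecType& vt : inputs)
    if (!chosen || chosen->elem_bytes < vt.elem_bytes)
      chosen = vt;
  return chosen;
}

// Assumed model of the copy count: vectorization factor divided by lanes.
unsigned num_copies(unsigned vf, const VecType& vt) {
  return vf / vt.lanes;
}
```

With a vectorization factor of 16 and inputs <16 * char>, <16 * char> and
<8 * short>, the sketch picks <8 * short> and yields two copies, matching the
two-accumulator layout (sum_v0, sum_v1); using the accumulator type <4 * int>
instead would yield four.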