From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-507762-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 98703 invoked by alias); 27 Aug 2019 09:59:01 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 98692 invoked by uid 89); 27 Aug 2019 09:59:01 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-3.7 required=5.0 tests=AWL,BAYES_00,KAM_NUMSUBJECT,SPF_PASS autolearn=no version=3.3.1 spammy=
X-HELO: foss.arm.com
Received: from foss.arm.com (HELO foss.arm.com) (217.140.110.172) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 27 Aug 2019 09:58:59 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C727D337;	Tue, 27 Aug 2019 02:58:57 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.99.62])	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 34FF23F718;	Tue, 27 Aug 2019 02:58:57 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Mail-Followup-To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>,Richard Biener <richard.guenther@gmail.com>,  gcc Patches <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Cc: Richard Biener <richard.guenther@gmail.com>,  gcc Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [SVE] PR86753
References: <CAAgBjMnTn_3hiTxFPW-=QBp3=Pq0oCx1OhUbdRKwN9bWkGQ_UQ@mail.gmail.com>	<CAFiYyc0=MQ4p7974RsQawcgQNEU8p8w10MB6Q24A_1ZS5KKHsg@mail.gmail.com>	<CAFiYyc2_NuddC+cP+BmhmjjEJ2e5Sd5Ne7LTaLMwF5cU_OJW=g@mail.gmail.com>	<mpty2zvjy7o.fsf@arm.com>	<CAAgBjM=usye_qrYR58sKxjFdSJwD4hEiZ_2FbF48NPMHQKyupg@mail.gmail.com>	<CAFiYyc3ksU2GMcZ9vytPx-4DUhxxhva7mCci4ZKog+KdcAh8Bg@mail.gmail.com>	<CAAgBjMm=_L9VoE3mDhFAtemz7_2MDbRiY-10=9yb=7GX9=ZOuA@mail.gmail.com>	<mptsgpsxd6j.fsf@arm.com>	<CAAgBjMkX786wQ2CgtBOTTY_ejD1Zp=KfmAnT08NFkDtNe3ZLJA@mail.gmail.com>	<mptwof4vujx.fsf@arm.com>	<CAAgBjMntcR5nBs96aD6_FJPGSjiCJVGsbGEk_WBxhrhgPAOQBA@mail.gmail.com>	<CAFiYyc10BkstnK2f6X91S3JGZUX5QDz5zF_WDEiJdLxRRFEvwQ@mail.gmail.com>	<CAAgBjM=0MN8j_uC2pp7+S6C2BfQArvtW6u2WDug2pEA4hUAzmw@mail.gmail.com>
Date: Tue, 27 Aug 2019 10:41:00 -0000
In-Reply-To: <CAAgBjM=0MN8j_uC2pp7+S6C2BfQArvtW6u2WDug2pEA4hUAzmw@mail.gmail.com>	(Prathamesh Kulkarni's message of "Mon, 26 Aug 2019 18:29:59 +0530")
Message-ID: <mptblwbszdr.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-IsSubscribed: yes
X-SW-Source: 2019-08/txt/msg01807.txt.bz2

Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Mon, 26 Aug 2019 at 14:48, Richard Biener <richard.guenther@gmail.com> wrote:
>>
>> On Sun, Aug 25, 2019 at 11:13 PM Prathamesh Kulkarni
>> <prathamesh.kulkarni@linaro.org> wrote:
>> >
>> > On Fri, 23 Aug 2019 at 19:43, Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> > >
>> > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > > > On Fri, 23 Aug 2019 at 18:15, Richard Sandiford
>> > > > <richard.sandiford@arm.com> wrote:
>> > > >>
>> > > >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > > >> > On Thu, 22 Aug 2019 at 16:44, Richard Biener <richard.guenther@gmail.com> wrote:
>> > > >> >> It looks a bit odd to me.  I'd have expected it to work by generating
>> > > >> >> the stmts as before in the vectorizer and then on the stmts we care
>> > > >> >> invoke vn_visit_stmt that does both value-numbering and elimination.
>> > > >> >> Alternatively you could ask the VN state to generate the stmt for
>> > > >> >> you via vn_nary_build_or_lookup () (certainly that needs a bit more
>> > > >> >> work).  One complication might be availability if you don't value-number
>> > > >> >> all stmts in the block, but well.  I'm not sure constraining to a single
>> > > >> >> block is necessary - I've thought of having a "CSE"ing gimple_build
>> > > >> >> for some time (add & CSE new stmts onto a sequence), so one
>> > > >> >> should keep this mode in mind when designing the one working on
>> > > >> >> an existing BB.  Note as you write it it depends on visiting the
>> > > >> >> stmts in proper order - is that guaranteed when for example
>> > > >> >> vectorizing SLP?
>> > > >> > Hi,
>> > > >> > Indeed, I wrote the function with assumption that, stmts would be
>> > > >> > visited in proper order.
>> > > >> > This doesn't affect SLP currently, because call to vn_visit_stmt in
>> > > >> > vect_transform_stmt is
>> > > >> > conditional on cond_to_vec_mask, which is only allocated inside
>> > > >> > vect_transform_loop.
>> > > >> > But I agree we could make it more general.
>> > > >> > AFAIU, the idea of constraining VN to single block was to avoid using defs from
>> > > >> > non-dominating scalar stmts during outer-loop vectorization.
>> > > >>
>> > > >> Maybe we could do the numbering in a separate walk immediately before
>> > > >> the transform phase instead.
>> > > > Um sorry, I didn't understand. Do you mean we should do dom based VN
>> > > > just before transform phase
>> > > > or run full VN ?
>> > >
>> > > No, I just meant that we could do a separate walk of the contents
>> > > of the basic block:
>> > >
>> > > > @@ -8608,6 +8609,8 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>> > > >      {
>> > > >        basic_block bb = bbs[i];
>> > > >        stmt_vec_info stmt_info;
>> > > > +      vn_bb_init (bb);
>> > > > +      loop_vinfo->cond_to_vec_mask = new cond_vmask_map_type (8);
>> > > >
>> > >
>> > > ...here, rather than doing it on the fly during vect_transform_stmt
>> > > itself.  The walk should be gated on LOOP_VINFO_FULLY_MASKED_P so that
>> > > others don't have to pay the compile-time penalty.  (Same for
>> > > cond_to_vec_mask itself really.)
>> > Hi,
>> > Does the attached patch look OK ?
>> > In patch, I put call to vn_visit stmt in bb loop in
>> > vect_transform_loop to avoid replicating logic for processing phi and
>> > stmts.
>> > AFAIU, vect_transform_loop_stmt is only called from bb loop, so
>> > compile time penalty for checking cond_to_vec_mask
>> > should be pretty small ?
>> > If this is not OK, I will walk bb immediately before the bb loop.
>>
>> So if I understand correctly you never have vectorizable COND_EXPRs
>> in SLP mode?  Because we vectorize all SLP chains before entering
>> the loop in vect_transform_loop where you VN existing scalar(!) stmts.

On the "!": the idea behind the patch is to find cases in which a
scalar condition is used in both a statement that needs to be masked
for correctness reasons and a statement that we can choose to mask
if we want to.  It also tries (opportunisticly) to match the ?: order
with other conditions.

That's why it's operating on scalar values rather than vector values.
In principle it could be done as a subpass before vectorisation rather
than on the fly, when there aren't any vector stmts around.

>> Then all this hew hash-table stuff should not be needed since this
>> is what VN should provide you with.  You of course need to visit
>> generated condition stmts.  And condition support is weak
>> in VN due to it possibly having two operations in a single stmt.
>> Bad GIMPLE IL.  So I'm not sure VN is up to the task here or
>> why you even need it given you are doing your own hashing?
> Well, we thought of using VN for comparing operands for cases
> operand_equal_p would not
> work. Actually, VN seems not to be required for test-cases in PR
> because both conditions
> are _4 != 0 (_35 = _4 != 0 and in cond_expr), which works to match
> with operand_equal_p.

Right, that's why I was suggesting in the earlier thread that we
treat value numbering as a follow-on.  But...

> Input to vectorizer is:
>
>  <bb 3> [local count: 1063004407]:
>   # i_20 = PHI <i_16(7), 0(15)>
>   # ivtmp_19 = PHI <ivtmp_9(7), 100(15)>
>   _1 = (long unsigned int) i_20;
>   _2 = _1 * 4;
>   _3 = y_11(D) + _2;
>   _4 = *_3;
>   _5 = z_12(D) + _2;
>   _35 = _4 != 0;
>   iftmp.0_13 = .MASK_LOAD (_5, 32B, _35);
>   iftmp.0_8 = _4 != 0 ? iftmp.0_13 : 10;
>
> In prepare_load_store_mask, we record (ne_expr, _4, 0) -> vec_mask in
> cond_to_vec_mask,
> and in vectorizable_condition, we look up (ne_expr, _4, 0) which does
> not require VN
> since operands are same.
>
> Initially, I was trying to change the generated vectorized code:
>
>   mask__35.8_43 = vect__4.7_41 != vect_cst__42;
>   vect_iftmp.12_50 = VEC_COND_EXPR <vect__4.7_41 != vect_cst__48,
> vect_iftmp.11_47, vect_cst__49>;
>
> where both conditions are equivalent because vect_cst__42 and
> vect_cst__48 are zero vectors but operand_equal_p failed to catch
> those.
>
> Sorry, I mixed up later between scalar and vector stmts.
> I wonder if we then need VN ? Perhaps there might be other cases where
> operands of scalar
> conditions may be equivalent but not match with operand_equal_p ?
> In the attached patch, changing operator==, to compare using
> operand_equal_p works for the tests.

...those are only very simple cases :-) The problem is that ifcvt
doesn't leave the code free of redundancies, and pattern stmt generation
can create redundancies too.  Of course, we could fix those instead.

E.g. for:

void
f (short *restrict x, short *restrict a, short *restrict b, short *restrict c)
{
  for (int i = 0; i < 100; ++i)
    if (a[i] >= 1)
      x[i] = b[i] >= 1 ? a[i] : c[i];
}

ifcvt produces:

  <bb 3> [local count: 1063004407]:
  # i_34 = PHI <i_30(10), 0(21)>
  # ivtmp_5 = PHI <ivtmp_6(10), 100(21)>
  _1 = (long unsigned int) i_34;
  _2 = _1 * 2;
  _3 = a_23(D) + _2;
  _4 = *_3;
  _7 = b_24(D) + _2;
  _49 = _4 > 0;
  _8 = .MASK_LOAD (_7, 16B, _49);
  _12 = _4 > 0;
  _13 = _8 > 0;
  _9 = _12 & _13;
  _10 = _4 > 0;
  _11 = _8 > 0;
  _27 = ~_11;
  _15 = _10 & _27;
  _14 = c_25(D) + _2;
  iftmp.0_26 = .MASK_LOAD (_14, 16B, _15);
  iftmp.0_19 = _9 ? _4 : iftmp.0_26;
  _17 = x_28(D) + _2;
  _50 = _4 > 0;
  .MASK_STORE (_17, 16B, _50, iftmp.0_19);
  i_30 = i_34 + 1;
  ivtmp_6 = ivtmp_5 - 1;
  if (ivtmp_6 != 0)
    goto <bb 10>; [98.99%]
  else
    goto <bb 9>; [1.01%]

  <bb 10> [local count: 1052266994]:
  goto <bb 3>; [100.00%]

which has 4 copies of _4 > 0 (a[i] > 0) and 2 copies of _8 > 0 (b[i] > 0).
Looking through the definition of an SSA name means that we can cope
with these redundancies for single comparisons, but not for comparisons
combined through & and |.  This hurts most when trying to decide whether
to invert comparisons.

But maybe value numbering won't cope with that either :-)

Thanks,
Richard