From: Richard Biener
Date: Tue, 15 Aug 2023 11:04:23 +0200
Subject: Re: [PATCH] vect: Move VMAT_GATHER_SCATTER handlings from final loop nest
To: Richard Biener, "Kewen.Lin", GCC Patches, richard.sandiford@arm.com

On Tue, Aug 15, 2023 at 10:44 AM Richard Sandiford wrote:
>
> Richard Biener writes:
> > On Tue, Aug 15, 2023 at 4:44 AM Kewen.Lin wrote:
> >>
> >> on 2023/8/14 22:16, Richard Sandiford wrote:
> >> > No, it was more that 219-142=77, so it seems like a lot of lines
> >> > are being duplicated rather than simply being moved.  (Unlike for
> >> > VMAT_LOAD_STORE_LANES, which was even a slight LOC saving, and so
> >> > was a clear improvement.)
> >> >
> >> > So I was just wondering if there was any obvious factoring-out that
> >> > could be done to reduce the duplication.
> >>
> >> ah, thanks for the clarification!
> >>
> >> I think the main duplication is in the loop body beginning and end;
> >> let's take a look at them in detail:
> >>
> >> +  if (memory_access_type == VMAT_GATHER_SCATTER)
> >> +    {
> >> +      gcc_assert (alignment_support_scheme == dr_aligned
> >> +                  || alignment_support_scheme == dr_unaligned_supported);
> >> +      gcc_assert (!grouped_load && !slp_perm);
> >> +
> >> +      unsigned int inside_cost = 0, prologue_cost = 0;
> >>
> >> // These above are newly added.
> >>
> >> +      for (j = 0; j < ncopies; j++)
> >> +       {
> >> +         /* 1. Create the vector or array pointer update chain.  */
> >> +         if (j == 0 && !costing_p)
> >> +           {
> >> +             if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> >> +               vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info,
> >> +                                            slp_node, &gs_info, &dataref_ptr,
> >> +                                            &vec_offsets);
> >> +             else
> >> +               dataref_ptr
> >> +                 = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type,
> >> +                                             at_loop, offset, &dummy, gsi,
> >> +                                             &ptr_incr, false, bump);
> >> +           }
> >> +         else if (!costing_p)
> >> +           {
> >> +             gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
> >> +             if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> >> +               dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
> >> +                                              gsi, stmt_info, bump);
> >> +           }
> >>
> >> // These are for dataref_ptr; in the final loop nest we deal with more
> >> cases on simd_lane_access_p and diff_first_stmt_info, but don't handle
> >> STMT_VINFO_GATHER_SCATTER_P any more.  Very little (one case) can be
> >> shared between them; IMHO, factoring it out seems like overkill.
> >>
> >> +
> >> +         if (mask && !costing_p)
> >> +           vec_mask = vec_masks[j];
> >>
> >> // This is merged out from the j == 0 and j != 0 arms.
> >>
> >> +
> >> +         gimple *new_stmt = NULL;
> >> +         for (i = 0; i < vec_num; i++)
> >> +           {
> >> +             tree final_mask = NULL_TREE;
> >> +             tree final_len = NULL_TREE;
> >> +             tree bias = NULL_TREE;
> >> +             if (!costing_p)
> >> +               {
> >> +                 if (loop_masks)
> >> +                   final_mask
> >> +                     = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
> >> +                                           vec_num * ncopies, vectype,
> >> +                                           vec_num * j + i);
> >> +                 if (vec_mask)
> >> +                   final_mask = prepare_vec_mask (loop_vinfo, mask_vectype,
> >> +                                                  final_mask, vec_mask, gsi);
> >> +
> >> +                 if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> >> +                   dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
> >> +                                                  gsi, stmt_info, bump);
> >> +               }
> >>
> >> // This part is directly copied from the original; the original gets updated
> >> by removing && !STMT_VINFO_GATHER_SCATTER_P.  Due to its size, I didn't
> >> consider this before; would you prefer me to factor this part out?
> >>
> >> +             if (gs_info.ifn != IFN_LAST)
> >> +               {
> >> ...
> >> +               }
> >> +             else
> >> +               {
> >> +                 /* Emulated gather-scatter.  */
> >> ...
> >>
> >> // This part is just moved from the original.
> >>
> >> +             vec_dest = vect_create_destination_var (scalar_dest, vectype);
> >> +             /* DATA_REF is null if we've already built the statement.  */
> >> +             if (data_ref)
> >> +               {
> >> +                 vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr));
> >> +                 new_stmt = gimple_build_assign (vec_dest, data_ref);
> >> +               }
> >> +             new_temp = make_ssa_name (vec_dest, new_stmt);
> >> +             gimple_set_lhs (new_stmt, new_temp);
> >> +             vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
> >> +
> >> +             /* Store vector loads in the corresponding SLP_NODE.  */
> >> +             if (slp)
> >> +               slp_node->push_vec_def (new_stmt);
> >> +
> >> +             if (!slp && !costing_p)
> >> +               STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> >> +           }
> >> +
> >> +         if (!slp && !costing_p)
> >> +           *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> >>
> >> // This part is some subsequent handling; it's duplicated from the original
> >> but drops some more useless code.  I guess this part is not worth
> >> factoring out?
> >>
> >> +      if (costing_p)
> >> +        {
> >> +          if (dump_enabled_p ())
> >> +            dump_printf_loc (MSG_NOTE, vect_location,
> >> +                             "vect_model_load_cost: inside_cost = %u, "
> >> +                             "prologue_cost = %u .\n",
> >> +                             inside_cost, prologue_cost);
> >> +        }
> >> +      return true;
> >> +    }
> >>
> >> // Duplicating the dumping; I guess it's unnecessary to factor it out.
> >>
> >> oh, I just noticed that this should be shortened to
> >> "if (costing_p && dump_enabled_p ())" instead, just the same as what's
> >> adopted for the VMAT_LOAD_STORE_LANES dumping.
> >
> > Just to mention, the original motivational idea was: even though we
> > duplicate some code, we make it overall more readable and thus
> > maintainable.
>
> Not sure I necessarily agree with the "thus".  Maybe it tends to be
> true with a good, well-factored API.  But the internal vector APIs
> make it extremely easy to get things wrong.  If we have multiple copies
> of the same operation, the tendency is to fix bugs in the copy that the
> bugs were seen in.  It's easy to forget that other copies exist
> elsewhere that probably need updating in the same way.
>
> > In the end we
> > might have vectorizable_load () for analysis but have not only
> > load_vec_info_type but one for each VMAT_*, which means multiple
> > separate vect_transform_load () functions.  Currently vectorizable_load
> > is structured very inconsistently; having the transforms all hang off a
> > single switch (vmat-kind) {} would be an improvement IMHO.
>
> Yeah, agree vectorizable_load ought to be refactored.
>
> > But sure, some of our internal APIs are verbose and maybe badly
> > factored; any improvement there is welcome.  Inventing new random APIs
> > just to save a few lines of code without actually making the code more
> > readable is IMHO bad.
>
> OK, fair enough.  So the idea is: see where we end up and then try to
> improve/factor the APIs in a less peephole way?

Yeah, I think that's the only good way forward.

> Thanks,
> Richard
>
> > But, if we can for example enhance prepare_vec_mask to handle both the
> > loop and the conditional mask, and handle querying the mask, that would
> > be fine (of course you need to check all uses to see if that makes
> > sense).
> >
> > Richard.
> >
> >> BR,
> >> Kewen