From: Richard Biener
Date: Tue, 15 Aug 2023 11:04:23 +0200
Subject: Re: [PATCH] vect: Move VMAT_GATHER_SCATTER handlings from final loop nest
To: Richard Biener, "Kewen.Lin", GCC Patches, richard.sandiford@arm.com

On Tue, Aug 15, 2023 at 10:44 AM Richard Sandiford wrote:
>
> Richard Biener writes:
> > On Tue, Aug 15, 2023 at 4:44 AM Kewen.Lin wrote:
> >>
> >> on 2023/8/14 22:16, Richard Sandiford wrote:
> >> > No, it was more that 219-142=77, so it seems like a lot of lines
> >> > are being duplicated rather than simply being moved.  (Unlike for
> >> > VMAT_LOAD_STORE_LANES, which was even a slight LOC saving, and so
> >> > was a clear improvement.)
> >> >
> >> > So I was just wondering if there was any obvious factoring-out that
> >> > could be done to reduce the duplication.
> >>
> >> ah, thanks for the clarification!
> >>
> >> I think the main duplication is in the loop body beginning and end;
> >> let's take a look at them in detail:
> >>
> >> +  if (memory_access_type == VMAT_GATHER_SCATTER)
> >> +    {
> >> +      gcc_assert (alignment_support_scheme == dr_aligned
> >> +                  || alignment_support_scheme == dr_unaligned_supported);
> >> +      gcc_assert (!grouped_load && !slp_perm);
> >> +
> >> +      unsigned int inside_cost = 0, prologue_cost = 0;
> >>
> >> // These above are newly added.
> >>
> >> +      for (j = 0; j < ncopies; j++)
> >> +       {
> >> +         /* 1. Create the vector or array pointer update chain.  */
> >> +         if (j == 0 && !costing_p)
> >> +           {
> >> +             if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> >> +               vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info,
> >> +                                            slp_node, &gs_info, &dataref_ptr,
> >> +                                            &vec_offsets);
> >> +             else
> >> +               dataref_ptr
> >> +                 = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type,
> >> +                                             at_loop, offset, &dummy, gsi,
> >> +                                             &ptr_incr, false, bump);
> >> +           }
> >> +         else if (!costing_p)
> >> +           {
> >> +             gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
> >> +             if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> >> +               dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
> >> +                                              gsi, stmt_info, bump);
> >> +           }
> >>
> >> // These are for dataref_ptr; in the final loop nest we deal with more
> >> cases on simd_lane_access_p and diff_first_stmt_info, but don't handle
> >> STMT_VINFO_GATHER_SCATTER_P any more.  Very little (one case) can be
> >> shared between them; IMHO, factoring it out seems like overkill.
> >>
> >> +
> >> +         if (mask && !costing_p)
> >> +           vec_mask = vec_masks[j];
> >>
> >> // This is merged out from the j == 0 and j != 0 arms.
> >>
> >> +
> >> +         gimple *new_stmt = NULL;
> >> +         for (i = 0; i < vec_num; i++)
> >> +           {
> >> +             tree final_mask = NULL_TREE;
> >> +             tree final_len = NULL_TREE;
> >> +             tree bias = NULL_TREE;
> >> +             if (!costing_p)
> >> +               {
> >> +                 if (loop_masks)
> >> +                   final_mask
> >> +                     = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
> >> +                                           vec_num * ncopies, vectype,
> >> +                                           vec_num * j + i);
> >> +                 if (vec_mask)
> >> +                   final_mask = prepare_vec_mask (loop_vinfo, mask_vectype,
> >> +                                                  final_mask, vec_mask, gsi);
> >> +
> >> +                 if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> >> +                   dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr,
> >> +                                                  gsi, stmt_info, bump);
> >> +               }
> >>
> >> // This part is directly copied from the original; the original gets updated
> >> by removing && !STMT_VINFO_GATHER_SCATTER_P.  Due to its size, I didn't
> >> consider this before; would you prefer me to factor this part out?
> >>
> >> +             if (gs_info.ifn != IFN_LAST)
> >> +               {
> >> ...
> >> +               }
> >> +             else
> >> +               {
> >> +                 /* Emulated gather-scatter.  */
> >> ...
> >>
> >> // This part is just moved from the original.
> >>
> >> +             vec_dest = vect_create_destination_var (scalar_dest, vectype);
> >> +             /* DATA_REF is null if we've already built the statement.  */
> >> +             if (data_ref)
> >> +               {
> >> +                 vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr));
> >> +                 new_stmt = gimple_build_assign (vec_dest, data_ref);
> >> +               }
> >> +             new_temp = make_ssa_name (vec_dest, new_stmt);
> >> +             gimple_set_lhs (new_stmt, new_temp);
> >> +             vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
> >> +
> >> +             /* Store vector loads in the corresponding SLP_NODE.  */
> >> +             if (slp)
> >> +               slp_node->push_vec_def (new_stmt);
> >> +
> >> +             if (!slp && !costing_p)
> >> +               STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> >> +           }
> >> +
> >> +         if (!slp && !costing_p)
> >> +           *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0];
> >>
> >> // This part is some subsequent handling; it's duplicated from the original
> >> but drops some more useless code.  I guess this part is not worth
> >> factoring out?
> >>
> >> +      if (costing_p)
> >> +        {
> >> +          if (dump_enabled_p ())
> >> +            dump_printf_loc (MSG_NOTE, vect_location,
> >> +                             "vect_model_load_cost: inside_cost = %u, "
> >> +                             "prologue_cost = %u .\n",
> >> +                             inside_cost, prologue_cost);
> >> +        }
> >> +      return true;
> >> +    }
> >>
> >> // Duplicating the dumping; I guess it's unnecessary to factor it out.
> >>
> >> oh, I just noticed that this should be shortened to
> >> "if (costing_p && dump_enabled_p ())" instead, just the same as what's
> >> adopted for the VMAT_LOAD_STORE_LANES dumping.
> >
> > Just to mention, the original motivational idea was: even though we
> > duplicate some code, we make it overall more readable and thus
> > maintainable.
>
> Not sure I necessarily agree with the "thus".  Maybe it tends to be
> true with a good, well-factored API.  But the internal vector APIs
> make it extremely easy to get things wrong.  If we have multiple copies
> of the same operation, the tendency is to fix bugs in the copy that the
> bugs were seen in.  It's easy to forget that other copies exist
> elsewhere that probably need updating in the same way.
>
> > In the end we
> > might have vectorizable_load () for analysis but have not only
> > load_vec_info_type but one for each VMAT_*, which means multiple
> > separate vect_transform_load () functions.  Currently vectorizable_load
> > is structured very inconsistently; having the transforms all hang off a
> > single switch (vmat-kind) {} would be an improvement IMHO.
>
> Yeah, agree vectorizable_load ought to be refactored.
>
> > But sure, some of our internal APIs are verbose and maybe badly
> > factored; any improvement there is welcome.  Inventing new random APIs
> > just to save a few lines of code without actually making the code more
> > readable is IMHO bad.
>
> OK, fair enough.  So the idea is: see where we end up and then try to
> improve/factor the APIs in a less peephole way?

Yeah, I think that's the only good way forward.

> Thanks,
> Richard
>
> > But, if we can for example enhance prepare_vec_mask to handle both the
> > loop and the conditional mask, and handle querying the mask, that would
> > be fine (of course you need to check all uses to see if that makes
> > sense).
> >
> > Richard.
> >
> >> BR,
> >> Kewen