From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 0CECD3858C33 for ; Wed, 26 Jul 2023 10:12:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0CECD3858C33 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4F2201692; Wed, 26 Jul 2023 03:13:04 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id C8F283F67D; Wed, 26 Jul 2023 03:12:20 -0700 (PDT) From: Richard Sandiford To: Richard Biener Mail-Followup-To: Richard Biener ,Hao Liu OS , "GCC-patches\@gcc.gnu.org" , richard.sandiford@arm.com Cc: Hao Liu OS , "GCC-patches\@gcc.gnu.org" Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625] References: Date: Wed, 26 Jul 2023 11:12:19 +0100 In-Reply-To: (Richard Biener's message of "Wed, 26 Jul 2023 12:02:01 +0200") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-20.4 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Richard Biener writes: > On Wed, Jul 26, 2023 at 11:14=E2=80=AFAM Richard Sandiford > wrote: >> >> Richard Biener writes: >> > On Wed, Jul 26, 2023 at 4:02=E2=80=AFAM Hao Liu OS via Gcc-patches >> > wrote: >> >> >> >> > When was STMT_VINFO_REDUC_DEF empty? I just want to make sure that= we're not papering over an issue elsewhere. >> >> >> >> Yes, I also wonder if this is an issue in vectorizable_reduction. Be= low is the the gimple of "gcc.target/aarch64/sve/cost_model_13.c": >> >> >> >> : >> >> # res_18 =3D PHI >> >> # i_20 =3D PHI >> >> _1 =3D (long unsigned int) i_20; >> >> _2 =3D _1 * 2; >> >> _3 =3D x_14(D) + _2; >> >> _4 =3D *_3; >> >> _5 =3D (unsigned short) _4; >> >> res.0_6 =3D (unsigned short) res_18; >> >> _7 =3D _5 + res.0_6; <-- The current st= mt_info >> >> res_15 =3D (short int) _7; >> >> i_16 =3D i_20 + 1; >> >> if (n_11(D) > i_16) >> >> goto ; >> >> else >> >> goto ; >> >> >> >> : >> >> goto ; >> >> >> >> It looks like that STMT_VINFO_REDUC_DEF should be "res_18 =3D PHI "? >> >> The status here is: >> >> STMT_VINFO_REDUC_IDX (stmt_info): 1 >> >> STMT_VINFO_REDUC_TYPE (stmt_info): TREE_CODE_REDUCTION >> >> STMT_VINFO_REDUC_VECTYPE (stmt_info): 0x0 >> > >> > Not all stmts in the SSA cycle forming the reduction have >> > STMT_VINFO_REDUC_DEF set, >> > only the last (latch def) and live stmts have at the moment. >> >> Ah, thanks. In that case, Hao, I think we can avoid the ICE by changing: >> >> if ((kind =3D=3D scalar_stmt || kind =3D=3D vector_stmt || kind =3D=3D= vec_to_scalar) >> && vect_is_reduction (stmt_info)) >> >> to: >> >> if ((kind =3D=3D scalar_stmt || kind =3D=3D vector_stmt || kind =3D=3D= vec_to_scalar) >> && STMT_VINFO_LIVE_P (stmt_info) >> && vect_is_reduction (stmt_info)) >> >> instead of using a null check. > > But as seen you will miss stmts that are part of the reduction then? Yeah, but the code is doing a maximum of all the reductions in the loop: /* ??? Ideally we'd do COUNT reductions in parallel, but unfortunately that's not yet the case. */ ops->reduction_latency =3D MAX (ops->reduction_latency, base * count); So as it stands, we only need to see each reduction (as opposed to each reduction statement) once. But we do want to know the length of each reduction... > In principle we could put STMT_VINFO_REDUC_DEF to other stmts > as well. See vectorizable_reduction in the > > while (reduc_def !=3D PHI_RESULT (reduc_def_phi)) > > loop. Nod. That's where I'd got the STMT_VINFO_LIVE_P thing from. >> I see that vectorizable_reduction calculates a reduc_chain_length. >> Would it be OK to store that in the stmt_vec_info? I suppose the >> AArch64 code should be multiplying by that as well. (It would be a >> separate patch from this one though.) > > I don't think that's too relevant here (it also counts noop conversions). Bah. I'm loath to copy that loop and just pick out the relevant statements though. I suppose if every statement had a STMT_VINFO_REDUC_DEF, aarch64 could maintain a hash map from STMT_VINFO_REDUC_DEF to total latencies, and then take the maximum of those total latencies. It sounds pretty complex though... Thanks, Richard