From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lj1-x232.google.com (mail-lj1-x232.google.com [IPv6:2a00:1450:4864:20::232]) by sourceware.org (Postfix) with ESMTPS id 48E023857027 for ; Wed, 26 Jul 2023 12:00:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 48E023857027 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-lj1-x232.google.com with SMTP id 38308e7fff4ca-2b93fba1f62so97567081fa.1 for ; Wed, 26 Jul 2023 05:00:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1690372851; x=1690977651; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=UVYZwpe0N5TyMBzU1+52SeaQCGzkrLbyLHms1CwSRwE=; b=Q1Kn1NWuZ3D4sqVg0CbLiaEgidt2vFPn+BEmLUOVwfb15fd6BWo25JRvqo9R1e4tSC RuormGC5zgSETy8gHdamKZk0bNipJsOd+o4XZYgCWToZPaCQBa/XpD2aESykyuCzwFcg 3/8TxarWfS7eN8pcdXrB4IEPQ8On7/uPtioGYFImirbOujrtCUYbIEwtVv+aF9HGT8PK RXYplUZ87IMN1z7IYewF3PHmux76jR8esDcAI7jhI8d/ENKyg2sDgXZQGNSdS16e6CbM CgXYFNV809RuRN0euxyuDOizdenByl0JvYz2pKVxMafAKm+18J2WpaCX4oMO7mA81h4y hw9w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690372851; x=1690977651; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=UVYZwpe0N5TyMBzU1+52SeaQCGzkrLbyLHms1CwSRwE=; b=eNK/jljpTnflkCwg0sx/aJ+KvcQcCpSCBdKa8SdfotEPGXXAd/x7emoppQ8HXuB9ii YHcMeh7dS2/Vv9UuKFmwYGfWLp1RT4w2e3OXVo3RON6WC4NYE3ELZKUu5ETH6rpk1TQB 5F36uZVOrjwWFVg9ol5tB1SntKS/SIPl1dpoB+IVYP1Ls9qLo+xh94mk/BmAWCTcwhHa zhSDoEMxYwdT2j+Qx50StcnBii6tXH01/0td9m3hwauAyc81JrxqswFgaYnC+egYhkQq 2wz3MypzasNSXREHYJ/Plh4eCG206A8qAuziUSZeP/EOzCKsMeNeYmY5KI2w4KVsscqA +lHQ== X-Gm-Message-State: ABy/qLZyWidPU+SP/Gl+PxcGm528V8G4RNT6MQzzCLru6eVxbL2Fteya JqF/OZTv78zpbpKt89GtGBHSrv6u7AUD0i3reWM= X-Google-Smtp-Source: APBJJlHyX/4KYpzhJ/ec+4w1BN10qiupfU9hubJEHjWNmdmP8KS5ReO8uI546Xlzf4/bas/iElIHbCR6RHTK60Q7AQ0= X-Received: by 2002:a05:651c:10d2:b0:2b6:fa54:cec1 with SMTP id l18-20020a05651c10d200b002b6fa54cec1mr1239327ljn.48.1690372850914; Wed, 26 Jul 2023 05:00:50 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Richard Biener Date: Wed, 26 Jul 2023 14:00:04 +0200 Message-ID: Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625] To: Richard Biener , Hao Liu OS , "GCC-patches@gcc.gnu.org" , richard.sandiford@arm.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-1.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Wed, Jul 26, 2023 at 12:12=E2=80=AFPM Richard Sandiford wrote: > > Richard Biener writes: > > On Wed, Jul 26, 2023 at 11:14=E2=80=AFAM Richard Sandiford > > wrote: > >> > >> Richard Biener writes: > >> > On Wed, Jul 26, 2023 at 4:02=E2=80=AFAM Hao Liu OS via Gcc-patches > >> > wrote: > >> >> > >> >> > When was STMT_VINFO_REDUC_DEF empty? I just want to make sure th= at we're not papering over an issue elsewhere. > >> >> > >> >> Yes, I also wonder if this is an issue in vectorizable_reduction. = Below is the the gimple of "gcc.target/aarch64/sve/cost_model_13.c": > >> >> > >> >> : > >> >> # res_18 =3D PHI > >> >> # i_20 =3D PHI > >> >> _1 =3D (long unsigned int) i_20; > >> >> _2 =3D _1 * 2; > >> >> _3 =3D x_14(D) + _2; > >> >> _4 =3D *_3; > >> >> _5 =3D (unsigned short) _4; > >> >> res.0_6 =3D (unsigned short) res_18; > >> >> _7 =3D _5 + res.0_6; <-- The current = stmt_info > >> >> res_15 =3D (short int) _7; > >> >> i_16 =3D i_20 + 1; > >> >> if (n_11(D) > i_16) > >> >> goto ; > >> >> else > >> >> goto ; > >> >> > >> >> : > >> >> goto ; > >> >> > >> >> It looks like that STMT_VINFO_REDUC_DEF should be "res_18 =3D PHI <= res_15(7), 0(6)>"? > >> >> The status here is: > >> >> STMT_VINFO_REDUC_IDX (stmt_info): 1 > >> >> STMT_VINFO_REDUC_TYPE (stmt_info): TREE_CODE_REDUCTION > >> >> STMT_VINFO_REDUC_VECTYPE (stmt_info): 0x0 > >> > > >> > Not all stmts in the SSA cycle forming the reduction have > >> > STMT_VINFO_REDUC_DEF set, > >> > only the last (latch def) and live stmts have at the moment. > >> > >> Ah, thanks. In that case, Hao, I think we can avoid the ICE by changi= ng: > >> > >> if ((kind =3D=3D scalar_stmt || kind =3D=3D vector_stmt || kind =3D= =3D vec_to_scalar) > >> && vect_is_reduction (stmt_info)) > >> > >> to: > >> > >> if ((kind =3D=3D scalar_stmt || kind =3D=3D vector_stmt || kind =3D= =3D vec_to_scalar) > >> && STMT_VINFO_LIVE_P (stmt_info) > >> && vect_is_reduction (stmt_info)) > >> > >> instead of using a null check. > > > > But as seen you will miss stmts that are part of the reduction then? > > Yeah, but the code is doing a maximum of all the reductions in the loop: > > /* ??? Ideally we'd do COUNT reductions in parallel, but unfortunat= ely > that's not yet the case. */ > ops->reduction_latency =3D MAX (ops->reduction_latency, base * coun= t); > > So as it stands, we only need to see each reduction (as opposed to each > reduction statement) once. > > But we do want to know the length of each reduction... Yeah, it really tells the current costing infrastructure is far from perfect here ... We could pass you the stmt_info for the reduction PHI (aka the info_for_reduction) once with a special kind, vect_reduction, so you could walk relevant stmts from that. You get a number of add_stmts for each reduction, that could be a hint of the length as well ... (but I think we always pass the 'main' stmt_info here) > > In principle we could put STMT_VINFO_REDUC_DEF to other stmts > > as well. See vectorizable_reduction in the > > > > while (reduc_def !=3D PHI_RESULT (reduc_def_phi)) > > > > loop. > > Nod. That's where I'd got the STMT_VINFO_LIVE_P thing from. > > >> I see that vectorizable_reduction calculates a reduc_chain_length. > >> Would it be OK to store that in the stmt_vec_info? I suppose the > >> AArch64 code should be multiplying by that as well. (It would be a > >> separate patch from this one though.) > > > > I don't think that's too relevant here (it also counts noop conversions= ). > > Bah. I'm loath to copy that loop and just pick out the relevant > statements though. > > I suppose if every statement had a STMT_VINFO_REDUC_DEF, aarch64 could > maintain a hash map from STMT_VINFO_REDUC_DEF to total latencies, and > then take the maximum of those total latencies. It sounds pretty > complex though... Ick. Yeah. I think most of the costing is still nearly GIGO, I hardly see= any loop vectorization disqualified because of cost (happens for BB SLP though)= . Richard. > Thanks, > Richard >