From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 18857 invoked by alias); 4 Nov 2019 16:27:14 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 18846 invoked by uid 89); 4 Nov 2019 16:27:14 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-11.0 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_2,GIT_PATCH_3,SPF_PASS autolearn=ham version=3.3.1 spammy=H*f:sk:5a71368, H*i:sk:5a71368, is X-HELO: mx1.suse.de Received: from mx2.suse.de (HELO mx1.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 04 Nov 2019 16:27:11 +0000 Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 0C20BB2B3; Mon, 4 Nov 2019 16:27:09 +0000 (UTC) Date: Mon, 04 Nov 2019 16:27:00 -0000 User-Agent: K-9 Mail for Android In-Reply-To: <5a713689-3070-f550-d441-7520e5f266e1@arm.com> References: <5edb0b00-4ae2-41c0-80ec-76de15d0b110@arm.com> <6b66b279-eba2-80be-58f0-f72dc2316fcb@arm.com> <93a0b5b9-312d-b5fe-5d5b-eacce7e19344@arm.com> <5a713689-3070-f550-d441-7520e5f266e1@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [SLP] SLP vectorization: vectorize vector constructors To: Joel Hutton CC: GCC Patches ,nd From: Richard Biener Message-ID: X-SW-Source: 2019-11/txt/msg00133.txt.bz2 On November 4, 2019 4:30:57 PM GMT+01:00, Joel Hutton = wrote: >First copy bounced from mailing list. > > > On 30/10/2019 13:49, Richard Biener wrote: > > >=C2=A0 >> >> >=C2=A0 >>=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * expr.= c (store_constructor): Add case for constructor >of > > > vectors. > > >=C2=A0 > Why do you need this?=C2=A0 The vectorizer already creates su= ch=20 >CTORs.=C2=A0 Any > > >=C2=A0 > testcase that you can show? > > >=C2=A0 > > > >=C2=A0 > typedef long v2di __attribute__((vector_size(16))); > > >=C2=A0 > v2di v; > > >=C2=A0 > void > > >=C2=A0 > foo() > > >=C2=A0 > { > > >=C2=A0 >=C2=A0=C2=A0=C2=A0 v =3D (v2di){v[1], v[0]}; > > >=C2=A0 > } >> > There is a special case for single element vectors, where the >vector > > > mode and > > > the element mode are the same. >> > I've changed this slightly, as your solution of using VECTOR_MODE_P > >didn't > > > work for my case where: > > >=C2=A0=C2=A0=C2=A0 emode =3D E_DImode > > >=C2=A0=C2=A0=C2=A0 eltmode =3D E_DImode > > > On aarch64. As E_DImode is not a vector mode. >> > Based on some checks I found in verify_gimple, I set the type of >the > > > replacement >> > constructor to the same as the original constructor, as >verify_gimple > > > seems to >> > expect flattened types. i.e. a vector of vectors of ints should >have > > > type vector > > > of ints. > > > > Huh.=C2=A0 On aarch64 for the above testcase I see mode V2DImode and > > emode =3D eltmode =3D DImode.=C2=A0 That's just a regular CTOR with > > non-vector elements so not sure why you need any adjustment at all > > here? > > > > It looks like your vectorization of the CTOR introduces a > > V2DImode CTOR of vector(1) long elements which incidentially > > have DImode.=C2=A0 That's because somehow V2DI vectorization isn't > > profitable but V1DI one is.=C2=A0 In the end it's a noop transform > > but the testcase shows that for V2DI vectorization we fail > > to cost the CTOR in the scalar cost computation (you have to > > include the pattern root there I guess). > > >Yes, sorry, I was unclear about this, it's the new constructor that the > >change is needed for. > > > Technically we feed valid GIMPLE to the expander so we indeed > > shouldn't ICE.=C2=A0 So I'd like to have the earlier keying on > > vec_vec_init_p match the later assert we run into, thus > > sth like > > > > Index: gcc/expr.c > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > --- gcc/expr.c=C2=A0 (revision 277765) > > +++ gcc/expr.c=C2=A0 (working copy) > > @@ -6809,6 +6809,7 @@ store_constructor (tree exp, rtx target, > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= && n_elts.is_constant (&const_n_elts)) > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= machine_mode emode =3D eltmode; > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 bool vect= or_typed_elts_p =3D false; > > > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= if (CONSTRUCTOR_NELTS (exp) > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 && (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (exp, > > 0)->value)) > > @@ -6819,13 +6820,14 @@ store_constructor (tree exp, rtx target, >>=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 * TYPE_VECTOR_SUBPARTS (etype), > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 n_elts)); > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 emode =3D TYPE_MODE (etype); > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 vector_typed_elts_p =3D true; > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 } >>=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 = icode =3D convert_optab_handler (vec_init_optab, mode, >emode); > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= if (icode !=3D CODE_FOR_nothing) > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 { > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 unsigned int n =3D const_n_elts; > > > > -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 if (emode !=3D eltmode) > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 if (vector_typed_elts_p) > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 n =3D CONSTRUCTOR_NELTS (e= xp); > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 vec_vec_init_p =3D true; > > > > >I've used this, thank you. > > > >=C2=A0 > @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo) > > >=C2=A0 >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (is_a (vinfo)) > > >=C2=A0 > >> >=C2=A0 > the ChangeLog doesn't mention this.=C2=A0 I guess this isn't >necessary > > >=C2=A0 > unless you fail vectorizing late which you shouldn't. > > >=C2=A0 > > > > It's necessary to avoid: > > > > > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 removing stmts = twice for constructors of the form=20 >{_1,_1,_1,_1} >> >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 removing stmts t= hat define ssa names used elsewhere (which >> > previously wasn't a consideration because the scalar_stmts were >stores > > > to memory, not assignments to ssa names) > > > > OK, but the code you are patching is just supposed to remove stores > > from the final scalar stmt seeds - it doesn't expect any loads there > > which is probably what happens.=C2=A0 I'd instead add a > > > >=C2=A0=C2=A0=C2=A0 /* Do not CSE the stmts feeding a CTOR, they might ha= ve uses > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 outside of the vectorized stmts.=C2= =A0 */ > >=C2=A0=C2=A0=C2=A0 if (SLP_INSTANCE_ROOT_STMT (instance)) > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 continue; > > > > before the loop over SLP_TREE_SCALAR_STMTS. >Done. > > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (!subparts.is_cons= tant () || !(subparts.to_constant () >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 =3D=3D CONSTRUCTOR_NELTS >(rhs))) > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 continue; > > > > can be better written as if (maybe_ne (subparts, CONSTRUCTOR_NELTS=20 >(rhs))) >Done. > > > +/* Vectorize the instance root.=C2=A0 Return success or failure. */ > > + > > +void > > > > since it's now void the comment has to be adjusted. >Done. > > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 vect_schedule_slp_instance (node, > >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 instance, bst_map= ); > > > > this now fits in one line. >Done. > > > OK with those changes. >Tested and bootstrapped on aarch64. >OK? Ok.=20 Thanks,=20 Richard.=20 >2019-11-04=C2=A0 Joel Hutton=C2=A0 > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * expr.c (store_construc= tor): Modify to handle single element >vectors. > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * tree-vect-slp.c (vect_= analyze_slp_instance): Add case for >vector constructors. > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 (vect_slp_check_for_cons= tructors): New function. >=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 (vect_slp_analyze_bb_1): = Call new function to check for vector >constructors. > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 (vectorize_slp_instance_= root_stmt): New function. > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 (vect_schedule_slp): Cal= l new function to vectorize root stmt >of vector constructors. > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * tree-vectorizer.h (SLP= _INSTANCE_ROOT_STMT): New field. > >gcc/testsuite/ChangeLog: > >2019-11-04=C2=A0 Joel Hutton=C2=A0 > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * gcc.dg/vect/bb-slp-40.= c: New test. > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * gcc.dg/vect/bb-slp-41.= c: New test.