On Mon, 7 Nov 2022, Andre Vieira (lists) wrote:

>
> On 07/11/2022 11:05, Richard Biener wrote:
> > On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
> >
> >> Sorry for the delay, just been reminded I still had this patch outstanding
> >> from last stage 1. Hopefully since it has been mostly reviewed it could go
> >> in for this stage 1?
> >>
> >> I addressed the comments and gave the slp-part of vectorizable_call some
> >> TLC to make it work.
> >>
> >> I also changed vect_get_slp_defs as I noticed that the call from
> >> vectorizable_call was creating an auto_vec with 'nargs' that might be less
> >> than the number of children in the slp_node
> > how so?  Please fix that in the caller.  It looks like it probably
> > should use vect_nargs instead?
> Well that was my first intuition, but when I looked at it further the variant
> it's calling:
> void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec<vec<tree> >
> *vec_oprnds, unsigned n)
>
> Is actually creating a vector of vectors of slp defs. So for each child of
> slp_node it calls:
> void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)
>
> Which returns a vector of vectorized defs. So vect_nargs would be the right
> size for the inner vec of vec_defs, but the outer should have the same
> number of elements as the original slp_node has children.

No, the inner vector is the vector of vectors for each arg, the outer
vector should be the one for each argument.  Hm, that was a confusing
sentence.

That said, the number of SLP children of a call node should eventually
be the number of arguments of the call (plus masks, etc.).  So it
looks about correct besides the vect_nargs issue?

>
> However, at the call site (vectorizable_call), the operand we pass to
> vect_get_slp_defs 'vec_defs', is initialized before the code-path is
> specialized for slp_node.
> I'll go see if I can change the call site to not
> have to do that, given the continue at the end of the if (slp_node) BB I don't
> think it needs to use vec_defs after it, but it may require some massaging to
> be able to define it separately for each code-path.
>
> >
> >> , so that quick_push might not be
> >> safe as is, so I added the reserve (n) to ensure it's safe to push. I
> >> didn't actually come across any failure because of it though. Happy to
> >> split this into a separate patch if needed.
> >>
> >> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> >> x86_64-pc-linux-gnu.
> >>
> >> OK for trunk?
> > I'll leave final approval to Richard but
> >
> > -     This only needs 1 bit, but occupies the full 16 to ensure a nice
> > +     This only needs 1 bit, but occupies the full 15 to ensure a nice
> >       layout.  */
> >    unsigned int vectorizable : 16;
> >
> > you don't actually change the width of the bitfield.  I would find
> > it more natural to have
> >
> >    signed int type0 : 7;
> >    signed int type0_vtrans : 1;
> >    signed int type1 : 7;
> >    signed int type1_vtrans : 1;
> >
> > with typeN_vtrans specifying how the types transform when vectorized.
> > I would imagine another variant we could need is narrow/widen
> > according to either result or other argument type?  That said,
> > just your flag would then be
> >
> >    signed int type0 : 7;
> >    signed int pad : 1;
> >    signed int type1 : 7;
> >    signed int type1_vect_as_scalar : 1;
> >
> > ?
> That's a cool idea! I'll leave it as a single bit for now like that, if we
> want to re-use it for multiple transformations we will obviously need to
> rename & give it more bits.
>
> >
> >> gcc/ChangeLog:
> >>
> >>         * config/aarch64/aarch64.md (ftrunc2): New pattern.
> >>         * config/aarch64/iterators.md (FRINTNZ): New iterator.
> >>         (frintnz_mode): New int attribute.
> >>         (VSFDF): Make iterator conditional.
> >>         * internal-fn.def (FTRUNC_INT): New IFN.
> >>         * internal-fn.cc (ftrunc_int_direct): New define.
> >>         (expand_ftrunc_int_optab_fn): New custom expander.
> >>         (direct_ftrunc_int_optab_supported_p): New supported_p.
> >>         * internal-fn.h (direct_internal_fn_info): Add new member
> >>         type1_is_scalar_p.
> >>         * match.pd: Add to the existing TRUNC pattern match.
> >>         * optabs.def (ftrunc_int): New entry.
> >>         * stor-layout.h (element_precision): Moved from here...
> >>         * tree.h (element_precision): ... to here.
> >>         (element_type): New declaration.
> >>         * tree.cc (element_type): New function.
> >>         (element_precision): Changed to use element_type.
> >>         * tree-vect-stmts.cc (vectorizable_internal_function): Add
> >>         support for IFNs with different input types.
> >>         (vect_get_scalar_oprnds): New function.
> >>         (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
> >>         * tree-vect-slp.cc (check_scalar_arg_ok): New function.
> >>         (vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
> >>         (vect_get_slp_defs): Ensure vec_oprnds has enough slots to push.
> >>         * doc/md.texi: New entry for ftrunc pattern name.
> >>         * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz
> >>         instructions available.
> >>         * lib/target-supports.exp: Added aarch64_frintnzx_ok target and
> >>         aarch64_frintz options.
> >>         * gcc.target/aarch64/frintnz.c: New test.
> >>         * gcc.target/aarch64/frintnz_vec.c: New test.
> >>         * gcc.target/aarch64/frintnz_slp.c: New test.
> >>
>

--
Richard Biener
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)