References: <4C34E6CF.4030608@redhat.com> <4C3607AD.50406@redhat.com> <4C3630BD.3040807@redhat.com>
Date: Mon, 09 Aug 2010 11:01:00 -0000
Subject: Re: [patch] Support vectorization of min/max location pattern - take 2
From: Richard Guenther
To: Ira Rosen
Cc: gcc-patches@gcc.gnu.org, Richard Henderson

On Mon, Aug 9, 2010 at 12:53 PM, Ira Rosen wrote:
>
>
> Richard Guenther wrote on 09/08/2010 12:50:14 PM:
>> > I implemented VEC_COND_EXPR extension in the attached patch.
>> >
>> > For reduction epilogue I defined new tree codes
>> > REDUC_MIN/MAX_FIRST/LAST_LOC_EXPR.
>>
>> Why do you need new tree codes here?
>
> After vector loop we have two vectors one with four minimums and the second
> with four corresponding array indexes. The extraction of the correct index
> out of four can be done differently on each platform (including problematic
> vector comparisons).

So the tree code is just to tie those two operations together?

>> They btw need
>> documentation - just stating the new operand is a vector isn't
>> very informative.  They need documentation in generic.texi.
>
> Sorry about that, I'll add documentation for both.

Thanks.

>>
>> Likewise the new RTX codes (what are they for??)
>
> Probably there is a better way to do that, but I needed to map new vector
> comparison instructions that compare floats and return ints.

So you just need this at expansion time then and the RTXen will
never appear in RTL code?  Why not use a target hook for expanding
those comparisons then?

Btw, my GSoC student implemented lowering of generic vector
comparisons resulting in a mask in tree-vect-generic.c using a
target hook that eventually uses target specific builtins.  I
attached the latest patch for that.

>> need documentation
>> in rtl.texi.
>>
>> Btw, you still don't adjust if-conversion to fold the COND_EXPR
>> it generates - that would generate the MIN/MAX expressions
>> directly and you wouldn't have to pattern match the COND_EXPR.
>
> I don't see how it can help to avoid pattern matching. We will still need
> to match MIN/MAX's arguments with the COND_EXPR arguments.

True, but you need to match MIN/MAX instead.  Well, my point is
that if-convert shouldn't create a COND_EXPR in that case.

Richard.

> Thanks,
> Ira
>
>>
>> Richard.
>>
>> > Bootstrapped and tested on powerpc64-suse-linux.
>> > OK for mainline?
>> >
>> > Thanks,
>> > Ira
>> >
>> > ChangeLog:
>> >
>> >        * tree-pretty-print.c (dump_generic_node): Handle new codes.
>> >        * optabs.c (optab_for_tree_code): Likewise.
>> >        (init_optabs): Initialize new optabs.
>> >        (get_vcond_icode): Handle vector condition with different types
>> >        of comparison and then/else operands.
>> >        (expand_vec_cond_expr_p, expand_vec_cond_expr): Likewise.
>> >        (get_vec_reduc_minloc_expr_icode): New function.
>> >        (expand_vec_reduc_minloc_expr): New function.
>> >        * optabs.h (enum convert_optab_index): Add new optabs.
>> >        (vcondc_optab): Define.
>> >        (vcondcu_optab, reduc_min_first_loc_optab, reduc_min_last_loc_optab,
>> >        reduc_max_last_loc_optab): Likewise.
>> >        (expand_vec_cond_expr_p): Add arguments.
>> >        (get_vec_reduc_minloc_expr_code): Declare.
>> >        (expand_vec_reduc_minloc_expr): Declare.
>> >        * genopinit.c (optabs): Add vcondc_optab, vcondcu_optab,
>> >        reduc_min_first_loc_optab, reduc_min_last_loc_optab,
>> >        reduc_max_last_loc_optab.
>> >        * rtl.def (GEF): New rtx.
>> >        (GTF, LEF, LTF, EQF, NEQF): Likewise.
>> >        * jump.c (reverse_condition): Handle new rtx.
>> >        (swap_condition): Likewise.
>> >        * expr.c (expand_expr_real_2): Expand new reduction tree codes.
>> >        * gimple-pretty-print.c (dump_binary_rhs): Print new codes.
>> >        * tree-vectorizer.h (enum vect_compound_pattern): New.
>> >        (struct _stmt_vec_info): Add new field compound_pattern. Add macro
>> >        to access it.
>> >        (is_pattern_stmt_p): Return true for compound pattern.
>> >        (get_minloc_reduc_epilogue_code): New.
>> >        (vectorizable_condition): Add arguments.
>> >        (vect_recog_compound_func_ptr): New function-pointer type.
>> >        (NUM_COMPOUND_PATTERNS): New.
>> >        (vect_compound_pattern_recog): Declare.
>> >        * tree-vect-loop.c (vect_determine_vectorization_factor): Fix assert
>> >        for compound patterns.
>> >        (vect_analyze_scalar_cycles_1): Fix typo. Detect compound reduction
>> >        patterns. Update comment.
>> >        (vect_analyze_scalar_cycles): Update comment.
>> >        (destroy_loop_vec_info): Update def stmt for the original pattern
>> >        statement.
>> >        (vect_is_simple_reduction_1): Skip compound pattern statements in
>> >        uses check. Add spaces. Skip commutativity and type checks for
>> >        minimum location statement. Fix printings.
>> >        (vect_model_reduction_cost): Add min/max location pattern cost
>> >        computation.
>> >        (vect_create_epilog_for_reduction): Don't retrieve the original
>> >        statement for compound pattern. Fix comment accordingly. Get tree
>> >        code for reduction epilogue of min/max location computation
>> >        according to the comparison operation. Don't expect to find an
>> >        exit phi node for min/max statement.
>> >        (vectorizable_reduction): Skip check for uses in loop for compound
>> >        patterns. Don't retrieve the original statement for compound pattern.
>> >        Call vectorizable_condition () with additional parameters. Skip
>> >        reduction code check for compound patterns. Prepare operands for
>> >        min/max location statement vectorization and pass them to
>> >        vectorizable_condition ().
>> >        (vectorizable_live_operation): Return TRUE for compound patterns.
>> >        * tree.def (REDUC_MIN_FIRST_LOC_EXPR): Define.
>> >        (REDUC_MIN_LAST_LOC_EXPR, REDUC_MAX_FIRST_LOC_EXPR,
>> >        REDUC_MAX_LAST_LOC_EXPR): Likewise.
>> >        * cfgexpand.c (expand_debug_expr): Handle new tree codes.
>> >        * tree-vect-patterns.c (vect_recog_min_max_loc_pattern): Declare.
>> >        (vect_recog_compound_func_ptrs): Likewise.
>> >        (vect_recog_min_max_loc_pattern): New function.
>> >        (vect_compound_pattern_recog): Likewise.
>> >        * tree-vect-stmts.c (process_use): Mark compound pattern statements as
>> >        used by reduction.
>> >        (vect_mark_stmts_to_be_vectorized): Allow compound pattern statements
>> >        to be used by reduction.
>> >        (vectorizable_condition): Update comment, add arguments. Skip checks
>> >        irrelevant for compound pattern. Check that if comparison and then/else
>> >        operands are of different types, the size of the types is equal. Check
>> >        that reduction epilogue, if needed, is supported. Prepare operands
>> >        using new arguments.
>> >        (vect_analyze_stmt): Allow nested cycle statements to be used by
>> >        reduction. Call vectorizable_condition () with additional arguments.
>> >        (vect_transform_stmt): Call vectorizable_condition () with additional
>> >        arguments.
>> >        (new_stmt_vec_info): Initialize new fields.
>> >        * tree-inline.c (estimate_operator_cost): Handle new tree codes.
>> >        * tree-vect-generic.c (expand_vector_operations_1): Likewise.
>> >        * tree-cfg.c (verify_gimple_assign_binary): Likewise.
>> >        * config/rs6000/rs6000.c (rs6000_emit_vector_compare_inner): Add
>> >        argument. Handle new rtx.
>> >        (rs6000_emit_vector_compare): Handle the case of result type different
>> >        from the operands, update calls to rs6000_emit_vector_compare_inner ().
>> >        (rs6000_emit_vector_cond_expr): Use new codes in case of different
>> >        types.
>> >        * config/rs6000/altivec.md (UNSPEC_REDUC_MINLOC): New.
>> >        (altivec_gefv4sf): New pattern.
>> >        (altivec_gtfv4sf, altivec_eqfv4sf, reduc_min_first_loc_v4sfv4si,
>> >        reduc_min_last_loc_v4sfv4si, reduc_max_first_loc_v4sfv4si,
>> >        reduc_max_last_loc_v4sfv4si): Likewise.
>> >        * tree-vect-slp.c (vect_get_and_check_slp_defs): Fail for compound
>> >        patterns.
>> >
>> > testsuite/ChangeLog:
>> >
>> >        * gcc.dg/vect/vect.exp: Define how to run tests named fast-math*.c
>> >        * lib/target-supports.exp (check_effective_target_vect_cmp): New.
>> >        * gcc.dg/vect/fast-math-no-pre-minmax-loc-1.c: New test.
>> >        * gcc.dg/vect/fast-math-no-pre-minmax-loc-2.c,
>> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-3.c,
>> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-4.c,
>> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-5.c,
>> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-6.c,
>> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-7.c,
>> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-8.c,
>> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-9.c,
>> >        gcc.dg/vect/fast-math-no-pre-minmax-loc-10.c: Likewise.
>> >
>> >
>> > (See attached file: minloc.txt)
>> >
>> >>
>> >> I can think of 2 portability problems with your current solution:
>> >>
>> >> (1) SSE4.1 would prefer to use BLEND instructions, which perform
>> >>     that entire (X & M) | (Y & ~M) operation in one insn.
>> >>
>> >> (2) The mips C.cond.PS instruction does *not* produce a bitmask
>> >>     like altivec or sse do.  Instead it sets multiple condition
>> >>     codes.  One then uses MOV[TF].PS to merge the elements based
>> >>     on the individual condition codes.  While there's no direct
>> >>     corresponding instruction that will operate on integers, I
>> >>     don't think it would be too difficult to use MOV[TF].G or
>> >>     BC1AND2[FT] instructions to emulate it.  In any case, this
>> >>     is again a case where you don't want to expose any part of
>> >>     the VEC_COND at the gimple level.
>> >>
>> >>
>> >> r~
>
>
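
For readers following the thread, here is a minimal scalar sketch in plain C
of the two ideas discussed above: a float comparison that yields an integer
mask, combined with the (X & M) | (Y & ~M) select, and a "first location"
reduction epilogue over the four lanes.  This is not code from the patch and
does not use any GCC internals.  The names LANES, cmp_lt_mask, select_u32 and
min_first_loc are hypothetical, and the sketch assumes the array length is a
nonzero multiple of four and that no element is a NaN.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4  /* assumed vector width, matching the "four minimums" above */

/* Lane-wise "a < b" on floats, producing an all-ones or all-zeros
   integer mask per lane (a float comparison that returns ints).  */
void cmp_lt_mask(const float *a, const float *b, uint32_t *mask)
{
    for (int i = 0; i < LANES; i++)
        mask[i] = (a[i] < b[i]) ? UINT32_MAX : 0u;
}

/* Lane-wise select written as (x & m) | (y & ~m).  A BLEND-style
   instruction would do this in one operation.  */
void select_u32(const uint32_t *m, const uint32_t *x,
                const uint32_t *y, uint32_t *out)
{
    for (int i = 0; i < LANES; i++)
        out[i] = (x[i] & m[i]) | (y[i] & ~m[i]);
}

/* Hypothetical helper: index of the first minimum of arr[0..n).  */
uint32_t min_first_loc(const float *arr, uint32_t n)
{
    float vmin[LANES];
    uint32_t vidx[LANES], cur[LANES], mask[LANES];

    assert(n >= LANES && n % LANES == 0);

    for (int i = 0; i < LANES; i++) {
        vmin[i] = arr[i];
        vidx[i] = (uint32_t) i;
    }

    /* "Vector loop": each lane tracks its own minimum and its index.
       The strict "<" keeps the earliest index within a lane.  */
    for (uint32_t base = LANES; base < n; base += LANES) {
        const float *chunk = arr + base;
        for (int i = 0; i < LANES; i++)
            cur[i] = base + (uint32_t) i;
        cmp_lt_mask(chunk, vmin, mask);
        select_u32(mask, cur, vidx, vidx);  /* indexes follow the mask */
        for (int i = 0; i < LANES; i++)     /* same select on the floats */
            if (mask[i])
                vmin[i] = chunk[i];
    }

    /* Epilogue: pick the lane holding the overall minimum; on ties
       prefer the smaller index, which gives "first location".  */
    int best = 0;
    for (int i = 1; i < LANES; i++)
        if (vmin[i] < vmin[best]
            || (vmin[i] == vmin[best] && vidx[i] < vidx[best]))
            best = i;
    return vidx[best];
}
```

For the input {3, 1, 4, 1, 5, 9, 0.5, 0.5} with n = 8, min_first_loc returns
6.  A "last location" variant would, under the same assumptions, use "<=" in
the loop and prefer the larger index in the epilogue.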