Hi, Reworked according to your comments, see inline for clarification. Is this OK for trunk? gcc/ChangeLog: 2019-10-28 Andre Vieira PR 88915 * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration. * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter and make the valueize function pointer also take a void pointer. * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap around vn_valueize, to call it without a context. (process_bb): Use vn_valueize_wrapper instead of vn_valueize. * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos. (~_loop_vec_info): Release epilogue_vinfos. (vect_analyze_loop_costing): Use knowledge of main VF to estimate number of iterations of epilogue. (vect_analyze_loop_2): Adapt to analyse main loop for all supported vector sizes when vect-epilogues-nomask=1. Also keep track of lowest versioning threshold needed for main loop. (vect_analyze_loop): Likewise. (find_in_mapping): New helper function. (update_epilogue_loop_vinfo): New function. (vect_transform_loop): When vectorizing epilogues re-use analysis done on main loop and call update_epilogue_loop_vinfo to update it. * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert stmts on loop preheader edge. (vect_do_peeling): Enable skip-vectors when doing loop versioning if we decided to vectorize epilogues. Update epilogues NITERS and construct ADVANCE to update epilogues data references where needed. * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos. (vect_do_peeling, vect_update_inits_of_drs, determine_peel_for_niter, vect_analyze_loop): Add or update declarations. * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already created loop_vec_info's for epilogues when available. Otherwise analyse epilogue separately. Cheers, Andre On 28/10/2019 14:16, Richard Biener wrote: > On Fri, 25 Oct 2019, Andre Vieira (lists) wrote: > >> Hi, >> >> This is the reworked patch after your comments. >> >> I have moved the epilogue check into the analysis form disguised under >> '!epilogue_vinfos.is_empty ()'. This because I realized that I am doing the >> "lowest threshold" check there. >> >> The only place where we may reject an epilogue_vinfo is when we know the >> number of scalar iterations and we realize the number of iterations left after >> the main loop are not enough to enter the vectorized epilogue so we optimize >> away that code-gen. The only way we know this to be true is if the number of >> scalar iterations are known and the peeling for alignment is known. So we know >> we will enter the main loop regardless, so whether the threshold we use is for >> a lower VF or not it shouldn't matter as much, I would even like to think that >> check isn't done, but I am not sure... Might be worth checking as an >> optimization. >> >> >> Is this OK for trunk? > > + for (epilogue_phi_gsi = gsi_start_phis (epilogue_bbs[i]); > + !gsi_end_p (epilogue_phi_gsi); gsi_next (&epilogue_phi_gsi)) > + { > .. > + if (STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo)) > + pattern_worklist.safe_push (stmt_vinfo); > + > + related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo); > + while (related_vinfo && related_vinfo != stmt_vinfo) > + { > > I think PHIs cannot have patterns. You can assert > that STMT_VINFO_RELATED_STMT is NULL I think. Done. > > + related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo); > + while (related_vinfo && related_vinfo != stmt_vinfo) > + { > + related_worklist.safe_push (related_vinfo); > + /* Set BB such that the assert in > + 'get_initial_def_for_reduction' is able to determine that > + the BB of the related stmt is inside this loop. */ > + gimple_set_bb (STMT_VINFO_STMT (related_vinfo), > + gimple_bb (new_stmt)); > + related_vinfo = STMT_VINFO_RELATED_STMT (related_vinfo); > + } > > do we really keep references to "nested" patterns? Thus, do you > need this loop? Changed and added asserts. They didn't trigger so I suppose you are right, I didn't know at the time whether it was possible, so I just operated on the side of caution. Can remove the asserts and so on if you want. > > + /* The PATTERN_DEF_SEQs in the epilogue were constructed using the > + original main loop and thus need to be updated to refer to the > cloned > + variables used in the epilogue. */ > + for (unsigned i = 0; i < pattern_worklist.length (); ++i) > + { > ... > + op = simplify_replace_tree (op, NULL_TREE, NULL_TREE, > + &find_in_mapping, &mapping); > + gimple_set_op (seq, j, op); > > you do this for the pattern-def seq but not for the related one. > I guess you ran into this for COND_EXPR conditions. I wondered > to use a shared worklist for both the def-seq and the main pattern > stmt or at least to split out the replacement so you can share it. I think that was it yeah, reworked it now to use the same list. Less code, thanks! > > + /* Data references for gather loads and scatter stores do not use > the > + updated offset we set using ADVANCE. Instead we have to make > sure the > + reference in the data references point to the corresponding copy > of > + the original in the epilogue. */ > + if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)) > + { > + int j; > + if (TREE_CODE (DR_REF (dr)) == MEM_REF) > + j = 0; > + else if (TREE_CODE (DR_REF (dr)) == ARRAY_REF) > + j = 1; > + else > + gcc_unreachable (); > + > + if (tree *new_op = mapping.get (TREE_OPERAND (DR_REF (dr), j))) > + { > + DR_REF (dr) = unshare_expr (DR_REF (dr)); > + TREE_OPERAND (DR_REF (dr), j) = *new_op; > + } > > huh, do you really only ever see MEM_REF or ARRAY_REF here? > I would guess using simplify_replace_tree is safer. > There's also DR_BASE_ADDRESS - we seem to leave the DRs partially > updated, is that correct? Yeah can use simplify_replace_tree indeed. And I have changed it so it updates DR_BASE_ADDRESS. I think DR_BASE_ADDRESS never actually changed in the way we use data_references... Either way, replacing them if they do change is cleaner and more future proof. > > Otherwise looks OK to me. > > Thanks, > Richard. > > >> gcc/ChangeLog: >> 2019-10-25 Andre Vieira >> >> PR 88915 >> * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration. >> * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter >> and make the valueize function pointer also take a void pointer. >> * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap >> around vn_valueize, to call it without a context. >> (process_bb): Use vn_valueize_wrapper instead of vn_valueize. >> * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos. >> (~_loop_vec_info): Release epilogue_vinfos. >> (vect_analyze_loop_costing): Use knowledge of main VF to estimate >> number of iterations of epilogue. >> (vect_analyze_loop_2): Adapt to analyse main loop for all supported >> vector sizes when vect-epilogues-nomask=1. Also keep track of lowest >> versioning threshold needed for main loop. >> (vect_analyze_loop): Likewise. >> (find_in_mapping): New helper function. >> (update_epilogue_loop_vinfo): New function. >> (vect_transform_loop): When vectorizing epilogues re-use analysis done >> on main loop and call update_epilogue_loop_vinfo to update it. >> * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert >> stmts on loop preheader edge. >> (vect_do_peeling): Enable skip-vectors when doing loop versioning if >> we decided to vectorize epilogues. Update epilogues NITERS and >> construct ADVANCE to update epilogues data references where needed. >> * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos. >> (vect_do_peeling, vect_update_inits_of_drs, >> determine_peel_for_niter, vect_analyze_loop): Add or update declarations. >> * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already >> created loop_vec_info's for epilogues when available. Otherwise >> analyse >> epilogue separately. >> >> >> >> Cheers, >> Andre >> >