Hi richi, Committed the patch with the suggested change in revision r277569. I will look at the follow-up, from what I understood the idea was to add a field to the dr_vec_info where we keep the "extra-offset" that represents the position in the DR wrt the current loop/epilogue. So when we advance the DR, we change that field and not the DR_OFFSET? That way we always keep the "original state". Cheers, Andre gcc/ChangeLog: 2019-10-29 Andre Vieira PR 88915 * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration. * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter and make the valueize function pointer also take a void pointer. * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap around vn_valueize, to call it without a context. (process_bb): Use vn_valueize_wrapper instead of vn_valueize. * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos. (~_loop_vec_info): Release epilogue_vinfos. (vect_analyze_loop_costing): Use knowledge of main VF to estimate number of iterations of epilogue. (vect_analyze_loop_2): Adapt to analyse main loop for all supported vector sizes when vect-epilogues-nomask=1. Also keep track of lowest versioning threshold needed for main loop. (vect_analyze_loop): Likewise. (find_in_mapping): New helper function. (update_epilogue_loop_vinfo): New function. (vect_transform_loop): When vectorizing epilogues re-use analysis done on main loop and call update_epilogue_loop_vinfo to update it. * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert stmts on loop preheader edge. (vect_do_peeling): Enable skip-vectors when doing loop versioning if we decided to vectorize epilogues. Update epilogues NITERS and construct ADVANCE to update epilogues data references where needed. * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos. (vect_do_peeling, vect_update_inits_of_drs, determine_peel_for_niter, vect_analyze_loop): Add or update declarations. * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already created loop_vec_info's for epilogues when available. Otherwise analyse epilogue separately. Cheers, Andre On 29/10/2019 11:48, Richard Biener wrote: > On Mon, 28 Oct 2019, Andre Vieira (lists) wrote: > >> Hi, >> >> Reworked according to your comments, see inline for clarification. >> >> Is this OK for trunk? > > + gimple_seq seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo); > + while (seq) > + { > + stmt_worklist.safe_push (seq); > + seq = seq->next; > + } > > you're supposed to do to the following, not access the ->next > implementation detail: > > for (gimple_stmt_iterator gsi = gsi_start (seq); !gsi_end_p (gsi); > gsi_next (&gsi)) > stmt_worklist.safe_push (gsi_stmt (gsi)); > > > + /* Data references for gather loads and scatter stores do not use > the > + updated offset we set using ADVANCE. Instead we have to make > sure > the > + reference in the data references point to the corresponding copy > of > + the original in the epilogue. */ > + if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)) > + { > + DR_REF (dr) > + = simplify_replace_tree (DR_REF (dr), NULL_TREE, NULL_TREE, > + &find_in_mapping, &mapping); > + DR_BASE_ADDRESS (dr) > + = simplify_replace_tree (DR_BASE_ADDRESS (dr), NULL_TREE, > NULL_TREE, > + &find_in_mapping, &mapping); > + } > > Hmm. So for other DRs we account for the previous vector loop > by adjusting DR_OFFSET? But STMT_VINFO_GATHER_SCATTER_P ends up > using (unconditionally) DR_REF here? In that case it seems > best to adjust DR_REF only but NULL out DR_BASE_ADDRESS and > DR_OFFSET? I wonder how prologue peeling deals with > STMT_VINFO_GATHER_SCATTER_P ... I see the caller of > vect_update_init_of_dr there does nothing for STMT_VINFO_GATHER_SCATTER_P. > > I wonder if (as followup to not delay this further) we can > "offload" all the DR adjustment by storing ADVANCE in dr_vec_info > and accounting for that when we create the dataref pointers in > vectorizable_load/store? That way we could avoid saving/restoring > DR_OFFSET as well. > > So, the patch is OK with the sequence iteration fixed. I think > sorting out the above can be done as followup. > > Thanks, > Richard. > >> gcc/ChangeLog: >> 2019-10-28 Andre Vieira >> >> PR 88915 >> * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration. >> * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter >> and make the valueize function pointer also take a void pointer. >> * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap >> around vn_valueize, to call it without a context. >> (process_bb): Use vn_valueize_wrapper instead of vn_valueize. >> * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos. >> (~_loop_vec_info): Release epilogue_vinfos. >> (vect_analyze_loop_costing): Use knowledge of main VF to estimate >> number of iterations of epilogue. >> (vect_analyze_loop_2): Adapt to analyse main loop for all supported >> vector sizes when vect-epilogues-nomask=1. Also keep track of lowest >> versioning threshold needed for main loop. >> (vect_analyze_loop): Likewise. >> (find_in_mapping): New helper function. >> (update_epilogue_loop_vinfo): New function. >> (vect_transform_loop): When vectorizing epilogues re-use analysis done >> on main loop and call update_epilogue_loop_vinfo to update it. >> * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert >> stmts on loop preheader edge. >> (vect_do_peeling): Enable skip-vectors when doing loop versioning if >> we decided to vectorize epilogues. Update epilogues NITERS and >> construct ADVANCE to update epilogues data references where needed. >> * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos. >> (vect_do_peeling, vect_update_inits_of_drs, >> determine_peel_for_niter, vect_analyze_loop): Add or update declarations. >> * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already >> created loop_vec_info's for epilogues when available. Otherwise >> analyse >> epilogue separately. >> >> >> >> Cheers, >> Andre >> >> On 28/10/2019 14:16, Richard Biener wrote: >>> On Fri, 25 Oct 2019, Andre Vieira (lists) wrote: >>> >>>> Hi, >>>> >>>> This is the reworked patch after your comments. >>>> >>>> I have moved the epilogue check into the analysis form disguised under >>>> '!epilogue_vinfos.is_empty ()'. This because I realized that I am doing >>>> the >>>> "lowest threshold" check there. >>>> >>>> The only place where we may reject an epilogue_vinfo is when we know the >>>> number of scalar iterations and we realize the number of iterations left >>>> after >>>> the main loop are not enough to enter the vectorized epilogue so we >>>> optimize >>>> away that code-gen. The only way we know this to be true is if the number >>>> of >>>> scalar iterations are known and the peeling for alignment is known. So we >>>> know >>>> we will enter the main loop regardless, so whether the threshold we use is >>>> for >>>> a lower VF or not it shouldn't matter as much, I would even like to think >>>> that >>>> check isn't done, but I am not sure... Might be worth checking as an >>>> optimization. >>>> >>>> >>>> Is this OK for trunk? >>> >>> + for (epilogue_phi_gsi = gsi_start_phis (epilogue_bbs[i]); >>> + !gsi_end_p (epilogue_phi_gsi); gsi_next (&epilogue_phi_gsi)) >>> + { >>> .. >>> + if (STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo)) >>> + pattern_worklist.safe_push (stmt_vinfo); >>> + >>> + related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo); >>> + while (related_vinfo && related_vinfo != stmt_vinfo) >>> + { >>> >>> I think PHIs cannot have patterns. You can assert >>> that STMT_VINFO_RELATED_STMT is NULL I think. >> >> Done. >>> >>> + related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo); >>> + while (related_vinfo && related_vinfo != stmt_vinfo) >>> + { >>> + related_worklist.safe_push (related_vinfo); >>> + /* Set BB such that the assert in >>> + 'get_initial_def_for_reduction' is able to determine that >>> + the BB of the related stmt is inside this loop. */ >>> + gimple_set_bb (STMT_VINFO_STMT (related_vinfo), >>> + gimple_bb (new_stmt)); >>> + related_vinfo = STMT_VINFO_RELATED_STMT (related_vinfo); >>> + } >>> >>> do we really keep references to "nested" patterns? Thus, do you >>> need this loop? >> >> Changed and added asserts. They didn't trigger so I suppose you are right, I >> didn't know at the time whether it was possible, so I just operated on the >> side of caution. Can remove the asserts and so on if you want. >>> >>> + /* The PATTERN_DEF_SEQs in the epilogue were constructed using the >>> + original main loop and thus need to be updated to refer to the >>> cloned >>> + variables used in the epilogue. */ >>> + for (unsigned i = 0; i < pattern_worklist.length (); ++i) >>> + { >>> ... >>> + op = simplify_replace_tree (op, NULL_TREE, NULL_TREE, >>> + &find_in_mapping, &mapping); >>> + gimple_set_op (seq, j, op); >>> >>> you do this for the pattern-def seq but not for the related one. >>> I guess you ran into this for COND_EXPR conditions. I wondered >>> to use a shared worklist for both the def-seq and the main pattern >>> stmt or at least to split out the replacement so you can share it. >> >> I think that was it yeah, reworked it now to use the same list. Less code, >> thanks! >>> >>> + /* Data references for gather loads and scatter stores do not use >>> the >>> + updated offset we set using ADVANCE. Instead we have to make >>> sure the >>> + reference in the data references point to the corresponding copy >>> of >>> + the original in the epilogue. */ >>> + if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)) >>> + { >>> + int j; >>> + if (TREE_CODE (DR_REF (dr)) == MEM_REF) >>> + j = 0; >>> + else if (TREE_CODE (DR_REF (dr)) == ARRAY_REF) >>> + j = 1; >>> + else >>> + gcc_unreachable (); >>> + >>> + if (tree *new_op = mapping.get (TREE_OPERAND (DR_REF (dr), j))) >>> + { >>> + DR_REF (dr) = unshare_expr (DR_REF (dr)); >>> + TREE_OPERAND (DR_REF (dr), j) = *new_op; >>> + } >>> >>> huh, do you really only ever see MEM_REF or ARRAY_REF here? >>> I would guess using simplify_replace_tree is safer. >>> There's also DR_BASE_ADDRESS - we seem to leave the DRs partially >>> updated, is that correct? >> >> Yeah can use simplify_replace_tree indeed. And I have changed it so it >> updates DR_BASE_ADDRESS. I think DR_BASE_ADDRESS never actually changed in >> the way we use data_references... Either way, replacing them if they do change >> is cleaner and more future proof. >>> >>> Otherwise looks OK to me. >>> >>> Thanks, >>> Richard. >>> >>> >>>> gcc/ChangeLog: >>>> 2019-10-25 Andre Vieira >>>> >>>> PR 88915 >>>> * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration. >>>> * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter >>>> and make the valueize function pointer also take a void pointer. >>>> * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap >>>> around vn_valueize, to call it without a context. >>>> (process_bb): Use vn_valueize_wrapper instead of vn_valueize. >>>> * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos. >>>> (~_loop_vec_info): Release epilogue_vinfos. >>>> (vect_analyze_loop_costing): Use knowledge of main VF to estimate >>>> number of iterations of epilogue. >>>> (vect_analyze_loop_2): Adapt to analyse main loop for all supported >>>> vector sizes when vect-epilogues-nomask=1. Also keep track of lowest >>>> versioning threshold needed for main loop. >>>> (vect_analyze_loop): Likewise. >>>> (find_in_mapping): New helper function. >>>> (update_epilogue_loop_vinfo): New function. >>>> (vect_transform_loop): When vectorizing epilogues re-use analysis done >>>> on main loop and call update_epilogue_loop_vinfo to update it. >>>> * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert >>>> stmts on loop preheader edge. >>>> (vect_do_peeling): Enable skip-vectors when doing loop versioning if >>>> we decided to vectorize epilogues. Update epilogues NITERS and >>>> construct ADVANCE to update epilogues data references where needed. >>>> * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos. >>>> (vect_do_peeling, vect_update_inits_of_drs, >>>> determine_peel_for_niter, vect_analyze_loop): Add or update >>>> declarations. >>>> * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already >>>> created loop_vec_info's for epilogues when available. Otherwise >>>> analyse >>>> epilogue separately. >>>> >>>> >>>> >>>> Cheers, >>>> Andre >>>> >>> >> >> >