On Tue, 29 Oct 2019, Andre Vieira (lists) wrote: > Hi richi, > > > Committed the patch with the suggested change in revision r277569. > > I will look at the follow-up, from what I understood the idea was to add a > field to the dr_vec_info where we keep the "extra-offset" that represents the > position in the DR wrt the current loop/epilogue. So when we advance the DR, > we change that field and not the DR_OFFSET? That way we always keep the > "original state". Yes. That would apply to the adjustment done by prologue peeling as well. Still wonder how gather/scatter get by not having any adjustment ;) Ah, I guess they simply are not affected by peeling since they get their DR_OFFSET & friends computed in the loop (loaded from memory). But DR_REF needs adjustment because it tries to walk def-stmts (repeatedly in both analysis and transform phase). Richard. > Cheers, > Andre > > gcc/ChangeLog: > 2019-10-29 Andre Vieira > > PR 88915 > * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration. > * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter > and make the valueize function pointer also take a void pointer. > * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap > around vn_valueize, to call it without a context. > (process_bb): Use vn_valueize_wrapper instead of vn_valueize. > * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos. > (~_loop_vec_info): Release epilogue_vinfos. > (vect_analyze_loop_costing): Use knowledge of main VF to estimate > number of iterations of epilogue. > (vect_analyze_loop_2): Adapt to analyse main loop for all supported > vector sizes when vect-epilogues-nomask=1. Also keep track of lowest > versioning threshold needed for main loop. > (vect_analyze_loop): Likewise. > (find_in_mapping): New helper function. > (update_epilogue_loop_vinfo): New function. > (vect_transform_loop): When vectorizing epilogues re-use analysis done > on main loop and call update_epilogue_loop_vinfo to update it. > * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert > stmts on loop preheader edge. > (vect_do_peeling): Enable skip-vectors when doing loop versioning if > we decided to vectorize epilogues. Update epilogues NITERS and > construct ADVANCE to update epilogues data references where needed. > * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos. > (vect_do_peeling, vect_update_inits_of_drs, > determine_peel_for_niter, vect_analyze_loop): Add or update declarations. > * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already > created loop_vec_info's for epilogues when available. Otherwise > analyse > epilogue separately. > > > > Cheers, > Andre > > On 29/10/2019 11:48, Richard Biener wrote: > > On Mon, 28 Oct 2019, Andre Vieira (lists) wrote: > > > >> Hi, > >> > >> Reworked according to your comments, see inline for clarification. > >> > >> Is this OK for trunk? > > > > + gimple_seq seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo); > > + while (seq) > > + { > > + stmt_worklist.safe_push (seq); > > + seq = seq->next; > > + } > > > > you're supposed to do to the following, not access the ->next > > implementation detail: > > > > for (gimple_stmt_iterator gsi = gsi_start (seq); !gsi_end_p (gsi); > > gsi_next (&gsi)) > > stmt_worklist.safe_push (gsi_stmt (gsi)); > > > > > > + /* Data references for gather loads and scatter stores do not use > > the > > + updated offset we set using ADVANCE. Instead we have to make > > sure > > the > > + reference in the data references point to the corresponding copy > > of > > + the original in the epilogue. */ > > + if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)) > > + { > > + DR_REF (dr) > > + = simplify_replace_tree (DR_REF (dr), NULL_TREE, NULL_TREE, > > + &find_in_mapping, &mapping); > > + DR_BASE_ADDRESS (dr) > > + = simplify_replace_tree (DR_BASE_ADDRESS (dr), NULL_TREE, > > NULL_TREE, > > + &find_in_mapping, &mapping); > > + } > > > > Hmm. So for other DRs we account for the previous vector loop > > by adjusting DR_OFFSET? But STMT_VINFO_GATHER_SCATTER_P ends up > > using (unconditionally) DR_REF here? In that case it seems > > best to adjust DR_REF only but NULL out DR_BASE_ADDRESS and > > DR_OFFSET? I wonder how prologue peeling deals with > > STMT_VINFO_GATHER_SCATTER_P ... I see the caller of > > vect_update_init_of_dr there does nothing for STMT_VINFO_GATHER_SCATTER_P. > > > > I wonder if (as followup to not delay this further) we can > > "offload" all the DR adjustment by storing ADVANCE in dr_vec_info > > and accounting for that when we create the dataref pointers in > > vectorizable_load/store? That way we could avoid saving/restoring > > DR_OFFSET as well. > > > > So, the patch is OK with the sequence iteration fixed. I think > > sorting out the above can be done as followup. > > > > Thanks, > > Richard. > > > >> gcc/ChangeLog: > >> 2019-10-28 Andre Vieira > >> > >> PR 88915 > >> * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration. > >> * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter > >> and make the valueize function pointer also take a void pointer. > >> * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap > >> around vn_valueize, to call it without a context. > >> (process_bb): Use vn_valueize_wrapper instead of vn_valueize. > >> * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos. > >> (~_loop_vec_info): Release epilogue_vinfos. > >> (vect_analyze_loop_costing): Use knowledge of main VF to estimate > >> number of iterations of epilogue. > >> (vect_analyze_loop_2): Adapt to analyse main loop for all supported > >> vector sizes when vect-epilogues-nomask=1. Also keep track of lowest > >> versioning threshold needed for main loop. > >> (vect_analyze_loop): Likewise. > >> (find_in_mapping): New helper function. > >> (update_epilogue_loop_vinfo): New function. > >> (vect_transform_loop): When vectorizing epilogues re-use analysis done > >> on main loop and call update_epilogue_loop_vinfo to update it. > >> * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert > >> stmts on loop preheader edge. > >> (vect_do_peeling): Enable skip-vectors when doing loop versioning if > >> we decided to vectorize epilogues. Update epilogues NITERS and > >> construct ADVANCE to update epilogues data references where needed. > >> * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos. > >> (vect_do_peeling, vect_update_inits_of_drs, > >> determine_peel_for_niter, vect_analyze_loop): Add or update > >> declarations. > >> * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already > >> created loop_vec_info's for epilogues when available. Otherwise > >> analyse > >> epilogue separately. > >> > >> > >> > >> Cheers, > >> Andre > >> > >> On 28/10/2019 14:16, Richard Biener wrote: > >>> On Fri, 25 Oct 2019, Andre Vieira (lists) wrote: > >>> > >>>> Hi, > >>>> > >>>> This is the reworked patch after your comments. > >>>> > >>>> I have moved the epilogue check into the analysis form disguised under > >>>> '!epilogue_vinfos.is_empty ()'. This because I realized that I am doing > >>>> the > >>>> "lowest threshold" check there. > >>>> > >>>> The only place where we may reject an epilogue_vinfo is when we know the > >>>> number of scalar iterations and we realize the number of iterations left > >>>> after > >>>> the main loop are not enough to enter the vectorized epilogue so we > >>>> optimize > >>>> away that code-gen. The only way we know this to be true is if the > >>>> number > >>>> of > >>>> scalar iterations are known and the peeling for alignment is known. So we > >>>> know > >>>> we will enter the main loop regardless, so whether the threshold we use > >>>> is > >>>> for > >>>> a lower VF or not it shouldn't matter as much, I would even like to think > >>>> that > >>>> check isn't done, but I am not sure... Might be worth checking as an > >>>> optimization. > >>>> > >>>> > >>>> Is this OK for trunk? > >>> > >>> + for (epilogue_phi_gsi = gsi_start_phis (epilogue_bbs[i]); > >>> + !gsi_end_p (epilogue_phi_gsi); gsi_next (&epilogue_phi_gsi)) > >>> + { > >>> .. > >>> + if (STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo)) > >>> + pattern_worklist.safe_push (stmt_vinfo); > >>> + > >>> + related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo); > >>> + while (related_vinfo && related_vinfo != stmt_vinfo) > >>> + { > >>> > >>> I think PHIs cannot have patterns. You can assert > >>> that STMT_VINFO_RELATED_STMT is NULL I think. > >> > >> Done. > >>> > >>> + related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo); > >>> + while (related_vinfo && related_vinfo != stmt_vinfo) > >>> + { > >>> + related_worklist.safe_push (related_vinfo); > >>> + /* Set BB such that the assert in > >>> + 'get_initial_def_for_reduction' is able to determine that > >>> + the BB of the related stmt is inside this loop. */ > >>> + gimple_set_bb (STMT_VINFO_STMT (related_vinfo), > >>> + gimple_bb (new_stmt)); > >>> + related_vinfo = STMT_VINFO_RELATED_STMT (related_vinfo); > >>> + } > >>> > >>> do we really keep references to "nested" patterns? Thus, do you > >>> need this loop? > >> > >> Changed and added asserts. They didn't trigger so I suppose you are right, > >> I > >> didn't know at the time whether it was possible, so I just operated on the > >> side of caution. Can remove the asserts and so on if you want. > >>> > >>> + /* The PATTERN_DEF_SEQs in the epilogue were constructed using the > >>> + original main loop and thus need to be updated to refer to the > >>> cloned > >>> + variables used in the epilogue. */ > >>> + for (unsigned i = 0; i < pattern_worklist.length (); ++i) > >>> + { > >>> ... > >>> + op = simplify_replace_tree (op, NULL_TREE, NULL_TREE, > >>> + &find_in_mapping, &mapping); > >>> + gimple_set_op (seq, j, op); > >>> > >>> you do this for the pattern-def seq but not for the related one. > >>> I guess you ran into this for COND_EXPR conditions. I wondered > >>> to use a shared worklist for both the def-seq and the main pattern > >>> stmt or at least to split out the replacement so you can share it. > >> > >> I think that was it yeah, reworked it now to use the same list. Less code, > >> thanks! > >>> > >>> + /* Data references for gather loads and scatter stores do not use > >>> the > >>> + updated offset we set using ADVANCE. Instead we have to make > >>> sure the > >>> + reference in the data references point to the corresponding copy > >>> of > >>> + the original in the epilogue. */ > >>> + if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)) > >>> + { > >>> + int j; > >>> + if (TREE_CODE (DR_REF (dr)) == MEM_REF) > >>> + j = 0; > >>> + else if (TREE_CODE (DR_REF (dr)) == ARRAY_REF) > >>> + j = 1; > >>> + else > >>> + gcc_unreachable (); > >>> + > >>> + if (tree *new_op = mapping.get (TREE_OPERAND (DR_REF (dr), j))) > >>> + { > >>> + DR_REF (dr) = unshare_expr (DR_REF (dr)); > >>> + TREE_OPERAND (DR_REF (dr), j) = *new_op; > >>> + } > >>> > >>> huh, do you really only ever see MEM_REF or ARRAY_REF here? > >>> I would guess using simplify_replace_tree is safer. > >>> There's also DR_BASE_ADDRESS - we seem to leave the DRs partially > >>> updated, is that correct? > >> > >> Yeah can use simplify_replace_tree indeed. And I have changed it so it > >> updates DR_BASE_ADDRESS. I think DR_BASE_ADDRESS never actually changed in > >> the way we use data_references... Either way, replacing them if they do > >> change > >> is cleaner and more future proof. > >>> > >>> Otherwise looks OK to me. > >>> > >>> Thanks, > >>> Richard. > >>> > >>> > >>>> gcc/ChangeLog: > >>>> 2019-10-25 Andre Vieira > >>>> > >>>> PR 88915 > >>>> * tree-ssa-loop-niter.h (simplify_replace_tree): Change > >>>> declaration. > >>>> * tree-ssa-loop-niter.c (simplify_replace_tree): Add context > >>>> parameter > >>>> and make the valueize function pointer also take a void pointer. > >>>> * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap > >>>> around vn_valueize, to call it without a context. > >>>> (process_bb): Use vn_valueize_wrapper instead of vn_valueize. > >>>> * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos. > >>>> (~_loop_vec_info): Release epilogue_vinfos. > >>>> (vect_analyze_loop_costing): Use knowledge of main VF to estimate > >>>> number of iterations of epilogue. > >>>> (vect_analyze_loop_2): Adapt to analyse main loop for all supported > >>>> vector sizes when vect-epilogues-nomask=1. Also keep track of > >>>> lowest > >>>> versioning threshold needed for main loop. > >>>> (vect_analyze_loop): Likewise. > >>>> (find_in_mapping): New helper function. > >>>> (update_epilogue_loop_vinfo): New function. > >>>> (vect_transform_loop): When vectorizing epilogues re-use analysis > >>>> done > >>>> on main loop and call update_epilogue_loop_vinfo to update it. > >>>> * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer > >>>> insert > >>>> stmts on loop preheader edge. > >>>> (vect_do_peeling): Enable skip-vectors when doing loop versioning > >>>> if > >>>> we decided to vectorize epilogues. Update epilogues NITERS and > >>>> construct ADVANCE to update epilogues data references where needed. > >>>> * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos. > >>>> (vect_do_peeling, vect_update_inits_of_drs, > >>>> determine_peel_for_niter, vect_analyze_loop): Add or update > >>>> declarations. > >>>> * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use > >>>> already > >>>> created loop_vec_info's for epilogues when available. Otherwise > >>>> analyse > >>>> epilogue separately. > >>>> > >>>> > >>>> > >>>> Cheers, > >>>> Andre > >>>> > >>> > >> > >> > > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)