* Duplicating loops and virtual phis @ 2017-05-12 20:42 Steve Ellcey 2017-05-13 6:18 ` Richard Biener 0 siblings, 1 reply; 8+ messages in thread From: Steve Ellcey @ 2017-05-12 20:42 UTC (permalink / raw) To: gcc [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 2148 bytes --] (Short version of this email, is there a way to recalculate/rebuild virtual phi nodes after modifying the CFG.) I have a question about duplicating loops and virtual phi nodes. I am trying to implement the following optimization as a pass: Transform: for (i = 0; i < n; i++) { A[i] = A[i] + B[i]; C[i] = C[i-1] + D[i]; } Into: if (noalias between A&B, A&C, A&D) for (i = 0; i < 100; i++) A[i] = A[i] + B[i]; for (i = 0; i < 100; i++) C[i] = C[i-1] + D[i]; else for (i = 0; i < 100; i++) { A[i] = A[i] + B[i]; C[i] = C[i-1] + D[i]; } Right now the vectorizer sees that 'C[i] = C[i-1] + D[i];' cannot be vectorized so it gives up and does not vectorize the loop. If we split up the loop into two loops then the vector add with A[i] could be vectorized even if the one with C[i] could not. Currently I can introduce the first 'if' that checks for aliasing by using loop_version() and that seems to work OK. (My actual compare for aliasing is actually just an approximation for now.) Where I am running into problems is with splitting up the single loop under the noalias if condition into two sequential loops (which I then intend to 'thin out' by removing one or the other set of instructions. I am using slpeel_tree_duplicate_loop_to_edge_cfg() for that loop duplication and while I get the CFG I want, the pass ends with verify_ssa failing due to bad virtual/MEM PHI nodes. Perhaps there is a different function that I should use duplicate the loop. a.c: In function âfooâ: a.c:2:5: error: PHI node with wrong VUSE on edge from BB 13 int foo(int *a, int *b, int *c, int *d, int n) ^~~ .MEM_40 = PHI <.MEM_15(D)(13), .MEM_34(9)> expected .MEM_58 a.c:2:5: internal compiler error: verify_ssa failed I have tried to fix up the PHI node by hand using SET_PHI_ARG_DEF but have not had any luck. I was wondering if there was any kind of 'update all the phi nodes' function or just a 'update the virtual phi nodes' function. The non-virtual PHI nodes seem to be OK, it is just the virtual ones that seem wrong after I duplicate the loop into two consecutive loops. Steve Ellcey sellcey@cavium.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Duplicating loops and virtual phis 2017-05-12 20:42 Duplicating loops and virtual phis Steve Ellcey @ 2017-05-13 6:18 ` Richard Biener 2017-05-15 16:56 ` Steve Ellcey 0 siblings, 1 reply; 8+ messages in thread From: Richard Biener @ 2017-05-13 6:18 UTC (permalink / raw) To: sellcey, Steve Ellcey, gcc On May 12, 2017 10:42:34 PM GMT+02:00, Steve Ellcey <sellcey@cavium.com> wrote: >(Short version of this email, is there a way to recalculate/rebuild >virtual >phi nodes after modifying the CFG.) > >I have a question about duplicating loops and virtual phi nodes. >I am trying to implement the following optimization as a pass: > >Transform: > > for (i = 0; i < n; i++) { > A[i] = A[i] + B[i]; > C[i] = C[i-1] + D[i]; > } > >Into: > > if (noalias between A&B, A&C, A&D) > for (i = 0; i < 100; i++) > A[i] = A[i] + B[i]; > for (i = 0; i < 100; i++) > C[i] = C[i-1] + D[i]; > else > for (i = 0; i < 100; i++) { > A[i] = A[i] + B[i]; > C[i] = C[i-1] + D[i]; > } > >Right now the vectorizer sees that 'C[i] = C[i-1] + D[i];' cannot be >vectorized so it gives up and does not vectorize the loop. If we split >up the loop into two loops then the vector add with A[i] could be >vectorized >even if the one with C[i] could not. Loop distribution does this transform but it doesn't know about versioning for unknown dependences. >Currently I can introduce the first 'if' that checks for aliasing by >using loop_version() and that seems to work OK. (My actual compare >for aliasing is actually just an approximation for now.) > >Where I am running into problems is with splitting up the single loop >under the noalias if condition into two sequential loops (which I then >intend to 'thin out' by removing one or the other set of instructions. >I am using slpeel_tree_duplicate_loop_to_edge_cfg() for that loop >duplication >and while I get the CFG I want, the pass ends with verify_ssa failing >due >to bad virtual/MEM PHI nodes. Perhaps there is a different function >that >I should use duplicate the loop. >a.c: In function ���foo���: >a.c:2:5: error: PHI node with wrong VUSE on edge from BB 13 > int foo(int *a, int *b, int *c, int *d, int n) > ^~~ >.MEM_40 = PHI <.MEM_15(D)(13), .MEM_34(9)> >expected .MEM_58 >a.c:2:5: internal compiler error: verify_ssa failed > >I have tried to fix up the PHI node by hand using SET_PHI_ARG_DEF but >have not had any luck. I was wondering if there was any kind of >'update all the phi nodes' function or just a 'update the virtual phi >nodes' function. The non-virtual PHI nodes seem to be OK, it is just >the virtual ones that seem wrong after I duplicate the loop into two >consecutive loops. > >Steve Ellcey >sellcey@cavium.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Duplicating loops and virtual phis 2017-05-13 6:18 ` Richard Biener @ 2017-05-15 16:56 ` Steve Ellcey 2017-05-15 18:32 ` Richard Biener 0 siblings, 1 reply; 8+ messages in thread From: Steve Ellcey @ 2017-05-15 16:56 UTC (permalink / raw) To: Richard Biener, gcc On Sat, 2017-05-13 at 08:18 +0200, Richard Biener wrote: > On May 12, 2017 10:42:34 PM GMT+02:00, Steve Ellcey <sellcey@cavium.c > om> wrote: > > > > (Short version of this email, is there a way to recalculate/rebuild > > virtual > > phi nodes after modifying the CFG.) > > > > I have a question about duplicating loops and virtual phi nodes. > > I am trying to implement the following optimization as a pass: > > > > Transform: > > > >  for (i = 0; i < n; i++) { > > A[i] = A[i] + B[i]; > > C[i] = C[i-1] + D[i]; > >  } > > > > Into: > > > >  if (noalias between A&B, A&C, A&D) > > for (i = 0; i < 100; i++) > > A[i] = A[i] + B[i]; > > for (i = 0; i < 100; i++) > > C[i] = C[i-1] + D[i]; > >  else > > for (i = 0; i < 100; i++) { > > A[i] = A[i] + B[i]; > > C[i] = C[i-1] + D[i]; > > } > > > > Right now the vectorizer sees that 'C[i] = C[i-1] + D[i];' cannot be > > vectorized so it gives up and does not vectorize the loop.  If we split > > up the loop into two loops then the vector add with A[i] could be > > vectorized > > even if the one with C[i] could not. > Loop distribution does this transform but it doesn't know about > versioning for unknown dependences. > Yes, I looked at loop distribution.  But it only works with global arrays and not with pointer arguments where it doesn't know the size of the array being pointed at.  I would like to be able to have it work with pointer arguments.  If I call a function with 2 or more integer pointers, and I have a loop that accesses them with offsets between 0 and N where N is loop invariant then I should have enough information (at runtime) to determine if there are overlapping memory accesses through the pointers and determine whether or not I can distribute the loop. The loop splitting code seemed like a better template since it already knows how to split a loop based on a runtime determined condition. That part seems to be working for me, it is when I try to distribute/duplicate one of those loops (under the unaliased condition) that I am running into the problem with virtual PHIs. Steve Ellcey sellcey@cavium.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Duplicating loops and virtual phis 2017-05-15 16:56 ` Steve Ellcey @ 2017-05-15 18:32 ` Richard Biener 2017-05-17 9:41 ` Bin.Cheng 0 siblings, 1 reply; 8+ messages in thread From: Richard Biener @ 2017-05-15 18:32 UTC (permalink / raw) To: sellcey, Steve Ellcey, gcc On May 15, 2017 6:56:53 PM GMT+02:00, Steve Ellcey <sellcey@cavium.com> wrote: >On Sat, 2017-05-13 at 08:18 +0200, Richard Biener wrote: >> On May 12, 2017 10:42:34 PM GMT+02:00, Steve Ellcey <sellcey@cavium.c >> om> wrote: >> > >> > (Short version of this email, is there a way to recalculate/rebuild >> > virtual >> > phi nodes after modifying the CFG.) >> > >> > I have a question about duplicating loops and virtual phi nodes. >> > I am trying to implement the following optimization as a pass: >> > >> > Transform: >> > >> > for (i = 0; i < n; i++) { >> > A[i] = A[i] + B[i]; >> > C[i] = C[i-1] + D[i]; >> > } >> > >> > Into: >> > >> > if (noalias between A&B, A&C, A&D) >> > for (i = 0; i < 100; i++) >> > A[i] = A[i] + B[i]; >> > for (i = 0; i < 100; i++) >> > C[i] = C[i-1] + D[i]; >> > else >> > for (i = 0; i < 100; i++) { >> > A[i] = A[i] + B[i]; >> > C[i] = C[i-1] + D[i]; >> > } >> > >> > Right now the vectorizer sees that 'C[i] = C[i-1] + D[i];' cannot >be >> > vectorized so it gives up and does not vectorize the loop. If we >split >> > up the loop into two loops then the vector add with A[i] could be >> > vectorized >> > even if the one with C[i] could not. >> Loop distribution does this transform but it doesn't know about >> versioning for unknown dependences. >> > >Yes, I looked at loop distribution. But it only works with global >arrays and not with pointer arguments where it doesn't know the size of >the array being pointed at. I would like to be able to have it work >with pointer arguments. If I call a function with 2 or >more integer pointers, and I have a loop that accesses them with >offsets between 0 and N where N is loop invariant then I should have >enough information (at runtime) to determine if there are overlapping >memory accesses through the pointers and determine whether or not I can >distribute the loop. Not sure where you got that from. Loop distribution works with our data reference / dependence analysis. The cost model might be more restricted but that can be fixed. >The loop splitting code seemed like a better template since it already >knows how to split a loop based on a runtime determined condition. That >part seems to be working for me, it is when I try to >distribute/duplicate one of those loops (under the unaliased condition) >that I am running into the problem with virtual PHIs. There's mark_virtual*for_renaming (sp?). But as said you are performing loop distribution so please enhance the existing pass rather than writing a new one. Richard. >Steve Ellcey >sellcey@cavium.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Duplicating loops and virtual phis 2017-05-15 18:32 ` Richard Biener @ 2017-05-17 9:41 ` Bin.Cheng 2017-05-17 15:58 ` Steve Ellcey 2017-05-22 0:42 ` Kugan Vivekanandarajah 0 siblings, 2 replies; 8+ messages in thread From: Bin.Cheng @ 2017-05-17 9:41 UTC (permalink / raw) To: Richard Biener; +Cc: Steve Ellcey, gcc On Mon, May 15, 2017 at 7:32 PM, Richard Biener <richard.guenther@gmail.com> wrote: > On May 15, 2017 6:56:53 PM GMT+02:00, Steve Ellcey <sellcey@cavium.com> wrote: >>On Sat, 2017-05-13 at 08:18 +0200, Richard Biener wrote: >>> On May 12, 2017 10:42:34 PM GMT+02:00, Steve Ellcey <sellcey@cavium.c >>> om> wrote: >>> > >>> > (Short version of this email, is there a way to recalculate/rebuild >>> > virtual >>> > phi nodes after modifying the CFG.) >>> > >>> > I have a question about duplicating loops and virtual phi nodes. >>> > I am trying to implement the following optimization as a pass: >>> > >>> > Transform: >>> > >>> > for (i = 0; i < n; i++) { >>> > A[i] = A[i] + B[i]; >>> > C[i] = C[i-1] + D[i]; >>> > } >>> > >>> > Into: >>> > >>> > if (noalias between A&B, A&C, A&D) >>> > for (i = 0; i < 100; i++) >>> > A[i] = A[i] + B[i]; >>> > for (i = 0; i < 100; i++) >>> > C[i] = C[i-1] + D[i]; >>> > else >>> > for (i = 0; i < 100; i++) { >>> > A[i] = A[i] + B[i]; >>> > C[i] = C[i-1] + D[i]; >>> > } >>> > >>> > Right now the vectorizer sees that 'C[i] = C[i-1] + D[i];' cannot >>be >>> > vectorized so it gives up and does not vectorize the loop. If we >>split >>> > up the loop into two loops then the vector add with A[i] could be >>> > vectorized >>> > even if the one with C[i] could not. >>> Loop distribution does this transform but it doesn't know about >>> versioning for unknown dependences. >>> >> >>Yes, I looked at loop distribution. But it only works with global >>arrays and not with pointer arguments where it doesn't know the size of >>the array being pointed at. I would like to be able to have it work >>with pointer arguments. If I call a function with 2 or >>more integer pointers, and I have a loop that accesses them with >>offsets between 0 and N where N is loop invariant then I should have >>enough information (at runtime) to determine if there are overlapping >>memory accesses through the pointers and determine whether or not I can >>distribute the loop. > > Not sure where you got that from. Loop distribution works with our data reference / dependence analysis. The cost model might be more restricted but that can be fixed. > >>The loop splitting code seemed like a better template since it already >>knows how to split a loop based on a runtime determined condition. That >>part seems to be working for me, it is when I try to >>distribute/duplicate one of those loops (under the unaliased condition) >>that I am running into the problem with virtual PHIs. > > There's mark_virtual*for_renaming (sp?). > > But as said you are performing loop distribution so please enhance the existing pass rather than writing a new one. I happen to be working on loop distribution now (If guess correctly, to get hmmer fixed). So far my idea is to fuse the finest distributed loop in two passes, in the first pass, we merge all SCCs due to "true" data dependence; in the second one we identify all SCCs and breaks them on dependent edges due to possible alias. Breaking SCCs with minimal edge set can be modeled as Feedback arc set problem which is NP-hard. Fortunately the problem is small in our case and there are approximation algorithms. OTOH, we should also improve loop distribution/fusion to maximize parallelism / minimize synchronization, as well as maximize data locality, but I think this is not needed to get hmmer vectorized. Thanks, bin > > Richard. > >>Steve Ellcey >>sellcey@cavium.com > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Duplicating loops and virtual phis 2017-05-17 9:41 ` Bin.Cheng @ 2017-05-17 15:58 ` Steve Ellcey 2017-05-22 0:42 ` Kugan Vivekanandarajah 1 sibling, 0 replies; 8+ messages in thread From: Steve Ellcey @ 2017-05-17 15:58 UTC (permalink / raw) To: Bin.Cheng, Richard Biener; +Cc: gcc On Wed, 2017-05-17 at 10:41 +0100, Bin.Cheng wrote: > I happen to be working on loop distribution now (If guess correctly, > to get hmmer fixed).  So far my idea is to fuse the finest > distributed > loop in two passes, in the first pass, we merge all SCCs due to > "true" > data dependence; in the second one we identify all SCCs and breaks > them on dependent edges due to possible alias.  Breaking SCCs with > minimal edge set can be modeled as Feedback arc set problem which is > NP-hard. Fortunately the problem is small in our case and there are > approximation algorithms.  OTOH, we should also improve loop > distribution/fusion to maximize parallelism / minimize > synchronization, as well as maximize data locality, but I think this > is not needed to get hmmer vectorized. Vectorizing hmmer is what I am interested in so I am glad to hear you are looking into that.  You are obviously more knowledgable about the GCC loop infrastructure then I am so I look forward to what you come up with. Steve Ellcey sellcey@cavium.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Duplicating loops and virtual phis 2017-05-17 9:41 ` Bin.Cheng 2017-05-17 15:58 ` Steve Ellcey @ 2017-05-22 0:42 ` Kugan Vivekanandarajah 2017-05-22 8:31 ` Bin.Cheng 1 sibling, 1 reply; 8+ messages in thread From: Kugan Vivekanandarajah @ 2017-05-22 0:42 UTC (permalink / raw) To: Bin.Cheng; +Cc: Richard Biener, Steve Ellcey, gcc Hi Bin and Steve, On 17 May 2017 at 19:41, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Mon, May 15, 2017 at 7:32 PM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On May 15, 2017 6:56:53 PM GMT+02:00, Steve Ellcey <sellcey@cavium.com> wrote: >>>On Sat, 2017-05-13 at 08:18 +0200, Richard Biener wrote: >>>> On May 12, 2017 10:42:34 PM GMT+02:00, Steve Ellcey <sellcey@cavium.c >>>> om> wrote: >>>> > >>>> > (Short version of this email, is there a way to recalculate/rebuild >>>> > virtual >>>> > phi nodes after modifying the CFG.) >>>> > >>>> > I have a question about duplicating loops and virtual phi nodes. >>>> > I am trying to implement the following optimization as a pass: >>>> > >>>> > Transform: >>>> > >>>> > for (i = 0; i < n; i++) { >>>> > A[i] = A[i] + B[i]; >>>> > C[i] = C[i-1] + D[i]; >>>> > } >>>> > >>>> > Into: >>>> > >>>> > if (noalias between A&B, A&C, A&D) >>>> > for (i = 0; i < 100; i++) >>>> > A[i] = A[i] + B[i]; >>>> > for (i = 0; i < 100; i++) >>>> > C[i] = C[i-1] + D[i]; >>>> > else >>>> > for (i = 0; i < 100; i++) { >>>> > A[i] = A[i] + B[i]; >>>> > C[i] = C[i-1] + D[i]; >>>> > } >>>> > >>>> > Right now the vectorizer sees that 'C[i] = C[i-1] + D[i];' cannot >>>be >>>> > vectorized so it gives up and does not vectorize the loop. If we >>>split >>>> > up the loop into two loops then the vector add with A[i] could be >>>> > vectorized >>>> > even if the one with C[i] could not. >>>> Loop distribution does this transform but it doesn't know about >>>> versioning for unknown dependences. >>>> >>> >>>Yes, I looked at loop distribution. But it only works with global >>>arrays and not with pointer arguments where it doesn't know the size of >>>the array being pointed at. I would like to be able to have it work >>>with pointer arguments. If I call a function with 2 or >>>more integer pointers, and I have a loop that accesses them with >>>offsets between 0 and N where N is loop invariant then I should have >>>enough information (at runtime) to determine if there are overlapping >>>memory accesses through the pointers and determine whether or not I can >>>distribute the loop. >> >> Not sure where you got that from. Loop distribution works with our data reference / dependence analysis. The cost model might be more restricted but that can be fixed. >> >>>The loop splitting code seemed like a better template since it already >>>knows how to split a loop based on a runtime determined condition. That >>>part seems to be working for me, it is when I try to >>>distribute/duplicate one of those loops (under the unaliased condition) >>>that I am running into the problem with virtual PHIs. >> >> There's mark_virtual*for_renaming (sp?). >> >> But as said you are performing loop distribution so please enhance the existing pass rather than writing a new one. > I happen to be working on loop distribution now (If guess correctly, > to get hmmer fixed). So far my idea is to fuse the finest distributed > loop in two passes, in the first pass, we merge all SCCs due to "true" > data dependence; in the second one we identify all SCCs and breaks > them on dependent edges due to possible alias. Breaking SCCs with > minimal edge set can be modeled as Feedback arc set problem which is > NP-hard. Fortunately the problem is small in our case and there are > approximation algorithms. OTOH, we should also improve loop > distribution/fusion to maximize parallelism / minimize > synchronization, as well as maximize data locality, but I think this > is not needed to get hmmer vectorized. I am also looking into vectoring homer loop. Glad to know you are also looking at this problem and looking forward to seeing the patches. I have some experimental patches where I added the data reference that needs runtime checking to a list static int pg_add_dependence_edges (struct graph *rdg, vec<loop_p> loops, int dir, vec<data_reference_p> drs1, - vec<data_reference_p> drs2) + vec<data_reference_p> drs2, + vec<ddr_p> &ddrs, + bool runtime_alias_check) Then I am vesioning the main loop based on the condition generated from the runtime check. I have borrowed the logic from vectorizer (like pruning the list and generating the condition). I have neither verified nor benchmarked it enough yet. As I understand, we also should have some form of cost model where we should be able too see the data access patterns and decide if the distributed loops can be vectorized? Cost model in similar_memory_accesses also need to be relaxd based on the ability to vectorize distributed loops. Thanks, Kugan > Thanks, > bin >> >> Richard. >> >>>Steve Ellcey >>>sellcey@cavium.com >> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Duplicating loops and virtual phis 2017-05-22 0:42 ` Kugan Vivekanandarajah @ 2017-05-22 8:31 ` Bin.Cheng 0 siblings, 0 replies; 8+ messages in thread From: Bin.Cheng @ 2017-05-22 8:31 UTC (permalink / raw) To: Kugan Vivekanandarajah; +Cc: Richard Biener, Steve Ellcey, gcc On Mon, May 22, 2017 at 1:42 AM, Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org> wrote: > Hi Bin and Steve, > > On 17 May 2017 at 19:41, Bin.Cheng <amker.cheng@gmail.com> wrote: >> On Mon, May 15, 2017 at 7:32 PM, Richard Biener >> <richard.guenther@gmail.com> wrote: >>> On May 15, 2017 6:56:53 PM GMT+02:00, Steve Ellcey <sellcey@cavium.com> wrote: >>>>On Sat, 2017-05-13 at 08:18 +0200, Richard Biener wrote: >>>>> On May 12, 2017 10:42:34 PM GMT+02:00, Steve Ellcey <sellcey@cavium.c >>>>> om> wrote: >>>>> > >>>>> > (Short version of this email, is there a way to recalculate/rebuild >>>>> > virtual >>>>> > phi nodes after modifying the CFG.) >>>>> > >>>>> > I have a question about duplicating loops and virtual phi nodes. >>>>> > I am trying to implement the following optimization as a pass: >>>>> > >>>>> > Transform: >>>>> > >>>>> > for (i = 0; i < n; i++) { >>>>> > A[i] = A[i] + B[i]; >>>>> > C[i] = C[i-1] + D[i]; >>>>> > } >>>>> > >>>>> > Into: >>>>> > >>>>> > if (noalias between A&B, A&C, A&D) >>>>> > for (i = 0; i < 100; i++) >>>>> > A[i] = A[i] + B[i]; >>>>> > for (i = 0; i < 100; i++) >>>>> > C[i] = C[i-1] + D[i]; >>>>> > else >>>>> > for (i = 0; i < 100; i++) { >>>>> > A[i] = A[i] + B[i]; >>>>> > C[i] = C[i-1] + D[i]; >>>>> > } >>>>> > >>>>> > Right now the vectorizer sees that 'C[i] = C[i-1] + D[i];' cannot >>>>be >>>>> > vectorized so it gives up and does not vectorize the loop. If we >>>>split >>>>> > up the loop into two loops then the vector add with A[i] could be >>>>> > vectorized >>>>> > even if the one with C[i] could not. >>>>> Loop distribution does this transform but it doesn't know about >>>>> versioning for unknown dependences. >>>>> >>>> >>>>Yes, I looked at loop distribution. But it only works with global >>>>arrays and not with pointer arguments where it doesn't know the size of >>>>the array being pointed at. I would like to be able to have it work >>>>with pointer arguments. If I call a function with 2 or >>>>more integer pointers, and I have a loop that accesses them with >>>>offsets between 0 and N where N is loop invariant then I should have >>>>enough information (at runtime) to determine if there are overlapping >>>>memory accesses through the pointers and determine whether or not I can >>>>distribute the loop. >>> >>> Not sure where you got that from. Loop distribution works with our data reference / dependence analysis. The cost model might be more restricted but that can be fixed. >>> >>>>The loop splitting code seemed like a better template since it already >>>>knows how to split a loop based on a runtime determined condition. That >>>>part seems to be working for me, it is when I try to >>>>distribute/duplicate one of those loops (under the unaliased condition) >>>>that I am running into the problem with virtual PHIs. >>> >>> There's mark_virtual*for_renaming (sp?). >>> >>> But as said you are performing loop distribution so please enhance the existing pass rather than writing a new one. >> I happen to be working on loop distribution now (If guess correctly, >> to get hmmer fixed). So far my idea is to fuse the finest distributed >> loop in two passes, in the first pass, we merge all SCCs due to "true" >> data dependence; in the second one we identify all SCCs and breaks >> them on dependent edges due to possible alias. Breaking SCCs with >> minimal edge set can be modeled as Feedback arc set problem which is >> NP-hard. Fortunately the problem is small in our case and there are >> approximation algorithms. OTOH, we should also improve loop >> distribution/fusion to maximize parallelism / minimize >> synchronization, as well as maximize data locality, but I think this >> is not needed to get hmmer vectorized. > > I am also looking into vectoring homer loop. Glad to know you are also > looking at this problem and looking forward to seeing the patches. > > I have some experimental patches where I added the data reference that > needs runtime checking to a list > > static int > pg_add_dependence_edges (struct graph *rdg, vec<loop_p> loops, int dir, > vec<data_reference_p> drs1, > - vec<data_reference_p> drs2) > + vec<data_reference_p> drs2, > + vec<ddr_p> &ddrs, > + bool runtime_alias_check) > > Then I am vesioning the main loop based on the condition generated > from the runtime check. I have borrowed the logic from vectorizer > (like pruning the list and generating the condition). I have neither Yes, I am doing experiments factor runtime alias check out from vectorizer as general interfaces. Also I need to fix negative step bug in current runtime alias check. > verified nor benchmarked it enough yet. > > As I understand, we also should have some form of cost model where we > should be able too see the data access patterns and decide if the > distributed loops can be vectorized? As the start, the idea is to version loop as gcc does in ifcvt, we can revert the versioning later if distributed loop can't be vectorized (or optimized in other significant way). In case of hmmer, it's also important to eliminate number of dead stores under condition of no-alias. > > Cost model in similar_memory_accesses also need to be relaxd based on > the ability to vectorize distributed loops. Yes, better cost model for fusion is needed in general. Thanks, bin > > Thanks, > Kugan > > >> Thanks, >> bin >>> >>> Richard. >>> >>>>Steve Ellcey >>>>sellcey@cavium.com >>> ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-05-22 8:31 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-05-12 20:42 Duplicating loops and virtual phis Steve Ellcey 2017-05-13 6:18 ` Richard Biener 2017-05-15 16:56 ` Steve Ellcey 2017-05-15 18:32 ` Richard Biener 2017-05-17 9:41 ` Bin.Cheng 2017-05-17 15:58 ` Steve Ellcey 2017-05-22 0:42 ` Kugan Vivekanandarajah 2017-05-22 8:31 ` Bin.Cheng
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).