Ayal, OOPs, your mail skipped my inbox and I missed it for several days. Using testsuite/gcc.dg/sms-6.c as an example and compiling it for PowerPC, node 18 (see attachment) is in a SCC and cannot be scheduled until spliting twice. The MII = 20 and the schedule can only be found at II = 24. On our 2-issue VLIW, the MII=10 and the valid schedule can only be found at II = 14. It is not great since we want to maximize performance. I had experience (in development of another compiler) on this issue by constructing the MinDist matrix and using it to calculate schedule window for each instruction. However, speed was not a real concern then. Do you know better algorithm (better than O(N^3))? Or do you think it is not so crtical here? After all, not many loops are candidates for software pipelining. Thanks. Cheers, Bingfeng > -----Original Message----- > From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org] On > Behalf Of Ayal Zaks > Sent: 01 February 2009 23:18 > To: Bingfeng Mei > Cc: Adrian Ashley; gcc@gcc.gnu.org > Subject: Re: Solve transitive closure issue in modulo scheduling > > "Bingfeng Mei" wrote on 30/01/2009 14:44:01: > > > Hello, > > I try to make modulo scheduling work more efficiently for our VLIW > target. I > > found one serious issue that prevents current SMS algorithm from > achieving > > high IPC is so-called "transitive closure" problem, where scheduling > window is > > only calculated using direct predecessors and successors. > Because SMS is > not > > an iterative algorithm, this may cause failures in finding a valid > schedule. > > Agreed. > > > Without splitting rows, some simple loops just cannot be > scheduled not > matter > > how big the II is. With splitting rows, schedule can be > found, but only > at > > bigger II. > > It may happen that even splitting rows will not help, e.g. when we > repeatedly end up with a node having negative sched window. > > > GCC wiki (http://gcc.gnu.org/wiki/SwingModuloScheduling) lists this > > as a TODO. Is there any work going on about this issue > > No, not to my knowledge. We had some testcase where this > showed up, hence > its appearance in the TODO, but afaicr some change caused it > to disappear. > > > (the last wiki update > > was one year ago)? If no one is working on it, I plan to do > it. My idea > is to > > use the MinDist algorithm described in B. Rau's classic > paper "iterative > > modulo > scheduling" > (http://www.hpl.hp.com/techreports/94/HPL-94-115.html). The > > same algorithm can also be used to compute better RecMII. > The biggest > concern > > is complexity of computing MinDist matrix, which is O(N^3). > N is number > of > > nodes in the loop. I remember somewhere GCC coding guide says "never > write > > quadratic algorithm" :-) Is this an absolute requirement? > If yes, I will > keep > > it as our target-specific code (we are less concerned about > compilation > time). > > Otherwise, I will try to make it more generic to see if it > can make into > > mainline in 4.5. Any comments? > > > > The problem appears only when the DDG is cyclic, and for > every cycle in the > DDG, the problem may arise when trying to schedule the last > node of the > cycle, which has both predecessors and successors already > scheduled. So you > might try to connect only each such predecessor to every such > successor > with a transitive arc, to ensure that this last node will > have a non-empty > scheduling window. But this might not suffice; you may > eventually need to > wire (nearly) every pair of nodes in the strongly connected > component. If > this happens, you'd be better off with a dense graph representation > (incidence matrix) than the current sparse one (adjaceny lists). > > An example would help see things more clearly. If you have a > (small :) DDG > demonstrating the need for transitive arcs, I'd be happy to > have a look and > advise what should be done. > > Ayal. > > > Cheers, > > Bingfeng Mei > > > > Broadcom UK > > > > > > >