Do we need to do a loop invariant motion after loop interchange ?

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* Do we need to do a loop invariant motion after loop interchange ?
@ 2020-09-07  9:42 HAO CHEN GUI
  2020-09-07 11:21 ` Richard Biener
  2020-09-08 10:18 ` Bin.Cheng
  0 siblings, 2 replies; 6+ messages in thread
From: HAO CHEN GUI @ 2020-09-07  9:42 UTC (permalink / raw)
  To: gcc-patches

Hi,

I want to follow Lijia's work as I gained the performance benefit on 
some SPEC workloads by adding a im pass after loop interchange.  Could 
you send me the latest patches? I could do further testing. Thanks a lot.

https://gcc.gnu.org/pipermail/gcc/2020-February/232091.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Do we need to do a loop invariant motion after loop interchange ?
  2020-09-07  9:42 Do we need to do a loop invariant motion after loop interchange ? HAO CHEN GUI
@ 2020-09-07 11:21 ` Richard Biener
  2020-09-08 10:18 ` Bin.Cheng
  1 sibling, 0 replies; 6+ messages in thread
From: Richard Biener @ 2020-09-07 11:21 UTC (permalink / raw)
  To: HAO CHEN GUI, Martin Jambor; +Cc: gcc-patches

On Mon, Sep 7, 2020 at 11:43 AM HAO CHEN GUI via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>
> I want to follow Lijia's work as I gained the performance benefit on
> some SPEC workloads by adding a im pass after loop interchange.  Could
> you send me the latest patches? I could do further testing. Thanks a lot.
>
> https://gcc.gnu.org/pipermail/gcc/2020-February/232091.html

Given the patches are quite complex I wonder if executing region-based LIM
from interchange makes more sense here.  Martin might want to work on this.

Richard.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Do we need to do a loop invariant motion after loop interchange ?
  2020-09-07  9:42 Do we need to do a loop invariant motion after loop interchange ? HAO CHEN GUI
  2020-09-07 11:21 ` Richard Biener
@ 2020-09-08 10:18 ` Bin.Cheng
  2020-09-22  2:30   ` HAO CHEN GUI
  1 sibling, 1 reply; 6+ messages in thread
From: Bin.Cheng @ 2020-09-08 10:18 UTC (permalink / raw)
  To: HAO CHEN GUI; +Cc: gcc-patches

On Mon, Sep 7, 2020 at 5:42 PM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:
>
> Hi,
>
> I want to follow Lijia's work as I gained the performance benefit on
> some SPEC workloads by adding a im pass after loop interchange.  Could
> you send me the latest patches? I could do further testing. Thanks a lot.
Hi,
Hmm, not sure if this refers to me?  I only provided an example patch
(which isn't complete) before Lijia's.  Unfortunately I don't have any
latest patch about this either.
As Richard suggested, maybe you (if you work on this) can simplify the
implementation.  Anyway, we only need to hoist memory references here.

Thanks,
bin
>
> https://gcc.gnu.org/pipermail/gcc/2020-February/232091.html
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Do we need to do a loop invariant motion after loop interchange ?
  2020-09-08 10:18 ` Bin.Cheng
@ 2020-09-22  2:30   ` HAO CHEN GUI
  2020-09-22  7:11     ` Richard Biener
  2020-09-22  8:24     ` Bin.Cheng
  0 siblings, 2 replies; 6+ messages in thread
From: HAO CHEN GUI @ 2020-09-22  2:30 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: gcc-patches

Bin,

I just tested your patch on current trunk.  Here is my summary.

1. About some iv aren't moved out of inner loop (Lijia mentioned in his 
last email)

  <bb 13> [local count: 955630226]:
   # l_32 = PHI <1(12), l_54(21)>
   # ivtmp_165 = PHI <_446(12), ivtmp_155(21)>
   _26 = (integer(kind=8)) l_32;
   _27 = _25 + _26;
   y__I_lsm.119_136 = (*y_135(D))[_27];
   y__I_lsm.119_90 = m_55 != 1 ? y__I_lsm.119_136 : 0.0;
   _37 = _36 * stride.88_111;
   _38 = _35 + _37;
   _39 = _26 + _38;
   _40 = (*a_137(D))[_39];

The offset _39 is not loop independent as it relies on _26. But _38 and 
_37 should be loop independent. So Lijia thought they should be moved 
out of loop.

I checked the following pass and found that these  statements are 
eliminated after vertorizing and dce.

In vect dump,

simple.F:27:23: note:  ------>vectorizing statement: _37 = _36 * 
stride.88_111;
simple.F:27:23: note:  ------>vectorizing statement: _38 = _35 + _37;
simple.F:27:23: note:  ------>vectorizing statement: _39 = _26 + _38;
simple.F:27:23: note:  ------>vectorizing statement: _40 = (*a_137(D))[_39];
simple.F:27:23: note:  transform statement.
simple.F:27:23: note:  transform load. ncopies = 1
simple.F:27:23: note:  create vector_type-pointer variable to type: 
vector(2) real(kind=8)  vectorizing an array ref: (*a_137(D))
simple.F:27:23: note:  created vectp_a.131_383
simple.F:27:23: note:  add new stmt: vect__40.132_374 = MEM <vector(2) 
real(kind=8)> [(real(kind=8) *)vectp_a.130_376];

In dce dump,

Deleting : _39 = _26 + _38;

Deleting : _38 = _35 + _37;

Deleting : _37 = _36 * stride.88_111;

So it's reasonable to only consider data reference after loop 
interchange. Other statements may be eliminated or be moved out of loop 
in last lim pass if they're real expensive.

2. I tested the SPEC on powerpc64le-linux-gnu. 503.bwaves_r got 6.77% 
performance improvement with this patch. It has no impact on other 
benchmarks.

3. The patch passed bootstrapped and regression test on 
powerpc64le-linux-gnu.

I think the patch works fine. Could you please add it into trunk? Thanks 
a lot.


On 8/9/2020 下午 6:18, Bin.Cheng wrote:
> On Mon, Sep 7, 2020 at 5:42 PM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:
>> Hi,
>>
>> I want to follow Lijia's work as I gained the performance benefit on
>> some SPEC workloads by adding a im pass after loop interchange.  Could
>> you send me the latest patches? I could do further testing. Thanks a lot.
> Hi,
> Hmm, not sure if this refers to me?  I only provided an example patch
> (which isn't complete) before Lijia's.  Unfortunately I don't have any
> latest patch about this either.
> As Richard suggested, maybe you (if you work on this) can simplify the
> implementation.  Anyway, we only need to hoist memory references here.
>
> Thanks,
> bin
>> https://gcc.gnu.org/pipermail/gcc/2020-February/232091.html
>>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Do we need to do a loop invariant motion after loop interchange ?
  2020-09-22  2:30   ` HAO CHEN GUI
@ 2020-09-22  7:11     ` Richard Biener
  2020-09-22  8:24     ` Bin.Cheng
  1 sibling, 0 replies; 6+ messages in thread
From: Richard Biener @ 2020-09-22  7:11 UTC (permalink / raw)
  To: HAO CHEN GUI, Martin Jambor; +Cc: Bin.Cheng, gcc-patches

On Tue, Sep 22, 2020 at 4:31 AM HAO CHEN GUI via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Bin,
>
> I just tested your patch on current trunk.  Here is my summary.
>
> 1. About some iv aren't moved out of inner loop (Lijia mentioned in his
> last email)
>
>   <bb 13> [local count: 955630226]:
>    # l_32 = PHI <1(12), l_54(21)>
>    # ivtmp_165 = PHI <_446(12), ivtmp_155(21)>
>    _26 = (integer(kind=8)) l_32;
>    _27 = _25 + _26;
>    y__I_lsm.119_136 = (*y_135(D))[_27];
>    y__I_lsm.119_90 = m_55 != 1 ? y__I_lsm.119_136 : 0.0;
>    _37 = _36 * stride.88_111;
>    _38 = _35 + _37;
>    _39 = _26 + _38;
>    _40 = (*a_137(D))[_39];
>
> The offset _39 is not loop independent as it relies on _26. But _38 and
> _37 should be loop independent. So Lijia thought they should be moved
> out of loop.
>
> I checked the following pass and found that these  statements are
> eliminated after vertorizing and dce.
>
> In vect dump,
>
> simple.F:27:23: note:  ------>vectorizing statement: _37 = _36 *
> stride.88_111;
> simple.F:27:23: note:  ------>vectorizing statement: _38 = _35 + _37;
> simple.F:27:23: note:  ------>vectorizing statement: _39 = _26 + _38;
> simple.F:27:23: note:  ------>vectorizing statement: _40 = (*a_137(D))[_39];
> simple.F:27:23: note:  transform statement.
> simple.F:27:23: note:  transform load. ncopies = 1
> simple.F:27:23: note:  create vector_type-pointer variable to type:
> vector(2) real(kind=8)  vectorizing an array ref: (*a_137(D))
> simple.F:27:23: note:  created vectp_a.131_383
> simple.F:27:23: note:  add new stmt: vect__40.132_374 = MEM <vector(2)
> real(kind=8)> [(real(kind=8) *)vectp_a.130_376];
>
> In dce dump,
>
> Deleting : _39 = _26 + _38;
>
> Deleting : _38 = _35 + _37;
>
> Deleting : _37 = _36 * stride.88_111;
>
> So it's reasonable to only consider data reference after loop
> interchange. Other statements may be eliminated or be moved out of loop
> in last lim pass if they're real expensive.
>
> 2. I tested the SPEC on powerpc64le-linux-gnu. 503.bwaves_r got 6.77%
> performance improvement with this patch. It has no impact on other
> benchmarks.
>
> 3. The patch passed bootstrapped and regression test on
> powerpc64le-linux-gnu.
>
> I think the patch works fine. Could you please add it into trunk? Thanks
> a lot.

I'd like to see us instead apply the existing LIM code (w/o store-motion)
to the loop nests we applied interchange to instead of adding yet another
invariant motion code.  It should be reasonably "easy" to make the code
have the complexity of the region it operates on (again, without store-motion
which would introduce some complexities).

Btw, with an extra LIM pass scheduled after interchange I observed ~20%
improvement on bwaves for x86_64, mainly due to better induction variable
selection IIRC.

Richard.

>
> On 8/9/2020 下午 6:18, Bin.Cheng wrote:
> > On Mon, Sep 7, 2020 at 5:42 PM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:
> >> Hi,
> >>
> >> I want to follow Lijia's work as I gained the performance benefit on
> >> some SPEC workloads by adding a im pass after loop interchange.  Could
> >> you send me the latest patches? I could do further testing. Thanks a lot.
> > Hi,
> > Hmm, not sure if this refers to me?  I only provided an example patch
> > (which isn't complete) before Lijia's.  Unfortunately I don't have any
> > latest patch about this either.
> > As Richard suggested, maybe you (if you work on this) can simplify the
> > implementation.  Anyway, we only need to hoist memory references here.
> >
> > Thanks,
> > bin
> >> https://gcc.gnu.org/pipermail/gcc/2020-February/232091.html
> >>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Do we need to do a loop invariant motion after loop interchange ?
  2020-09-22  2:30   ` HAO CHEN GUI
  2020-09-22  7:11     ` Richard Biener
@ 2020-09-22  8:24     ` Bin.Cheng
  1 sibling, 0 replies; 6+ messages in thread
From: Bin.Cheng @ 2020-09-22  8:24 UTC (permalink / raw)
  To: HAO CHEN GUI; +Cc: gcc-patches

On Tue, Sep 22, 2020 at 10:30 AM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:
>
> Bin,
>
> I just tested your patch on current trunk.  Here is my summary.
>
> 1. About some iv aren't moved out of inner loop (Lijia mentioned in his
> last email)
>
>   <bb 13> [local count: 955630226]:
>    # l_32 = PHI <1(12), l_54(21)>
>    # ivtmp_165 = PHI <_446(12), ivtmp_155(21)>
>    _26 = (integer(kind=8)) l_32;
>    _27 = _25 + _26;
>    y__I_lsm.119_136 = (*y_135(D))[_27];
>    y__I_lsm.119_90 = m_55 != 1 ? y__I_lsm.119_136 : 0.0;
>    _37 = _36 * stride.88_111;
>    _38 = _35 + _37;
>    _39 = _26 + _38;
>    _40 = (*a_137(D))[_39];
>
> The offset _39 is not loop independent as it relies on _26. But _38 and
> _37 should be loop independent. So Lijia thought they should be moved
> out of loop.
>
> I checked the following pass and found that these  statements are
> eliminated after vertorizing and dce.
>
> In vect dump,
>
> simple.F:27:23: note:  ------>vectorizing statement: _37 = _36 *
> stride.88_111;
> simple.F:27:23: note:  ------>vectorizing statement: _38 = _35 + _37;
> simple.F:27:23: note:  ------>vectorizing statement: _39 = _26 + _38;
> simple.F:27:23: note:  ------>vectorizing statement: _40 = (*a_137(D))[_39];
> simple.F:27:23: note:  transform statement.
> simple.F:27:23: note:  transform load. ncopies = 1
> simple.F:27:23: note:  create vector_type-pointer variable to type:
> vector(2) real(kind=8)  vectorizing an array ref: (*a_137(D))
> simple.F:27:23: note:  created vectp_a.131_383
> simple.F:27:23: note:  add new stmt: vect__40.132_374 = MEM <vector(2)
> real(kind=8)> [(real(kind=8) *)vectp_a.130_376];
>
> In dce dump,
>
> Deleting : _39 = _26 + _38;
>
> Deleting : _38 = _35 + _37;
>
> Deleting : _37 = _36 * stride.88_111;
>
> So it's reasonable to only consider data reference after loop
> interchange. Other statements may be eliminated or be moved out of loop
> in last lim pass if they're real expensive.
>
> 2. I tested the SPEC on powerpc64le-linux-gnu. 503.bwaves_r got 6.77%
> performance improvement with this patch. It has no impact on other
> benchmarks.
>
> 3. The patch passed bootstrapped and regression test on
> powerpc64le-linux-gnu.
>
> I think the patch works fine. Could you please add it into trunk? Thanks
> a lot.
Hmm, IIRC the patch was intended to show what the missing transform
is, and I think it has latent bugs which I haven't got time to refine.
As Richard mentioned, could you please explore this with the existing
LIM facility, rather than introducing new code implementing existing
transforms?

Thanks,
bin
>
>
> On 8/9/2020 下午 6:18, Bin.Cheng wrote:
> > On Mon, Sep 7, 2020 at 5:42 PM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:
> >> Hi,
> >>
> >> I want to follow Lijia's work as I gained the performance benefit on
> >> some SPEC workloads by adding a im pass after loop interchange.  Could
> >> you send me the latest patches? I could do further testing. Thanks a lot.
> > Hi,
> > Hmm, not sure if this refers to me?  I only provided an example patch
> > (which isn't complete) before Lijia's.  Unfortunately I don't have any
> > latest patch about this either.
> > As Richard suggested, maybe you (if you work on this) can simplify the
> > implementation.  Anyway, we only need to hoist memory references here.
> >
> > Thanks,
> > bin
> >> https://gcc.gnu.org/pipermail/gcc/2020-February/232091.html
> >>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-09-22  8:24 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-07  9:42 Do we need to do a loop invariant motion after loop interchange ? HAO CHEN GUI
2020-09-07 11:21 ` Richard Biener
2020-09-08 10:18 ` Bin.Cheng
2020-09-22  2:30   ` HAO CHEN GUI
2020-09-22  7:11     ` Richard Biener
2020-09-22  8:24     ` Bin.Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).