public inbox for gcc@gcc.gnu.org
* Merge epilog loop & loop version due to alias/alignment in vectorization?
@ 2014-02-04 16:28 Bingfeng Mei
  2014-02-04 18:56 ` Xinliang David Li
  2014-02-05  9:10 ` Richard Biener
  0 siblings, 2 replies; 4+ messages in thread
From: Bingfeng Mei @ 2014-02-04 16:28 UTC
  To: gcc

Hi,
One of the biggest issues we have with GCC vectorization is bloated code size.
For example, the vectorized version is 2.5 times the size of the non-vectorized
one for the following simple code. One reason is that GCC often creates one
loop copy because of aliasing/alignment and one epilog loop to handle the
iterations left over when the iteration count is not a multiple of the
vectorization factor.

void foo (int *a, int *b, int N)
{
  int i;
  for (i = 0; i < N; i++)
  {
    a[i] = b[i];
  }
}

Looking closely, the epilog loop and the alignment/aliasing loop are almost
identical; they differ only in the initial values of some variables entering
the loop. Can they be merged into one in such situations? If so, any
suggestions on how to implement it?

...
  <bb 7>:
  # i_39 = PHI <i_47(8), i_50(10)>
  _41 = (long unsigned int) i_39;
  _42 = _41 * 4;
  _43 = a_7(D) + _42;
  _44 = b_9(D) + _42;
  _45 = *_44;
  *_43 = _45;
  i_47 = i_39 + 1;
  if (N_4(D) > i_47)
    goto <bb 8>;
  else
    goto <bb 15>;

  <bb 8>:
  goto <bb 7>;

  <bb 9>:
  # i_51 = PHI <i_13(6)>
  tmp.6_56 = (int) ratio_mult_vf.5_38;
  if (niters.3_34 == ratio_mult_vf.5_38)
    goto <bb 16>;
  else
    goto <bb 10>;

  <bb 10>:
  # i_50 = PHI <tmp.6_56(9), 0(4)>
  goto <bb 7>;

  <bb 11>:
  goto <bb 6>;

  <bb 12>:

  <bb 13>:
  # i_24 = PHI <0(12), i_32(14)>
  _26 = (long unsigned int) i_24;
  _27 = _26 * 4;
  _28 = a_7(D) + _27;
  _29 = b_9(D) + _27;
  _30 = *_29;
  *_28 = _30;
  i_32 = i_24 + 1;
  if (N_4(D) > i_32)
    goto <bb 14>;
  else
    goto <bb 17>;
...
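
To make the question concrete, here is a hand-written C sketch of the
control flow being asked for. It is not GCC output. It assumes a
vectorization factor of 4 (VF), models the vector body with a memcpy of
one 4-int block, and uses a hypothetical helper, may_overlap, for the
runtime alias check (an alignment check would be one more condition in the
same place). The single scalar loop at the end serves as the epilog when
entered at i == ratio_mult_vf and as the alias fallback when entered at
i == 0.

```c
#include <stdint.h>
#include <string.h>

enum { VF = 4 };  /* assumed vectorization factor: 4 ints per vector */

/* Hypothetical runtime alias check: do the two n-int ranges overlap? */
static int may_overlap(const int *a, const int *b, int n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)n * sizeof(int);
    return pa < pb + bytes && pb < pa + bytes;
}

void foo_merged(int *a, int *b, int N)
{
    int i = 0;  /* entry value for the fallback path */

    if (N >= VF && !may_overlap(a, b, N)) {
        int ratio_mult_vf = N - N % VF;  /* iterations covered by vector code */
        for (; i < ratio_mult_vf; i += VF)
            memcpy(&a[i], &b[i], VF * sizeof(int));  /* stands in for one vector copy */
    }

    /* One scalar loop: epilog if i == ratio_mult_vf, full fallback if i == 0. */
    for (; i < N; i++)
        a[i] = b[i];
}
```

In this shape only one scalar copy of the loop body exists, which is the
code-size saving in question.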

Thanks,
Bingfeng


* Re: Merge epilog loop & loop version due to alias/alignment in vectorization?
  2014-02-04 16:28 Merge epilog loop & loop version due to alias/alignment in vectorization? Bingfeng Mei
@ 2014-02-04 18:56 ` Xinliang David Li
  2014-02-05 10:39   ` Bingfeng Mei
  2014-02-05  9:10 ` Richard Biener
  1 sibling, 1 reply; 4+ messages in thread
From: Xinliang David Li @ 2014-02-04 18:56 UTC
  To: Bingfeng Mei; +Cc: gcc, Cong Hou

See also http://gcc.gnu.org/ml/gcc/2013-08/msg00259.html

There are some concerns, but it would be interesting to do some
benchmarking of this.

David



* Re: Merge epilog loop & loop version due to alias/alignment in vectorization?
  2014-02-04 16:28 Merge epilog loop & loop version due to alias/alignment in vectorization? Bingfeng Mei
  2014-02-04 18:56 ` Xinliang David Li
@ 2014-02-05  9:10 ` Richard Biener
  1 sibling, 0 replies; 4+ messages in thread
From: Richard Biener @ 2014-02-05  9:10 UTC
  To: Bingfeng Mei; +Cc: gcc

On Tue, Feb 4, 2014 at 5:27 PM, Bingfeng Mei <bmei@broadcom.com> wrote:
> Hi,
> One of the biggest issues we have with GCC vectorization is bloated code size.
> For example, the vectorized version is 2.5 times the size of the
> non-vectorized one for the following simple code. One reason is that GCC
> often creates one loop copy because of aliasing/alignment and one epilog
> loop because of loop iteration constraint.

One thing to improve is to reduce the cases where we apply peeling
for alignment - by more properly modelling the cost effect for example
(also by considering that when you align 'a' then you might spuriously
misalign 'b').
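
The arithmetic behind that last remark can be sketched as follows. This is
only an illustration, assuming 16-byte vectors and 4-byte ints; the helper
names peel_count and misalign_after_peel are made up for the example and
are not GCC functions.

```c
#include <stdint.h>

enum { VEC_BYTES = 16, ELEM_BYTES = 4 };  /* assumed vector and element sizes */

/* Number of leading elements to peel so that addr becomes VEC_BYTES-aligned.
   Assumes addr is already a multiple of ELEM_BYTES. */
static unsigned peel_count(uintptr_t addr)
{
    unsigned misalign = (unsigned)(addr % VEC_BYTES);
    return ((VEC_BYTES - misalign) % VEC_BYTES) / ELEM_BYTES;
}

/* Byte misalignment of addr after skipping `peeled` elements. */
static unsigned misalign_after_peel(uintptr_t addr, unsigned peeled)
{
    return (unsigned)((addr + (uintptr_t)peeled * ELEM_BYTES) % VEC_BYTES);
}
```

With 'a' at address 0x1008 and 'b' at 0x1000, peel_count(0x1008) is 2.
After peeling two iterations 'a' is aligned, but 'b', which started out
aligned, is now misaligned by 8 bytes.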

Another idea is (if the target supports misaligned accesses) to
do both prologue and epilogue in vector code by doing redundant
work (overlap with the first / last vector iterations) and thus avoid
creating a loop for the prologue / epilogue.  Of course that has
constraints on the kind of operations that are supported (likely
more difficult if reductions / inductions are involved or if there
are dependences to be honored).
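
A minimal sketch of the overlapping-epilogue half of that idea, written by
hand in C rather than taken from GCC. It assumes a vectorization factor of
4, that unaligned block copies are acceptable on the target, and that 'a'
and 'b' do not overlap (hence restrict). The helper copy_block is
hypothetical and stands in for one unaligned vector load and store.

```c
#include <string.h>

enum { VF = 4 };  /* assumed vectorization factor */

/* Copy one VF-int block; stands in for an unaligned vector load + store. */
static void copy_block(int *dst, const int *src)
{
    memcpy(dst, src, VF * sizeof(int));
}

/* Assumes a and b do not overlap, so redoing a few elements is harmless. */
void foo_overlap_tail(int *restrict a, const int *restrict b, int N)
{
    if (N < VF) {               /* too short for even one vector */
        for (int i = 0; i < N; i++)
            a[i] = b[i];
        return;
    }
    int i = 0;
    for (; i + VF <= N; i += VF)
        copy_block(&a[i], &b[i]);
    if (i < N)                  /* 1..VF-1 elements are left over */
        copy_block(&a[N - VF], &b[N - VF]);  /* overlaps the last full block */
}
```

The redundant work is safe here only because a plain copy is idempotent. A
reduction such as sum += b[i] would count the overlapped elements twice,
which is the kind of constraint mentioned above. The sketch also still
needs a scalar path for N < VF.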

Richard.



* RE: Merge epilog loop & loop version due to alias/alignment in vectorization?
  2014-02-04 18:56 ` Xinliang David Li
@ 2014-02-05 10:39   ` Bingfeng Mei
  0 siblings, 0 replies; 4+ messages in thread
From: Bingfeng Mei @ 2014-02-05 10:39 UTC
  To: Xinliang David Li; +Cc: gcc, Cong Hou

Thanks, it seems that Cong's idea is exactly what I meant. Is there
a patch I can try? 

Bingfeng



