public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/55600] New: excessive size of vectorized code
@ 2012-12-05  0:34 neleai at seznam dot cz
  2012-12-05 10:33 ` [Bug tree-optimization/55600] " rguenth at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: neleai at seznam dot cz @ 2012-12-05  0:34 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55600

             Bug #: 55600
           Summary: excessive size of vectorized code
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: neleai@seznam.cz


Consider the following code:

int sum(int *s)
{
  long i;
  int su = 0;
  for (i = 0; i < 128; i += 2)
    su += s[i] * s[i + 1];
  return su;
}

When compiled by the latest gcc with -O3, the generated assembly is 1099 bytes.



* [Bug tree-optimization/55600] excessive size of vectorized code
From: rguenth at gcc dot gnu.org @ 2012-12-05 10:33 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55600

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> 2012-12-05 10:33:06 UTC ---
GCC fully unrolls the vectorized loop.  ICC does not.

The vectorized loop iterates 16 times:

  <bb 3>:
  # vect_p.5_30 = PHI <vect_p.5_45(4), vect_p.8_31(2)>
  # vect_su.12_52 = PHI <vect_su.12_53(4), { 0, 0, 0, 0 }(2)>
  # ivtmp_61 = PHI <ivtmp_62(4), 0(2)>
  vect_var_.9_46 = MEM[(int *)vect_p.5_30];
  vect_p.5_47 = vect_p.5_30 + 16;
  vect_var_.10_48 = MEM[(int *)vect_p.5_47];
  vect_perm_even_49 = VEC_PERM_EXPR <vect_var_.9_46, vect_var_.10_48, { 0, 2, 4, 6 }>;
  vect_perm_odd_50 = VEC_PERM_EXPR <vect_var_.9_46, vect_var_.10_48, { 1, 3, 5, 7 }>;
  vect_var_.11_51 = vect_perm_even_49 * vect_perm_odd_50;
  vect_su.12_53 = vect_var_.11_51 + vect_su.12_52;
  vect_p.5_45 = vect_p.5_47 + 16;
  ivtmp_62 = ivtmp_61 + 1;
  if (ivtmp_62 < 16)
    goto <bb 4>;
  else
    goto <bb 5>;

  <bb 4>:
  goto <bb 3>;

but at -O3 we don't care too much about code size in this case.  So I'm not
sure you can call this a "bug".  Does it run slower?
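
For readers who do not read GIMPLE, the loop body above can be modeled in
plain C roughly as follows. This is only a sketch: the 4-lane vectors are
written as ordinary arrays, the names sum_scalar and sum_vector_model are
made up for illustration, and the final horizontal add is an assumption
because the reduction block (bb 5) is not part of the dump.

```c
/* Scalar reference: the loop from the original report. */
int sum_scalar(const int *s)
{
    int su = 0;
    for (long i = 0; i < 128; i += 2)
        su += s[i] * s[i + 1];
    return su;
}

/* Hand-written model of the vectorized loop (hypothetical helper,
   not GCC output).  Each 16-byte vector holds four ints. */
int sum_vector_model(const int *s)
{
    int acc[4] = { 0, 0, 0, 0 };        /* vect_su starts as { 0, 0, 0, 0 } */
    const int *p = s;                   /* vect_p */

    for (int iv = 0; iv < 16; iv++) {   /* ivtmp counts 16 iterations */
        const int *lo = p;              /* first 16-byte load */
        const int *hi = p + 4;          /* second load, 16 bytes further */

        /* VEC_PERM_EXPR with selector { 0, 2, 4, 6 } */
        int even[4] = { lo[0], lo[2], hi[0], hi[2] };
        /* VEC_PERM_EXPR with selector { 1, 3, 5, 7 } */
        int odd[4] = { lo[1], lo[3], hi[1], hi[3] };

        for (int lane = 0; lane < 4; lane++)
            acc[lane] += even[lane] * odd[lane];

        p += 8;                         /* two 16-byte steps per iteration */
    }

    /* Assumed final reduction: add the four partial sums. */
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Each of the 16 iterations consumes 8 ints, which covers all 128 elements.
Fully unrolling means this body is emitted 16 times, which is where the
size comes from.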



* [Bug tree-optimization/55600] excessive size of vectorized code
From: neleai at seznam dot cz @ 2012-12-26 22:04 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55600

--- Comment #2 from Ondrej Bilka <neleai at seznam dot cz> 2012-12-26 22:03:59 UTC ---
Yes, when 128 is replaced by a smaller constant. The attached benchmark gives
the following on my i5:
size 32
vector

real    0m0.224s
user    0m0.220s
sys    0m0.000s
unroll

real    0m0.155s
user    0m0.148s
sys    0m0.004s
size 64
vector

real    0m0.398s
user    0m0.396s
sys    0m0.000s
unroll

real    0m0.380s
user    0m0.376s
sys    0m0.000s
size 128
vector

real    0m0.703s
user    0m0.700s
sys    0m0.000s
unroll

real    0m0.817s
user    0m0.812s
sys    0m0.000s
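
The contents of the attachment are not reproduced in this thread. A timing
harness in this spirit might look like the sketch below. Everything in it
is an assumption for illustration: the SIZE macro, the names sum_n and
run_benchmark, and the way the input is perturbed between calls so the
compiler cannot hoist the call out of the loop.

```c
#ifndef SIZE
#define SIZE 128                /* hypothetical knob: 32, 64 or 128 */
#endif

static int data[SIZE];

/* Kernel under test; noinline keeps it a separate function so that
   its code is what gets measured. */
__attribute__((noinline)) int sum_n(const int *s)
{
    int su = 0;
    for (long i = 0; i < SIZE; i += 2)
        su += s[i] * s[i + 1];
    return su;
}

/* Calls the kernel reps times and returns the accumulated result so
   the calls cannot be optimized away. */
long run_benchmark(long reps)
{
    for (int i = 0; i < SIZE; i++)
        data[i] = 1;

    long total = 0;
    for (long r = 0; r < reps; r++) {
        data[0] = (int)(r & 3); /* change the input on every call */
        total += sum_n(data);
    }
    return total;
}
```

A main that prints run_benchmark(reps) for a large reps, built once with
the vectorizer enabled and once without, and run under time(1), would
produce listings shaped like the ones above.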



* [Bug tree-optimization/55600] excessive size of vectorized code
From: neleai at seznam dot cz @ 2012-12-26 22:05 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55600

--- Comment #3 from Ondrej Bilka <neleai at seznam dot cz> 2012-12-26 22:05:37 UTC ---
Created attachment 29052
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29052
benchmark



* [Bug target/55600] excessive size of vectorized code
From: pinskia at gcc dot gnu.org @ 2021-08-07 23:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55600

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-linux-gnu
           Keywords|                            |missed-optimization
          Component|tree-optimization           |target

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So here is the current state for this:
GCC vectorizes it and unrolls it fully at 128 and 64.
clang keeps it **scalar** but unrolls the loop 4 times at 128 and fully at 64.
ICC vectorizes it and unrolls it halfway (that is, 32 times) at 128 and fully
at 64.
MSVC keeps it **scalar** but unrolls it halfway (that is, 32 times) at 128.

So it looks like all compilers do hugely different things here.
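
Since the unrolling decision differs this much between compilers, code that
cares about size can state its preference per loop. The sketch below uses
#pragma GCC unroll, which GCC documents for version 8 and later (a value
of 1 asks for no unrolling). Nobody in this thread tested it on this loop,
so how much it reduces the size here is an assumption to verify; the
function name sum_no_unroll is made up, and other compilers may ignore the
pragma or warn about it.

```c
/* Same loop as in the report, with a per-loop request not to unroll.
   The pragma must sit directly before the for statement. */
int sum_no_unroll(const int *s)
{
    int su = 0;
#pragma GCC unroll 1
    for (long i = 0; i < 128; i += 2)
        su += s[i] * s[i + 1];
    return su;
}
```

The result is the same as for the original function; only the code the
compiler emits for the loop is meant to change.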

