public inbox for gcc-bugs@sourceware.org
* [Bug c/52252] New: An opportunity for x86 gcc vectorizer (gain up to 3 times)
@ 2012-02-14 22:42 evstupac at gmail dot com
  2012-02-15 11:55 ` [Bug tree-optimization/52252] " rguenth at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: evstupac at gmail dot com @ 2012-02-14 22:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252

             Bug #: 52252
           Summary: An opportunity for x86 gcc vectorizer (gain up to 3
                    times)
    Classification: Unclassified
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: evstupac@gmail.com


This is an example of byte conversion from RGB (Red Green Blue) to CMYK (Cyan
Magenta Yellow blacK):

#define byte unsigned char
#define MIN(a, b) ((a) > (b)?(b):(a))

void convert_image(byte *in, byte *out, int size) {
    int i;
    for(i = 0; i < size; i++) {
        byte r = in[0];
        byte g = in[1];
        byte b = in[2];
        byte c, m, y, k, tmp;
        c = 255 - r;
        m = 255 - g;
        y = 255 - b;
        tmp = MIN(m, y);
        k = MIN(c, tmp);
        out[0] = c - k;
        out[1] = m - k;
        out[2] = y - k;
        out[3] = k;
        in += 3;
        out += 4;
    }
}
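
The per-pixel arithmetic of the loop body can be checked by hand on a few known colors. Below is a minimal sketch of one iteration pulled out into a helper; the name rgb_to_cmyk_pixel is hypothetical and is not part of the test case above.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char byte;

/* Hypothetical helper: the same arithmetic as one iteration of
   convert_image, applied to a single RGB triple. */
void rgb_to_cmyk_pixel(const byte rgb[3], byte cmyk[4])
{
    assert(rgb != NULL && cmyk != NULL);

    byte c = 255 - rgb[0];
    byte m = 255 - rgb[1];
    byte y = 255 - rgb[2];

    /* k is the minimum of the three inverted channels. */
    byte k = c < m ? c : m;
    if (y < k)
        k = y;

    cmyk[0] = c - k;
    cmyk[1] = m - k;
    cmyk[2] = y - k;
    cmyk[3] = k;
}
```

For example, pure red (255, 0, 0) gives c = 0, m = 255, y = 255, k = 0, and black (0, 0, 0) gives c = m = y = 0 after subtracting k = 255.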

Trunk gcc for ARM unrolls this loop by 2 and vectorizes it using NEON; gcc
for x86 does not vectorize it.

There are 2 tricky points in this loop:
1)    It converts 3 bytes into 4
2)    We need to shuffle bytes after the load:
Let 0123456789ABCDEF be 16 bytes in the “in” array (the first rgb is 012, the next 345…)
To compute the vector minimum we need to place bytes 0, 1, 2 into 3 different vectors.

gcc for ARM does this with 2 special loads:
  vld3.8  {d16, d18, d20}, [r2]!
  vld3.8  {d17, d19, d21}, [r2]
putting bytes 0, 3, ... into q8 (d16, d17)
        bytes 1, 4, ... into q9 (d18, d19)
        bytes 2, 5, ... into q10 (d20, d21)

After all vector transformations it writes the result with 2 special stores:

  vst4.8  {d8, d10, d12, d14}, [r3]!
  vst4.8  {d9, d11, d13, d15}, [r3]
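
The load/compute/store structure above can be modeled in portable C. This is only a sketch of the shape of the vectorized loop, not what gcc emits: the name convert_image_planar and the LANES constant are assumptions, with 16 lanes chosen to match the q registers filled by the two vld3.8 loads.

```c
#include <assert.h>

typedef unsigned char byte;

enum { LANES = 16 };  /* assumption: one q register of bytes per channel */

void convert_image_planar(const byte *in, byte *out, int size)
{
    assert(size >= 0);
    int i = 0;

    for (; i + LANES <= size; i += LANES) {
        byte r[LANES], g[LANES], b[LANES], k[LANES];

        /* De-interleaving load (the role of vld3.8): one array per channel. */
        for (int j = 0; j < LANES; j++) {
            r[j] = in[3 * (i + j) + 0];
            g[j] = in[3 * (i + j) + 1];
            b[j] = in[3 * (i + j) + 2];
        }

        /* Lane-wise arithmetic: invert, then take the minimum of three. */
        for (int j = 0; j < LANES; j++) {
            r[j] = 255 - r[j];              /* c */
            g[j] = 255 - g[j];              /* m */
            b[j] = 255 - b[j];              /* y */
            k[j] = g[j] < b[j] ? g[j] : b[j];
            if (r[j] < k[j])
                k[j] = r[j];
        }

        /* Interleaving store (the role of vst4.8): 4 bytes per pixel. */
        for (int j = 0; j < LANES; j++) {
            out[4 * (i + j) + 0] = r[j] - k[j];
            out[4 * (i + j) + 1] = g[j] - k[j];
            out[4 * (i + j) + 2] = b[j] - k[j];
            out[4 * (i + j) + 3] = k[j];
        }
    }

    /* Scalar tail for the remaining size % LANES pixels. */
    for (; i < size; i++) {
        byte c = 255 - in[3 * i + 0];
        byte m = 255 - in[3 * i + 1];
        byte y = 255 - in[3 * i + 2];
        byte kk = m < y ? m : y;
        if (c < kk)
            kk = c;
        out[4 * i + 0] = c - kk;
        out[4 * i + 1] = m - kk;
        out[4 * i + 2] = y - kk;
        out[4 * i + 3] = kk;
    }
}
```

Each of the three inner loops has unit stride and no cross-lane dependence, which is what the special loads and stores buy on NEON.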

However, x86 gcc could do the same loads:
  movq (%edi),%mm5
  movq %mm5,%mm7
  movq %mm5,%mm6
  pshufb %mm3,%mm5 /*0x00ffffff03ffffff*/
  pshufb %mm2,%mm6 /*0x01ffffff04ffffff*/
  pshufb %mm1,%mm7 /*0x02ffffff05ffffff*/
  /* %mm5 – r, %mm6 – g, %mm7 – b */

And the same stores:
  pslld  $0x8,%mm6
  pslld  $0x10,%mm7
  pslld  $0x18,%mm4
  pxor   %mm5,%mm6 
  pxor   %mm7,%mm4
  pxor   %mm6,%mm4
  pshufb %mm0,%mm4 /*0x0001020304050607*/ /*here redundant*/
  movq %mm4,(%esi)
  /* %mm5 – c, %mm6 – m, %mm7 – y, %mm4 - k */

The pshufb here is an identity shuffle, so it can be removed; a shuffle is
needed only when fewer than 4 bytes are stored per pixel.

Moreover, x86 gcc could unroll not only by 2 but by 4,
with the following loads:

  movdqu (%edi),%xmm5
  movdqa %xmm5,%xmm7
  movdqa %xmm5,%xmm6
  pshufb %xmm3,%xmm5 /*0x00ffffff03ffffff06ffffff09ffffff*/
  pshufb %xmm2,%xmm6 /*0x01ffffff04ffffff07ffffff0affffff*/
  pshufb %xmm1,%xmm7 /*0x02ffffff05ffffff08ffffff0bffffff*/
  /* %xmm5 – r, %xmm6 – g, %xmm7 – b */

And stores:
  pslld  $0x8,%xmm6
  pslld  $0x10,%xmm7
  pslld  $0x18,%xmm4
  pxor   %xmm5,%xmm6
  pxor   %xmm7,%xmm4
  pxor   %xmm6,%xmm4
  pshufb %xmm0,%xmm4 /*0x000102030405060708090a0b0c0d0e0f*/ /*here redundant*/
  movdqa %xmm4,(%esi)
  /* %xmm5 – c, %xmm6 – m, %xmm7 – y, %xmm4 - k */
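
To check the masks, the 4-pixel sequence can be modeled in portable C. This is a sketch under assumptions: pshufb_model and convert_4_pixels_model are hypothetical names, the masks are taken from the comments above with one byte per element in memory order, and the arithmetic between the loads and the stores (not shown in the assembly) is filled in from the C loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned char byte;

/* Model of pshufb on one 16-byte register: a mask byte with its top bit
   set produces zero, otherwise its low 4 bits select a source byte. */
void pshufb_model(byte dst[16], const byte src[16], const byte mask[16])
{
    byte tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0f];
    for (int i = 0; i < 16; i++)
        dst[i] = tmp[i];
}

static const byte MASK_R[16] = {0, 0xff, 0xff, 0xff, 3, 0xff, 0xff, 0xff,
                                6, 0xff, 0xff, 0xff, 9, 0xff, 0xff, 0xff};
static const byte MASK_G[16] = {1, 0xff, 0xff, 0xff, 4, 0xff, 0xff, 0xff,
                                7, 0xff, 0xff, 0xff, 10, 0xff, 0xff, 0xff};
static const byte MASK_B[16] = {2, 0xff, 0xff, 0xff, 5, 0xff, 0xff, 0xff,
                                8, 0xff, 0xff, 0xff, 11, 0xff, 0xff, 0xff};

/* Converts 4 RGB pixels. Like movdqu, it takes a full 16-byte input
   although only the first 12 bytes are used. */
void convert_4_pixels_model(const byte in[16], byte out[16])
{
    assert(in != NULL && out != NULL);

    byte r[16], g[16], b[16];
    pshufb_model(r, in, MASK_R);
    pshufb_model(g, in, MASK_G);
    pshufb_model(b, in, MASK_B);

    for (int lane = 0; lane < 4; lane++) {
        /* After the shuffle each 32-bit lane holds one channel value
           in its lowest byte and zeros elsewhere. */
        uint32_t c = 255u - r[4 * lane];
        uint32_t m = 255u - g[4 * lane];
        uint32_t y = 255u - b[4 * lane];
        uint32_t k = c < m ? c : m;
        if (y < k)
            k = y;

        /* pslld by 8/16/24 followed by pxor: the byte fields do not
           overlap, so XOR behaves like OR here. */
        uint32_t word = (c - k) ^ ((m - k) << 8) ^ ((y - k) << 16) ^ (k << 24);

        /* Little-endian store of the lane, as on x86. */
        out[4 * lane + 0] = (byte)(word & 0xff);
        out[4 * lane + 1] = (byte)((word >> 8) & 0xff);
        out[4 * lane + 2] = (byte)((word >> 16) & 0xff);
        out[4 * lane + 3] = (byte)((word >> 24) & 0xff);
    }
}
```

Because the packed lanes already come out in c, m, y, k byte order, the model needs no final shuffle, which matches the observation that the last pshufb is redundant when all 4 bytes are stored.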



end of thread, other threads:[~2023-11-28 22:24 UTC | newest]

Thread overview: 12+ messages
2012-02-14 22:42 [Bug c/52252] New: An opportunity for x86 gcc vectorizer (gain up to 3 times) evstupac at gmail dot com
2012-02-15 11:55 ` [Bug tree-optimization/52252] " rguenth at gcc dot gnu.org
2012-02-29 12:34 ` evstupac at gmail dot com
2012-07-13  8:48 ` rguenth at gcc dot gnu.org
2014-02-11 14:27 ` evstupac at gmail dot com
2014-05-07 12:11 ` kyukhin at gcc dot gnu.org
2014-06-11  8:38 ` kyukhin at gcc dot gnu.org
2014-06-18  7:47 ` kyukhin at gcc dot gnu.org
2023-08-31  7:07 ` rguenth at gcc dot gnu.org
2023-11-28  6:06 ` pinskia at gcc dot gnu.org
2023-11-28 10:55 ` rguenther at suse dot de
2023-11-28 22:24 ` pinskia at gcc dot gnu.org
