public inbox for gcc-bugs@sourceware.org
* [Bug c/52252] New: An opportunity for x86 gcc vectorizer (gain up to 3 times)
@ 2012-02-14 22:42 evstupac at gmail dot com
  2012-02-15 11:55 ` [Bug tree-optimization/52252] " rguenth at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: evstupac at gmail dot com @ 2012-02-14 22:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252

             Bug #: 52252
           Summary: An opportunity for x86 gcc vectorizer (gain up to 3
                    times)
    Classification: Unclassified
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: evstupac@gmail.com


This is an example of byte conversion from RGB (Red Green Blue) to CMYK (Cyan
Magenta Yellow blacK):

#define byte unsigned char
#define MIN(a, b) ((a) > (b)?(b):(a))

void convert_image(byte *in, byte *out, int size) {
    int i;
    for(i = 0; i < size; i++) {
        byte r = in[0];
        byte g = in[1];
        byte b = in[2];
        byte c, m, y, k, tmp;
        c = 255 - r;
        m = 255 - g;
        y = 255 - b;
        tmp = MIN(m, y);
        k = MIN(c, tmp);
        out[0] = c - k;
        out[1] = m - k;
        out[2] = y - k;
        out[3] = k;
        in += 3;
        out += 4;
    }
}

Here trunk gcc for Arm unrolls the loop by 2 and vectorizes it using neon; gcc
for x86 does not vectorize it.

There are 2 tricky points in this loop:
1)    It converts 3 bytes into 4
2)    We need to shuffle bytes after load:
Let 0123456789ABCDEF be 16 bytes in the “in” array (the first rgb is 012, the
next 345…). To compute the vector minimum we need to place bytes 0, 1, 2 into
3 different vectors.

Gcc for Arm does this with 2 special loads:
  vld3.8  {d16, d18, d20}, [r2]!
  vld3.8  {d17, d19, d21}, [r2]
putting bytes 0 and 3 into q8(d16, d17)
        bytes 1 and 4 into q9(d18, d19)
        bytes 2 and 5 into q10(d20, d21)
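
The effect of these de-interleaving loads can be sketched in scalar C. This is
only a model of the data movement, not what either compiler emits; the name
deinterleave3 and its signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a 3-way de-interleaving load: pixel i supplies
   in[3*i], in[3*i+1] and in[3*i+2], and each channel is written to
   its own output array (one "register" per channel). */
void deinterleave3(const uint8_t *in, size_t pixels,
                   uint8_t *r, uint8_t *g, uint8_t *b)
{
    for (size_t i = 0; i < pixels; i++) {
        r[i] = in[3 * i + 0];
        g[i] = in[3 * i + 1];
        b[i] = in[3 * i + 2];
    }
}
```

Called with pixels = 8 it models one vld3.8 on d-registers (24 input bytes,
8 bytes per channel); two such loads fill the three q-registers.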

And after all vector transformations it writes the result with 2 special stores:

  vst4.8  {d8, d10, d12, d14}, [r3]!
  vst4.8  {d9, d11, d13, d15}, [r3]

However x86 gcc can do the same loads:
  movq (%edi),%mm5
  movq %mm5,%mm7
  movq %mm5,%mm6
  pshufb %mm3,%mm5 /*0x00ffffff03ffffff*/
  pshufb %mm2,%mm6 /*0x01ffffff04ffffff*/
  pshufb %mm1,%mm7 /*0x02ffffff05ffffff*/
  /* %mm5 – r, %mm6 – g, %mm7 – b */

And same stores:
  pslld  $0x8,%mm6
  pslld  $0x10,%mm7
  pslld  $0x18,%mm4
  pxor   %mm5,%mm6 
  pxor   %mm7,%mm4
  pxor   %mm6,%mm4
  pshufb %mm0,%mm4 /*0x0001020304050607*/ /*here redundant*/
  movq %mm4,(%esi)
  /* %mm5 – c, %mm6 – m, %mm7 – y, %mm4 - k */

The pshufb here is an identity shuffle, so it can be removed; a shuffle is only
needed when we store fewer than 4 bytes per pixel.

Moreover, x86 gcc can unroll not only by 2 but by 4,
with the following loads:

  movdqu (%edi),%xmm5
  movdqa %xmm5,%xmm7
  movdqa %xmm5,%xmm6
  pshufb %xmm3,%xmm5 /*0x00ffffff03ffffff06ffffff09ffffff*/
  pshufb %xmm2,%xmm6 /*0x01ffffff04ffffff07ffffff0affffff*/
  pshufb %xmm1,%xmm7 /*0x02ffffff05ffffff08ffffff0bffffff*/
  /* %xmm5 – r, %xmm6 – g, %xmm7 – b */
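
The three shuffles above can be modelled in scalar C as follows. This is a
sketch of the 128-bit pshufb semantics only. The control tables assume the hex
comments list control bytes from byte 0 upward, and the names pshufb_model,
R_CTRL, G_CTRL and B_CTRL are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of 128-bit pshufb: a control byte with its top bit set
   produces 0; otherwise its low four bits select a source byte. */
void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                  const uint8_t ctrl[16])
{
    uint8_t tmp[16];   /* temporary, so dst may alias src */
    for (int i = 0; i < 16; i++)
        tmp[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0f];
    for (int i = 0; i < 16; i++)
        dst[i] = tmp[i];
}

/* Control bytes from the report (assumed to be listed from byte 0). */
const uint8_t R_CTRL[16] = { 0x00, 0xff, 0xff, 0xff, 0x03, 0xff, 0xff, 0xff,
                             0x06, 0xff, 0xff, 0xff, 0x09, 0xff, 0xff, 0xff };
const uint8_t G_CTRL[16] = { 0x01, 0xff, 0xff, 0xff, 0x04, 0xff, 0xff, 0xff,
                             0x07, 0xff, 0xff, 0xff, 0x0a, 0xff, 0xff, 0xff };
const uint8_t B_CTRL[16] = { 0x02, 0xff, 0xff, 0xff, 0x05, 0xff, 0xff, 0xff,
                             0x08, 0xff, 0xff, 0xff, 0x0b, 0xff, 0xff, 0xff };
```

Each result holds one channel of four pixels, one value in the low byte of
each 32-bit lane with the upper three bytes zeroed.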

And stores:
  pslld  $0x8,%xmm6
  pslld  $0x10,%xmm7
  pslld  $0x18,%xmm4
  pxor   %xmm5,%xmm6
  pxor   %xmm7,%xmm4
  pxor   %xmm6,%xmm4
  pshufb %xmm0,%xmm4 /*0x000102030405060708090a0b0c0d0e0f*/ /*here redundant*/
  movdqa %xmm4,(%esi)
  /* %xmm5 – c, %xmm6 – m, %xmm7 – y, %xmm4 - k */
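
The shift-and-combine store sequence can be illustrated per 32-bit lane in
scalar C. This is a sketch under the assumption that each of c, m, y, k sits
in the low byte of its lane before the shifts; pack_cmyk_lane and
store_lane_le are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit lane of the store sequence: c stays in the low byte and
   m, y, k are shifted left by 8, 16 and 24 bits (pslld). The bit ranges
   do not overlap, so XOR (pxor) behaves like OR here. */
uint32_t pack_cmyk_lane(uint8_t c, uint8_t m, uint8_t y, uint8_t k)
{
    return (uint32_t)c ^ ((uint32_t)m << 8) ^ ((uint32_t)y << 16)
           ^ ((uint32_t)k << 24);
}

/* Write a packed lane in little-endian byte order, as an x86 store does. */
void store_lane_le(uint8_t out[4], uint32_t lane)
{
    out[0] = (uint8_t)(lane & 0xff);
    out[1] = (uint8_t)((lane >> 8) & 0xff);
    out[2] = (uint8_t)((lane >> 16) & 0xff);
    out[3] = (uint8_t)((lane >> 24) & 0xff);
}
```

Because the packed lane already has its bytes in c, m, y, k order in memory,
no further shuffle is needed before the store.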



* [Bug tree-optimization/52252] An opportunity for x86 gcc vectorizer (gain up to 3 times)
  2012-02-14 22:42 [Bug c/52252] New: An opportunity for x86 gcc vectorizer (gain up to 3 times) evstupac at gmail dot com
@ 2012-02-15 11:55 ` rguenth at gcc dot gnu.org
  2012-02-29 12:34 ` evstupac at gmail dot com
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-02-15 11:55 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-02-15
          Component|target                      |tree-optimization
            Version|unknown                     |4.7.0
     Ever Confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-02-15 11:53:58 UTC ---
We fail to SLP vectorize this because of

6: Build SLP failed: different operation in stmt k_15 = MIN_EXPR <tmp_14,
y_13>;

thus,

        out[0] = c - k;
        out[1] = m - k;
        out[2] = y - k;
        out[3] = k;

isn't detected as equivalent to

        out[0] = c - k;
        out[1] = m - k;
        out[2] = y - k;
        out[3] = <magic> - k;

or

        out[3] = k - 0;

whatever would be more suitable (the latter would fail to be detected as
induction I guess, the former would fail with a similar issue for the
definition of <magic>).

With

        out[3] = y - k;

we fail with

6: Load permutation 0 1 2 2 1 1 1 1 0 0 0 0 2 2 2 2
6: Build SLP failed: unsupported load permutation *out_37 = D.1721_16;

we can vectorize

void convert_image(byte *in, byte *out, int size) {
    int i;
    for(i = 0; i < size; i++) {
        byte r = in[0];
        byte g = in[1];
        byte b = in[2];
        byte a = in[3];
        byte c, m, y, k, z, tmp;
        c = 255 - r;
        m = 255 - g;
        y = 255 - b;
        z = 255 - a;
        tmp = MIN(m, y);
        k = MIN(c, tmp);
        out[0] = c - k;
        out[1] = m - k;
        out[2] = y - k;
        out[3] = z - k;
        in += 4;
        out += 4;
    }
}

though.



* [Bug tree-optimization/52252] An opportunity for x86 gcc vectorizer (gain up to 3 times)
  2012-02-14 22:42 [Bug c/52252] New: An opportunity for x86 gcc vectorizer (gain up to 3 times) evstupac at gmail dot com
  2012-02-15 11:55 ` [Bug tree-optimization/52252] " rguenth at gcc dot gnu.org
@ 2012-02-29 12:34 ` evstupac at gmail dot com
  2012-07-13  8:48 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: evstupac at gmail dot com @ 2012-02-29 12:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252

--- Comment #2 from Stupachenko Evgeny <evstupac at gmail dot com> 2012-02-29 12:32:20 UTC ---
The 2 dumps from

Arm: gcc -O3 -mfpu=neon test.c -S -ftree-vectorizer-verbose=12
X86: gcc -O3 -m32 -msse3 test.c -S -ftree-vectorizer-verbose=12

start to differ at:

For Arm (can use vec_load_lanes):

6: === vect_make_slp_decision === 
6: === vect_detect_hybrid_slp ===
6: === vect_analyze_loop_operations ===
6: examining phi: in_35 = PHI <in_22(7), in_5(D)(4)>

……

6: can use vec_load_lanes<CI><V16QI> 
6: vect_model_load_cost: unaligned supported by hardware. 
6: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .

For x86 (no array mode for V16QI[3]):

6: === vect_make_slp_decision === 
6: === vect_detect_hybrid_slp === 
6: === vect_analyze_loop_operations === 
6: examining phi: in_35 = PHI <in_22(7), in_5(D)(4)> 

.……

6: no array mode for V16QI[3] 
6: the size of the group of strided accesses is not a power of 2 
6: not vectorized: relevant stmt not supported: r_8 = *in_35; 

As I mentioned before, x86 is able to handle this (Arm can shuffle as part of
the load, x86 can use pshufb).



* [Bug tree-optimization/52252] An opportunity for x86 gcc vectorizer (gain up to 3 times)
  2012-02-14 22:42 [Bug c/52252] New: An opportunity for x86 gcc vectorizer (gain up to 3 times) evstupac at gmail dot com
  2012-02-15 11:55 ` [Bug tree-optimization/52252] " rguenth at gcc dot gnu.org
  2012-02-29 12:34 ` evstupac at gmail dot com
@ 2012-07-13  8:48 ` rguenth at gcc dot gnu.org
  2014-02-11 14:27 ` evstupac at gmail dot com
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-07-13  8:48 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947

--- Comment #3 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-07-13 08:48:18 UTC ---
Link to vectorizer missed-optimization meta-bug.



* [Bug tree-optimization/52252] An opportunity for x86 gcc vectorizer (gain up to 3 times)
  2012-02-14 22:42 [Bug c/52252] New: An opportunity for x86 gcc vectorizer (gain up to 3 times) evstupac at gmail dot com
                   ` (2 preceding siblings ...)
  2012-07-13  8:48 ` rguenth at gcc dot gnu.org
@ 2014-02-11 14:27 ` evstupac at gmail dot com
  2014-05-07 12:11 ` kyukhin at gcc dot gnu.org
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: evstupac at gmail dot com @ 2014-02-11 14:27 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252

--- Comment #4 from Stupachenko Evgeny <evstupac at gmail dot com> ---
The patch giving the expected 3 times gain has been submitted for discussion at:
http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00670.html



* [Bug tree-optimization/52252] An opportunity for x86 gcc vectorizer (gain up to 3 times)
  2012-02-14 22:42 [Bug c/52252] New: An opportunity for x86 gcc vectorizer (gain up to 3 times) evstupac at gmail dot com
                   ` (3 preceding siblings ...)
  2014-02-11 14:27 ` evstupac at gmail dot com
@ 2014-05-07 12:11 ` kyukhin at gcc dot gnu.org
  2014-06-11  8:38 ` kyukhin at gcc dot gnu.org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: kyukhin at gcc dot gnu.org @ 2014-05-07 12:11 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252

--- Comment #5 from Kirill Yukhin <kyukhin at gcc dot gnu.org> ---
Author: kyukhin
Date: Wed May  7 12:10:22 2014
New Revision: 210155

URL: http://gcc.gnu.org/viewcvs?rev=210155&root=gcc&view=rev
Log:
gcc/
    * tree-vect-data-refs.c (vect_grouped_load_supported): New
    check for loads group of length 3.
    (vect_permute_load_chain): New permutations for loads group of
    length 3.
    * tree-vect-stmts.c (vect_model_load_cost): Change cost
    of vec_perm_shuffle for the new permutations.

gcc/testsuite/
    PR tree-optimization/52252
    * gcc.dg/vect/pr52252-ld.c: Test on loads group of size 3.


Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr52252-ld.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-data-refs.c
    trunk/gcc/tree-vect-stmts.c
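
One way to picture a length-3 load permutation is as two chained two-input
permutes per output vector. The following is a scalar sketch of that idea and
not GCC's source; perm2 and extract_channel3 are hypothetical names, and the
16-byte vector size is an assumption.

```c
#include <assert.h>
#include <stdint.h>

enum { VF = 16 };  /* bytes per vector in this sketch (assumed) */

/* Scalar model of a two-input byte permute: selector values 0..VF-1
   pick from a, values VF..2*VF-1 pick from b. */
void perm2(uint8_t dst[VF], const uint8_t a[VF], const uint8_t b[VF],
           const uint8_t sel[VF])
{
    uint8_t tmp[VF];   /* temporary, so dst may alias an input */
    for (int i = 0; i < VF; i++)
        tmp[i] = sel[i] < VF ? a[sel[i]] : b[sel[i] - VF];
    for (int i = 0; i < VF; i++)
        dst[i] = tmp[i];
}

/* Extract channel k (0, 1 or 2) of VF interleaved 3-byte pixels stored
   in the consecutive vectors v0, v1, v2, using two permutes. */
void extract_channel3(uint8_t dst[VF], const uint8_t v0[VF],
                      const uint8_t v1[VF], const uint8_t v2[VF], int k)
{
    uint8_t sel_lo[VF], sel_hi[VF], tmp[VF];
    for (int i = 0; i < VF; i++) {
        int pos = 3 * i + k;   /* byte index within the 3*VF-byte group */
        /* First permute: fetch the element if it lives in v0 or v1;
           the remaining slots are don't-care and select element 0. */
        sel_lo[i] = pos < 2 * VF ? (uint8_t)pos : 0;
        /* Second permute: keep slot i of the first result, or fetch the
           element from v2 (selector VF + offset inside v2). */
        sel_hi[i] = pos < 2 * VF ? (uint8_t)i : (uint8_t)(pos - VF);
    }
    perm2(tmp, v0, v1, sel_lo);
    perm2(dst, tmp, v2, sel_hi);
}
```

Calling extract_channel3 with k = 0, 1, 2 yields the r, g and b vectors, so
three loads plus six permutes de-interleave 16 pixels in this model.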



* [Bug tree-optimization/52252] An opportunity for x86 gcc vectorizer (gain up to 3 times)
  2012-02-14 22:42 [Bug c/52252] New: An opportunity for x86 gcc vectorizer (gain up to 3 times) evstupac at gmail dot com
                   ` (4 preceding siblings ...)
  2014-05-07 12:11 ` kyukhin at gcc dot gnu.org
@ 2014-06-11  8:38 ` kyukhin at gcc dot gnu.org
  2014-06-18  7:47 ` kyukhin at gcc dot gnu.org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: kyukhin at gcc dot gnu.org @ 2014-06-11  8:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252

--- Comment #6 from Kirill Yukhin <kyukhin at gcc dot gnu.org> ---
Author: kyukhin
Date: Wed Jun 11 08:37:53 2014
New Revision: 211439

URL: http://gcc.gnu.org/viewcvs?rev=211439&root=gcc&view=rev
Log:
gcc/
    * tree-vect-data-refs.c (vect_grouped_store_supported): New
    check for stores group of length 3.
    (vect_permute_store_chain): New permutations for stores group of
    length 3.
    * tree-vect-stmts.c (vect_model_store_cost): Change cost
    of vec_perm_shuffle for the new permutations.

gcc/testsuite/
    PR tree-optimization/52252
    * gcc.dg/vect/pr52252-st.c: Test on stores group of size 3.


Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr52252-st.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-data-refs.c
    trunk/gcc/tree-vect-stmts.c



* [Bug tree-optimization/52252] An opportunity for x86 gcc vectorizer (gain up to 3 times)
  2012-02-14 22:42 [Bug c/52252] New: An opportunity for x86 gcc vectorizer (gain up to 3 times) evstupac at gmail dot com
                   ` (5 preceding siblings ...)
  2014-06-11  8:38 ` kyukhin at gcc dot gnu.org
@ 2014-06-18  7:47 ` kyukhin at gcc dot gnu.org
  2023-08-31  7:07 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: kyukhin at gcc dot gnu.org @ 2014-06-18  7:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252

--- Comment #7 from Kirill Yukhin <kyukhin at gcc dot gnu.org> ---
Author: kyukhin
Date: Wed Jun 18 07:46:18 2014
New Revision: 211769

URL: https://gcc.gnu.org/viewcvs?rev=211769&root=gcc&view=rev
Log:
gcc/
    * config/i386/i386.c (ix86_reassociation_width): Add alternative for
    vector case.
    * config/i386/i386.h (TARGET_VECTOR_PARALLEL_EXECUTION): New.
    * config/i386/x86-tune.def (X86_TUNE_VECTOR_PARALLEL_EXECUTION): New.
    * tree-vect-data-refs.c (vect_shift_permute_load_chain): New.
    Introduces alternative way of loads group permutations.
    (vect_transform_grouped_load): Try alternative way of permutations.

gcc/testsuite/
    PR tree-optimization/52252
    * gcc.target/i386/pr52252-atom.c: Test on loads group of size 3.
    * gcc.target/i386/pr52252-core.c: Ditto.

    PR tree-optimization/61403
    * gcc.target/i386/pr61403.c: Test on loads and stores group of size 3.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr52252-atom.c
    trunk/gcc/testsuite/gcc.target/i386/pr52252-core.c
    trunk/gcc/testsuite/gcc.target/i386/pr61403.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.h
    trunk/gcc/config/i386/x86-tune.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-data-refs.c



* [Bug tree-optimization/52252] An opportunity for x86 gcc vectorizer (gain up to 3 times)
  2012-02-14 22:42 [Bug c/52252] New: An opportunity for x86 gcc vectorizer (gain up to 3 times) evstupac at gmail dot com
                   ` (6 preceding siblings ...)
  2014-06-18  7:47 ` kyukhin at gcc dot gnu.org
@ 2023-08-31  7:07 ` rguenth at gcc dot gnu.org
  2023-11-28  6:06 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-08-31  7:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
We are not vectorizing this optimally yet: we use SLP to cover
out[0], out[1], out[2] and single-element interleaving for out[3].  The
stores end up strided (aka scalar), which is not what the reporter intended.
We also unroll the loop four times.

The SLP discovery code splits the store group (in the end we should avoid
throwing away such information).  This makes it have a gap and stores with
a gap are only supported "strided" (we could at least store two and one
element, but ...).  We don't support "merging" back the group from SLP
and non-SLP.  With SLP only we might recover here, possibly we shouldn't
allow half SLP / non-SLP for a store group but it might fail even after
discovery so it might be difficult to force this.  Maybe a good case to
"prime" single-lane SLP.


* [Bug tree-optimization/52252] An opportunity for x86 gcc vectorizer (gain up to 3 times)
  2012-02-14 22:42 [Bug c/52252] New: An opportunity for x86 gcc vectorizer (gain up to 3 times) evstupac at gmail dot com
                   ` (7 preceding siblings ...)
  2023-08-31  7:07 ` rguenth at gcc dot gnu.org
@ 2023-11-28  6:06 ` pinskia at gcc dot gnu.org
  2023-11-28 10:55 ` rguenther at suse dot de
  2023-11-28 22:24 ` pinskia at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-28  6:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu.org

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note there is also a missing scalar optimization here (which will improve
the vectorized version in the end too).

Right now we have the following match pattern:
/* MIN (~X, ~Y) -> ~MAX (X, Y)
   MAX (~X, ~Y) -> ~MIN (X, Y)  */
(for minmax (min max)
 maxmin (max min)
 (simplify
  (minmax (bit_not:s@2 @0) (bit_not:s@3 @1))
  (bit_not (maxmin @0 @1)))


But that does not match here due to the :s. I am not 100% sure but trading 2
possible bit_not for adding another might end up improving things ...
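
The identities behind this pattern can be checked exhaustively for unsigned
bytes. The sketch below is only a sanity check of the arithmetic, not part of
GCC; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t min_u8(uint8_t a, uint8_t b) { return a < b ? a : b; }
static uint8_t max_u8(uint8_t a, uint8_t b) { return a > b ? a : b; }

/* Returns 1 if MIN(~x, ~y) == ~MAX(x, y) and MAX(~x, ~y) == ~MIN(x, y)
   hold for every pair of unsigned bytes, and 0 otherwise. */
int minmax_not_identity_holds(void)
{
    for (int xi = 0; xi < 256; xi++) {
        for (int yi = 0; yi < 256; yi++) {
            uint8_t x = (uint8_t)xi, y = (uint8_t)yi;
            uint8_t nx = (uint8_t)~x, ny = (uint8_t)~y;
            if (min_u8(nx, ny) != (uint8_t)~max_u8(x, y))
                return 0;
            if (max_u8(nx, ny) != (uint8_t)~min_u8(x, y))
                return 0;
        }
    }
    return 1;
}
```

For a byte, ~x equals 255 - x, which is why the pattern applies to the
255 - r, 255 - g, 255 - b values in the reported loop.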


* [Bug tree-optimization/52252] An opportunity for x86 gcc vectorizer (gain up to 3 times)
  2012-02-14 22:42 [Bug c/52252] New: An opportunity for x86 gcc vectorizer (gain up to 3 times) evstupac at gmail dot com
                   ` (8 preceding siblings ...)
  2023-11-28  6:06 ` pinskia at gcc dot gnu.org
@ 2023-11-28 10:55 ` rguenther at suse dot de
  2023-11-28 22:24 ` pinskia at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: rguenther at suse dot de @ 2023-11-28 10:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252

--- Comment #11 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 28 Nov 2023, pinskia at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252
> 
> Andrew Pinski <pinskia at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |pinskia at gcc dot gnu.org
> 
> --- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
> Note there is also a missing scalar optimization here (which will improve
> the vectorized version in the end too).
> 
> Right now we have the following match pattern:
> /* MIN (~X, ~Y) -> ~MAX (X, Y)
>    MAX (~X, ~Y) -> ~MIN (X, Y)  */
> (for minmax (min max)
>  maxmin (max min)
>  (simplify
>   (minmax (bit_not:s@2 @0) (bit_not:s@3 @1))
>   (bit_not (maxmin @0 @1)))
> 
> 
> But that does not match here due to the :s. I am not 100% sure but trading 2
> possible bit_not for adding another might end up improving things ...

We're lacking a way to say one of the bit_not should be single-used,
one multi-use would be OK and a fair trade-off - not sure if that
would be enough here, of course.  That would mean changing to
a condition with single_use ().


* [Bug tree-optimization/52252] An opportunity for x86 gcc vectorizer (gain up to 3 times)
  2012-02-14 22:42 [Bug c/52252] New: An opportunity for x86 gcc vectorizer (gain up to 3 times) evstupac at gmail dot com
                   ` (9 preceding siblings ...)
  2023-11-28 10:55 ` rguenther at suse dot de
@ 2023-11-28 22:24 ` pinskia at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-28 22:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252

--- Comment #12 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #11)
> We're lacking a way to say one of the bit_not should be single-used,
> one multi-use would be OK and a fair trade-off - not sure if that
> would be enough here, of course.  That would mean changing to
> a condition with single_use ().

That does not fix it though. Because in this case we have:
  c_19 = ~r_16;
  m_20 = ~g_17;
  y_21 = ~b_18;
  tmp_22 = MIN_EXPR <m_20, y_21>;
  k_23 = MIN_EXPR <c_19, tmp_22>;
  _1 = c_19 - k_23;
  _3 = m_20 - k_23;
  _5 = y_21 - k_23;
  .. = k_23;

So both bit_not are used more than once.

so we have `~a - MIN<MIN<~a, ~b>, ~c>` which is the same as `MAX<MAX<a,b>,c> -
a`.
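
The equivalence can be checked by comparing the reported per-pixel computation
with a MAX-based rewrite over all byte triples. This is a sketch for
illustration; cmyk_min, cmyk_max and cmyk_variants_agree are hypothetical
names, and the rewrite is not claimed to be what GCC would generate.

```c
#include <assert.h>
#include <stdint.h>

/* The computation from the report, for a single pixel. */
void cmyk_min(uint8_t r, uint8_t g, uint8_t b, uint8_t out[4])
{
    uint8_t c = (uint8_t)(255 - r);
    uint8_t m = (uint8_t)(255 - g);
    uint8_t y = (uint8_t)(255 - b);
    uint8_t tmp = m < y ? m : y;
    uint8_t k = c < tmp ? c : tmp;
    out[0] = (uint8_t)(c - k);
    out[1] = (uint8_t)(m - k);
    out[2] = (uint8_t)(y - k);
    out[3] = k;
}

/* MAX-based form: k = 255 - max(r, g, b), so c - k = max - r, etc. */
void cmyk_max(uint8_t r, uint8_t g, uint8_t b, uint8_t out[4])
{
    uint8_t mx = r > g ? r : g;
    mx = mx > b ? mx : b;
    out[0] = (uint8_t)(mx - r);
    out[1] = (uint8_t)(mx - g);
    out[2] = (uint8_t)(mx - b);
    out[3] = (uint8_t)(255 - mx);
}

/* Returns 1 if both forms agree on every (r, g, b) triple, else 0. */
int cmyk_variants_agree(void)
{
    uint8_t a[4], z[4];
    for (int r = 0; r < 256; r++)
        for (int g = 0; g < 256; g++)
            for (int b = 0; b < 256; b++) {
                cmyk_min((uint8_t)r, (uint8_t)g, (uint8_t)b, a);
                cmyk_max((uint8_t)r, (uint8_t)g, (uint8_t)b, z);
                for (int i = 0; i < 4; i++)
                    if (a[i] != z[i])
                        return 0;
            }
    return 1;
}
```

The MAX-based form needs a single subtraction from 255 per pixel instead of
three, which is the saving the scalar simplification would give.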

Let me file this as a separate bug to continue the discussion there.

