[Bug rtl-optimization/56175] New: Issue with combine phase on x86.


* [Bug rtl-optimization/56175] New: Issue with combine phase on x86.
@ 2013-02-01 15:52 ysrumyan at gmail dot com
  2013-02-01 15:59 ` [Bug rtl-optimization/56175] " ysrumyan at gmail dot com
                   ` (17 more replies)
  0 siblings, 18 replies; 19+ messages in thread
From: ysrumyan at gmail dot com @ 2013-02-01 15:52 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

             Bug #: 56175
           Summary: Issue with combine phase on x86.
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: ysrumyan@gmail.com


While analyzing the performance of an important benchmark on x86 Atom in 32-bit
mode, we found that the code produced for the attached testcase is not optimal:
the inner loop contains 18 instructions instead of 12.
The problem is that 'combine' does not perform the desired substitution for the
following statement:

    t = (u8)((x & 1) ^ ((u8)y & 1));
It is not able to convert it to a more optimal form such as:
    t = (u8)((x ^ (u8)y ) & 1);
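
The two forms compute the same value for every input, which can be confirmed by
brute force. The following is only a minimal sketch: it assumes u8 is a typedef
for unsigned char and that y is an unsigned short, as in the testcase in this
report, and the function names are made up for illustration.

```c
#include <assert.h>

typedef unsigned char u8;

/* Form as written in the source: mask each operand, then XOR. */
u8 masked_then_xor(u8 x, unsigned short y)
{
    return (u8)((x & 1) ^ ((u8)y & 1));
}

/* Desired form: XOR first, then apply the mask once. */
u8 xor_then_masked(u8 x, unsigned short y)
{
    return (u8)((x ^ (u8)y) & 1);
}

/* Count the inputs on which the two forms disagree
   (all 8-bit x values, all y values up to 0xFFFF). */
unsigned long count_mismatches(void)
{
    unsigned long bad = 0;
    for (unsigned x = 0; x <= 0xFF; x++)
        for (unsigned y = 0; y <= 0xFFFF; y++)
            if (masked_then_xor((u8)x, (unsigned short)y)
                != xor_then_masked((u8)x, (unsigned short)y))
                bad++;
    return bad;
}
```

Calling count_mismatches() from a small driver should return 0, so the rewrite
saves one AND without changing the result.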

This issue can be explained using the following testcase:

int foo(unsigned char x, unsigned short y)
{
  unsigned char z;
  if (x == 0 || y == 0)
    return 0;
  x >>= 1;
  y >>= 1;
  z = (unsigned char)((x & 1) ^ ((unsigned char)y & 1));
  if (z == 1)
    return 1;
  return 0;
}

For this case combine performs the needed transformation and we get optimal
assembly:
...
    xorl    %edx, %eax
    andl    $1, %eax
    ret

Here combine tries to perform the following substitution:

Trying 22, 20 -> 23:
Failed to match this instruction:
(parallel [
        (set (reg:QI 83 [ D.1758 ])
            (and:QI (xor:QI (reg:QI 79 [ x ])
                    (subreg:QI (reg:HI 81 [ y ]) 0))
                (const_int 1 [0x1])))
        (clobber (reg:CC 17 flags))
    ])
Failed to match this instruction:
(set (reg:QI 83 [ D.1758 ])
    (and:QI (xor:QI (reg:QI 79 [ x ])
            (subreg:QI (reg:HI 81 [ y ]) 0))
        (const_int 1 [0x1])))
Successfully matched this instruction:
(set (reg:QI 82 [ D.1760 ])
    (xor:QI (reg:QI 79 [ x ])
        (subreg:QI (reg:HI 81 [ y ]) 0)))
Successfully matched this instruction:
(set (reg:QI 83 [ D.1758 ])
    (and:QI (reg:QI 82 [ D.1760 ])
        (const_int 1 [0x1])))
where
(insn 20 19 21 4 (parallel [
            (set (reg:QI 80 [ D.1759 ])
                (and:QI (reg:QI 79 [ x ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) t.c:8 405 {*andqi_1}
     (expr_list:REG_DEAD (reg:QI 79 [ x ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 22 21 23 4 (parallel [
            (set (reg:HI 82 [ D.1760 ])
                (and:HI (reg:HI 81 [ y ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) t.c:8 404 {*andhi_1}
     (expr_list:REG_DEAD (reg:HI 81 [ y ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 23 22 24 4 (parallel [
            (set (reg:QI 83 [ D.1758 ])
                (xor:QI (reg:QI 80 [ D.1759 ])
                    (subreg:QI (reg:HI 82 [ D.1760 ]) 0)))
            (clobber (reg:CC 17 flags))
        ]) t.c:8 426 {*xorqi_1}
     (expr_list:REG_DEAD (reg:HI 82 [ D.1760 ])
        (expr_list:REG_DEAD (reg:QI 80 [ D.1759 ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))
but for the more complicated test that is attached, combine tries to do the same
substitution with the operands in reverse order, and it fails:

Trying 14, 13 -> 15:
Failed to match this instruction:
(parallel [
        (set (reg:QI 63 [ D.1770 ])
            (xor:QI (and:QI (reg/v:QI 72 [ x ])
                    (const_int 1 [0x1]))
                (and:QI (subreg:QI (reg/v:HI 74 [ y ]) 0)
                    (const_int 1 [0x1]))))
        (clobber (reg:CC 17 flags))
    ])
Failed to match this instruction:
(set (reg:QI 63 [ D.1770 ])
    (xor:QI (and:QI (reg/v:QI 72 [ x ])
            (const_int 1 [0x1]))
        (and:QI (subreg:QI (reg/v:HI 74 [ y ]) 0)
            (const_int 1 [0x1]))))
Successfully matched this instruction:
(set (reg:QI 77 [ D.1771 ])
    (and:QI (subreg:QI (reg/v:HI 74 [ y ]) 0)
        (const_int 1 [0x1])))
Failed to match this instruction:
(set (reg:QI 63 [ D.1770 ])
    (xor:QI (and:QI (reg/v:QI 72 [ x ])
            (const_int 1 [0x1]))
        (reg:QI 77 [ D.1771 ])))
where
(insn 13 12 14 3 (parallel [
            (set (reg:HI 76 [ D.1772 ])
                (and:HI (reg/v:HI 74 [ y ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) t1.c:9 404 {*andhi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(insn 14 13 15 3 (parallel [
            (set (reg:QI 77 [ D.1771 ])
                (and:QI (reg/v:QI 72 [ x ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) t1.c:9 405 {*andqi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(insn 15 14 16 3 (parallel [
            (set (reg:QI 63 [ D.1770 ])
                (xor:QI (reg:QI 77 [ D.1771 ])
                    (subreg:QI (reg:HI 76 [ D.1772 ]) 0)))
            (clobber (reg:CC 17 flags))
        ]) t1.c:9 426 {*xorqi_1}
     (expr_list:REG_DEAD (reg:QI 77 [ D.1771 ])
        (expr_list:REG_DEAD (reg:HI 76 [ D.1772 ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))
It seems that if we tried to combine 13, 14 -> 15 instead, we would be successful.
Note also that the order of the instructions is different after expand.



* [Bug rtl-optimization/56175] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
@ 2013-02-01 15:59 ` ysrumyan at gmail dot com
  2013-02-02 12:02 ` ubizjak at gmail dot com
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: ysrumyan at gmail dot com @ 2013-02-01 15:59 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

--- Comment #1 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2013-02-01 15:58:49 UTC ---
Created attachment 29330
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29330
testcase

This test must be compiled with the following options:
"-O2 -ffast-math -msse2 -mfpmath=sse -m32 -march=atom -mtune=atom
-ftree-loop-if-convert"



* [Bug rtl-optimization/56175] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
  2013-02-01 15:59 ` [Bug rtl-optimization/56175] " ysrumyan at gmail dot com
@ 2013-02-02 12:02 ` ubizjak at gmail dot com
  2013-02-04  8:55 ` ysrumyan at gmail dot com
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: ubizjak at gmail dot com @ 2013-02-02 12:02 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

--- Comment #2 from Uros Bizjak <ubizjak at gmail dot com> 2013-02-02 12:01:56 UTC ---
(In reply to comment #1)
> Created attachment 29330 [details]
> testcase
> 
> This test must be compiled with the following options:
> "-O2 -ffast-math -msse2 -mfpmath=sse -m32 -march=atom -mtune=atom
> -ftree-loop-if-convert"

Compiling the attached test with the above flags, I got:

foo:
        movl    4(%esp), %eax
        movl    8(%esp), %edx
        testb   %al, %al
        je      .L3
        testw   %dx, %dx
        je      .L3
        shrb    %al
        shrw    %dx
        xorl    %edx, %eax
        andl    $1, %eax
        ret
.L3:
        xorl    %eax, %eax
        ret

which looks the same as your optimal assembly in Description - maybe due to the
fact that the attached test source is the same as the source in Description.

BTW: This is probably a case of missing CSE in the tree optimizers. The combine
pass is not powerful enough to figure out the optimal sequence; it more or less
blindly combines various patterns.

Please provide the test that exposes this missing optimization.



* [Bug rtl-optimization/56175] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
  2013-02-01 15:59 ` [Bug rtl-optimization/56175] " ysrumyan at gmail dot com
  2013-02-02 12:02 ` ubizjak at gmail dot com
@ 2013-02-04  8:55 ` ysrumyan at gmail dot com
  2013-02-04 10:11 ` [Bug tree-optimization/56175] " rguenth at gcc dot gnu.org
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: ysrumyan at gmail dot com @ 2013-02-04  8:55 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

--- Comment #3 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2013-02-04 08:55:12 UTC ---
Created attachment 29345
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29345
real test-case

It needs to be compiled with the options proposed in comment #1.



* [Bug tree-optimization/56175] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
                   ` (2 preceding siblings ...)
  2013-02-04  8:55 ` ysrumyan at gmail dot com
@ 2013-02-04 10:11 ` rguenth at gcc dot gnu.org
  2013-02-11 13:43 ` ysrumyan at gmail dot com
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-02-04 10:11 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-02-04
          Component|rtl-optimization            |tree-optimization
     Ever Confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-04 10:10:31 UTC ---
This should be fixed on the GIMPLE level by simplify_bitwise_binary.  That is,
(A & C) ^ (B & C) -> (A ^ B) & C for all code combinations and C's that this
is valid for.  fold doesn't seem to have this complex pattern.
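
Which operator combinations allow this kind of factoring can be checked
exhaustively on small values. The sketch below is an illustration only: the
helper names apply_bitop and factoring_is_valid are hypothetical and have nothing
to do with the GCC sources, and the operators are passed as plain characters.

```c
#include <assert.h>
#include <stdbool.h>

/* Apply one of the bitwise operators '&', '|' or '^' to two operands. */
unsigned apply_bitop(char op, unsigned a, unsigned b)
{
    switch (op) {
    case '&': return a & b;
    case '|': return a | b;
    default:  return a ^ b;   /* treated as '^' */
    }
}

/* Return true if (A inner C) outer (B inner C) == (A outer B) inner C
   holds for every 8-bit A, B and C. */
bool factoring_is_valid(char inner, char outer)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            for (unsigned c = 0; c < 256; c++) {
                unsigned lhs = apply_bitop(outer,
                                           apply_bitop(inner, a, c),
                                           apply_bitop(inner, b, c));
                unsigned rhs = apply_bitop(inner,
                                           apply_bitop(outer, a, b), c);
                if (lhs != rhs)
                    return false;
            }
    return true;
}
```

For instance, factoring_is_valid('&', '^') holds, which is the case discussed
here, whereas factoring_is_valid('|', '^') does not, because
(A | C) ^ (B | C) clears the bits that are set in C.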



* [Bug tree-optimization/56175] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
                   ` (3 preceding siblings ...)
  2013-02-04 10:11 ` [Bug tree-optimization/56175] " rguenth at gcc dot gnu.org
@ 2013-02-11 13:43 ` ysrumyan at gmail dot com
  2013-02-11 14:38 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: ysrumyan at gmail dot com @ 2013-02-11 13:43 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

--- Comment #5 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2013-02-11 13:42:49 UTC ---
This pattern is already recognized by simplify_bitwise_binary but only for
usual int type, i.e. if we change all short types to the ordinary int (or
unsigned) this simplification takes place (dump after 1st forwprop):

  <bb 4>:
  x_8 = x_2(D) >> 1;
  y_9 = y_4(D) >> 1;
  _10 = x_8 & 1;
  _11 = y_9 & 1;
  _16 = x_8 ^ y_9;
  z_12 = _16 & 1;

i.e. the issue is redundant type conversions:

  <bb 3>:
  x_7 = x_2(D) >> 1;
  y_8 = y_4(D) >> 1;
  _13 = x_7 & 1;
  _9 = (signed char) _13;
  _14 = y_8 & 1;
  _10 = (signed char) _14;
  _11 = _9 ^ _10;

I assume that if we delete these redundant conversions the required
simplification will happen.



* [Bug tree-optimization/56175] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
                   ` (4 preceding siblings ...)
  2013-02-11 13:43 ` ysrumyan at gmail dot com
@ 2013-02-11 14:38 ` rguenth at gcc dot gnu.org
  2013-02-12 13:05 ` ysrumyan at gmail dot com
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-02-11 14:38 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-11 14:38:37 UTC ---
(In reply to comment #5)
> This pattern is already recognized by simplify_bitwise_binary but only for
> usual int type, i.e. if we change all short types to the ordinary int (or
> unsigned) this simplification takes place (dump after 1st forwprop):
> 
>   <bb 4>:
>   x_8 = x_2(D) >> 1;
>   y_9 = y_4(D) >> 1;
>   _10 = x_8 & 1;
>   _11 = y_9 & 1;
>   _16 = x_8 ^ y_9;
>   z_12 = _16 & 1;
> 
> i.e. the issue is redundant type conversions:
> 
>   <bb 3>:
>   x_7 = x_2(D) >> 1;
>   y_8 = y_4(D) >> 1;
>   _13 = x_7 & 1;
>   _9 = (signed char) _13;
>   _14 = y_8 & 1;
>   _10 = (signed char) _14;
>   _11 = _9 ^ _10;
> 
> I assume that if we delete these redundant conversions the required
> simplification will happen.

Ah, well.  The issue is that we transformed (unsigned char)y & 1
to (unsigned char)(y & 1).
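
The two spellings are equal in value; what changes is only where the conversion
sits, and with the conversion on the outside the two `& 1` operations are hidden
behind conversions, as in the dump quoted above. A minimal sketch that checks
the value equivalence, assuming a 16-bit unsigned short and using made-up
function names:

```c
#include <assert.h>

/* Conversion applied first, mask applied to the narrowed value. */
unsigned char convert_then_mask(unsigned short y)
{
    return (unsigned char)y & 1;
}

/* Mask applied in the wide type, conversion hoisted to the outside. */
unsigned char mask_then_convert(unsigned short y)
{
    return (unsigned char)(y & 1);
}

/* Return 1 if both forms agree for every value up to 0xFFFF. */
int forms_agree(void)
{
    for (unsigned v = 0; v <= 0xFFFF; v++)
        if (convert_then_mask((unsigned short)v)
            != mask_then_convert((unsigned short)v))
            return 0;
    return 1;
}
```

Since forms_agree() should return 1, the transformation is correct; the problem
in this report is only that it blocks the later simplification.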



* [Bug tree-optimization/56175] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
                   ` (5 preceding siblings ...)
  2013-02-11 14:38 ` rguenth at gcc dot gnu.org
@ 2013-02-12 13:05 ` ysrumyan at gmail dot com
  2013-02-12 13:26 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: ysrumyan at gmail dot com @ 2013-02-12 13:05 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

--- Comment #7 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2013-02-12 13:05:16 UTC ---
(In reply to comment #6)
> Ah, well.  The issue is that we transformed (unsigned char)y & 1
> to (unsigned char)(y & 1).

Hi Richard,

We'd like to fix this issue since we can get a +10.5% speedup on Atom.
What is your opinion on how best to fix this issue with the 1st pattern in
simplify_bitwise_binary?

I have no idea why gcc does this transformation or what gain we can get from
it: a smaller constant, or more opportunities for CSE?
I can propose the following possible changes:

1. Introduce a hook for doing this transformation.
2. Introduce a new forwprop pass that does not do this transformation.
3. Do not perform this transformation for small positive constants.
4. Do not perform this transformation if (type-x) c == c.
etc.

Any help will be appreciated.
Yuri.



* [Bug tree-optimization/56175] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
                   ` (6 preceding siblings ...)
  2013-02-12 13:05 ` ysrumyan at gmail dot com
@ 2013-02-12 13:26 ` rguenth at gcc dot gnu.org
  2013-02-12 14:44 ` ysrumyan at gmail dot com
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-02-12 13:26 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-12 13:25:59 UTC ---
(In reply to comment #7)
> Hi Richard,
> 
> We'd like to fix this issue since we can get +10.5% speedup on Atom.
> What is your opinion on how better to fix this issue with 1st pattern in
> simplify_bitwise_binary?
> 
> I have no idea why gcc does such transformation and what gain we can get from
> it - decrease size of constant or create more opportunities for cse?

Well, you'd have to track down what is responsible for that transform.

Generally promoting operations (and automatic vars) to word-mode may
be beneficial on most targets.  But that should be done late.

> I can propose the following possible changes:
> 
> 1. Introduce a hook for doing such transformation.
> 2. Introduce a new forwprop pass that does not do such transformation.
> 3. Do not perform such transformation for small positive constant.
> 4. Do not perform such transformation if (type-x) c == c.
> etc.

First track it down ;)

> Any help will be appreciated.
> Yuri.



* [Bug tree-optimization/56175] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
                   ` (7 preceding siblings ...)
  2013-02-12 13:26 ` rguenth at gcc dot gnu.org
@ 2013-02-12 14:44 ` ysrumyan at gmail dot com
  2013-02-12 14:47 ` jakub at gcc dot gnu.org
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: ysrumyan at gmail dot com @ 2013-02-12 14:44 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

--- Comment #9 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2013-02-12 14:43:53 UTC ---
(In reply to comment #8)
> First track it down ;)

Richard,

I am familiar with the type promotion transformation that can, e.g., turn a byte
loop counter into a word, but that is done by other phases, e.g. lto.

We found the owner of this change:

http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html 

What are our next steps?

Thanks in advance.
Yuri.



* [Bug tree-optimization/56175] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
                   ` (8 preceding siblings ...)
  2013-02-12 14:44 ` ysrumyan at gmail dot com
@ 2013-02-12 14:47 ` jakub at gcc dot gnu.org
  2013-02-14 12:04 ` ysrumyan at gmail dot com
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-02-12 14:47 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |ktietz at gcc dot gnu.org

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-02-12 14:46:50 UTC ---
For 4.9, Kai is working on type promotion/demotion GIMPLE pass(es), so when
discussing that change this can be also taken into account.



* [Bug tree-optimization/56175] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
                   ` (9 preceding siblings ...)
  2013-02-12 14:47 ` jakub at gcc dot gnu.org
@ 2013-02-14 12:04 ` ysrumyan at gmail dot com
  2013-02-21 13:44 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: ysrumyan at gmail dot com @ 2013-02-14 12:04 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

--- Comment #11 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2013-02-14 12:03:37 UTC ---
I measured 3 possible fixes:

1. Comment out the 2 patterns related to type sinking.
2. Comment out the 1st pattern only.
3. Prohibit type sinking if the source type (of def_arg1) is a short type.

Measurements were done on the eembc 2.0 suite at the base optset, and they
showed that the 3rd fix is the most profitable for x86 in 32-bit mode.

Since I have heard nothing from the code owner, I assume that we will add a new
target hook returning true/false for type sinking in both patterns, which will
analyze the source type and the likely destination type of the operand.

Richard, what is your opinion?


* [Bug tree-optimization/56175] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
                   ` (10 preceding siblings ...)
  2013-02-14 12:04 ` ysrumyan at gmail dot com
@ 2013-02-21 13:44 ` rguenth at gcc dot gnu.org
  2013-02-21 13:48 ` [Bug tree-optimization/56175] [4.7/4.8 Regression] " rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-02-21 13:44 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-21 13:44:09 UTC ---
For the real testcase I see

.L2:
        shrw    %ax
        movl    %edi, %edx
        subb    $1, %dl
        movl    %edx, %edi
        je      .L9
.L4:
        movl    %ecx, %esi
        movl    %eax, %ebx
        andl    $1, %esi
        andl    $1, %ebx
        shrb    %cl
        movl    %esi, %edx
        cmpb    %bl, %dl
        je      .L2

thus

        andl    $1, %esi
        andl    $1, %ebx
        cmpb    %bl, %dl

for

   t = (u8)((x & 1) ^ ((u8)y & 1));
   if (t == 1)

and with disabling the forwprop transformation:

.L2:
        shrw    %ax
        subb    $1, %dl
        je      .L9
.L4:
        movl    %ecx, %ebx
        shrb    %cl
        xorl    %eax, %ebx
        andl    $1, %ebx
        je      .L2

to confirm the issue again.  One fewer register is used, and the conditional
jump uses the zero flag directly.

The following testcase is too simple not to be optimized at the RTL level
anyway, but it may serve as a testcase for forwprop.

void bar (void);
unsigned short
foo (unsigned char x, unsigned short y)
{
  unsigned char t = (unsigned char)((x & 1) ^ ((unsigned char)y & 1));
  if (t == 1)
    bar ();
  return y;
}



* [Bug tree-optimization/56175] [4.7/4.8 Regression] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
                   ` (11 preceding siblings ...)
  2013-02-21 13:44 ` rguenth at gcc dot gnu.org
@ 2013-02-21 13:48 ` rguenth at gcc dot gnu.org
  2013-02-25 15:32 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-02-21 13:48 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.7.3
            Summary|Issue with combine phase on |[4.7/4.8 Regression] Issue
                   |x86.                        |with combine phase on x86.
           Severity|enhancement                 |normal



* [Bug tree-optimization/56175] [4.7/4.8 Regression] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
                   ` (12 preceding siblings ...)
  2013-02-21 13:48 ` [Bug tree-optimization/56175] [4.7/4.8 Regression] " rguenth at gcc dot gnu.org
@ 2013-02-25 15:32 ` rguenth at gcc dot gnu.org
  2013-02-25 16:00 ` [Bug tree-optimization/56175] [4.7 " rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-02-25 15:32 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-25 15:31:37 UTC ---
Author: rguenth
Date: Mon Feb 25 15:31:31 2013
New Revision: 196263

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=196263
Log:
2013-02-25  Richard Biener  <rguenther@suse.de>

    PR tree-optimization/56175
    * tree-ssa-forwprop.c (hoist_conversion_for_bitop_p): New predicate,
    split out from ...
    (simplify_bitwise_binary): ... here.  Also guard the conversion
    of (type) X op CST to (type) (X op ((type-x) CST)) with it.

    * gcc.dg/tree-ssa/forwprop-24.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/forwprop-24.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-forwprop.c
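
The log above only names the new predicate; it does not show its body. Purely
as an illustration of what a guard of this kind might test, here is a
standalone model that is not taken from the commit. The struct int_type, the
function hoist_is_profitable and the rule itself (hoist only when the
conversion widens, so that the bit operation ends up in the narrower type) are
all assumptions, and the real predicate may look at more than the precision.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an integer type: only its width in bits. */
struct int_type {
    unsigned precision;
};

/* Hypothetical guard: should ((to) X) op CST be rewritten so that the
   bit operation is done in the type of X (FROM) and the conversion to
   TO happens afterwards?
   Assumption: only when the conversion widens, because then the
   operation ends up in the narrower of the two types. */
bool hoist_is_profitable(struct int_type to, struct int_type from)
{
    return from.precision < to.precision;
}
```

Under this assumed rule, the conversion in the testcase (unsigned short to
unsigned char, i.e. 16 bits to 8 bits) is narrowing, so
hoist_is_profitable((struct int_type){8}, (struct int_type){16}) is false and
`(unsigned char)y & 1` would be left alone.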



* [Bug tree-optimization/56175] [4.7 Regression] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
                   ` (13 preceding siblings ...)
  2013-02-25 15:32 ` rguenth at gcc dot gnu.org
@ 2013-02-25 16:00 ` rguenth at gcc dot gnu.org
  2013-04-03  9:51 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-02-25 16:00 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |4.8.0
            Summary|[4.7/4.8 Regression] Issue  |[4.7 Regression] Issue with
                   |with combine phase on x86.  |combine phase on x86.

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-25 15:59:48 UTC ---
Assembly for the "real" test-case looks good to me now.  Thus fixed on trunk
so far.



* [Bug tree-optimization/56175] [4.7 Regression] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
                   ` (14 preceding siblings ...)
  2013-02-25 16:00 ` [Bug tree-optimization/56175] [4.7 " rguenth at gcc dot gnu.org
@ 2013-04-03  9:51 ` rguenth at gcc dot gnu.org
  2013-04-11  8:00 ` rguenth at gcc dot gnu.org
  2014-06-12 13:23 ` rguenth at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-04-03  9:51 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2



* [Bug tree-optimization/56175] [4.7 Regression] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
                   ` (15 preceding siblings ...)
  2013-04-03  9:51 ` rguenth at gcc dot gnu.org
@ 2013-04-11  8:00 ` rguenth at gcc dot gnu.org
  2014-06-12 13:23 ` rguenth at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-04-11  8:00 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.7.3                       |4.7.4

--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> 2013-04-11 07:59:39 UTC ---
GCC 4.7.3 is being released, adjusting target milestone.



* [Bug tree-optimization/56175] [4.7 Regression] Issue with combine phase on x86.
  2013-02-01 15:52 [Bug rtl-optimization/56175] New: Issue with combine phase on x86 ysrumyan at gmail dot com
                   ` (16 preceding siblings ...)
  2013-04-11  8:00 ` rguenth at gcc dot gnu.org
@ 2014-06-12 13:23 ` rguenth at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-06-12 13:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|4.7.4                       |4.8.0
      Known to fail|                            |4.7.4

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed for 4.8.0.

