* [Bug target/29083]  New: useless clrlwi instruction produced for 16-bit bitfield
From: bonzini at gnu dot org @ 2006-09-14 11:22 UTC (permalink / raw)
  To: gcc-bugs

The following code produces the assembly output shown below it:

struct s {
  unsigned x : 16;
};

int f(struct s *node)
{
  int a = 0;
  do
    a++;
  while ((--node)->x == a);
  return a;
}

        li r2,0
L2:
        lhzu r0,-4(r3)
        addi r2,r2,1
        rlwinm r0,r0,0,0xffff
        cmpw cr7,r0,r2
        beq+ cr7,L2
        mr r3,r2
        blr


If I change "unsigned x : 16" to "unsigned short x", I get better code instead:

_f:
        mr r2,r3
        li r3,0
L2:
        lhzu r0,-2(r2)
        addi r3,r3,1
        cmpw cr7,r0,r3
        beq+ cr7,L2
        blr

(I don't care about the register allocation; the important point is that the
rlwinm instruction is useless. As its simplified mnemonic "clrlwi r0,r0,16"
makes clear, it only clears the upper 16 bits, which lhzu has already zeroed.)
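
To illustrate why the mask is redundant, here is a minimal C sketch. The
helper names (load_halfword_zext, clrlwi16, mask_is_redundant,
check_all_halfwords) are made up for this illustration and are not GCC
internals; they only model the zero-extending halfword load and the
"clrlwi r0,r0,16" mask.

```c
#include <assert.h>
#include <stdint.h>

/* Models lhz/lhzu: a 16-bit load whose result is zero-extended to 32 bits. */
uint32_t load_halfword_zext(const uint16_t *p)
{
    return (uint32_t)*p;
}

/* Models clrlwi rD,rS,16 (rlwinm rD,rS,0,0xffff): clear the upper 16 bits. */
uint32_t clrlwi16(uint32_t v)
{
    return v & 0xffffu;
}

/* Returns 1 if masking the freshly loaded value leaves it unchanged. */
int mask_is_redundant(uint16_t stored)
{
    uint32_t loaded = load_halfword_zext(&stored);
    return clrlwi16(loaded) == loaded;
}

/* Exhaustively checks every possible 16-bit value. */
void check_all_halfwords(void)
{
    for (uint32_t v = 0; v <= 0xffffu; v++)
        assert(mask_is_redundant((uint16_t)v));
}
```

Calling check_all_halfwords() from a test program passes for every halfword,
which is the property the compiler could exploit to drop the mask.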


-- 
           Summary: useless clrlwi instruction produced for 16-bit bitfield
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: bonzini at gnu dot org
GCC target triplet: powerpc-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29083


From: bonzini at gnu dot org @ 2006-09-14 12:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from bonzini at gnu dot org  2006-09-14 12:07 -------
The sole difference in the IR is

;; if ((int) node->x == a) goto <L0>; else (void) 0;
(insn 19 18 20 (set (reg:HI 125)
        (mem/s/j:HI (reg/v/f:SI 123 [ node ])
                  [2 <variable>.x+0 S2 A32])) -1 (nil)

;; if ((int) MEM[base: (short unsigned int *) node] == a) goto <L0>; else (void) 0;
(insn 20 19 21 (set (reg:HI 125)
        (mem/s:HI (reg/v/f:SI 123 [ node ])
                  [3 <variable>.x+0 S2 A8])) -1 (nil)
     (nil))

(COMPONENT_REF vs. TARGET_MEM_REF, the first produces A32 and the second A8)


----


It's actually flow's fault, because it fails to recognize a PRE_MODIFY address,
and things go downhill from there. The life1 dump is

   16 r121:SI=r121:SI+0x1             |    17 r122:SI=r122:SI+0x1
   18 r123:SI=r123:SI-0x4             |    20 r126:HI=[--r124:SI]
   19 r125:HI=[r123:SI]               |       REG_INC: r124:SI
   20 r124:SI=zero_extend(r125:HI)    |    21 r125:SI=zero_extend(r126:HI)
      REG_DEAD: r125:HI               |       REG_DEAD: r126:HI
   21 r126:CC=cmp(r124:SI,r121:SI)    |    22 r127:CC=cmp(r125:SI,r122:SI)
      REG_DEAD: r124:SI               |       REG_DEAD: r125:SI
   22 pc={(r126:CC==0x0)?L13:pc}      |    23 pc={(r127:CC==0x0)?L14:pc}
      REG_DEAD: r126:CC               |       REG_DEAD: r127:CC
      REG_BR_PROB: 0x22c4                     REG_BR_PROB: 0x22c4
   24 NOTE_INSN_BASIC_BLOCK           |    25 NOTE_INSN_BASIC_BLOCK
   28 NOTE_INSN_FUNCTION_END          |    29 NOTE_INSN_FUNCTION_END
   31 r3:SI=r121:SI                   |    32 r3:SI=r122:SI
      REG_DEAD: r121:SI               |       REG_DEAD: r122:SI
   37 use r3:SI                       |    38 use r3:SI

while the combine dump is

   14 NOTE_INSN_BASIC_BLOCK           |    15 NOTE_INSN_BASIC_BLOCK
   16 r121:SI=r121:SI+0x1             |    17 r122:SI=r122:SI+0x1
   18 NOTE_INSN_DELETED               |    20 NOTE_INSN_DELETED
   19 {r125:HI=[r123:SI-0x4];r123:SI= |    21 r125:SI=zero_extend([--r124:SI]
   20 r124:SI=zero_extend(r125:HI)    |       REG_INC: r124:SI
      REG_DEAD: r125:HI               |    22 r127:CC=cmp(r125:SI,r122:SI)
   21 r126:CC=cmp(r124:SI,r121:SI)    |       REG_DEAD: r125:SI
      REG_DEAD: r124:SI

where, in the bitfield case (left column), combine has synthesized a
movhi_update1, but then failed to merge the zero_extend into it.

Could this be fixed on dataflow-branch?


-- 

bonzini at gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zadeck at gcc dot gnu dot
                   |                            |org


From: zadeck at naturalbridge dot com @ 2006-09-14 12:52 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from zadeck at naturalbridge dot com  2006-09-14 12:51 -------
Subject: Re:  useless clrlwi instruction produced for 16-bit
 bitfield

bonzini at gnu dot org wrote:
> Could this be fixed on dataflow-branch?
The current flow does not recognize any pre_modify cases. What flow does
recognize is pre_increment, a subset of pre_modify with the restriction
that the width of the load must equal the amount of the increment. By
changing the type of x, you made the example fit the restrictions of the
current code: the pointer now steps by 2 bytes, the width of the halfword
load, instead of 4.
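
The restriction can be sketched in C as follows. This is only an
illustration: fits_pre_increment, HALFWORD_LOAD_WIDTH and the struct names
are hypothetical and are not how flow is actually written. It also assumes
an ABI where the bitfield struct occupies 4 bytes, as the -4 displacement in
the generated code suggests for this target.

```c
#include <assert.h>
#include <stddef.h>

struct s_bitfield { unsigned x : 16; };   /* layout from the report       */
struct s_short    { unsigned short x; };  /* variant that gave better code */

/* Width in bytes of the HImode (halfword) load used for x. */
#define HALFWORD_LOAD_WIDTH 2

/* Hypothetical predicate: the pointer step must equal the load width
   for the access to qualify as a plain pre-increment/pre-decrement. */
int fits_pre_increment(size_t stride, size_t load_width)
{
    return stride == load_width;
}

/* The unsigned short variant steps by exactly the load width. */
void check_short_variant(void)
{
    assert(fits_pre_increment(sizeof(struct s_short), HALFWORD_LOAD_WIDTH));
}
```

On ABIs where sizeof(struct s_bitfield) is 4, the same predicate returns 0
for the bitfield variant, which matches the case that needs the more general
pre_modify form.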

The post side of things in flow is a little more general than the pre
side because it was hacked for the ia-64.

My code on the dataflow branch knows what the machine is capable of
doing and would get this case, since the ppc is capable of much more
general updates. 

Kenny


From: pinskia at gcc dot gnu dot org @ 2006-09-14 15:23 UTC (permalink / raw)
  To: gcc-bugs



-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu dot
                   |                            |org
           Severity|normal                      |enhancement


From: pinskia at gcc dot gnu dot org @ 2006-10-22 23:56 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from pinskia at gcc dot gnu dot org  2006-10-22 23:55 -------
Confirmed. What happens is that combine combines:
(insn 18 16 19 3 (set (reg/v/f:SI 123 [ node ])
        (plus:SI (reg/v/f:SI 123 [ node ])
            (const_int -4 [0xfffffffffffffffc]))) 79 {*addsi3_internal1} (nil)
    (nil))

(insn 19 18 20 3 (set (reg:HI 125 [ <variable>.x ])
        (mem/s/j:HI (reg/v/f:SI 123 [ node ]) [2 <variable>.x+0 S2 A32])) 313 {*movhi_internal} (insn_list:REG_DEP_TRUE 18 (nil))
    (nil))


Into:
(note 18 16 19 3 NOTE_INSN_DELETED)

(insn 19 18 20 3 (parallel [
            (set (reg:HI 125 [ <variable>.x ])
                (mem/s/j:HI (plus:SI (reg/v/f:SI 123 [ node ])
                        (const_int -4 [0xfffffffffffffffc])) [2 <variable>.x+0 S2 A32]))
            (set (reg/v/f:SI 123 [ node ])
                (plus:SI (reg/v/f:SI 123 [ node ])
                    (const_int -4 [0xfffffffffffffffc])))
        ]) 361 {*movhi_update1} (nil)
    (nil))

But it forgets about:
(insn 20 19 21 3 (set (reg:SI 124 [ <variable>.x ])
        (zero_extend:SI (reg:HI 125 [ <variable>.x ]))) 42 {*rs6000.md:772} (insn_list:REG_DEP_TRUE 19 (nil))
    (expr_list:REG_DEAD (reg:HI 125 [ <variable>.x ])
        (nil)))

This leftover zero_extend is where the extra clrlwi comes from.
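
As a rough C model of the insns above (not of combine's implementation), the
two-insn sequence and the fused update form compute the same value and leave
the pointer in the same place, while the zero extension stays a separate step
in both. The function names are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Before combine: insn 18, insn 19 and insn 20 as separate steps. */
uint32_t separate_insns(const unsigned char **node)
{
    uint16_t h;
    *node -= 4;                    /* insn 18: node = node + (-4)      */
    memcpy(&h, *node, sizeof h);   /* insn 19: HImode load from [node] */
    return (uint32_t)h;            /* insn 20: zero_extend:SI          */
}

/* After combine: insn 19 is a parallel of the load and the update. */
uint32_t after_combine(const unsigned char **node)
{
    uint16_t h;
    memcpy(&h, *node - 4, sizeof h);  /* load from [node - 4] ...            */
    *node -= 4;                       /* ... and write node - 4 back to node */
    return (uint32_t)h;               /* insn 20 is still a separate step    */
}

/* Both forms must agree on the loaded value and on the final pointer. */
void check_same_result(void)
{
    unsigned char buf[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    const unsigned char *p = buf + 8, *q = buf + 8;
    assert(separate_insns(&p) == after_combine(&q));
    assert(p == q && p == buf + 4);
}
```

The sketch only shows that the fusion preserves semantics; the missed
optimization is that the final zero extension was not folded into the load
as well.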


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
          Component|target                      |rtl-optimization
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2006-10-22 23:55:55
               date|                            |


From: pinskia at gcc dot gnu dot org @ 2006-10-23  2:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from pinskia at gcc dot gnu dot org  2006-10-23 02:16 -------
Yes, this is fixed on the dataflow branch.


From: pinskia at gcc dot gnu dot org @ 2006-10-23  2:28 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from pinskia at gcc dot gnu dot org  2006-10-23 02:28 -------
This is also fixed at -O2 -fsee, so SEE fixes the problem as well.


From: pinskia at gcc dot gnu dot org @ 2007-07-02 21:32 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from pinskia at gcc dot gnu dot org  2007-07-02 21:32 -------
Fixed for 4.3.0:
64bit:
.L2:
        addi 0,9,1
        lhzu 9,-4(3)
        cmpw 7,9,0
        beq 7,.L2

32bit:
.L2:
        addi 0,9,1
        lhzu 9,-4(3)
        cmpw 7,9,0
        beq 7,.L2


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|---                         |4.3.0

