public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed
* Problem of inline assembly for branch instruction
@ 2011-09-07 13:17 Naveen H. S
  2011-09-07 16:24 ` Ian Lance Taylor
  0 siblings, 1 reply; 7+ messages in thread
From: Naveen H. S @ 2011-09-07 13:17 UTC (permalink / raw)
  To: gcc-help

Hi,

We are implementing a special branch instruction for offsets between
0 and 32 bytes in CR16 port.
=======================================================================
    if (get_attr_length (insn) == 2)
      return \"beq0<tIsa>\t%0,%l1\";
    else
      return \"cmp<tIsa>\t$0, %0\;beq\t%l1\";
  }"
  [(set (attr "length")
       (if_then_else
           (and (ge (minus (match_dup 1) (pc)) (const_int 2))
                (le (minus (match_dup 1) (pc)) (const_int 32)))
           (const_int 2)
           (const_int 6)))]
=======================================================================

The branch instruction has been implemented and working as expected. 
However, it resulted in error while testing an application with inline
assembly. In the application, the linker calculated length of offset
from current location to the branch as 24. However, due to some inline
assembly in between PC and branch location, the offset is more than 32.
Hence, the linker generates error while linking application.

However, we could not find any solution to handling inline assembly in
such scenarios.
Please let us know if the compiler could be informed not to generate 
special branch instruction for code with inline assembly.
Also let us know if the inline assembly length attribute could be
passed to linker so that it can calculate the exact offset.

Thanks & Regards,
Naveen


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Problem of inline assembly for branch instruction
  2011-09-07 13:17 Problem of inline assembly for branch instruction Naveen H. S
@ 2011-09-07 16:24 ` Ian Lance Taylor
  2011-09-08 11:26   ` Naveen H. S
  2011-10-04 14:09   ` Problem with peephole optimizing the register Naveen H. S
  0 siblings, 2 replies; 7+ messages in thread
From: Ian Lance Taylor @ 2011-09-07 16:24 UTC (permalink / raw)
  To: Naveen H. S; +Cc: gcc-help

"Naveen H. S" <Naveen.S@kpitcummins.com> writes:

> The branch instruction has been implemented and working as expected. 
> However, it resulted in error while testing an application with inline
> assembly. In the application, the linker calculated length of offset
> from current location to the branch as 24. However, due to some inline
> assembly in between PC and branch location, the offset is more than 32.
> Hence, the linker generates error while linking application.

See define_asm_attributes for a way to estimate the length of an inline
asm instruction.  In the worst case, you can use this to make every
inline asm large enough to force the use of a long branch.

Ian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: Problem of inline assembly for branch instruction
  2011-09-07 16:24 ` Ian Lance Taylor
@ 2011-09-08 11:26   ` Naveen H. S
  2011-10-04 14:09   ` Problem with peephole optimizing the register Naveen H. S
  1 sibling, 0 replies; 7+ messages in thread
From: Naveen H. S @ 2011-09-08 11:26 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc-help

Hi,

>> See define_asm_attributes for a way to estimate the length of an inline
>> asm instruction.  

Modifying the define_asm_attributes to maximum possible length of a 
single machine instruction solved the issue. 
Thanks very much for the suggestion.

Thanks & Regards,
Naveen

-----Original Message-----
From: Ian Lance Taylor [mailto:iant@google.com] 
Sent: Wednesday, September 07, 2011 9:54 PM
To: Naveen H. S
Cc: gcc-help@gcc.gnu.org
Subject: Re: Problem of inline assembly for branch instruction

"Naveen H. S" <Naveen.S@kpitcummins.com> writes:

> The branch instruction has been implemented and working as expected. 
> However, it resulted in error while testing an application with inline
> assembly. In the application, the linker calculated length of offset
> from current location to the branch as 24. However, due to some inline
> assembly in between PC and branch location, the offset is more than 32.
> Hence, the linker generates error while linking application.

See define_asm_attributes for a way to estimate the length of an inline
asm instruction.  In the worst case, you can use this to make every
inline asm large enough to force the use of a long branch.

Ian


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Problem with peephole optimizing the register
  2011-09-07 16:24 ` Ian Lance Taylor
  2011-09-08 11:26   ` Naveen H. S
@ 2011-10-04 14:09   ` Naveen H. S
  2011-10-04 16:30     ` Ian Lance Taylor
  1 sibling, 1 reply; 7+ messages in thread
From: Naveen H. S @ 2011-10-04 14:09 UTC (permalink / raw)
  To: gcc-help

Hi,

We have implemented peephole for generation of tbit instruction
in CR16 port.

The TBIT instruction copies the bit located in register or memory
location src at the bit position specified by offset, to PSR.F flag.
=====================================================================
tbitb $3, 8(r1) - Copies bit in pos 3 to mem loc 8 (R1) to PSR.F flag
=====================================================================

One of the instances of peephole generated is as follows:-
====================
Earlier code
     loadw   _Fdi@l,r7 			
     andw    $16384,r7
     bne0w   r7,.L9  
     ------------------->
     ------------------->
     ------------------->
     movb    r7, r2  
     bal (ra), _ClrFlags@c 


Optimized code     
     tbitw   $14,_Fdi@l
     bfs     .L9
     ------------------->
     ------------------->
     ------------------->	
     movb    r7, r2   <======= Problem
     bal (ra), _ClrFlags@c 
====================

It can be noted that the value at Fdi(label ref) is copied into r7 
register. The r7 register is modified and the branch depends on r7.
However, with tbit peepholes; r7 register is optimized away.

In an application, r7 register is being used as source in the
sequence after tbit instruction. The above snippet shows that
the r7 value is moved into r2. However, as the r2 value is optimized
away; wrong value is moved into r2 and hence application crashes.
The peep2_reg_dead_p, find_reg_note and other functions were tried 
out without much use.

Please let us know if there are any functions that would let compiler 
know about the register being used in later sequence and hence avoid
the peephole in such scenarios.

Thanks & Regards,
Naveen


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Problem with peephole optimizing the register
  2011-10-04 14:09   ` Problem with peephole optimizing the register Naveen H. S
@ 2011-10-04 16:30     ` Ian Lance Taylor
  2011-10-05  6:22       ` Naveen H. S
  0 siblings, 1 reply; 7+ messages in thread
From: Ian Lance Taylor @ 2011-10-04 16:30 UTC (permalink / raw)
  To: Naveen H. S; +Cc: gcc-help

"Naveen H. S" <Naveen.S@kpitcummins.com> writes:

> It can be noted that the value at Fdi(label ref) is copied into r7 
> register. The r7 register is modified and the branch depends on r7.
> However, with tbit peepholes; r7 register is optimized away.
>
> In an application, r7 register is being used as source in the
> sequence after tbit instruction. The above snippet shows that
> the r7 value is moved into r2. However, as the r2 value is optimized
> away; wrong value is moved into r2 and hence application crashes.
> The peep2_reg_dead_p, find_reg_note and other functions were tried 
> out without much use.

Show us the RTL and the define_peephole2.

From what you have described so far peep2_reg_dead_p ought to work.

Also, which version of gcc?

Ian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: Problem with peephole optimizing the register
  2011-10-04 16:30     ` Ian Lance Taylor
@ 2011-10-05  6:22       ` Naveen H. S
  2011-10-05 13:48         ` Ian Lance Taylor
  0 siblings, 1 reply; 7+ messages in thread
From: Naveen H. S @ 2011-10-05  6:22 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc-help

[-- Attachment #1: Type: text/plain, Size: 601 bytes --]

Hi,

>> Show us the RTL and the define_peephole2.

Thanks for looking at the issue.
Please find attached the file tbit.txt which has complete information
of RTL pattern generated and defined peephole patterns.

>> From what you have described so far peep2_reg_dead_p ought to work.

To use far peep2_reg_dead_p; peephole2 patterns were defined.
far peep2_reg_dead_p masked the generation of tbit instruction completely.
Hence, tbit instructions were not at all generated in any instance.

>> Also, which version of gcc?

We are using gcc-4.5.1 version.

Thanks & Regards,
Naveen


[-- Attachment #2: tbit.txt --]
[-- Type: text/plain, Size: 8579 bytes --]

RTL Pattern without the peephole optimization
============================================================================================================================================================
#(insn 13 36 14 /home/Src/Components/FWU/Manager/FwuManager.c:1245 (set (reg:HI 7 r7 [orig:26 Fdi[0].Flags ] [26])
#        (mem/s:HI (symbol_ref:SI ("Fdi") [flags 0x2] <var_decl 0xb7a5b3f4 Fdi>) [3 Fdi[0].Flags+0 S2 A16])) 70 {*movhi_short} (expr_list:REG_EQUIV (mem/s:HI (symbol_ref:SI ("Fdi") [flags 0x2] <var_decl 0xb7a5b3f4 Fdi>) [3 Fdi[0].Flags+0 S2 A16])
#        (nil)))
        loadw   _Fdi@l, r7      # Fdi[0].Flags, Fdi[0].Flags    # 13    *movhi_short/5  [length = 6]
#(insn 14 13 15 /home/Src/Components/FWU/Manager/FwuManager.c:1245 (set (reg:HI 7 r7 [25])
#        (and:HI (reg:HI 7 r7 [orig:26 Fdi[0].Flags ] [26])
#            (const_int 16384 [0x4000]))) 27 {andhi3} (nil))
        andw    $16384, r7      #, tmp25        # 14    andhi3/3        [length = 4]
#(jump_insn 15 14 16 /home/Src/Components/FWU/Manager/FwuManager.c:1245 (parallel [
#            (set (pc)
#                (if_then_else (ne (reg:HI 7 r7 [25])
#                        (const_int 0 [0x0]))
#                    (label_ref:SI 27)
#                    (pc)))
#            (clobber (cc0))
#        ]) 74 {branchnehi} (expr_list:REG_BR_PROB (const_int 7100 [0x1bbc])
#        (nil))
# -> 27)
        bne0w   r7,.L9  # tmp25,        # 15    branchnehi      [length = 2]
============================================================================================================================================================	

RTL Pattern after the peephole optimization
===============================================================================================================		
#(insn 205 258 27 /home/Src/Components/FWU/Manager/FwuManager.c:1803 (parallel [
#            (reg:HI 2 r2 [54])
#            (reg/v:HI 0 r0 [orig:23 f ] [23])
#            (const_int 8 [0x8])
#            (ne (reg:HI 2 r2 [54])
#                (const_int 0 [0x0]))
#            (code_label 163 253 164 70 "" [5 uses])
#        ]) 147 {cr16_call_value+13} (nil))
        tbit    $3,r0   #, f    # 205   cr16_call_value+13      [length = 2]
        bfs     .L70    #,
RTL Pattern		
#(insn 17 16 18 /home/Src/Components/FWU/Manager/FwuManager.c:1249 (set (reg:HI 3 r3)
#        (const_int 16383 [0x3fff])) 70 {*movhi_short} (nil))
        movw    $16383, r3      #,      # 17    *movhi_short/3  [length = 4]
#(insn 18 17 19 /home/Src/Components/FWU/Manager/FwuManager.c:1249 (set (reg:QI 2 r2)
#        (reg:QI 7 r7 [25])) 69 {*movqi_short} (expr_list:REG_EQUAL (const_int 0 [0x0])
#        (nil)))
        movb    r7, r2  # tmp25,        # 18    *movqi_short/1  [length = 2]		
================================================================================================================		
		
Peephole pattern used to generate the tbit instruction and hence optmized code
====================================================================================		
(define_peephole
  [(set (match_operand:CR16IM 0 "register_operand" "")
        (match_operand:CR16IM 1 "memory_operand" ""))
   (set (match_dup 0)
        (and:CR16IM (match_dup 0)
                    (match_operand 2 "const_int_operand" "v")))
   (parallel [(set (pc)
              (if_then_else (match_operator 3 "ordered_comparison_operator"
                            [(match_dup 0)
                             (const_int 0)])
              (label_ref (match_operand 4 "" ""))
              (pc)))            
   (clobber (cc0))])]
  "TARGET_BIT_OPS && satisfies_constraint_v (operands[2])"
  "tbit%t0\t$%s2,%1\;bf%o3\t%l4")
  
Extra Info
Constarint v allows (1, 2 ......... -> 8192, 16384)
Print operand 't' prints b, w, w on QI, HI and SI mode respectively
Print operand 's' prints bit position
Print operand 'o' prints c on EQ and s on NE operations
====================================================================================  

Peephole2 pattern used to generate tbit instruction using far peep2_reg_dead_p
====================================================================================  
(define_peephole2
  [(set (match_operand:CR16IM 0 "register_operand" "")
        (match_operand:CR16IM 1 "memory_operand" ""))
   (set (match_dup 0)
        (and:CR16IM (match_dup 0)
                    (match_operand 2 "const_int_operand" "v")))
   (parallel [(set (pc)
              (if_then_else (match_operator 3 "ordered_comparison_operator"
                            [(match_dup 0)
                             (const_int 0)])
              (label_ref (match_operand 4 "" ""))
              (pc)))            
   (clobber (cc0))])]
  "TARGET_BIT_OPS && satisfies_constraint_v (operands[2])
   && far peep2_reg_dead_p (2, operands[0])"
  [(set (match_dup 0)
        (unspec [(match_dup 1)(match_dup 2)] UNSPEC_TBIT))
   (set (match_dup 0)           
        (unspec [(match_operator 3 ""
                            [(match_dup 1)
                             (const_int 0)])
              (label_ref (match_dup 4))
              (match_dup 0)] UNSPEC_BRANCH))]
  ""
)
====================================================================================  

UNSPEC instructions used in peephole2 patterns
====================================================================================  
(define_insn "tbit"
  [(set (match_operand 0 "" "")
        (unspec [(match_operand 1 "" "m,i")(match_operand 2 "" "")] UNSPEC_TBIT))]
  ""                         
  "@          
   tbit%t0\t$%s2,%1
   tbit%t0\t$%s3,%c1"
  [(set_attr "length" "2")]
) 
        
(define_insn "branch_fs"
  [(set (match_operand 0 "" "")
        (unspec [(match_operator 1 ""
                            [(match_operand 2 "" "")
                             (const_int 0)])
              (label_ref (match_operand 3 "" ""))
              (match_operand 4 "" "")] UNSPEC_BRANCH))]
 ""
 "bf%o1\t%l3"
  [(set_attr "length" "2")]
)
====================================================================================  

The generated assembly in the erraneous function
===================================================================
Generated assembler:
14019 _DeviceDataInit:
14022 0000 1E01 push ra
14023 0002 0701 push $1, r7
14026 0004 935A movw $-1, r3 #,
14027 0006 0258 movb $0, r2 #,
14028 0008 00C00000 bal (ra), _ClrFlags@c #
14031 000c 1100E0F0 tbitw $14,_Fdi@l # Fdi[0].Flags,, Fdi[0].Flags
14031 0000
14032 0012 8010 bfs .L9 #,
14034 0014 B35AFF3F movw $16383, r3 #,
14035 0018 7259 movb r7, r2 # tmp25,		--------> Problem here r7 used without initialization
14036 001a 00C00000 bal (ra), _ClrFlags@c #
14038 001e 135A movw $1, r3 #,
14039 0020 7259 movb r7, r2 # tmp25,		--------> Problem here r7 used without initialization
14040 0022 00C00000 bal (ra), _SetFlags@c #
14045 0026 0702 pop $1, r7
14046 0028 1E03 popret ra  

RTL of the above problem
================================================================================================================
#(insn 13 36 16 /home/Src/Components/FWU/Manager/FwuManager.c:1245 (parallel [
#            (reg:HI 7 r7 [orig:26 Fdi[0].Flags ] [26])
#            (mem/s:HI (symbol_ref:SI ("Fdi") [flags 0x2] <var_decl 0xb7a873f4 Fdi>) [3 Fdi[0].Flags+0 S2 A16])
#            (const_int 16384 [0x4000])
#            (ne (reg:HI 7 r7 [25])
#                (const_int 0 [0x0]))
#            (code_label 27 38 28 9 "" [1 uses])
#        ]) 136 {cr16_call_value+2} (expr_list:REG_EQUIV (mem/s:HI (symbol_ref:SI ("Fdi") [flags 0x2] <var_decl 0xb7a873f4 Fdi>) [3 Fdi[0].Flags+0 S2 A16])
#        (nil)))
        tbitw   $14,_Fdi@l      # Fdi[0].Flags,, Fdi[0].Flags   # 13    cr16_call_value+2       [length = 6]
        bfs     .L9     #,
        .loc 1 1249 0
#(insn 17 16 18 /home/Src/Components/FWU/Manager/FwuManager.c:1249 (set (reg:HI 3 r3)
#        (const_int 16383 [0x3fff])) 70 {*movhi_short} (nil))
        movw    $16383, r3      #,      # 17    *movhi_short/3  [length = 4]
#(insn 18 17 19 /home/Src/Components/FWU/Manager/FwuManager.c:1249 (set (reg:QI 2 r2)
#        (reg:QI 7 r7 [25])) 69 {*movqi_short} (expr_list:REG_EQUAL (const_int 0 [0x0])
#        (nil)))
        movb    r7, r2  # tmp25,        # 18    *movqi_short/1  [length = 2]		--------> Problem here r7 used without initialization
================================================================================================================

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Problem with peephole optimizing the register
  2011-10-05  6:22       ` Naveen H. S
@ 2011-10-05 13:48         ` Ian Lance Taylor
  0 siblings, 0 replies; 7+ messages in thread
From: Ian Lance Taylor @ 2011-10-05 13:48 UTC (permalink / raw)
  To: Naveen H. S; +Cc: gcc-help

"Naveen H. S" <Naveen.S@kpitcummins.com> writes:

> Peephole pattern used to generate the tbit instruction and hence optmized code
> ====================================================================================		
> (define_peephole
>   [(set (match_operand:CR16IM 0 "register_operand" "")
>         (match_operand:CR16IM 1 "memory_operand" ""))
>    (set (match_dup 0)
>         (and:CR16IM (match_dup 0)
>                     (match_operand 2 "const_int_operand" "v")))
>    (parallel [(set (pc)
>               (if_then_else (match_operator 3 "ordered_comparison_operator"
>                             [(match_dup 0)
>                              (const_int 0)])
>               (label_ref (match_operand 4 "" ""))
>               (pc)))            
>    (clobber (cc0))])]
>   "TARGET_BIT_OPS && satisfies_constraint_v (operands[2])"
>   "tbit%t0\t$%s2,%1\;bf%o3\t%l4")

This version is not checking whether operand[0] is dead.  These days we
generally recommend define_peephole2.  If you use define_peephole as in
this example, you need to use dead_or_set_p to check whether the
register is dead.


> Peephole2 pattern used to generate tbit instruction using far peep2_reg_dead_p
> ====================================================================================  
> (define_peephole2
>   [(set (match_operand:CR16IM 0 "register_operand" "")
>         (match_operand:CR16IM 1 "memory_operand" ""))
>    (set (match_dup 0)
>         (and:CR16IM (match_dup 0)
>                     (match_operand 2 "const_int_operand" "v")))
>    (parallel [(set (pc)
>               (if_then_else (match_operator 3 "ordered_comparison_operator"
>                             [(match_dup 0)
>                              (const_int 0)])
>               (label_ref (match_operand 4 "" ""))
>               (pc)))            
>    (clobber (cc0))])]
>   "TARGET_BIT_OPS && satisfies_constraint_v (operands[2])
>    && far peep2_reg_dead_p (2, operands[0])"
>   [(set (match_dup 0)
>         (unspec [(match_dup 1)(match_dup 2)] UNSPEC_TBIT))
>    (set (match_dup 0)           
>         (unspec [(match_operator 3 ""
>                             [(match_dup 1)
>                              (const_int 0)])
>               (label_ref (match_dup 4))
>               (match_dup 0)] UNSPEC_BRANCH))]
>   ""
> )

As far as I can see, this version won't compile--there is an extraneous
"far".  Also, you are passing 2 to peep2_reg_dead_p.  That is going to
check whether the register is live in the second insn of the peephole,
which it is: the second insn is the branch instruction, which tests the
register.  You need to pass 3 to peep2_reg_dead_p.

By the way, you don't need to explicitly check satisfies_constraint_v in
the peephold predicate, as that will be checked by the match_operand
constraint.

Ian

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-10-05 13:48 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-07 13:17 Problem of inline assembly for branch instruction Naveen H. S
2011-09-07 16:24 ` Ian Lance Taylor
2011-09-08 11:26   ` Naveen H. S
2011-10-04 14:09   ` Problem with peephole optimizing the register Naveen H. S
2011-10-04 16:30     ` Ian Lance Taylor
2011-10-05  6:22       ` Naveen H. S
2011-10-05 13:48         ` Ian Lance Taylor

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).