* Problem of inline assembly for branch instruction @ 2011-09-07 13:17 Naveen H. S 2011-09-07 16:24 ` Ian Lance Taylor 0 siblings, 1 reply; 7+ messages in thread From: Naveen H. S @ 2011-09-07 13:17 UTC (permalink / raw) To: gcc-help Hi, We are implementing a special branch instruction for offsets between 0 and 32 bytes in CR16 port. ======================================================================= if (get_attr_length (insn) == 2) return \"beq0<tIsa>\t%0,%l1\"; else return \"cmp<tIsa>\t$0, %0\;beq\t%l1\"; }" [(set (attr "length") (if_then_else (and (ge (minus (match_dup 1) (pc)) (const_int 2)) (le (minus (match_dup 1) (pc)) (const_int 32))) (const_int 2) (const_int 6)))] ======================================================================= The branch instruction has been implemented and working as expected. However, it resulted in error while testing an application with inline assembly. In the application, the linker calculated length of offset from current location to the branch as 24. However, due to some inline assembly in between PC and branch location, the offset is more than 32. Hence, the linker generates error while linking application. However, we could not find any solution to handling inline assembly in such scenarios. Please let us know if the compiler could be informed not to generate special branch instruction for code with inline assembly. Also let us know if the inline assembly length attribute could be passed to linker so that it can calculate the exact offset. Thanks & Regards, Naveen ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Problem of inline assembly for branch instruction 2011-09-07 13:17 Problem of inline assembly for branch instruction Naveen H. S @ 2011-09-07 16:24 ` Ian Lance Taylor 2011-09-08 11:26 ` Naveen H. S 2011-10-04 14:09 ` Problem with peephole optimizing the register Naveen H. S 0 siblings, 2 replies; 7+ messages in thread From: Ian Lance Taylor @ 2011-09-07 16:24 UTC (permalink / raw) To: Naveen H. S; +Cc: gcc-help "Naveen H. S" <Naveen.S@kpitcummins.com> writes: > The branch instruction has been implemented and working as expected. > However, it resulted in error while testing an application with inline > assembly. In the application, the linker calculated length of offset > from current location to the branch as 24. However, due to some inline > assembly in between PC and branch location, the offset is more than 32. > Hence, the linker generates error while linking application. See define_asm_attributes for a way to estimate the length of an inline asm instruction. In the worst case, you can use this to make every inline asm large enough to force the use of a long branch. Ian ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: Problem of inline assembly for branch instruction 2011-09-07 16:24 ` Ian Lance Taylor @ 2011-09-08 11:26 ` Naveen H. S 2011-10-04 14:09 ` Problem with peephole optimizing the register Naveen H. S 1 sibling, 0 replies; 7+ messages in thread From: Naveen H. S @ 2011-09-08 11:26 UTC (permalink / raw) To: Ian Lance Taylor; +Cc: gcc-help Hi, >> See define_asm_attributes for a way to estimate the length of an inline >> asm instruction. Modifying the define_asm_attributes to maximum possible length of a single machine instruction solved the issue. Thanks very much for the suggestion. Thanks & Regards, Naveen -----Original Message----- From: Ian Lance Taylor [mailto:iant@google.com] Sent: Wednesday, September 07, 2011 9:54 PM To: Naveen H. S Cc: gcc-help@gcc.gnu.org Subject: Re: Problem of inline assembly for branch instruction "Naveen H. S" <Naveen.S@kpitcummins.com> writes: > The branch instruction has been implemented and working as expected. > However, it resulted in error while testing an application with inline > assembly. In the application, the linker calculated length of offset > from current location to the branch as 24. However, due to some inline > assembly in between PC and branch location, the offset is more than 32. > Hence, the linker generates error while linking application. See define_asm_attributes for a way to estimate the length of an inline asm instruction. In the worst case, you can use this to make every inline asm large enough to force the use of a long branch. Ian ^ permalink raw reply [flat|nested] 7+ messages in thread
* Problem with peephole optimizing the register 2011-09-07 16:24 ` Ian Lance Taylor 2011-09-08 11:26 ` Naveen H. S @ 2011-10-04 14:09 ` Naveen H. S 2011-10-04 16:30 ` Ian Lance Taylor 1 sibling, 1 reply; 7+ messages in thread From: Naveen H. S @ 2011-10-04 14:09 UTC (permalink / raw) To: gcc-help Hi, We have implemented peephole for generation of tbit instruction in CR16 port. The TBIT instruction copies the bit located in register or memory location src at the bit position specified by offset, to PSR.F flag. ===================================================================== tbitb $3, 8(r1) - Copies bit in pos 3 to mem loc 8 (R1) to PSR.F flag ===================================================================== One of the instances of peephole generated is as follows:- ==================== Earlier code loadw _Fdi@l,r7 andw $16384,r7 bne0w r7,.L9 -------------------> -------------------> -------------------> movb r7, r2 bal (ra), _ClrFlags@c Optimized code tbitw $14,_Fdi@l bfs .L9 -------------------> -------------------> -------------------> movb r7, r2 <======= Problem bal (ra), _ClrFlags@c ==================== It can be noted that the value at Fdi(label ref) is copied into r7 register. The r7 register is modified and the branch depends on r7. However, with tbit peepholes; r7 register is optimized away. In an application, r7 register is being used as source in the sequence after tbit instruction. The above snippet shows that the r7 value is moved into r2. However, as the r2 value is optimized away; wrong value is moved into r2 and hence application crashes. The peep2_reg_dead_p, find_reg_note and other functions were tried out without much use. Please let us know if there are any functions that would let compiler know about the register being used in later sequence and hence avoid the peephole in such scenarios. Thanks & Regards, Naveen ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Problem with peephole optimizing the register 2011-10-04 14:09 ` Problem with peephole optimizing the register Naveen H. S @ 2011-10-04 16:30 ` Ian Lance Taylor 2011-10-05 6:22 ` Naveen H. S 0 siblings, 1 reply; 7+ messages in thread From: Ian Lance Taylor @ 2011-10-04 16:30 UTC (permalink / raw) To: Naveen H. S; +Cc: gcc-help "Naveen H. S" <Naveen.S@kpitcummins.com> writes: > It can be noted that the value at Fdi(label ref) is copied into r7 > register. The r7 register is modified and the branch depends on r7. > However, with tbit peepholes; r7 register is optimized away. > > In an application, r7 register is being used as source in the > sequence after tbit instruction. The above snippet shows that > the r7 value is moved into r2. However, as the r2 value is optimized > away; wrong value is moved into r2 and hence application crashes. > The peep2_reg_dead_p, find_reg_note and other functions were tried > out without much use. Show us the RTL and the define_peephole2. From what you have described so far peep2_reg_dead_p ought to work. Also, which version of gcc? Ian ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: Problem with peephole optimizing the register 2011-10-04 16:30 ` Ian Lance Taylor @ 2011-10-05 6:22 ` Naveen H. S 2011-10-05 13:48 ` Ian Lance Taylor 0 siblings, 1 reply; 7+ messages in thread From: Naveen H. S @ 2011-10-05 6:22 UTC (permalink / raw) To: Ian Lance Taylor; +Cc: gcc-help [-- Attachment #1: Type: text/plain, Size: 601 bytes --] Hi, >> Show us the RTL and the define_peephole2. Thanks for looking at the issue. Please find attached the file tbit.txt which has complete information of RTL pattern generated and defined peephole patterns. >> From what you have described so far peep2_reg_dead_p ought to work. To use far peep2_reg_dead_p; peephole2 patterns were defined. far peep2_reg_dead_p masked the generation of tbit instruction completely. Hence, tbit instructions were not at all generated in any instance. >> Also, which version of gcc? We are using gcc-4.5.1 version. Thanks & Regards, Naveen [-- Attachment #2: tbit.txt --] [-- Type: text/plain, Size: 8579 bytes --] RTL Pattern without the peephole optimization ============================================================================================================================================================ #(insn 13 36 14 /home/Src/Components/FWU/Manager/FwuManager.c:1245 (set (reg:HI 7 r7 [orig:26 Fdi[0].Flags ] [26]) # (mem/s:HI (symbol_ref:SI ("Fdi") [flags 0x2] <var_decl 0xb7a5b3f4 Fdi>) [3 Fdi[0].Flags+0 S2 A16])) 70 {*movhi_short} (expr_list:REG_EQUIV (mem/s:HI (symbol_ref:SI ("Fdi") [flags 0x2] <var_decl 0xb7a5b3f4 Fdi>) [3 Fdi[0].Flags+0 S2 A16]) # (nil))) loadw _Fdi@l, r7 # Fdi[0].Flags, Fdi[0].Flags # 13 *movhi_short/5 [length = 6] #(insn 14 13 15 /home/Src/Components/FWU/Manager/FwuManager.c:1245 (set (reg:HI 7 r7 [25]) # (and:HI (reg:HI 7 r7 [orig:26 Fdi[0].Flags ] [26]) # (const_int 16384 [0x4000]))) 27 {andhi3} (nil)) andw $16384, r7 #, tmp25 # 14 andhi3/3 [length = 4] #(jump_insn 15 14 16 /home/Src/Components/FWU/Manager/FwuManager.c:1245 (parallel [ # (set (pc) # (if_then_else (ne (reg:HI 7 r7 [25]) # (const_int 0 [0x0])) # (label_ref:SI 27) # (pc))) # (clobber (cc0)) # ]) 74 {branchnehi} (expr_list:REG_BR_PROB (const_int 7100 [0x1bbc]) # (nil)) # -> 27) bne0w r7,.L9 # tmp25, # 15 branchnehi [length = 2] ============================================================================================================================================================ RTL Pattern after the peephole optimization =============================================================================================================== #(insn 205 258 27 /home/Src/Components/FWU/Manager/FwuManager.c:1803 (parallel [ # (reg:HI 2 r2 [54]) # (reg/v:HI 0 r0 [orig:23 f ] [23]) # (const_int 8 [0x8]) # (ne (reg:HI 2 r2 [54]) # (const_int 0 [0x0])) # (code_label 163 253 164 70 "" [5 uses]) # ]) 147 {cr16_call_value+13} (nil)) tbit $3,r0 #, f # 205 cr16_call_value+13 [length = 2] bfs .L70 #, RTL Pattern #(insn 17 16 18 /home/Src/Components/FWU/Manager/FwuManager.c:1249 (set (reg:HI 3 r3) # (const_int 16383 [0x3fff])) 70 {*movhi_short} (nil)) movw $16383, r3 #, # 17 *movhi_short/3 [length = 4] #(insn 18 17 19 /home/Src/Components/FWU/Manager/FwuManager.c:1249 (set (reg:QI 2 r2) # (reg:QI 7 r7 [25])) 69 {*movqi_short} (expr_list:REG_EQUAL (const_int 0 [0x0]) # (nil))) movb r7, r2 # tmp25, # 18 *movqi_short/1 [length = 2] ================================================================================================================ Peephole pattern used to generate the tbit instruction and hence optmized code ==================================================================================== (define_peephole [(set (match_operand:CR16IM 0 "register_operand" "") (match_operand:CR16IM 1 "memory_operand" "")) (set (match_dup 0) (and:CR16IM (match_dup 0) (match_operand 2 "const_int_operand" "v"))) (parallel [(set (pc) (if_then_else (match_operator 3 "ordered_comparison_operator" [(match_dup 0) (const_int 0)]) (label_ref (match_operand 4 "" "")) (pc))) (clobber (cc0))])] "TARGET_BIT_OPS && satisfies_constraint_v (operands[2])" "tbit%t0\t$%s2,%1\;bf%o3\t%l4") Extra Info Constarint v allows (1, 2 ......... -> 8192, 16384) Print operand 't' prints b, w, w on QI, HI and SI mode respectively Print operand 's' prints bit position Print operand 'o' prints c on EQ and s on NE operations ==================================================================================== Peephole2 pattern used to generate tbit instruction using far peep2_reg_dead_p ==================================================================================== (define_peephole2 [(set (match_operand:CR16IM 0 "register_operand" "") (match_operand:CR16IM 1 "memory_operand" "")) (set (match_dup 0) (and:CR16IM (match_dup 0) (match_operand 2 "const_int_operand" "v"))) (parallel [(set (pc) (if_then_else (match_operator 3 "ordered_comparison_operator" [(match_dup 0) (const_int 0)]) (label_ref (match_operand 4 "" "")) (pc))) (clobber (cc0))])] "TARGET_BIT_OPS && satisfies_constraint_v (operands[2]) && far peep2_reg_dead_p (2, operands[0])" [(set (match_dup 0) (unspec [(match_dup 1)(match_dup 2)] UNSPEC_TBIT)) (set (match_dup 0) (unspec [(match_operator 3 "" [(match_dup 1) (const_int 0)]) (label_ref (match_dup 4)) (match_dup 0)] UNSPEC_BRANCH))] "" ) ==================================================================================== UNSPEC instructions used in peephole2 patterns ==================================================================================== (define_insn "tbit" [(set (match_operand 0 "" "") (unspec [(match_operand 1 "" "m,i")(match_operand 2 "" "")] UNSPEC_TBIT))] "" "@ tbit%t0\t$%s2,%1 tbit%t0\t$%s3,%c1" [(set_attr "length" "2")] ) (define_insn "branch_fs" [(set (match_operand 0 "" "") (unspec [(match_operator 1 "" [(match_operand 2 "" "") (const_int 0)]) (label_ref (match_operand 3 "" "")) (match_operand 4 "" "")] UNSPEC_BRANCH))] "" "bf%o1\t%l3" [(set_attr "length" "2")] ) ==================================================================================== The generated assembly in the erraneous function =================================================================== Generated assembler: 14019 _DeviceDataInit: 14022 0000 1E01 push ra 14023 0002 0701 push $1, r7 14026 0004 935A movw $-1, r3 #, 14027 0006 0258 movb $0, r2 #, 14028 0008 00C00000 bal (ra), _ClrFlags@c # 14031 000c 1100E0F0 tbitw $14,_Fdi@l # Fdi[0].Flags,, Fdi[0].Flags 14031 0000 14032 0012 8010 bfs .L9 #, 14034 0014 B35AFF3F movw $16383, r3 #, 14035 0018 7259 movb r7, r2 # tmp25, --------> Problem here r7 used without initialization 14036 001a 00C00000 bal (ra), _ClrFlags@c # 14038 001e 135A movw $1, r3 #, 14039 0020 7259 movb r7, r2 # tmp25, --------> Problem here r7 used without initialization 14040 0022 00C00000 bal (ra), _SetFlags@c # 14045 0026 0702 pop $1, r7 14046 0028 1E03 popret ra RTL of the above problem ================================================================================================================ #(insn 13 36 16 /home/Src/Components/FWU/Manager/FwuManager.c:1245 (parallel [ # (reg:HI 7 r7 [orig:26 Fdi[0].Flags ] [26]) # (mem/s:HI (symbol_ref:SI ("Fdi") [flags 0x2] <var_decl 0xb7a873f4 Fdi>) [3 Fdi[0].Flags+0 S2 A16]) # (const_int 16384 [0x4000]) # (ne (reg:HI 7 r7 [25]) # (const_int 0 [0x0])) # (code_label 27 38 28 9 "" [1 uses]) # ]) 136 {cr16_call_value+2} (expr_list:REG_EQUIV (mem/s:HI (symbol_ref:SI ("Fdi") [flags 0x2] <var_decl 0xb7a873f4 Fdi>) [3 Fdi[0].Flags+0 S2 A16]) # (nil))) tbitw $14,_Fdi@l # Fdi[0].Flags,, Fdi[0].Flags # 13 cr16_call_value+2 [length = 6] bfs .L9 #, .loc 1 1249 0 #(insn 17 16 18 /home/Src/Components/FWU/Manager/FwuManager.c:1249 (set (reg:HI 3 r3) # (const_int 16383 [0x3fff])) 70 {*movhi_short} (nil)) movw $16383, r3 #, # 17 *movhi_short/3 [length = 4] #(insn 18 17 19 /home/Src/Components/FWU/Manager/FwuManager.c:1249 (set (reg:QI 2 r2) # (reg:QI 7 r7 [25])) 69 {*movqi_short} (expr_list:REG_EQUAL (const_int 0 [0x0]) # (nil))) movb r7, r2 # tmp25, # 18 *movqi_short/1 [length = 2] --------> Problem here r7 used without initialization ================================================================================================================ ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Problem with peephole optimizing the register 2011-10-05 6:22 ` Naveen H. S @ 2011-10-05 13:48 ` Ian Lance Taylor 0 siblings, 0 replies; 7+ messages in thread From: Ian Lance Taylor @ 2011-10-05 13:48 UTC (permalink / raw) To: Naveen H. S; +Cc: gcc-help "Naveen H. S" <Naveen.S@kpitcummins.com> writes: > Peephole pattern used to generate the tbit instruction and hence optmized code > ==================================================================================== > (define_peephole > [(set (match_operand:CR16IM 0 "register_operand" "") > (match_operand:CR16IM 1 "memory_operand" "")) > (set (match_dup 0) > (and:CR16IM (match_dup 0) > (match_operand 2 "const_int_operand" "v"))) > (parallel [(set (pc) > (if_then_else (match_operator 3 "ordered_comparison_operator" > [(match_dup 0) > (const_int 0)]) > (label_ref (match_operand 4 "" "")) > (pc))) > (clobber (cc0))])] > "TARGET_BIT_OPS && satisfies_constraint_v (operands[2])" > "tbit%t0\t$%s2,%1\;bf%o3\t%l4") This version is not checking whether operand[0] is dead. These days we generally recommend define_peephole2. If you use define_peephole as in this example, you need to use dead_or_set_p to check whether the register is dead. > Peephole2 pattern used to generate tbit instruction using far peep2_reg_dead_p > ==================================================================================== > (define_peephole2 > [(set (match_operand:CR16IM 0 "register_operand" "") > (match_operand:CR16IM 1 "memory_operand" "")) > (set (match_dup 0) > (and:CR16IM (match_dup 0) > (match_operand 2 "const_int_operand" "v"))) > (parallel [(set (pc) > (if_then_else (match_operator 3 "ordered_comparison_operator" > [(match_dup 0) > (const_int 0)]) > (label_ref (match_operand 4 "" "")) > (pc))) > (clobber (cc0))])] > "TARGET_BIT_OPS && satisfies_constraint_v (operands[2]) > && far peep2_reg_dead_p (2, operands[0])" > [(set (match_dup 0) > (unspec [(match_dup 1)(match_dup 2)] UNSPEC_TBIT)) > (set (match_dup 0) > (unspec [(match_operator 3 "" > [(match_dup 1) > (const_int 0)]) > (label_ref (match_dup 4)) > (match_dup 0)] UNSPEC_BRANCH))] > "" > ) As far as I can see, this version won't compile--there is an extraneous "far". Also, you are passing 2 to peep2_reg_dead_p. That is going to check whether the register is live in the second insn of the peephole, which it is: the second insn is the branch instruction, which tests the register. You need to pass 3 to peep2_reg_dead_p. By the way, you don't need to explicitly check satisfies_constraint_v in the peephold predicate, as that will be checked by the match_operand constraint. Ian ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2011-10-05 13:48 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-09-07 13:17 Problem of inline assembly for branch instruction Naveen H. S 2011-09-07 16:24 ` Ian Lance Taylor 2011-09-08 11:26 ` Naveen H. S 2011-10-04 14:09 ` Problem with peephole optimizing the register Naveen H. S 2011-10-04 16:30 ` Ian Lance Taylor 2011-10-05 6:22 ` Naveen H. S 2011-10-05 13:48 ` Ian Lance Taylor
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).