* question about DSE @ 2009-09-08 9:57 Alex Turjan 2009-09-08 10:25 ` Andrew Haley 2009-09-08 10:55 ` Michael Matz 0 siblings, 2 replies; 5+ messages in thread From: Alex Turjan @ 2009-09-08 9:57 UTC (permalink / raw) To: gcc; +Cc: rsandifor, zadeck Dear all, Im writing to you regarding the dead store elimination (dse) which runs after register allocation. Apparently dse removes wrongly the following store (present in bb2): (insn 374 47 52 2 test.c:107 (set (mem/c:SI (plus:PSI (reg/f:PSI 55 ptr15) (const_int 96 [0x60])) [19 fac_iter+0 S4 A32]) (reg/v:SI 16 r16 [orig:161 step109 ] [161])) 48 {si_indexed_store_incl_ra} (nil)) despite being consumed (in bb3) by the following 2 loads: (insn 380 71 64 3 test.c:112 (set (reg:HI 1 r1) (mem:HI (plus:PSI (reg/f:PSI 55 ptr15) (const_int 96 [0x60])) [0 S2 A16])) 12 {load} (nil)) (insn 382 346 65 3 test.c:112 (set (reg:HI 5 r5) (mem:HI (plus:PSI (reg/f:PSI 55 ptr15) (const_int 98 [0x62])) [0 S2 A16])) 12 {load} (nil)) Can anyone point what may be the problem? As you can see the store is SI while the loads are HI. While looking to the comments from dse.c I get to the following remark: " There are three cases where dse falls short: a) Reload sometimes creates the slot for one mode of access, and then inserts loads and/or stores for a smaller mode. " Does it mean that such cases are not treated properly by dse? I observed that if I run with the flag -fno-strict-aliasing the wrongly removed store is no longer removed and the code is runs correctly. Im wondering does the dse after register allocation make use of type based alias analysis? reagards, Alex ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: question about DSE 2009-09-08 9:57 question about DSE Alex Turjan @ 2009-09-08 10:25 ` Andrew Haley 2009-09-08 10:55 ` Michael Matz 1 sibling, 0 replies; 5+ messages in thread From: Andrew Haley @ 2009-09-08 10:25 UTC (permalink / raw) To: Alex Turjan; +Cc: gcc, rsandifor, zadeck Alex Turjan wrote: > Dear all, > Im writing to you regarding the dead store elimination (dse) which runs after register allocation. Apparently dse removes wrongly the following store (present in bb2): > > (insn 374 47 52 2 test.c:107 (set (mem/c:SI (plus:PSI (reg/f:PSI 55 ptr15) > (const_int 96 [0x60])) [19 fac_iter+0 S4 A32]) > (reg/v:SI 16 r16 [orig:161 step109 ] [161])) 48 {si_indexed_store_incl_ra} (nil)) > > despite being consumed (in bb3) by the following 2 loads: > (insn 380 71 64 3 test.c:112 (set (reg:HI 1 r1) > (mem:HI (plus:PSI (reg/f:PSI 55 ptr15) > (const_int 96 [0x60])) [0 S2 A16])) 12 {load} (nil)) > > (insn 382 346 65 3 test.c:112 (set (reg:HI 5 r5) > (mem:HI (plus:PSI (reg/f:PSI 55 ptr15) > (const_int 98 [0x62])) [0 S2 A16])) 12 {load} (nil)) > > > Can anyone point what may be the problem? > > As you can see the store is SI while the loads are HI. While looking to the comments from dse.c I get to the following remark: > > " There are three cases where dse falls short: > a) Reload sometimes creates the slot for one mode of access, and > then inserts loads and/or stores for a smaller mode. " > > Does it mean that such cases are not treated properly by dse? > > I observed that if I run with the flag -fno-strict-aliasing the wrongly removed store is no longer removed and the code is runs correctly. > Im wondering does the dse after register allocation make use of type based alias analysis? Here's part of the comment in alias.c: /* The alias sets assigned to MEMs assist the back-end in determining which MEMs can alias which other MEMs. In general, two MEMs in different alias sets cannot alias each other ... There's a lot more information in the comments there. Andrew. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: question about DSE 2009-09-08 9:57 question about DSE Alex Turjan 2009-09-08 10:25 ` Andrew Haley @ 2009-09-08 10:55 ` Michael Matz 2009-09-09 16:59 ` Alex Turjan 1 sibling, 1 reply; 5+ messages in thread From: Michael Matz @ 2009-09-08 10:55 UTC (permalink / raw) To: Alex Turjan; +Cc: gcc, rsandifor, zadeck Hi, On Tue, 8 Sep 2009, Alex Turjan wrote: > (insn 374 47 52 2 test.c:107 (set (mem/c:SI (plus:PSI (reg/f:PSI 55 ptr15) > (const_int 96 [0x60])) [19 fac_iter+0 S4 A32]) > (reg/v:SI 16 r16 [orig:161 step109 ] [161])) 48 {si_indexed_store_incl_ra} (nil)) An SI store to a MEM of alias set 19 (and a non-trapping while we're at it, the /c marker on the MEM) ... > despite being consumed (in bb3) by the following 2 loads: > (insn 380 71 64 3 test.c:112 (set (reg:HI 1 r1) > (mem:HI (plus:PSI (reg/f:PSI 55 ptr15) > (const_int 96 [0x60])) [0 S2 A16])) 12 {load} (nil)) > > (insn 382 346 65 3 test.c:112 (set (reg:HI 5 r5) > (mem:HI (plus:PSI (reg/f:PSI 55 ptr15) > (const_int 98 [0x62])) [0 S2 A16])) 12 {load} (nil)) ... and two HI loads of MEMs of alias set 0. This should be fine as set 0 aliases with everything. Either you need to open a bugreport with necessary patches to reproduce the above, or make it reproduce on trunk, or you need to debug dse.c yourself. There is one peculiarity about dse.c which might be able to help you in determining the cause. dse.c operates under the assumption that all spill slot accesses generated by reload are using exactly the same alias set and that any such slot is completely unaliased by anything else including stuff with alias set 0. This is done only by the after-reload dse pass. See functions dse_record_singleton_alias_set and friends. Your loads should kill the active stores in function check_mem_read_rtx should you need to debug it that far. My assumption would be these two split loads of HImode are generated by your backend from a given SImode MEM. If so, you need to make sure to copy the MEM_ALIAS_SET, at least for spill slots (better for everything) into the newly generated HImode mems. For spill slots it's not enough to set it to zero. Ciao, Michael. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: question about DSE 2009-09-08 10:55 ` Michael Matz @ 2009-09-09 16:59 ` Alex Turjan 2009-09-09 17:59 ` Richard Henderson 0 siblings, 1 reply; 5+ messages in thread From: Alex Turjan @ 2009-09-09 16:59 UTC (permalink / raw) To: Michael Matz; +Cc: gcc Hi Michael, > My assumption would be these two split loads of HImode are > generated by your backend from a given SImode MEM. Indeed your asumption is right. Bellow I have a mulsi3 expand in which I generate insns of mode HI. operands[1] gets spilled: in the produced BB as a single SI store while in the consumer BB as two separte HI loads (see a_hi and a_lo). (define_expand "mulsi3" [(match_operand: SI 0 "general_register_operand" "") (match_operand: SI 1 "general_register_operand" "") (match_operand: SI 2 "general_register_operand" "")] "" "{ rtx buff = gen_reg_rtx(SImode); rtx a_lo = gen_rtx_SUBREG(HImode, operands[1], 0); rtx a_hi = gen_rtx_SUBREG(HImode, operands[1], 2); rtx b_lo = gen_rtx_SUBREG(HImode, operands[2], 0); rtx b_hi = gen_rtx_SUBREG(HImode, operands[2], 2); rtx r_hi = gen_rtx_SUBREG(HImode, buff, 2); emit_insn(gen_umulhisi3(buff, a_lo, b_lo)); emit_insn(gen_machi3(r_hi, a_hi, b_lo, r_hi)); emit_insn(gen_machi3(r_hi, a_lo, b_hi, r_hi)); emit_move_insn(operands[0], buff); DONE; }") > If so, you need > to make sure to copy the MEM_ALIAS_SET, at least for spill slots (better > for everything) into the newly generated HImode mems. For spill slots > it's not enough to set it to zero. I get your point but as the generation SI->HI takes place in the expand it doesnt help to copy the MEM_ALIAS_SET becasue the operands are pseudo regs. However, to get a correct implementation I did the following. Instead of doing the split in the expand (as show above), I made use of the following define_insn_and_split: (define_expand "mulsi3" [(parallel [(set (match_operand:SI 0 "register_operand" "") (mult:SI (match_operand:SI 1 "register_operand" "") (match_operand:SI 2 "nonmemory_operand" ""))) (clobber (match_operand:SI 3 "register_operand" "")) ] ) ] "" "{ operands[3] = gen_reg_rtx(SImode); }") (define_insn_and_split "*mulsi3" [(parallel[(set (match_operand:SI 0 "register_operand" "=d,d") (mult:SI (match_operand:SI 1 "register_operand" "d,d") (match_operand:SI 2 "nonmemory_operand" "d,I"))) (clobber (match_operand:SI 3 "register_operand" "=d,d")) ])] "" "#" "reload_completed" [(clobber (const_int 0))] "{ rtx a_lo = gen_rtx_SUBREG(HImode, operands[1], 0); rtx a_hi = gen_rtx_SUBREG(HImode, operands[1], 2); rtx b_lo = gen_rtx_SUBREG(HImode, operands[2], 0); rtx b_hi = gen_rtx_SUBREG(HImode, operands[2], 2); rtx r_hi = gen_rtx_SUBREG(HImode, operands[3], 2); emit_insn(gen_umulhisi3(operands[3], a_lo, b_lo)); emit_insn(gen_machi3(r_hi, a_hi, b_lo, r_hi)); emit_insn(gen_machi3(r_hi, a_lo, b_hi, r_hi)); emit_move_insn(operands[0], operands[3]); DONE; }") By using this define_insn_and_split with the predicate "reload_completed" I ensure that the register allocation takes place on the operands of the "mulsi3" instruction as defined by the define_expand construct. In this way instead of the two separate HI loads (from my previouse mail) I get only one SI load which aliases whith the SI store. In consequence the SI store is no longer removed. 1.What do you think about this implementation? using define_insn_and_split 2.Is is true that in the define_expand constructs I should avoid inducing subregs? thanks, Alex ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: question about DSE 2009-09-09 16:59 ` Alex Turjan @ 2009-09-09 17:59 ` Richard Henderson 0 siblings, 0 replies; 5+ messages in thread From: Richard Henderson @ 2009-09-09 17:59 UTC (permalink / raw) To: Alex Turjan; +Cc: Michael Matz, gcc On 09/09/2009 09:59 AM, Alex Turjan wrote: > 1.What do you think about this implementation? using define_insn_and_split If this port uses BITS_PER_WORD=16, then the lower-subreg pass may be able to give you better register allocation by disconnecting the low and high parts of the 32-bit value. If this port uses BITS_PER_WORD=32, then lower-subreg can't do anything, and you may get better assembly by using the post-reload splitter. > 2.Is is true that in the define_expand constructs I should avoid inducing subregs? It's true that you should *always* avoid calling gen_rtx_SUBREG directly, and instead use one of gen_lowpart, gen_highpart, or simplify_gen_subreg. r~ ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2009-09-09 17:59 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-09-08 9:57 question about DSE Alex Turjan 2009-09-08 10:25 ` Andrew Haley 2009-09-08 10:55 ` Michael Matz 2009-09-09 16:59 ` Alex Turjan 2009-09-09 17:59 ` Richard Henderson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).