question about DSE

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* question about DSE
@ 2009-09-08  9:57 Alex Turjan
  2009-09-08 10:25 ` Andrew Haley
  2009-09-08 10:55 ` Michael Matz
  0 siblings, 2 replies; 5+ messages in thread
From: Alex Turjan @ 2009-09-08  9:57 UTC (permalink / raw)
  To: gcc; +Cc: rsandifor, zadeck

Dear all,
Im writing to you regarding the dead store elimination (dse) which runs after register allocation. Apparently dse removes wrongly the following store (present in bb2):

(insn 374 47 52 2 test.c:107 (set (mem/c:SI (plus:PSI (reg/f:PSI 55 ptr15)
                (const_int 96 [0x60])) [19 fac_iter+0 S4 A32])
        (reg/v:SI 16 r16 [orig:161 step109 ] [161])) 48  {si_indexed_store_incl_ra} (nil))

despite being consumed (in bb3) by the following 2 loads:
(insn 380 71 64 3 test.c:112 (set (reg:HI 1 r1)
        (mem:HI (plus:PSI (reg/f:PSI 55 ptr15)
                (const_int 96 [0x60])) [0 S2 A16])) 12 {load} (nil))

(insn 382 346 65 3 test.c:112 (set (reg:HI 5 r5)
        (mem:HI (plus:PSI (reg/f:PSI 55 ptr15)
                (const_int 98 [0x62])) [0 S2 A16])) 12 {load} (nil))

Can anyone point what may be the problem?

As you can see the store is SI while the loads are HI. While looking to the comments from dse.c I get to the following remark:

" There are three cases where dse falls short:
     a) Reload sometimes creates the slot for one mode of access, and
     then inserts loads and/or stores for a smaller mode. "

Does it mean that such cases are not treated properly by dse?

I observed that if I run with the flag -fno-strict-aliasing the wrongly removed store is no longer removed and the code is runs correctly. 
Im wondering does the dse after register allocation make use of type based alias analysis? 

reagards,
Alex

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: question about DSE
  2009-09-08  9:57 question about DSE Alex Turjan
@ 2009-09-08 10:25 ` Andrew Haley
  2009-09-08 10:55 ` Michael Matz
  1 sibling, 0 replies; 5+ messages in thread
From: Andrew Haley @ 2009-09-08 10:25 UTC (permalink / raw)
  To: Alex Turjan; +Cc: gcc, rsandifor, zadeck

Alex Turjan wrote:
> Dear all,
> Im writing to you regarding the dead store elimination (dse) which runs after register allocation. Apparently dse removes wrongly the following store (present in bb2):
> 
> (insn 374 47 52 2 test.c:107 (set (mem/c:SI (plus:PSI (reg/f:PSI 55 ptr15)
>                 (const_int 96 [0x60])) [19 fac_iter+0 S4 A32])
>         (reg/v:SI 16 r16 [orig:161 step109 ] [161])) 48  {si_indexed_store_incl_ra} (nil))
> 
> despite being consumed (in bb3) by the following 2 loads:
> (insn 380 71 64 3 test.c:112 (set (reg:HI 1 r1)
>         (mem:HI (plus:PSI (reg/f:PSI 55 ptr15)
>                 (const_int 96 [0x60])) [0 S2 A16])) 12 {load} (nil))
> 
> (insn 382 346 65 3 test.c:112 (set (reg:HI 5 r5)
>         (mem:HI (plus:PSI (reg/f:PSI 55 ptr15)
>                 (const_int 98 [0x62])) [0 S2 A16])) 12 {load} (nil))
> 
> 
> Can anyone point what may be the problem?
> 
> As you can see the store is SI while the loads are HI. While looking to the comments from dse.c I get to the following remark:
> 
> " There are three cases where dse falls short:
>      a) Reload sometimes creates the slot for one mode of access, and
>      then inserts loads and/or stores for a smaller mode. "
> 
> Does it mean that such cases are not treated properly by dse?
> 
> I observed that if I run with the flag -fno-strict-aliasing the wrongly removed store is no longer removed and the code is runs correctly. 
> Im wondering does the dse after register allocation make use of type based alias analysis? 

Here's part of the comment in alias.c:

/* The alias sets assigned to MEMs assist the back-end in determining
   which MEMs can alias which other MEMs.  In general, two MEMs in
   different alias sets cannot alias each other ...

There's a lot more information in the comments there.

Andrew.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: question about DSE
  2009-09-08  9:57 question about DSE Alex Turjan
  2009-09-08 10:25 ` Andrew Haley
@ 2009-09-08 10:55 ` Michael Matz
  2009-09-09 16:59   ` Alex Turjan
  1 sibling, 1 reply; 5+ messages in thread
From: Michael Matz @ 2009-09-08 10:55 UTC (permalink / raw)
  To: Alex Turjan; +Cc: gcc, rsandifor, zadeck

Hi,

On Tue, 8 Sep 2009, Alex Turjan wrote:

> (insn 374 47 52 2 test.c:107 (set (mem/c:SI (plus:PSI (reg/f:PSI 55 ptr15)
>                 (const_int 96 [0x60])) [19 fac_iter+0 S4 A32])
>         (reg/v:SI 16 r16 [orig:161 step109 ] [161])) 48  {si_indexed_store_incl_ra} (nil))

An SI store to a MEM of alias set 19 (and a non-trapping while we're at 
it, the /c marker on the MEM) ...

> despite being consumed (in bb3) by the following 2 loads:
> (insn 380 71 64 3 test.c:112 (set (reg:HI 1 r1)
>         (mem:HI (plus:PSI (reg/f:PSI 55 ptr15)
>                 (const_int 96 [0x60])) [0 S2 A16])) 12 {load} (nil))
> 
> (insn 382 346 65 3 test.c:112 (set (reg:HI 5 r5)
>         (mem:HI (plus:PSI (reg/f:PSI 55 ptr15)
>                 (const_int 98 [0x62])) [0 S2 A16])) 12 {load} (nil))

... and two HI loads of MEMs of alias set 0.  This should be fine as set 0 
aliases with everything.  Either you need to open a bugreport with 
necessary patches to reproduce the above, or make it reproduce on trunk, 
or you need to debug dse.c yourself.

There is one peculiarity about dse.c which might be able to help you in 
determining the cause.  dse.c operates under the assumption that all spill 
slot accesses generated by reload are using exactly the same alias set and 
that any such slot is completely unaliased by anything else including 
stuff with alias set 0.  This is done only by the after-reload dse pass.

See functions dse_record_singleton_alias_set and friends.  Your loads 
should kill the active stores in function check_mem_read_rtx should you 
need to debug it that far.

My assumption would be these two split loads of HImode are generated by 
your backend from a given SImode MEM.  If so, you need to make sure to 
copy the MEM_ALIAS_SET, at least for spill slots (better for everything) 
into the newly generated HImode mems.  For spill slots it's not enough to 
set it to zero.

Ciao,
Michael.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: question about DSE
  2009-09-08 10:55 ` Michael Matz
@ 2009-09-09 16:59   ` Alex Turjan
  2009-09-09 17:59     ` Richard Henderson
  0 siblings, 1 reply; 5+ messages in thread
From: Alex Turjan @ 2009-09-09 16:59 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc

Hi Michael,

> My assumption would be these two split loads of HImode are
> generated by your backend from a given SImode MEM. 

Indeed your asumption is right. Bellow I have a mulsi3 expand in which I generate insns of mode HI. operands[1] gets spilled: in the produced BB as a single SI store while in the consumer BB as two separte HI loads (see a_hi and a_lo).

(define_expand "mulsi3"
 [(match_operand: SI 0 "general_register_operand" "")
  (match_operand: SI 1 "general_register_operand" "")
  (match_operand: SI 2 "general_register_operand" "")]
 ""
 "{
  rtx buff = gen_reg_rtx(SImode);
  rtx a_lo = gen_rtx_SUBREG(HImode, operands[1], 0);     
  rtx a_hi = gen_rtx_SUBREG(HImode, operands[1], 2);     
  rtx b_lo = gen_rtx_SUBREG(HImode, operands[2], 0);     
  rtx b_hi = gen_rtx_SUBREG(HImode, operands[2], 2);     
  rtx r_hi = gen_rtx_SUBREG(HImode, buff, 2);     
  emit_insn(gen_umulhisi3(buff, a_lo, b_lo));
  emit_insn(gen_machi3(r_hi, a_hi, b_lo, r_hi));
  emit_insn(gen_machi3(r_hi, a_lo, b_hi, r_hi));
  emit_move_insn(operands[0], buff);
  DONE;
}")

> If so, you need
> to make sure to copy the MEM_ALIAS_SET, at least for spill slots (better
> for everything) into the newly generated HImode mems.  For spill slots
> it's not enough to set it to zero.

I get your point but as the generation SI->HI takes place in the expand it doesnt help to copy the MEM_ALIAS_SET becasue the operands are pseudo regs. 

However, to get a correct implementation I did the following. Instead of doing the split in the expand (as show above), I made use of the following define_insn_and_split:

(define_expand "mulsi3"
 [(parallel
[(set (match_operand:SI 0 "register_operand" "")
        (mult:SI (match_operand:SI 1 "register_operand" "")
                  (match_operand:SI 2 "nonmemory_operand" "")))
   (clobber (match_operand:SI  3 "register_operand"   ""))
]
)
]
 ""
 "{
operands[3]  = gen_reg_rtx(SImode);
}")

(define_insn_and_split "*mulsi3"
 [(parallel[(set (match_operand:SI 0 "register_operand" "=d,d")
        (mult:SI (match_operand:SI 1 "register_operand" "d,d")
                  (match_operand:SI 2 "nonmemory_operand" "d,I")))
        (clobber (match_operand:SI  3 "register_operand"   "=d,d"))
  ])]
""
 "#"
 "reload_completed"
  [(clobber (const_int 0))]
"{
  rtx a_lo = gen_rtx_SUBREG(HImode, operands[1], 0);     
  rtx a_hi = gen_rtx_SUBREG(HImode, operands[1], 2);     
  rtx b_lo = gen_rtx_SUBREG(HImode, operands[2], 0);     
  rtx b_hi = gen_rtx_SUBREG(HImode, operands[2], 2);     
  rtx r_hi = gen_rtx_SUBREG(HImode, operands[3], 2);     
  emit_insn(gen_umulhisi3(operands[3], a_lo, b_lo));
  emit_insn(gen_machi3(r_hi, a_hi, b_lo, r_hi));
  emit_insn(gen_machi3(r_hi, a_lo, b_hi, r_hi));
  emit_move_insn(operands[0], operands[3]);
 DONE;
}")

By using this define_insn_and_split with the predicate "reload_completed" 
I ensure that the register allocation takes place on the operands of the "mulsi3" instruction as defined by the define_expand construct. In this way instead of the two separate HI loads (from my previouse mail) I get only one SI load which aliases whith the SI store. In consequence the SI store is no longer removed.

1.What do you think about this implementation? using define_insn_and_split
2.Is is true that in the define_expand constructs I should avoid inducing subregs?

thanks,
Alex

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: question about DSE
  2009-09-09 16:59   ` Alex Turjan
@ 2009-09-09 17:59     ` Richard Henderson
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Henderson @ 2009-09-09 17:59 UTC (permalink / raw)
  To: Alex Turjan; +Cc: Michael Matz, gcc

On 09/09/2009 09:59 AM, Alex Turjan wrote:
> 1.What do you think about this implementation? using define_insn_and_split

If this port uses BITS_PER_WORD=16, then the lower-subreg pass may
be able to give you better register allocation by disconnecting the
low and high parts of the 32-bit value.

If this port uses BITS_PER_WORD=32, then lower-subreg can't do
anything, and you may get better assembly by using the post-reload
splitter.

> 2.Is is true that in the define_expand constructs I should avoid inducing subregs?

It's true that you should *always* avoid calling gen_rtx_SUBREG
directly, and instead use one of gen_lowpart, gen_highpart, or
simplify_gen_subreg.

r~

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2009-09-09 17:59 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-08  9:57 question about DSE Alex Turjan
2009-09-08 10:25 ` Andrew Haley
2009-09-08 10:55 ` Michael Matz
2009-09-09 16:59   ` Alex Turjan
2009-09-09 17:59     ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).