public inbox for gcc@gcc.gnu.org
* redundant divmodsi4 not optimized away
From: Greg McGary @ 2010-04-27  1:46 UTC
  To: gcc

I have a port without div or mod machine instructions.  I wrote 
divmodsi4 patterns that do the libcall directly, hoping that GCC would 
recognize the opportunity to use a single divmodsi4 to compute both 
quotient and remainder.  Alas, GCC calls divmodsi4 twice with the same 
divisor and dividend operands.  Is this supposed to work?  Is there a 
special trick to help the optimizer recognize the redundant insn?  I saw 
the 4yr-old thread regarding picochip's desire for the same effect and 
followed the same approach implemented in the current picochip.md (as 
well as my own approach) but no luck.

G


* Re: redundant divmodsi4 not optimized away
From: Ian Lance Taylor @ 2010-04-27  7:18 UTC
  To: Greg McGary; +Cc: gcc

Greg McGary <greg@mcgary.org> writes:

> I have a port without div or mod machine instructions.  I wrote
> divmodsi4 patterns that do the libcall directly, hoping that GCC would
> recognize the opportunity to use a single divmodsi4 to compute both
> quotient and remainder.  Alas, GCC calls divmodsi4 twice with the same
> divisor and dividend operands.  Is this supposed to work?  Is there a
> special trick to help the optimizer recognize the redundant insn?  I
> saw the 4yr-old thread regarding picochip's desire for the same effect
> and followed the same approach implemented in the current picochip.md
> (as well as my own approach) but no luck.

Using a divmodsi4 insn instead of divsi3/modsi3 insns ought to work.
You may need to give more information, such as the test case you are
using, and what your divmodsi4 insn looks like.

Ian


* Re: redundant divmodsi4 not optimized away
From: Greg McGary @ 2010-04-27 18:37 UTC
  To: Ian Lance Taylor; +Cc: gcc

On 04/26/10 22:09, Ian Lance Taylor wrote:
> Greg McGary <greg@mcgary.org> writes:
>
>> I have a port without div or mod machine instructions.  I wrote
>> divmodsi4 patterns that do the libcall directly, hoping that GCC would
>> recognize the opportunity to use a single divmodsi4 to compute both
>> quotient and remainder.  Alas, GCC calls divmodsi4 twice with the same
>> divisor and dividend operands.  Is this supposed to work?  Is there a
>> special trick to help the optimizer recognize the redundant insn?  I
>> saw the 4yr-old thread regarding picochip's desire for the same effect
>> and followed the same approach implemented in the current picochip.md
>> (as well as my own approach) but no luck.
>>      
> Using a divmodsi4 insn instead of divsi3/modsi3 insns ought to work.
> You may need to give more information, such as the test case you are
> using, and what your divmodsi4 insn looks like.
>
> Ian
>    

The test case is __udivmoddi4 from libgcc2.c, specifically
the macro __udiv_qrnnd_c from longlong.h, which does this:

     __r1 = (n1) % __d1;
     __q1 = (n1) / __d1;

... and this ...

     __r0 = __r1 % __d1;
     __q0 = __r1 / __d1;

Below is my original insn set.  The __udivmodsi4 libcall accepts
operands in r1/r2, then returns the quotient in r4 and the remainder in r1.

(define_insn_and_split "udivmodsi4"
   [(set (match_operand:SI 0 "gen_reg_operand" "=r")
     (udiv:SI (match_operand:SI 1 "gen_reg_operand" "r")
          (match_operand:SI 2 "gen_reg_operand" "r")))
    (set (match_operand:SI 3 "gen_reg_operand" "=r")
     (umod:SI (match_dup 1)
          (match_dup 2)))
    (clobber (reg:SI 1))
    (clobber (reg:SI 2))
    (clobber (reg:SI 3))
    (clobber (reg:SI 4))
    (clobber (reg:CC CC_REGNUM))
    (clobber (reg:SI RETURN_POINTER_REGNUM))]
   ""
   "#"
   "reload_completed"
   [(set (reg:SI 1)
     (match_dup 1))
    (set (reg:SI 2)
     (match_dup 2))
    (parallel [(set (reg:SI 4)
            (udiv:SI (reg:SI 1)
                 (reg:SI 2)))
           (set (reg:SI 1)
            (umod:SI (reg:SI 1)
                 (reg:SI 2)))
           (clobber (reg:SI 2))
           (clobber (reg:SI 3))
           (clobber (reg:CC CC_REGNUM))
           (clobber (reg:SI RETURN_POINTER_REGNUM))])
    (set (match_dup 0)
     (reg:SI 4))
    (set (match_dup 3)
     (reg:SI 1))])

(define_insn "*udivmodsi4_libcall"
   [(set (reg:SI 4)
     (udiv:SI (reg:SI 1)
          (reg:SI 2)))
    (set (reg:SI 1)
     (umod:SI (reg:SI 1)
          (reg:SI 2)))
    (clobber (reg:SI 2))
    (clobber (reg:SI 3))
    (clobber (reg:CC CC_REGNUM))
    (clobber (reg:SI RETURN_POINTER_REGNUM))]
   ""
   "call\\t__udivmodsi4"
   [(set_attr "length"    "4")])

Here is an alternative patterned after the approach in picochip.md.  I
had hoped that, since the picochip guys reported the same trouble four
years ago, the current picochip.md might have the magic bits.

(define_expand "udivmodsi4"
   [(parallel [(set (reg:SI 1)
            (match_operand:SI 1 "gen_reg_operand"  "r"))
           (clobber (reg:CC CC_REGNUM))])
    (parallel [(set (reg:SI 2)
            (match_operand:SI 2 "gen_reg_operand"  "r"))
           (clobber (reg:CC CC_REGNUM))])
    (parallel [(unspec_volatile [(const_int 0)] UNSPEC_UDIVMOD)
           (set (reg:SI 4)
            (udiv:SI (reg:SI 1)
                (reg:SI 2)))
           (set (reg:SI 1)
            (umod:SI (reg:SI 1)
                (reg:SI 2)))
           (clobber (reg:SI 2))
           (clobber (reg:SI 3))
           (clobber (reg:CC CC_REGNUM))
           (clobber (reg:SI RETURN_POINTER_REGNUM))])
    (set (match_operand:SI 0 "gen_reg_operand" "=r")
     (reg:SI 4))
    (set (match_operand:SI 3 "gen_reg_operand" "=r")
     (reg:SI 1))])

(define_insn "*udivmodsi4_libcall"
   [(unspec_volatile [(const_int 0)] UNSPEC_UDIVMOD)
    (set (reg:SI 4)
     (udiv:SI (reg:SI 1)
          (reg:SI 2)))
    (set (reg:SI 1)
     (umod:SI (reg:SI 1)
          (reg:SI 2)))
    (clobber (reg:SI 2))
    (clobber (reg:SI 3))
    (clobber (reg:CC CC_REGNUM))
    (clobber (reg:SI RETURN_POINTER_REGNUM))]
   ""
   "call\\t__udivmodsi4"
   [(set_attr "length"    "4")])


Alas, neither of them eliminates the redundant libcall.  If no clues
are forthcoming, I'll begin debugging CSE.

G


* Re: redundant divmodsi4 not optimized away
From: Michael Matz @ 2010-04-28 13:11 UTC
  To: Greg McGary; +Cc: Ian Lance Taylor, gcc

Hi,

On Tue, 27 Apr 2010, Greg McGary wrote:

> (define_insn "*udivmodsi4_libcall"
>   [(set (reg:SI 4)
>     (udiv:SI (reg:SI 1)
>          (reg:SI 2)))
>    (set (reg:SI 1)
>     (umod:SI (reg:SI 1)
>          (reg:SI 2)))
>    (clobber (reg:SI 2))
>    (clobber (reg:SI 3))
>    (clobber (reg:CC CC_REGNUM))
>    (clobber (reg:SI RETURN_POINTER_REGNUM))]
>   ""
>   "call\\t__udivmodsi4"
>   [(set_attr "length"    "4")])

So, this pattern uses r2 and clobbers r2+r3.  Two calls in a row can't be 
eliminated because the execution of one destroys one operand of the other 
as far as GCC knows, and the necessary copies to reload the correct value 
into r2 before the second call might confuse combine/CSE/DCE/whatever.  At  
least that would be my theory to start from :)


Ciao,
Michael.


* Re: redundant divmodsi4 not optimized away
From: Greg McGary @ 2010-04-28 16:58 UTC
  To: Michael Matz; +Cc: Ian Lance Taylor, gcc

On 04/28/10 05:58, Michael Matz wrote:

> On Tue, 27 Apr 2010, Greg McGary wrote:
>    
>> (define_insn "*udivmodsi4_libcall"
>>    [(set (reg:SI 4)
>>      (udiv:SI (reg:SI 1)
>>           (reg:SI 2)))
>>     (set (reg:SI 1)
>>      (umod:SI (reg:SI 1)
>>           (reg:SI 2)))
>>     (clobber (reg:SI 2))
>>     (clobber (reg:SI 3))
>>     (clobber (reg:CC CC_REGNUM))
>>     (clobber (reg:SI RETURN_POINTER_REGNUM))]
>>    ""
>>    "call\\t__udivmodsi4"
>>    [(set_attr "length"    "4")])
>>      
> So, this pattern uses r2 and clobbers r2+r3.  Two calls in a row can't be
> eliminated because the execution of one destroys one operand of the other
> as far as GCC knows, and the necessary copies to reload the correct value
> into r2 before the second call might confuse combine/CSE/DCE/whatever.  At
> least that would be my theory to start from :)
>    

The libcall insn above appears only after reload, as the result of a
split.  All the CSE passes run before reload, while the insn pattern is
still this:

   [(set (match_operand:SI 0 "gen_reg_operand" "=r")
     (udiv:SI (match_operand:SI 1 "gen_reg_operand" "r")
          (match_operand:SI 2 "gen_reg_operand" "r")))
    (set (match_operand:SI 3 "gen_reg_operand" "=r")
     (umod:SI (match_dup 1)
          (match_dup 2)))
    (clobber (reg:SI 1))
    (clobber (reg:SI 2))
    (clobber (reg:SI 3))
    (clobber (reg:SI 4))
    (clobber (reg:CC CC_REGNUM))
    (clobber (reg:SI RETURN_POINTER_REGNUM))]

G
