* [MIPS] MADD issue @ 2007-04-13 0:26 Fu, Chao-Ying 2007-04-13 13:07 ` Paolo Bonzini 2007-04-13 14:25 ` Richard Sandiford 0 siblings, 2 replies; 18+ messages in thread From: Fu, Chao-Ying @ 2007-04-13 0:26 UTC (permalink / raw) To: Richard Sandiford, gcc; +Cc: Thekkath, Radhika, Stephens, Nigel Hi Richard, After tracing GCC 4.x to see why MADD is not generated for MIPS32, I found out the main issue is that the pattern "adddi3" is not available for MIPS32. Because the missing of adddi3, GCC 4.x needs to split 64-bit addition to 4 separate RTL insns. This leads to that the combining phase fails to combine RTL insns to a single madd pattern. Could we enable "adddi3" for MIPS32 in GCC 4.x? Or is there a better way to generate MADD? Thanks a lot! Ex: (b67.c) long long test (long long a, int b, int c) { return a + (long long) b * (long long) c; } # gcc -S b67.c -O3 -mips32 (b67.s) test: .frame $sp,0,$31 .mask 0x00000000,0 .fmask 0x00000000,0 .set noreorder .set nomacro mtlo $5 mthi $4 madd $6,$7 mflo $3 j $31 mfhi $2 Regards, Chao-ying --------------------------------------------------------------------- Ex: (mips.md in GCC 3.4) (define_expand "adddi3" [(parallel [(set (match_operand:DI 0 "register_operand" "") (plus:DI (match_operand:DI 1 "register_operand" "") (match_operand:DI 2 "arith_operand" ""))) (clobber (match_dup 3))])] "TARGET_64BIT || (!TARGET_DEBUG_G_MODE && !TARGET_MIPS16)" { .... (define_insn "adddi3_internal_1" [(set (match_operand:DI 0 "register_operand" "=d,&d") (plus:DI (match_operand:DI 1 "register_operand" "0,d") (match_operand:DI 2 "register_operand" "d,d"))) (clobber (match_operand:SI 3 "register_operand" "=d,d"))] "!TARGET_64BIT && !TARGET_DEBUG_G_MODE && !TARGET_MIPS16" { return (REGNO (operands[0]) == REGNO (operands[1]) && REGNO (operands[0]) == REGNO (operands[2])) ? "srl\t%3,%L0,31\;sll\t%M0,%M0,1\;sll\t%L0,%L1,1\;addu\t%M0,%M0,%3" : "addu\t%L0,%L1,%L2\;sltu\t%3,%L0,%L2\;addu\t%M0,%M1,%M2\;addu\t%M0,%M0,%3"; } [(set_attr "type" "darith") (set_attr "mode" "DI") (set_attr "length" "16")]) (define_insn "*smul_acc_di" [(set (match_operand:DI 0 "register_operand" "=x") (plus:DI (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "d")) (sign_extend:DI (match_operand:SI 2 "register_operand" "d"))) (match_operand:DI 3 "register_operand" "0")))] "(TARGET_MAD || ISA_HAS_MACC) && !TARGET_64BIT" { if (TARGET_MAD) return "mad\t%1,%2"; else if (TARGET_MIPS5500) return "madd\t%1,%2"; else return "macc\t%.,%1,%2"; } [(set_attr "type" "imadd") (set_attr "mode" "SI")]) --------------------------------------------------------------------- ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [MIPS] MADD issue 2007-04-13 0:26 [MIPS] MADD issue Fu, Chao-Ying @ 2007-04-13 13:07 ` Paolo Bonzini 2007-04-13 17:05 ` Ian Lance Taylor 2007-04-13 14:25 ` Richard Sandiford 1 sibling, 1 reply; 18+ messages in thread From: Paolo Bonzini @ 2007-04-13 13:07 UTC (permalink / raw) To: Fu, Chao-Ying; +Cc: Richard Sandiford, GCC Development > (define_insn "adddi3_internal_1" > [(set (match_operand:DI 0 "register_operand" "=d,&d") > (plus:DI (match_operand:DI 1 "register_operand" "0,d") > (match_operand:DI 2 "register_operand" "d,d"))) > (clobber (match_operand:SI 3 "register_operand" "=d,d"))] > "!TARGET_64BIT && !TARGET_DEBUG_G_MODE && !TARGET_MIPS16" > { > return (REGNO (operands[0]) == REGNO (operands[1]) > && REGNO (operands[0]) == REGNO (operands[2])) > ? "srl\t%3,%L0,31\;sll\t%M0,%M0,1\;sll\t%L0,%L1,1\;addu\t%M0,%M0,%3" > : "addu\t%L0,%L1,%L2\;sltu\t%3,%L0,%L2\;addu\t%M0,%M1,%M2\;addu\t%M0,%M0,%3"; > } This should be a post-reload (i.e. predicated on reload_completed) split, I think. Paolo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [MIPS] MADD issue 2007-04-13 13:07 ` Paolo Bonzini @ 2007-04-13 17:05 ` Ian Lance Taylor 2007-04-13 18:10 ` Paolo Bonzini 0 siblings, 1 reply; 18+ messages in thread From: Ian Lance Taylor @ 2007-04-13 17:05 UTC (permalink / raw) To: bonzini; +Cc: Fu, Chao-Ying, Richard Sandiford, GCC Development Paolo Bonzini <paolo.bonzini@lu.unisi.ch> writes: > > (define_insn "adddi3_internal_1" > > [(set (match_operand:DI 0 "register_operand" "=d,&d") > > (plus:DI (match_operand:DI 1 "register_operand" "0,d") > > (match_operand:DI 2 "register_operand" "d,d"))) > > (clobber (match_operand:SI 3 "register_operand" "=d,d"))] > > "!TARGET_64BIT && !TARGET_DEBUG_G_MODE && !TARGET_MIPS16" > > { > > return (REGNO (operands[0]) == REGNO (operands[1]) > > && REGNO (operands[0]) == REGNO (operands[2])) > > ? "srl\t%3,%L0,31\;sll\t%M0,%M0,1\;sll\t%L0,%L1,1\;addu\t%M0,%M0,%3" > > : "addu\t%L0,%L1,%L2\;sltu\t%3,%L0,%L2\;addu\t%M0,%M1,%M2\;addu\t%M0,%M0,%3"; > > } > > This should be a post-reload (i.e. predicated on reload_completed) > split, I think. Actually, with the relatively recent lower-subreg work, it is desirable to split this sort of instruction before reload. That is, do an unconditional split. Ian ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [MIPS] MADD issue 2007-04-13 17:05 ` Ian Lance Taylor @ 2007-04-13 18:10 ` Paolo Bonzini 2007-04-13 18:33 ` Richard Sandiford 0 siblings, 1 reply; 18+ messages in thread From: Paolo Bonzini @ 2007-04-13 18:10 UTC (permalink / raw) To: Ian Lance Taylor; +Cc: Fu, Chao-Ying, Richard Sandiford, GCC Development >> This should be a post-reload (i.e. predicated on reload_completed) >> split, I think. > > Actually, with the relatively recent lower-subreg work, it is > desirable to split this sort of instruction before reload. That is, > do an unconditional split. Right. Combine cannot cope with the resulting 4-insn sequence and merge it back with the multiplication, but split is ran after combine and before the second lower-subreg pass. So, making this an unconditional split should be the best of both worlds. Paolo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [MIPS] MADD issue 2007-04-13 18:10 ` Paolo Bonzini @ 2007-04-13 18:33 ` Richard Sandiford 0 siblings, 0 replies; 18+ messages in thread From: Richard Sandiford @ 2007-04-13 18:33 UTC (permalink / raw) To: bonzini; +Cc: Ian Lance Taylor, Fu, Chao-Ying, GCC Development Paolo Bonzini <paolo.bonzini@lu.unisi.ch> writes: >>> This should be a post-reload (i.e. predicated on reload_completed) >>> split, I think. >> >> Actually, with the relatively recent lower-subreg work, it is >> desirable to split this sort of instruction before reload. That is, >> do an unconditional split. > > Right. Combine cannot cope with the resulting 4-insn sequence and merge > it back with the multiplication, but split is ran after combine and > before the second lower-subreg pass. > > So, making this an unconditional split should be the best of both worlds. The problem is, combine is also one of the passes that was able to optimise the split form so effectively. It's not the best of both worlds from that point of view. Richard ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [MIPS] MADD issue 2007-04-13 0:26 [MIPS] MADD issue Fu, Chao-Ying 2007-04-13 13:07 ` Paolo Bonzini @ 2007-04-13 14:25 ` Richard Sandiford 2007-04-13 15:23 ` Nigel Stephens 1 sibling, 1 reply; 18+ messages in thread From: Richard Sandiford @ 2007-04-13 14:25 UTC (permalink / raw) To: Fu, Chao-Ying; +Cc: gcc, Thekkath, Radhika, Stephens, Nigel "Fu, Chao-Ying" <fu@mips.com> writes: > After tracing GCC 4.x to see why MADD is not generated for MIPS32, > I found out the main issue is that the pattern "adddi3" > is not available for MIPS32. Because the missing > of adddi3, GCC 4.x needs to split 64-bit addition to 4 separate > RTL insns. This leads to that the combining phase fails > to combine RTL insns to a single madd pattern. > > Could we enable "adddi3" for MIPS32 in GCC 4.x? Or is there a > better way to generate MADD? Thanks a lot! The problem with: > Ex: (mips.md in GCC 3.4) > (define_expand "adddi3" > [(parallel [(set (match_operand:DI 0 "register_operand" "") > (plus:DI (match_operand:DI 1 "register_operand" "") > (match_operand:DI 2 "arith_operand" ""))) > (clobber (match_dup 3))])] > "TARGET_64BIT || (!TARGET_DEBUG_G_MODE && !TARGET_MIPS16)" > { > .... > > (define_insn "adddi3_internal_1" > [(set (match_operand:DI 0 "register_operand" "=d,&d") > (plus:DI (match_operand:DI 1 "register_operand" "0,d") > (match_operand:DI 2 "register_operand" "d,d"))) > (clobber (match_operand:SI 3 "register_operand" "=d,d"))] > "!TARGET_64BIT && !TARGET_DEBUG_G_MODE && !TARGET_MIPS16" > { > return (REGNO (operands[0]) == REGNO (operands[1]) > && REGNO (operands[0]) == REGNO (operands[2])) > ? "srl\t%3,%L0,31\;sll\t%M0,%M0,1\;sll\t%L0,%L1,1\;addu\t%M0,%M0,%3" > : "addu\t%L0,%L1,%L2\;sltu\t%3,%L0,%L2\;addu\t%M0,%M1,%M2\;addu\t%M0,%M0,%3"; > } > [(set_attr "type" "darith") > (set_attr "mode" "DI") > (set_attr "length" "16")]) ...this was that it tended to be very poor for the additions themselves. When optabs.c implements the additions instead, the early RTL optimisers get to see the individual instructions, and are often able to handle constant or part-constant operands better. This led to a noticable size improvement when I tested it originally. (I imagine the effects are even better now, thanks to the subreg lowering pass.) See: http://gcc.gnu.org/ml/gcc-patches/2004-05/msg00947.html for the patch that made this change, and some rationale. As far as madd goes, I think it would be better to either (a) get combine to handle this situation or (b) get expand to generate a fused multiply-add from the outset. (b) sounds like it might be useful in its own right. At the moment we treat the generation of floating-point multiply-adds as an optimisation, but in some applications it's critical not to round the intermediate result. (I don't know if there's a bugzilla entry about this.) If we treated fused multiply-add as a primitive operation, we could extend it to integer types too. In this case we'd also need to handle widening multiplications, but we already need to do that for stand-alone multiplications. Just random musings, and probably not the answer you wanted to hear, sorry. Richard ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [MIPS] MADD issue 2007-04-13 14:25 ` Richard Sandiford @ 2007-04-13 15:23 ` Nigel Stephens 2007-04-13 15:34 ` Richard Sandiford 0 siblings, 1 reply; 18+ messages in thread From: Nigel Stephens @ 2007-04-13 15:23 UTC (permalink / raw) To: richard; +Cc: Fu, Chao-Ying, gcc, Thekkath, Radhika Richard Sandiford wrote: > > As far as madd goes, I think it would be better to either > (a) get combine to handle this situation or (b) get expand > to generate a fused multiply-add from the outset. > > (b) sounds like it might be useful in its own right. At the moment we > treat the generation of floating-point multiply-adds as an optimisation, > but in some applications it's critical not to round the intermediate > result. (I don't know if there's a bugzilla entry about this.) > If we treated fused multiply-add as a primitive operation, we could > extend it to integer types too. In this case we'd also need to > handle widening multiplications, but we already need to do that > for stand-alone multiplications. > Richard While I agree with you philosophically, it feels like (b) might be quite a major task. A number of optimisation passes which currently recognise and MUL and PLUS separately (e.g. loop strength reduction) would now need to be extended to handle the fused MULPLUS and MULSUB operators. And although the reduction in instruction count due to your previous change is good, what is it as a percentage of the total? After all it only helps code which uses 64-bit integer types with a 32-bit ABI, which is probably quite a small proportion of most real-life applications -- whereas for some algorithms the ability to use MADD is absolutely critical to performance, and for them losing the ability to generate MADD is a significant backward step for the compiler. How about, as a workaround until (b) sees the light of day, we reimplement adddi3 and subdi3 only (not the other di mode patterns), qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented more cleanly nowadays, using define_insn_and_split and/or a "#" template, to avoid generating multi-instruction assembler sequences. Nigel ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [MIPS] MADD issue 2007-04-13 15:23 ` Nigel Stephens @ 2007-04-13 15:34 ` Richard Sandiford 2007-04-13 16:19 ` Nigel Stephens 2007-04-20 14:32 ` Nigel Stephens 0 siblings, 2 replies; 18+ messages in thread From: Richard Sandiford @ 2007-04-13 15:34 UTC (permalink / raw) To: Nigel Stephens; +Cc: Fu, Chao-Ying, gcc, Thekkath, Radhika Nigel Stephens <nigel@mips.com> writes: > While I agree with you philosophically, it feels like (b) might be quite > a major task. A number of optimisation passes which currently recognise > and MUL and PLUS separately (e.g. loop strength reduction) would now > need to be extended to handle the fused MULPLUS and MULSUB operators. > > And although the reduction in instruction count due to your previous > change is good, what is it as a percentage of the total? After all it > only helps code which uses 64-bit integer types with a 32-bit ABI, which > is probably quite a small proportion of most real-life applications -- > whereas for some algorithms the ability to use MADD is absolutely > critical to performance, and for them losing the ability to generate > MADD is a significant backward step for the compiler. > > How about, as a workaround until (b) sees the light of day, we > reimplement adddi3 and subdi3 only (not the other di mode patterns), > qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented > more cleanly nowadays, using define_insn_and_split and/or a "#" > template, to avoid generating multi-instruction assembler sequences. The old patterns had a define_split too. That wasn't really the problem. If you don't want to add a tree code yet, it would still be possible to add the optab and expand support, recognising mult-add sequences in a similar way to how we recognise widening multiplies now. I feel at least that's a step in the right direction. Richard ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [MIPS] MADD issue 2007-04-13 15:34 ` Richard Sandiford @ 2007-04-13 16:19 ` Nigel Stephens 2007-04-20 14:32 ` Nigel Stephens 1 sibling, 0 replies; 18+ messages in thread From: Nigel Stephens @ 2007-04-13 16:19 UTC (permalink / raw) To: richard; +Cc: Fu, Chao-Ying, gcc, Thekkath, Radhika Richard Sandiford wrote: > Nigel Stephens <nigel@mips.com> writes: > >> While I agree with you philosophically, it feels like (b) might be quite >> a major task. A number of optimisation passes which currently recognise >> and MUL and PLUS separately (e.g. loop strength reduction) would now >> need to be extended to handle the fused MULPLUS and MULSUB operators. >> >> And although the reduction in instruction count due to your previous >> change is good, what is it as a percentage of the total? After all it >> only helps code which uses 64-bit integer types with a 32-bit ABI, which >> is probably quite a small proportion of most real-life applications -- >> whereas for some algorithms the ability to use MADD is absolutely >> critical to performance, and for them losing the ability to generate >> MADD is a significant backward step for the compiler. >> >> How about, as a workaround until (b) sees the light of day, we >> reimplement adddi3 and subdi3 only (not the other di mode patterns), >> qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented >> more cleanly nowadays, using define_insn_and_split and/or a "#" >> template, to avoid generating multi-instruction assembler sequences. >> > > The old patterns had a define_split too. That wasn't really the problem. > > If you don't want to add a tree code yet, it would still be possible to > add the optab and expand support, recognising mult-add sequences in a > similar way to how we recognise widening multiplies now. I feel at > least that's a step in the right direction. > OK, we'll have a think about that. Thanks Nigel ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [MIPS] MADD issue 2007-04-13 15:34 ` Richard Sandiford 2007-04-13 16:19 ` Nigel Stephens @ 2007-04-20 14:32 ` Nigel Stephens 2007-04-20 14:44 ` Richard Sandiford 1 sibling, 1 reply; 18+ messages in thread From: Nigel Stephens @ 2007-04-20 14:32 UTC (permalink / raw) To: Nigel Stephens, Fu, Chao-Ying, gcc, Thekkath, Radhika, richard Richard Sandiford wrote: > Nigel Stephens <nigel@mips.com> writes: > >> While I agree with you philosophically, it feels like (b) might be quite >> a major task. A number of optimisation passes which currently recognise >> and MUL and PLUS separately (e.g. loop strength reduction) would now >> need to be extended to handle the fused MULPLUS and MULSUB operators. >> >> And although the reduction in instruction count due to your previous >> change is good, what is it as a percentage of the total? After all it >> only helps code which uses 64-bit integer types with a 32-bit ABI, which >> is probably quite a small proportion of most real-life applications -- >> whereas for some algorithms the ability to use MADD is absolutely >> critical to performance, and for them losing the ability to generate >> MADD is a significant backward step for the compiler. >> >> How about, as a workaround until (b) sees the light of day, we >> reimplement adddi3 and subdi3 only (not the other di mode patterns), >> qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented >> more cleanly nowadays, using define_insn_and_split and/or a "#" >> template, to avoid generating multi-instruction assembler sequences. >> > > The old patterns had a define_split too. That wasn't really the problem. > > If you don't want to add a tree code yet, it would still be possible to > add the optab and expand support, recognising mult-add sequences in a > similar way to how we recognise widening multiplies now. I feel at > least that's a step in the right direction. > > Richard I notice that at least the 32-bit rs6000, i386, sparc, frv, sh, cris, mcore, score, arm & pa backends still implement adddi3 as either a define_insn which outputs two instructions or an explicit define_expand followed define_split and associated sub patterns. Are we setting the bar too high for MIPS? :) Whilst I'm sure that your proposal is the right one going forward, it still feels like it could be significant amount of work to implement. And the simplified optab/expand support would only work for multiply-add sequences expressed within a single expression, and wouldn't be able to optimise disjoint multiply, then add expressions. Or have I missed something. in the short term we really do need to reenable madd/msub for MIPS32 targets in GCC 4. We could do that with a local patch to put back adddi3, but it would be better if we could coordinate this with you. Nigel ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [MIPS] MADD issue 2007-04-20 14:32 ` Nigel Stephens @ 2007-04-20 14:44 ` Richard Sandiford 2007-04-20 15:05 ` Nigel Stephens 2007-04-20 16:46 ` Ian Lance Taylor 0 siblings, 2 replies; 18+ messages in thread From: Richard Sandiford @ 2007-04-20 14:44 UTC (permalink / raw) To: Nigel Stephens; +Cc: Fu, Chao-Ying, gcc, Thekkath, Radhika Nigel Stephens <nigel@mips.com> writes: > I notice that at least the 32-bit rs6000, i386, sparc, frv, sh, cris, > mcore, score, arm & pa backends still implement adddi3 as either a > define_insn which outputs two instructions or an explicit define_expand > followed define_split and associated sub patterns. Are we setting the > bar too high for MIPS? :) I don't think that follows. The main reason that ports like rs6000, i386, arm, sparc and pa define adddi3 is because those architectures provide special add-with-carry type instructions, or similar architecture-specific optimisations. MIPS has nothing like that. The old MIPS patterns just re-implemented the standard optabs version (and often did so less efficiently, as I said before). > Whilst I'm sure that your proposal is the right one going forward, it > still feels like it could be significant amount of work to implement. > And the simplified optab/expand support would only work for multiply-add > sequences expressed within a single expression, and wouldn't be able to > optimise disjoint multiply, then add expressions. Or have I missed > something. I don't think that follows either. Out-of-ssa should coalesce them if they are in the same basic block. And if they aren't in the same basic block, that's presumably because the tree optimisers think they shouldn't be. Combine wouldn't look across basic block boundaries either AFAIK. E.g. compiling: -------------------------------------------------------------------- typedef unsigned long long ull; typedef unsigned int ui; ull foo (ui x, ui y, ull z) { ull tmp = (ull) x * y; return tmp + z; } -------------------------------------------------------------------- with a recent snapshot and "-O2 -fdump-tree-all" shows that the final_cleanup dump does indeed have the combined form: -------------------------------------------------------------------- ;; Function foo (foo) foo (x, y, z) { <bb 2>: return (long long unsigned int) y * (long long unsigned int) x + z; } -------------------------------------------------------------------- I realise no-one else has spoken out in support of me, so perhaps I'm in a minority of one here. But it does seem to me that in the Tree-SSA world, it makes less sense to duplicate standard optabs in the backend purely for the reason of keeping DImode arithmetic around as DImode arithmetic for longer. > in the short term we really do need to reenable madd/msub for MIPS32 > targets in GCC 4. We could do that with a local patch to put back > adddi3, but it would be better if we could coordinate this with you. If removing the patterns had been purely a clean-up, I would be more open to the idea of putting the patterns back. But given that removing the patterns had an optimisation benefit of its own, I'm less open to the idea of adding them back, especially when (as far as I'm concerned) there's a clean way of getting the best of both worlds. Richard ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [MIPS] MADD issue 2007-04-20 14:44 ` Richard Sandiford @ 2007-04-20 15:05 ` Nigel Stephens 2007-04-20 16:38 ` Richard Sandiford 2007-04-20 16:46 ` Ian Lance Taylor 1 sibling, 1 reply; 18+ messages in thread From: Nigel Stephens @ 2007-04-20 15:05 UTC (permalink / raw) To: Nigel Stephens, Fu, Chao-Ying, gcc, Thekkath, Radhika, richard Richard Sandiford wrote: > Nigel Stephens <nigel@mips.com> writes: > >> I notice that at least the 32-bit rs6000, i386, sparc, frv, sh, cris, >> mcore, score, arm & pa backends still implement adddi3 as either a >> define_insn which outputs two instructions or an explicit define_expand >> followed define_split and associated sub patterns. Are we setting the >> bar too high for MIPS? :) >> > > I don't think that follows. The main reason that ports like rs6000, > i386, arm, sparc and pa define adddi3 is because those architectures > provide special add-with-carry type instructions, or similar > architecture-specific optimisations. Right, good point. > MIPS has nothing like that. > Actually the MIPS DSP ASE does have addsc and addwc, which could be used for this purpose. Sadly not subsc and subwc, though. > The old MIPS patterns just re-implemented the standard optabs version > (and often did so less efficiently, as I said before). > > >> Whilst I'm sure that your proposal is the right one going forward, it >> still feels like it could be significant amount of work to implement. >> And the simplified optab/expand support would only work for multiply-add >> sequences expressed within a single expression, and wouldn't be able to >> optimise disjoint multiply, then add expressions. Or have I missed >> something. >> > > I don't think that follows either. Out-of-ssa should coalesce them > if they are in the same basic block. And if they aren't in the > same basic block, that's presumably because the tree optimisers > think they shouldn't be. Combine wouldn't look across basic > block boundaries either AFAIK. > > E.g. compiling: > > -------------------------------------------------------------------- > typedef unsigned long long ull; > typedef unsigned int ui; > > ull foo (ui x, ui y, ull z) > { > ull tmp = (ull) x * y; > return tmp + z; > } > -------------------------------------------------------------------- > > with a recent snapshot and "-O2 -fdump-tree-all" shows that the > final_cleanup dump does indeed have the combined form: > > -------------------------------------------------------------------- > ;; Function foo (foo) > > foo (x, y, z) > { > <bb 2>: > return (long long unsigned int) y * (long long unsigned int) x + z; > > } > -------------------------------------------------------------------- > > Ah I see. Fair enough. > I realise no-one else has spoken out in support of me, so perhaps > I'm in a minority of one here. But it does seem to me that in the > Tree-SSA world, it makes less sense to duplicate standard optabs > in the backend purely for the reason of keeping DImode arithmetic > around as DImode arithmetic for longer. > > >> in the short term we really do need to reenable madd/msub for MIPS32 >> targets in GCC 4. We could do that with a local patch to put back >> adddi3, but it would be better if we could coordinate this with you. >> > > If removing the patterns had been purely a clean-up, I would be more > open to the idea of putting the patterns back. But given that removing > the patterns had an optimisation benefit of its own, I'm less open to > the idea of adding them back, especially when (as far as I'm concerned) > there's a clean way of getting the best of both worlds. > > OK, so maybe as the person who removed adddi3 from the MIPS backend, and the main proponent of the new fused opcodes, you get to volunteer to implement this? :) In the meantime the performance gain from being able to use a widening madd is more important to us than the benefit of improved optimisation of 64-bit addition, so we'll probably have to put adddi3 back in as a local patch. Nigel ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [MIPS] MADD issue 2007-04-20 15:05 ` Nigel Stephens @ 2007-04-20 16:38 ` Richard Sandiford 2007-04-21 0:53 ` Fu, Chao-Ying 0 siblings, 1 reply; 18+ messages in thread From: Richard Sandiford @ 2007-04-20 16:38 UTC (permalink / raw) To: Nigel Stephens; +Cc: Fu, Chao-Ying, gcc, Thekkath, Radhika Nigel Stephens <nigel@mips.com> writes: > OK, so maybe as the person who removed adddi3 from the MIPS backend, and > the main proponent of the new fused opcodes, you get to volunteer to > implement this? :) Hey, I was pretty happy with the status quo ;) Richard ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [MIPS] MADD issue 2007-04-20 16:38 ` Richard Sandiford @ 2007-04-21 0:53 ` Fu, Chao-Ying 0 siblings, 0 replies; 18+ messages in thread From: Fu, Chao-Ying @ 2007-04-21 0:53 UTC (permalink / raw) To: Richard Sandiford, Stephens, Nigel; +Cc: gcc, Thekkath, Radhika [-- Attachment #1: Type: text/plain, Size: 1931 bytes --] Hi, How about a patch like this to support signed and unsigned multiplication and addition? We can extend to MSUB/MSUBU as well. Thanks! 2007-04-20 Chao-ying Fu <fu@mips.com> * optabs.c (init_optabs): Initialize smadd_widen_optab and umadd_widen_optab. * optabs.h (optab_index): Add OTI_smadd_widen and OTI_umadd_widen. (smadd_widen_optab, umadd_widen_optab): Define. * genopinit.c (optabs): Add smadd_widen_optab and usmadd_widen_optab. * expr.c (expand_expr_real_1): Add rtx of op2. Process PLUS_EXPR to generate multiplication and add instructions. * config/mips/mips-dspr2.md (maddsidi4, umaddsidi4): New patterns. Ex: (bug72.c) long long t1 (int a, int b, long long c) { return (long long)a * b + c; } unsigned long long t2 (unsigned int a, unsigned int b, unsigned long long c) { return (unsigned long long)a * b + c; } # <812> ~/dev/gcc4x/install32/bin/mipsisa32r2-elf-gcc -S -mdspr2 bug72.c -O t1: mtlo $7 mthi $6 madd $ac0,$5,$4 mflo $7 mfhi $6 move $2,$6 j $31 move $3,$7 t2: mtlo $7 mthi $6 maddu $ac0,$5,$4 mflo $7 mfhi $6 move $2,$6 j $31 move $3,$7 Regards, Chao-ying > -----Original Message----- > From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org]On Behalf Of > Richard Sandiford > Sent: Friday, April 20, 2007 8:05 AM > To: Stephens, Nigel > Cc: Fu, Chao-Ying; gcc@gcc.gnu.org; Thekkath, Radhika > Subject: Re: [MIPS] MADD issue > > > Nigel Stephens <nigel@mips.com> writes: > > OK, so maybe as the person who removed adddi3 from the MIPS > backend, and > > the main proponent of the new fused opcodes, you get to > volunteer to > > implement this? :) > > Hey, I was pretty happy with the status quo ;) > > Richard > [-- Attachment #2: gcc.diff --] [-- Type: application/octet-stream, Size: 6730 bytes --] Index: optabs.c =================================================================== --- optabs.c (revision 123998) +++ optabs.c (working copy) @@ -5439,6 +5439,8 @@ smul_widen_optab = init_optab (UNKNOWN); umul_widen_optab = init_optab (UNKNOWN); usmul_widen_optab = init_optab (UNKNOWN); + smadd_widen_optab = init_optab (UNKNOWN); + umadd_widen_optab = init_optab (UNKNOWN); sdiv_optab = init_optab (DIV); sdivv_optab = init_optabv (DIV); sdivmod_optab = init_optab (UNKNOWN); Index: optabs.h =================================================================== --- optabs.h (revision 123998) +++ optabs.h (working copy) @@ -86,6 +86,10 @@ OTI_umul_widen, /* Widening multiply of one unsigned and one signed operand. */ OTI_usmul_widen, + /* Signed multiply and add with result one machine mode wider than args */ + OTI_smadd_widen, + /* Unigned multiply and add with result one machine mode wider than args */ + OTI_umadd_widen, /* Signed divide */ OTI_sdiv, @@ -307,6 +311,8 @@ #define smul_widen_optab (optab_table[OTI_smul_widen]) #define umul_widen_optab (optab_table[OTI_umul_widen]) #define usmul_widen_optab (optab_table[OTI_usmul_widen]) +#define smadd_widen_optab (optab_table[OTI_smadd_widen]) +#define umadd_widen_optab (optab_table[OTI_umadd_widen]) #define sdiv_optab (optab_table[OTI_sdiv]) #define smulv_optab (optab_table[OTI_smulv]) #define sdivv_optab (optab_table[OTI_sdivv]) Index: genopinit.c =================================================================== --- genopinit.c (revision 123998) +++ genopinit.c (working copy) @@ -85,6 +85,8 @@ "smul_widen_optab->handlers[$B].insn_code = CODE_FOR_$(mul$a$b3$)$N", "umul_widen_optab->handlers[$B].insn_code = CODE_FOR_$(umul$a$b3$)$N", "usmul_widen_optab->handlers[$B].insn_code = CODE_FOR_$(usmul$a$b3$)$N", + "smadd_widen_optab->handlers[$B].insn_code = CODE_FOR_$(madd$a$b4$)$N", + "umadd_widen_optab->handlers[$B].insn_code = CODE_FOR_$(umadd$a$b4$)$N", "sdiv_optab->handlers[$A].insn_code = CODE_FOR_$(div$a3$)", "sdivv_optab->handlers[$A].insn_code = CODE_FOR_$(div$V$I$a3$)", "udiv_optab->handlers[$A].insn_code = CODE_FOR_$(udiv$I$a3$)", Index: expr.c =================================================================== --- expr.c (revision 123998) +++ expr.c (working copy) @@ -6824,7 +6824,7 @@ expand_expr_real_1 (tree exp, rtx target, enum machine_mode tmode, enum expand_modifier modifier, rtx *alt_rtl) { - rtx op0, op1, temp, decl_rtl; + rtx op0, op1, op2, temp, decl_rtl; tree type; int unsignedp; enum machine_mode mode; @@ -7977,6 +7977,80 @@ return op0; case PLUS_EXPR: + /* Check if this is a case for multiplication and addition. */ + if (TREE_CODE (type) == INTEGER_TYPE + && TREE_CODE (TREE_OPERAND (exp, 0)) == MULT_EXPR) + { + subexp0 = TREE_OPERAND (exp, 0); + tree subsubexp0 = TREE_OPERAND (subexp0, 0); + tree subsubexp1 = TREE_OPERAND (subexp0, 1); + enum tree_code code0 = TREE_CODE (subsubexp0); + enum tree_code code1 = TREE_CODE (subsubexp1); + if (code0 == NOP_EXPR && code1 == NOP_EXPR + && TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (subsubexp0, 0))) + < TYPE_PRECISION (TREE_TYPE (subsubexp0)) + && TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (subsubexp1, 0))) + < TYPE_PRECISION (TREE_TYPE (subsubexp1)) + && TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (subsubexp0, 0))) + == TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (subsubexp1, 0)))) + { + tree op0type = TREE_TYPE (TREE_OPERAND (subsubexp0, 0)); + enum machine_mode innermode = TYPE_MODE (op0type); + bool zextend_p = TYPE_UNSIGNED (op0type); + this_optab = zextend_p ? umadd_widen_optab : smadd_widen_optab; + if (mode == GET_MODE_2XWIDER_MODE (innermode) + && this_optab->handlers[(int) mode].insn_code + != CODE_FOR_nothing) + { + int icode = this_optab->handlers[(int) mode].insn_code; + enum machine_mode target_mode, mode0, mode1, mode2; + rtx xop0, xop1, xop2; + rtx last, pat, temp; + + expand_operands (TREE_OPERAND (subsubexp0, 0), + TREE_OPERAND (subsubexp1, 0), + NULL_RTX, &op0, &op1, EXPAND_NORMAL); + op2 = expand_expr (TREE_OPERAND (exp, 1), subtarget, + VOIDmode, 0); + target_mode = insn_data[icode].operand[0].mode; + mode0 = insn_data[icode].operand[1].mode; + mode1 = insn_data[icode].operand[2].mode; + mode2 = insn_data[icode].operand[3].mode; + xop0 = op0; + xop1 = op1; + xop2 = op2; + last = get_last_insn (); + + if (target && modifier != EXPAND_STACK_PARM) + temp = target; + else + temp = gen_reg_rtx (mode); + + if (!insn_data[icode].operand[0].predicate (temp, + target_mode)) + temp = gen_reg_rtx (target_mode); + if (!insn_data[icode].operand[1].predicate (xop0, mode0) + && mode0 != VOIDmode) + xop0 = copy_to_mode_reg (mode0, xop0); + if (!insn_data[icode].operand[2].predicate (xop1, mode1) + && mode1 != VOIDmode) + xop1 = copy_to_mode_reg (mode1, xop1); + if (!insn_data[icode].operand[3].predicate (xop2, mode2) + && mode2 != VOIDmode) + xop2 = copy_to_mode_reg (mode2, xop2); + pat = GEN_FCN (icode) (temp, xop0, xop1, xop2); + if (pat) + { + emit_insn (pat); + return REDUCE_BIT_FIELD (temp); + } + else + delete_insns_since (last); + } + } + } + + /* If we are adding a constant, a VAR_DECL that is sp, fp, or ap, and something else, make sure we add the register to the constant and then to the other thing. This case can occur during strength Index: config/mips/mips-dspr2.md =================================================================== --- config/mips/mips-dspr2.md (revision 123998) +++ config/mips/mips-dspr2.md (working copy) @@ -624,3 +624,29 @@ [(set_attr "type" "imadd") (set_attr "mode" "SI")]) +(define_insn "maddsidi4" + [(set (match_operand:DI 0 "register_operand" "=a") + (plus:DI + (mult:DI (sign_extend:DI + (match_operand:SI 1 "register_operand" "d")) + (sign_extend:DI + (match_operand:SI 2 "register_operand" "d"))) + (match_operand:DI 3 "register_operand" "0")))] + "TARGET_DSPR2 && !TARGET_64BIT" + "madd\t%q0,%1,%2" + [(set_attr "type" "imadd") + (set_attr "mode" "SI")]) + +(define_insn "umaddsidi4" + [(set (match_operand:DI 0 "register_operand" "=a") + (plus:DI + (mult:DI (zero_extend:DI + (match_operand:SI 1 "register_operand" "d")) + (zero_extend:DI + (match_operand:SI 2 "register_operand" "d"))) + (match_operand:DI 3 "register_operand" "0")))] + "TARGET_DSPR2 && !TARGET_64BIT" + "maddu\t%q0,%1,%2" + [(set_attr "type" "imadd") + (set_attr "mode" "SI")]) + ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [MIPS] MADD issue 2007-04-20 14:44 ` Richard Sandiford 2007-04-20 15:05 ` Nigel Stephens @ 2007-04-20 16:46 ` Ian Lance Taylor 2007-04-20 16:57 ` Richard Sandiford 1 sibling, 1 reply; 18+ messages in thread From: Ian Lance Taylor @ 2007-04-20 16:46 UTC (permalink / raw) To: Richard Sandiford; +Cc: Nigel Stephens, Fu, Chao-Ying, gcc, Thekkath, Radhika Richard Sandiford <richard@codesourcery.com> writes: > I realise no-one else has spoken out in support of me, so perhaps > I'm in a minority of one here. But it does seem to me that in the > Tree-SSA world, it makes less sense to duplicate standard optabs > in the backend purely for the reason of keeping DImode arithmetic > around as DImode arithmetic for longer. The main issue I know of is the RTL level loop optimizers (combine and CSE can mostly work off of REG_EQUAL notes). If you define_expand adddi3, they won't be able to handle loops using long long types. Whether this matters in practice for real code, I don't know. Certainly adddi3 and friends should not be straight define_insns, as they used to be for MIPS. With the lower-subreg pass, they should be either define_expand to individual insns or define_insn_and_split with an unconditional split before reload. Ian ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [MIPS] MADD issue 2007-04-20 16:46 ` Ian Lance Taylor @ 2007-04-20 16:57 ` Richard Sandiford 2007-04-20 17:00 ` Ian Lance Taylor 0 siblings, 1 reply; 18+ messages in thread From: Richard Sandiford @ 2007-04-20 16:57 UTC (permalink / raw) To: Ian Lance Taylor; +Cc: Nigel Stephens, Fu, Chao-Ying, gcc, Thekkath, Radhika Ian Lance Taylor <iant@google.com> writes: > Richard Sandiford <richard@codesourcery.com> writes: >> I realise no-one else has spoken out in support of me, so perhaps >> I'm in a minority of one here. But it does seem to me that in the >> Tree-SSA world, it makes less sense to duplicate standard optabs >> in the backend purely for the reason of keeping DImode arithmetic >> around as DImode arithmetic for longer. > > The main issue I know of is the RTL level loop optimizers (combine and > CSE can mostly work off of REG_EQUAL notes). If you define_expand > adddi3, they won't be able to handle loops using long long types. > Whether this matters in practice for real code, I don't know. My point was that I thought the interesting multi-word optimisations (including loop optimisations) ought to be done at the tree level instead, and that the main focus of the RTL optimisers ought to be optimising things after machine-specific information has been exposed. In contrast, the MIPS define_insn & define_splits existed specifically to avoid exposing machine-specific information to those optimisations. I'm not sure from your reply whether you disagree (although it sounds like you might). > Certainly adddi3 and friends should not be straight define_insns, as > they used to be for MIPS. With the lower-subreg pass, they should be > either define_expand to individual insns or define_insn_and_split with > an unconditional split before reload. Well, the old patterns had define_splits too. I don't think that was really the problem. Richard ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [MIPS] MADD issue 2007-04-20 16:57 ` Richard Sandiford @ 2007-04-20 17:00 ` Ian Lance Taylor 2007-04-20 17:06 ` Richard Sandiford 0 siblings, 1 reply; 18+ messages in thread From: Ian Lance Taylor @ 2007-04-20 17:00 UTC (permalink / raw) To: Richard Sandiford; +Cc: Nigel Stephens, Fu, Chao-Ying, gcc, Thekkath, Radhika Richard Sandiford <richard@codesourcery.com> writes: > Ian Lance Taylor <iant@google.com> writes: > > Richard Sandiford <richard@codesourcery.com> writes: > >> I realise no-one else has spoken out in support of me, so perhaps > >> I'm in a minority of one here. But it does seem to me that in the > >> Tree-SSA world, it makes less sense to duplicate standard optabs > >> in the backend purely for the reason of keeping DImode arithmetic > >> around as DImode arithmetic for longer. > > > > The main issue I know of is the RTL level loop optimizers (combine and > > CSE can mostly work off of REG_EQUAL notes). If you define_expand > > adddi3, they won't be able to handle loops using long long types. > > Whether this matters in practice for real code, I don't know. > > My point was that I thought the interesting multi-word optimisations > (including loop optimisations) ought to be done at the tree level instead, > and that the main focus of the RTL optimisers ought to be optimising > things after machine-specific information has been exposed. In contrast, > the MIPS define_insn & define_splits existed specifically to avoid > exposing machine-specific information to those optimisations. I'm not > sure from your reply whether you disagree (although it sounds like you might). I suppose I neither agree nor disagree. It's a matter for testing. It's clear that with our present scheme there are loop optimization opportunities at the RTL level in the form of hoisting new loop invariants created by expanding the addressing modes. And, of course, some machines have specific loop instructions which currently can only be handled at the RTL level. However, those should be more or less independent of adddi3. Ian ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [MIPS] MADD issue 2007-04-20 17:00 ` Ian Lance Taylor @ 2007-04-20 17:06 ` Richard Sandiford 0 siblings, 0 replies; 18+ messages in thread From: Richard Sandiford @ 2007-04-20 17:06 UTC (permalink / raw) To: Ian Lance Taylor; +Cc: Nigel Stephens, Fu, Chao-Ying, gcc, Thekkath, Radhika Ian Lance Taylor <iant@google.com> writes: > Richard Sandiford <richard@codesourcery.com> writes: > >> Ian Lance Taylor <iant@google.com> writes: >> > Richard Sandiford <richard@codesourcery.com> writes: >> >> I realise no-one else has spoken out in support of me, so perhaps >> >> I'm in a minority of one here. But it does seem to me that in the >> >> Tree-SSA world, it makes less sense to duplicate standard optabs >> >> in the backend purely for the reason of keeping DImode arithmetic >> >> around as DImode arithmetic for longer. >> > >> > The main issue I know of is the RTL level loop optimizers (combine and >> > CSE can mostly work off of REG_EQUAL notes). If you define_expand >> > adddi3, they won't be able to handle loops using long long types. >> > Whether this matters in practice for real code, I don't know. >> >> My point was that I thought the interesting multi-word optimisations >> (including loop optimisations) ought to be done at the tree level instead, >> and that the main focus of the RTL optimisers ought to be optimising >> things after machine-specific information has been exposed. In contrast, >> the MIPS define_insn & define_splits existed specifically to avoid >> exposing machine-specific information to those optimisations. I'm not >> sure from your reply whether you disagree (although it sounds like you might). > > I suppose I neither agree nor disagree. It's a matter for testing. > > It's clear that with our present scheme there are loop optimization > opportunities at the RTL level in the form of hoisting new loop > invariants created by expanding the addressing modes. And, of course, > some machines have specific loop instructions which currently can only > be handled at the RTL level. Right. In case my message might be interpreted as saying that we shouldn't have RTL loop optimisers, that wasn't the intention at all. MIPS definitely benefits from them with %hi accesses, etc. It was more the opposite: splitting the instructions early ought to give the RTL optimisers the opportunity to do more things that the tree optimisers simply couldn't. OTOH, trying to give the RTL optimisers the same sort of arithmetic operations that the tree level had seems like going out of our way to make the RTL optimisers repeat work. > However, those should be more or less independent of adddi3. Yeah, I hope so. Richard ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2007-04-20 23:59 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2007-04-13 0:26 [MIPS] MADD issue Fu, Chao-Ying 2007-04-13 13:07 ` Paolo Bonzini 2007-04-13 17:05 ` Ian Lance Taylor 2007-04-13 18:10 ` Paolo Bonzini 2007-04-13 18:33 ` Richard Sandiford 2007-04-13 14:25 ` Richard Sandiford 2007-04-13 15:23 ` Nigel Stephens 2007-04-13 15:34 ` Richard Sandiford 2007-04-13 16:19 ` Nigel Stephens 2007-04-20 14:32 ` Nigel Stephens 2007-04-20 14:44 ` Richard Sandiford 2007-04-20 15:05 ` Nigel Stephens 2007-04-20 16:38 ` Richard Sandiford 2007-04-21 0:53 ` Fu, Chao-Ying 2007-04-20 16:46 ` Ian Lance Taylor 2007-04-20 16:57 ` Richard Sandiford 2007-04-20 17:00 ` Ian Lance Taylor 2007-04-20 17:06 ` Richard Sandiford
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).