On 11/30/16 13:01, Wilco Dijkstra wrote: > Bernd Edlinger wrote: >> On 11/29/16 16:06, Wilco Dijkstra wrote: >>> Bernd Edlinger wrote: >>> >>> - "TARGET_32BIT && reload_completed >>> + "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed) >>> && ! (TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))" >>> >>> This is equivalent to "&& (!TARGET_IWMMXT || reload_completed)" since we're >>> already excluding NEON. >> >> Aehm, no. This would split the addi_neon insn before it is clear >> if the reload pass will assign a VFP register. > > Hmm that's strange... This instruction shouldn't be used to also split some random > Neon pattern - for example arm_subdi3 doesn't do the same. To understand and > reason about any of these complex patterns they should all work in the same way... > I was a bit surprised as well, when I saw that happen. But subdi3 is different: "TARGET_32BIT && !TARGET_NEON" "#" ; "subs\\t%Q0, %Q1, %Q2\;sbc\\t%R0, %R1, %R2" "&& reload_completed" so this never splits anything if TARGET_NEON. but adddi3 can not expand if TARGET_NEON but it's pattern simply looks exactly like the addi3_neon: (define_insn_and_split "*arm_adddi3" [(set (match_operand:DI 0 "s_register_operand" "=&r,&r,&r,&r,&r") (plus:DI (match_operand:DI 1 "s_register_operand" "%0, 0, r, 0, r") (match_operand:DI 2 "arm_adddi_operand" "r, 0, r, Dd, Dd"))) (clobber (reg:CC CC_REGNUM))] "TARGET_32BIT && !TARGET_NEON" "#" "TARGET_32BIT && reload_completed && ! (TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))" (define_insn "adddi3_neon" [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?w,?&r,?&r,?&r") (plus:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,w,r,0,r") (match_operand:DI 2 "arm_adddi_operand" "w,r,0,w,r,Dd,Dd"))) (clobber (reg:CC CC_REGNUM))] "TARGET_NEON" { switch (which_alternative) { case 0: /* fall through */ case 3: return "vadd.i64\t%P0, %P1, %P2"; case 1: return "#"; case 2: return "#"; case 4: return "#"; case 5: return "#"; case 6: return "#"; default: gcc_unreachable (); } Even the return "#" explicitly invokes the former pattern. So I think the author knew that, and did it on purpose. >> But when I make *arm_cmpdi_insn split early, it ICEs: > > (insn 4870 4869 1636 87 (set (scratch:SI) > (minus:SI (minus:SI (subreg:SI (reg:DI 2261) 4) > (subreg:SI (reg:DI 473 [ X$14 ]) 4)) > (ltu:SI (reg:CC_C 100 cc) > (const_int 0 [0])))) "pr77308-2.c":140 -1 > (nil)) > > That's easy, we don't have a sbcs , r1, r2 pattern. A quick workaround is > to create a temporary for operand[2] (if before reload) so it will match the standard > sbcs pattern, and then the split works fine. > >> So it is certainly possible, but not really simple to improve the >> stack size even further. But I would prefer to do that in a >> separate patch. > > Yes separate patches would be fine. However there is a lot of scope to improve this > further. For example after your patch shifts and logical operations are expanded in > expand, add/sub are in split1 after combine runs and everything else is split after > reload. It doesn't make sense to split different operations at different times - it means > you're still going to get the bad DImode subregs and miss lots of optimization > opportunities due to the mix of partly split and partly not-yet-split operations. > Yes. I did the add/sub differently because it was more easy this way, and it was simply sufficient to make the existing test cases happy. Also, the biggest benefit was IIRC from the very early splitting of the anddi/iordi/xordi patterns, because they have completely separate data flow in low and high parts. And that is not the case for the arihmetic patterns, but nevertheless they can still be optimized, preferably, when a new test case is found, that can demonstrate an improvement. I am not sure why the cmpdi pattern have an influence at all, because from the data flow you need all 64 bits of both sides. Nevertheless it is a fact: With the modified test case I get 264 bytes frame size, and that was 1920 before. I attached the completely untested follow-up patch now, but I would like to post that one again for review, after I applied my current patch, which is still waiting for final review (please feel pinged!). This is really exciting... Thanks Bernd.