* [RS6000] Fix PR61098, Poor code setting count register @ 2014-05-08 1:49 Alan Modra 2014-05-08 13:48 ` David Edelsohn 0 siblings, 1 reply; 12+ messages in thread From: Alan Modra @ 2014-05-08 1:49 UTC (permalink / raw) To: gcc-patches; +Cc: David Edelsohn On powerpc64, to set a large loop count we have code like the following after split1: (insn 67 14 68 4 (set (reg:DI 160) (const_int 99942400 [0x5f50000])) /home/amodra/unaligned_load.c:14 -1 (nil)) (insn 68 67 42 4 (set (reg:DI 160) (ior:DI (reg:DI 160) (const_int 57600 [0xe100]))) /home/amodra/unaligned_load.c:14 -1 (expr_list:REG_EQUAL (const_int 100000000 [0x5f5e100]) (nil))) and then test for loop exit with: (jump_insn 65 31 45 5 (parallel [ (set (pc) (if_then_else (ne (reg:DI 160) (const_int 1 [0x1])) (label_ref:DI 42) (pc))) (set (reg:DI 160) (plus:DI (reg:DI 160) (const_int -1 [0xffffffffffffffff]))) (clobber (scratch:CC)) (clobber (scratch:DI)) ]) /home/amodra/unaligned_load.c:15 800 {*ctrdi_internal1} (int_list:REG_BR_PROB 9899 (nil)) -> 42) The jump_insn of course is meant for use with bdnz, which implies a strong preference for reg 160 to live in the count register. Trouble is, the count register doesn't do arithmetic. So, use a new psuedo for intermediate results. On looking at this, I noticed the !TARGET_POWERPC64 code in rs6000_emit_set_long_const was broken, apparently expecting c1 and c2 to be the high and low 32 bits of the constant. That's no longer true, so I've fixed that as well. Bootstrapped and regression tested powerpc64-linux. OK for mainline and branches? PR target/61098 * config/rs6000/rs6000.c (rs6000_emit_set_const): Remove unneeded params and return value. Simplify. Update comment. (rs6000_emit_set_long_const): Remove unneeded param and return value. Correct !TARGET_POWERPC64 handling of constants > 2G. If we can, use a new pseudo for intermediate calculations. Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 209926) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -1068,7 +1069,7 @@ static tree rs6000_handle_longcall_attribute (tree static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool *); static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *); static tree rs6000_builtin_vectorized_libmass (tree, tree, tree); -static rtx rs6000_emit_set_long_const (rtx, HOST_WIDE_INT, HOST_WIDE_INT); +static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT); static int rs6000_memory_move_cost (enum machine_mode, reg_class_t, bool); static bool rs6000_debug_rtx_costs (rtx, int, int, int, int *, bool); static int rs6000_debug_address_cost (rtx, enum machine_mode, addr_space_t, @@ -7826,53 +7811,36 @@ rs6000_conditional_register_usage (void) } \f -/* Try to output insns to set TARGET equal to the constant C if it can - be done in less than N insns. Do all computations in MODE. - Returns the place where the output has been placed if it can be - done and the insns have been emitted. If it would take more than N - insns, zero is returned and no insns and emitted. */ +/* Output insns to set DEST equal to the constant SOURCE. */ -rtx -rs6000_emit_set_const (rtx dest, enum machine_mode mode, - rtx source, int n ATTRIBUTE_UNUSED) +void +rs6000_emit_set_const (rtx dest, rtx source) { - rtx result, insn, set; - HOST_WIDE_INT c0, c1; + enum machine_mode mode = GET_MODE (dest); + rtx temp, insn, set; + HOST_WIDE_INT c; + gcc_checking_assert (CONST_INT_P (source)); + c = INTVAL (source); switch (mode) { - case QImode: + case QImode: case HImode: - if (dest == NULL) - dest = gen_reg_rtx (mode); emit_insn (gen_rtx_SET (VOIDmode, dest, source)); - return dest; + return; case SImode: - result = !can_create_pseudo_p () ? dest : gen_reg_rtx (SImode); + temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (SImode); - emit_insn (gen_rtx_SET (VOIDmode, copy_rtx (result), - GEN_INT (INTVAL (source) - & (~ (HOST_WIDE_INT) 0xffff)))); + emit_insn (gen_rtx_SET (VOIDmode, copy_rtx (temp), + GEN_INT (c & (~ (HOST_WIDE_INT) 0xffff)))); emit_insn (gen_rtx_SET (VOIDmode, dest, - gen_rtx_IOR (SImode, copy_rtx (result), - GEN_INT (INTVAL (source) & 0xffff)))); - result = dest; + gen_rtx_IOR (SImode, copy_rtx (temp), + GEN_INT (c & 0xffff)))); break; case DImode: - switch (GET_CODE (source)) - { - case CONST_INT: - c0 = INTVAL (source); - c1 = -(c0 < 0); - break; - - default: - gcc_unreachable (); - } - - result = rs6000_emit_set_long_const (dest, c0, c1); + rs6000_emit_set_long_const (dest, c); break; default: @@ -7883,37 +7851,38 @@ rs6000_conditional_register_usage (void) set = single_set (insn); if (! CONSTANT_P (SET_SRC (set))) set_unique_reg_note (insn, REG_EQUAL, source); - - return result; } -/* Having failed to find a 3 insn sequence in rs6000_emit_set_const, - fall back to a straight forward decomposition. We do this to avoid - exponential run times encountered when looking for longer sequences - with rs6000_emit_set_const. */ -static rtx -rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c1, HOST_WIDE_INT c2) +/* Output insns to set a DImode DEST equal to the constant C. */ + +static void +rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c) { + rtx temp; + if (!TARGET_POWERPC64) { - rtx operand1, operand2; + rtx hi, lo; - operand1 = operand_subword_force (dest, WORDS_BIG_ENDIAN == 0, - DImode); - operand2 = operand_subword_force (copy_rtx (dest), WORDS_BIG_ENDIAN != 0, - DImode); - emit_move_insn (operand1, GEN_INT (c1)); - emit_move_insn (operand2, GEN_INT (c2)); + hi = operand_subword_force (copy_rtx (dest), WORDS_BIG_ENDIAN == 0, + DImode); + lo = operand_subword_force (dest, WORDS_BIG_ENDIAN != 0, + DImode); + emit_move_insn (hi, GEN_INT (c >> 32)); + c = ((c & 0xffffffff) ^ 0x80000000) - 0x80000000; + emit_move_insn (lo, GEN_INT (c)); } else { HOST_WIDE_INT ud1, ud2, ud3, ud4; - ud1 = c1 & 0xffff; - ud2 = (c1 & 0xffff0000) >> 16; - c2 = c1 >> 32; - ud3 = c2 & 0xffff; - ud4 = (c2 & 0xffff0000) >> 16; + ud1 = c & 0xffff; + c = c >> 16; + ud2 = c & 0xffff; + c = c >> 16; + ud3 = c & 0xffff; + c = c >> 16; + ud4 = c & 0xffff; if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000)) || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000))) @@ -7922,67 +7891,74 @@ rs6000_conditional_register_usage (void) else if ((ud4 == 0xffff && ud3 == 0xffff && (ud2 & 0x8000)) || (ud4 == 0 && ud3 == 0 && ! (ud2 & 0x8000))) { - emit_move_insn (dest, GEN_INT (((ud2 << 16) ^ 0x80000000) - - 0x80000000)); + temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); + + emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest, + GEN_INT (((ud2 << 16) ^ 0x80000000) - 0x80000000)); if (ud1 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), + emit_move_insn (dest, + gen_rtx_IOR (DImode, copy_rtx (temp), GEN_INT (ud1))); } else if (ud3 == 0 && ud4 == 0) { + temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); + gcc_assert (ud2 & 0x8000); - emit_move_insn (dest, GEN_INT (((ud2 << 16) ^ 0x80000000) - - 0x80000000)); + emit_move_insn (copy_rtx (temp), + GEN_INT (((ud2 << 16) ^ 0x80000000) - 0x80000000)); if (ud1 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), + emit_move_insn (copy_rtx (temp), + gen_rtx_IOR (DImode, copy_rtx (temp), GEN_INT (ud1))); - emit_move_insn (copy_rtx (dest), + emit_move_insn (dest, gen_rtx_ZERO_EXTEND (DImode, gen_lowpart (SImode, - copy_rtx (dest)))); + copy_rtx (temp)))); } else if ((ud4 == 0xffff && (ud3 & 0x8000)) || (ud4 == 0 && ! (ud3 & 0x8000))) { - emit_move_insn (dest, GEN_INT (((ud3 << 16) ^ 0x80000000) - - 0x80000000)); + temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); + + emit_move_insn (copy_rtx (temp), + GEN_INT (((ud3 << 16) ^ 0x80000000) - 0x80000000)); if (ud2 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), + emit_move_insn (copy_rtx (temp), + gen_rtx_IOR (DImode, copy_rtx (temp), GEN_INT (ud2))); - emit_move_insn (copy_rtx (dest), - gen_rtx_ASHIFT (DImode, copy_rtx (dest), + emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest, + gen_rtx_ASHIFT (DImode, copy_rtx (temp), GEN_INT (16))); if (ud1 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), + emit_move_insn (dest, + gen_rtx_IOR (DImode, copy_rtx (temp), GEN_INT (ud1))); } else { - emit_move_insn (dest, GEN_INT (((ud4 << 16) ^ 0x80000000) - - 0x80000000)); + temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); + + emit_move_insn (copy_rtx (temp), + GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000)); if (ud3 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), + emit_move_insn (copy_rtx (temp), + gen_rtx_IOR (DImode, copy_rtx (temp), GEN_INT (ud3))); - emit_move_insn (copy_rtx (dest), - gen_rtx_ASHIFT (DImode, copy_rtx (dest), + emit_move_insn (ud2 != 0 || ud1 != 0 ? copy_rtx (temp) : dest, + gen_rtx_ASHIFT (DImode, copy_rtx (temp), GEN_INT (32))); if (ud2 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), + emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest, + gen_rtx_IOR (DImode, copy_rtx (temp), GEN_INT (ud2 << 16))); if (ud1 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), + emit_move_insn (dest, + gen_rtx_IOR (DImode, copy_rtx (temp), GEN_INT (ud1))); } } - return dest; } /* Helper for the following. Get rid of [r+r] memory refs Index: gcc/config/rs6000/rs6000-protos.h =================================================================== --- gcc/config/rs6000/rs6000-protos.h (revision 209926) +++ gcc/config/rs6000/rs6000-protos.h (working copy) @@ -114,7 +114,7 @@ extern void rs6000_emit_cbranch (enum machine_mode extern char * output_cbranch (rtx, const char *, int, rtx); extern char * output_e500_flip_gt_bit (rtx, rtx); extern const char * output_probe_stack_range (rtx, rtx); -extern rtx rs6000_emit_set_const (rtx, enum machine_mode, rtx, int); +extern void rs6000_emit_set_const (rtx, rtx); extern int rs6000_emit_cmove (rtx, rtx, rtx, rtx); extern int rs6000_emit_vector_cond_expr (rtx, rtx, rtx, rtx, rtx, rtx); extern void rs6000_emit_minmax (rtx, enum rtx_code, rtx, rtx); Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (revision 209926) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -8978,12 +8978,8 @@ (ior:SI (match_dup 0) (match_dup 3)))] " -{ rtx tem = rs6000_emit_set_const (operands[0], SImode, operands[1], 2); - - if (tem == operands[0]) - DONE; - else - FAIL; +{ rs6000_emit_set_const (operands[0], operands[1]); + DONE; }") (define_insn "*mov<mode>_internal2" @@ -10326,12 +10322,8 @@ [(set (match_dup 0) (match_dup 2)) (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 3)))] " -{ rtx tem = rs6000_emit_set_const (operands[0], DImode, operands[1], 5); - - if (tem == operands[0]) - DONE; - else - FAIL; +{ rs6000_emit_set_const (operands[0], operands[1]); + DONE; }") (define_split @@ -10341,12 +10333,8 @@ [(set (match_dup 0) (match_dup 2)) (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 3)))] " -{ rtx tem = rs6000_emit_set_const (operands[0], DImode, operands[1], 5); - - if (tem == operands[0]) - DONE; - else - FAIL; +{ rs6000_emit_set_const (operands[0], operands[1]); + DONE; }") \f ;; TImode/PTImode is similar, except that we usually want to compute the -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RS6000] Fix PR61098, Poor code setting count register 2014-05-08 1:49 [RS6000] Fix PR61098, Poor code setting count register Alan Modra @ 2014-05-08 13:48 ` David Edelsohn 2014-05-09 2:41 ` Alan Modra 0 siblings, 1 reply; 12+ messages in thread From: David Edelsohn @ 2014-05-08 13:48 UTC (permalink / raw) To: GCC Patches, David Edelsohn On Wed, May 7, 2014 at 9:48 PM, Alan Modra <amodra@gmail.com> wrote: > On powerpc64, to set a large loop count we have code like the > following after split1: > > (insn 67 14 68 4 (set (reg:DI 160) > (const_int 99942400 [0x5f50000])) /home/amodra/unaligned_load.c:14 -1 > (nil)) > (insn 68 67 42 4 (set (reg:DI 160) > (ior:DI (reg:DI 160) > (const_int 57600 [0xe100]))) /home/amodra/unaligned_load.c:14 -1 > (expr_list:REG_EQUAL (const_int 100000000 [0x5f5e100]) > (nil))) > > and then test for loop exit with: > > (jump_insn 65 31 45 5 (parallel [ > (set (pc) > (if_then_else (ne (reg:DI 160) > (const_int 1 [0x1])) > (label_ref:DI 42) > (pc))) > (set (reg:DI 160) > (plus:DI (reg:DI 160) > (const_int -1 [0xffffffffffffffff]))) > (clobber (scratch:CC)) > (clobber (scratch:DI)) > ]) /home/amodra/unaligned_load.c:15 800 {*ctrdi_internal1} > (int_list:REG_BR_PROB 9899 (nil)) > -> 42) > > The jump_insn of course is meant for use with bdnz, which implies a > strong preference for reg 160 to live in the count register. Trouble > is, the count register doesn't do arithmetic. > > So, use a new psuedo for intermediate results. On looking at this, > I noticed the !TARGET_POWERPC64 code in rs6000_emit_set_long_const was > broken, apparently expecting c1 and c2 to be the high and low 32 bits > of the constant. That's no longer true, so I've fixed that as well. > Bootstrapped and regression tested powerpc64-linux. OK for mainline > and branches? > > PR target/61098 > * config/rs6000/rs6000.c (rs6000_emit_set_const): Remove unneeded > params and return value. Simplify. Update comment. > (rs6000_emit_set_long_const): Remove unneeded param and return > value. Correct !TARGET_POWERPC64 handling of constants > 2G. > If we can, use a new pseudo for intermediate calculations. Alan, The history is 32 bit HWI. The ChangeLog does not mention the changes to rs6000.md nor rs6000-protos.h. Please do not remove all of the comments from the two functions. The comments should provide some documentation about the different purposes of the two functions other than setting DEST to a CONST. Why did you remove the test for NULL dest? - if (dest == NULL) - dest = gen_reg_rtx (mode); That could occur, at least it used to occur. I think that the way you rearranged the invocations of copy_rtx() in rs6000_emit_set_long_const() is okay, but it would be good for someone else to double check. Thanks, David ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RS6000] Fix PR61098, Poor code setting count register 2014-05-08 13:48 ` David Edelsohn @ 2014-05-09 2:41 ` Alan Modra 2014-05-11 2:24 ` David Edelsohn 0 siblings, 1 reply; 12+ messages in thread From: Alan Modra @ 2014-05-09 2:41 UTC (permalink / raw) To: David Edelsohn; +Cc: GCC Patches On Thu, May 08, 2014 at 09:48:35AM -0400, David Edelsohn wrote: > The history is 32 bit HWI. Right. > The ChangeLog does not mention the changes to rs6000.md nor rs6000-protos.h. Oops, added. * config/rs6000/rs6000.md (movsi_internal1_single+1): Update call to rs6000_emit_set_const in splitter. (movdi_internal64+2, +3): Likewise. * config/rs6000/rs6000-protos.h (rs6000_emit_set_const): Update prototype. > Please do not remove all of the comments from the two functions. The > comments should provide some documentation about the different > purposes of the two functions other than setting DEST to a CONST. I believe my updated comment covers the complete purpose of the function nowadays. The comments I removed are out-dated, and should have been removed a long time ago.. rs6000_emit_set_const does not even look at N, it always returns a non-zero result, and the return is only tested for non-zero. I removed MODE too, because that is always the same as GET_MODE (dest). > Why did you remove the test for NULL dest? > > - if (dest == NULL) > - dest = gen_reg_rtx (mode); > > That could occur, at least it used to occur. I'm sure we can't get a NULL dest nowadays. All (three) uses of rs6000_emit_set_const occur in splitters. They all must have passed a gpc_reg_operand constraint on operands[0] before calling rs6000_emit_set_const, so if NULL were possible we'd segfault in gpc_reg_operand. > I think that the way you rearranged the invocations of copy_rtx() in > rs6000_emit_set_long_const() is okay, but it would be good for someone > else to double check. Yeah, that function is a bit messy. I took the approach of always use a bare "dest" once in the last instruction emitted, with every other use getting hit with copy_rtx. The previous approach was similar, but used the bare "dest" on the first instruction emitted. Obviously you don't need copy_rtx anywhere with the new code when can_create_pseudo_p is true, but I felt it wasn't worth optimising that for the added source complication. -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RS6000] Fix PR61098, Poor code setting count register 2014-05-09 2:41 ` Alan Modra @ 2014-05-11 2:24 ` David Edelsohn 2014-05-11 22:53 ` Alan Modra 2014-05-14 3:05 ` Alan Modra 0 siblings, 2 replies; 12+ messages in thread From: David Edelsohn @ 2014-05-11 2:24 UTC (permalink / raw) To: GCC Patches, Alan Modra On Thu, May 8, 2014 at 10:40 PM, Alan Modra <amodra@gmail.com> wrote: >> Please do not remove all of the comments from the two functions. The >> comments should provide some documentation about the different >> purposes of the two functions other than setting DEST to a CONST. > > I believe my updated comment covers the complete purpose of the > function nowadays. The comments I removed are out-dated, and should > have been removed a long time ago.. rs6000_emit_set_const does not > even look at N, it always returns a non-zero result, and the return is > only tested for non-zero. I removed MODE too, because that is always > the same as GET_MODE (dest). It is helpful if the comment expresses more than restating the information one can glean from the function name. It's useful to note that rs6000_emit_set_long_const is a standard decomposition with a bounded number of instructions. >> I think that the way you rearranged the invocations of copy_rtx() in >> rs6000_emit_set_long_const() is okay, but it would be good for someone >> else to double check. > > Yeah, that function is a bit messy. I took the approach of always use > a bare "dest" once in the last instruction emitted, with every other > use getting hit with copy_rtx. The previous approach was similar, > but used the bare "dest" on the first instruction emitted. Obviously > you don't need copy_rtx anywhere with the new code when > can_create_pseudo_p is true, but I felt it wasn't worth optimising > that for the added source complication. Can you help clarify the removal of the code that tests if the splitter failed? The splitters in the Alpha port follow mostly the same rhythm, with a little bit of further cleanup and consolidation relative to the rs6000 port. alpha_split_const_mov() falls back on alpha_emit_set_long_const(), but checks that the target is valid and allows the splitter to fail. Either the Alpha port is doing unnecessary work or this cleanup patch is too aggressive. Either way, a comment seems necessary. Thanks, David ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RS6000] Fix PR61098, Poor code setting count register 2014-05-11 2:24 ` David Edelsohn @ 2014-05-11 22:53 ` Alan Modra 2014-05-11 23:39 ` Alan Modra 2014-05-14 3:05 ` Alan Modra 1 sibling, 1 reply; 12+ messages in thread From: Alan Modra @ 2014-05-11 22:53 UTC (permalink / raw) To: David Edelsohn; +Cc: GCC Patches On Sat, May 10, 2014 at 10:24:34PM -0400, David Edelsohn wrote: > On Thu, May 8, 2014 at 10:40 PM, Alan Modra <amodra@gmail.com> wrote: > > rs6000_emit_set_const ... always returns a non-zero result ... > > Can you help clarify the removal of the code that tests if the > splitter failed? See above. -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RS6000] Fix PR61098, Poor code setting count register 2014-05-11 22:53 ` Alan Modra @ 2014-05-11 23:39 ` Alan Modra 0 siblings, 0 replies; 12+ messages in thread From: Alan Modra @ 2014-05-11 23:39 UTC (permalink / raw) To: David Edelsohn, GCC Patches On Mon, May 12, 2014 at 08:23:16AM +0930, Alan Modra wrote: > On Sat, May 10, 2014 at 10:24:34PM -0400, David Edelsohn wrote: > > On Thu, May 8, 2014 at 10:40 PM, Alan Modra <amodra@gmail.com> wrote: > > > rs6000_emit_set_const ... always returns a non-zero result ... > > > > Can you help clarify the removal of the code that tests if the > > splitter failed? > > See above. On thinking some more, let me retract the patch for the time being. While it's true that I was only removing dead code in the splitters, the question of why this code has become dead is worth looking into. I suspect a previous patch to rs6000_emit_set_const was wrong, and that we should in fact be failing before reload (but never after), when an expansion would take too many instructions. "Too many" being a sequence that is slower than loading a 64-bit constant from the TOC. We try to make that sort of tradeoff in rs6000_emit_move (see num_insn_constant call), but that is really too early. Some manipulations of code modify constants. I've seen cases where rs6000_emit_move decided to inline a constant, but later changes to the value meant the expansion was five instructions. -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RS6000] Fix PR61098, Poor code setting count register 2014-05-11 2:24 ` David Edelsohn 2014-05-11 22:53 ` Alan Modra @ 2014-05-14 3:05 ` Alan Modra 2014-05-14 3:46 ` David Edelsohn 1 sibling, 1 reply; 12+ messages in thread From: Alan Modra @ 2014-05-14 3:05 UTC (permalink / raw) To: David Edelsohn; +Cc: GCC Patches On Sat, May 10, 2014 at 10:24:34PM -0400, David Edelsohn wrote: > On Thu, May 8, 2014 at 10:40 PM, Alan Modra <amodra@gmail.com> wrote: > > >> Please do not remove all of the comments from the two functions. The > >> comments should provide some documentation about the different > >> purposes of the two functions other than setting DEST to a CONST. > > > > I believe my updated comment covers the complete purpose of the > > function nowadays. The comments I removed are out-dated, and should > > have been removed a long time ago.. rs6000_emit_set_const does not > > even look at N, it always returns a non-zero result, and the return is > > only tested for non-zero. I removed MODE too, because that is always > > the same as GET_MODE (dest). > > It is helpful if the comment expresses more than restating the > information one can glean from the function name. It's useful to note > that rs6000_emit_set_long_const is a standard decomposition with a > bounded number of instructions. > > >> I think that the way you rearranged the invocations of copy_rtx() in > >> rs6000_emit_set_long_const() is okay, but it would be good for someone > >> else to double check. > > > > Yeah, that function is a bit messy. I took the approach of always use > > a bare "dest" once in the last instruction emitted, with every other > > use getting hit with copy_rtx. The previous approach was similar, > > but used the bare "dest" on the first instruction emitted. Obviously > > you don't need copy_rtx anywhere with the new code when > > can_create_pseudo_p is true, but I felt it wasn't worth optimising > > that for the added source complication. > > Can you help clarify the removal of the code that tests if the > splitter failed? The splitters in the Alpha port follow mostly the > same rhythm, with a little bit of further cleanup and consolidation > relative to the rs6000 port. alpha_split_const_mov() falls back on > alpha_emit_set_long_const(), but checks that the target is valid and > allows the splitter to fail. Either the Alpha port is doing > unnecessary work or this cleanup patch is too aggressive. Either way, > a comment seems necessary. OK, I've had a good look at the history of this code. rs6000_emit_set_const and rs6000_emit_set_long_const were introduced with revision 44516, a largish patch by Dan Berlin. As you hint above, it seems the functions were copied from alpha. So the parameters were unnecessary and the comments just plain wrong for the rs6000 version of code right from the initial commit. Worse, only half of necessary infrastructure was copied from alpha.. So let me lay out what I believe should be happening with (set (reg) (constant)) At expand time, if the above set can't be implemented in a single instruction, then it should be decomposed to the equivalent set high part, ori low part, and possibly shift instructions so long as the resulting sequence is small. I think we basically do this correctly in rs6000_emit_move. See the num_insns_constant call there. Constants that can't be evaluated inline by two (or three) instructions will be replaced with a load from the TOC. The same thing ought to happen in the splitters that use rs6000_emit_set_const. rs6000_emit_set_const should refuse to expand to too many instructions (just like alpha). We don't do this, but if we did, this would leave some (set (reg) (constant)) instructions in the RTL. Alternatively the splitters could generate loads from the TOC, but see pr57836, which shows the loads from the TOC we crafted at expand time being reduced back to (set (reg) (constant)). Finally, at reload time, any remaining (set (reg) (constant)) (ie. those that result in a long inline sequence) should be forced to the TOC. This is the missing part of the infrastructure that wasn't copied from alpha. Our legitimate_constant_p needs to reject some constants.. As it is, reload simply expands to a four or five inline instruction sequence. David, I'd like some help with the legitimate_constant_p implementation. I have something that seems to work (not yet regression tested) but there are a number of things that I'm not clear on (eg. the revision 20229 change) so likely will get it wrong. -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RS6000] Fix PR61098, Poor code setting count register 2014-05-14 3:05 ` Alan Modra @ 2014-05-14 3:46 ` David Edelsohn 2014-05-14 9:56 ` Alan Modra 0 siblings, 1 reply; 12+ messages in thread From: David Edelsohn @ 2014-05-14 3:46 UTC (permalink / raw) To: GCC Patches, Alan Modra Alan, Danny may have re-organized the code, but I thought that it originally came from Tom Rixx, if not earlier. I seem to remember problems in the past with late creation of TOC entries for constants causing problems, so it was easier to fall back to materializing all integer constants inline. I don't remember the PRs, but I think there were issues with creating a TOC if the late constant were the only TOC reference, or maybe the issue was buying a stack frame to materialize the TOC/GOT for a late constant. And maximum 5 instruction sequence is not really bad relative to a load from the TOC (even with medium model / data in TOC). There are a lot of trade-offs with respect to I$ expansion versus the load hitting in the L1 D$. Alpha emit_set_const does limit the number of instructions, but that is a search for a more efficient sequence than the naive sequence. The Alpha splitters use alpha_split_const_mov(), which tries alpha_emit_set_const() for an efficient sequence and then falls back to alpha_emit_set_long_const() for a naive sequence. Alpha uses PLUS instead of IOR because of the way its ISA works. alpha_emit_set_long_const() always will materialize the constant and does not check for a maximum number of instructions. This is why it's comment says "fall back to straight forward decomposition". However, alpha_emit_set_long_const() and alpha_split_const_mov() can fail, presumably because emit_move_insn() fails, not because of reaching a maximum number of instructions. alpha_legitimate_constant_p() rejecs expensive constants early. Once the splitter is invoked, it always tries to materialize the constant, but the splitter apparently can fail for other reasons. I don't mind exploring the benefits of tighening up rs6000_legitimate_const(), but I'm not sure it's an obvious win, especially with the potential failure corner cases. However, I want to have a better understanding about the part of the patch that removes the FAIL path from the splitters. Thanks, David On Tue, May 13, 2014 at 11:04 PM, Alan Modra <amodra@gmail.com> wrote: > On Sat, May 10, 2014 at 10:24:34PM -0400, David Edelsohn wrote: >> On Thu, May 8, 2014 at 10:40 PM, Alan Modra <amodra@gmail.com> wrote: >> >> >> Please do not remove all of the comments from the two functions. The >> >> comments should provide some documentation about the different >> >> purposes of the two functions other than setting DEST to a CONST. >> > >> > I believe my updated comment covers the complete purpose of the >> > function nowadays. The comments I removed are out-dated, and should >> > have been removed a long time ago.. rs6000_emit_set_const does not >> > even look at N, it always returns a non-zero result, and the return is >> > only tested for non-zero. I removed MODE too, because that is always >> > the same as GET_MODE (dest). >> >> It is helpful if the comment expresses more than restating the >> information one can glean from the function name. It's useful to note >> that rs6000_emit_set_long_const is a standard decomposition with a >> bounded number of instructions. >> >> >> I think that the way you rearranged the invocations of copy_rtx() in >> >> rs6000_emit_set_long_const() is okay, but it would be good for someone >> >> else to double check. >> > >> > Yeah, that function is a bit messy. I took the approach of always use >> > a bare "dest" once in the last instruction emitted, with every other >> > use getting hit with copy_rtx. The previous approach was similar, >> > but used the bare "dest" on the first instruction emitted. Obviously >> > you don't need copy_rtx anywhere with the new code when >> > can_create_pseudo_p is true, but I felt it wasn't worth optimising >> > that for the added source complication. >> >> Can you help clarify the removal of the code that tests if the >> splitter failed? The splitters in the Alpha port follow mostly the >> same rhythm, with a little bit of further cleanup and consolidation >> relative to the rs6000 port. alpha_split_const_mov() falls back on >> alpha_emit_set_long_const(), but checks that the target is valid and >> allows the splitter to fail. Either the Alpha port is doing >> unnecessary work or this cleanup patch is too aggressive. Either way, >> a comment seems necessary. > > OK, I've had a good look at the history of this code. > > rs6000_emit_set_const and rs6000_emit_set_long_const were introduced > with revision 44516, a largish patch by Dan Berlin. As you hint > above, it seems the functions were copied from alpha. So the > parameters were unnecessary and the comments just plain wrong for the > rs6000 version of code right from the initial commit. Worse, only > half of necessary infrastructure was copied from alpha.. > > So let me lay out what I believe should be happening with > (set (reg) (constant)) > > At expand time, if the above set can't be implemented in a single > instruction, then it should be decomposed to the equivalent set high > part, ori low part, and possibly shift instructions so long as the > resulting sequence is small. I think we basically do this correctly > in rs6000_emit_move. See the num_insns_constant call there. > Constants that can't be evaluated inline by two (or three) > instructions will be replaced with a load from the TOC. > > The same thing ought to happen in the splitters that use > rs6000_emit_set_const. rs6000_emit_set_const should refuse to expand > to too many instructions (just like alpha). We don't do this, but if > we did, this would leave some (set (reg) (constant)) instructions in > the RTL. Alternatively the splitters could generate loads from the > TOC, but see pr57836, which shows the loads from the TOC we crafted > at expand time being reduced back to (set (reg) (constant)). > > Finally, at reload time, any remaining (set (reg) (constant)) > (ie. those that result in a long inline sequence) should be forced to > the TOC. This is the missing part of the infrastructure that wasn't > copied from alpha. Our legitimate_constant_p needs to reject some > constants.. As it is, reload simply expands to a four or five > inline instruction sequence. > > David, I'd like some help with the legitimate_constant_p > implementation. I have something that seems to work (not yet > regression tested) but there are a number of things that I'm not clear > on (eg. the revision 20229 change) so likely will get it wrong. > > -- > Alan Modra > Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RS6000] Fix PR61098, Poor code setting count register 2014-05-14 3:46 ` David Edelsohn @ 2014-05-14 9:56 ` Alan Modra 2014-05-14 21:27 ` David Edelsohn 2014-05-23 15:23 ` Alan Modra 0 siblings, 2 replies; 12+ messages in thread From: Alan Modra @ 2014-05-14 9:56 UTC (permalink / raw) To: David Edelsohn; +Cc: GCC Patches Hi David, On Tue, May 13, 2014 at 11:46:20PM -0400, David Edelsohn wrote: > Danny may have re-organized the code, but I thought that it originally > came from Tom Rixx, if not earlier. OK, I'm not trying to apportion blame. My name is on plenty of dodgy code in the rs6000 backend too. :) > I seem to remember problems in the past with late creation of TOC > entries for constants causing problems, so it was easier to fall back > to materializing all integer constants inline. I don't remember the > PRs, but I think there were issues with creating a TOC if the late > constant were the only TOC reference, or maybe the issue was buying a > stack frame to materialize the TOC/GOT for a late constant. And > maximum 5 instruction sequence is not really bad relative to a load > from the TOC (even with medium model / data in TOC). There are a lot > of trade-offs with respect to I$ expansion versus the load hitting in > the L1 D$. Sure, but Steve will tell you that the 5 instruction sequence is both slower due to all the dependent ops, and results in larger code+data size. We definitely want to avoid it if possible, and pr67836 shows a case taken from glibc math library code where there should be no problem in using the TOC. > Alpha emit_set_const does limit the number of instructions, but that > is a search for a more efficient sequence than the naive sequence. The > Alpha splitters use alpha_split_const_mov(), which tries > alpha_emit_set_const() for an efficient sequence and then falls back > to alpha_emit_set_long_const() for a naive sequence. Alpha uses PLUS > instead of IOR because of the way its ISA works. > alpha_emit_set_long_const() always will materialize the constant and > does not check for a maximum number of instructions. This is why it's > comment says "fall back to straight forward decomposition". > > However, alpha_emit_set_long_const() and alpha_split_const_mov() can > fail, presumably because emit_move_insn() fails, not because of > reaching a maximum number of instructions. > > alpha_legitimate_constant_p() rejecs expensive constants early. Once > the splitter is invoked, it always tries to materialize the constant, > but the splitter apparently can fail for other reasons. No, that is wrong. alpha_emit_set_const does *not* always try to materialize the constant inline. It does so for constants that need more than three instructions only when TARGET_BUILD_CONSTANTS. > I don't mind exploring the benefits of tighening up > rs6000_legitimate_const(), but I'm not sure it's an obvious win, > especially with the potential failure corner cases. Yes, those potential corner cases have me worried too.. > However, I want to have a better understanding about the part of the > patch that removes the FAIL path from the splitters. That part was really quite simple. I was removing dead code. rs6000_emit_set_const has never returned anything but DEST, right from the initial commit. It can't be called with DEST == NULL, so "dest = gen_reg_rtx (mode)" is also dead code. However, I retracted that patch because I now think rs6000_emit_set_const should in fact sometimes result in the splitter failing, exactly as is done in the alpha port. -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RS6000] Fix PR61098, Poor code setting count register 2014-05-14 9:56 ` Alan Modra @ 2014-05-14 21:27 ` David Edelsohn 2014-05-23 15:23 ` Alan Modra 1 sibling, 0 replies; 12+ messages in thread From: David Edelsohn @ 2014-05-14 21:27 UTC (permalink / raw) To: GCC Patches On Wed, May 14, 2014 at 5:56 AM, Alan Modra <amodra@gmail.com> wrote: >> I seem to remember problems in the past with late creation of TOC >> entries for constants causing problems, so it was easier to fall back >> to materializing all integer constants inline. I don't remember the >> PRs, but I think there were issues with creating a TOC if the late >> constant were the only TOC reference, or maybe the issue was buying a >> stack frame to materialize the TOC/GOT for a late constant. And >> maximum 5 instruction sequence is not really bad relative to a load >> from the TOC (even with medium model / data in TOC). There are a lot >> of trade-offs with respect to I$ expansion versus the load hitting in >> the L1 D$. > > Sure, but Steve will tell you that the 5 instruction sequence is both > slower due to all the dependent ops, and results in larger code+data > size. We definitely want to avoid it if possible, and pr67836 shows a > case taken from glibc math library code where there should be no > problem in using the TOC. I don't necessarily believe this is a win overall. If the constant reliably is in the L1 D$ (or maybe L2 D$) and accessed with a direct load (data in TOC or medium model), then yes. If it's farther away in the memory hierarchy, then it's not a win. I agree about the code expansion concern, which has its own secondary effects. If this is a constant in a tight loop, okay, but if it's a unique constant, it may not occur elsewhere in the code to be shared and may not be placed in the same cache line as other, recently accessed constants. This would push the load to L3 or farther. Also, remember that this same heuristic is used by AIX, which still defaults to small TOC model. So either the constant is in the TOC anchor constant pool, which hopefully will pre-load the anchor, or will be a constant in the TOC, possibly putting more pressure on TOC size and causing overflow. I am certain that there are anecdotal examples where it is a win for PPC64 Linux, but I would want more evidence that it's a general win. >> alpha_emit_set_long_const() always will materialize the constant and >> does not check for a maximum number of instructions. This is why it's >> comment says "fall back to straight forward decomposition". > No, that is wrong. alpha_emit_set_const does *not* always try to > materialize the constant inline. It does so for constants that need > more than three instructions only when TARGET_BUILD_CONSTANTS. I said that alpha_emit_set_long_const() always materializes the constant, but, as you say, it is not always called. alpha_emit_set_const() may fail if it requires too many instructions or the search depth is too deep. You seem to be referring to some of the logic in alpha_split_const_mov() as well. Again, this definitely is worth exploring. And I am confident that there are cases where loading the constant from memory is a win. I just don't have a good instinct if it is a win most of the time for a broad range of real-world applications. One optimization opportunity in GLIBC is not a general heuristic. I don't think that we know a lot about the context of the use of the constant to apply a finer-grained policy. I think the original code tried to put the constant in memory if it appeared before reload, when everything could be calculated correctly for prologue and materializing the TOC, but tried to materialze any constants that appeared during reload using splitters. That can avoid some of the problem corner cases. The code needs to handle PPC32 PPC64 eABI AIX Thanks, David ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RS6000] Fix PR61098, Poor code setting count register 2014-05-14 9:56 ` Alan Modra 2014-05-14 21:27 ` David Edelsohn @ 2014-05-23 15:23 ` Alan Modra 2014-05-24 16:26 ` David Edelsohn 1 sibling, 1 reply; 12+ messages in thread From: Alan Modra @ 2014-05-23 15:23 UTC (permalink / raw) To: David Edelsohn, GCC Patches OK, let's start again from scratch. This patch fixes PR61098, a problem caused by trying to do arithmetic on the count register. The fix is to provide a new pseudo in rs6000_emit_set_long_const so arithmetic will be done in a gpr. Additionally, the patch fixes a number of other bugs and cleanup issues with rs6000_emit_set_{,long_}const. - rs6000_emit_set_long_const took two HWI constant parameters, a relic from the days when HWI might be 32 bits on powerpc. We're only setting a 64-bit value, so remove the unnecessary parameter. - The !TARGET_POWERPC64 handling of DImode assumed a 32 bit HWI, and the insn setting the low 32-bit reg was wrongly marked with a reg_equiv note saying the reg contained the entire 64-bit constant. I hadn't spotted the bad reg_equiv when writing the previous patch. - The comments describing the functions are inaccurate and misleading. - rs6000_emit_set_long_const always returns DEST, so it's caller can assume this and rs6000_emit_set_long_const return void. - The code testing for a NULL DEST in rs6000_emit_set_const is dead. DEST cannot be NULL, since the three uses of the function are in rs6000.md splitters where DEST (operand[0]) satisfies gpc_reg_operand. - The above two points mean that rs6000_emit_set_const always returns DEST, which in turn would allow rs6000_emit_set_const to return void. However, in view of a possible future change that might need to return status on whether rs6000_emit_set_const emitted anything, return a bool. - rs6000_emit_set_const parameter N is currently unused, and MODE always matches GET_MODE (DEST), so N and MODE can be removed. - The code is liberally sprinkled with copy_rtx. DEST/TEMP is always used once without copy_rtx, but which insn uses copy_rtx varies. I changed the code to always use a bare DEST as the last insn for consistency. (Writing the code this way might allow us to omit the copy_rtx on DEST/TEMP entirely. Before reload TEMP will be a new pseudo reg, thus doesn't need copy_rtx, and after reload we shouldn't have a SUBREG DEST. I wasn't sure of exactly what might happen during reload, so left well enough alone.) Bootstrapped and regression tested powerpc64-linux. OK to apply mainline? PR target/61098 * config/rs6000/rs6000.c (rs6000_emit_set_const): Remove unneeded params and return a bool. Remove dead code. Update comment. Assert we have a const_int source. Remove bogus code from 32-bit HWI days. Move !TARGET_POWERPC64 handling, and correct handling of constants > 2G and reg_equal note, from.. (rs6000_emit_set_long_const): ..here. Remove unneeded param and return value. Update comment. If we can, use a new pseudo for intermediate calculations. * config/rs6000/rs6000-protos.h (rs6000_emit_set_const): Update prototype. * config/rs6000/rs6000.md (movsi_internal1_single+1): Update call to rs6000_emit_set_const in splitter. (movdi_internal64+2, +3): Likewise. Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (revision 210835) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -1068,7 +1068,7 @@ static tree rs6000_handle_longcall_attribute (tree static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool *); static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *); static tree rs6000_builtin_vectorized_libmass (tree, tree, tree); -static rtx rs6000_emit_set_long_const (rtx, HOST_WIDE_INT, HOST_WIDE_INT); +static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT); static int rs6000_memory_move_cost (enum machine_mode, reg_class_t, bool); static bool rs6000_debug_rtx_costs (rtx, int, int, int, int *, bool); static int rs6000_debug_address_cost (rtx, enum machine_mode, addr_space_t, @@ -7849,53 +7849,50 @@ rs6000_conditional_register_usage (void) } \f -/* Try to output insns to set TARGET equal to the constant C if it can - be done in less than N insns. Do all computations in MODE. - Returns the place where the output has been placed if it can be - done and the insns have been emitted. If it would take more than N - insns, zero is returned and no insns and emitted. */ +/* Output insns to set DEST equal to the constant SOURCE as a series of + lis, ori and shl instructions and return TRUE. */ -rtx -rs6000_emit_set_const (rtx dest, enum machine_mode mode, - rtx source, int n ATTRIBUTE_UNUSED) +bool +rs6000_emit_set_const (rtx dest, rtx source) { - rtx result, insn, set; - HOST_WIDE_INT c0, c1; + enum machine_mode mode = GET_MODE (dest); + rtx temp, insn, set; + HOST_WIDE_INT c; + gcc_checking_assert (CONST_INT_P (source)); + c = INTVAL (source); switch (mode) { - case QImode: + case QImode: case HImode: - if (dest == NULL) - dest = gen_reg_rtx (mode); emit_insn (gen_rtx_SET (VOIDmode, dest, source)); - return dest; + return true; case SImode: - result = !can_create_pseudo_p () ? dest : gen_reg_rtx (SImode); + temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (SImode); - emit_insn (gen_rtx_SET (VOIDmode, copy_rtx (result), - GEN_INT (INTVAL (source) - & (~ (HOST_WIDE_INT) 0xffff)))); + emit_insn (gen_rtx_SET (VOIDmode, copy_rtx (temp), + GEN_INT (c & ~(HOST_WIDE_INT) 0xffff))); emit_insn (gen_rtx_SET (VOIDmode, dest, - gen_rtx_IOR (SImode, copy_rtx (result), - GEN_INT (INTVAL (source) & 0xffff)))); - result = dest; + gen_rtx_IOR (SImode, copy_rtx (temp), + GEN_INT (c & 0xffff)))); break; case DImode: - switch (GET_CODE (source)) + if (!TARGET_POWERPC64) { - case CONST_INT: - c0 = INTVAL (source); - c1 = -(c0 < 0); - break; + rtx hi, lo; - default: - gcc_unreachable (); + hi = operand_subword_force (copy_rtx (dest), WORDS_BIG_ENDIAN == 0, + DImode); + lo = operand_subword_force (dest, WORDS_BIG_ENDIAN != 0, + DImode); + emit_move_insn (hi, GEN_INT (c >> 32)); + c = ((c & 0xffffffff) ^ 0x80000000) - 0x80000000; + emit_move_insn (lo, GEN_INT (c)); } - - result = rs6000_emit_set_long_const (dest, c0, c1); + else + rs6000_emit_set_long_const (dest, c); break; default: @@ -7905,107 +7902,103 @@ rs6000_conditional_register_usage (void) insn = get_last_insn (); set = single_set (insn); if (! CONSTANT_P (SET_SRC (set))) - set_unique_reg_note (insn, REG_EQUAL, source); + set_unique_reg_note (insn, REG_EQUAL, GEN_INT (c)); - return result; + return true; } -/* Having failed to find a 3 insn sequence in rs6000_emit_set_const, - fall back to a straight forward decomposition. We do this to avoid - exponential run times encountered when looking for longer sequences - with rs6000_emit_set_const. */ -static rtx -rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c1, HOST_WIDE_INT c2) +/* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode. + Output insns to set DEST equal to the constant C as a series of + lis, ori and shl instructions. */ + +static void +rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c) { - if (!TARGET_POWERPC64) + rtx temp; + HOST_WIDE_INT ud1, ud2, ud3, ud4; + + ud1 = c & 0xffff; + c = c >> 16; + ud2 = c & 0xffff; + c = c >> 16; + ud3 = c & 0xffff; + c = c >> 16; + ud4 = c & 0xffff; + + if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000)) + || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000))) + emit_move_insn (dest, GEN_INT ((ud1 ^ 0x8000) - 0x8000)); + + else if ((ud4 == 0xffff && ud3 == 0xffff && (ud2 & 0x8000)) + || (ud4 == 0 && ud3 == 0 && ! (ud2 & 0x8000))) { - rtx operand1, operand2; + temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); - operand1 = operand_subword_force (dest, WORDS_BIG_ENDIAN == 0, - DImode); - operand2 = operand_subword_force (copy_rtx (dest), WORDS_BIG_ENDIAN != 0, - DImode); - emit_move_insn (operand1, GEN_INT (c1)); - emit_move_insn (operand2, GEN_INT (c2)); + emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest, + GEN_INT (((ud2 << 16) ^ 0x80000000) - 0x80000000)); + if (ud1 != 0) + emit_move_insn (dest, + gen_rtx_IOR (DImode, copy_rtx (temp), + GEN_INT (ud1))); } - else + else if (ud3 == 0 && ud4 == 0) { - HOST_WIDE_INT ud1, ud2, ud3, ud4; + temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); - ud1 = c1 & 0xffff; - ud2 = (c1 & 0xffff0000) >> 16; - c2 = c1 >> 32; - ud3 = c2 & 0xffff; - ud4 = (c2 & 0xffff0000) >> 16; + gcc_assert (ud2 & 0x8000); + emit_move_insn (copy_rtx (temp), + GEN_INT (((ud2 << 16) ^ 0x80000000) - 0x80000000)); + if (ud1 != 0) + emit_move_insn (copy_rtx (temp), + gen_rtx_IOR (DImode, copy_rtx (temp), + GEN_INT (ud1))); + emit_move_insn (dest, + gen_rtx_ZERO_EXTEND (DImode, + gen_lowpart (SImode, + copy_rtx (temp)))); + } + else if ((ud4 == 0xffff && (ud3 & 0x8000)) + || (ud4 == 0 && ! (ud3 & 0x8000))) + { + temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); - if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000)) - || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000))) - emit_move_insn (dest, GEN_INT ((ud1 ^ 0x8000) - 0x8000)); + emit_move_insn (copy_rtx (temp), + GEN_INT (((ud3 << 16) ^ 0x80000000) - 0x80000000)); + if (ud2 != 0) + emit_move_insn (copy_rtx (temp), + gen_rtx_IOR (DImode, copy_rtx (temp), + GEN_INT (ud2))); + emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest, + gen_rtx_ASHIFT (DImode, copy_rtx (temp), + GEN_INT (16))); + if (ud1 != 0) + emit_move_insn (dest, + gen_rtx_IOR (DImode, copy_rtx (temp), + GEN_INT (ud1))); + } + else + { + temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); - else if ((ud4 == 0xffff && ud3 == 0xffff && (ud2 & 0x8000)) - || (ud4 == 0 && ud3 == 0 && ! (ud2 & 0x8000))) - { - emit_move_insn (dest, GEN_INT (((ud2 << 16) ^ 0x80000000) - - 0x80000000)); - if (ud1 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), - GEN_INT (ud1))); - } - else if (ud3 == 0 && ud4 == 0) - { - gcc_assert (ud2 & 0x8000); - emit_move_insn (dest, GEN_INT (((ud2 << 16) ^ 0x80000000) - - 0x80000000)); - if (ud1 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), - GEN_INT (ud1))); - emit_move_insn (copy_rtx (dest), - gen_rtx_ZERO_EXTEND (DImode, - gen_lowpart (SImode, - copy_rtx (dest)))); - } - else if ((ud4 == 0xffff && (ud3 & 0x8000)) - || (ud4 == 0 && ! (ud3 & 0x8000))) - { - emit_move_insn (dest, GEN_INT (((ud3 << 16) ^ 0x80000000) - - 0x80000000)); - if (ud2 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), - GEN_INT (ud2))); - emit_move_insn (copy_rtx (dest), - gen_rtx_ASHIFT (DImode, copy_rtx (dest), - GEN_INT (16))); - if (ud1 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), - GEN_INT (ud1))); - } - else - { - emit_move_insn (dest, GEN_INT (((ud4 << 16) ^ 0x80000000) - - 0x80000000)); - if (ud3 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), - GEN_INT (ud3))); + emit_move_insn (copy_rtx (temp), + GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000)); + if (ud3 != 0) + emit_move_insn (copy_rtx (temp), + gen_rtx_IOR (DImode, copy_rtx (temp), + GEN_INT (ud3))); - emit_move_insn (copy_rtx (dest), - gen_rtx_ASHIFT (DImode, copy_rtx (dest), - GEN_INT (32))); - if (ud2 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), - GEN_INT (ud2 << 16))); - if (ud1 != 0) - emit_move_insn (copy_rtx (dest), - gen_rtx_IOR (DImode, copy_rtx (dest), - GEN_INT (ud1))); - } + emit_move_insn (ud2 != 0 || ud1 != 0 ? copy_rtx (temp) : dest, + gen_rtx_ASHIFT (DImode, copy_rtx (temp), + GEN_INT (32))); + if (ud2 != 0) + emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest, + gen_rtx_IOR (DImode, copy_rtx (temp), + GEN_INT (ud2 << 16))); + if (ud1 != 0) + emit_move_insn (dest, + gen_rtx_IOR (DImode, copy_rtx (temp), + GEN_INT (ud1))); } - return dest; } /* Helper for the following. Get rid of [r+r] memory refs Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (revision 210835) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -8850,9 +8850,8 @@ (ior:SI (match_dup 0) (match_dup 3)))] " -{ rtx tem = rs6000_emit_set_const (operands[0], SImode, operands[1], 2); - - if (tem == operands[0]) +{ + if (rs6000_emit_set_const (operands[0], operands[1])) DONE; else FAIL; @@ -9922,9 +9921,8 @@ [(set (match_dup 0) (match_dup 2)) (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 3)))] " -{ rtx tem = rs6000_emit_set_const (operands[0], DImode, operands[1], 5); - - if (tem == operands[0]) +{ + if (rs6000_emit_set_const (operands[0], operands[1])) DONE; else FAIL; @@ -9937,9 +9935,8 @@ [(set (match_dup 0) (match_dup 2)) (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 3)))] " -{ rtx tem = rs6000_emit_set_const (operands[0], DImode, operands[1], 5); - - if (tem == operands[0]) +{ + if (rs6000_emit_set_const (operands[0], operands[1])) DONE; else FAIL; Index: gcc/config/rs6000/rs6000-protos.h =================================================================== --- gcc/config/rs6000/rs6000-protos.h (revision 210834) +++ gcc/config/rs6000/rs6000-protos.h (working copy) @@ -114,7 +114,7 @@ extern void rs6000_emit_cbranch (enum machine_mode extern char * output_cbranch (rtx, const char *, int, rtx); extern char * output_e500_flip_gt_bit (rtx, rtx); extern const char * output_probe_stack_range (rtx, rtx); -extern rtx rs6000_emit_set_const (rtx, enum machine_mode, rtx, int); +extern bool rs6000_emit_set_const (rtx, rtx); extern int rs6000_emit_cmove (rtx, rtx, rtx, rtx); extern int rs6000_emit_vector_cond_expr (rtx, rtx, rtx, rtx, rtx, rtx); extern void rs6000_emit_minmax (rtx, enum rtx_code, rtx, rtx); -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RS6000] Fix PR61098, Poor code setting count register 2014-05-23 15:23 ` Alan Modra @ 2014-05-24 16:26 ` David Edelsohn 0 siblings, 0 replies; 12+ messages in thread From: David Edelsohn @ 2014-05-24 16:26 UTC (permalink / raw) To: GCC Patches, Alan Modra On Fri, May 23, 2014 at 11:23 AM, Alan Modra <amodra@gmail.com> wrote: > OK, let's start again from scratch. This patch fixes PR61098, a > problem caused by trying to do arithmetic on the count register. The > fix is to provide a new pseudo in rs6000_emit_set_long_const so > arithmetic will be done in a gpr. > > Additionally, the patch fixes a number of other bugs and cleanup > issues with rs6000_emit_set_{,long_}const. > > - rs6000_emit_set_long_const took two HWI constant parameters, a relic > from the days when HWI might be 32 bits on powerpc. We're only > setting a 64-bit value, so remove the unnecessary parameter. > > - The !TARGET_POWERPC64 handling of DImode assumed a 32 bit HWI, and > the insn setting the low 32-bit reg was wrongly marked with a > reg_equiv note saying the reg contained the entire 64-bit constant. > I hadn't spotted the bad reg_equiv when writing the previous patch. > > - The comments describing the functions are inaccurate and misleading. > > - rs6000_emit_set_long_const always returns DEST, so it's caller can > assume this and rs6000_emit_set_long_const return void. > > - The code testing for a NULL DEST in rs6000_emit_set_const is dead. > DEST cannot be NULL, since the three uses of the function are in > rs6000.md splitters where DEST (operand[0]) satisfies > gpc_reg_operand. > > - The above two points mean that rs6000_emit_set_const always returns > DEST, which in turn would allow rs6000_emit_set_const to return > void. However, in view of a possible future change that might need > to return status on whether rs6000_emit_set_const emitted anything, > return a bool. > > - rs6000_emit_set_const parameter N is currently unused, and MODE > always matches GET_MODE (DEST), so N and MODE can be removed. > > - The code is liberally sprinkled with copy_rtx. DEST/TEMP is always > used once without copy_rtx, but which insn uses copy_rtx varies. I > changed the code to always use a bare DEST as the last insn for > consistency. (Writing the code this way might allow us to omit the > copy_rtx on DEST/TEMP entirely. Before reload TEMP will be a new > pseudo reg, thus doesn't need copy_rtx, and after reload we > shouldn't have a SUBREG DEST. I wasn't sure of exactly what might > happen during reload, so left well enough alone.) > > Bootstrapped and regression tested powerpc64-linux. OK to apply > mainline? This is a much clearer start. Thanks for the revised version. This is okay. Thanks, David ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-05-24 16:26 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-05-08 1:49 [RS6000] Fix PR61098, Poor code setting count register Alan Modra 2014-05-08 13:48 ` David Edelsohn 2014-05-09 2:41 ` Alan Modra 2014-05-11 2:24 ` David Edelsohn 2014-05-11 22:53 ` Alan Modra 2014-05-11 23:39 ` Alan Modra 2014-05-14 3:05 ` Alan Modra 2014-05-14 3:46 ` David Edelsohn 2014-05-14 9:56 ` Alan Modra 2014-05-14 21:27 ` David Edelsohn 2014-05-23 15:23 ` Alan Modra 2014-05-24 16:26 ` David Edelsohn
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).