From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 12 Nov 2014 01:21:00 -0000
From: Michael Meissner
To: Michael Meissner, gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, joseph@codesourcery.com, macro@codesourcery.com, pattyo.lists@gmail.com, segher@kernel.crashing.org, hainque@adacore.com, dmalcolm@redhat.com
Subject: Re: PATCH [5 of 7], rs6000, add support for scalar floating point in Altivec registers
Message-ID: <20141112011617.GE3720@ibm-tiger.the-meissners.org>
References: <20141112002113.GA1489@ibm-tiger.the-meissners.org>
In-Reply-To: <20141112002113.GA1489@ibm-tiger.the-meissners.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="qOrJKOH36bD5yhNe"
Content-Disposition: inline
User-Agent: Mutt/1.5.20 (2009-12-10)
X-SW-Source: 2014-11/txt/msg01115.txt.bz2

--qOrJKOH36bD5yhNe
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

This is the big patch that enables the upper regs support.  It reorganizes the
secondary reload handler to make it easier to understand, by using a variable
that records when the handler is done rather than cascading if statements.
The secondary reload inner function (which is called from the reload helper
functions with a base scratch register) has been reworked quite a bit.

I also discovered that we have two peephole2's that try to reduce SF->SF and
DF->DF moves.  Unfortunately, these break the use of a traditional floating
point register to reload data in/out of an Altivec register, so this patch
disables them when -mupper-regs-{sf,df} is used.  At some future point, I
would like to revisit this, but disabling them is needed to enable the upper
regs support.

I don't believe this will affect the non-server PowerPC ports, since the
reload handlers are only enabled under VSX.  However, it would be nice if
other PowerPC folk could apply these patches and make sure there are no
regressions.  Is this patch ok to check in?

2014-11-11  Michael Meissner
            Ulrich Weigand

        * config/rs6000/rs6000.c (rs6000_secondary_reload_toc_costs):
        Helper function to identify costs of a TOC load for secondary
        reload support.
        (rs6000_secondary_reload_memory): Helper function for secondary
        reload, to determine if a particular memory operation is directly
        handled by the hardware, or if it needs support from secondary
        reload to create a valid address.
        (rs6000_secondary_reload): Rework code, to be clearer.  If the
        appropriate -mupper-regs-{sf,df} is used, use FPR registers to
        reload scalar values, since the FPR registers have D-form
        addressing.  Move most of the code handling memory to the function
        rs6000_secondary_reload_memory, and use the reg_addr structure to
        determine what type of address modes are supported.  Print more
        debug information if -mdebug=addr.
        (rs6000_secondary_reload_inner): Rework entire function to be more
        general.  Use the reg_addr bits to determine what type of
        addressing is supported.
        (rs6000_preferred_reload_class): Rework.  Move constant handling
        into a single place.  Prefer using FLOAT_REGS for scalar floating
        point.
        (rs6000_secondary_reload_class): Use a FPR register to move a value
        from an Altivec register to a GPR, and vice versa.  Move VSX
        handling above traditional floating point.

        * config/rs6000/rs6000.md (mov_hardfloat, FMOVE32 case): Delete
        some spaces in the constraints.
        (DF->DF move peephole2): Disable if -mupper-regs-{sf,df} to
        allow using FPR registers to load/store an Altivec register for
        scalar floating point types.
        (SF->SF move peephole2): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

--qOrJKOH36bD5yhNe
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="gcc-power8.patch132f"

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 217387)
+++ gcc/config/rs6000/rs6000.c (revision 217388)
@@ -16454,6 +16454,278 @@ register_to_reg_type (rtx reg, bool *is_
   return reg_class_to_reg_type[(int)rclass];
 }
 
+/* Helper function to return the cost of adding a TOC entry address.  */
+
+static inline int
+rs6000_secondary_reload_toc_costs (addr_mask_type addr_mask)
+{
+  int ret;
+
+  if (TARGET_CMODEL != CMODEL_SMALL)
+    ret = ((addr_mask & RELOAD_REG_OFFSET) == 0) ? 1 : 2;
+
+  else
+    ret = (TARGET_MINIMAL_TOC) ? 6 : 3;
+
+  return ret;
+}
+
+/* Helper function for rs6000_secondary_reload to determine whether the memory
+   address (ADDR) with a given register class (RCLASS) and machine mode (MODE)
+   needs reloading.  Return negative if the memory is not handled by the memory
+   helper functions and to try a different reload method, 0 if no additional
+   instructions are needed, and positive to give the extra cost for the
+   memory.  */
+
+static int
+rs6000_secondary_reload_memory (rtx addr,
+                                enum reg_class rclass,
+                                enum machine_mode mode)
+{
+  int extra_cost = 0;
+  rtx reg, and_arg, plus_arg0, plus_arg1;
+  addr_mask_type addr_mask;
+  const char *type = NULL;
+  const char *fail_msg = NULL;
+
+  if (GPR_REG_CLASS_P (rclass))
+    addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_GPR];
+
+  else if (rclass == FLOAT_REGS)
+    addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_FPR];
+
+  else if (rclass == ALTIVEC_REGS)
+    addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_VMX];
+
+  /* For the combined VSX_REGS, turn off Altivec AND -16.  */
+  else if (rclass == VSX_REGS)
+    addr_mask = (reg_addr[mode].addr_mask[RELOAD_REG_VMX]
+                 & ~RELOAD_REG_AND_M16);
+
+  else
+    {
+      if (TARGET_DEBUG_ADDR)
+        fprintf (stderr,
+                 "rs6000_secondary_reload_memory: mode = %s, class = %s, "
+                 "class is not GPR, FPR, VMX\n",
+                 GET_MODE_NAME (mode), reg_class_names[rclass]);
+
+      return -1;
+    }
+
+  /* If the register isn't valid in this register class, just return now.  */
+  if ((addr_mask & RELOAD_REG_VALID) == 0)
+    {
+      if (TARGET_DEBUG_ADDR)
+        fprintf (stderr,
+                 "rs6000_secondary_reload_memory: mode = %s, class = %s, "
+                 "not valid in class\n",
+                 GET_MODE_NAME (mode), reg_class_names[rclass]);
+
+      return -1;
+    }
+
+  switch (GET_CODE (addr))
+    {
+      /* Does the register class support auto update forms for this mode?  We
+         don't need a scratch register, since the powerpc only supports
+         PRE_INC, PRE_DEC, and PRE_MODIFY.  */
+    case PRE_INC:
+    case PRE_DEC:
+      reg = XEXP (addr, 0);
+      if (!base_reg_operand (addr, GET_MODE (reg)))
+        {
+          fail_msg = "no base register #1";
+          extra_cost = -1;
+        }
+
+      else if ((addr_mask & RELOAD_REG_PRE_INCDEC) == 0)
+        {
+          extra_cost = 1;
+          type = "update";
+        }
+      break;
+
+    case PRE_MODIFY:
+      reg = XEXP (addr, 0);
+      plus_arg1 = XEXP (addr, 1);
+      if (!base_reg_operand (reg, GET_MODE (reg))
+          || GET_CODE (plus_arg1) != PLUS
+          || !rtx_equal_p (reg, XEXP (plus_arg1, 0)))
+        {
+          fail_msg = "bad PRE_MODIFY";
+          extra_cost = -1;
+        }
+
+      else if ((addr_mask & RELOAD_REG_PRE_MODIFY) == 0)
+        {
+          extra_cost = 1;
+          type = "update";
+        }
+      break;
+
+      /* Do we need to simulate AND -16 to clear the bottom address bits used
+         in VMX load/stores?  Only allow the AND for vector sizes.  */
+    case AND:
+      and_arg = XEXP (addr, 0);
+      if (GET_MODE_SIZE (mode) != 16
+          || GET_CODE (XEXP (addr, 1)) != CONST_INT
+          || INTVAL (XEXP (addr, 1)) != -16)
+        {
+          fail_msg = "bad Altivec AND #1";
+          extra_cost = -1;
+        }
+
+      if (rclass != ALTIVEC_REGS)
+        {
+          if (legitimate_indirect_address_p (and_arg, false))
+            extra_cost = 1;
+
+          else if (legitimate_indexed_address_p (and_arg, false))
+            extra_cost = 2;
+
+          else
+            {
+              fail_msg = "bad Altivec AND #2";
+              extra_cost = -1;
+            }
+
+          type = "and";
+        }
+      break;
+
+      /* If this is an indirect address, make sure it is a base register.  */
+    case REG:
+    case SUBREG:
+      if (!legitimate_indirect_address_p (addr, false))
+        {
+          extra_cost = 1;
+          type = "move";
+        }
+      break;
+
+      /* If this is an indexed address, make sure the register class can handle
+         indexed addresses for this mode.  */
+    case PLUS:
+      plus_arg0 = XEXP (addr, 0);
+      plus_arg1 = XEXP (addr, 1);
+
+      /* (plus (plus (reg) (constant)) (constant)) is generated during
+         push_reload processing, so handle it now.  */
+      if (GET_CODE (plus_arg0) == PLUS && CONST_INT_P (plus_arg1))
+        {
+          if ((addr_mask & RELOAD_REG_OFFSET) == 0)
+            {
+              extra_cost = 1;
+              type = "offset";
+            }
+        }
+
+      else if (!base_reg_operand (plus_arg0, GET_MODE (plus_arg0)))
+        {
+          fail_msg = "no base register #2";
+          extra_cost = -1;
+        }
+
+      else if (int_reg_operand (plus_arg1, GET_MODE (plus_arg1)))
+        {
+          if ((addr_mask & RELOAD_REG_INDEXED) == 0
+              || !legitimate_indexed_address_p (addr, false))
+            {
+              extra_cost = 1;
+              type = "indexed";
+            }
+        }
+
+      /* Make sure the register class can handle offset addresses.  */
+      else if (rs6000_legitimate_offset_address_p (mode, addr, false, true))
+        {
+          if ((addr_mask & RELOAD_REG_OFFSET) == 0)
+            {
+              extra_cost = 1;
+              type = "offset";
+            }
+        }
+
+      else
+        {
+          fail_msg = "bad PLUS";
+          extra_cost = -1;
+        }
+
+      break;
+
+    case LO_SUM:
+      if (!legitimate_lo_sum_address_p (mode, addr, false))
+        {
+          fail_msg = "bad LO_SUM";
+          extra_cost = -1;
+        }
+
+      if ((addr_mask & RELOAD_REG_OFFSET) == 0)
+        {
+          extra_cost = 1;
+          type = "lo_sum";
+        }
+      break;
+
+      /* Static addresses need to create a TOC entry.  */
+    case CONST:
+    case SYMBOL_REF:
+    case LABEL_REF:
+      type = "address";
+      extra_cost = rs6000_secondary_reload_toc_costs (addr_mask);
+      break;
+
+      /* TOC references look like offsetable memory.  */
+    case UNSPEC:
+      if (TARGET_CMODEL == CMODEL_SMALL || XINT (addr, 1) != UNSPEC_TOCREL)
+        {
+          fail_msg = "bad UNSPEC";
+          extra_cost = -1;
+        }
+
+      else if ((addr_mask & RELOAD_REG_OFFSET) == 0)
+        {
+          extra_cost = 1;
+          type = "toc reference";
+        }
+      break;
+
+    default:
+        {
+          fail_msg = "bad address";
+          extra_cost = -1;
+        }
+    }
+
+  if (TARGET_DEBUG_ADDR /* && extra_cost != 0 */)
+    {
+      if (extra_cost < 0)
+        fprintf (stderr,
+                 "rs6000_secondary_reload_memory error: mode = %s, "
+                 "class = %s, addr_mask = '%s', %s\n",
+                 GET_MODE_NAME (mode),
+                 reg_class_names[rclass],
+                 rs6000_debug_addr_mask (addr_mask, false),
+                 (fail_msg != NULL) ? fail_msg : "");
+
+      else
+        fprintf (stderr,
+                 "rs6000_secondary_reload_memory: mode = %s, class = %s, "
+                 "addr_mask = '%s', extra cost = %d, %s\n",
+                 GET_MODE_NAME (mode),
+                 reg_class_names[rclass],
+                 rs6000_debug_addr_mask (addr_mask, false),
+                 extra_cost,
+                 (type) ? type : "");
+
+      debug_rtx (addr);
+    }
+
+  return extra_cost;
+}
+
 /* Helper function for rs6000_secondary_reload to return true if a move to a
    different register classe is really a simple move.  */
 
@@ -16660,8 +16932,15 @@ rs6000_secondary_reload (bool in_p,
   reg_class_t ret = ALL_REGS;
   enum insn_code icode;
   bool default_p = false;
+  bool done_p = false;
+
+  /* Allow subreg of memory before/during reload.  */
+  bool memory_p = (MEM_P (x)
+                   || (!reload_completed && GET_CODE (x) == SUBREG
+                       && MEM_P (SUBREG_REG (x))));
 
   sri->icode = CODE_FOR_nothing;
+  sri->extra_cost = 0;
   icode = ((in_p)
            ? reg_addr[mode].reload_load
            : reg_addr[mode].reload_store);
@@ -16685,121 +16964,54 @@ rs6000_secondary_reload (bool in_p,
         {
           icode = (enum insn_code)sri->icode;
           default_p = false;
+          done_p = true;
           ret = NO_REGS;
         }
     }
 
-  /* Handle vector moves with reload helper functions.  */
-  if (ret == ALL_REGS && icode != CODE_FOR_nothing)
+  /* Make sure 0.0 is not reloaded or forced into memory.  */
+  if (x == CONST0_RTX (mode) && VSX_REG_CLASS_P (rclass))
     {
       ret = NO_REGS;
-      sri->icode = CODE_FOR_nothing;
-      sri->extra_cost = 0;
+      default_p = false;
+      done_p = true;
+    }
 
-      if (GET_CODE (x) == MEM)
-        {
-          rtx addr = XEXP (x, 0);
+  /* If this is a scalar floating point value and we want to load it into the
+     traditional Altivec registers, do it via a move via a traditional floating
+     point register.  Also make sure that non-zero constants use a FPR.  */
+  if (!done_p && reg_addr[mode].scalar_in_vmx_p
+      && (rclass == VSX_REGS || rclass == ALTIVEC_REGS)
+      && (memory_p || (GET_CODE (x) == CONST_DOUBLE)))
+    {
+      ret = FLOAT_REGS;
+      default_p = false;
+      done_p = true;
+    }
 
-          /* Loads to and stores from gprs can do reg+offset, and wouldn't need
-             an extra register in that case, but it would need an extra
-             register if the addressing is reg+reg or (reg+reg)&(-16).  Special
-             case load/store quad.  */
-          if (rclass == GENERAL_REGS || rclass == BASE_REGS)
-            {
-              if (TARGET_POWERPC64 && TARGET_QUAD_MEMORY
-                  && GET_MODE_SIZE (mode) == 16
-                  && quad_memory_operand (x, mode))
-                {
-                  sri->icode = icode;
-                  sri->extra_cost = 2;
-                }
+  /* Handle reload of load/stores if we have reload helper functions.  */
+  if (!done_p && icode != CODE_FOR_nothing && memory_p)
+    {
+      int extra_cost = rs6000_secondary_reload_memory (XEXP (x, 0), rclass,
+                                                       mode);
 
-              else if (!legitimate_indirect_address_p (addr, false)
-                       && !rs6000_legitimate_offset_address_p (PTImode, addr,
-                                                               false, true))
-                {
-                  sri->icode = icode;
-                  /* account for splitting the loads, and converting the
-                     address from reg+reg to reg.  */
-                  sri->extra_cost = (((TARGET_64BIT) ? 3 : 5)
-                                     + ((GET_CODE (addr) == AND) ? 1 : 0));
-                }
-            }
-          /* Allow scalar loads to/from the traditional floating point
-             registers, even if VSX memory is set.  */
-          else if ((rclass == FLOAT_REGS || rclass == NO_REGS)
-                   && (GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8)
-                   && (legitimate_indirect_address_p (addr, false)
-                       || legitimate_indirect_address_p (addr, false)
-                       || rs6000_legitimate_offset_address_p (mode, addr,
-                                                              false, true)))
-
-            ;
-          /* Loads to and stores from vector registers can only do reg+reg
-             addressing.  Altivec registers can also do (reg+reg)&(-16).  Allow
-             scalar modes loading up the traditional floating point registers
-             to use offset addresses.  */
-          else if (rclass == VSX_REGS || rclass == ALTIVEC_REGS
-                   || rclass == FLOAT_REGS || rclass == NO_REGS)
-            {
-              if (!VECTOR_MEM_ALTIVEC_P (mode)
-                  && GET_CODE (addr) == AND
-                  && GET_CODE (XEXP (addr, 1)) == CONST_INT
-                  && INTVAL (XEXP (addr, 1)) == -16
-                  && (legitimate_indirect_address_p (XEXP (addr, 0), false)
-                      || legitimate_indexed_address_p (XEXP (addr, 0), false)))
-                {
-                  sri->icode = icode;
-                  sri->extra_cost = ((GET_CODE (XEXP (addr, 0)) == PLUS)
-                                     ? 2 : 1);
-                }
-              else if (!legitimate_indirect_address_p (addr, false)
-                       && (rclass == NO_REGS
-                           || !legitimate_indexed_address_p (addr, false)))
-                {
-                  sri->icode = icode;
-                  sri->extra_cost = 1;
-                }
-              else
-                icode = CODE_FOR_nothing;
-            }
-          /* Any other loads, including to pseudo registers which haven't been
-             assigned to a register yet, default to require a scratch
-             register.  */
-          else
-            {
-              sri->icode = icode;
-              sri->extra_cost = 2;
-            }
-        }
-      else if (REG_P (x))
+      if (extra_cost >= 0)
         {
-          int regno = true_regnum (x);
-
-          icode = CODE_FOR_nothing;
-          if (regno < 0 || regno >= FIRST_PSEUDO_REGISTER)
-            default_p = true;
-          else
+          done_p = true;
+          ret = NO_REGS;
+          if (extra_cost > 0)
             {
-              enum reg_class xclass = REGNO_REG_CLASS (regno);
-              enum rs6000_reg_type rtype1 = reg_class_to_reg_type[(int)rclass];
-              enum rs6000_reg_type rtype2 = reg_class_to_reg_type[(int)xclass];
-
-              /* If memory is needed, use default_secondary_reload to create the
-                 stack slot.  */
-              if (rtype1 != rtype2 || !IS_STD_REG_TYPE (rtype1))
-                default_p = true;
-              else
-                ret = NO_REGS;
+              sri->extra_cost = extra_cost;
+              sri->icode = icode;
             }
         }
-      else
-        default_p = true;
     }
-  else if (TARGET_POWERPC64
-           && reg_class_to_reg_type[(int)rclass] == GPR_REG_TYPE
-           && MEM_P (x)
-           && GET_MODE_SIZE (GET_MODE (x)) >= UNITS_PER_WORD)
+
+  /* Handle unaligned loads and stores of integer registers.  */
+  if (!done_p && TARGET_POWERPC64
+      && reg_class_to_reg_type[(int)rclass] == GPR_REG_TYPE
+      && memory_p
+      && GET_MODE_SIZE (GET_MODE (x)) >= UNITS_PER_WORD)
     {
       rtx addr = XEXP (x, 0);
       rtx off = address_offset (addr);
@@ -16828,6 +17040,7 @@ rs6000_secondary_reload (bool in_p,
                 sri->icode = CODE_FOR_reload_di_store;
               sri->extra_cost = 2;
               ret = NO_REGS;
+              done_p = true;
             }
           else
             default_p = true;
@@ -16835,10 +17048,11 @@ rs6000_secondary_reload (bool in_p,
       else
         default_p = true;
     }
-  else if (!TARGET_POWERPC64
-           && reg_class_to_reg_type[(int)rclass] == GPR_REG_TYPE
-           && MEM_P (x)
-           && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD)
+
+  if (!done_p && !TARGET_POWERPC64
+      && reg_class_to_reg_type[(int)rclass] == GPR_REG_TYPE
+      && memory_p
+      && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD)
     {
       rtx addr = XEXP (x, 0);
       rtx off = address_offset (addr);
@@ -16874,6 +17088,7 @@ rs6000_secondary_reload (bool in_p,
                 sri->icode = CODE_FOR_reload_si_store;
               sri->extra_cost = 2;
               ret = NO_REGS;
+              done_p = true;
             }
           else
             default_p = true;
@@ -16881,7 +17096,8 @@ rs6000_secondary_reload (bool in_p,
       else
         default_p = true;
     }
-  else
+
+  if (!done_p)
     default_p = true;
 
   if (default_p)
@@ -16899,15 +17115,20 @@ rs6000_secondary_reload (bool in_p,
                reg_class_names[rclass],
                GET_MODE_NAME (mode));
 
+      if (reload_completed)
+        fputs (", after reload", stderr);
+
+      if (!done_p)
+        fputs (", done_p not set", stderr);
+
       if (default_p)
-        fprintf (stderr, ", default secondary reload");
+        fputs (", default secondary reload", stderr);
 
       if (sri->icode != CODE_FOR_nothing)
-        fprintf (stderr, ", reload func = %s, extra cost = %d\n",
+        fprintf (stderr, ", reload func = %s, extra cost = %d",
                  insn_data[sri->icode].name, sri->extra_cost);
-      else
-        fprintf (stderr, "\n");
 
+      fputs ("\n", stderr);
       debug_rtx (x);
     }
 
@@ -16947,209 +17168,148 @@ rs6000_secondary_reload_fail (int line,
   gcc_unreachable ();
 }
 
-/* Fixup reload addresses for Altivec or VSX loads/stores to change SP+offset
-   to SP+reg addressing.  */
+/* Fixup reload addresses for values in GPR, FPR, and VMX registers that have
+   reload helper functions.  These were identified in
+   rs6000_secondary_reload_memory, and if reload decided to use the secondary
+   reload, it calls the insns:
+        reload___store
+        reload___load
+
+   which in turn calls this function, to do whatever is necessary to create
+   valid addresses.  */
 
 void
 rs6000_secondary_reload_inner (rtx reg, rtx mem, rtx scratch, bool store_p)
 {
   int regno = true_regnum (reg);
   machine_mode mode = GET_MODE (reg);
-  enum reg_class rclass;
+  addr_mask_type addr_mask;
   rtx addr;
-  rtx and_op2 = NULL_RTX;
-  rtx addr_op1;
-  rtx addr_op2;
-  rtx scratch_or_premodify = scratch;
-  rtx and_rtx;
+  rtx new_addr;
+  rtx op_reg, op0, op1;
+  rtx and_op;
   rtx cc_clobber;
+  rtvec rv;
 
-  if (TARGET_DEBUG_ADDR)
-    rs6000_secondary_reload_trace (__LINE__, reg, mem, scratch, store_p);
+  if (regno < 0 || regno >= FIRST_PSEUDO_REGISTER || !MEM_P (mem)
+      || !base_reg_operand (scratch, GET_MODE (scratch)))
+    rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
 
-  if (regno < 0 || regno >= FIRST_PSEUDO_REGISTER)
+  if (IN_RANGE (regno, FIRST_GPR_REGNO, LAST_GPR_REGNO))
+    addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_GPR];
+
+  else if (IN_RANGE (regno, FIRST_FPR_REGNO, LAST_FPR_REGNO))
+    addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_FPR];
+
+  else if (IN_RANGE (regno, FIRST_ALTIVEC_REGNO, LAST_ALTIVEC_REGNO))
+    addr_mask = reg_addr[mode].addr_mask[RELOAD_REG_VMX];
+
+  else
     rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
 
-  if (GET_CODE (mem) != MEM)
+  /* Make sure the mode is valid in this register class.  */
+  if ((addr_mask & RELOAD_REG_VALID) == 0)
     rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
 
-  rclass = REGNO_REG_CLASS (regno);
-  addr = find_replacement (&XEXP (mem, 0));
+  if (TARGET_DEBUG_ADDR)
+    rs6000_secondary_reload_trace (__LINE__, reg, mem, scratch, store_p);
 
-  switch (rclass)
+  new_addr = addr = XEXP (mem, 0);
+  switch (GET_CODE (addr))
     {
-      /* GPRs can handle reg + small constant, all other addresses need to use
-         the scratch register.  */
-    case GENERAL_REGS:
-    case BASE_REGS:
-      if (GET_CODE (addr) == AND)
+      /* Does the register class support auto update forms for this mode?  If
+         not, do the update now.  We don't need a scratch register, since the
+         powerpc only supports PRE_INC, PRE_DEC, and PRE_MODIFY.  */
+    case PRE_INC:
+    case PRE_DEC:
+      op_reg = XEXP (addr, 0);
+      if (!base_reg_operand (op_reg, Pmode))
+        rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
+
+      if ((addr_mask & RELOAD_REG_PRE_INCDEC) == 0)
         {
-          and_op2 = XEXP (addr, 1);
-          addr = find_replacement (&XEXP (addr, 0));
+          emit_insn (gen_add2_insn (op_reg, GEN_INT (GET_MODE_SIZE (mode))));
+          new_addr = op_reg;
         }
+      break;
 
-      if (GET_CODE (addr) == PRE_MODIFY)
-        {
-          scratch_or_premodify = find_replacement (&XEXP (addr, 0));
-          if (!REG_P (scratch_or_premodify))
-            rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
+    case PRE_MODIFY:
+      op0 = XEXP (addr, 0);
+      op1 = XEXP (addr, 1);
+      if (!base_reg_operand (op0, Pmode)
+          || GET_CODE (op1) != PLUS
+          || !rtx_equal_p (op0, XEXP (op1, 0)))
+        rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
 
-          addr = find_replacement (&XEXP (addr, 1));
-          if (GET_CODE (addr) != PLUS)
-            rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
+      if ((addr_mask & RELOAD_REG_PRE_MODIFY) == 0)
+        {
+          emit_insn (gen_rtx_SET (VOIDmode, op0, op1));
+          new_addr = reg;
         }
+      break;
 
-      if (GET_CODE (addr) == PLUS
-          && (and_op2 != NULL_RTX
-              || !rs6000_legitimate_offset_address_p (PTImode, addr,
-                                                      false, true)))
-        {
-          /* find_replacement already recurses into both operands of
-             PLUS so we don't need to call it here.  */
-          addr_op1 = XEXP (addr, 0);
-          addr_op2 = XEXP (addr, 1);
-          if (!legitimate_indirect_address_p (addr_op1, false))
-            rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
+      /* Do we need to simulate AND -16 to clear the bottom address bits used
+         in VMX load/stores?  */
+    case AND:
+      op0 = XEXP (addr, 0);
+      op1 = XEXP (addr, 1);
+      if ((addr_mask & RELOAD_REG_AND_M16) == 0)
+        {
+          if (REG_P (op0) || GET_CODE (op0) == SUBREG)
+            op_reg = op0;
 
-          if (!REG_P (addr_op2)
-              && (GET_CODE (addr_op2) != CONST_INT
-                  || !satisfies_constraint_I (addr_op2)))
+          else if (GET_CODE (op1) == PLUS)
             {
-              if (TARGET_DEBUG_ADDR)
-                {
-                  fprintf (stderr,
-                           "\nMove plus addr to register %s, mode = %s: ",
-                           rs6000_reg_names[REGNO (scratch)],
-                           GET_MODE_NAME (mode));
-                  debug_rtx (addr_op2);
-                }
-              rs6000_emit_move (scratch, addr_op2, Pmode);
-              addr_op2 = scratch;
+              emit_insn (gen_rtx_SET (VOIDmode, scratch, op1));
+              op_reg = scratch;
             }
 
-          emit_insn (gen_rtx_SET (VOIDmode,
-                                  scratch_or_premodify,
-                                  gen_rtx_PLUS (Pmode,
-                                                addr_op1,
-                                                addr_op2)));
-
-          addr = scratch_or_premodify;
-          scratch_or_premodify = scratch;
-        }
-      else if (!legitimate_indirect_address_p (addr, false)
-               && !rs6000_legitimate_offset_address_p (PTImode, addr,
-                                                       false, true))
-        {
-          if (TARGET_DEBUG_ADDR)
-            {
-              fprintf (stderr, "\nMove addr to register %s, mode = %s: ",
-                       rs6000_reg_names[REGNO (scratch_or_premodify)],
-                       GET_MODE_NAME (mode));
-              debug_rtx (addr);
-            }
-          rs6000_emit_move (scratch_or_premodify, addr, Pmode);
-          addr = scratch_or_premodify;
-          scratch_or_premodify = scratch;
+          else
+            rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
+
+          and_op = gen_rtx_AND (GET_MODE (scratch), op_reg, op1);
+          cc_clobber = gen_rtx_CLOBBER (VOIDmode, gen_rtx_SCRATCH (CCmode));
+          rv = gen_rtvec (2, gen_rtx_SET (VOIDmode, scratch, and_op), cc_clobber);
+          emit_insn (gen_rtx_PARALLEL (VOIDmode, rv));
+          new_addr = scratch;
         }
       break;
 
-      /* Float registers can do offset+reg addressing for scalar types.  */
-    case FLOAT_REGS:
-      if (legitimate_indirect_address_p (addr, false)   /* reg */
-          || legitimate_indexed_address_p (addr, false) /* reg+reg */
-          || ((GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8)
-              && and_op2 == NULL_RTX
-              && scratch_or_premodify == scratch
-              && rs6000_legitimate_offset_address_p (mode, addr, false, false)))
-        break;
-
-      /* If this isn't a legacy floating point load/store, fall through to the
-         VSX defaults.  */
-
-      /* VSX/Altivec registers can only handle reg+reg addressing.  Move other
-         addresses into a scratch register.  */
-    case VSX_REGS:
-    case ALTIVEC_REGS:
-
-      /* With float regs, we need to handle the AND ourselves, since we can't
-         use the Altivec instruction with an implicit AND -16.  Allow scalar
-         loads to float registers to use reg+offset even if VSX.  */
-      if (GET_CODE (addr) == AND
-          && (rclass != ALTIVEC_REGS || GET_MODE_SIZE (mode) != 16
-              || GET_CODE (XEXP (addr, 1)) != CONST_INT
-              || INTVAL (XEXP (addr, 1)) != -16
-              || !VECTOR_MEM_ALTIVEC_P (mode)))
-        {
-          and_op2 = XEXP (addr, 1);
-          addr = find_replacement (&XEXP (addr, 0));
-        }
-
-      /* If we aren't using a VSX load, save the PRE_MODIFY register and use it
-         as the address later.  */
-      if (GET_CODE (addr) == PRE_MODIFY
-          && ((ALTIVEC_OR_VSX_VECTOR_MODE (mode)
-               && (rclass != FLOAT_REGS
-                   || (GET_MODE_SIZE (mode) != 4 && GET_MODE_SIZE (mode) != 8)))
-              || and_op2 != NULL_RTX
-              || !legitimate_indexed_address_p (XEXP (addr, 1), false)))
+      /* If this is an indirect address, make sure it is a base register.  */
+    case REG:
+    case SUBREG:
+      if (!base_reg_operand (addr, GET_MODE (addr)))
         {
-          scratch_or_premodify = find_replacement (&XEXP (addr, 0));
-          if (!legitimate_indirect_address_p (scratch_or_premodify, false))
-            rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
-
-          addr = find_replacement (&XEXP (addr, 1));
-          if (GET_CODE (addr) != PLUS)
-            rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
+          emit_insn (gen_rtx_SET (VOIDmode, scratch, addr));
+          new_addr = scratch;
         }
+      break;
 
-      if (legitimate_indirect_address_p (addr, false)   /* reg */
-          || legitimate_indexed_address_p (addr, false) /* reg+reg */
-          || (GET_CODE (addr) == AND                    /* Altivec memory */
-              && rclass == ALTIVEC_REGS
-              && GET_CODE (XEXP (addr, 1)) == CONST_INT
-              && INTVAL (XEXP (addr, 1)) == -16
-              && (legitimate_indirect_address_p (XEXP (addr, 0), false)
-                  || legitimate_indexed_address_p (XEXP (addr, 0), false))))
-        ;
+      /* If this is an indexed address, make sure the register class can handle
+         indexed addresses for this mode.  */
+    case PLUS:
+      op0 = XEXP (addr, 0);
+      op1 = XEXP (addr, 1);
+      if (!base_reg_operand (op0, Pmode))
+        rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
 
-      else if (GET_CODE (addr) == PLUS)
+      else if (int_reg_operand (op1, Pmode))
         {
-          addr_op1 = XEXP (addr, 0);
-          addr_op2 = XEXP (addr, 1);
-          if (!REG_P (addr_op1))
-            rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
-
-          if (TARGET_DEBUG_ADDR)
+          if ((addr_mask & RELOAD_REG_INDEXED) == 0)
             {
-              fprintf (stderr, "\nMove plus addr to register %s, mode = %s: ",
-                       rs6000_reg_names[REGNO (scratch)], GET_MODE_NAME (mode));
-              debug_rtx (addr_op2);
+              emit_insn (gen_rtx_SET (VOIDmode, scratch, addr));
+              new_addr = scratch;
             }
-          rs6000_emit_move (scratch, addr_op2, Pmode);
-          emit_insn (gen_rtx_SET (VOIDmode,
-                                  scratch_or_premodify,
-                                  gen_rtx_PLUS (Pmode,
-                                                addr_op1,
-                                                scratch)));
-          addr = scratch_or_premodify;
-          scratch_or_premodify = scratch;
         }
 
-      else if (GET_CODE (addr) == SYMBOL_REF || GET_CODE (addr) == CONST
-               || GET_CODE (addr) == CONST_INT || GET_CODE (addr) == LO_SUM
-               || REG_P (addr))
+      /* Make sure the register class can handle offset addresses.  */
+      else if (rs6000_legitimate_offset_address_p (mode, addr, false, true))
         {
-          if (TARGET_DEBUG_ADDR)
+          if ((addr_mask & RELOAD_REG_OFFSET) == 0)
             {
-              fprintf (stderr, "\nMove addr to register %s, mode = %s: ",
-                       rs6000_reg_names[REGNO (scratch_or_premodify)],
-                       GET_MODE_NAME (mode));
-              debug_rtx (addr);
+              emit_insn (gen_rtx_SET (VOIDmode, scratch, addr));
+              new_addr = scratch;
             }
-
-          rs6000_emit_move (scratch_or_premodify, addr, Pmode);
-          addr = scratch_or_premodify;
-          scratch_or_premodify = scratch;
         }
 
       else
@@ -17157,55 +17317,56 @@ rs6000_secondary_reload_inner (rtx reg,
 
       break;
 
-    default:
-      rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
-    }
+    case LO_SUM:
+      op0 = XEXP (addr, 0);
+      op1 = XEXP (addr, 1);
+      if (!base_reg_operand (op0, Pmode))
+        rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
 
-  /* If the original address involved a pre-modify that we couldn't use the VSX
-     memory instruction with update, and we haven't taken care of already,
-     store the address in the pre-modify register and use that as the
-     address.  */
-  if (scratch_or_premodify != scratch && scratch_or_premodify != addr)
-    {
-      emit_insn (gen_rtx_SET (VOIDmode, scratch_or_premodify, addr));
-      addr = scratch_or_premodify;
-    }
-
-  /* If the original address involved an AND -16 and we couldn't use an ALTIVEC
-     memory instruction, recreate the AND now, including the clobber which is
-     generated by the general ANDSI3/ANDDI3 patterns for the
-     andi. instruction.  */
-  if (and_op2 != NULL_RTX)
-    {
-      if (! legitimate_indirect_address_p (addr, false))
+      else if (int_reg_operand (op1, Pmode))
         {
-          emit_insn (gen_rtx_SET (VOIDmode, scratch, addr));
-          addr = scratch;
+          if ((addr_mask & RELOAD_REG_INDEXED) == 0)
+            {
+              emit_insn (gen_rtx_SET (VOIDmode, scratch, addr));
+              new_addr = scratch;
+            }
         }
 
-      if (TARGET_DEBUG_ADDR)
+      /* Make sure the register class can handle offset addresses.  */
+      else if (legitimate_lo_sum_address_p (mode, addr, false))
         {
-          fprintf (stderr, "\nAnd addr to register %s, mode = %s: ",
-                   rs6000_reg_names[REGNO (scratch)], GET_MODE_NAME (mode));
-          debug_rtx (and_op2);
+          if ((addr_mask & RELOAD_REG_OFFSET) == 0)
+            {
+              emit_insn (gen_rtx_SET (VOIDmode, scratch, addr));
+              new_addr = scratch;
+            }
         }
 
-      and_rtx = gen_rtx_SET (VOIDmode,
-                             scratch,
-                             gen_rtx_AND (Pmode,
-                                          addr,
-                                          and_op2));
-
-      cc_clobber = gen_rtx_CLOBBER (CCmode, gen_rtx_SCRATCH (CCmode));
-      emit_insn (gen_rtx_PARALLEL (VOIDmode,
-                                   gen_rtvec (2, and_rtx, cc_clobber)));
-      addr = scratch;
+      else
+        rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
+
+      break;
+
+    case SYMBOL_REF:
+    case CONST:
+    case LABEL_REF:
+      if (TARGET_TOC)
+        emit_insn (gen_rtx_SET (VOIDmode, scratch,
+                                create_TOC_reference (addr, scratch)));
+      else
+        rs6000_emit_move (scratch, addr, Pmode);
+
+      new_addr = scratch;
+      break;
+
+    default:
+      rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
     }
 
   /* Adjust the address if it changed.  */
-  if (addr != XEXP (mem, 0))
+  if (addr != new_addr)
     {
-      mem = replace_equiv_address_nv (mem, addr);
+      mem = replace_equiv_address_nv (mem, new_addr);
       if (TARGET_DEBUG_ADDR)
         fprintf (stderr, "\nrs6000_secondary_reload_inner, mem adjusted.\n");
     }
@@ -17350,43 +17511,35 @@ static enum reg_class
 rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
 {
   machine_mode mode = GET_MODE (x);
+  bool is_constant = CONSTANT_P (x);
 
-  if (TARGET_VSX && x == CONST0_RTX (mode) && VSX_REG_CLASS_P (rclass))
-    return rclass;
-
-  if (VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
-      && (rclass == ALTIVEC_REGS || rclass == VSX_REGS)
-      && easy_vector_constant (x, mode))
-    return ALTIVEC_REGS;
-
-  if ((CONSTANT_P (x) || GET_CODE (x) == PLUS))
+  /* Do VSX tests before handling traditional floating point registers.  */
+  if (TARGET_VSX && VSX_REG_CLASS_P (rclass))
     {
-      if (reg_class_subset_p (GENERAL_REGS, rclass))
-        return GENERAL_REGS;
-      if (reg_class_subset_p (BASE_REGS, rclass))
-        return BASE_REGS;
-      return NO_REGS;
-    }
+      if (is_constant)
+        {
+          /* Zero is always allowed in all VSX registers.  */
+          if (x == CONST0_RTX (mode))
+            return rclass;
 
-  if (GET_MODE_CLASS (mode) == MODE_INT && rclass == NON_SPECIAL_REGS)
-    return GENERAL_REGS;
+          /* If this is a vector constant that can be formed with a few Altivec
+             instructions, we want altivec registers.  */
+          if (GET_CODE (x) == CONST_VECTOR && easy_vector_constant (x, mode))
+            return ALTIVEC_REGS;
 
-  /* For VSX, prefer the traditional registers for 64-bit values because we can
-     use the non-VSX loads.  Prefer the Altivec registers if Altivec is
-     handling the vector operations (i.e. V16QI, V8HI, and V4SI), or if we
-     prefer Altivec loads..  */
-  if (rclass == VSX_REGS)
-    {
-      if (MEM_P (x) && reg_addr[mode].scalar_in_vmx_p)
-        {
-          rtx addr = XEXP (x, 0);
-          if (rs6000_legitimate_offset_address_p (mode, addr, false, true)
-              || legitimate_lo_sum_address_p (mode, addr, false))
-            return FLOAT_REGS;
+          /* Force constant to memory.  */
+          return NO_REGS;
         }
-      else if (GET_MODE_SIZE (mode) <= 8 && !reg_addr[mode].scalar_in_vmx_p)
+
+      /* If this is a scalar floating point value, prefer the traditional
+         floating point registers so that we can use D-form (register+offset)
+         addressing.  */
+      if (GET_MODE_SIZE (mode) < 16)
         return FLOAT_REGS;
 
+      /* Prefer the Altivec registers if Altivec is handling the vector
+         operations (i.e. V16QI, V8HI, and V4SI), or if we prefer Altivec
+         loads.  */
       if (VECTOR_UNIT_ALTIVEC_P (mode) || VECTOR_MEM_ALTIVEC_P (mode)
           || mode == V1TImode)
         return ALTIVEC_REGS;
@@ -17394,6 +17547,18 @@ rs6000_preferred_reload_class (rtx x, en
       return rclass;
     }
 
+  if (is_constant || GET_CODE (x) == PLUS)
+    {
+      if (reg_class_subset_p (GENERAL_REGS, rclass))
+        return GENERAL_REGS;
+      if (reg_class_subset_p (BASE_REGS, rclass))
+        return BASE_REGS;
+      return NO_REGS;
+    }
+
+  if (GET_MODE_CLASS (mode) == MODE_INT && rclass == NON_SPECIAL_REGS)
+    return GENERAL_REGS;
+
   return rclass;
 }
 
@@ -17513,30 +17678,34 @@ rs6000_secondary_reload_class (enum reg_
   else
     regno = -1;
 
+  /* If we have VSX register moves, prefer moving scalar values between
+     Altivec registers and GPR by going via an FPR (and then via memory)
+     instead of reloading the secondary memory address for Altivec moves.  */
+  if (TARGET_VSX
+      && GET_MODE_SIZE (mode) < 16
+      && (((rclass == GENERAL_REGS || rclass == BASE_REGS)
+           && (regno >= 0 && ALTIVEC_REGNO_P (regno)))
+          || ((rclass == VSX_REGS || rclass == ALTIVEC_REGS)
+              && (regno >= 0 && INT_REGNO_P (regno)))))
+    return FLOAT_REGS;
+
   /* We can place anything into GENERAL_REGS and can put GENERAL_REGS
      into anything.  */
   if (rclass == GENERAL_REGS || rclass == BASE_REGS
       || (regno >= 0 && INT_REGNO_P (regno)))
     return NO_REGS;
 
+  /* Constants, memory, and VSX registers can go into VSX registers (both the
+     traditional floating point and the altivec registers).  */
+  if (rclass == VSX_REGS
+      && (regno == -1 || VSX_REGNO_P (regno)))
+    return NO_REGS;
+
   /* Constants, memory, and FP registers can go into FP registers.  */
   if ((regno == -1 || FP_REGNO_P (regno))
       && (rclass == FLOAT_REGS || rclass == NON_SPECIAL_REGS))
     return (mode != SDmode || lra_in_progress) ? NO_REGS : GENERAL_REGS;
 
-  /* Memory, and FP/altivec registers can go into fp/altivec registers under
-     VSX.  However, for scalar variables, use the traditional floating point
-     registers so that we can use offset+register addressing.
*/ - if (TARGET_VSX - && (regno == -1 || VSX_REGNO_P (regno)) - && VSX_REG_CLASS_P (rclass)) - { - if (GET_MODE_SIZE (mode) < 16) - return FLOAT_REGS; - - return NO_REGS; - } - /* Memory, and AltiVec registers can go into AltiVec registers. */ if ((regno == -1 || ALTIVEC_REGNO_P (regno)) && rclass == ALTIVEC_REGS) Index: gcc/config/rs6000/rs6000.md =================================================================== --- gcc/config/rs6000/rs6000.md (revision 217386) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -7850,7 +7850,7 @@ (define_split (define_insn "mov_hardfloat" [(set (match_operand:FMOVE32 0 "nonimmediate_operand" "=!r,!r,m,f,,,,,,Z,?,?r,*c*l,!r,*h,!r,!r") - (match_operand:FMOVE32 1 "input_operand" "r,m,r,f,,j,,,Z,,r,,r, h, 0, G,Fn"))] + (match_operand:FMOVE32 1 "input_operand" "r,m,r,f,,j,,,Z,,r,,r,h,0,G,Fn"))] "(gpc_reg_operand (operands[0], mode) || gpc_reg_operand (operands[1], mode)) && (TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_SINGLE_FLOAT)" @@ -9816,12 +9816,15 @@ (define_insn "*movdf_update2" ;; sequences, using get_attr_length here will smash the operands ;; array. Neither is there an early_cobbler_p predicate. ;; Disallow subregs for E500 so we don't munge frob_di_df_2. +;; Also this optimization interferes with scalars going into +;; altivec registers (the code does reloading through the FPRs). (define_peephole2 [(set (match_operand:DF 0 "gpc_reg_operand" "") (match_operand:DF 1 "any_operand" "")) (set (match_operand:DF 2 "gpc_reg_operand" "") (match_dup 0))] "!(TARGET_E500_DOUBLE && GET_CODE (operands[2]) == SUBREG) + && !TARGET_UPPER_REGS_DF && peep2_reg_dead_p (2, operands[0])" [(set (match_dup 2) (match_dup 1))]) @@ -9830,7 +9833,8 @@ (define_peephole2 (match_operand:SF 1 "any_operand" "")) (set (match_operand:SF 2 "gpc_reg_operand" "") (match_dup 0))] - "peep2_reg_dead_p (2, operands[0])" + "!TARGET_UPPER_REGS_SF + && peep2_reg_dead_p (2, operands[0])" [(set (match_dup 2) (match_dup 1))]) --qOrJKOH36bD5yhNe--