From: Richard Sandiford
To: Andrew Pinski
Cc: Uros Bizjak, Paul_Koning@dell.com, gcc-patches@gcc.gnu.org, ebotcazou@adacore.com, kkojima@gcc.gnu.org, aoliva@redhat.com, dje.gcc@gmail.com, uweigand@de.ibm.com, walt@tilera.com
Subject: RFA: Simplifying truncation and integer lowpart subregs
In-Reply-To: <87bogkr7o2.fsf@talisman.home> (Richard Sandiford's message of "Tue, 02 Oct 2012 20:33:01 +0100")
References: <877grgu0yt.fsf@talisman.home> <3730255.NiV98gQJ1a@polaris> <87bogkr7o2.fsf@talisman.home>
Date: Sat, 06 Oct 2012 10:22:00 -0000
Message-ID: <87k3v3oq72.fsf_-_@talisman.home>

[cc:ing sh, spu and tilegx maintainers]

Richard Sandiford writes:
> Andrew Pinski writes:
>> On Thu, Sep 27, 2012 at 11:13 AM, Uros Bizjak wrote:
>>> 2012-09-27  Uros Bizjak
>>>
>>>         PR rtl-optimization/54457
>>>         * simplify-rtx.c (simplify_subreg):
>>>         Simplify (subreg:M (op:N ((x:N) (y:N)), 0)
>>>         to (op:M (subreg:M (x:N) 0) (subreg:M (x:N) 0)), where
>>>         the outer subreg is effectively a truncation to the original mode M.
>>
>>
>> When I was doing something similar on our internal toolchain at
>> Cavium.  I found doing this caused a regression on MIPS64 n32 in
>> gcc.c-torture/execute/20040709-1.c Where:
>>
>>
>> (insn 15 14 16 2 (set (reg/v:DI 200 [ y ])
>>         (reg:DI 2 $2)) t.c:16 301 {*movdi_64bit}
>>      (expr_list:REG_DEAD (reg:DI 2 $2)
>>         (nil)))
>>
>> (insn 16 15 17 2 (set (reg:DI 210)
>>         (zero_extract:DI (reg/v:DI 200 [ y ])
>>             (const_int 29 [0x1d])
>>             (const_int 0 [0]))) t.c:16 249 {extzvdi}
>>      (expr_list:REG_DEAD (reg/v:DI 200 [ y ])
>>         (nil)))
>>
>> (insn 17 16 23 2 (set (reg:SI 211)
>>         (truncate:SI (reg:DI 210))) t.c:16 175 {truncdisi2}
>>      (expr_list:REG_DEAD (reg:DI 210)
>>         (nil)))
>>
>> Gets converted to:
>> (insn 23 17 26 2 (set (reg/i:SI 2 $2)
>>         (and:SI (reg:SI 2 $2 [+4 ])
>>             (const_int 536870911 [0x1fffffff]))) t.c:18 156 {*andsi3}
>>      (nil))
>>
>> Which is considered an ext instruction
>>
>> And with the Octeon simulator which causes undefined arguments to
>> 32bit word operations to come out as 0xDEADBEEF which showed the
>> regression.  I fixed it by changing it to produce TRUNCATE instead of
>> the subreg.
>>
>> I did the simplification on ior/and rather than plus/minus/mult so the
>> issue is only when expanding to this to and/ior.
>
> Hmm, hadn't thought of that.  I think some of the existing subreg
> optimisations suffer the same problem.  I.e.
> we can't assume that
> subreg truncations of nested operands are OK just because the outer
> subreg is OK.
>
> I've got a patch I'm testing.

The idea is to split most of the lowpart subreg handling out of
simplify_subreg and apply it to TRUNCATE too.  There are three reasons:

  - I wanted to make the !TRULY_NOOP_TRUNCATION truncation simplifications
    as similar to subreg truncation simplifications as possible.

  - Some of the current lowpart subreg simplifications are also correct
    for vector truncations.

  - Ideally, using simplify_gen_unary (TRUNCATE, ...) instead of
    simplify_gen_subreg shouldn't penalise TRULY_NOOP_TRUNCATION targets.
    There is already code to use gen_lowpart_no_emit for truncations
    that reduce to subregs, but as things stand, gen_lowpart_no_emit
    only passes objects like SUBREG, REG, MEM, etc., to
    simplify_gen_subreg; others go through gen_lowpart_SUBREG and get
    no recursive simplification.

We inherited this code from a 1996 patch (r13058):

      if ((TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op))
           ? (num_sign_bit_copies (op, GET_MODE (op))
              > (unsigned int) (GET_MODE_PRECISION (GET_MODE (op))
                                - GET_MODE_PRECISION (mode)))
      ...
        return rtl_hooks.gen_lowpart_no_emit (mode, op);

I don't see any reason for the sign-bit check.  If truncations are
noops, we should be able to use a subreg regardless.

Other than removing that check, the patch just moves simplifications
around.  I've not tried to match new patterns.

The other !TRULY_NOOP_TRUNCATION targets are sh64, spu and tilegx.
I don't think sh64 has any patterns that would be adversely affected,
although the patch ought to make these patterns redundant:

(define_insn_and_split "*logical_sidisi3"
  [(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
        (truncate:SI (sign_extend:DI
                        (match_operator:SI 3 "logical_operator"
                          [(match_operand:SI 1 "arith_reg_operand" "%r,r")
                           (match_operand:SI 2 "logical_operand" "r,I10")]))))]
  "TARGET_SHMEDIA"
  "#"
  "&& 1"
  [(set (match_dup 0) (match_dup 3))])

(define_insn_and_split "*logical_sidi3_2"
  [(set (match_operand:DI 0 "arith_reg_dest" "=r,r")
        (sign_extend:DI (truncate:SI (sign_extend:DI
                        (match_operator:SI 3 "logical_operator"
                          [(match_operand:SI 1 "arith_reg_operand" "%r,r")
                           (match_operand:SI 2 "logical_operand" "r,I10")])))))]
  "TARGET_SHMEDIA"
  "#"
  "&& 1"
  [(set (match_dup 0) (sign_extend:DI (match_dup 3)))])

combine should now simplify the first to the normal SI logical op
and the second to *logical_sidi3.

I don't think any spu or tilegx patterns are affected either way.

Tested on x86_64-linux-gnu, mipsisa32-elf and mipsisa64-elf.  Also
tested by making sure that there were no code differences for a set of
gcc .ii files on gcc20 (-O2 -march=native).  OK to install?

Richard


gcc/
        * machmode.h (GET_MODE_UNIT_PRECISION): New macro.
        * simplify-rtx.c (simplify_truncation): New function.
        (simplify_unary_operation_1): Use it.  Remove sign bit test
        for !TRULY_NOOP_TRUNCATION_MODES_P.
        (simplify_subreg): Use simplify_int_lowpart for TRUNCATE.
        * config/mips/mips.c (mips_truncated_op_cost): New function.
        (mips_rtx_costs): Adjust test for BADDU.
        * config/mips/mips.md (*baddu_di): Push truncates to operands.

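For anyone following along, here is a small standalone C sketch of the arithmetic behind the PLUS/MINUS/MULT case in simplify_truncation.  It is not GCC code and is only an illustration: uint64_t and uint32_t stand in for DImode and SImode, and the helper names (trunc_si, arith_commutes, lshiftrt_commutes, check_examples) are made up for the example.  It shows that taking the low 32 bits commutes with addition, subtraction and multiplication, but not in general with a right shift, which is why the shift cases need the extra conditions on the extension and the shift count.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for (truncate:SI (x:DI)): keep only the low 32 bits.  */
static uint32_t trunc_si(uint64_t x)
{
  return (uint32_t) x;
}

/* Return 1 if truncating the DImode result of +, - and * gives the same
   value as doing the operation in SImode on truncated operands.  */
static int arith_commutes(uint64_t x, uint64_t y)
{
  return trunc_si(x + y) == (uint32_t) (trunc_si(x) + trunc_si(y))
         && trunc_si(x - y) == (uint32_t) (trunc_si(x) - trunc_si(y))
         && trunc_si(x * y) == (uint32_t) (trunc_si(x) * trunc_si(y));
}

/* Return 1 if truncation commutes with a logical right shift by C
   (0 <= C < 32).  High bits of X are shifted into the low part, so
   this fails whenever those bits are nonzero.  */
static int lshiftrt_commutes(uint64_t x, unsigned int c)
{
  return trunc_si(x >> c) == (trunc_si(x) >> c);
}

/* A few fixed inputs exercising both helpers.  */
static void check_examples(void)
{
  uint64_t x = 0x123456789abcdef0ULL;
  uint64_t y = 0xfedcba9876543210ULL;

  assert(arith_commutes(x, y));          /* modular arithmetic */
  assert(!lshiftrt_commutes(x, 4));      /* high bits leak in */
  assert(lshiftrt_commutes(0x9abcdef0ULL, 4)); /* zero high half */
}
```

Calling check_examples () from a test program runs the three cases.  The last one corresponds loosely to the zero_extend case in the patch: the shift is safe when the upper half is known to be zero.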
Index: gcc/machmode.h =================================================================== --- gcc/machmode.h 2012-06-23 08:30:36.000000000 +0100 +++ gcc/machmode.h 2012-10-06 10:03:47.146873855 +0100 @@ -217,6 +217,11 @@ #define GET_MODE_UNIT_SIZE(MODE) \ #define GET_MODE_UNIT_BITSIZE(MODE) \ ((unsigned short) (GET_MODE_UNIT_SIZE (MODE) * BITS_PER_UNIT)) +#define GET_MODE_UNIT_PRECISION(MODE) \ + (GET_MODE_INNER (MODE) == VOIDmode \ + ? GET_MODE_PRECISION (MODE) \ + : GET_MODE_PRECISION (GET_MODE_INNER (MODE))) + /* Get the number of units in the object. */ extern const unsigned char mode_nunits[NUM_MACHINE_MODES]; Index: gcc/simplify-rtx.c =================================================================== --- gcc/simplify-rtx.c 2012-10-02 20:34:15.969129966 +0100 +++ gcc/simplify-rtx.c 2012-10-06 10:08:08.349859303 +0100 @@ -564,6 +564,167 @@ simplify_replace_rtx (rtx x, const_rtx o return simplify_replace_fn_rtx (x, old_rtx, 0, new_rtx); } +/* Try to simplify a MODE truncation of OP, which has OP_MODE. */ + +static rtx +simplify_truncation (enum machine_mode mode, rtx op, + enum machine_mode op_mode) +{ + unsigned int precision = GET_MODE_UNIT_PRECISION (mode); + unsigned int op_precision = GET_MODE_UNIT_PRECISION (op_mode); + gcc_assert (precision <= op_precision); + + /* Optimize truncations of zero and sign extended values. */ + if (GET_CODE (op) == ZERO_EXTEND + || GET_CODE (op) == SIGN_EXTEND) + { + /* There are three possibilities. If MODE is the same as the + origmode, we can omit both the extension and the subreg. + If MODE is not larger than the origmode, we can apply the + truncation without the extension. Finally, if the outermode + is larger than the origmode, we can just extend to the appropriate + mode. 
*/ + enum machine_mode origmode = GET_MODE (XEXP (op, 0)); + if (mode == origmode) + return XEXP (op, 0); + else if (precision <= GET_MODE_UNIT_PRECISION (origmode)) + return simplify_gen_unary (TRUNCATE, mode, + XEXP (op, 0), origmode); + else + return simplify_gen_unary (GET_CODE (op), mode, + XEXP (op, 0), origmode); + } + + /* Simplify (truncate:SI (op:DI (x:DI) (y:DI))) + to (op:SI (truncate:SI (x:DI)) (truncate:SI (x:DI))). */ + if (GET_CODE (op) == PLUS + || GET_CODE (op) == MINUS + || GET_CODE (op) == MULT) + { + rtx op0 = simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0), op_mode); + if (op0) + { + rtx op1 = simplify_gen_unary (TRUNCATE, mode, XEXP (op, 1), op_mode); + if (op1) + return simplify_gen_binary (GET_CODE (op), mode, op0, op1); + } + } + + /* Simplify (truncate:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C)) into + to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and + the outer subreg is effectively a truncation to the original mode. */ + if ((GET_CODE (op) == LSHIFTRT + || GET_CODE (op) == ASHIFTRT) + /* Ensure that OP_MODE is at least twice as wide as MODE + to avoid the possibility that an outer LSHIFTRT shifts by more + than the sign extension's sign_bit_copies and introduces zeros + into the high bits of the result. */ + && 2 * precision <= op_precision + && CONST_INT_P (XEXP (op, 1)) + && GET_CODE (XEXP (op, 0)) == SIGN_EXTEND + && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode + && INTVAL (XEXP (op, 1)) < precision) + return simplify_gen_binary (ASHIFTRT, mode, + XEXP (XEXP (op, 0), 0), XEXP (op, 1)); + + /* Likewise (truncate:QI (lshiftrt:SI (zero_extend:SI (x:QI)) C)) into + to (lshiftrt:QI (x:QI) C), where C is a suitable small constant and + the outer subreg is effectively a truncation to the original mode. 
*/ + if ((GET_CODE (op) == LSHIFTRT + || GET_CODE (op) == ASHIFTRT) + && CONST_INT_P (XEXP (op, 1)) + && GET_CODE (XEXP (op, 0)) == ZERO_EXTEND + && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode + && INTVAL (XEXP (op, 1)) < precision) + return simplify_gen_binary (LSHIFTRT, mode, + XEXP (XEXP (op, 0), 0), XEXP (op, 1)); + + /* Likewise (truncate:QI (ashift:SI (zero_extend:SI (x:QI)) C)) into + to (ashift:QI (x:QI) C), where C is a suitable small constant and + the outer subreg is effectively a truncation to the original mode. */ + if (GET_CODE (op) == ASHIFT + && CONST_INT_P (XEXP (op, 1)) + && (GET_CODE (XEXP (op, 0)) == ZERO_EXTEND + || GET_CODE (XEXP (op, 0)) == SIGN_EXTEND) + && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode + && INTVAL (XEXP (op, 1)) < precision) + return simplify_gen_binary (ASHIFT, mode, + XEXP (XEXP (op, 0), 0), XEXP (op, 1)); + + /* Recognize a word extraction from a multi-word subreg. */ + if ((GET_CODE (op) == LSHIFTRT + || GET_CODE (op) == ASHIFTRT) + && SCALAR_INT_MODE_P (mode) + && SCALAR_INT_MODE_P (op_mode) + && precision >= BITS_PER_WORD + && 2 * precision <= op_precision + && CONST_INT_P (XEXP (op, 1)) + && (INTVAL (XEXP (op, 1)) & (precision - 1)) == 0 + && INTVAL (XEXP (op, 1)) >= 0 + && INTVAL (XEXP (op, 1)) < op_precision) + { + int byte = subreg_lowpart_offset (mode, op_mode); + int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT; + return simplify_gen_subreg (mode, XEXP (op, 0), op_mode, + (WORDS_BIG_ENDIAN + ? byte - shifted_bytes + : byte + shifted_bytes)); + } + + /* If we have a TRUNCATE of a right shift of MEM, make a new MEM + and try replacing the TRUNCATE and shift with it. Don't do this + if the MEM has a mode-dependent address. 
*/ + if ((GET_CODE (op) == LSHIFTRT + || GET_CODE (op) == ASHIFTRT) + && SCALAR_INT_MODE_P (op_mode) + && MEM_P (XEXP (op, 0)) + && CONST_INT_P (XEXP (op, 1)) + && (INTVAL (XEXP (op, 1)) % GET_MODE_BITSIZE (mode)) == 0 + && INTVAL (XEXP (op, 1)) > 0 + && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (op_mode) + && ! mode_dependent_address_p (XEXP (XEXP (op, 0), 0), + MEM_ADDR_SPACE (XEXP (op, 0))) + && ! MEM_VOLATILE_P (XEXP (op, 0)) + && (GET_MODE_SIZE (mode) >= UNITS_PER_WORD + || WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN)) + { + int byte = subreg_lowpart_offset (mode, op_mode); + int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT; + return adjust_address_nv (XEXP (op, 0), mode, + (WORDS_BIG_ENDIAN + ? byte - shifted_bytes + : byte + shifted_bytes)); + } + + /* (truncate:SI (OP:DI ({sign,zero}_extend:DI foo:SI))) is + (OP:SI foo:SI) if OP is NEG or ABS. */ + if ((GET_CODE (op) == ABS + || GET_CODE (op) == NEG) + && (GET_CODE (XEXP (op, 0)) == SIGN_EXTEND + || GET_CODE (XEXP (op, 0)) == ZERO_EXTEND) + && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode) + return simplify_gen_unary (GET_CODE (op), mode, + XEXP (XEXP (op, 0), 0), mode); + + /* (truncate:A (subreg:B (truncate:C X) 0)) is + (truncate:A X). */ + if (GET_CODE (op) == SUBREG + && SCALAR_INT_MODE_P (mode) + && SCALAR_INT_MODE_P (op_mode) + && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op))) + && GET_CODE (SUBREG_REG (op)) == TRUNCATE + && subreg_lowpart_p (op)) + return simplify_gen_unary (TRUNCATE, mode, XEXP (SUBREG_REG (op), 0), + GET_MODE (XEXP (SUBREG_REG (op), 0))); + + /* (truncate:A (truncate:B X)) is (truncate:A X). */ + if (GET_CODE (op) == TRUNCATE) + return simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0), + GET_MODE (XEXP (op, 0))); + + return NULL_RTX; +} + /* Try to simplify a unary operation CODE whose output mode is to be MODE with input operand OP whose mode was originally OP_MODE. Return zero if no simplification can be made. 
*/ @@ -689,12 +850,6 @@ simplify_unary_operation_1 (enum rtx_cod op_mode = mode; in2 = simplify_gen_unary (NOT, op_mode, in2, op_mode); - if (GET_CODE (in2) == NOT && GET_CODE (in1) != NOT) - { - rtx tem = in2; - in2 = in1; in1 = tem; - } - return gen_rtx_fmt_ee (GET_CODE (op) == IOR ? AND : IOR, mode, in1, in2); } @@ -821,44 +976,24 @@ simplify_unary_operation_1 (enum rtx_cod if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT) break; - /* (truncate:SI ({sign,zero}_extend:DI foo:SI)) == foo:SI. */ - if ((GET_CODE (op) == SIGN_EXTEND - || GET_CODE (op) == ZERO_EXTEND) - && GET_MODE (XEXP (op, 0)) == mode) - return XEXP (op, 0); - - /* (truncate:SI (OP:DI ({sign,zero}_extend:DI foo:SI))) is - (OP:SI foo:SI) if OP is NEG or ABS. */ - if ((GET_CODE (op) == ABS - || GET_CODE (op) == NEG) - && (GET_CODE (XEXP (op, 0)) == SIGN_EXTEND - || GET_CODE (XEXP (op, 0)) == ZERO_EXTEND) - && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode) - return simplify_gen_unary (GET_CODE (op), mode, - XEXP (XEXP (op, 0), 0), mode); + /* Don't optimize (lshiftrt (mult ...)) as it would interfere + with the umulXi3_highpart patterns. */ + if (GET_CODE (op) == LSHIFTRT + && GET_CODE (XEXP (op, 0)) == MULT) + break; - /* (truncate:A (subreg:B (truncate:C X) 0)) is - (truncate:A X). */ - if (GET_CODE (op) == SUBREG - && GET_CODE (SUBREG_REG (op)) == TRUNCATE - && subreg_lowpart_p (op)) - return simplify_gen_unary (TRUNCATE, mode, XEXP (SUBREG_REG (op), 0), - GET_MODE (XEXP (SUBREG_REG (op), 0))); + if (GET_MODE (op) != VOIDmode) + { + temp = simplify_truncation (mode, op, GET_MODE (op)); + if (temp) + return temp; + } /* If we know that the value is already truncated, we can - replace the TRUNCATE with a SUBREG. Note that this is also - valid if TRULY_NOOP_TRUNCATION is false for the corresponding - modes we just have to apply a different definition for - truncation. But don't do this for an (LSHIFTRT (MULT ...)) - since this will cause problems with the umulXi3_highpart - patterns. 
*/ - if ((TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op)) - ? (num_sign_bit_copies (op, GET_MODE (op)) - > (unsigned int) (GET_MODE_PRECISION (GET_MODE (op)) - - GET_MODE_PRECISION (mode))) - : truncated_to_mode (mode, op)) - && ! (GET_CODE (op) == LSHIFTRT - && GET_CODE (XEXP (op, 0)) == MULT)) + replace the TRUNCATE with a SUBREG. */ + if (GET_MODE_NUNITS (mode) == 1 + && (TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op)) + || truncated_to_mode (mode, op))) return rtl_hooks.gen_lowpart_no_emit (mode, op); /* A truncate of a comparison can be replaced with a subreg if @@ -5595,14 +5730,6 @@ simplify_subreg (enum machine_mode outer return NULL_RTX; } - /* Merge implicit and explicit truncations. */ - - if (GET_CODE (op) == TRUNCATE - && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode) - && subreg_lowpart_offset (outermode, innermode) == byte) - return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0), - GET_MODE (XEXP (op, 0))); - /* SUBREG of a hard register => just change the register number and/or mode. If the hard register is not valid in that mode, suppress this simplification. If the hard register is the stack, @@ -5688,160 +5815,23 @@ simplify_subreg (enum machine_mode outer return NULL_RTX; } - /* Optimize SUBREG truncations of zero and sign extended values. */ - if ((GET_CODE (op) == ZERO_EXTEND - || GET_CODE (op) == SIGN_EXTEND) - && SCALAR_INT_MODE_P (innermode) - && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)) + /* A SUBREG resulting from a zero extension may fold to zero if + it extracts higher bits that the ZERO_EXTEND's source bits. */ + if (GET_CODE (op) == ZERO_EXTEND) { unsigned int bitpos = subreg_lsb_1 (outermode, innermode, byte); - - /* If we're requesting the lowpart of a zero or sign extension, - there are three possibilities. If the outermode is the same - as the origmode, we can omit both the extension and the subreg. 
- If the outermode is not larger than the origmode, we can apply - the truncation without the extension. Finally, if the outermode - is larger than the origmode, but both are integer modes, we - can just extend to the appropriate mode. */ - if (bitpos == 0) - { - enum machine_mode origmode = GET_MODE (XEXP (op, 0)); - if (outermode == origmode) - return XEXP (op, 0); - if (GET_MODE_PRECISION (outermode) <= GET_MODE_PRECISION (origmode)) - return simplify_gen_subreg (outermode, XEXP (op, 0), origmode, - subreg_lowpart_offset (outermode, - origmode)); - if (SCALAR_INT_MODE_P (outermode)) - return simplify_gen_unary (GET_CODE (op), outermode, - XEXP (op, 0), origmode); - } - - /* A SUBREG resulting from a zero extension may fold to zero if - it extracts higher bits that the ZERO_EXTEND's source bits. */ - if (GET_CODE (op) == ZERO_EXTEND - && bitpos >= GET_MODE_PRECISION (GET_MODE (XEXP (op, 0)))) + if (bitpos >= GET_MODE_PRECISION (GET_MODE (XEXP (op, 0)))) return CONST0_RTX (outermode); } - /* Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) - to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)), where - the outer subreg is effectively a truncation to the original mode. */ - if ((GET_CODE (op) == PLUS - || GET_CODE (op) == MINUS - || GET_CODE (op) == MULT) - && SCALAR_INT_MODE_P (outermode) + if (SCALAR_INT_MODE_P (outermode) && SCALAR_INT_MODE_P (innermode) && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode) && byte == subreg_lowpart_offset (outermode, innermode)) { - rtx op0 = simplify_gen_subreg (outermode, XEXP (op, 0), - innermode, byte); - if (op0) - { - rtx op1 = simplify_gen_subreg (outermode, XEXP (op, 1), - innermode, byte); - if (op1) - return simplify_gen_binary (GET_CODE (op), outermode, op0, op1); - } - } - - /* Simplify (subreg:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C), 0) into - to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and - the outer subreg is effectively a truncation to the original mode. 
*/ - if ((GET_CODE (op) == LSHIFTRT - || GET_CODE (op) == ASHIFTRT) - && SCALAR_INT_MODE_P (outermode) - && SCALAR_INT_MODE_P (innermode) - /* Ensure that OUTERMODE is at least twice as wide as the INNERMODE - to avoid the possibility that an outer LSHIFTRT shifts by more - than the sign extension's sign_bit_copies and introduces zeros - into the high bits of the result. */ - && (2 * GET_MODE_PRECISION (outermode)) <= GET_MODE_PRECISION (innermode) - && CONST_INT_P (XEXP (op, 1)) - && GET_CODE (XEXP (op, 0)) == SIGN_EXTEND - && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode - && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode) - && subreg_lsb_1 (outermode, innermode, byte) == 0) - return simplify_gen_binary (ASHIFTRT, outermode, - XEXP (XEXP (op, 0), 0), XEXP (op, 1)); - - /* Likewise (subreg:QI (lshiftrt:SI (zero_extend:SI (x:QI)) C), 0) into - to (lshiftrt:QI (x:QI) C), where C is a suitable small constant and - the outer subreg is effectively a truncation to the original mode. */ - if ((GET_CODE (op) == LSHIFTRT - || GET_CODE (op) == ASHIFTRT) - && SCALAR_INT_MODE_P (outermode) - && SCALAR_INT_MODE_P (innermode) - && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode) - && CONST_INT_P (XEXP (op, 1)) - && GET_CODE (XEXP (op, 0)) == ZERO_EXTEND - && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode - && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode) - && subreg_lsb_1 (outermode, innermode, byte) == 0) - return simplify_gen_binary (LSHIFTRT, outermode, - XEXP (XEXP (op, 0), 0), XEXP (op, 1)); - - /* Likewise (subreg:QI (ashift:SI (zero_extend:SI (x:QI)) C), 0) into - to (ashift:QI (x:QI) C), where C is a suitable small constant and - the outer subreg is effectively a truncation to the original mode. 
*/ - if (GET_CODE (op) == ASHIFT - && SCALAR_INT_MODE_P (outermode) - && SCALAR_INT_MODE_P (innermode) - && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode) - && CONST_INT_P (XEXP (op, 1)) - && (GET_CODE (XEXP (op, 0)) == ZERO_EXTEND - || GET_CODE (XEXP (op, 0)) == SIGN_EXTEND) - && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode - && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode) - && subreg_lsb_1 (outermode, innermode, byte) == 0) - return simplify_gen_binary (ASHIFT, outermode, - XEXP (XEXP (op, 0), 0), XEXP (op, 1)); - - /* Recognize a word extraction from a multi-word subreg. */ - if ((GET_CODE (op) == LSHIFTRT - || GET_CODE (op) == ASHIFTRT) - && SCALAR_INT_MODE_P (innermode) - && GET_MODE_PRECISION (outermode) >= BITS_PER_WORD - && GET_MODE_PRECISION (innermode) >= (2 * GET_MODE_PRECISION (outermode)) - && CONST_INT_P (XEXP (op, 1)) - && (INTVAL (XEXP (op, 1)) & (GET_MODE_PRECISION (outermode) - 1)) == 0 - && INTVAL (XEXP (op, 1)) >= 0 - && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (innermode) - && byte == subreg_lowpart_offset (outermode, innermode)) - { - int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT; - return simplify_gen_subreg (outermode, XEXP (op, 0), innermode, - (WORDS_BIG_ENDIAN - ? byte - shifted_bytes - : byte + shifted_bytes)); - } - - /* If we have a lowpart SUBREG of a right shift of MEM, make a new MEM - and try replacing the SUBREG and shift with it. Don't do this if - the MEM has a mode-dependent address or if we would be widening it. */ - - if ((GET_CODE (op) == LSHIFTRT - || GET_CODE (op) == ASHIFTRT) - && SCALAR_INT_MODE_P (innermode) - && MEM_P (XEXP (op, 0)) - && CONST_INT_P (XEXP (op, 1)) - && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (GET_MODE (op)) - && (INTVAL (XEXP (op, 1)) % GET_MODE_BITSIZE (outermode)) == 0 - && INTVAL (XEXP (op, 1)) > 0 - && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (innermode) - && ! mode_dependent_address_p (XEXP (XEXP (op, 0), 0), - MEM_ADDR_SPACE (XEXP (op, 0))) - && ! 
MEM_VOLATILE_P (XEXP (op, 0)) - && byte == subreg_lowpart_offset (outermode, innermode) - && (GET_MODE_SIZE (outermode) >= UNITS_PER_WORD - || WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN)) - { - int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT; - return adjust_address_nv (XEXP (op, 0), outermode, - (WORDS_BIG_ENDIAN - ? byte - shifted_bytes - : byte + shifted_bytes)); + rtx tem = simplify_truncation (outermode, op, innermode); + if (tem) + return tem; } return NULL_RTX; Index: gcc/config/mips/mips.c =================================================================== --- gcc/config/mips/mips.c 2012-10-02 21:02:21.000000000 +0100 +++ gcc/config/mips/mips.c 2012-10-06 10:06:42.617864078 +0100 @@ -3527,6 +3527,17 @@ mips_set_reg_reg_cost (enum machine_mode } } +/* Return the cost of an operand X that can be trucated for free. + SPEED says whether we're optimizing for size or speed. */ + +static int +mips_truncated_op_cost (rtx x, bool speed) +{ + if (GET_CODE (x) == TRUNCATE) + x = XEXP (x, 0); + return set_src_cost (x, speed); +} + /* Implement TARGET_RTX_COSTS. 
*/ static bool @@ -3907,12 +3918,13 @@ mips_rtx_costs (rtx x, int code, int out case ZERO_EXTEND: if (outer_code == SET && ISA_HAS_BADDU - && (GET_CODE (XEXP (x, 0)) == TRUNCATE - || GET_CODE (XEXP (x, 0)) == SUBREG) && GET_MODE (XEXP (x, 0)) == QImode - && GET_CODE (XEXP (XEXP (x, 0), 0)) == PLUS) + && GET_CODE (XEXP (x, 0)) == PLUS) { - *total = set_src_cost (XEXP (XEXP (x, 0), 0), speed); + rtx plus = XEXP (x, 0); + *total = (COSTS_N_INSNS (1) + + mips_truncated_op_cost (XEXP (plus, 0), speed) + + mips_truncated_op_cost (XEXP (plus, 1), speed)); return true; } *total = mips_zero_extend_cost (mode, XEXP (x, 0)); Index: gcc/config/mips/mips.md =================================================================== --- gcc/config/mips/mips.md 2012-10-02 20:37:34.000000000 +0100 +++ gcc/config/mips/mips.md 2012-10-06 10:03:47.156873851 +0100 @@ -1305,9 +1305,8 @@ (define_insn "*baddu_si" (define_insn "*baddu_di" [(set (match_operand:GPR 0 "register_operand" "=d") (zero_extend:GPR - (truncate:QI - (plus:DI (match_operand:DI 1 "register_operand" "d") - (match_operand:DI 2 "register_operand" "d")))))] + (plus:QI (truncate:QI (match_operand:DI 1 "register_operand" "d")) + (truncate:QI (match_operand:DI 2 "register_operand" "d")))))] "ISA_HAS_BADDU && TARGET_64BIT" "baddu\\t%0,%1,%2" [(set_attr "alu_type" "add")])