* [PATCH, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant @ 2012-09-25 8:04 Uros Bizjak 2012-09-26 18:17 ` Richard Sandiford 0 siblings, 1 reply; 23+ messages in thread From: Uros Bizjak @ 2012-09-25 8:04 UTC (permalink / raw) To: gcc-patches [-- Attachment #1: Type: text/plain, Size: 2532 bytes --] Hello! Attached patch fixes the combine deficiency, where combine is not able to recognize combination of: Trying 6 -> 8: Failed to match this instruction: (set (reg:QI 66) (mem/j:QI (plus:SI (subreg:SI (plus:DI (reg/v:DI 62 [ position ]) (const_int 1 [0x1])) 0) (symbol_ref:SI ("array") [flags 0x40] <var_decl 0x7ffff19ad260 array>)) [0 array S1 A8])) This should be a valid address, but the RTX is not accepted in ix86_decompose_address since we have two displacements here. Combine should simplify this RTX to: (set (reg:QI 68) (mem/j:QI (plus:SI (subreg:SI (reg/v:DI 62 [ position ]) 0) (const:SI (plus:SI (symbol_ref:SI ("array") [flags 0x40] <var_decl 0x7f8d1bc41390 array>) (const_int 1 [0x1])))) [0 array S1 A8])) as is the case with -m32 (but rejected for 32bit targets in ix86_address_subreg_operand for unrelated reason). The solution is, to transform operands of PLUS and MINUS RTX in the form of (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C), as is done for standalone subreg operation in 32bit case. This transformation generates canonical (plus:M (plus:M (subreg:N (A 0) C1) C2)) RTXes that can be recognized and further optimized by simplify_plus_minus. The effect of the patch on following testcase (-O2 -mx32 -maddres-mode=short): --cut here-- extern char array[40]; char foo (long long position) { return array[position + 1]; } --cut here-- foo: - addq $1, %rdi - movzbl array(%edi), %eax + movzbl array+1(%edi), %eax ret The patch is effective also for -fpic: foo: movl array@GOTPCREL(%rip), %eax - addq $1, %rdi - movzbl (%eax,%edi), %eax + movzbl 1(%eax,%edi), %eax ret Generated code for vanilla 32bit and 64bit targets is unchanged. 2012-09-25 Uros Bizjak <ubizjak@gmail.com> PR rtl-optimization/54457 * combine.c (combine_simplify_rtx) <RTX_COMM_ARITH, RTX_BIN_ARIT>: For PLUS and MINUS, convert plus_minus_operand_p operands in the form of (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C). * simplify-rtx.c (plus_minus_operand_p): Make global. * rtl.h (plus_minus_operand_p): Declare. testsuite/ChangeLog: 2012-09-25 Uros Bizjak <ubizjak@gmail.com> PR rtl-optimization/54457 * gcc.target/i386/pr54457.c: New test. Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}, also on x32 target by H.J [1]. OK for mainline? [1] http://gcc.gnu.org/ml/gcc-testresults/2012-09/msg02448.html Uros. [-- Attachment #2: p.diff.txt --] [-- Type: text/plain, Size: 2590 bytes --] Index: combine.c =================================================================== --- combine.c (revision 191681) +++ combine.c (working copy) @@ -5314,6 +5314,26 @@ combine_simplify_rtx (rtx x, enum machine_mode op0 break; case RTX_COMM_ARITH: case RTX_BIN_ARITH: + if ((code == PLUS || code == MINUS) + && GET_MODE_CLASS (mode) == MODE_INT + && HWI_COMPUTABLE_MODE_P (mode)) + { + /* Adjust operands in the form of (subreg:M (op:N A C) 0) + to (op:M (subreg:N (A 0)) C). This transformation + generates canonical (plus:M (plus:M (subreg:N (A 0) C1) C2)) + RTXes that can be recognized and further optimized by + simplify_plus_minus. */ + if (GET_CODE (XEXP (x, 0)) == SUBREG + && plus_minus_operand_p (SUBREG_REG (XEXP (x, 0)))) + SUBST (XEXP (x, 0), + force_to_mode (XEXP (x, 0), mode, + ~(unsigned HOST_WIDE_INT) 0, 0)); + if (GET_CODE (XEXP (x, 1)) == SUBREG + && plus_minus_operand_p (SUBREG_REG (XEXP (x, 1)))) + SUBST (XEXP (x, 1), + force_to_mode (XEXP (x, 1), mode, + ~(unsigned HOST_WIDE_INT) 0, 0)); + } temp = simplify_binary_operation (code, mode, XEXP (x, 0), XEXP (x, 1)); break; case RTX_BITFIELD_OPS: Index: rtl.h =================================================================== --- rtl.h (revision 191681) +++ rtl.h (working copy) @@ -1895,6 +1895,7 @@ extern bool val_signbit_known_set_p (enum machine_ unsigned HOST_WIDE_INT); extern bool val_signbit_known_clear_p (enum machine_mode, unsigned HOST_WIDE_INT); +extern bool plus_minus_operand_p (const_rtx); /* In reginfo.c */ extern enum machine_mode choose_hard_reg_mode (unsigned int, unsigned int, Index: simplify-rtx.c =================================================================== --- simplify-rtx.c (revision 191681) +++ simplify-rtx.c (working copy) @@ -48,7 +48,6 @@ along with GCC; see the file COPYING3. If not see ((((HOST_WIDE_INT) low) < 0) ? ((HOST_WIDE_INT) -1) : ((HOST_WIDE_INT) 0)) static rtx neg_const_int (enum machine_mode, const_rtx); -static bool plus_minus_operand_p (const_rtx); static bool simplify_plus_minus_op_data_cmp (rtx, rtx); static rtx simplify_plus_minus (enum rtx_code, enum machine_mode, rtx, rtx); static rtx simplify_immed_subreg (enum machine_mode, rtx, enum machine_mode, @@ -4186,7 +4185,7 @@ simplify_plus_minus (enum rtx_code code, enum mach } /* Check whether an operand is suitable for calling simplify_plus_minus. */ -static bool +bool plus_minus_operand_p (const_rtx x) { return GET_CODE (x) == PLUS ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant 2012-09-25 8:04 [PATCH, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant Uros Bizjak @ 2012-09-26 18:17 ` Richard Sandiford 2012-09-26 21:38 ` Eric Botcazou 0 siblings, 1 reply; 23+ messages in thread From: Richard Sandiford @ 2012-09-26 18:17 UTC (permalink / raw) To: Uros Bizjak; +Cc: gcc-patches Uros Bizjak <ubizjak@gmail.com> writes: > The solution is, to transform operands of PLUS and MINUS RTX in the > form of (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C), as is > done for standalone subreg operation in 32bit case. I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is a good transformation, but why do we need to handle as special the case where the subreg is itself the operand of a plus or minus? I think it should happen regardless of where the subreg occurs. On the face of it, this feels like something that could legimately go in simplify_subreg. Richard ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant 2012-09-26 18:17 ` Richard Sandiford @ 2012-09-26 21:38 ` Eric Botcazou 2012-09-27 14:25 ` Uros Bizjak 0 siblings, 1 reply; 23+ messages in thread From: Eric Botcazou @ 2012-09-26 21:38 UTC (permalink / raw) To: Richard Sandiford; +Cc: gcc-patches, Uros Bizjak > I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is > a good transformation, but why do we need to handle as special > the case where the subreg is itself the operand of a plus or minus? > I think it should happen regardless of where the subreg occurs. Don't we need to restrict this to the low part though? -- Eric Botcazou ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant 2012-09-26 21:38 ` Eric Botcazou @ 2012-09-27 14:25 ` Uros Bizjak 2012-09-27 16:10 ` Richard Sandiford 0 siblings, 1 reply; 23+ messages in thread From: Uros Bizjak @ 2012-09-27 14:25 UTC (permalink / raw) To: Eric Botcazou; +Cc: Richard Sandiford, gcc-patches [-- Attachment #1: Type: text/plain, Size: 1573 bytes --] On Wed, Sep 26, 2012 at 11:22 PM, Eric Botcazou <ebotcazou@adacore.com> wrote: >> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is >> a good transformation, but why do we need to handle as special >> the case where the subreg is itself the operand of a plus or minus? >> I think it should happen regardless of where the subreg occurs. > > Don't we need to restrict this to the low part though? I have tried this approach with attached patch. Unfortunately, although it survived bootstrap without libjava on x86_64, it failed building libjava with: /home/uros/gcc-svn/trunk/libjava/classpath/javax/swing/plaf/basic/BasicSliderUI.java:1299:0: error: insn does not satisfy its constraints: } ^ (insn 237 398 399 7 (set (reg:SI 1 dx [125]) (plus:SI (subreg:SI (mult:DI (reg:DI 1 dx [orig:72 D.78627 ] [72]) (const_int 2 [0x2])) 0) (reg:SI 5 di))) /home/uros/gcc-svn/trunk/libjava/classpath/javax/swing/plaf/basic/BasicSliderUI.java:1271 240 {*leasi} (expr_list:REG_DEAD (reg:DI 5 di) (nil))) Original RTX was (subreg:SI (plus:DI (mult:DI (...) reg:DI))), which is valid RTX pattern for lea insn, the above is not. Due to these problems, I think the safer approach is to limit the transformation to (plus:SI (subreg:SI (plus:DI (...) 0)) RTXes, as was the case with original patch. This approach would fix a specific problem where simplify_plus_minus is not able to simplify the combined RTX at combine time. Please note, that combined RTXes are always checked for correctness at combine pass. Uros. [-- Attachment #2: s.diff.txt --] [-- Type: text/plain, Size: 1199 bytes --] Index: simplify-rtx.c =================================================================== --- simplify-rtx.c (revision 191796) +++ simplify-rtx.c (working copy) @@ -5689,6 +5688,21 @@ simplify_subreg (enum machine_mode outermode, rtx return CONST0_RTX (outermode); } + /* Simplify (subreg:SI (plus:DI ((x:DI) (y:DI)), 0) + to (plus:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)), where + the outer subreg is effectively a truncation to the original mode. */ + if ((GET_CODE (op) == PLUS + || GET_CODE (op) == MINUS) + && SCALAR_INT_MODE_P (outermode) + && SCALAR_INT_MODE_P (innermode) + && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode) + && subreg_lsb_1 (outermode, innermode, byte) == 0) + return simplify_gen_binary (GET_CODE (op), outermode, + simplify_gen_subreg (outermode, XEXP (op, 0), + innermode, 0), + simplify_gen_subreg (outermode, XEXP (op, 1), + innermode, 0)); + /* Simplify (subreg:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C), 0) into to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and the outer subreg is effectively a truncation to the original mode. */ ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant 2012-09-27 14:25 ` Uros Bizjak @ 2012-09-27 16:10 ` Richard Sandiford 2012-09-27 18:20 ` [PATCH v2, " Uros Bizjak 0 siblings, 1 reply; 23+ messages in thread From: Richard Sandiford @ 2012-09-27 16:10 UTC (permalink / raw) To: Uros Bizjak; +Cc: Eric Botcazou, gcc-patches Uros Bizjak <ubizjak@gmail.com> writes: > On Wed, Sep 26, 2012 at 11:22 PM, Eric Botcazou <ebotcazou@adacore.com> wrote: >>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is >>> a good transformation, but why do we need to handle as special >>> the case where the subreg is itself the operand of a plus or minus? >>> I think it should happen regardless of where the subreg occurs. >> >> Don't we need to restrict this to the low part though? > > I have tried this approach with attached patch. Unfortunately, > although it survived bootstrap without libjava on x86_64, it failed > building libjava with: > > /home/uros/gcc-svn/trunk/libjava/classpath/javax/swing/plaf/basic/BasicSliderUI.java:1299:0: > error: insn does not satisfy its constraints: > } > ^ > (insn 237 398 399 7 (set (reg:SI 1 dx [125]) > (plus:SI (subreg:SI (mult:DI (reg:DI 1 dx [orig:72 D.78627 ] [72]) > (const_int 2 [0x2])) 0) > (reg:SI 5 di))) > /home/uros/gcc-svn/trunk/libjava/classpath/javax/swing/plaf/basic/BasicSliderUI.java:1271 > 240 {*leasi} > (expr_list:REG_DEAD (reg:DI 5 di) > (nil))) > > Original RTX was (subreg:SI (plus:DI (mult:DI (...) reg:DI))), which > is valid RTX pattern for lea insn, the above is not. > > Due to these problems, I think the safer approach is to limit the > transformation to (plus:SI (subreg:SI (plus:DI (...) 0)) RTXes, as was > the case with original patch. This approach would fix a specific > problem where simplify_plus_minus is not able to simplify the combined > RTX at combine time. Please note, that combined RTXes are always > checked for correctness at combine pass. I think instead the (subreg (plus ...)) handling should be applied to (subreg (mult ...)) too. IMO the correct form of the above address ought to be: (set (reg:SI 1 dx [125]) (plus:SI (mult:SI (reg:SI 1 dx [orig:72 D.78627 ] [72]) (const_int 2 [0x2])) (reg:SI 5 di)) Richard ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant 2012-09-27 16:10 ` Richard Sandiford @ 2012-09-27 18:20 ` Uros Bizjak 2012-09-27 18:35 ` Paul_Koning 2012-09-27 20:33 ` [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant Jakub Jelinek 0 siblings, 2 replies; 23+ messages in thread From: Uros Bizjak @ 2012-09-27 18:20 UTC (permalink / raw) To: gcc-patches; +Cc: Richard Sandiford, Eric Botcazou [-- Attachment #1: Type: text/plain, Size: 2736 bytes --] On Thu, Sep 27, 2012 at 4:25 PM, Richard Sandiford <rdsandiford@googlemail.com> wrote: >>>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is >>>> a good transformation, but why do we need to handle as special >>>> the case where the subreg is itself the operand of a plus or minus? >>>> I think it should happen regardless of where the subreg occurs. >>> >>> Don't we need to restrict this to the low part though? >> >> I have tried this approach with attached patch. Unfortunately, >> although it survived bootstrap without libjava on x86_64, it failed >> building libjava with: >> >> /home/uros/gcc-svn/trunk/libjava/classpath/javax/swing/plaf/basic/BasicSliderUI.java:1299:0: >> error: insn does not satisfy its constraints: >> } >> ^ >> (insn 237 398 399 7 (set (reg:SI 1 dx [125]) >> (plus:SI (subreg:SI (mult:DI (reg:DI 1 dx [orig:72 D.78627 ] [72]) >> (const_int 2 [0x2])) 0) >> (reg:SI 5 di))) >> /home/uros/gcc-svn/trunk/libjava/classpath/javax/swing/plaf/basic/BasicSliderUI.java:1271 >> 240 {*leasi} >> (expr_list:REG_DEAD (reg:DI 5 di) >> (nil))) >> >> Original RTX was (subreg:SI (plus:DI (mult:DI (...) reg:DI))), which >> is valid RTX pattern for lea insn, the above is not. >> >> Due to these problems, I think the safer approach is to limit the >> transformation to (plus:SI (subreg:SI (plus:DI (...) 0)) RTXes, as was >> the case with original patch. This approach would fix a specific >> problem where simplify_plus_minus is not able to simplify the combined >> RTX at combine time. Please note, that combined RTXes are always >> checked for correctness at combine pass. > > I think instead the (subreg (plus ...)) handling should be applied > to (subreg (mult ...)) too. IMO the correct form of the above address > ought to be: > > (set (reg:SI 1 dx [125]) > (plus:SI (mult:SI (reg:SI 1 dx [orig:72 D.78627 ] [72]) > (const_int 2 [0x2])) > (reg:SI 5 di)) Great, this works as expected! After some off-line discussion with Richard, attached is v2 of the patch. 2012-09-27 Uros Bizjak <ubizjak@gmail.com> PR rtl-optimization/54457 * simplify-rtx.c (simplify_subreg): Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)). testsuite/ChangeLog: 2012-09-27 Uros Bizjak <ubizjak@gmail.com> PR rtl-optimization/54457 * gcc.target/i386/pr54457.c: New test. Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}. BTW: I propose that we start with limited selection of opcodes, so x32 autotester will pick and test the patch with SImode addresses. OK for mainline? Uros. [-- Attachment #2: p.diff.txt --] [-- Type: text/plain, Size: 1917 bytes --] Index: simplify-rtx.c =================================================================== --- simplify-rtx.c (revision 191808) +++ simplify-rtx.c (working copy) @@ -5689,6 +5689,28 @@ simplify_subreg (enum machine_mode outermode, rtx return CONST0_RTX (outermode); } + /* Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) + to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)), where + the outer subreg is effectively a truncation to the original mode. */ + if ((GET_CODE (op) == PLUS + || GET_CODE (op) == MINUS + || GET_CODE (op) == MULT) + && SCALAR_INT_MODE_P (outermode) + && SCALAR_INT_MODE_P (innermode) + && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode) + && byte == subreg_lowpart_offset (outermode, innermode)) + { + rtx op0 = simplify_gen_subreg (outermode, XEXP (op, 0), + innermode, byte); + if (op0) + { + rtx op1 = simplify_gen_subreg (outermode, XEXP (op, 1), + innermode, byte); + if (op1) + return simplify_gen_binary (GET_CODE (op), outermode, op0, op1); + } + } + /* Simplify (subreg:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C), 0) into to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and the outer subreg is effectively a truncation to the original mode. */ Index: testsuite/gcc.target/i386/pr54457.c =================================================================== --- testsuite/gcc.target/i386/pr54457.c (revision 0) +++ testsuite/gcc.target/i386/pr54457.c (working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile { target { ! { ia32 } } } } */ +/* { dg-options "-O2 -mx32 -maddress-mode=short" } */ + +extern char array[40]; + +char foo (long long position) +{ + return array[position + 1]; +} + +/* { dg-final { scan-assembler-not "add\[lq\]?\[^\n\]*1" } } */ ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant 2012-09-27 18:20 ` [PATCH v2, " Uros Bizjak @ 2012-09-27 18:35 ` Paul_Koning 2012-09-27 19:21 ` Uros Bizjak 2012-09-27 20:33 ` [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant Jakub Jelinek 1 sibling, 1 reply; 23+ messages in thread From: Paul_Koning @ 2012-09-27 18:35 UTC (permalink / raw) To: ubizjak; +Cc: gcc-patches, rdsandiford, ebotcazou On Sep 27, 2012, at 2:04 PM, Uros Bizjak wrote: > > > >>>>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is >>>>> a good transformation, but why do we need to handle as special >>>>> the case where the subreg is itself the operand of a plus or minus? >>>>> I think it should happen regardless of where the subreg occurs. >>>> >>>> Don't we need to restrict this to the low part though? >>> >>> ... > > After some off-line discussion with Richard, attached is v2 of the patch. > > 2012-09-27 Uros Bizjak <ubizjak@gmail.com> > > PR rtl-optimization/54457 > * simplify-rtx.c (simplify_subreg): > Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) > to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)). > ... Is it just specific to DI -> SI, or is it for any large mode -> smaller mode, like SI -> HI? paul ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant 2012-09-27 18:35 ` Paul_Koning @ 2012-09-27 19:21 ` Uros Bizjak 2012-10-02 2:13 ` Andrew Pinski 0 siblings, 1 reply; 23+ messages in thread From: Uros Bizjak @ 2012-09-27 19:21 UTC (permalink / raw) To: Paul_Koning; +Cc: gcc-patches, rdsandiford, ebotcazou On Thu, Sep 27, 2012 at 8:08 PM, <Paul_Koning@dell.com> wrote: >>>>>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is >>>>>> a good transformation, but why do we need to handle as special >>>>>> the case where the subreg is itself the operand of a plus or minus? >>>>>> I think it should happen regardless of where the subreg occurs. >>>>> >>>>> Don't we need to restrict this to the low part though? >>>> >>>> ... >> >> After some off-line discussion with Richard, attached is v2 of the patch. >> >> 2012-09-27 Uros Bizjak <ubizjak@gmail.com> >> >> PR rtl-optimization/54457 >> * simplify-rtx.c (simplify_subreg): >> Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) >> to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)). >> ... > > Is it just specific to DI -> SI, or is it for any large mode -> smaller mode, like SI -> HI? Oh, I just copied v1 ChangeLog. The patch converts all modes where size of mode M < size of mode N. Updated ChangeLog reads: 2012-09-27 Uros Bizjak <ubizjak@gmail.com> PR rtl-optimization/54457 * simplify-rtx.c (simplify_subreg): Simplify (subreg:M (op:N ((x:N) (y:N)), 0) to (op:M (subreg:M (x:N) 0) (subreg:M (x:N) 0)), where the outer subreg is effectively a truncation to the original mode M. testsuite/ChangeLog: 2012-09-27 Uros Bizjak <ubizjak@gmail.com> PR rtl-optimization/54457 * gcc.target/i386/pr54457.c: New test. Uros. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant 2012-09-27 19:21 ` Uros Bizjak @ 2012-10-02 2:13 ` Andrew Pinski 2012-10-02 19:32 ` Richard Sandiford 0 siblings, 1 reply; 23+ messages in thread From: Andrew Pinski @ 2012-10-02 2:13 UTC (permalink / raw) To: Uros Bizjak; +Cc: Paul_Koning, gcc-patches, rdsandiford, ebotcazou On Thu, Sep 27, 2012 at 11:13 AM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Thu, Sep 27, 2012 at 8:08 PM, <Paul_Koning@dell.com> wrote: > >>>>>>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is >>>>>>> a good transformation, but why do we need to handle as special >>>>>>> the case where the subreg is itself the operand of a plus or minus? >>>>>>> I think it should happen regardless of where the subreg occurs. >>>>>> >>>>>> Don't we need to restrict this to the low part though? >>>>> >>>>> ... >>> >>> After some off-line discussion with Richard, attached is v2 of the patch. >>> >>> 2012-09-27 Uros Bizjak <ubizjak@gmail.com> >>> >>> PR rtl-optimization/54457 >>> * simplify-rtx.c (simplify_subreg): >>> Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) >>> to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)). >>> ... >> >> Is it just specific to DI -> SI, or is it for any large mode -> smaller mode, like SI -> HI? > > Oh, I just copied v1 ChangeLog. The patch converts all modes where > size of mode M < size of mode N. Updated ChangeLog reads: > > 2012-09-27 Uros Bizjak <ubizjak@gmail.com> > > PR rtl-optimization/54457 > * simplify-rtx.c (simplify_subreg): > Simplify (subreg:M (op:N ((x:N) (y:N)), 0) > to (op:M (subreg:M (x:N) 0) (subreg:M (x:N) 0)), where > the outer subreg is effectively a truncation to the original mode M. When I was doing something similar on our internal toolchain at Cavium. I found doing this caused a regression on MIPS64 n32 in gcc.c-torture/execute/20040709-1.c Where: (insn 15 14 16 2 (set (reg/v:DI 200 [ y ]) (reg:DI 2 $2)) t.c:16 301 {*movdi_64bit} (expr_list:REG_DEAD (reg:DI 2 $2) (nil))) (insn 16 15 17 2 (set (reg:DI 210) (zero_extract:DI (reg/v:DI 200 [ y ]) (const_int 29 [0x1d]) (const_int 0 [0]))) t.c:16 249 {extzvdi} (expr_list:REG_DEAD (reg/v:DI 200 [ y ]) (nil))) (insn 17 16 23 2 (set (reg:SI 211) (truncate:SI (reg:DI 210))) t.c:16 175 {truncdisi2} (expr_list:REG_DEAD (reg:DI 210) (nil))) Gets converted to: (insn 23 17 26 2 (set (reg/i:SI 2 $2) (and:SI (reg:SI 2 $2 [+4 ]) (const_int 536870911 [0x1fffffff]))) t.c:18 156 {*andsi3} (nil)) Which is considered an ext instruction And with the Octeon simulator which causes undefined arguments to 32bit word operations to come out as 0xDEADBEEF which showed the regression. I fixed it by changing it to produce TRUNCATE instead of the subreg. I did the simplification on ior/and rather than plus/minus/mult so the issue is only when expanding to this to and/ior. Thanks, Andrew Pinski > > testsuite/ChangeLog: > > 2012-09-27 Uros Bizjak <ubizjak@gmail.com> > > PR rtl-optimization/54457 > * gcc.target/i386/pr54457.c: New test. > > Uros. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant 2012-10-02 2:13 ` Andrew Pinski @ 2012-10-02 19:32 ` Richard Sandiford 2012-10-06 10:22 ` RFA: Simplifying truncation and integer lowpart subregs Richard Sandiford 0 siblings, 1 reply; 23+ messages in thread From: Richard Sandiford @ 2012-10-02 19:32 UTC (permalink / raw) To: Andrew Pinski; +Cc: Uros Bizjak, Paul_Koning, gcc-patches, ebotcazou Andrew Pinski <andrew.pinski@caviumnetworks.com> writes: > On Thu, Sep 27, 2012 at 11:13 AM, Uros Bizjak <ubizjak@gmail.com> wrote: >> 2012-09-27 Uros Bizjak <ubizjak@gmail.com> >> >> PR rtl-optimization/54457 >> * simplify-rtx.c (simplify_subreg): >> Simplify (subreg:M (op:N ((x:N) (y:N)), 0) >> to (op:M (subreg:M (x:N) 0) (subreg:M (x:N) 0)), where >> the outer subreg is effectively a truncation to the original mode M. > > > When I was doing something similar on our internal toolchain at > Cavium. I found doing this caused a regression on MIPS64 n32 in > gcc.c-torture/execute/20040709-1.c Where: > > > (insn 15 14 16 2 (set (reg/v:DI 200 [ y ]) > (reg:DI 2 $2)) t.c:16 301 {*movdi_64bit} > (expr_list:REG_DEAD (reg:DI 2 $2) > (nil))) > > (insn 16 15 17 2 (set (reg:DI 210) > (zero_extract:DI (reg/v:DI 200 [ y ]) > (const_int 29 [0x1d]) > (const_int 0 [0]))) t.c:16 249 {extzvdi} > (expr_list:REG_DEAD (reg/v:DI 200 [ y ]) > (nil))) > > (insn 17 16 23 2 (set (reg:SI 211) > (truncate:SI (reg:DI 210))) t.c:16 175 {truncdisi2} > (expr_list:REG_DEAD (reg:DI 210) > (nil))) > > Gets converted to: > (insn 23 17 26 2 (set (reg/i:SI 2 $2) > (and:SI (reg:SI 2 $2 [+4 ]) > (const_int 536870911 [0x1fffffff]))) t.c:18 156 {*andsi3} > (nil)) > > Which is considered an ext instruction > > And with the Octeon simulator which causes undefined arguments to > 32bit word operations to come out as 0xDEADBEEF which showed the > regression. I fixed it by changing it to produce TRUNCATE instead of > the subreg. > > I did the simplification on ior/and rather than plus/minus/mult so the > issue is only when expanding to this to and/ior. Hmm, hadn't thought of that. I think some of the existing subreg optimisations suffer the same problem. I.e. we can't assume that subreg truncations of nested operands are OK just because the outer subreg is OK. I've got a patch I'm testing. BTW, I haven't forgotten about your other ext patch. Was hoping to see whether we could finally take the opportunity to parameterise the ext* patterns by mode, but got distracted with other patches. Maybe I'll just have to admit I won't get time to try it for 4.8... Richard ^ permalink raw reply [flat|nested] 23+ messages in thread
* RFA: Simplifying truncation and integer lowpart subregs 2012-10-02 19:32 ` Richard Sandiford @ 2012-10-06 10:22 ` Richard Sandiford 2012-10-06 11:13 ` Eric Botcazou 0 siblings, 1 reply; 23+ messages in thread From: Richard Sandiford @ 2012-10-06 10:22 UTC (permalink / raw) To: Andrew Pinski Cc: Uros Bizjak, Paul_Koning, gcc-patches, ebotcazou, kkojima, aoliva, dje.gcc, uweigand, walt [cc:ing sh, spu and tilegx maintainers] Richard Sandiford <rdsandiford@googlemail.com> writes: > Andrew Pinski <andrew.pinski@caviumnetworks.com> writes: >> On Thu, Sep 27, 2012 at 11:13 AM, Uros Bizjak <ubizjak@gmail.com> wrote: >>> 2012-09-27 Uros Bizjak <ubizjak@gmail.com> >>> >>> PR rtl-optimization/54457 >>> * simplify-rtx.c (simplify_subreg): >>> Simplify (subreg:M (op:N ((x:N) (y:N)), 0) >>> to (op:M (subreg:M (x:N) 0) (subreg:M (x:N) 0)), where >>> the outer subreg is effectively a truncation to the original mode M. >> >> >> When I was doing something similar on our internal toolchain at >> Cavium. I found doing this caused a regression on MIPS64 n32 in >> gcc.c-torture/execute/20040709-1.c Where: >> >> >> (insn 15 14 16 2 (set (reg/v:DI 200 [ y ]) >> (reg:DI 2 $2)) t.c:16 301 {*movdi_64bit} >> (expr_list:REG_DEAD (reg:DI 2 $2) >> (nil))) >> >> (insn 16 15 17 2 (set (reg:DI 210) >> (zero_extract:DI (reg/v:DI 200 [ y ]) >> (const_int 29 [0x1d]) >> (const_int 0 [0]))) t.c:16 249 {extzvdi} >> (expr_list:REG_DEAD (reg/v:DI 200 [ y ]) >> (nil))) >> >> (insn 17 16 23 2 (set (reg:SI 211) >> (truncate:SI (reg:DI 210))) t.c:16 175 {truncdisi2} >> (expr_list:REG_DEAD (reg:DI 210) >> (nil))) >> >> Gets converted to: >> (insn 23 17 26 2 (set (reg/i:SI 2 $2) >> (and:SI (reg:SI 2 $2 [+4 ]) >> (const_int 536870911 [0x1fffffff]))) t.c:18 156 {*andsi3} >> (nil)) >> >> Which is considered an ext instruction >> >> And with the Octeon simulator which causes undefined arguments to >> 32bit word operations to come out as 0xDEADBEEF which showed the >> regression. I fixed it by changing it to produce TRUNCATE instead of >> the subreg. >> >> I did the simplification on ior/and rather than plus/minus/mult so the >> issue is only when expanding to this to and/ior. > > Hmm, hadn't thought of that. I think some of the existing subreg > optimisations suffer the same problem. I.e. we can't assume that > subreg truncations of nested operands are OK just because the outer > subreg is OK. > > I've got a patch I'm testing. The idea is to split most of the lowpart subreg handling out of simplify_subreg and apply it to TRUNCATE too. There are three reasons: - I wanted to make the !TRULY_NOOP_TRUNCATION truncation simplifications as similar to subreg truncation simplifications as possible. - Some of the current lowpart subreg simplifications are also correct for vector truncations. - Ideally, using simplify_gen_unary (TRUNCATE, ...) instead of simplify_gen_subreg shouldn't penalise TRULY_NOOP_TRUNCATION targets. There is already code to use gen_lowpart_no_emit for truncations that reduce to subregs, but as things stand, gen_lowpart_no_emit only passes objects like SUBREG, REG, MEM, etc., to simplify_gen_subreg; others go through gen_lowpart_SUBREG and get no recursive simplification. We inherited this code from a 1996 patch (r13058): if ((TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op)) ? (num_sign_bit_copies (op, GET_MODE (op)) > (unsigned int) (GET_MODE_PRECISION (GET_MODE (op)) - GET_MODE_PRECISION (mode))) ... return rtl_hooks.gen_lowpart_no_emit (mode, op); I don't see any reason for the sign-bit check. If truncations are noops, we should be able to use a subreg regardless. Other than removing that check, the patch just moves simplifications around. I've not tried to match new patterns. The other !TRULY_NOOP_TRUNCATION targets are sh64, spu and tilegx. I don't think sh64 has any patterns that would be adversely affected, although the patch ought to make these patterns redundant: (define_insn_and_split "*logical_sidisi3" [(set (match_operand:SI 0 "arith_reg_dest" "=r,r") (truncate:SI (sign_extend:DI (match_operator:SI 3 "logical_operator" [(match_operand:SI 1 "arith_reg_operand" "%r,r") (match_operand:SI 2 "logical_operand" "r,I10")]))))] "TARGET_SHMEDIA" "#" "&& 1" [(set (match_dup 0) (match_dup 3))]) (define_insn_and_split "*logical_sidi3_2" [(set (match_operand:DI 0 "arith_reg_dest" "=r,r") (sign_extend:DI (truncate:SI (sign_extend:DI (match_operator:SI 3 "logical_operator" [(match_operand:SI 1 "arith_reg_operand" "%r,r") (match_operand:SI 2 "logical_operand" "r,I10")])))))] "TARGET_SHMEDIA" "#" "&& 1" [(set (match_dup 0) (sign_extend:DI (match_dup 3)))]) combine should now simplify the first to the normal SI logical op and the second to *logical_sidisi3. I don't think any spu or tilegx patterns are affected either way. Tested on x86_64-linux-gnu, mipsisa32-elf and mipsisa64-elf. Also tested by making sure that there were no code differences for a set of gcc .ii files on gcc20 (-O2 -march=native). OK to install? Richard gcc/ * machmode.h (GET_MODE_UNIT_PRECISION): New macro. * simplify-rtx.c (simplify_truncation): New function. (simplify_unary_operation_1): Use it. Remove sign bit test for !TRULY_NOOP_TRUNCATION_MODES_P. (simplify_subreg): Use simplify_int_lowpart for TRUNCATE. * config/mips/mips.c (mips_truncated_op_cost): New function. (mips_rtx_costs): Adjust test for BADDU. * config/mips/mips.md (*baddu_di<mode>): Push truncates to operands. Index: gcc/machmode.h =================================================================== --- gcc/machmode.h 2012-06-23 08:30:36.000000000 +0100 +++ gcc/machmode.h 2012-10-06 10:03:47.146873855 +0100 @@ -217,6 +217,11 @@ #define GET_MODE_UNIT_SIZE(MODE) \ #define GET_MODE_UNIT_BITSIZE(MODE) \ ((unsigned short) (GET_MODE_UNIT_SIZE (MODE) * BITS_PER_UNIT)) +#define GET_MODE_UNIT_PRECISION(MODE) \ + (GET_MODE_INNER (MODE) == VOIDmode \ + ? GET_MODE_PRECISION (MODE) \ + : GET_MODE_PRECISION (GET_MODE_INNER (MODE))) + /* Get the number of units in the object. */ extern const unsigned char mode_nunits[NUM_MACHINE_MODES]; Index: gcc/simplify-rtx.c =================================================================== --- gcc/simplify-rtx.c 2012-10-02 20:34:15.969129966 +0100 +++ gcc/simplify-rtx.c 2012-10-06 10:08:08.349859303 +0100 @@ -564,6 +564,167 @@ simplify_replace_rtx (rtx x, const_rtx o return simplify_replace_fn_rtx (x, old_rtx, 0, new_rtx); } \f +/* Try to simplify a MODE truncation of OP, which has OP_MODE. */ + +static rtx +simplify_truncation (enum machine_mode mode, rtx op, + enum machine_mode op_mode) +{ + unsigned int precision = GET_MODE_UNIT_PRECISION (mode); + unsigned int op_precision = GET_MODE_UNIT_PRECISION (op_mode); + gcc_assert (precision <= op_precision); + + /* Optimize truncations of zero and sign extended values. */ + if (GET_CODE (op) == ZERO_EXTEND + || GET_CODE (op) == SIGN_EXTEND) + { + /* There are three possibilities. If MODE is the same as the + origmode, we can omit both the extension and the subreg. + If MODE is not larger than the origmode, we can apply the + truncation without the extension. Finally, if the outermode + is larger than the origmode, we can just extend to the appropriate + mode. */ + enum machine_mode origmode = GET_MODE (XEXP (op, 0)); + if (mode == origmode) + return XEXP (op, 0); + else if (precision <= GET_MODE_UNIT_PRECISION (origmode)) + return simplify_gen_unary (TRUNCATE, mode, + XEXP (op, 0), origmode); + else + return simplify_gen_unary (GET_CODE (op), mode, + XEXP (op, 0), origmode); + } + + /* Simplify (truncate:SI (op:DI (x:DI) (y:DI))) + to (op:SI (truncate:SI (x:DI)) (truncate:SI (x:DI))). */ + if (GET_CODE (op) == PLUS + || GET_CODE (op) == MINUS + || GET_CODE (op) == MULT) + { + rtx op0 = simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0), op_mode); + if (op0) + { + rtx op1 = simplify_gen_unary (TRUNCATE, mode, XEXP (op, 1), op_mode); + if (op1) + return simplify_gen_binary (GET_CODE (op), mode, op0, op1); + } + } + + /* Simplify (truncate:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C)) into + to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and + the outer subreg is effectively a truncation to the original mode. */ + if ((GET_CODE (op) == LSHIFTRT + || GET_CODE (op) == ASHIFTRT) + /* Ensure that OP_MODE is at least twice as wide as MODE + to avoid the possibility that an outer LSHIFTRT shifts by more + than the sign extension's sign_bit_copies and introduces zeros + into the high bits of the result. */ + && 2 * precision <= op_precision + && CONST_INT_P (XEXP (op, 1)) + && GET_CODE (XEXP (op, 0)) == SIGN_EXTEND + && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode + && INTVAL (XEXP (op, 1)) < precision) + return simplify_gen_binary (ASHIFTRT, mode, + XEXP (XEXP (op, 0), 0), XEXP (op, 1)); + + /* Likewise (truncate:QI (lshiftrt:SI (zero_extend:SI (x:QI)) C)) into + to (lshiftrt:QI (x:QI) C), where C is a suitable small constant and + the outer subreg is effectively a truncation to the original mode. */ + if ((GET_CODE (op) == LSHIFTRT + || GET_CODE (op) == ASHIFTRT) + && CONST_INT_P (XEXP (op, 1)) + && GET_CODE (XEXP (op, 0)) == ZERO_EXTEND + && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode + && INTVAL (XEXP (op, 1)) < precision) + return simplify_gen_binary (LSHIFTRT, mode, + XEXP (XEXP (op, 0), 0), XEXP (op, 1)); + + /* Likewise (truncate:QI (ashift:SI (zero_extend:SI (x:QI)) C)) into + to (ashift:QI (x:QI) C), where C is a suitable small constant and + the outer subreg is effectively a truncation to the original mode. */ + if (GET_CODE (op) == ASHIFT + && CONST_INT_P (XEXP (op, 1)) + && (GET_CODE (XEXP (op, 0)) == ZERO_EXTEND + || GET_CODE (XEXP (op, 0)) == SIGN_EXTEND) + && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode + && INTVAL (XEXP (op, 1)) < precision) + return simplify_gen_binary (ASHIFT, mode, + XEXP (XEXP (op, 0), 0), XEXP (op, 1)); + + /* Recognize a word extraction from a multi-word subreg. */ + if ((GET_CODE (op) == LSHIFTRT + || GET_CODE (op) == ASHIFTRT) + && SCALAR_INT_MODE_P (mode) + && SCALAR_INT_MODE_P (op_mode) + && precision >= BITS_PER_WORD + && 2 * precision <= op_precision + && CONST_INT_P (XEXP (op, 1)) + && (INTVAL (XEXP (op, 1)) & (precision - 1)) == 0 + && INTVAL (XEXP (op, 1)) >= 0 + && INTVAL (XEXP (op, 1)) < op_precision) + { + int byte = subreg_lowpart_offset (mode, op_mode); + int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT; + return simplify_gen_subreg (mode, XEXP (op, 0), op_mode, + (WORDS_BIG_ENDIAN + ? byte - shifted_bytes + : byte + shifted_bytes)); + } + + /* If we have a TRUNCATE of a right shift of MEM, make a new MEM + and try replacing the TRUNCATE and shift with it. Don't do this + if the MEM has a mode-dependent address. */ + if ((GET_CODE (op) == LSHIFTRT + || GET_CODE (op) == ASHIFTRT) + && SCALAR_INT_MODE_P (op_mode) + && MEM_P (XEXP (op, 0)) + && CONST_INT_P (XEXP (op, 1)) + && (INTVAL (XEXP (op, 1)) % GET_MODE_BITSIZE (mode)) == 0 + && INTVAL (XEXP (op, 1)) > 0 + && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (op_mode) + && ! mode_dependent_address_p (XEXP (XEXP (op, 0), 0), + MEM_ADDR_SPACE (XEXP (op, 0))) + && ! MEM_VOLATILE_P (XEXP (op, 0)) + && (GET_MODE_SIZE (mode) >= UNITS_PER_WORD + || WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN)) + { + int byte = subreg_lowpart_offset (mode, op_mode); + int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT; + return adjust_address_nv (XEXP (op, 0), mode, + (WORDS_BIG_ENDIAN + ? byte - shifted_bytes + : byte + shifted_bytes)); + } + + /* (truncate:SI (OP:DI ({sign,zero}_extend:DI foo:SI))) is + (OP:SI foo:SI) if OP is NEG or ABS. */ + if ((GET_CODE (op) == ABS + || GET_CODE (op) == NEG) + && (GET_CODE (XEXP (op, 0)) == SIGN_EXTEND + || GET_CODE (XEXP (op, 0)) == ZERO_EXTEND) + && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode) + return simplify_gen_unary (GET_CODE (op), mode, + XEXP (XEXP (op, 0), 0), mode); + + /* (truncate:A (subreg:B (truncate:C X) 0)) is + (truncate:A X). */ + if (GET_CODE (op) == SUBREG + && SCALAR_INT_MODE_P (mode) + && SCALAR_INT_MODE_P (op_mode) + && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op))) + && GET_CODE (SUBREG_REG (op)) == TRUNCATE + && subreg_lowpart_p (op)) + return simplify_gen_unary (TRUNCATE, mode, XEXP (SUBREG_REG (op), 0), + GET_MODE (XEXP (SUBREG_REG (op), 0))); + + /* (truncate:A (truncate:B X)) is (truncate:A X). */ + if (GET_CODE (op) == TRUNCATE) + return simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0), + GET_MODE (XEXP (op, 0))); + + return NULL_RTX; +} +\f /* Try to simplify a unary operation CODE whose output mode is to be MODE with input operand OP whose mode was originally OP_MODE. Return zero if no simplification can be made. */ @@ -689,12 +850,6 @@ simplify_unary_operation_1 (enum rtx_cod op_mode = mode; in2 = simplify_gen_unary (NOT, op_mode, in2, op_mode); - if (GET_CODE (in2) == NOT && GET_CODE (in1) != NOT) - { - rtx tem = in2; - in2 = in1; in1 = tem; - } - return gen_rtx_fmt_ee (GET_CODE (op) == IOR ? AND : IOR, mode, in1, in2); } @@ -821,44 +976,24 @@ simplify_unary_operation_1 (enum rtx_cod if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT) break; - /* (truncate:SI ({sign,zero}_extend:DI foo:SI)) == foo:SI. */ - if ((GET_CODE (op) == SIGN_EXTEND - || GET_CODE (op) == ZERO_EXTEND) - && GET_MODE (XEXP (op, 0)) == mode) - return XEXP (op, 0); - - /* (truncate:SI (OP:DI ({sign,zero}_extend:DI foo:SI))) is - (OP:SI foo:SI) if OP is NEG or ABS. */ - if ((GET_CODE (op) == ABS - || GET_CODE (op) == NEG) - && (GET_CODE (XEXP (op, 0)) == SIGN_EXTEND - || GET_CODE (XEXP (op, 0)) == ZERO_EXTEND) - && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode) - return simplify_gen_unary (GET_CODE (op), mode, - XEXP (XEXP (op, 0), 0), mode); + /* Don't optimize (lshiftrt (mult ...)) as it would interfere + with the umulXi3_highpart patterns. */ + if (GET_CODE (op) == LSHIFTRT + && GET_CODE (XEXP (op, 0)) == MULT) + break; - /* (truncate:A (subreg:B (truncate:C X) 0)) is - (truncate:A X). */ - if (GET_CODE (op) == SUBREG - && GET_CODE (SUBREG_REG (op)) == TRUNCATE - && subreg_lowpart_p (op)) - return simplify_gen_unary (TRUNCATE, mode, XEXP (SUBREG_REG (op), 0), - GET_MODE (XEXP (SUBREG_REG (op), 0))); + if (GET_MODE (op) != VOIDmode) + { + temp = simplify_truncation (mode, op, GET_MODE (op)); + if (temp) + return temp; + } /* If we know that the value is already truncated, we can - replace the TRUNCATE with a SUBREG. Note that this is also - valid if TRULY_NOOP_TRUNCATION is false for the corresponding - modes we just have to apply a different definition for - truncation. But don't do this for an (LSHIFTRT (MULT ...)) - since this will cause problems with the umulXi3_highpart - patterns. */ - if ((TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op)) - ? (num_sign_bit_copies (op, GET_MODE (op)) - > (unsigned int) (GET_MODE_PRECISION (GET_MODE (op)) - - GET_MODE_PRECISION (mode))) - : truncated_to_mode (mode, op)) - && ! (GET_CODE (op) == LSHIFTRT - && GET_CODE (XEXP (op, 0)) == MULT)) + replace the TRUNCATE with a SUBREG. */ + if (GET_MODE_NUNITS (mode) == 1 + && (TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op)) + || truncated_to_mode (mode, op))) return rtl_hooks.gen_lowpart_no_emit (mode, op); /* A truncate of a comparison can be replaced with a subreg if @@ -5595,14 +5730,6 @@ simplify_subreg (enum machine_mode outer return NULL_RTX; } - /* Merge implicit and explicit truncations. */ - - if (GET_CODE (op) == TRUNCATE - && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode) - && subreg_lowpart_offset (outermode, innermode) == byte) - return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0), - GET_MODE (XEXP (op, 0))); - /* SUBREG of a hard register => just change the register number and/or mode. If the hard register is not valid in that mode, suppress this simplification. If the hard register is the stack, @@ -5688,160 +5815,23 @@ simplify_subreg (enum machine_mode outer return NULL_RTX; } - /* Optimize SUBREG truncations of zero and sign extended values. */ - if ((GET_CODE (op) == ZERO_EXTEND - || GET_CODE (op) == SIGN_EXTEND) - && SCALAR_INT_MODE_P (innermode) - && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)) + /* A SUBREG resulting from a zero extension may fold to zero if + it extracts higher bits that the ZERO_EXTEND's source bits. */ + if (GET_CODE (op) == ZERO_EXTEND) { unsigned int bitpos = subreg_lsb_1 (outermode, innermode, byte); - - /* If we're requesting the lowpart of a zero or sign extension, - there are three possibilities. If the outermode is the same - as the origmode, we can omit both the extension and the subreg. - If the outermode is not larger than the origmode, we can apply - the truncation without the extension. Finally, if the outermode - is larger than the origmode, but both are integer modes, we - can just extend to the appropriate mode. */ - if (bitpos == 0) - { - enum machine_mode origmode = GET_MODE (XEXP (op, 0)); - if (outermode == origmode) - return XEXP (op, 0); - if (GET_MODE_PRECISION (outermode) <= GET_MODE_PRECISION (origmode)) - return simplify_gen_subreg (outermode, XEXP (op, 0), origmode, - subreg_lowpart_offset (outermode, - origmode)); - if (SCALAR_INT_MODE_P (outermode)) - return simplify_gen_unary (GET_CODE (op), outermode, - XEXP (op, 0), origmode); - } - - /* A SUBREG resulting from a zero extension may fold to zero if - it extracts higher bits that the ZERO_EXTEND's source bits. */ - if (GET_CODE (op) == ZERO_EXTEND - && bitpos >= GET_MODE_PRECISION (GET_MODE (XEXP (op, 0)))) + if (bitpos >= GET_MODE_PRECISION (GET_MODE (XEXP (op, 0)))) return CONST0_RTX (outermode); } - /* Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) - to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)), where - the outer subreg is effectively a truncation to the original mode. */ - if ((GET_CODE (op) == PLUS - || GET_CODE (op) == MINUS - || GET_CODE (op) == MULT) - && SCALAR_INT_MODE_P (outermode) + if (SCALAR_INT_MODE_P (outermode) && SCALAR_INT_MODE_P (innermode) && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode) && byte == subreg_lowpart_offset (outermode, innermode)) { - rtx op0 = simplify_gen_subreg (outermode, XEXP (op, 0), - innermode, byte); - if (op0) - { - rtx op1 = simplify_gen_subreg (outermode, XEXP (op, 1), - innermode, byte); - if (op1) - return simplify_gen_binary (GET_CODE (op), outermode, op0, op1); - } - } - - /* Simplify (subreg:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C), 0) into - to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and - the outer subreg is effectively a truncation to the original mode. */ - if ((GET_CODE (op) == LSHIFTRT - || GET_CODE (op) == ASHIFTRT) - && SCALAR_INT_MODE_P (outermode) - && SCALAR_INT_MODE_P (innermode) - /* Ensure that OUTERMODE is at least twice as wide as the INNERMODE - to avoid the possibility that an outer LSHIFTRT shifts by more - than the sign extension's sign_bit_copies and introduces zeros - into the high bits of the result. */ - && (2 * GET_MODE_PRECISION (outermode)) <= GET_MODE_PRECISION (innermode) - && CONST_INT_P (XEXP (op, 1)) - && GET_CODE (XEXP (op, 0)) == SIGN_EXTEND - && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode - && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode) - && subreg_lsb_1 (outermode, innermode, byte) == 0) - return simplify_gen_binary (ASHIFTRT, outermode, - XEXP (XEXP (op, 0), 0), XEXP (op, 1)); - - /* Likewise (subreg:QI (lshiftrt:SI (zero_extend:SI (x:QI)) C), 0) into - to (lshiftrt:QI (x:QI) C), where C is a suitable small constant and - the outer subreg is effectively a truncation to the original mode. */ - if ((GET_CODE (op) == LSHIFTRT - || GET_CODE (op) == ASHIFTRT) - && SCALAR_INT_MODE_P (outermode) - && SCALAR_INT_MODE_P (innermode) - && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode) - && CONST_INT_P (XEXP (op, 1)) - && GET_CODE (XEXP (op, 0)) == ZERO_EXTEND - && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode - && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode) - && subreg_lsb_1 (outermode, innermode, byte) == 0) - return simplify_gen_binary (LSHIFTRT, outermode, - XEXP (XEXP (op, 0), 0), XEXP (op, 1)); - - /* Likewise (subreg:QI (ashift:SI (zero_extend:SI (x:QI)) C), 0) into - to (ashift:QI (x:QI) C), where C is a suitable small constant and - the outer subreg is effectively a truncation to the original mode. */ - if (GET_CODE (op) == ASHIFT - && SCALAR_INT_MODE_P (outermode) - && SCALAR_INT_MODE_P (innermode) - && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode) - && CONST_INT_P (XEXP (op, 1)) - && (GET_CODE (XEXP (op, 0)) == ZERO_EXTEND - || GET_CODE (XEXP (op, 0)) == SIGN_EXTEND) - && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode - && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode) - && subreg_lsb_1 (outermode, innermode, byte) == 0) - return simplify_gen_binary (ASHIFT, outermode, - XEXP (XEXP (op, 0), 0), XEXP (op, 1)); - - /* Recognize a word extraction from a multi-word subreg. */ - if ((GET_CODE (op) == LSHIFTRT - || GET_CODE (op) == ASHIFTRT) - && SCALAR_INT_MODE_P (innermode) - && GET_MODE_PRECISION (outermode) >= BITS_PER_WORD - && GET_MODE_PRECISION (innermode) >= (2 * GET_MODE_PRECISION (outermode)) - && CONST_INT_P (XEXP (op, 1)) - && (INTVAL (XEXP (op, 1)) & (GET_MODE_PRECISION (outermode) - 1)) == 0 - && INTVAL (XEXP (op, 1)) >= 0 - && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (innermode) - && byte == subreg_lowpart_offset (outermode, innermode)) - { - int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT; - return simplify_gen_subreg (outermode, XEXP (op, 0), innermode, - (WORDS_BIG_ENDIAN - ? byte - shifted_bytes - : byte + shifted_bytes)); - } - - /* If we have a lowpart SUBREG of a right shift of MEM, make a new MEM - and try replacing the SUBREG and shift with it. Don't do this if - the MEM has a mode-dependent address or if we would be widening it. */ - - if ((GET_CODE (op) == LSHIFTRT - || GET_CODE (op) == ASHIFTRT) - && SCALAR_INT_MODE_P (innermode) - && MEM_P (XEXP (op, 0)) - && CONST_INT_P (XEXP (op, 1)) - && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (GET_MODE (op)) - && (INTVAL (XEXP (op, 1)) % GET_MODE_BITSIZE (outermode)) == 0 - && INTVAL (XEXP (op, 1)) > 0 - && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (innermode) - && ! mode_dependent_address_p (XEXP (XEXP (op, 0), 0), - MEM_ADDR_SPACE (XEXP (op, 0))) - && ! MEM_VOLATILE_P (XEXP (op, 0)) - && byte == subreg_lowpart_offset (outermode, innermode) - && (GET_MODE_SIZE (outermode) >= UNITS_PER_WORD - || WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN)) - { - int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT; - return adjust_address_nv (XEXP (op, 0), outermode, - (WORDS_BIG_ENDIAN - ? byte - shifted_bytes - : byte + shifted_bytes)); + rtx tem = simplify_truncation (outermode, op, innermode); + if (tem) + return tem; } return NULL_RTX; Index: gcc/config/mips/mips.c =================================================================== --- gcc/config/mips/mips.c 2012-10-02 21:02:21.000000000 +0100 +++ gcc/config/mips/mips.c 2012-10-06 10:06:42.617864078 +0100 @@ -3527,6 +3527,17 @@ mips_set_reg_reg_cost (enum machine_mode } } +/* Return the cost of an operand X that can be trucated for free. + SPEED says whether we're optimizing for size or speed. */ + +static int +mips_truncated_op_cost (rtx x, bool speed) +{ + if (GET_CODE (x) == TRUNCATE) + x = XEXP (x, 0); + return set_src_cost (x, speed); +} + /* Implement TARGET_RTX_COSTS. */ static bool @@ -3907,12 +3918,13 @@ mips_rtx_costs (rtx x, int code, int out case ZERO_EXTEND: if (outer_code == SET && ISA_HAS_BADDU - && (GET_CODE (XEXP (x, 0)) == TRUNCATE - || GET_CODE (XEXP (x, 0)) == SUBREG) && GET_MODE (XEXP (x, 0)) == QImode - && GET_CODE (XEXP (XEXP (x, 0), 0)) == PLUS) + && GET_CODE (XEXP (x, 0)) == PLUS) { - *total = set_src_cost (XEXP (XEXP (x, 0), 0), speed); + rtx plus = XEXP (x, 0); + *total = (COSTS_N_INSNS (1) + + mips_truncated_op_cost (XEXP (plus, 0), speed) + + mips_truncated_op_cost (XEXP (plus, 1), speed)); return true; } *total = mips_zero_extend_cost (mode, XEXP (x, 0)); Index: gcc/config/mips/mips.md =================================================================== --- gcc/config/mips/mips.md 2012-10-02 20:37:34.000000000 +0100 +++ gcc/config/mips/mips.md 2012-10-06 10:03:47.156873851 +0100 @@ -1305,9 +1305,8 @@ (define_insn "*baddu_si" (define_insn "*baddu_di<mode>" [(set (match_operand:GPR 0 "register_operand" "=d") (zero_extend:GPR - (truncate:QI - (plus:DI (match_operand:DI 1 "register_operand" "d") - (match_operand:DI 2 "register_operand" "d")))))] + (plus:QI (truncate:QI (match_operand:DI 1 "register_operand" "d")) + (truncate:QI (match_operand:DI 2 "register_operand" "d")))))] "ISA_HAS_BADDU && TARGET_64BIT" "baddu\\t%0,%1,%2" [(set_attr "alu_type" "add")]) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: RFA: Simplifying truncation and integer lowpart subregs 2012-10-06 10:22 ` RFA: Simplifying truncation and integer lowpart subregs Richard Sandiford @ 2012-10-06 11:13 ` Eric Botcazou 2012-10-06 12:39 ` Richard Sandiford 0 siblings, 1 reply; 23+ messages in thread From: Eric Botcazou @ 2012-10-06 11:13 UTC (permalink / raw) To: Richard Sandiford Cc: gcc-patches, Andrew Pinski, Uros Bizjak, Paul_Koning, kkojima, aoliva, dje.gcc, uweigand, walt > Tested on x86_64-linux-gnu, mipsisa32-elf and mipsisa64-elf. Also tested > by making sure that there were no code differences for a set of gcc .ii > files on gcc20 (-O2 -march=native). OK to install? Are you sure that generating TRUNCATEs out of nowhere in simplify_subreg is always correct? > gcc/ > * machmode.h (GET_MODE_UNIT_PRECISION): New macro. > * simplify-rtx.c (simplify_truncation): New function. You should say where it comes from. > (simplify_unary_operation_1): Use it. Remove sign bit test > for !TRULY_NOOP_TRUNCATION_MODES_P. (simplify_unary_operation_1) <TRUNCATE>: ... > (simplify_subreg): Use simplify_int_lowpart for TRUNCATE. This function doesn't exist. And this is misleading, it's not just for TRUNCATE, it's for a truncation to the lowpart. > /* Try to simplify a unary operation CODE whose output mode is to be > MODE with input operand OP whose mode was originally OP_MODE. > Return zero if no simplification can be made. */ > @@ -689,12 +850,6 @@ simplify_unary_operation_1 (enum rtx_cod > op_mode = mode; > in2 = simplify_gen_unary (NOT, op_mode, in2, op_mode); > > - if (GET_CODE (in2) == NOT && GET_CODE (in1) != NOT) > - { > - rtx tem = in2; > - in2 = in1; in1 = tem; > - } > - > return gen_rtx_fmt_ee (GET_CODE (op) == IOR ? AND : IOR, > mode, in1, in2); > } Why is this hunk here? > @@ -5595,14 +5730,6 @@ simplify_subreg (enum machine_mode outer > return NULL_RTX; > } > > - /* Merge implicit and explicit truncations. */ > - > - if (GET_CODE (op) == TRUNCATE > - && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode) > - && subreg_lowpart_offset (outermode, innermode) == byte) > - return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0), > - GET_MODE (XEXP (op, 0))); > - > /* SUBREG of a hard register => just change the register number > and/or mode. If the hard register is not valid in that mode, > suppress this simplification. If the hard register is the stack, Likewise. -- Eric Botcazou ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: RFA: Simplifying truncation and integer lowpart subregs 2012-10-06 11:13 ` Eric Botcazou @ 2012-10-06 12:39 ` Richard Sandiford 2012-10-06 13:05 ` Eric Botcazou 0 siblings, 1 reply; 23+ messages in thread From: Richard Sandiford @ 2012-10-06 12:39 UTC (permalink / raw) To: Eric Botcazou Cc: gcc-patches, Andrew Pinski, Uros Bizjak, Paul_Koning, kkojima, aoliva, dje.gcc, uweigand, walt Thanks for the review. Eric Botcazou <ebotcazou@adacore.com> writes: >> Tested on x86_64-linux-gnu, mipsisa32-elf and mipsisa64-elf. Also tested >> by making sure that there were no code differences for a set of gcc .ii >> files on gcc20 (-O2 -march=native). OK to install? > > Are you sure that generating TRUNCATEs out of nowhere in simplify_subreg is > always correct? We only use TRUNCATEs when truncating the operands of an existing operation. We don't convert the SUBREG itself to a TRUNCATE. I don't think it matters whether the narrowing of the operation was brought about by a TRUNCATE or by a SUBREG. I think modelling it as a TRUNCATE operation is correct for !TRULY_NOOP_TRUNCATION (it's the bug that Andrew pointed out). And we shouldn't generate an actual TRUNCATE rtx for TRULY_NOOP_TRUNCATION (the thing about making simplify_gen_unary (TRUNCATE, ...) no worse than simplify_gen_subreg for those targets). I suppose: /* We can't handle truncation to a partial integer mode here because we don't know the real bitsize of the partial integer mode. */ if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT) break; might be a problem though; we should still allow a subreg to be generated. Is that what you were thinking of, or something else? I'm certainly open to other ideas though. Or do you think that I was wrong about doing this narrowing in simplify_subreg to begin with? >> gcc/ >> * machmode.h (GET_MODE_UNIT_PRECISION): New macro. >> * simplify-rtx.c (simplify_truncation): New function. > > You should say where it comes from. OK. >> (simplify_unary_operation_1): Use it. Remove sign bit test >> for !TRULY_NOOP_TRUNCATION_MODES_P. > > (simplify_unary_operation_1) <TRUNCATE>: ... > >> (simplify_subreg): Use simplify_int_lowpart for TRUNCATE. > > This function doesn't exist. And this is misleading, it's not just for > TRUNCATE, it's for a truncation to the lowpart. Curses, forgot to update that part. Thanks. >> /* Try to simplify a unary operation CODE whose output mode is to be >> MODE with input operand OP whose mode was originally OP_MODE. >> Return zero if no simplification can be made. */ >> @@ -689,12 +850,6 @@ simplify_unary_operation_1 (enum rtx_cod >> op_mode = mode; >> in2 = simplify_gen_unary (NOT, op_mode, in2, op_mode); >> >> - if (GET_CODE (in2) == NOT && GET_CODE (in1) != NOT) >> - { >> - rtx tem = in2; >> - in2 = in1; in1 = tem; >> - } >> - >> return gen_rtx_fmt_ee (GET_CODE (op) == IOR ? AND : IOR, >> mode, in1, in2); >> } > > Why is this hunk here? Sorry, I've no idea :-( >> @@ -5595,14 +5730,6 @@ simplify_subreg (enum machine_mode outer >> return NULL_RTX; >> } >> >> - /* Merge implicit and explicit truncations. */ >> - >> - if (GET_CODE (op) == TRUNCATE >> - && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode) >> - && subreg_lowpart_offset (outermode, innermode) == byte) >> - return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0), >> - GET_MODE (XEXP (op, 0))); >> - >> /* SUBREG of a hard register => just change the register number >> and/or mode. If the hard register is not valid in that mode, >> suppress this simplification. If the hard register is the stack, > > Likewise. This moved to simplify_truncation: /* (truncate:A (truncate:B X)) is (truncate:A X). */ if (GET_CODE (op) == TRUNCATE) return simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0), GET_MODE (XEXP (op, 0))); I haven't included an updated patch because of your general concern above. Richard ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: RFA: Simplifying truncation and integer lowpart subregs 2012-10-06 12:39 ` Richard Sandiford @ 2012-10-06 13:05 ` Eric Botcazou 2012-10-07 8:56 ` Richard Sandiford 0 siblings, 1 reply; 23+ messages in thread From: Eric Botcazou @ 2012-10-06 13:05 UTC (permalink / raw) To: Richard Sandiford Cc: gcc-patches, Andrew Pinski, Uros Bizjak, Paul_Koning, kkojima, aoliva, dje.gcc, uweigand, walt > I think modelling it as a TRUNCATE operation is correct for > !TRULY_NOOP_TRUNCATION (it's the bug that Andrew pointed out). > And we shouldn't generate an actual TRUNCATE rtx for > TRULY_NOOP_TRUNCATION (the thing about making > simplify_gen_unary (TRUNCATE, ...) no worse than simplify_gen_subreg > for those targets). I suppose: > > /* We can't handle truncation to a partial integer mode here > because we don't know the real bitsize of the partial > integer mode. */ > if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT) > break; > > might be a problem though; we should still allow a subreg to be > generated. Is that what you were thinking of, or something else? I was thinking of the !TRULY_NOOP_TRUNCATION case, where the two operations aren't equivalent. Generating TRUNCATE in simplify_subreg seems suspicious to me in this case but, if not doing it is the source of the bug, I guess I need to do some homework on this TRULY_NOOP_TRUNCATION stuff. :-) Maybe add a blurb to the head comment of simplify_truncation, explaining that it is valid to call the function both for TRUNCATEs and truncations to the lowpart, and why it is correct to generate new TRUNCATEs in the latter case. -- Eric Botcazou ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: RFA: Simplifying truncation and integer lowpart subregs 2012-10-06 13:05 ` Eric Botcazou @ 2012-10-07 8:56 ` Richard Sandiford 2012-10-07 12:36 ` Eric Botcazou 2012-11-28 2:27 ` Ramana Radhakrishnan 0 siblings, 2 replies; 23+ messages in thread From: Richard Sandiford @ 2012-10-07 8:56 UTC (permalink / raw) To: Eric Botcazou Cc: gcc-patches, Andrew Pinski, Uros Bizjak, Paul_Koning, kkojima, aoliva, dje.gcc, uweigand, walt Eric Botcazou <ebotcazou@adacore.com> writes: >> I think modelling it as a TRUNCATE operation is correct for >> !TRULY_NOOP_TRUNCATION (it's the bug that Andrew pointed out). >> And we shouldn't generate an actual TRUNCATE rtx for >> TRULY_NOOP_TRUNCATION (the thing about making >> simplify_gen_unary (TRUNCATE, ...) no worse than simplify_gen_subreg >> for those targets). I suppose: >> >> /* We can't handle truncation to a partial integer mode here >> because we don't know the real bitsize of the partial >> integer mode. */ >> if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT) >> break; >> >> might be a problem though; we should still allow a subreg to be >> generated. Is that what you were thinking of, or something else? > > I was thinking of the !TRULY_NOOP_TRUNCATION case, where the two operations > aren't equivalent. Generating TRUNCATE in simplify_subreg seems suspicious to > me in this case but, if not doing it is the source of the bug, I guess I need > to do some homework on this TRULY_NOOP_TRUNCATION stuff. :-) > > Maybe add a blurb to the head comment of simplify_truncation, explaining that > it is valid to call the function both for TRUNCATEs and truncations to the > lowpart, and why it is correct to generate new TRUNCATEs in the latter case. Yeah, in hindsight, the patch was definitely lacking commentary. How about the patch below? It also fixes the partial int case and gets rid of the errant NOT hunk. Tested in the same way as before. Richard gcc/ * machmode.h (GET_MODE_UNIT_PRECISION): New macro. * simplify-rtx.c (simplify_truncation): New function, extracted from simplify_subreg and (in small part) from simplify_unary_operation_1. (simplify_unary_operation_1) <TRUNCATE>: Use it. Remove sign bit test for !TRULY_NOOP_TRUNCATION_MODES_P. (simplify_subreg): Use simplify_truncate for lowpart subregs where both the inner and outer modes are scalar integers. * config/mips/mips.c (mips_truncated_op_cost): New function. (mips_rtx_costs): Adjust test for BADDU. * config/mips/mips.md (*baddu_di<mode>): Push truncates to operands. Index: gcc/machmode.h =================================================================== --- gcc/machmode.h 2012-10-06 11:22:55.888609330 +0100 +++ gcc/machmode.h 2012-10-06 11:24:42.459603393 +0100 @@ -217,6 +217,11 @@ #define GET_MODE_UNIT_SIZE(MODE) \ #define GET_MODE_UNIT_BITSIZE(MODE) \ ((unsigned short) (GET_MODE_UNIT_SIZE (MODE) * BITS_PER_UNIT)) +#define GET_MODE_UNIT_PRECISION(MODE) \ + (GET_MODE_INNER (MODE) == VOIDmode \ + ? GET_MODE_PRECISION (MODE) \ + : GET_MODE_PRECISION (GET_MODE_INNER (MODE))) + /* Get the number of units in the object. */ extern const unsigned char mode_nunits[NUM_MACHINE_MODES]; Index: gcc/simplify-rtx.c =================================================================== --- gcc/simplify-rtx.c 2012-10-06 11:23:57.005605924 +0100 +++ gcc/simplify-rtx.c 2012-10-06 17:14:40.018658588 +0100 @@ -564,6 +564,212 @@ simplify_replace_rtx (rtx x, const_rtx o return simplify_replace_fn_rtx (x, old_rtx, 0, new_rtx); } \f +/* Try to simplify a MODE truncation of OP, which has OP_MODE. + Only handle cases where the truncated value is inherently an rvalue. + + RTL provides two ways of truncating a value: + + 1. a lowpart subreg. This form is only a truncation when both + the outer and inner modes (here MODE and OP_MODE respectively) + are scalar integers, and only then when the subreg is used as + an rvalue. + + It is only valid to form such truncating subregs if the + truncation requires no action by the target. The onus for + proving this is on the creator of the subreg -- e.g. the + caller to simplify_subreg or simplify_gen_subreg -- and typically + involves either TRULY_NOOP_TRUNCATION_MODES_P or truncated_to_mode. + + 2. a TRUNCATE. This form handles both scalar and compound integers. + + The first form is preferred where valid. However, the TRUNCATE + handling in simplify_unary_operation turns the second form into the + first form when TRULY_NOOP_TRUNCATION_MODES_P or truncated_to_mode allow, + so it is generally safe to form rvalue truncations using: + + simplify_gen_unary (TRUNCATE, ...) + + and leave simplify_unary_operation to work out which representation + should be used. + + Because of the proof requirements on (1), simplify_truncation must + also use simplify_gen_unary (TRUNCATE, ...) to truncate parts of OP, + regardless of whether the outer truncation came from a SUBREG or a + TRUNCATE. For example, if the caller has proven that an SImode + truncation of: + + (and:DI X Y) + + is a no-op and can be represented as a subreg, it does not follow + that SImode truncations of X and Y are also no-ops. On a target + like 64-bit MIPS that requires SImode values to be stored in + sign-extended form, an SImode truncation of: + + (and:DI (reg:DI X) (const_int 63)) + + is trivially a no-op because only the lower 6 bits can be set. + However, X is still an arbitrary 64-bit number and so we cannot + assume that truncating it too is a no-op. */ + +static rtx +simplify_truncation (enum machine_mode mode, rtx op, + enum machine_mode op_mode) +{ + unsigned int precision = GET_MODE_UNIT_PRECISION (mode); + unsigned int op_precision = GET_MODE_UNIT_PRECISION (op_mode); + gcc_assert (precision <= op_precision); + + /* Optimize truncations of zero and sign extended values. */ + if (GET_CODE (op) == ZERO_EXTEND + || GET_CODE (op) == SIGN_EXTEND) + { + /* There are three possibilities. If MODE is the same as the + origmode, we can omit both the extension and the subreg. + If MODE is not larger than the origmode, we can apply the + truncation without the extension. Finally, if the outermode + is larger than the origmode, we can just extend to the appropriate + mode. */ + enum machine_mode origmode = GET_MODE (XEXP (op, 0)); + if (mode == origmode) + return XEXP (op, 0); + else if (precision <= GET_MODE_UNIT_PRECISION (origmode)) + return simplify_gen_unary (TRUNCATE, mode, + XEXP (op, 0), origmode); + else + return simplify_gen_unary (GET_CODE (op), mode, + XEXP (op, 0), origmode); + } + + /* Simplify (truncate:SI (op:DI (x:DI) (y:DI))) + to (op:SI (truncate:SI (x:DI)) (truncate:SI (x:DI))). */ + if (GET_CODE (op) == PLUS + || GET_CODE (op) == MINUS + || GET_CODE (op) == MULT) + { + rtx op0 = simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0), op_mode); + if (op0) + { + rtx op1 = simplify_gen_unary (TRUNCATE, mode, XEXP (op, 1), op_mode); + if (op1) + return simplify_gen_binary (GET_CODE (op), mode, op0, op1); + } + } + + /* Simplify (truncate:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C)) into + to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and + the outer subreg is effectively a truncation to the original mode. */ + if ((GET_CODE (op) == LSHIFTRT + || GET_CODE (op) == ASHIFTRT) + /* Ensure that OP_MODE is at least twice as wide as MODE + to avoid the possibility that an outer LSHIFTRT shifts by more + than the sign extension's sign_bit_copies and introduces zeros + into the high bits of the result. */ + && 2 * precision <= op_precision + && CONST_INT_P (XEXP (op, 1)) + && GET_CODE (XEXP (op, 0)) == SIGN_EXTEND + && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode + && INTVAL (XEXP (op, 1)) < precision) + return simplify_gen_binary (ASHIFTRT, mode, + XEXP (XEXP (op, 0), 0), XEXP (op, 1)); + + /* Likewise (truncate:QI (lshiftrt:SI (zero_extend:SI (x:QI)) C)) into + to (lshiftrt:QI (x:QI) C), where C is a suitable small constant and + the outer subreg is effectively a truncation to the original mode. */ + if ((GET_CODE (op) == LSHIFTRT + || GET_CODE (op) == ASHIFTRT) + && CONST_INT_P (XEXP (op, 1)) + && GET_CODE (XEXP (op, 0)) == ZERO_EXTEND + && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode + && INTVAL (XEXP (op, 1)) < precision) + return simplify_gen_binary (LSHIFTRT, mode, + XEXP (XEXP (op, 0), 0), XEXP (op, 1)); + + /* Likewise (truncate:QI (ashift:SI (zero_extend:SI (x:QI)) C)) into + to (ashift:QI (x:QI) C), where C is a suitable small constant and + the outer subreg is effectively a truncation to the original mode. */ + if (GET_CODE (op) == ASHIFT + && CONST_INT_P (XEXP (op, 1)) + && (GET_CODE (XEXP (op, 0)) == ZERO_EXTEND + || GET_CODE (XEXP (op, 0)) == SIGN_EXTEND) + && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode + && INTVAL (XEXP (op, 1)) < precision) + return simplify_gen_binary (ASHIFT, mode, + XEXP (XEXP (op, 0), 0), XEXP (op, 1)); + + /* Recognize a word extraction from a multi-word subreg. */ + if ((GET_CODE (op) == LSHIFTRT + || GET_CODE (op) == ASHIFTRT) + && SCALAR_INT_MODE_P (mode) + && SCALAR_INT_MODE_P (op_mode) + && precision >= BITS_PER_WORD + && 2 * precision <= op_precision + && CONST_INT_P (XEXP (op, 1)) + && (INTVAL (XEXP (op, 1)) & (precision - 1)) == 0 + && INTVAL (XEXP (op, 1)) >= 0 + && INTVAL (XEXP (op, 1)) < op_precision) + { + int byte = subreg_lowpart_offset (mode, op_mode); + int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT; + return simplify_gen_subreg (mode, XEXP (op, 0), op_mode, + (WORDS_BIG_ENDIAN + ? byte - shifted_bytes + : byte + shifted_bytes)); + } + + /* If we have a TRUNCATE of a right shift of MEM, make a new MEM + and try replacing the TRUNCATE and shift with it. Don't do this + if the MEM has a mode-dependent address. */ + if ((GET_CODE (op) == LSHIFTRT + || GET_CODE (op) == ASHIFTRT) + && SCALAR_INT_MODE_P (op_mode) + && MEM_P (XEXP (op, 0)) + && CONST_INT_P (XEXP (op, 1)) + && (INTVAL (XEXP (op, 1)) % GET_MODE_BITSIZE (mode)) == 0 + && INTVAL (XEXP (op, 1)) > 0 + && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (op_mode) + && ! mode_dependent_address_p (XEXP (XEXP (op, 0), 0), + MEM_ADDR_SPACE (XEXP (op, 0))) + && ! MEM_VOLATILE_P (XEXP (op, 0)) + && (GET_MODE_SIZE (mode) >= UNITS_PER_WORD + || WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN)) + { + int byte = subreg_lowpart_offset (mode, op_mode); + int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT; + return adjust_address_nv (XEXP (op, 0), mode, + (WORDS_BIG_ENDIAN + ? byte - shifted_bytes + : byte + shifted_bytes)); + } + + /* (truncate:SI (OP:DI ({sign,zero}_extend:DI foo:SI))) is + (OP:SI foo:SI) if OP is NEG or ABS. */ + if ((GET_CODE (op) == ABS + || GET_CODE (op) == NEG) + && (GET_CODE (XEXP (op, 0)) == SIGN_EXTEND + || GET_CODE (XEXP (op, 0)) == ZERO_EXTEND) + && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode) + return simplify_gen_unary (GET_CODE (op), mode, + XEXP (XEXP (op, 0), 0), mode); + + /* (truncate:A (subreg:B (truncate:C X) 0)) is + (truncate:A X). */ + if (GET_CODE (op) == SUBREG + && SCALAR_INT_MODE_P (mode) + && SCALAR_INT_MODE_P (op_mode) + && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op))) + && GET_CODE (SUBREG_REG (op)) == TRUNCATE + && subreg_lowpart_p (op)) + return simplify_gen_unary (TRUNCATE, mode, XEXP (SUBREG_REG (op), 0), + GET_MODE (XEXP (SUBREG_REG (op), 0))); + + /* (truncate:A (truncate:B X)) is (truncate:A X). */ + if (GET_CODE (op) == TRUNCATE) + return simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0), + GET_MODE (XEXP (op, 0))); + + return NULL_RTX; +} +\f /* Try to simplify a unary operation CODE whose output mode is to be MODE with input operand OP whose mode was originally OP_MODE. Return zero if no simplification can be made. */ @@ -815,50 +1021,34 @@ simplify_unary_operation_1 (enum rtx_cod break; case TRUNCATE: - /* We can't handle truncation to a partial integer mode here - because we don't know the real bitsize of the partial - integer mode. */ - if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT) - break; - - /* (truncate:SI ({sign,zero}_extend:DI foo:SI)) == foo:SI. */ - if ((GET_CODE (op) == SIGN_EXTEND - || GET_CODE (op) == ZERO_EXTEND) - && GET_MODE (XEXP (op, 0)) == mode) - return XEXP (op, 0); + /* Don't optimize (lshiftrt (mult ...)) as it would interfere + with the umulXi3_highpart patterns. */ + if (GET_CODE (op) == LSHIFTRT + && GET_CODE (XEXP (op, 0)) == MULT) + break; - /* (truncate:SI (OP:DI ({sign,zero}_extend:DI foo:SI))) is - (OP:SI foo:SI) if OP is NEG or ABS. */ - if ((GET_CODE (op) == ABS - || GET_CODE (op) == NEG) - && (GET_CODE (XEXP (op, 0)) == SIGN_EXTEND - || GET_CODE (XEXP (op, 0)) == ZERO_EXTEND) - && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode) - return simplify_gen_unary (GET_CODE (op), mode, - XEXP (XEXP (op, 0), 0), mode); + if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT) + { + if (TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op))) + return rtl_hooks.gen_lowpart_no_emit (mode, op); + /* We can't handle truncation to a partial integer mode here + because we don't know the real bitsize of the partial + integer mode. */ + break; + } - /* (truncate:A (subreg:B (truncate:C X) 0)) is - (truncate:A X). */ - if (GET_CODE (op) == SUBREG - && GET_CODE (SUBREG_REG (op)) == TRUNCATE - && subreg_lowpart_p (op)) - return simplify_gen_unary (TRUNCATE, mode, XEXP (SUBREG_REG (op), 0), - GET_MODE (XEXP (SUBREG_REG (op), 0))); + if (GET_MODE (op) != VOIDmode) + { + temp = simplify_truncation (mode, op, GET_MODE (op)); + if (temp) + return temp; + } /* If we know that the value is already truncated, we can - replace the TRUNCATE with a SUBREG. Note that this is also - valid if TRULY_NOOP_TRUNCATION is false for the corresponding - modes we just have to apply a different definition for - truncation. But don't do this for an (LSHIFTRT (MULT ...)) - since this will cause problems with the umulXi3_highpart - patterns. */ - if ((TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op)) - ? (num_sign_bit_copies (op, GET_MODE (op)) - > (unsigned int) (GET_MODE_PRECISION (GET_MODE (op)) - - GET_MODE_PRECISION (mode))) - : truncated_to_mode (mode, op)) - && ! (GET_CODE (op) == LSHIFTRT - && GET_CODE (XEXP (op, 0)) == MULT)) + replace the TRUNCATE with a SUBREG. */ + if (GET_MODE_NUNITS (mode) == 1 + && (TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op)) + || truncated_to_mode (mode, op))) return rtl_hooks.gen_lowpart_no_emit (mode, op); /* A truncate of a comparison can be replaced with a subreg if @@ -5596,14 +5786,6 @@ simplify_subreg (enum machine_mode outer return NULL_RTX; } - /* Merge implicit and explicit truncations. */ - - if (GET_CODE (op) == TRUNCATE - && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode) - && subreg_lowpart_offset (outermode, innermode) == byte) - return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0), - GET_MODE (XEXP (op, 0))); - /* SUBREG of a hard register => just change the register number and/or mode. If the hard register is not valid in that mode, suppress this simplification. If the hard register is the stack, @@ -5689,160 +5871,23 @@ simplify_subreg (enum machine_mode outer return NULL_RTX; } - /* Optimize SUBREG truncations of zero and sign extended values. */ - if ((GET_CODE (op) == ZERO_EXTEND - || GET_CODE (op) == SIGN_EXTEND) - && SCALAR_INT_MODE_P (innermode) - && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)) + /* A SUBREG resulting from a zero extension may fold to zero if + it extracts higher bits that the ZERO_EXTEND's source bits. */ + if (GET_CODE (op) == ZERO_EXTEND) { unsigned int bitpos = subreg_lsb_1 (outermode, innermode, byte); - - /* If we're requesting the lowpart of a zero or sign extension, - there are three possibilities. If the outermode is the same - as the origmode, we can omit both the extension and the subreg. - If the outermode is not larger than the origmode, we can apply - the truncation without the extension. Finally, if the outermode - is larger than the origmode, but both are integer modes, we - can just extend to the appropriate mode. */ - if (bitpos == 0) - { - enum machine_mode origmode = GET_MODE (XEXP (op, 0)); - if (outermode == origmode) - return XEXP (op, 0); - if (GET_MODE_PRECISION (outermode) <= GET_MODE_PRECISION (origmode)) - return simplify_gen_subreg (outermode, XEXP (op, 0), origmode, - subreg_lowpart_offset (outermode, - origmode)); - if (SCALAR_INT_MODE_P (outermode)) - return simplify_gen_unary (GET_CODE (op), outermode, - XEXP (op, 0), origmode); - } - - /* A SUBREG resulting from a zero extension may fold to zero if - it extracts higher bits that the ZERO_EXTEND's source bits. */ - if (GET_CODE (op) == ZERO_EXTEND - && bitpos >= GET_MODE_PRECISION (GET_MODE (XEXP (op, 0)))) + if (bitpos >= GET_MODE_PRECISION (GET_MODE (XEXP (op, 0)))) return CONST0_RTX (outermode); } - /* Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) - to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)), where - the outer subreg is effectively a truncation to the original mode. */ - if ((GET_CODE (op) == PLUS - || GET_CODE (op) == MINUS - || GET_CODE (op) == MULT) - && SCALAR_INT_MODE_P (outermode) - && SCALAR_INT_MODE_P (innermode) - && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode) - && byte == subreg_lowpart_offset (outermode, innermode)) - { - rtx op0 = simplify_gen_subreg (outermode, XEXP (op, 0), - innermode, byte); - if (op0) - { - rtx op1 = simplify_gen_subreg (outermode, XEXP (op, 1), - innermode, byte); - if (op1) - return simplify_gen_binary (GET_CODE (op), outermode, op0, op1); - } - } - - /* Simplify (subreg:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C), 0) into - to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and - the outer subreg is effectively a truncation to the original mode. */ - if ((GET_CODE (op) == LSHIFTRT - || GET_CODE (op) == ASHIFTRT) - && SCALAR_INT_MODE_P (outermode) - && SCALAR_INT_MODE_P (innermode) - /* Ensure that OUTERMODE is at least twice as wide as the INNERMODE - to avoid the possibility that an outer LSHIFTRT shifts by more - than the sign extension's sign_bit_copies and introduces zeros - into the high bits of the result. */ - && (2 * GET_MODE_PRECISION (outermode)) <= GET_MODE_PRECISION (innermode) - && CONST_INT_P (XEXP (op, 1)) - && GET_CODE (XEXP (op, 0)) == SIGN_EXTEND - && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode - && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode) - && subreg_lsb_1 (outermode, innermode, byte) == 0) - return simplify_gen_binary (ASHIFTRT, outermode, - XEXP (XEXP (op, 0), 0), XEXP (op, 1)); - - /* Likewise (subreg:QI (lshiftrt:SI (zero_extend:SI (x:QI)) C), 0) into - to (lshiftrt:QI (x:QI) C), where C is a suitable small constant and - the outer subreg is effectively a truncation to the original mode. */ - if ((GET_CODE (op) == LSHIFTRT - || GET_CODE (op) == ASHIFTRT) - && SCALAR_INT_MODE_P (outermode) - && SCALAR_INT_MODE_P (innermode) - && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode) - && CONST_INT_P (XEXP (op, 1)) - && GET_CODE (XEXP (op, 0)) == ZERO_EXTEND - && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode - && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode) - && subreg_lsb_1 (outermode, innermode, byte) == 0) - return simplify_gen_binary (LSHIFTRT, outermode, - XEXP (XEXP (op, 0), 0), XEXP (op, 1)); - - /* Likewise (subreg:QI (ashift:SI (zero_extend:SI (x:QI)) C), 0) into - to (ashift:QI (x:QI) C), where C is a suitable small constant and - the outer subreg is effectively a truncation to the original mode. */ - if (GET_CODE (op) == ASHIFT - && SCALAR_INT_MODE_P (outermode) + if (SCALAR_INT_MODE_P (outermode) && SCALAR_INT_MODE_P (innermode) && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode) - && CONST_INT_P (XEXP (op, 1)) - && (GET_CODE (XEXP (op, 0)) == ZERO_EXTEND - || GET_CODE (XEXP (op, 0)) == SIGN_EXTEND) - && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode - && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode) - && subreg_lsb_1 (outermode, innermode, byte) == 0) - return simplify_gen_binary (ASHIFT, outermode, - XEXP (XEXP (op, 0), 0), XEXP (op, 1)); - - /* Recognize a word extraction from a multi-word subreg. */ - if ((GET_CODE (op) == LSHIFTRT - || GET_CODE (op) == ASHIFTRT) - && SCALAR_INT_MODE_P (innermode) - && GET_MODE_PRECISION (outermode) >= BITS_PER_WORD - && GET_MODE_PRECISION (innermode) >= (2 * GET_MODE_PRECISION (outermode)) - && CONST_INT_P (XEXP (op, 1)) - && (INTVAL (XEXP (op, 1)) & (GET_MODE_PRECISION (outermode) - 1)) == 0 - && INTVAL (XEXP (op, 1)) >= 0 - && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (innermode) && byte == subreg_lowpart_offset (outermode, innermode)) { - int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT; - return simplify_gen_subreg (outermode, XEXP (op, 0), innermode, - (WORDS_BIG_ENDIAN - ? byte - shifted_bytes - : byte + shifted_bytes)); - } - - /* If we have a lowpart SUBREG of a right shift of MEM, make a new MEM - and try replacing the SUBREG and shift with it. Don't do this if - the MEM has a mode-dependent address or if we would be widening it. */ - - if ((GET_CODE (op) == LSHIFTRT - || GET_CODE (op) == ASHIFTRT) - && SCALAR_INT_MODE_P (innermode) - && MEM_P (XEXP (op, 0)) - && CONST_INT_P (XEXP (op, 1)) - && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (GET_MODE (op)) - && (INTVAL (XEXP (op, 1)) % GET_MODE_BITSIZE (outermode)) == 0 - && INTVAL (XEXP (op, 1)) > 0 - && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (innermode) - && ! mode_dependent_address_p (XEXP (XEXP (op, 0), 0), - MEM_ADDR_SPACE (XEXP (op, 0))) - && ! MEM_VOLATILE_P (XEXP (op, 0)) - && byte == subreg_lowpart_offset (outermode, innermode) - && (GET_MODE_SIZE (outermode) >= UNITS_PER_WORD - || WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN)) - { - int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT; - return adjust_address_nv (XEXP (op, 0), outermode, - (WORDS_BIG_ENDIAN - ? byte - shifted_bytes - : byte + shifted_bytes)); + rtx tem = simplify_truncation (outermode, op, innermode); + if (tem) + return tem; } return NULL_RTX; Index: gcc/config/mips/mips.c =================================================================== --- gcc/config/mips/mips.c 2012-10-06 11:22:55.888609330 +0100 +++ gcc/config/mips/mips.c 2012-10-06 16:32:25.240799783 +0100 @@ -3527,6 +3527,17 @@ mips_set_reg_reg_cost (enum machine_mode } } +/* Return the cost of an operand X that can be trucated for free. + SPEED says whether we're optimizing for size or speed. */ + +static int +mips_truncated_op_cost (rtx x, bool speed) +{ + if (GET_CODE (x) == TRUNCATE) + x = XEXP (x, 0); + return set_src_cost (x, speed); +} + /* Implement TARGET_RTX_COSTS. */ static bool @@ -3907,12 +3918,13 @@ mips_rtx_costs (rtx x, int code, int out case ZERO_EXTEND: if (outer_code == SET && ISA_HAS_BADDU - && (GET_CODE (XEXP (x, 0)) == TRUNCATE - || GET_CODE (XEXP (x, 0)) == SUBREG) && GET_MODE (XEXP (x, 0)) == QImode - && GET_CODE (XEXP (XEXP (x, 0), 0)) == PLUS) + && GET_CODE (XEXP (x, 0)) == PLUS) { - *total = set_src_cost (XEXP (XEXP (x, 0), 0), speed); + rtx plus = XEXP (x, 0); + *total = (COSTS_N_INSNS (1) + + mips_truncated_op_cost (XEXP (plus, 0), speed) + + mips_truncated_op_cost (XEXP (plus, 1), speed)); return true; } *total = mips_zero_extend_cost (mode, XEXP (x, 0)); Index: gcc/config/mips/mips.md =================================================================== --- gcc/config/mips/mips.md 2012-10-06 11:22:55.888609330 +0100 +++ gcc/config/mips/mips.md 2012-10-06 16:32:21.168800012 +0100 @@ -1305,9 +1305,8 @@ (define_insn "*baddu_si" (define_insn "*baddu_di<mode>" [(set (match_operand:GPR 0 "register_operand" "=d") (zero_extend:GPR - (truncate:QI - (plus:DI (match_operand:DI 1 "register_operand" "d") - (match_operand:DI 2 "register_operand" "d")))))] + (plus:QI (truncate:QI (match_operand:DI 1 "register_operand" "d")) + (truncate:QI (match_operand:DI 2 "register_operand" "d")))))] "ISA_HAS_BADDU && TARGET_64BIT" "baddu\\t%0,%1,%2" [(set_attr "alu_type" "add")]) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: RFA: Simplifying truncation and integer lowpart subregs 2012-10-07 8:56 ` Richard Sandiford @ 2012-10-07 12:36 ` Eric Botcazou 2012-11-28 2:27 ` Ramana Radhakrishnan 1 sibling, 0 replies; 23+ messages in thread From: Eric Botcazou @ 2012-10-07 12:36 UTC (permalink / raw) To: Richard Sandiford Cc: gcc-patches, Andrew Pinski, Uros Bizjak, Paul_Koning, kkojima, aoliva, dje.gcc, uweigand, walt > Yeah, in hindsight, the patch was definitely lacking commentary. > How about the patch below? It also fixes the partial int case > and gets rid of the errant NOT hunk. Tested in the same way as before. > > Richard > > > gcc/ > * machmode.h (GET_MODE_UNIT_PRECISION): New macro. > * simplify-rtx.c (simplify_truncation): New function, > extracted from simplify_subreg and (in small part) from > simplify_unary_operation_1. > (simplify_unary_operation_1) <TRUNCATE>: Use it. Remove sign bit > test for !TRULY_NOOP_TRUNCATION_MODES_P. > (simplify_subreg): Use simplify_truncate for lowpart subregs > where both the inner and outer modes are scalar integers. This looks good to me, thanks. -- Eric Botcazou ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: RFA: Simplifying truncation and integer lowpart subregs 2012-10-07 8:56 ` Richard Sandiford 2012-10-07 12:36 ` Eric Botcazou @ 2012-11-28 2:27 ` Ramana Radhakrishnan 2012-11-28 21:45 ` Richard Sandiford 1 sibling, 1 reply; 23+ messages in thread From: Ramana Radhakrishnan @ 2012-11-28 2:27 UTC (permalink / raw) To: Eric Botcazou, gcc-patches, Andrew Pinski, Uros Bizjak, Paul_Koning, kkojima, aoliva, dje.gcc, uweigand, walt, rdsandiford On Sun, Oct 7, 2012 at 8:56 AM, Richard Sandiford <rdsandiford@googlemail.com> wrote: > Eric Botcazou <ebotcazou@adacore.com> writes: >>> I think modelling it as a TRUNCATE operation is correct for >>> !TRULY_NOOP_TRUNCATION (it's the bug that Andrew pointed out). >>> And we shouldn't generate an actual TRUNCATE rtx for >>> TRULY_NOOP_TRUNCATION (the thing about making >>> simplify_gen_unary (TRUNCATE, ...) no worse than simplify_gen_subreg >>> for those targets). I suppose: >>> >>> /* We can't handle truncation to a partial integer mode here >>> because we don't know the real bitsize of the partial >>> integer mode. */ >>> if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT) >>> break; >>> >>> might be a problem though; we should still allow a subreg to be >>> generated. Is that what you were thinking of, or something else? >> >> I was thinking of the !TRULY_NOOP_TRUNCATION case, where the two operations >> aren't equivalent. Generating TRUNCATE in simplify_subreg seems suspicious to >> me in this case but, if not doing it is the source of the bug, I guess I need >> to do some homework on this TRULY_NOOP_TRUNCATION stuff. :-) >> >> Maybe add a blurb to the head comment of simplify_truncation, explaining that >> it is valid to call the function both for TRUNCATEs and truncations to the >> lowpart, and why it is correct to generate new TRUNCATEs in the latter case. > > Yeah, in hindsight, the patch was definitely lacking commentary. > How about the patch below? It also fixes the partial int case > and gets rid of the errant NOT hunk. Tested in the same way as before. > > Richard > > > gcc/ > * machmode.h (GET_MODE_UNIT_PRECISION): New macro. > * simplify-rtx.c (simplify_truncation): New function, > extracted from simplify_subreg and (in small part) from > simplify_unary_operation_1. > (simplify_unary_operation_1) <TRUNCATE>: Use it. Remove sign bit > test for !TRULY_NOOP_TRUNCATION_MODES_P. > (simplify_subreg): Use simplify_truncate for lowpart subregs > where both the inner and outer modes are scalar integers. > * config/mips/mips.c (mips_truncated_op_cost): New function. > (mips_rtx_costs): Adjust test for BADDU. > * config/mips/mips.md (*baddu_di<mode>): Push truncates to operands. This triggers PR55052 on ARM.I've attached the .i file and the dumps to the bug report. Thanks, Ramana ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: RFA: Simplifying truncation and integer lowpart subregs 2012-11-28 2:27 ` Ramana Radhakrishnan @ 2012-11-28 21:45 ` Richard Sandiford 0 siblings, 0 replies; 23+ messages in thread From: Richard Sandiford @ 2012-11-28 21:45 UTC (permalink / raw) To: ramrad01 Cc: Eric Botcazou, gcc-patches, Andrew Pinski, Uros Bizjak, Paul_Koning, kkojima, aoliva, dje.gcc, uweigand, walt Ramana Radhakrishnan <ramana.gcc@googlemail.com> writes: > On Sun, Oct 7, 2012 at 8:56 AM, Richard Sandiford > <rdsandiford@googlemail.com> wrote: >> Eric Botcazou <ebotcazou@adacore.com> writes: >>>> I think modelling it as a TRUNCATE operation is correct for >>>> !TRULY_NOOP_TRUNCATION (it's the bug that Andrew pointed out). >>>> And we shouldn't generate an actual TRUNCATE rtx for >>>> TRULY_NOOP_TRUNCATION (the thing about making >>>> simplify_gen_unary (TRUNCATE, ...) no worse than simplify_gen_subreg >>>> for those targets). I suppose: >>>> >>>> /* We can't handle truncation to a partial integer mode here >>>> because we don't know the real bitsize of the partial >>>> integer mode. */ >>>> if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT) >>>> break; >>>> >>>> might be a problem though; we should still allow a subreg to be >>>> generated. Is that what you were thinking of, or something else? >>> >>> I was thinking of the !TRULY_NOOP_TRUNCATION case, where the two operations >>> aren't equivalent. Generating TRUNCATE in simplify_subreg seems >>> suspicious to >>> me in this case but, if not doing it is the source of the bug, I guess I need >>> to do some homework on this TRULY_NOOP_TRUNCATION stuff. :-) >>> >>> Maybe add a blurb to the head comment of simplify_truncation, explaining that >>> it is valid to call the function both for TRUNCATEs and truncations to the >>> lowpart, and why it is correct to generate new TRUNCATEs in the latter case. >> >> Yeah, in hindsight, the patch was definitely lacking commentary. >> How about the patch below? It also fixes the partial int case >> and gets rid of the errant NOT hunk. Tested in the same way as before. >> >> Richard >> >> >> gcc/ >> * machmode.h (GET_MODE_UNIT_PRECISION): New macro. >> * simplify-rtx.c (simplify_truncation): New function, >> extracted from simplify_subreg and (in small part) from >> simplify_unary_operation_1. >> (simplify_unary_operation_1) <TRUNCATE>: Use it. Remove sign bit >> test for !TRULY_NOOP_TRUNCATION_MODES_P. >> (simplify_subreg): Use simplify_truncate for lowpart subregs >> where both the inner and outer modes are scalar integers. >> * config/mips/mips.c (mips_truncated_op_cost): New function. >> (mips_rtx_costs): Adjust test for BADDU. >> * config/mips/mips.md (*baddu_di<mode>): Push truncates to operands. > > This triggers PR55052 on ARM.I've attached the .i file and the dumps > to the bug report. Thanks. I'd managed to drop a SCALAR_INT_MODE_P check when splitting the ZERO_EXTEND handling into two. This patch reinstates the check. Tested on x86_64-linux-gnu and applied as obvious. Richard gcc/ PR rtl-optimization/55052 * simplify-rtx.c (simplify_subreg): Restore SCALAR_INT_MODE_P check. Index: gcc/simplify-rtx.c =================================================================== --- gcc/simplify-rtx.c 2012-11-27 18:52:29.000000000 +0000 +++ gcc/simplify-rtx.c 2012-11-28 19:54:30.500525576 +0000 @@ -5875,7 +5875,7 @@ simplify_subreg (enum machine_mode outer /* A SUBREG resulting from a zero extension may fold to zero if it extracts higher bits that the ZERO_EXTEND's source bits. */ - if (GET_CODE (op) == ZERO_EXTEND) + if (GET_CODE (op) == ZERO_EXTEND && SCALAR_INT_MODE_P (innermode)) { unsigned int bitpos = subreg_lsb_1 (outermode, innermode, byte); if (bitpos >= GET_MODE_PRECISION (GET_MODE (XEXP (op, 0))) ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant 2012-09-27 18:20 ` [PATCH v2, " Uros Bizjak 2012-09-27 18:35 ` Paul_Koning @ 2012-09-27 20:33 ` Jakub Jelinek 2012-09-27 22:37 ` Richard Sandiford 2012-09-28 15:39 ` Uros Bizjak 1 sibling, 2 replies; 23+ messages in thread From: Jakub Jelinek @ 2012-09-27 20:33 UTC (permalink / raw) To: Uros Bizjak; +Cc: gcc-patches, Richard Sandiford, Eric Botcazou On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote: > After some off-line discussion with Richard, attached is v2 of the patch. > > 2012-09-27 Uros Bizjak <ubizjak@gmail.com> > > PR rtl-optimization/54457 > * simplify-rtx.c (simplify_subreg): > Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) > to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)). Is that a good idea even for WORD_REGISTER_OPERATIONS targets? > --- simplify-rtx.c (revision 191808) > +++ simplify-rtx.c (working copy) > @@ -5689,6 +5689,28 @@ simplify_subreg (enum machine_mode outermode, rtx > return CONST0_RTX (outermode); > } > > + /* Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) > + to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)), where > + the outer subreg is effectively a truncation to the original mode. */ > + if ((GET_CODE (op) == PLUS > + || GET_CODE (op) == MINUS > + || GET_CODE (op) == MULT) > + && SCALAR_INT_MODE_P (outermode) > + && SCALAR_INT_MODE_P (innermode) > + && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode) > + && byte == subreg_lowpart_offset (outermode, innermode)) > + { > + rtx op0 = simplify_gen_subreg (outermode, XEXP (op, 0), > + innermode, byte); > + if (op0) > + { > + rtx op1 = simplify_gen_subreg (outermode, XEXP (op, 1), > + innermode, byte); > + if (op1) > + return simplify_gen_binary (GET_CODE (op), outermode, op0, op1); > + } > + } > + > /* Simplify (subreg:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C), 0) into > to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and > the outer subreg is effectively a truncation to the original mode. */ Jakub ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant 2012-09-27 20:33 ` [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant Jakub Jelinek @ 2012-09-27 22:37 ` Richard Sandiford 2012-09-28 15:39 ` Uros Bizjak 1 sibling, 0 replies; 23+ messages in thread From: Richard Sandiford @ 2012-09-27 22:37 UTC (permalink / raw) To: Jakub Jelinek; +Cc: Uros Bizjak, gcc-patches, Eric Botcazou Jakub Jelinek <jakub@redhat.com> writes: > On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote: >> After some off-line discussion with Richard, attached is v2 of the patch. >> >> 2012-09-27 Uros Bizjak <ubizjak@gmail.com> >> >> PR rtl-optimization/54457 >> * simplify-rtx.c (simplify_subreg): >> Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) >> to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)). > > Is that a good idea even for WORD_REGISTER_OPERATIONS targets? I think so. Admittedly it means I'm going to have to change the mips.md BADDU patterns from: (define_insn "*baddu_si_el" [(set (match_operand:SI 0 "register_operand" "=d") (zero_extend:SI (subreg:QI (plus:SI (match_operand:SI 1 "register_operand" "d") (match_operand:SI 2 "register_operand" "d")) 0)))] "ISA_HAS_BADDU && !BYTES_BIG_ENDIAN" "baddu\\t%0,%1,%2" [(set_attr "alu_type" "add")]) to: (define_insn "*baddu_si_el" [(set (match_operand:SI 0 "register_operand" "=d") (zero_extend:SI (plus:QI (match_operand:QI 1 "register_operand" "d") (match_operand:QI 2 "register_operand" "d"))))] "ISA_HAS_BADDU" "baddu\\t%0,%1,%2" [(set_attr "alu_type" "add")]) But really, that's a better representation even on MIPS, not least because it gets rid of the ugly endianness condition. There will probably be fallout on other targets too. But the upside is that we get rid of more subregs from .mds. I think a few of the i386.md define_peephole2s could go too. E.g. the second two of: (define_peephole2 [(set (match_operand:DI 0 "register_operand") (zero_extend:DI (plus:SI (match_operand:SI 1 "register_operand") (match_operand:SI 2 "nonmemory_operand"))))] "TARGET_64BIT && !TARGET_OPT_AGU && REGNO (operands[0]) == REGNO (operands[1]) && peep2_regno_dead_p (0, FLAGS_REG)" [(parallel [(set (match_dup 0) (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2)))) (clobber (reg:CC FLAGS_REG))])]) (define_peephole2 [(set (match_operand:DI 0 "register_operand") (zero_extend:DI (plus:SI (match_operand:SI 1 "nonmemory_operand") (match_operand:SI 2 "register_operand"))))] "TARGET_64BIT && !TARGET_OPT_AGU && REGNO (operands[0]) == REGNO (operands[2]) && peep2_regno_dead_p (0, FLAGS_REG)" [(parallel [(set (match_dup 0) (zero_extend:DI (plus:SI (match_dup 2) (match_dup 1)))) (clobber (reg:CC FLAGS_REG))])]) (define_peephole2 [(set (match_operand:DI 0 "register_operand") (zero_extend:DI (subreg:SI (plus:DI (match_dup 0) (match_operand:DI 1 "nonmemory_operand")) 0)))] "TARGET_64BIT && !TARGET_OPT_AGU && peep2_regno_dead_p (0, FLAGS_REG)" [(parallel [(set (match_dup 0) (zero_extend:DI (plus:SI (match_dup 2) (match_dup 1)))) (clobber (reg:CC FLAGS_REG))])] { operands[1] = gen_lowpart (SImode, operands[1]); operands[2] = gen_lowpart (SImode, operands[0]); }) (define_peephole2 [(set (match_operand:DI 0 "register_operand") (zero_extend:DI (subreg:SI (plus:DI (match_operand:DI 1 "nonmemory_operand") (match_dup 0)) 0)))] "TARGET_64BIT && !TARGET_OPT_AGU && peep2_regno_dead_p (0, FLAGS_REG)" [(parallel [(set (match_dup 0) (zero_extend:DI (plus:SI (match_dup 2) (match_dup 1)))) (clobber (reg:CC FLAGS_REG))])] { operands[1] = gen_lowpart (SImode, operands[1]); operands[2] = gen_lowpart (SImode, operands[0]); }) where we should now always generate the first two forms. There's no semantic difference between the two rtxes, and I think it would be confusing to have different canonical forms on different targets. If the caller really wants a word-mode operation on WORD_REGISTER_OPERATIONS targets, then I think it's asking for the wrong thing by taking this subreg. Richard ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant 2012-09-27 20:33 ` [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant Jakub Jelinek 2012-09-27 22:37 ` Richard Sandiford @ 2012-09-28 15:39 ` Uros Bizjak 2012-09-30 11:40 ` Richard Sandiford 1 sibling, 1 reply; 23+ messages in thread From: Uros Bizjak @ 2012-09-28 15:39 UTC (permalink / raw) To: Jakub Jelinek; +Cc: gcc-patches, Richard Sandiford, Eric Botcazou On Thu, Sep 27, 2012 at 8:20 PM, Jakub Jelinek <jakub@redhat.com> wrote: > On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote: >> After some off-line discussion with Richard, attached is v2 of the patch. >> >> 2012-09-27 Uros Bizjak <ubizjak@gmail.com> >> >> PR rtl-optimization/54457 >> * simplify-rtx.c (simplify_subreg): >> Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) >> to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)). > > Is that a good idea even for WORD_REGISTER_OPERATIONS targets? I have bootstrapped and regtested [1] the patch on alphaev68-pc-linux-gnu, a WORD_REGISTER_OPERATIONS target, and there were no additional failures. [1] http://gcc.gnu.org/ml/gcc-testresults/2012-09/msg02828.html Uros. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant 2012-09-28 15:39 ` Uros Bizjak @ 2012-09-30 11:40 ` Richard Sandiford 2012-10-03 11:07 ` Paolo Bonzini 0 siblings, 1 reply; 23+ messages in thread From: Richard Sandiford @ 2012-09-30 11:40 UTC (permalink / raw) To: Uros Bizjak; +Cc: Jakub Jelinek, gcc-patches, Eric Botcazou Uros Bizjak <ubizjak@gmail.com> writes: > On Thu, Sep 27, 2012 at 8:20 PM, Jakub Jelinek <jakub@redhat.com> wrote: >> On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote: >>> After some off-line discussion with Richard, attached is v2 of the patch. >>> >>> 2012-09-27 Uros Bizjak <ubizjak@gmail.com> >>> >>> PR rtl-optimization/54457 >>> * simplify-rtx.c (simplify_subreg): >>> Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) >>> to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)). >> >> Is that a good idea even for WORD_REGISTER_OPERATIONS targets? > > I have bootstrapped and regtested [1] the patch on > alphaev68-pc-linux-gnu, a WORD_REGISTER_OPERATIONS target, and there > were no additional failures. Thanks. Given Jakub's question/concern, I'd really prefer a third opinion rather than approving it myself, but... OK if no-one objects within 24hrs. Richard ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant 2012-09-30 11:40 ` Richard Sandiford @ 2012-10-03 11:07 ` Paolo Bonzini 0 siblings, 0 replies; 23+ messages in thread From: Paolo Bonzini @ 2012-10-03 11:07 UTC (permalink / raw) To: Uros Bizjak, Jakub Jelinek, gcc-patches, Eric Botcazou, rdsandiford [-- Attachment #1: Type: text/plain, Size: 1226 bytes --] Il 30/09/2012 11:02, Richard Sandiford ha scritto: > Uros Bizjak <ubizjak@gmail.com> writes: >> On Thu, Sep 27, 2012 at 8:20 PM, Jakub Jelinek <jakub@redhat.com> wrote: >>> On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote: >>>> After some off-line discussion with Richard, attached is v2 of the patch. >>>> >>>> 2012-09-27 Uros Bizjak <ubizjak@gmail.com> >>>> >>>> PR rtl-optimization/54457 >>>> * simplify-rtx.c (simplify_subreg): >>>> Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) >>>> to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)). >>> >>> Is that a good idea even for WORD_REGISTER_OPERATIONS targets? >> >> I have bootstrapped and regtested [1] the patch on >> alphaev68-pc-linux-gnu, a WORD_REGISTER_OPERATIONS target, and there >> were no additional failures. > > Thanks. Given Jakub's question/concern, I'd really prefer a third > opinion rather than approving it myself, but... OK if no-one objects > within 24hrs. I used to have a patch doing roughly the same thing (for more operations but not MULT). I never submitted because I didn't have the time to audit all targets after changing the canonicalization. Perhaps you can take the best of both worlds. Paolo [-- Attachment #2: pr39726.patch --] [-- Type: text/x-patch, Size: 5830 bytes --] change canonicalization Index: gcc/Makefile.in =================================================================== --- gcc/Makefile.in (branch pr39726) +++ gcc/Makefile.in (working copy) @@ -2844,7 +2844,7 @@ jump.o : jump.c $(CONFIG_H) $(SYSTEM_H) simplify-rtx.o : simplify-rtx.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ $(RTL_H) $(REGS_H) hard-reg-set.h $(FLAGS_H) $(REAL_H) insn-config.h \ $(RECOG_H) $(EXPR_H) $(TOPLEV_H) output.h $(FUNCTION_H) $(GGC_H) $(TM_P_H) \ - $(TREE_H) $(TARGET_H) + $(TREE_H) $(TARGET_H) $(OPTABS_H) cgraph.o : cgraph.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \ langhooks.h $(TOPLEV_H) $(FLAGS_H) $(GGC_H) $(TARGET_H) $(CGRAPH_H) \ gt-cgraph.h output.h intl.h $(BASIC_BLOCK_H) debug.h $(HASHTAB_H) \ Index: gcc/simplify-rtx.c =================================================================== --- gcc/simplify-rtx.c (branch pr39726) +++ gcc/simplify-rtx.c (working copy) @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. #include "output.h" #include "ggc.h" #include "target.h" +#include "optabs.h" /* Simplification and canonicalization of RTL. */ @@ -5191,6 +5192,8 @@ rtx simplify_subreg (enum machine_mode outermode, rtx op, enum machine_mode innermode, unsigned int byte) { + int is_lowpart; + /* Little bit of sanity checking. */ gcc_assert (innermode != VOIDmode); gcc_assert (outermode != VOIDmode); @@ -5300,11 +5303,13 @@ simplify_subreg (enum machine_mode outer return NULL_RTX; } + is_lowpart = subreg_lowpart_offset (outermode, innermode) == byte; + /* Merge implicit and explicit truncations. */ if (GET_CODE (op) == TRUNCATE && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode) - && subreg_lowpart_offset (outermode, innermode) == byte) + && is_lowpart) return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0), GET_MODE (XEXP (op, 0))); @@ -5343,7 +5348,7 @@ simplify_subreg (enum machine_mode outer The information is used only by alias analysis that can not grog partial register anyway. */ - if (subreg_lowpart_offset (outermode, innermode) == byte) + if (is_lowpart) ORIGINAL_REGNO (x) = ORIGINAL_REGNO (op); return x; } @@ -5393,6 +5398,51 @@ simplify_subreg (enum machine_mode outer return NULL_RTX; } + /* Try to move a subreg inside an arithmetic operation. */ + if (is_lowpart && ARITHMETIC_P (op) + && GET_MODE_CLASS (outermode) == MODE_INT + && GET_MODE_CLASS (innermode) == MODE_INT) + { + enum insn_code ic; + enum machine_mode cnt_mode; + switch (GET_CODE (op)) + { + case ABS: + case NEG: + return simplify_gen_unary (GET_CODE (op), outermode, + rtl_hooks.gen_lowpart_no_emit + (outermode, XEXP (op, 0)), + outermode); + + case ASHIFT: + /* i386 always uses QImode for the shift count. Get the + appropriate mode from the optab. */ + ic = ashl_optab->handlers[outermode].insn_code; + if (ic != CODE_FOR_nothing) + cnt_mode = insn_data[ic].operand[2].mode; + else + cnt_mode = outermode; + return simplify_gen_binary (GET_CODE (op), outermode, + rtl_hooks.gen_lowpart_no_emit + (outermode, XEXP (op, 0)), + rtl_hooks.gen_lowpart_no_emit + (cnt_mode, XEXP (op, 1))); + + case PLUS: + case MINUS: + case AND: + case IOR: + case XOR: + return simplify_gen_binary (GET_CODE (op), outermode, + rtl_hooks.gen_lowpart_no_emit + (outermode, XEXP (op, 0)), + rtl_hooks.gen_lowpart_no_emit + (outermode, XEXP (op, 1))); + default: + break; + } + } + /* Optimize SUBREG truncations of zero and sign extended values. */ if ((GET_CODE (op) == ZERO_EXTEND || GET_CODE (op) == SIGN_EXTEND) Index: gcc/combine.c =================================================================== --- gcc/combine.c (branch pr39726) +++ gcc/combine.c (working copy) @@ -11155,47 +11155,6 @@ simplify_comparison (enum rtx_code code, continue; } - /* If this is (and:M1 (subreg:M2 X 0) (const_int C1)) where C1 - fits in both M1 and M2 and the SUBREG is either paradoxical - or represents the low part, permute the SUBREG and the AND - and try again. */ - if (GET_CODE (XEXP (op0, 0)) == SUBREG) - { - unsigned HOST_WIDE_INT c1; - tmode = GET_MODE (SUBREG_REG (XEXP (op0, 0))); - /* Require an integral mode, to avoid creating something like - (AND:SF ...). */ - if (SCALAR_INT_MODE_P (tmode) - /* It is unsafe to commute the AND into the SUBREG if the - SUBREG is paradoxical and WORD_REGISTER_OPERATIONS is - not defined. As originally written the upper bits - have a defined value due to the AND operation. - However, if we commute the AND inside the SUBREG then - they no longer have defined values and the meaning of - the code has been changed. */ - && (0 -#ifdef WORD_REGISTER_OPERATIONS - || (mode_width > GET_MODE_BITSIZE (tmode) - && mode_width <= BITS_PER_WORD) -#endif - || (mode_width <= GET_MODE_BITSIZE (tmode) - && subreg_lowpart_p (XEXP (op0, 0)))) - && CONST_INT_P (XEXP (op0, 1)) - && mode_width <= HOST_BITS_PER_WIDE_INT - && GET_MODE_BITSIZE (tmode) <= HOST_BITS_PER_WIDE_INT - && ((c1 = INTVAL (XEXP (op0, 1))) & ~mask) == 0 - && (c1 & ~GET_MODE_MASK (tmode)) == 0 - && c1 != mask - && c1 != GET_MODE_MASK (tmode)) - { - op0 = simplify_gen_binary (AND, tmode, - SUBREG_REG (XEXP (op0, 0)), - gen_int_mode (c1, tmode)); - op0 = gen_lowpart (mode, op0); - continue; - } - } - /* Convert (ne (and (not X) 1) 0) to (eq (and X 1) 0). */ if (const_op == 0 && equality_comparison_p && XEXP (op0, 1) == const1_rtx ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2012-11-28 21:45 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-09-25 8:04 [PATCH, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant Uros Bizjak 2012-09-26 18:17 ` Richard Sandiford 2012-09-26 21:38 ` Eric Botcazou 2012-09-27 14:25 ` Uros Bizjak 2012-09-27 16:10 ` Richard Sandiford 2012-09-27 18:20 ` [PATCH v2, " Uros Bizjak 2012-09-27 18:35 ` Paul_Koning 2012-09-27 19:21 ` Uros Bizjak 2012-10-02 2:13 ` Andrew Pinski 2012-10-02 19:32 ` Richard Sandiford 2012-10-06 10:22 ` RFA: Simplifying truncation and integer lowpart subregs Richard Sandiford 2012-10-06 11:13 ` Eric Botcazou 2012-10-06 12:39 ` Richard Sandiford 2012-10-06 13:05 ` Eric Botcazou 2012-10-07 8:56 ` Richard Sandiford 2012-10-07 12:36 ` Eric Botcazou 2012-11-28 2:27 ` Ramana Radhakrishnan 2012-11-28 21:45 ` Richard Sandiford 2012-09-27 20:33 ` [PATCH v2, rtl-optimization]: Fix PR54457, [x32] Fail to combine 64bit index + constant Jakub Jelinek 2012-09-27 22:37 ` Richard Sandiford 2012-09-28 15:39 ` Uros Bizjak 2012-09-30 11:40 ` Richard Sandiford 2012-10-03 11:07 ` Paolo Bonzini
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).