* [PATCH, ARM] Keep constants in register when expanding @ 2014-08-05 9:31 Zhenqiang Chen 2014-08-08 15:22 ` Ramana Radhakrishnan 2014-08-12 9:24 ` Richard Earnshaw 0 siblings, 2 replies; 7+ messages in thread From: Zhenqiang Chen @ 2014-08-05 9:31 UTC (permalink / raw) To: gcc-patches [-- Attachment #1: Type: text/plain, Size: 1898 bytes --] Hi, For some large constants, ARM will split them during expanding, which makes impossible to hoist them out the loop or shared by different references (refer the test case in the patch). The patch keeps some constants in registers. If the constant can not be optimized, the cprop and combine passes can optimize them as what we do in current expand pass with define_insn_and_split "*arm_subsi3_insn" define_insn_and_split "*arm_andsi3_insn" define_insn_and_split "*iorsi3_insn" define_insn_and_split "*arm_xorsi3" The patch does not modify addsi3 since the define_insn_and_split "*arm_addsi3" is only valid when (reload_completed || !arm_eliminable_register (operands[1])). The cprop and combine passes can not optimize the large constant if we put it in register, which will lead to regression. For logic operators, the patch skips changes for constants: INTVAL (operands[2]) < 0 && const_ok_for_arm (-INTVAL (operands[2]) since expand pass always uses "sign-extend" to get the value (trunc_int_for_mode called from immed_wide_int_const) for rtl, and logs show most negative values are UNSIGNED when they are TREE node. And combine pass is smart enough to recover the negative value to positive value. Bootstrap and no make check regression on Chrome book. For coremark, dhrystone and eembcv1, no any code size and performance change on Cortex-M4. No any file in CSiBE has code size change for Cortex-A15 and Cortex-M4. No Spec2000 performance regression on Chrome book and dumped assemble codes only show very few difference. OK for trunk? Thanks! -Zhenqiang ChangeLog: 2014-08-05 Zhenqiang Chen <zhenqiang.chen@linaro.org> * config/arm/arm.md (subsi3, andsi3, iorsi3, xorsi3): Keep some large constants in register other than split them. testsuite/ChangeLog: 2014-08-05 Zhenqiang Chen <zhenqiang.chen@linaro.org> * gcc.target/arm/maskdata.c: New test. [-- Attachment #2: skip-split.patch --] [-- Type: text/x-patch, Size: 3704 bytes --] diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index bd8ea8f..c8b3001 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -1162,9 +1162,16 @@ { if (TARGET_32BIT) { - arm_split_constant (MINUS, SImode, NULL_RTX, - INTVAL (operands[1]), operands[0], - operands[2], optimize && can_create_pseudo_p ()); + if (!const_ok_for_arm (INTVAL (operands[1])) + && can_create_pseudo_p ()) + { + operands[1] = force_reg (SImode, operands[1]); + emit_insn (gen_subsi3 (operands[0], operands[1], operands[2])); + } + else + arm_split_constant (MINUS, SImode, NULL_RTX, + INTVAL (operands[1]), operands[0], operands[2], + optimize && can_create_pseudo_p ()); DONE; } else /* TARGET_THUMB1 */ @@ -2077,6 +2084,17 @@ emit_insn (gen_thumb2_zero_extendqisi2_v6 (operands[0], operands[1])); } + else if (!(const_ok_for_arm (INTVAL (operands[2])) + || const_ok_for_arm (~INTVAL (operands[2])) + /* zero_extendhi instruction is efficient enough. */ + || INTVAL (operands[2]) == 0xffff + || (INTVAL (operands[2]) < 0 + && const_ok_for_arm (-INTVAL (operands[2])))) + && can_create_pseudo_p ()) + { + operands[2] = force_reg (SImode, operands[2]); + emit_insn (gen_andsi3 (operands[0], operands[1], operands[2])); + } else arm_split_constant (AND, SImode, NULL_RTX, INTVAL (operands[2]), operands[0], @@ -2882,9 +2900,20 @@ { if (TARGET_32BIT) { - arm_split_constant (IOR, SImode, NULL_RTX, - INTVAL (operands[2]), operands[0], operands[1], - optimize && can_create_pseudo_p ()); + if (!(const_ok_for_arm (INTVAL (operands[2])) + || (TARGET_THUMB2 + && const_ok_for_arm (~INTVAL (operands[2]))) + || (INTVAL (operands[2]) < 0 + && const_ok_for_arm (-INTVAL (operands[2])))) + && can_create_pseudo_p ()) + { + operands[2] = force_reg (SImode, operands[2]); + emit_insn (gen_iorsi3 (operands[0], operands[1], operands[2])); + } + else + arm_split_constant (IOR, SImode, NULL_RTX, + INTVAL (operands[2]), operands[0], operands[1], + optimize && can_create_pseudo_p ()); DONE; } else /* TARGET_THUMB1 */ @@ -3052,9 +3081,18 @@ { if (TARGET_32BIT) { - arm_split_constant (XOR, SImode, NULL_RTX, - INTVAL (operands[2]), operands[0], operands[1], - optimize && can_create_pseudo_p ()); + if (!(const_ok_for_arm (INTVAL (operands[2])) + || (INTVAL (operands[2]) < 0 + && const_ok_for_arm (-INTVAL (operands[2])))) + && can_create_pseudo_p ()) + { + operands[2] = force_reg (SImode, operands[2]); + emit_insn (gen_xorsi3 (operands[0], operands[1], operands[2])); + } + else + arm_split_constant (XOR, SImode, NULL_RTX, + INTVAL (operands[2]), operands[0], operands[1], + optimize && can_create_pseudo_p ()); DONE; } else /* TARGET_THUMB1 */ diff --git a/gcc/testsuite/gcc.target/arm/maskdata.c b/gcc/testsuite/gcc.target/arm/maskdata.c new file mode 100644 index 0000000..b4231a4 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/maskdata.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options " -O2 -fno-gcse " } */ +/* { dg-require-effective-target arm_thumb2_ok } */ + +#define MASK 0xfe00ff +void maskdata (int * data, int len) +{ + int i = len; + for (; i > 0; i -= 2) + { + data[i] &= MASK; + data[i + 1] &= MASK; + } +} +/* { dg-final { scan-assembler "254" } } */ +/* { dg-final { scan-assembler "255" } } */ ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH, ARM] Keep constants in register when expanding 2014-08-05 9:31 [PATCH, ARM] Keep constants in register when expanding Zhenqiang Chen @ 2014-08-08 15:22 ` Ramana Radhakrishnan 2014-08-11 2:35 ` Zhenqiang Chen 2014-08-12 9:24 ` Richard Earnshaw 1 sibling, 1 reply; 7+ messages in thread From: Ramana Radhakrishnan @ 2014-08-08 15:22 UTC (permalink / raw) To: Zhenqiang Chen; +Cc: gcc-patches On Tue, Aug 5, 2014 at 10:31 AM, Zhenqiang Chen <zhenqiang.chen@linaro.org> wrote: > Hi, > > For some large constants, ARM will split them during expanding, which > makes impossible to hoist them out the loop or shared by different > references (refer the test case in the patch). > > The patch keeps some constants in registers. If the constant can not > be optimized, the cprop and combine passes can optimize them as what > we do in current expand pass with > > define_insn_and_split "*arm_subsi3_insn" > define_insn_and_split "*arm_andsi3_insn" > define_insn_and_split "*iorsi3_insn" > define_insn_and_split "*arm_xorsi3" > > The patch does not modify addsi3 since the define_insn_and_split > "*arm_addsi3" is only valid when (reload_completed || > !arm_eliminable_register (operands[1])). The cprop and combine passes > can not optimize the large constant if we put it in register, which > will lead to regression. > > For logic operators, the patch skips changes for constants: > > INTVAL (operands[2]) < 0 && const_ok_for_arm (-INTVAL (operands[2]) > > since expand pass always uses "sign-extend" to get the value > (trunc_int_for_mode called from immed_wide_int_const) for rtl, and > logs show most negative values are UNSIGNED when they are TREE node. > And combine pass is smart enough to recover the negative value to > positive value. I am unable to verify any change in code generation for this testcase with and without the patch when I had a play with the patch. what gives ? Ramana > > Bootstrap and no make check regression on Chrome book. > For coremark, dhrystone and eembcv1, no any code size and performance > change on Cortex-M4. > No any file in CSiBE has code size change for Cortex-A15 and Cortex-M4. > No Spec2000 performance regression on Chrome book and dumped assemble > codes only show very few difference. > > OK for trunk? > > Thanks! > -Zhenqiang > > ChangeLog: > 2014-08-05 Zhenqiang Chen <zhenqiang.chen@linaro.org> > > * config/arm/arm.md (subsi3, andsi3, iorsi3, xorsi3): Keep some > large constants in register other than split them. > > testsuite/ChangeLog: > 2014-08-05 Zhenqiang Chen <zhenqiang.chen@linaro.org> > > * gcc.target/arm/maskdata.c: New test. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH, ARM] Keep constants in register when expanding 2014-08-08 15:22 ` Ramana Radhakrishnan @ 2014-08-11 2:35 ` Zhenqiang Chen 2014-08-11 11:15 ` Ramana Radhakrishnan 0 siblings, 1 reply; 7+ messages in thread From: Zhenqiang Chen @ 2014-08-11 2:35 UTC (permalink / raw) To: Ramana Radhakrishnan; +Cc: gcc-patches [-- Attachment #1: Type: text/plain, Size: 2991 bytes --] On 8 August 2014 23:22, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote: > On Tue, Aug 5, 2014 at 10:31 AM, Zhenqiang Chen > <zhenqiang.chen@linaro.org> wrote: >> Hi, >> >> For some large constants, ARM will split them during expanding, which >> makes impossible to hoist them out the loop or shared by different >> references (refer the test case in the patch). >> >> The patch keeps some constants in registers. If the constant can not >> be optimized, the cprop and combine passes can optimize them as what >> we do in current expand pass with >> >> define_insn_and_split "*arm_subsi3_insn" >> define_insn_and_split "*arm_andsi3_insn" >> define_insn_and_split "*iorsi3_insn" >> define_insn_and_split "*arm_xorsi3" >> >> The patch does not modify addsi3 since the define_insn_and_split >> "*arm_addsi3" is only valid when (reload_completed || >> !arm_eliminable_register (operands[1])). The cprop and combine passes >> can not optimize the large constant if we put it in register, which >> will lead to regression. >> >> For logic operators, the patch skips changes for constants: >> >> INTVAL (operands[2]) < 0 && const_ok_for_arm (-INTVAL (operands[2]) >> >> since expand pass always uses "sign-extend" to get the value >> (trunc_int_for_mode called from immed_wide_int_const) for rtl, and >> logs show most negative values are UNSIGNED when they are TREE node. >> And combine pass is smart enough to recover the negative value to >> positive value. > > I am unable to verify any change in code generation for this testcase > with and without the patch when I had a play with the patch. > > what gives ? Thanks for trying the patch. Do you add option -fno-gcse which is mentioned in dg-options " -O2 -fno-gcse "? Without it, there is no change for the testcase since cprop pass will propagate the constant to AND expr (A patch to enhance cprop pass was discussed at https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01321.html). In addition, if the constant can not be hoisted out the loop, later combine pass can also optimize it. These (cprop and combine) are reasons why the patch itself has little impact on current tests. Test case in the patch is updated since it does not work for armv7-m. Thanks! -Zhenqiang > Ramana > > >> >> Bootstrap and no make check regression on Chrome book. >> For coremark, dhrystone and eembcv1, no any code size and performance >> change on Cortex-M4. >> No any file in CSiBE has code size change for Cortex-A15 and Cortex-M4. >> No Spec2000 performance regression on Chrome book and dumped assemble >> codes only show very few difference. >> >> OK for trunk? >> >> Thanks! >> -Zhenqiang >> >> ChangeLog: >> 2014-08-05 Zhenqiang Chen <zhenqiang.chen@linaro.org> >> >> * config/arm/arm.md (subsi3, andsi3, iorsi3, xorsi3): Keep some >> large constants in register other than split them. >> >> testsuite/ChangeLog: >> 2014-08-05 Zhenqiang Chen <zhenqiang.chen@linaro.org> >> >> * gcc.target/arm/maskdata.c: New test. [-- Attachment #2: skip-split-v2.patch --] [-- Type: text/x-patch, Size: 3665 bytes --] diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index bd8ea8f..c8b3001 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -1162,9 +1162,16 @@ { if (TARGET_32BIT) { - arm_split_constant (MINUS, SImode, NULL_RTX, - INTVAL (operands[1]), operands[0], - operands[2], optimize && can_create_pseudo_p ()); + if (!const_ok_for_arm (INTVAL (operands[1])) + && can_create_pseudo_p ()) + { + operands[1] = force_reg (SImode, operands[1]); + emit_insn (gen_subsi3 (operands[0], operands[1], operands[2])); + } + else + arm_split_constant (MINUS, SImode, NULL_RTX, + INTVAL (operands[1]), operands[0], operands[2], + optimize && can_create_pseudo_p ()); DONE; } else /* TARGET_THUMB1 */ @@ -2077,6 +2084,17 @@ emit_insn (gen_thumb2_zero_extendqisi2_v6 (operands[0], operands[1])); } + else if (!(const_ok_for_arm (INTVAL (operands[2])) + || const_ok_for_arm (~INTVAL (operands[2])) + /* zero_extendhi instruction is efficient enough. */ + || INTVAL (operands[2]) == 0xffff + || (INTVAL (operands[2]) < 0 + && const_ok_for_arm (-INTVAL (operands[2])))) + && can_create_pseudo_p ()) + { + operands[2] = force_reg (SImode, operands[2]); + emit_insn (gen_andsi3 (operands[0], operands[1], operands[2])); + } else arm_split_constant (AND, SImode, NULL_RTX, INTVAL (operands[2]), operands[0], @@ -2882,9 +2900,20 @@ { if (TARGET_32BIT) { - arm_split_constant (IOR, SImode, NULL_RTX, - INTVAL (operands[2]), operands[0], operands[1], - optimize && can_create_pseudo_p ()); + if (!(const_ok_for_arm (INTVAL (operands[2])) + || (TARGET_THUMB2 + && const_ok_for_arm (~INTVAL (operands[2]))) + || (INTVAL (operands[2]) < 0 + && const_ok_for_arm (-INTVAL (operands[2])))) + && can_create_pseudo_p ()) + { + operands[2] = force_reg (SImode, operands[2]); + emit_insn (gen_iorsi3 (operands[0], operands[1], operands[2])); + } + else + arm_split_constant (IOR, SImode, NULL_RTX, + INTVAL (operands[2]), operands[0], operands[1], + optimize && can_create_pseudo_p ()); DONE; } else /* TARGET_THUMB1 */ @@ -3052,9 +3081,18 @@ { if (TARGET_32BIT) { - arm_split_constant (XOR, SImode, NULL_RTX, - INTVAL (operands[2]), operands[0], operands[1], - optimize && can_create_pseudo_p ()); + if (!(const_ok_for_arm (INTVAL (operands[2])) + || (INTVAL (operands[2]) < 0 + && const_ok_for_arm (-INTVAL (operands[2])))) + && can_create_pseudo_p ()) + { + operands[2] = force_reg (SImode, operands[2]); + emit_insn (gen_xorsi3 (operands[0], operands[1], operands[2])); + } + else + arm_split_constant (XOR, SImode, NULL_RTX, + INTVAL (operands[2]), operands[0], operands[1], + optimize && can_create_pseudo_p ()); DONE; } else /* TARGET_THUMB1 */ diff --git a/gcc/testsuite/gcc.target/arm/maskdata.c b/gcc/testsuite/gcc.target/arm/maskdata.c new file mode 100644 index 0000000..6960c75 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/maskdata.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options " -O2 -fno-gcse " } */ +/* { dg-require-effective-target arm_thumb2_ok } */ + +#define MASK 0xfe00ff +void maskdata (int * data, int len) +{ + int i = len; + for (; i > 0; i -= 2) + { + data[i] &= MASK; + data[i + 1] &= MASK; + } +} +/* { dg-final { scan-assembler-not "65536" } } */ ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH, ARM] Keep constants in register when expanding 2014-08-11 2:35 ` Zhenqiang Chen @ 2014-08-11 11:15 ` Ramana Radhakrishnan 2014-08-12 5:37 ` Zhenqiang Chen 0 siblings, 1 reply; 7+ messages in thread From: Ramana Radhakrishnan @ 2014-08-11 11:15 UTC (permalink / raw) To: Zhenqiang Chen; +Cc: gcc-patches On Mon, Aug 11, 2014 at 3:35 AM, Zhenqiang Chen <zhenqiang.chen@linaro.org> wrote: > On 8 August 2014 23:22, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote: >> On Tue, Aug 5, 2014 at 10:31 AM, Zhenqiang Chen >> <zhenqiang.chen@linaro.org> wrote: >>> Hi, >>> >>> For some large constants, ARM will split them during expanding, which >>> makes impossible to hoist them out the loop or shared by different >>> references (refer the test case in the patch). >>> >>> The patch keeps some constants in registers. If the constant can not >>> be optimized, the cprop and combine passes can optimize them as what >>> we do in current expand pass with >>> >>> define_insn_and_split "*arm_subsi3_insn" >>> define_insn_and_split "*arm_andsi3_insn" >>> define_insn_and_split "*iorsi3_insn" >>> define_insn_and_split "*arm_xorsi3" >>> >>> The patch does not modify addsi3 since the define_insn_and_split >>> "*arm_addsi3" is only valid when (reload_completed || >>> !arm_eliminable_register (operands[1])). The cprop and combine passes >>> can not optimize the large constant if we put it in register, which >>> will lead to regression. >>> >>> For logic operators, the patch skips changes for constants: >>> >>> INTVAL (operands[2]) < 0 && const_ok_for_arm (-INTVAL (operands[2]) >>> >>> since expand pass always uses "sign-extend" to get the value >>> (trunc_int_for_mode called from immed_wide_int_const) for rtl, and >>> logs show most negative values are UNSIGNED when they are TREE node. >>> And combine pass is smart enough to recover the negative value to >>> positive value. >> >> I am unable to verify any change in code generation for this testcase >> with and without the patch when I had a play with the patch. >> >> what gives ? > > Thanks for trying the patch. > > Do you add option -fno-gcse which is mentioned in dg-options " -O2 > -fno-gcse "? Without it, there is no change for the testcase since > cprop pass will propagate the constant to AND expr (A patch to enhance > cprop pass was discussed at > https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01321.html). Probably not and I can now see the difference in code generated for Thumb state. Why is it that in ARM state with -mcpu=cortex-a15 we see the hoisting of the constant without your patch with -fno-gcse ? So, the patch improves code generation for -mcpu=cortex-a15 -mthumb -fno-gcse for the given testcase ? > > In addition, if the constant can not be hoisted out the loop, later > combine pass can also optimize it. These (cprop and combine) are > reasons why the patch itself has little impact on current tests. Does this mean you need the referred to patch to be useful as a pre-requisite ? I fail to understand why this patch needs to go in if it makes no difference without disabling GCSE. I cannot see -fno-gcse being used by default for performant code. regards Ramana ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH, ARM] Keep constants in register when expanding 2014-08-11 11:15 ` Ramana Radhakrishnan @ 2014-08-12 5:37 ` Zhenqiang Chen 0 siblings, 0 replies; 7+ messages in thread From: Zhenqiang Chen @ 2014-08-12 5:37 UTC (permalink / raw) To: Ramana Radhakrishnan; +Cc: gcc-patches On 11 August 2014 19:14, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote: > On Mon, Aug 11, 2014 at 3:35 AM, Zhenqiang Chen > <zhenqiang.chen@linaro.org> wrote: >> On 8 August 2014 23:22, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote: >>> On Tue, Aug 5, 2014 at 10:31 AM, Zhenqiang Chen >>> <zhenqiang.chen@linaro.org> wrote: >>>> Hi, >>>> >>>> For some large constants, ARM will split them during expanding, which >>>> makes impossible to hoist them out the loop or shared by different >>>> references (refer the test case in the patch). >>>> >>>> The patch keeps some constants in registers. If the constant can not >>>> be optimized, the cprop and combine passes can optimize them as what >>>> we do in current expand pass with >>>> >>>> define_insn_and_split "*arm_subsi3_insn" >>>> define_insn_and_split "*arm_andsi3_insn" >>>> define_insn_and_split "*iorsi3_insn" >>>> define_insn_and_split "*arm_xorsi3" >>>> >>>> The patch does not modify addsi3 since the define_insn_and_split >>>> "*arm_addsi3" is only valid when (reload_completed || >>>> !arm_eliminable_register (operands[1])). The cprop and combine passes >>>> can not optimize the large constant if we put it in register, which >>>> will lead to regression. >>>> >>>> For logic operators, the patch skips changes for constants: >>>> >>>> INTVAL (operands[2]) < 0 && const_ok_for_arm (-INTVAL (operands[2]) >>>> >>>> since expand pass always uses "sign-extend" to get the value >>>> (trunc_int_for_mode called from immed_wide_int_const) for rtl, and >>>> logs show most negative values are UNSIGNED when they are TREE node. >>>> And combine pass is smart enough to recover the negative value to >>>> positive value. >>> >>> I am unable to verify any change in code generation for this testcase >>> with and without the patch when I had a play with the patch. >>> >>> what gives ? >> >> Thanks for trying the patch. >> >> Do you add option -fno-gcse which is mentioned in dg-options " -O2 >> -fno-gcse "? Without it, there is no change for the testcase since >> cprop pass will propagate the constant to AND expr (A patch to enhance >> cprop pass was discussed at >> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01321.html). > > Probably not and I can now see the difference in code generated for > Thumb state. Why is it that in ARM state with -mcpu=cortex-a15 we see > the hoisting of the constant without your patch with -fno-gcse ? The difference between ARM and THUMB2 modes are due to rtx_cost difference. For ARM mode, the constant is force_reg in function avoid_expensive_constant (obtabs.c) before gen_andsi3 when expanding. > So, the patch improves code generation for -mcpu=cortex-a15 -mthumb > -fno-gcse for the given testcase ? Yes. >> >> In addition, if the constant can not be hoisted out the loop, later >> combine pass can also optimize it. These (cprop and combine) are >> reasons why the patch itself has little impact on current tests. > > Does this mean you need the referred to patch to be useful as a > pre-requisite ? I fail to understand why this patch needs to go in if > it makes no difference without disabling GCSE. I cannot see -fno-gcse > being used by default for performant code. For some codes, -fno-gcse might get better performance. Please refer paper: A case study: optimizing GCC on ARM for performance of libevas rasterization library http://ctuning.org/dissemination/grow10-03.pdf The issues mentioned in the paper had been solved since arm_split_constant is smart enough to handle the 0xff00ff. But what for other irregular constant? The patch gives a chance to handle them. Thanks! -Zhenqiang > regards > Ramana ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH, ARM] Keep constants in register when expanding 2014-08-05 9:31 [PATCH, ARM] Keep constants in register when expanding Zhenqiang Chen 2014-08-08 15:22 ` Ramana Radhakrishnan @ 2014-08-12 9:24 ` Richard Earnshaw 2014-08-13 7:46 ` Zhenqiang Chen 1 sibling, 1 reply; 7+ messages in thread From: Richard Earnshaw @ 2014-08-12 9:24 UTC (permalink / raw) To: Zhenqiang Chen; +Cc: gcc-patches, Ramana radhakrishnan On 05/08/14 10:31, Zhenqiang Chen wrote: > Hi, > > For some large constants, ARM will split them during expanding, which > makes impossible to hoist them out the loop or shared by different > references (refer the test case in the patch). > > The patch keeps some constants in registers. If the constant can not > be optimized, the cprop and combine passes can optimize them as what > we do in current expand pass with > > define_insn_and_split "*arm_subsi3_insn" > define_insn_and_split "*arm_andsi3_insn" > define_insn_and_split "*iorsi3_insn" > define_insn_and_split "*arm_xorsi3" > > The patch does not modify addsi3 since the define_insn_and_split > "*arm_addsi3" is only valid when (reload_completed || > !arm_eliminable_register (operands[1])). The cprop and combine passes > can not optimize the large constant if we put it in register, which > will lead to regression. > > For logic operators, the patch skips changes for constants: > > INTVAL (operands[2]) < 0 && const_ok_for_arm (-INTVAL (operands[2]) > > since expand pass always uses "sign-extend" to get the value > (trunc_int_for_mode called from immed_wide_int_const) for rtl, and > logs show most negative values are UNSIGNED when they are TREE node. > And combine pass is smart enough to recover the negative value to > positive value. > > Bootstrap and no make check regression on Chrome book. > For coremark, dhrystone and eembcv1, no any code size and performance > change on Cortex-M4. > No any file in CSiBE has code size change for Cortex-A15 and Cortex-M4. > No Spec2000 performance regression on Chrome book and dumped assemble > codes only show very few difference. > > OK for trunk? > Not yet. I think the principle is sound, but there are some issues that need sorting out. 1) We shouldn't do this when not optimizing; there's no advantage to hoisting out the constants in that case. Indeed, it may not be worth-while at -O1 either. 2) I think you should be using const_ok_for_op rather than working through all the permitted values. If that function isn't doing the right thing, then it probably needs extending to handle those cases as well. 3) Why haven't you handled addsi3? See below for some specific comments. > Thanks! > -Zhenqiang > > ChangeLog: > 2014-08-05 Zhenqiang Chen <zhenqiang.chen@linaro.org> > > * config/arm/arm.md (subsi3, andsi3, iorsi3, xorsi3): Keep some > large constants in register other than split them. > > testsuite/ChangeLog: > 2014-08-05 Zhenqiang Chen <zhenqiang.chen@linaro.org> > > * gcc.target/arm/maskdata.c: New test. > > > skip-split.patch > > > diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md > index bd8ea8f..c8b3001 100644 > --- a/gcc/config/arm/arm.md > +++ b/gcc/config/arm/arm.md > @@ -1162,9 +1162,16 @@ > { > if (TARGET_32BIT) > { > - arm_split_constant (MINUS, SImode, NULL_RTX, > - INTVAL (operands[1]), operands[0], > - operands[2], optimize && can_create_pseudo_p ()); > + if (!const_ok_for_arm (INTVAL (operands[1])) > + && can_create_pseudo_p ()) > + { > + operands[1] = force_reg (SImode, operands[1]); > + emit_insn (gen_subsi3 (operands[0], operands[1], operands[2])); The final emit_insn shouldn't be be needed. Once you've forced operands[1] into a register, you can just fall out the bottom of the pattern and let the default expansion take over. But... > + } > + else > + arm_split_constant (MINUS, SImode, NULL_RTX, > + INTVAL (operands[1]), operands[0], operands[2], > + optimize && can_create_pseudo_p ()); > DONE; ... You'll need to move the DONE inside the else clause. > } > else /* TARGET_THUMB1 */ > @@ -2077,6 +2084,17 @@ > emit_insn (gen_thumb2_zero_extendqisi2_v6 (operands[0], > operands[1])); > } > + else if (!(const_ok_for_arm (INTVAL (operands[2])) > + || const_ok_for_arm (~INTVAL (operands[2])) > + /* zero_extendhi instruction is efficient enough. */ > + || INTVAL (operands[2]) == 0xffff > + || (INTVAL (operands[2]) < 0 > + && const_ok_for_arm (-INTVAL (operands[2])))) > + && can_create_pseudo_p ()) > + { > + operands[2] = force_reg (SImode, operands[2]); > + emit_insn (gen_andsi3 (operands[0], operands[1], operands[2])); Same here. > + } > else > arm_split_constant (AND, SImode, NULL_RTX, > INTVAL (operands[2]), operands[0], > @@ -2882,9 +2900,20 @@ > { > if (TARGET_32BIT) > { > - arm_split_constant (IOR, SImode, NULL_RTX, > - INTVAL (operands[2]), operands[0], operands[1], > - optimize && can_create_pseudo_p ()); > + if (!(const_ok_for_arm (INTVAL (operands[2])) > + || (TARGET_THUMB2 > + && const_ok_for_arm (~INTVAL (operands[2]))) > + || (INTVAL (operands[2]) < 0 > + && const_ok_for_arm (-INTVAL (operands[2])))) > + && can_create_pseudo_p ()) > + { > + operands[2] = force_reg (SImode, operands[2]); > + emit_insn (gen_iorsi3 (operands[0], operands[1], operands[2])); And here. > + } > + else > + arm_split_constant (IOR, SImode, NULL_RTX, > + INTVAL (operands[2]), operands[0], operands[1], > + optimize && can_create_pseudo_p ()); > DONE; > } > else /* TARGET_THUMB1 */ > @@ -3052,9 +3081,18 @@ > { > if (TARGET_32BIT) > { > - arm_split_constant (XOR, SImode, NULL_RTX, > - INTVAL (operands[2]), operands[0], operands[1], > - optimize && can_create_pseudo_p ()); > + if (!(const_ok_for_arm (INTVAL (operands[2])) > + || (INTVAL (operands[2]) < 0 > + && const_ok_for_arm (-INTVAL (operands[2])))) > + && can_create_pseudo_p ()) > + { > + operands[2] = force_reg (SImode, operands[2]); > + emit_insn (gen_xorsi3 (operands[0], operands[1], operands[2])); And here. > + } > + else > + arm_split_constant (XOR, SImode, NULL_RTX, > + INTVAL (operands[2]), operands[0], operands[1], > + optimize && can_create_pseudo_p ()); > DONE; > } > else /* TARGET_THUMB1 */ > > diff --git a/gcc/testsuite/gcc.target/arm/maskdata.c b/gcc/testsuite/gcc.target/arm/maskdata.c > new file mode 100644 > index 0000000..b4231a4 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/maskdata.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-options " -O2 -fno-gcse " } */ > +/* { dg-require-effective-target arm_thumb2_ok } */ > + > +#define MASK 0xfe00ff > +void maskdata (int * data, int len) > +{ > + int i = len; > + for (; i > 0; i -= 2) > + { > + data[i] &= MASK; > + data[i + 1] &= MASK; > + } > +} > +/* { dg-final { scan-assembler "254" } } */ > +/* { dg-final { scan-assembler "255" } } */ > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH, ARM] Keep constants in register when expanding 2014-08-12 9:24 ` Richard Earnshaw @ 2014-08-13 7:46 ` Zhenqiang Chen 0 siblings, 0 replies; 7+ messages in thread From: Zhenqiang Chen @ 2014-08-13 7:46 UTC (permalink / raw) To: Richard Earnshaw; +Cc: gcc-patches, Ramana radhakrishnan [-- Attachment #1: Type: text/plain, Size: 8476 bytes --] On 12 August 2014 17:24, Richard Earnshaw <rearnsha@arm.com> wrote: > On 05/08/14 10:31, Zhenqiang Chen wrote: >> Hi, >> >> For some large constants, ARM will split them during expanding, which >> makes impossible to hoist them out the loop or shared by different >> references (refer the test case in the patch). >> >> The patch keeps some constants in registers. If the constant can not >> be optimized, the cprop and combine passes can optimize them as what >> we do in current expand pass with >> >> define_insn_and_split "*arm_subsi3_insn" >> define_insn_and_split "*arm_andsi3_insn" >> define_insn_and_split "*iorsi3_insn" >> define_insn_and_split "*arm_xorsi3" >> >> The patch does not modify addsi3 since the define_insn_and_split >> "*arm_addsi3" is only valid when (reload_completed || >> !arm_eliminable_register (operands[1])). The cprop and combine passes >> can not optimize the large constant if we put it in register, which >> will lead to regression. >> >> For logic operators, the patch skips changes for constants: >> >> INTVAL (operands[2]) < 0 && const_ok_for_arm (-INTVAL (operands[2]) >> >> since expand pass always uses "sign-extend" to get the value >> (trunc_int_for_mode called from immed_wide_int_const) for rtl, and >> logs show most negative values are UNSIGNED when they are TREE node. >> And combine pass is smart enough to recover the negative value to >> positive value. >> >> Bootstrap and no make check regression on Chrome book. >> For coremark, dhrystone and eembcv1, no any code size and performance >> change on Cortex-M4. >> No any file in CSiBE has code size change for Cortex-A15 and Cortex-M4. >> No Spec2000 performance regression on Chrome book and dumped assemble >> codes only show very few difference. >> >> OK for trunk? >> > > Not yet. > > I think the principle is sound, but there are some issues that need > sorting out. > > 1) We shouldn't do this when not optimizing; there's no advantage to > hoisting out the constants in that case. Indeed, it may not be > worth-while at -O1 either. Updated. Only apply it when optimize >= 2 > 2) I think you should be using const_ok_for_op rather than working > through all the permitted values. If that function isn't doing the > right thing, then it probably needs extending to handle those cases as well. There are some difference. I want to filter cases for which arm_split_constant will get better result. Patch is updated to add a function const_ok_for_split, in which (i < 0) && const_ok_for_arm (-i) and 0xffff can not fit into const_ok_for_op. > 3) Why haven't you handled addsi3? As I had mentioned in the original mail, since the define_insn_and_split "*arm_addsi3" is only valid when (reload_completed || !arm_eliminable_register (operands[1])). The cprop and combine passes can not optimize the large constant if we put it in register (and it is not hoisted out of the loop), which will lead to regression. Any reason for *arm_addsi3 with additional condition (reload_completed || !arm_eliminable_register (operands[1]))? Without it, addsi3 can be handled as subsi3. > See below for some specific comments. All are updated. Thanks! -Zhenqiang >> >> ChangeLog: >> 2014-08-05 Zhenqiang Chen <zhenqiang.chen@linaro.org> >> >> * config/arm/arm.md (subsi3, andsi3, iorsi3, xorsi3): Keep some >> large constants in register other than split them. >> >> testsuite/ChangeLog: >> 2014-08-05 Zhenqiang Chen <zhenqiang.chen@linaro.org> >> >> * gcc.target/arm/maskdata.c: New test. >> >> >> skip-split.patch >> >> >> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md >> index bd8ea8f..c8b3001 100644 >> --- a/gcc/config/arm/arm.md >> +++ b/gcc/config/arm/arm.md >> @@ -1162,9 +1162,16 @@ >> { >> if (TARGET_32BIT) >> { >> - arm_split_constant (MINUS, SImode, NULL_RTX, >> - INTVAL (operands[1]), operands[0], >> - operands[2], optimize && can_create_pseudo_p ()); >> + if (!const_ok_for_arm (INTVAL (operands[1])) >> + && can_create_pseudo_p ()) >> + { >> + operands[1] = force_reg (SImode, operands[1]); >> + emit_insn (gen_subsi3 (operands[0], operands[1], operands[2])); > > The final emit_insn shouldn't be be needed. Once you've forced > operands[1] into a register, you can just fall out the bottom of the > pattern and let the default expansion take over. But... >> + } >> + else >> + arm_split_constant (MINUS, SImode, NULL_RTX, >> + INTVAL (operands[1]), operands[0], operands[2], >> + optimize && can_create_pseudo_p ()); >> DONE; > > ... You'll need to move the DONE inside the else clause. > >> } >> else /* TARGET_THUMB1 */ >> @@ -2077,6 +2084,17 @@ >> emit_insn (gen_thumb2_zero_extendqisi2_v6 (operands[0], >> operands[1])); >> } >> + else if (!(const_ok_for_arm (INTVAL (operands[2])) >> + || const_ok_for_arm (~INTVAL (operands[2])) >> + /* zero_extendhi instruction is efficient enough. */ >> + || INTVAL (operands[2]) == 0xffff >> + || (INTVAL (operands[2]) < 0 >> + && const_ok_for_arm (-INTVAL (operands[2])))) >> + && can_create_pseudo_p ()) >> + { >> + operands[2] = force_reg (SImode, operands[2]); >> + emit_insn (gen_andsi3 (operands[0], operands[1], operands[2])); > > Same here. > >> + } >> else >> arm_split_constant (AND, SImode, NULL_RTX, >> INTVAL (operands[2]), operands[0], >> @@ -2882,9 +2900,20 @@ >> { >> if (TARGET_32BIT) >> { >> - arm_split_constant (IOR, SImode, NULL_RTX, >> - INTVAL (operands[2]), operands[0], operands[1], >> - optimize && can_create_pseudo_p ()); >> + if (!(const_ok_for_arm (INTVAL (operands[2])) >> + || (TARGET_THUMB2 >> + && const_ok_for_arm (~INTVAL (operands[2]))) >> + || (INTVAL (operands[2]) < 0 >> + && const_ok_for_arm (-INTVAL (operands[2])))) >> + && can_create_pseudo_p ()) >> + { >> + operands[2] = force_reg (SImode, operands[2]); >> + emit_insn (gen_iorsi3 (operands[0], operands[1], operands[2])); > > And here. > >> + } >> + else >> + arm_split_constant (IOR, SImode, NULL_RTX, >> + INTVAL (operands[2]), operands[0], operands[1], >> + optimize && can_create_pseudo_p ()); >> DONE; >> } >> else /* TARGET_THUMB1 */ >> @@ -3052,9 +3081,18 @@ >> { >> if (TARGET_32BIT) >> { >> - arm_split_constant (XOR, SImode, NULL_RTX, >> - INTVAL (operands[2]), operands[0], operands[1], >> - optimize && can_create_pseudo_p ()); >> + if (!(const_ok_for_arm (INTVAL (operands[2])) >> + || (INTVAL (operands[2]) < 0 >> + && const_ok_for_arm (-INTVAL (operands[2])))) >> + && can_create_pseudo_p ()) >> + { >> + operands[2] = force_reg (SImode, operands[2]); >> + emit_insn (gen_xorsi3 (operands[0], operands[1], operands[2])); > > And here. > >> + } >> + else >> + arm_split_constant (XOR, SImode, NULL_RTX, >> + INTVAL (operands[2]), operands[0], operands[1], >> + optimize && can_create_pseudo_p ()); >> DONE; >> } >> else /* TARGET_THUMB1 */ >> >> diff --git a/gcc/testsuite/gcc.target/arm/maskdata.c b/gcc/testsuite/gcc.target/arm/maskdata.c >> new file mode 100644 >> index 0000000..b4231a4 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/arm/maskdata.c >> @@ -0,0 +1,16 @@ >> +/* { dg-do compile } */ >> +/* { dg-options " -O2 -fno-gcse " } */ >> +/* { dg-require-effective-target arm_thumb2_ok } */ >> + >> +#define MASK 0xfe00ff >> +void maskdata (int * data, int len) >> +{ >> + int i = len; >> + for (; i > 0; i -= 2) >> + { >> + data[i] &= MASK; >> + data[i + 1] &= MASK; >> + } >> +} >> +/* { dg-final { scan-assembler "254" } } */ >> +/* { dg-final { scan-assembler "255" } } */ >> > > [-- Attachment #2: skip-split-v3.patch --] [-- Type: text/x-patch, Size: 5388 bytes --] diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index be5e72a..f913035 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -51,6 +51,7 @@ extern bool arm_modes_tieable_p (enum machine_mode, enum machine_mode); extern int const_ok_for_arm (HOST_WIDE_INT); extern int const_ok_for_op (HOST_WIDE_INT, enum rtx_code); extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code); +extern int const_ok_for_split (HOST_WIDE_INT, enum rtx_code); extern int arm_split_constant (RTX_CODE, enum machine_mode, rtx, HOST_WIDE_INT, rtx, rtx, int); extern int legitimate_pic_operand_p (rtx); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 7f62ca4..78a624d 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3584,6 +3584,38 @@ const_ok_for_dimode_op (HOST_WIDE_INT i, enum rtx_code code) } } +/* Return true if I is a valid constant for split with the operation CODE. + The condition should align with the constrain of the corresponding + define_insn_and_split pattern to make sure later pass can optimize + the constants. */ +int +const_ok_for_split (HOST_WIDE_INT i, enum rtx_code code) +{ + if (optimize < 2 + || !can_create_pseudo_p () + || const_ok_for_arm (i) + /* Since expand pass always uses "sign-extend" to get the value + (trunc_int_for_mode called from immed_wide_int_const) for rtl, + and logs show most negative values are UNSIGNED when they are + TREE node. And combine pass is smart enough to recover the + negative value to positive value. */ + || ((i < 0) && const_ok_for_arm (-i))) + return 1; + + switch (code) + { + case AND: + /* zero_extendhi instruction is efficient. */ + return const_ok_for_arm (~i) || (i == 0xffff); + + case IOR: + return TARGET_THUMB2 && const_ok_for_arm (~i); + + default: + return 1; + } +} + /* Emit a sequence of insns to handle a large constant. CODE is the code of the operation required, it can be any of SET, PLUS, IOR, AND, XOR, MINUS; diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index cd9ab6c..f3ca849 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -1162,10 +1162,16 @@ { if (TARGET_32BIT) { - arm_split_constant (MINUS, SImode, NULL_RTX, - INTVAL (operands[1]), operands[0], - operands[2], optimize && can_create_pseudo_p ()); - DONE; + if (!const_ok_for_split (INTVAL (operands[1]), MINUS)) + operands[1] = force_reg (SImode, operands[1]); + else + { + arm_split_constant (MINUS, SImode, NULL_RTX, + INTVAL (operands[1]), operands[0], + operands[2], + optimize && can_create_pseudo_p ()); + DONE; + } } else /* TARGET_THUMB1 */ operands[1] = force_reg (SImode, operands[1]); @@ -2076,14 +2082,19 @@ operands[1] = convert_to_mode (QImode, operands[1], 1); emit_insn (gen_thumb2_zero_extendqisi2_v6 (operands[0], operands[1])); + DONE; } + else if (!const_ok_for_split (INTVAL (operands[2]), AND)) + operands[2] = force_reg (SImode, operands[2]); else - arm_split_constant (AND, SImode, NULL_RTX, - INTVAL (operands[2]), operands[0], - operands[1], - optimize && can_create_pseudo_p ()); + { + arm_split_constant (AND, SImode, NULL_RTX, + INTVAL (operands[2]), operands[0], + operands[1], + optimize && can_create_pseudo_p ()); - DONE; + DONE; + } } } else /* TARGET_THUMB1 */ @@ -2882,10 +2893,16 @@ { if (TARGET_32BIT) { - arm_split_constant (IOR, SImode, NULL_RTX, - INTVAL (operands[2]), operands[0], operands[1], - optimize && can_create_pseudo_p ()); - DONE; + if (!const_ok_for_split (INTVAL (operands[2]), IOR)) + operands[2] = force_reg (SImode, operands[2]); + else + { + arm_split_constant (IOR, SImode, NULL_RTX, + INTVAL (operands[2]), operands[0], + operands[1], + optimize && can_create_pseudo_p ()); + DONE; + } } else /* TARGET_THUMB1 */ { @@ -3052,10 +3069,16 @@ { if (TARGET_32BIT) { - arm_split_constant (XOR, SImode, NULL_RTX, - INTVAL (operands[2]), operands[0], operands[1], - optimize && can_create_pseudo_p ()); - DONE; + if (!const_ok_for_split (INTVAL (operands[2]), XOR)) + operands[2] = force_reg (SImode, operands[2]); + else + { + arm_split_constant (XOR, SImode, NULL_RTX, + INTVAL (operands[2]), operands[0], + operands[1], + optimize && can_create_pseudo_p ()); + DONE; + } } else /* TARGET_THUMB1 */ { diff --git a/gcc/testsuite/gcc.target/arm/maskdata.c b/gcc/testsuite/gcc.target/arm/maskdata.c new file mode 100644 index 0000000..6960c75 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/maskdata.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options " -O2 -fno-gcse " } */ +/* { dg-require-effective-target arm_thumb2_ok } */ + +#define MASK 0xfe00ff +void maskdata (int * data, int len) +{ + int i = len; + for (; i > 0; i -= 2) + { + data[i] &= MASK; + data[i + 1] &= MASK; + } +} +/* { dg-final { scan-assembler-not "65536" } } */ ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-08-13 7:46 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-08-05 9:31 [PATCH, ARM] Keep constants in register when expanding Zhenqiang Chen 2014-08-08 15:22 ` Ramana Radhakrishnan 2014-08-11 2:35 ` Zhenqiang Chen 2014-08-11 11:15 ` Ramana Radhakrishnan 2014-08-12 5:37 ` Zhenqiang Chen 2014-08-12 9:24 ` Richard Earnshaw 2014-08-13 7:46 ` Zhenqiang Chen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).