public inbox for gcc-patches@gcc.gnu.org
* Re: RFC: patch to build GCC for arm with LRA
@ 2013-08-30 13:17 Yvan Roux
  2013-08-30 13:23 ` Richard Earnshaw
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Yvan Roux @ 2013-08-30 13:17 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Richard Earnshaw, gcc-patches, rdsandiford, Marcus Shawcroft,
	Ramana Radhakrishnan, Matthew Gretton-Dann

[-- Attachment #1: Type: text/plain, Size: 3706 bytes --]

Hi,

here is a request for comments on the 2 attached patches, which enable
the build of GCC on ARM with LRA.  The patches introduce a new
undocumented option -mlra to use LRA instead of reload, as was done
for previous LRA support; it is here to ease testing and comparison
with reload and is not supposed to be kept in 4.9.

On AArch32:

* The patch includes the previous changes of this thread, adds the
handling of ASHIFTRT, LSHIFTRT, ROTATE and ROTATERT in get_index_scale,
and lets LRA handle the constraints in Thumb.
* The status is that the build completes in ARM mode with a couple of
regressions in the testsuite, which I'll investigate:
  - gcc.c-torture/execute/20060102-1.c
  - gcc.c-torture/execute/961026-1.c
  - gcc.target/arm/atomic-comp-swap-release-acquire.c
and some build failures in libstdc++ at the pass manager level (there
is an invalid read in gt_ggc_mx).
* The build of the libraries in Thumb mode still doesn't complete; as
Vladimir said, the secondary_reload modification solves the LRA cycling,
but we still have some issues.

On AArch64 :

* I modified must_be_index_p to avoid calling set_address_base
on patterns which contain a MULT.
* The build completes, but there are a couple of regressions in the
testsuite that I'm looking at, for instance:
 - gcc.c-torture/execute/ieee/fp-cmp-4l.c
 - c-c++-common/torture/complex-sign-mul-minus-one.c

Any comments?

Thanks
Yvan




On 6 July 2013 01:12, Vladimir Makarov <vmakarov@redhat.com> wrote:
> On 13-07-05 8:43 AM, Yvan Roux wrote:
>>
>> Hi,
>>
>> for AArch64 we also need to take SIGN_EXTRACT into account in the
>> set_address_base and set_address_index routines, as we can encounter
>> that kind of insn, for instance:
>>
>> (insn 29 27 5 7 (set (mem:SI (plus:DI (sign_extract:DI (mult:DI
>> (subreg:DI (reg/v:SI 76 [ elt ]) 0)
>> ...
>
> OK.
>
>> with the attached patch and LRA enabled, the compiler now bootstraps,
>> but I have a few regressions in the testsuite,
>
> It seems ok to me, but I am confused by the following change:
>
>  set_address_base (struct address_info *info, rtx *loc, rtx *inner)
>  {
> -  if (GET_CODE (*inner) == LO_SUM)
> +  if (GET_CODE (*inner) == SIGN_EXTRACT)
> +    inner = strip_address_mutations (&XEXP (*inner, 0));
> +
> +  if (GET_CODE (*inner) == LO_SUM || GET_CODE (*inner) == MULT)
>
>      inner = strip_address_mutations (&XEXP (*inner, 0));
>    gcc_checking_assert (REG_P (*inner)
>                 || MEM_P (*inner)
>
> A base address should not contain a MULT (which you added).  This is
> controlled by the result of must_be_index_p.  So set_address_base should not
> have code for MULT, and you need to change must_be_index_p in such a way
> that set_address_base is not called for MULT.
>
>
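To make that dispatch concrete, here is a toy model in plain C; the
enum and helpers are invented for illustration, and none of this is
GCC's actual code:

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a few rtx codes.  */
enum toy_code { TOY_REG, TOY_MULT, TOY_ASHIFT, TOY_LO_SUM };

/* Mirror of the idea behind must_be_index_p: scaling operations can
   only ever be the index part of an address.  */
static bool
toy_must_be_index_p (enum toy_code c)
{
  return c == TOY_MULT || c == TOY_ASHIFT;
}

int
main (void)
{
  /* The decomposition step routes each term one way or the other, so
     a MULT must never reach the "base" path.  */
  printf ("MULT   -> %s\n", toy_must_be_index_p (TOY_MULT) ? "index" : "base");
  printf ("REG    -> %s\n", toy_must_be_index_p (TOY_REG) ? "index" : "base");
  printf ("LO_SUM -> %s\n", toy_must_be_index_p (TOY_LO_SUM) ? "index" : "base");
  return 0;
}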
>> gcc.c-torture/execute/fp-cmp-4l.c, for instance. I was looking at these
>> issues before submitting a complete AArch64 LRA enabling patch, but
>> as you are speaking about that...
>>
>> Vladimir, for the ARM target I already had the ASHIFTRT and LSHIFTRT
>> additions on my side, but still had an ICE during bootstrap with LRA
>> when compiling fixed-bit.c (the "Max number of generated reload insns"
>> issue we already talked about).  Is it working for you?
>>
>>
> I guess I did not check Thumb.  If you are asking about the problem you
> sent me about 2 weeks ago, I guess one solution would be putting
>
>   if (lra_in_progress)
>     return NO_REGS;
>
> at the beginning of targhooks.c::default_secondary_reload.  LRA is smart
> enough to figure out what to do from the constraints in most cases where
> reload needs a secondary reload.  In the ARM case, default_secondary_reload
> only confuses LRA.
>
> I had no time to test this change, but it solves LRA cycling for the test
> case you sent me.
>
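For reference, here is a sketch of where that early return would sit.
The prototype below is abridged from gcc/targhooks.c and should be
treated as an assumption, not the exact 4.9 signature:

/* gcc/targhooks.c (sketch only).  */
reg_class_t
default_secondary_reload (bool in_p, rtx x, reg_class_t reload_class,
                          enum machine_mode reload_mode,
                          secondary_reload_info *sri)
{
  /* Suggested addition: while LRA is running, let it derive any
     secondary reloads from the insn constraints alone.  */
  if (lra_in_progress)
    return NO_REGS;

  /* ... existing reload-oriented logic continues unchanged ...  */
}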

[-- Attachment #2: aarch32-lra.patch --]
[-- Type: application/octet-stream, Size: 3953 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f731bb6..182b009 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -68,6 +68,7 @@ struct four_ints
 };
 
 /* Forward function declarations.  */
+static bool arm_lra_p (void);
 static bool arm_needs_doubleword_align (enum machine_mode, const_tree);
 static int arm_compute_static_chain_stack_bytes (void);
 static arm_stack_offsets *arm_get_frame_offsets (void);
@@ -338,6 +339,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_LEGITIMIZE_ADDRESS
 #define TARGET_LEGITIMIZE_ADDRESS arm_legitimize_address
 
+#undef TARGET_LRA_P
+#define TARGET_LRA_P arm_lra_p
+
 #undef  TARGET_ATTRIBUTE_TABLE
 #define TARGET_ATTRIBUTE_TABLE arm_attribute_table
 
@@ -4971,6 +4975,12 @@ arm_init_cumulative_args (CUMULATIVE_ARGS *pcum, tree fntype,
     }
 }
 
+/* Return true if we use LRA instead of reload pass.  */
+static bool
+arm_lra_p (void)
+{
+  return arm_lra_flag;
+}
 
 /* Return true if mode/type need doubleword alignment.  */
 static bool
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 1781b75..05a271e 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1266,11 +1266,12 @@ enum reg_class
 
 /* Must leave BASE_REGS reloads alone */
 #define THUMB_SECONDARY_INPUT_RELOAD_CLASS(CLASS, MODE, X)		\
+  (lra_in_progress ? NO_REGS : 						\
   ((CLASS) != LO_REGS && (CLASS) != BASE_REGS				\
    ? ((true_regnum (X) == -1 ? LO_REGS					\
        : (true_regnum (X) + HARD_REGNO_NREGS (0, MODE) > 8) ? LO_REGS	\
        : NO_REGS)) 							\
-   : NO_REGS)
+   : NO_REGS))
 
 #define THUMB_SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, X)		\
   ((CLASS) != LO_REGS && (CLASS) != BASE_REGS				\
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index b9ae2b0..7c9ea36 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -109,6 +109,10 @@ mfloat-abi=
 Target RejectNegative Joined Enum(float_abi_type) Var(arm_float_abi) Init(TARGET_DEFAULT_FLOAT_ABI)
 Specify if floating point hardware should be used
 
+mlra
+Target Report Var(arm_lra_flag) Init(1) Save
+Use LRA instead of reload
+
 Enum
 Name(float_abi_type) Type(enum float_abi_type)
 Known floating-point ABIs (for use with the -mfloat-abi= option):
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 95a314f..4ef6adb 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5483,7 +5483,12 @@ must_be_base_p (rtx x)
 static bool
 must_be_index_p (rtx x)
 {
-  return GET_CODE (x) == MULT || GET_CODE (x) == ASHIFT;
+  return GET_CODE (x) == MULT
+       || GET_CODE (x) == ASHIFT
+       || GET_CODE (x) == ASHIFTRT
+       || GET_CODE (x) == LSHIFTRT
+       || GET_CODE (x) == ROTATE
+       || GET_CODE (x) == ROTATERT;
 }
 
 /* Set the segment part of address INFO to LOC, given that INNER is the
@@ -5522,7 +5527,12 @@ set_address_base (struct address_info *info, rtx *loc, rtx *inner)
 static void
 set_address_index (struct address_info *info, rtx *loc, rtx *inner)
 {
-  if ((GET_CODE (*inner) == MULT || GET_CODE (*inner) == ASHIFT)
+  if ((GET_CODE (*inner) == MULT
+       || GET_CODE (*inner) == ASHIFT
+       || GET_CODE (*inner) == ASHIFTRT
+       || GET_CODE (*inner) == LSHIFTRT
+       || GET_CODE (*inner) == ROTATE
+       || GET_CODE (*inner) == ROTATERT)
       && CONSTANT_P (XEXP (*inner, 1)))
     inner = strip_address_mutations (&XEXP (*inner, 0));
   gcc_checking_assert (REG_P (*inner)
@@ -5786,7 +5796,11 @@ get_index_scale (const struct address_info *info)
       && info->index_term == &XEXP (index, 0))
     return INTVAL (XEXP (index, 1));
 
-  if (GET_CODE (index) == ASHIFT
+  if ((GET_CODE (index) == ASHIFT
+       || GET_CODE (index) == ASHIFTRT
+       || GET_CODE (index) == LSHIFTRT
+       || GET_CODE (index) == ROTATE
+       || GET_CODE (index) == ROTATERT)
       && CONST_INT_P (XEXP (index, 1))
       && info->index_term == &XEXP (index, 0))
     return (HOST_WIDE_INT) 1 << INTVAL (XEXP (index, 1));

[-- Attachment #3: aarch64-lra.patch --]
[-- Type: application/octet-stream, Size: 3171 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index aed035a..f42cb75 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -109,6 +109,7 @@ enum aarch64_code_model aarch64_cmodel;
 #define TARGET_HAVE_TLS 1
 #endif
 
+static bool aarch64_lra_p (void);
 static bool aarch64_composite_type_p (const_tree, enum machine_mode);
 static bool aarch64_vfp_is_call_or_return_candidate (enum machine_mode,
 						     const_tree,
@@ -6083,6 +6084,13 @@ aapcs_vfp_sub_candidate (const_tree type, enum machine_mode *modep)
   return -1;
 }
 
+/* Return true if we use LRA instead of reload pass.  */
+static bool
+aarch64_lra_p (void)
+{
+  return aarch64_lra_flag;
+}
+
 /* Return TRUE if the type, as described by TYPE and MODE, is a composite
    type as described in AAPCS64 \S 4.3.  This includes aggregate, union and
    array types.  The C99 floating-point complex types are also considered
@@ -8208,6 +8216,9 @@ aarch64_vectorize_vec_perm_const_ok (enum machine_mode vmode,
 #undef TARGET_LIBGCC_CMP_RETURN_MODE
 #define TARGET_LIBGCC_CMP_RETURN_MODE aarch64_libgcc_cmp_return_mode
 
+#undef TARGET_LRA_P
+#define TARGET_LRA_P aarch64_lra_p
+
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE aarch64_mangle_type
 
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 8ff6ca1..0c31312 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -103,6 +103,10 @@ mabi=
 Target RejectNegative Joined Enum(aarch64_abi) Var(aarch64_abi) Init(AARCH64_ABI_DEFAULT)
 -mabi=ABI	Generate code that conforms to the specified ABI
 
+mlra
+Target Report Var(aarch64_lra_flag) Init(1) Save
+Use LRA instead of reload
+
 Enum
 Name(aarch64_abi) Type(int)
 Known AArch64 ABIs (for use with the -mabi= option):
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 95a314f..f4e9b51 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5483,7 +5483,11 @@ must_be_base_p (rtx x)
 static bool
 must_be_index_p (rtx x)
 {
-  return GET_CODE (x) == MULT || GET_CODE (x) == ASHIFT;
+  rtx addr = x;
+  if (GET_CODE (addr) == SIGN_EXTRACT)
+    addr = *strip_address_mutations (&XEXP (addr, 0));
+
+  return GET_CODE (addr) == MULT || GET_CODE (addr) == ASHIFT;
 }
 
 /* Set the segment part of address INFO to LOC, given that INNER is the
@@ -5505,6 +5509,9 @@ set_address_segment (struct address_info *info, rtx *loc, rtx *inner)
 static void
 set_address_base (struct address_info *info, rtx *loc, rtx *inner)
 {
+  if (GET_CODE (*inner) == SIGN_EXTRACT)
+    inner = strip_address_mutations (&XEXP (*inner, 0));
+
   if (GET_CODE (*inner) == LO_SUM)
     inner = strip_address_mutations (&XEXP (*inner, 0));
   gcc_checking_assert (REG_P (*inner)
@@ -5522,6 +5529,9 @@ set_address_base (struct address_info *info, rtx *loc, rtx *inner)
 static void
 set_address_index (struct address_info *info, rtx *loc, rtx *inner)
 {
+  if (GET_CODE (*inner) == SIGN_EXTRACT)
+    inner = strip_address_mutations (&XEXP (*inner, 0));
+
   if ((GET_CODE (*inner) == MULT || GET_CODE (*inner) == ASHIFT)
       && CONSTANT_P (XEXP (*inner, 1)))
     inner = strip_address_mutations (&XEXP (*inner, 0));


* Re: RFC: patch to build GCC for arm with LRA
  2013-08-30 13:17 RFC: patch to build GCC for arm with LRA Yvan Roux
@ 2013-08-30 13:23 ` Richard Earnshaw
  2013-08-30 13:36   ` Yvan Roux
  2013-09-08 16:12 ` Vladimir Makarov
  2013-09-08 19:41 ` Richard Sandiford
  2 siblings, 1 reply; 21+ messages in thread
From: Richard Earnshaw @ 2013-08-30 13:23 UTC (permalink / raw)
  To: Yvan Roux
  Cc: Vladimir Makarov, gcc-patches, rdsandiford, Marcus Shawcroft,
	Ramana Radhakrishnan, Matthew Gretton-Dann

On 30/08/13 14:09, Yvan Roux wrote:
> Hi,
> 
> here is a request for comments on the 2 attached patches, which enable
> the build of GCC on ARM with LRA.  The patches introduce a new
> undocumented option -mlra to use LRA instead of reload, as was done
> for previous LRA support; it is here to ease testing and comparison
> with reload and is not supposed to be kept in 4.9.
> 
> On AArch32:
> 
> * The patch includes the previous changes of this thread, adds the
> handling of ASHIFTRT, LSHIFTRT, ROTATE and ROTATERT in get_index_scale,
> and lets LRA handle the constraints in Thumb.
> * The status is that the build completes in ARM mode with a couple of
> regressions in the testsuite, which I'll investigate:
>   - gcc.c-torture/execute/20060102-1.c
>   - gcc.c-torture/execute/961026-1.c
>   - gcc.target/arm/atomic-comp-swap-release-acquire.c
> and some build failures in libstdc++ at the pass manager level (there
> is an invalid read in gt_ggc_mx).
> * The build of the libraries in Thumb mode still doesn't complete; as
> Vladimir said, the secondary_reload modification solves the LRA cycling,
> but we still have some issues.
> 
> On AArch64 :
> 
> * I modified must_be_index_p to avoid calling set_address_base
> on patterns which contain a MULT.
> * The build completes, but there are a couple of regressions in the
> testsuite that I'm looking at, for instance:
>  - gcc.c-torture/execute/ieee/fp-cmp-4l.c
>  - c-c++-common/torture/complex-sign-mul-minus-one.c
> 
> Any comments?
> 

Why are you posting two conflicting sets of patches for rtlanal.c?
Changes to that file will have to work for all architectures; so while
it helps a little bit to see which changes are needed for which target,
ultimately you need one patch for that file that works everywhere.

Also, as a style nit, use

  return (test
	  || test
	  || test);

so that the parentheses will force correct indentation of continuation
lines.

R.

> Thanks
> Yvan
> 
> 
> 
> 
> On 6 July 2013 01:12, Vladimir Makarov <vmakarov@redhat.com> wrote:
>> On 13-07-05 8:43 AM, Yvan Roux wrote:
>>>
>>> Hi,
>>>
>>> for AArch64 we also need to take SIGN_EXTRACT into account in the
>>> set_address_base and set_address_index routines, as we can encounter
>>> that kind of insn, for instance:
>>>
>>> (insn 29 27 5 7 (set (mem:SI (plus:DI (sign_extract:DI (mult:DI
>>> (subreg:DI (reg/v:SI 76 [ elt ]) 0)
>>> ...
>>
>> OK.
>>
>>> with the attached patch and LRA enabled, the compiler now bootstraps,
>>> but I have a few regressions in the testsuite,
>>
>> It seems ok to me, but I am confused by the following change:
>>
>>  set_address_base (struct address_info *info, rtx *loc, rtx *inner)
>>  {
>> -  if (GET_CODE (*inner) == LO_SUM)
>> +  if (GET_CODE (*inner) == SIGN_EXTRACT)
>> +    inner = strip_address_mutations (&XEXP (*inner, 0));
>> +
>> +  if (GET_CODE (*inner) == LO_SUM || GET_CODE (*inner) == MULT)
>>
>>      inner = strip_address_mutations (&XEXP (*inner, 0));
>>    gcc_checking_assert (REG_P (*inner)
>>                 || MEM_P (*inner)
>>
>> A base address should not contain a MULT (which you added).  This is
>> controlled by the result of must_be_index_p.  So set_address_base should not
>> have code for MULT, and you need to change must_be_index_p in such a way
>> that set_address_base is not called for MULT.
>>
>>
>>> gcc.c-torture/execute/fp-cmp-4l.c, for instance. I was looking at these
>>> issues before submitting a complete AArch64 LRA enabling patch, but
>>> as you are speaking about that...
>>>
>>> Vladimir, for the ARM target I already had the ASHIFTRT and LSHIFTRT
>>> additions on my side, but still had an ICE during bootstrap with LRA
>>> when compiling fixed-bit.c (the "Max number of generated reload insns"
>>> issue we already talked about).  Is it working for you?
>>>
>>>
>> I guess I did not check Thumb.  If you are asking about the problem you
>> sent me about 2 weeks ago, I guess one solution would be putting
>>
>>   if (lra_in_progress)
>>     return NO_REGS;
>>
>> at the beginning of targhooks.c::default_secondary_reload.  LRA is smart
>> enough to figure out what to do from the constraints in most cases where
>> reload needs a secondary reload.  In the ARM case, default_secondary_reload
>> only confuses LRA.
>>
>> I had no time to test this change, but it solves LRA cycling for the test
>> case you sent me.
>>
>> aarch32-lra.patch
>>
>> aarch64-lra.patch
>>



* Re: RFC: patch to build GCC for arm with LRA
  2013-08-30 13:23 ` Richard Earnshaw
@ 2013-08-30 13:36   ` Yvan Roux
  0 siblings, 0 replies; 21+ messages in thread
From: Yvan Roux @ 2013-08-30 13:36 UTC (permalink / raw)
  To: Richard Earnshaw
  Cc: Vladimir Makarov, gcc-patches, rdsandiford, Marcus Shawcroft,
	Ramana Radhakrishnan, Matthew Gretton-Dann

Sorry for the previous off-the-list-html-format answer :(

On 30 August 2013 15:18, Richard Earnshaw <rearnsha@arm.com> wrote:
> On 30/08/13 14:09, Yvan Roux wrote:
>> Hi,
>>
>> here is a request for comments on the 2 attached patches, which enable
>> the build of GCC on ARM with LRA.  The patches introduce a new
>> undocumented option -mlra to use LRA instead of reload, as was done
>> for previous LRA support; it is here to ease testing and comparison
>> with reload and is not supposed to be kept in 4.9.
>>
>> On AArch32:
>>
>> * The patch includes the previous changes of this thread, adds the
>> handling of ASHIFTRT, LSHIFTRT, ROTATE and ROTATERT in get_index_scale,
>> and lets LRA handle the constraints in Thumb.
>> * The status is that the build completes in ARM mode with a couple of
>> regressions in the testsuite, which I'll investigate:
>>   - gcc.c-torture/execute/20060102-1.c
>>   - gcc.c-torture/execute/961026-1.c
>>   - gcc.target/arm/atomic-comp-swap-release-acquire.c
>> and some build failures in libstdc++ at the pass manager level (there
>> is an invalid read in gt_ggc_mx).
>> * The build of the libraries in Thumb mode still doesn't complete; as
>> Vladimir said, the secondary_reload modification solves the LRA cycling,
>> but we still have some issues.
>>
>> On AArch64 :
>>
>> * I modified must_be_index_p to avoid calling set_address_base
>> on patterns which contain a MULT.
>> * The build completes, but there are a couple of regressions in the
>> testsuite that I'm looking at, for instance:
>>  - gcc.c-torture/execute/ieee/fp-cmp-4l.c
>>  - c-c++-common/torture/complex-sign-mul-minus-one.c
>>
>> Any comments?
>>
>
> Why are you posting two conflicting sets of patches for rtlanal.c?
> Changes to that file will have to work for all architectures; so while
> it helps a little bit to see which changes are needed for which target,
> ultimately you need one patch for that file that works everywhere.

Yes, sorry for that.  I have 2 dev branches on my side, for AArch32 and
AArch64, and I thought that posting one patch for each would help
people who want to test one target in particular, but I planned to
end up with either one patch for switching ARM to LRA or 2 patches
that don't conflict.

> Also, as a style nit, use
>
>   return (test
>           || test
>           || test);
>
> so that the parentheses will force correct indentation of continuation
> lines.

Ok, will fix this.

Thanks Richard

>
> R.
>
>> Thanks
>> Yvan
>>
>>
>>
>>
>> On 6 July 2013 01:12, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>> On 13-07-05 8:43 AM, Yvan Roux wrote:
>>>>
>>>> Hi,
>>>>
>>>> for AArch64 we also need to take SIGN_EXTRACT into account in the
>>>> set_address_base and set_address_index routines, as we can encounter
>>>> that kind of insn, for instance:
>>>>
>>>> (insn 29 27 5 7 (set (mem:SI (plus:DI (sign_extract:DI (mult:DI
>>>> (subreg:DI (reg/v:SI 76 [ elt ]) 0)
>>>> ...
>>>
>>> OK.
>>>
>>>> with the attached patch and LRA enabled, the compiler now bootstraps,
>>>> but I have a few regressions in the testsuite,
>>>
>>> It seems ok to me, but I am confused by the following change:
>>>
>>>  set_address_base (struct address_info *info, rtx *loc, rtx *inner)
>>>  {
>>> -  if (GET_CODE (*inner) == LO_SUM)
>>> +  if (GET_CODE (*inner) == SIGN_EXTRACT)
>>> +    inner = strip_address_mutations (&XEXP (*inner, 0));
>>> +
>>> +  if (GET_CODE (*inner) == LO_SUM || GET_CODE (*inner) == MULT)
>>>
>>>      inner = strip_address_mutations (&XEXP (*inner, 0));
>>>    gcc_checking_assert (REG_P (*inner)
>>>                 || MEM_P (*inner)
>>>
>>> A base address should not contain a MULT (which you added).  This is
>>> controlled by the result of must_be_index_p.  So set_address_base should not
>>> have code for MULT, and you need to change must_be_index_p in such a way
>>> that set_address_base is not called for MULT.
>>>
>>>
>>>> gcc.c-torture/execute/fp-cmp-4l.c, for instance. I was looking at these
>>>> issues before submitting a complete AArch64 LRA enabling patch, but
>>>> as you are speaking about that...
>>>>
>>>> Vladimir, for the ARM target I already had the ASHIFTRT and LSHIFTRT
>>>> additions on my side, but still had an ICE during bootstrap with LRA
>>>> when compiling fixed-bit.c (the "Max number of generated reload insns"
>>>> issue we already talked about).  Is it working for you?
>>>>
>>>>
>>> I guess I did not check Thumb.  If you are asking about the problem you
>>> sent me about 2 weeks ago, I guess one solution would be putting
>>>
>>>   if (lra_in_progress)
>>>     return NO_REGS;
>>>
>>> at the beginning of targhooks.c::default_secondary_reload.  LRA is smart
>>> enough to figure out what to do from the constraints in most cases where
>>> reload needs a secondary reload.  In the ARM case, default_secondary_reload
>>> only confuses LRA.
>>>
>>> I had no time to test this change, but it solves LRA cycling for the test
>>> case you sent me.
>>>
>>> aarch32-lra.patch
>>>
>>> aarch64-lra.patch
>>>
>
>


* Re: RFC: patch to build GCC for arm with LRA
  2013-08-30 13:17 RFC: patch to build GCC for arm with LRA Yvan Roux
  2013-08-30 13:23 ` Richard Earnshaw
@ 2013-09-08 16:12 ` Vladimir Makarov
  2013-09-08 19:41 ` Richard Sandiford
  2 siblings, 0 replies; 21+ messages in thread
From: Vladimir Makarov @ 2013-09-08 16:12 UTC (permalink / raw)
  To: Yvan Roux
  Cc: Richard Earnshaw, gcc-patches, rdsandiford, Marcus Shawcroft,
	Ramana Radhakrishnan, Matthew Gretton-Dann, Richard Henderson

On 13-08-30 9:09 AM, Yvan Roux wrote:
> Hi,
>
> here is a request for comments on the 2 attached patches, which enable
> the build of GCC on ARM with LRA.  The patches introduce a new
> undocumented option -mlra to use LRA instead of reload, as was done
> for previous LRA support; it is here to ease testing and comparison
> with reload and is not supposed to be kept in 4.9.
>
> On AArch32:
>
> * The patch includes the previous changes of this thread, adds the
> handling of ASHIFTRT, LSHIFTRT, ROTATE and ROTATERT in get_index_scale,
> and lets LRA handle the constraints in Thumb.
> * The status is that the build completes in ARM mode with a couple of
> regressions in the testsuite, which I'll investigate:
>    - gcc.c-torture/execute/20060102-1.c
>    - gcc.c-torture/execute/961026-1.c
>    - gcc.target/arm/atomic-comp-swap-release-acquire.c
> and some build failures in libstdc++ at the pass manager level (there
> is an invalid read in gt_ggc_mx).
> * The build of the libraries in Thumb mode still doesn't complete; as
> Vladimir said, the secondary_reload modification solves the LRA cycling,
> but we still have some issues.
>
> On AArch64 :
>
> * I modified must_be_index_p to avoid calling set_address_base
> on patterns which contain a MULT.
> * The build completes, but there are a couple of regressions in the
> testsuite that I'm looking at, for instance:
>   - gcc.c-torture/execute/ieee/fp-cmp-4l.c
>   - c-c++-common/torture/complex-sign-mul-minus-one.c
>
> Any comments?
>
>
Yvan, sorry for the delay with the response (I was on a long vacation).
It looks ok to me.  But I have no right to approve rtlanal.c changes.

So I am CCing this email to Richard Henderson.

Richard, sorry for bothering you; could you give your approval to the
rtlanal.c changes?  Neither the author of the original code (Richard
Sandiford) nor I have the right to approve rtlanal.c changes.  These
changes have actually been around and under discussion for the last
2-3 months.

Thanks.



* Re: RFC: patch to build GCC for arm with LRA
  2013-08-30 13:17 RFC: patch to build GCC for arm with LRA Yvan Roux
  2013-08-30 13:23 ` Richard Earnshaw
  2013-09-08 16:12 ` Vladimir Makarov
@ 2013-09-08 19:41 ` Richard Sandiford
  2013-09-09  0:29   ` Vladimir Makarov
  2 siblings, 1 reply; 21+ messages in thread
From: Richard Sandiford @ 2013-09-08 19:41 UTC (permalink / raw)
  To: Yvan Roux
  Cc: Vladimir Makarov, Richard Earnshaw, gcc-patches,
	Marcus Shawcroft, Ramana Radhakrishnan, Matthew Gretton-Dann,
	rth

Yvan Roux <yvan.roux@linaro.org> writes:
> @@ -5786,7 +5796,11 @@ get_index_scale (const struct address_info *info)
>        && info->index_term == &XEXP (index, 0))
>      return INTVAL (XEXP (index, 1));
>  
> -  if (GET_CODE (index) == ASHIFT
> +  if ((GET_CODE (index) == ASHIFT
> +       || GET_CODE (index) == ASHIFTRT
> +       || GET_CODE (index) == LSHIFTRT
> +       || GET_CODE (index) == ROTATE
> +       || GET_CODE (index) == ROTATERT)
>        && CONST_INT_P (XEXP (index, 1))
>        && info->index_term == &XEXP (index, 0))
>      return (HOST_WIDE_INT) 1 << INTVAL (XEXP (index, 1));

This bit doesn't look right.  The scale is only 1 << N for (ashift ... N).
I think we have to stick to returning zero for the other codes.
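To see the point concretely, here is a standalone toy check in plain C
(nothing here is GCC code): a left shift by N really is a
multiplication of the index by 1 << N, while a right shift is not a
multiplication by any constant, so returning 1 << N for LSHIFTRT and
friends would claim a scale that does not exist.

#include <assert.h>
#include <stdint.h>

int
main (void)
{
  uint64_t index = 12;
  int n = 3;

  /* (ashift index n): the address contribution is index * (1 << n).  */
  assert ((index << n) == index * (UINT64_C (1) << n));

  /* (lshiftrt index n): 12 >> 3 == 1, whereas 12 * 8 == 96, so the
     "scale" 1 << n is simply wrong here.  */
  assert ((index >> n) != index * (UINT64_C (1) << n));
  return 0;
}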

The other two (snipped) rtlanal.c hunks look fine to me FWIW.  Maybe now
would be a good time to add an "is this a shift code?" predicate though,
if we don't have one already.

Thanks,
Richard


* Re: RFC: patch to build GCC for arm with LRA
  2013-09-08 19:41 ` Richard Sandiford
@ 2013-09-09  0:29   ` Vladimir Makarov
  2013-09-09  7:42     ` Yvan Roux
  0 siblings, 1 reply; 21+ messages in thread
From: Vladimir Makarov @ 2013-09-09  0:29 UTC (permalink / raw)
  To: Yvan Roux, Richard Earnshaw, gcc-patches, Marcus Shawcroft,
	Ramana Radhakrishnan, Matthew Gretton-Dann, rth, rdsandiford

On 13-09-08 2:04 PM, Richard Sandiford wrote:
> Yvan Roux <yvan.roux@linaro.org> writes:
>> @@ -5786,7 +5796,11 @@ get_index_scale (const struct address_info *info)
>>         && info->index_term == &XEXP (index, 0))
>>       return INTVAL (XEXP (index, 1));
>>   
>> -  if (GET_CODE (index) == ASHIFT
>> +  if ((GET_CODE (index) == ASHIFT
>> +       || GET_CODE (index) == ASHIFTRT
>> +       || GET_CODE (index) == LSHIFTRT
>> +       || GET_CODE (index) == ROTATE
>> +       || GET_CODE (index) == ROTATERT)
>>         && CONST_INT_P (XEXP (index, 1))
>>         && info->index_term == &XEXP (index, 0))
>>       return (HOST_WIDE_INT) 1 << INTVAL (XEXP (index, 1));
> This bit doesn't look right.  The scale is only 1 << N for (ashift ... N).
> I think we have to stick to returning zero for the other codes.
Thanks, Richard.  I missed this.

I agree it might be a problem, although the probability of it is
negligible, as an index reg can hardly be equivalent to something + a
constant displacement (except maybe the reg+reg case, when each reg can
be base or index, but in this case we don't use shifts).  Still,
returning zero is a safe solution, as it prevents such equivalence
substitution completely.

Yvan, could you modify this part of the code and resend it to Richard
Henderson for approval?
> The other two (snipped) rtlanal.c hunks look fine to me FWIW.  Maybe now
> would be a good time to add an "is this a shift code?" predicate though,
> if we don't have one already.
>
Yes, I think we could somehow promote this info to LRA, but I don't
think it is necessary or urgent.  There is already an alternative
solution -- just doing nothing, as prohibiting equivalence substitution
for a complicated index part hardly worsens the generated code.


* Re: RFC: patch to build GCC for arm with LRA
  2013-09-09  0:29   ` Vladimir Makarov
@ 2013-09-09  7:42     ` Yvan Roux
  2013-09-09  9:00       ` Richard Sandiford
  0 siblings, 1 reply; 21+ messages in thread
From: Yvan Roux @ 2013-09-09  7:42 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: Richard Earnshaw, gcc-patches, Marcus Shawcroft,
	Ramana Radhakrishnan, Matthew Gretton-Dann, rth, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 2146 bytes --]

Hi,

Thanks for the review.  Here is the fixed self-contained patch to
enable LRA on AArch32 and AArch64 (for those who want to give it a try).
I'm still working on the issues described previously and will send the
rtlanal.c patch separately to prepare the move.

Thanks,
Yvan


On 9 September 2013 01:23, Vladimir Makarov <vmakarov@redhat.com> wrote:
> On 13-09-08 2:04 PM, Richard Sandiford wrote:
>>
>> Yvan Roux <yvan.roux@linaro.org> writes:
>>>
>>> @@ -5786,7 +5796,11 @@ get_index_scale (const struct address_info *info)
>>>         && info->index_term == &XEXP (index, 0))
>>>       return INTVAL (XEXP (index, 1));
>>>   -  if (GET_CODE (index) == ASHIFT
>>> +  if ((GET_CODE (index) == ASHIFT
>>> +       || GET_CODE (index) == ASHIFTRT
>>> +       || GET_CODE (index) == LSHIFTRT
>>> +       || GET_CODE (index) == ROTATE
>>> +       || GET_CODE (index) == ROTATERT)
>>>         && CONST_INT_P (XEXP (index, 1))
>>>         && info->index_term == &XEXP (index, 0))
>>>       return (HOST_WIDE_INT) 1 << INTVAL (XEXP (index, 1));
>>
>> This bit doesn't look right.  The scale is only 1 << N for (ashift ... N).
>> I think we have to stick to returning zero for the other codes.
>
> Thanks, Richard.  I missed this.
>
> I agree it might be a problem, although the probability of it is
> negligible, as an index reg can hardly be equivalent to something + a
> constant displacement (except maybe the reg+reg case, when each reg can be
> base or index, but in this case we don't use shifts).  Still, returning
> zero is a safe solution, as it prevents such equivalence substitution
> completely.
>
> Yvan, could you modify this part of the code and resend it to Richard
> Henderson for approval?
>
>> The other two (snipped) rtlanal.c hunks look fine to me FWIW.  Maybe now
>> would be a good time to add an "is this a shift code?" predicate though,
>> if we don't have one already.
>>
> Yes, I think we could somehow promote this info to LRA, but I don't think
> it is necessary or urgent.  There is already an alternative solution -- just
> doing nothing, as prohibiting equivalence substitution for a complicated
> index part hardly worsens the generated code.
>

[-- Attachment #2: arm-lra.patch --]
[-- Type: application/octet-stream, Size: 5866 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index aed035a..f42cb75 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -109,6 +109,7 @@ enum aarch64_code_model aarch64_cmodel;
 #define TARGET_HAVE_TLS 1
 #endif
 
+static bool aarch64_lra_p (void);
 static bool aarch64_composite_type_p (const_tree, enum machine_mode);
 static bool aarch64_vfp_is_call_or_return_candidate (enum machine_mode,
 						     const_tree,
@@ -6083,6 +6084,13 @@ aapcs_vfp_sub_candidate (const_tree type, enum machine_mode *modep)
   return -1;
 }
 
+/* Return true if we use LRA instead of reload pass.  */
+static bool
+aarch64_lra_p (void)
+{
+  return aarch64_lra_flag;
+}
+
 /* Return TRUE if the type, as described by TYPE and MODE, is a composite
    type as described in AAPCS64 \S 4.3.  This includes aggregate, union and
    array types.  The C99 floating-point complex types are also considered
@@ -8208,6 +8216,9 @@ aarch64_vectorize_vec_perm_const_ok (enum machine_mode vmode,
 #undef TARGET_LIBGCC_CMP_RETURN_MODE
 #define TARGET_LIBGCC_CMP_RETURN_MODE aarch64_libgcc_cmp_return_mode
 
+#undef TARGET_LRA_P
+#define TARGET_LRA_P aarch64_lra_p
+
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE aarch64_mangle_type
 
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 8ff6ca1..0c31312 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -103,6 +103,10 @@ mabi=
 Target RejectNegative Joined Enum(aarch64_abi) Var(aarch64_abi) Init(AARCH64_ABI_DEFAULT)
 -mabi=ABI	Generate code that conforms to the specified ABI
 
+mlra
+Target Report Var(aarch64_lra_flag) Init(1) Save
+Use LRA instead of reload
+
 Enum
 Name(aarch64_abi) Type(int)
 Known AArch64 ABIs (for use with the -mabi= option):
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f731bb6..182b009 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -68,6 +68,7 @@ struct four_ints
 };
 
 /* Forward function declarations.  */
+static bool arm_lra_p (void);
 static bool arm_needs_doubleword_align (enum machine_mode, const_tree);
 static int arm_compute_static_chain_stack_bytes (void);
 static arm_stack_offsets *arm_get_frame_offsets (void);
@@ -338,6 +339,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_LEGITIMIZE_ADDRESS
 #define TARGET_LEGITIMIZE_ADDRESS arm_legitimize_address
 
+#undef TARGET_LRA_P
+#define TARGET_LRA_P arm_lra_p
+
 #undef  TARGET_ATTRIBUTE_TABLE
 #define TARGET_ATTRIBUTE_TABLE arm_attribute_table
 
@@ -4971,6 +4975,12 @@ arm_init_cumulative_args (CUMULATIVE_ARGS *pcum, tree fntype,
     }
 }
 
+/* Return true if we use LRA instead of reload pass.  */
+static bool
+arm_lra_p (void)
+{
+  return arm_lra_flag;
+}
 
 /* Return true if mode/type need doubleword alignment.  */
 static bool
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 1781b75..05a271e 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1266,11 +1266,12 @@ enum reg_class
 
 /* Must leave BASE_REGS reloads alone */
 #define THUMB_SECONDARY_INPUT_RELOAD_CLASS(CLASS, MODE, X)		\
+  (lra_in_progress ? NO_REGS : 						\
   ((CLASS) != LO_REGS && (CLASS) != BASE_REGS				\
    ? ((true_regnum (X) == -1 ? LO_REGS					\
        : (true_regnum (X) + HARD_REGNO_NREGS (0, MODE) > 8) ? LO_REGS	\
        : NO_REGS)) 							\
-   : NO_REGS)
+   : NO_REGS))
 
 #define THUMB_SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, X)		\
   ((CLASS) != LO_REGS && (CLASS) != BASE_REGS				\
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index b9ae2b0..7c9ea36 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -109,6 +109,10 @@ mfloat-abi=
 Target RejectNegative Joined Enum(float_abi_type) Var(arm_float_abi) Init(TARGET_DEFAULT_FLOAT_ABI)
 Specify if floating point hardware should be used
 
+mlra
+Target Report Var(arm_lra_flag) Init(1) Save
+Use LRA instead of reload
+
 Enum
 Name(float_abi_type) Type(enum float_abi_type)
 Known floating-point ABIs (for use with the -mfloat-abi= option):
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 95a314f..04832cf 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5483,7 +5483,16 @@ must_be_base_p (rtx x)
 static bool
 must_be_index_p (rtx x)
 {
-  return GET_CODE (x) == MULT || GET_CODE (x) == ASHIFT;
+  rtx addr = x;
+  if (GET_CODE (addr) == SIGN_EXTRACT)
+    addr = *strip_address_mutations (&XEXP (addr, 0));
+
+  return (GET_CODE (x) == MULT
+          || GET_CODE (x) == ASHIFT
+          || GET_CODE (x) == ASHIFTRT
+          || GET_CODE (x) == LSHIFTRT
+          || GET_CODE (x) == ROTATE
+          || GET_CODE (x) == ROTATERT);
 }
 
 /* Set the segment part of address INFO to LOC, given that INNER is the
@@ -5505,6 +5514,9 @@ set_address_segment (struct address_info *info, rtx *loc, rtx *inner)
 static void
 set_address_base (struct address_info *info, rtx *loc, rtx *inner)
 {
+  if (GET_CODE (*inner) == SIGN_EXTRACT)
+    inner = strip_address_mutations (&XEXP (*inner, 0));
+
   if (GET_CODE (*inner) == LO_SUM)
     inner = strip_address_mutations (&XEXP (*inner, 0));
   gcc_checking_assert (REG_P (*inner)
@@ -5522,7 +5534,15 @@ set_address_base (struct address_info *info, rtx *loc, rtx *inner)
 static void
 set_address_index (struct address_info *info, rtx *loc, rtx *inner)
 {
-  if ((GET_CODE (*inner) == MULT || GET_CODE (*inner) == ASHIFT)
+  if (GET_CODE (*inner) == SIGN_EXTRACT)
+    inner = strip_address_mutations (&XEXP (*inner, 0));
+
+  if ((GET_CODE (*inner) == MULT
+       || GET_CODE (*inner) == ASHIFT
+       || GET_CODE (*inner) == ASHIFTRT
+       || GET_CODE (*inner) == LSHIFTRT
+       || GET_CODE (*inner) == ROTATE
+       || GET_CODE (*inner) == ROTATERT)
       && CONSTANT_P (XEXP (*inner, 1)))
     inner = strip_address_mutations (&XEXP (*inner, 0));
   gcc_checking_assert (REG_P (*inner)


* Re: RFC: patch to build GCC for arm with LRA
  2013-09-09  9:00       ` Richard Sandiford
@ 2013-09-09  9:00         ` Richard Sandiford
  2013-09-09  9:46           ` Yvan Roux
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Sandiford @ 2013-09-09  9:00 UTC (permalink / raw)
  To: Yvan Roux
  Cc: Vladimir Makarov, Richard Earnshaw, gcc-patches,
	Marcus Shawcroft, Ramana Radhakrishnan, Matthew Gretton-Dann,
	rth

Richard Sandiford <rdsandiford@googlemail.com> writes:
> I think SIGN_EXTRACT from the lsb (i.e. when the third operand is 0 for
> !BITS_BIG_ENDIAN or GET_MODE_PRECISION (mode) - 1 for BITS_BIG_ENDIAN)

Gah, "GET_MODE_PRECISION (mode) - size" for BITS_BIG_ENDIAN.

Richard


* Re: RFC: patch to build GCC for arm with LRA
  2013-09-09  7:42     ` Yvan Roux
@ 2013-09-09  9:00       ` Richard Sandiford
  2013-09-09  9:00         ` Richard Sandiford
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Sandiford @ 2013-09-09  9:00 UTC (permalink / raw)
  To: Yvan Roux
  Cc: Vladimir Makarov, Richard Earnshaw, gcc-patches,
	Marcus Shawcroft, Ramana Radhakrishnan, Matthew Gretton-Dann,
	rth

Yvan Roux <yvan.roux@linaro.org> writes:
> Thanks for the review.  Here is the fixed self-contained patch to
> enable LRA on AArch32 and AArch64 (for those who want to give it a try).
> I'm still working on the issues described previously and will send the
> rtlanal.c patch separately to prepare the move.

Looks like the rtlanal.c patch contains some extra changes from last time.
Just FWIW:

> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index 95a314f..04832cf 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -5483,7 +5483,16 @@ must_be_base_p (rtx x)
>  static bool
>  must_be_index_p (rtx x)
>  {
> -  return GET_CODE (x) == MULT || GET_CODE (x) == ASHIFT;
> +  rtx addr = x;
> +  if (GET_CODE (addr) == SIGN_EXTRACT)
> +    addr = *strip_address_mutations (&XEXP (addr, 0));
> +
> +  return (GET_CODE (x) == MULT
> +          || GET_CODE (x) == ASHIFT
> +          || GET_CODE (x) == ASHIFTRT
> +          || GET_CODE (x) == LSHIFTRT
> +          || GET_CODE (x) == ROTATE
> +          || GET_CODE (x) == ROTATERT);

This doesn't look right, since you don't use the stripped address (addr)
anywhere.

I think SIGN_EXTRACT from the lsb (i.e. when the third operand is 0 for
!BITS_BIG_ENDIAN or GET_MODE_PRECISION (mode) - 1 for BITS_BIG_ENDIAN)
should be treated as an address mutation, since it's really a combination
of a truncation and extension.  Maybe the comment could be changed from:

   "Mutations" either convert between modes or apply some kind of
   alignment.  */

to:

   "Mutations" either convert between modes or apply some kind of
   extension, truncation or alignment.  */

Then must_be_index_p should just be:

  return (GET_CODE (x) == MULT
          || GET_CODE (x) == ASHIFT
          || GET_CODE (x) == ASHIFTRT
          || GET_CODE (x) == LSHIFTRT
          || GET_CODE (x) == ROTATE
          || GET_CODE (x) == ROTATERT);

as in your earlier patch, since the function is only passed stripped rtxes.
(Although like I say I'd personally prefer something like:

  return GET_CODE (x) == MULT || shift_code_p (GET_CODE (x));

where shift_code_p is a new predicate.)
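As a standalone illustration of the "truncation and extension" point
above, here is a toy in plain C; the helper is invented for the demo
and is not GCC code:

#include <assert.h>
#include <stdint.h>

/* Model of (sign_extract:DI X len 0): keep the low LEN bits of X and
   sign-extend the result back to 64 bits.  */
static int64_t
sign_extract_lsb (int64_t x, int len)
{
  /* Shift left as unsigned to avoid signed-overflow UB, then use an
     arithmetic right shift to replicate the sign bit.  */
  return (int64_t) ((uint64_t) x << (64 - len)) >> (64 - len);
}

int
main (void)
{
  int64_t x = (int64_t) 0x123456789abcdef0;

  /* A 32-bit extract at the lsb == truncate to int32_t, then extend.  */
  assert (sign_extract_lsb (x, 32) == (int64_t) (int32_t) x);

  /* Likewise for an 8-bit extract.  */
  assert (sign_extract_lsb (x, 8) == (int64_t) (int8_t) x);
  return 0;
}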

Thanks,
Richard


* Re: RFC: patch to build GCC for arm with LRA
  2013-09-09  9:00         ` Richard Sandiford
@ 2013-09-09  9:46           ` Yvan Roux
  2013-09-09 19:32             ` Richard Sandiford
  0 siblings, 1 reply; 21+ messages in thread
From: Yvan Roux @ 2013-09-09  9:46 UTC (permalink / raw)
  To: Yvan Roux, Vladimir Makarov, Richard Earnshaw, gcc-patches,
	Marcus Shawcroft, Ramana Radhakrishnan, Matthew Gretton-Dann,
	Richard Henderson, rdsandiford

Thanks for noticing it, Richard; I made a refactoring mistake, and addr
was supposed to be used instead of x.  In fact, on AArch64 it turns out
that we don't have stripped rtxes at this step and we have some of the
form below; this is why I added the strip.

(insn 29 27 5 7 (set (mem:SI (plus:DI (sign_extract:DI (mult:DI
(subreg:DI (reg/v:SI 76 [ elt ]) 0)

I'll redo the patch...

Thanks
Yvan

On 9 September 2013 10:58, Richard Sandiford <rdsandiford@googlemail.com> wrote:
> Richard Sandiford <rdsandiford@googlemail.com> writes:
>> I think SIGN_EXTRACT from the lsb (i.e. when the third operand is 0 for
>> !BITS_BIG_ENDIAN or GET_MODE_PRECISION (mode) - 1 for BITS_BIG_ENDIAN)
>
> Gah, "GET_MODE_PRECISION (mode) - size" for BITS_BIG_ENDIAN.
>
> Richard


* Re: RFC: patch to build GCC for arm with LRA
  2013-09-09  9:46           ` Yvan Roux
@ 2013-09-09 19:32             ` Richard Sandiford
  2013-09-10 17:42               ` Yvan Roux
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Sandiford @ 2013-09-09 19:32 UTC (permalink / raw)
  To: Yvan Roux
  Cc: Vladimir Makarov, Richard Earnshaw, gcc-patches,
	Marcus Shawcroft, Ramana Radhakrishnan, Matthew Gretton-Dann,
	Richard Henderson

Yvan Roux <yvan.roux@linaro.org> writes:
> Thanks for noticing it, Richard; I made a refactoring mistake, and addr
> was supposed to be used instead of x.  In fact, on AArch64 it turns out
> that we don't have stripped rtxes at this step and we have some of the
> form below; this is why I added the strip.
>
> (insn 29 27 5 7 (set (mem:SI (plus:DI (sign_extract:DI (mult:DI
> (subreg:DI (reg/v:SI 76 [ elt ]) 0)

Yeah, but that's because strip_address_mutations doesn't consider
SIGN_EXTRACT to be a "mutation" as things stand.  My point was that
I think it should, at least for the special extract-from-lsb case.
It then shouldn't be necessary to handle SIGN_EXTRACT in the other
address-analysis routines.

(That might be what you meant, sorry, just thought I'd say in case.)

Thanks,
Richard


* Re: RFC: patch to build GCC for arm with LRA
  2013-09-09 19:32             ` Richard Sandiford
@ 2013-09-10 17:42               ` Yvan Roux
  2013-09-10 19:32                 ` Richard Sandiford
  0 siblings, 1 reply; 21+ messages in thread
From: Yvan Roux @ 2013-09-10 17:42 UTC (permalink / raw)
  To: Yvan Roux, Vladimir Makarov, Richard Earnshaw, gcc-patches,
	Marcus Shawcroft, Ramana Radhakrishnan, Matthew Gretton-Dann,
	Richard Henderson, rdsandiford

> Yeah, but that's because strip_address_mutations doesn't consider
> SIGN_EXTRACT to be a "mutation" as things stand.  My point was that
> I think it should, at least for the special extract-from-lsb case.
> It then shouldn't be necessary to handle SIGN_EXTRACT in the other
> address-analysis routines.
>
> (That might be what you meant, sorry, just thought I'd say in case.)

You did well.  I wanted to handle it in strip_address_mutations, but
misread the code and thought that it wasn't called all the time; in
any case, I hadn't thought of the endianness issue.  I've added
ZERO_EXTRACT to this treatment too, but wonder if, for the big-endian
case, the third operand has to be taken into account, like this:

GET_MODE_PRECISION(mode) - size - pos

Thanks,
Yvan


* Re: RFC: patch to build GCC for arm with LRA
  2013-09-10 17:42               ` Yvan Roux
@ 2013-09-10 19:32                 ` Richard Sandiford
  2013-09-11  7:22                   ` Yvan Roux
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Sandiford @ 2013-09-10 19:32 UTC (permalink / raw)
  To: Yvan Roux
  Cc: Vladimir Makarov, Richard Earnshaw, gcc-patches,
	Marcus Shawcroft, Ramana Radhakrishnan, Matthew Gretton-Dann,
	Richard Henderson

Yvan Roux <yvan.roux@linaro.org> writes:
>> Yeah, but that's because strip_address_mutations doesn't consider
>> SIGN_EXTRACT to be a "mutation" as things stand.  My point was that
>> I think it should, at least for the special extract-from-lsb case.
>> It then shouldn't be necessary to handle SIGN_EXTRACT in the other
>> address-analysis routines.
>>
>> (That might be what you meant, sorry, just thought I'd say in case.)
>
> You did well.  I wanted to handle it in strip_address_mutations, but
> misread the code and thought that it wasn't called all the time; in
> any case, I hadn't thought of the endianness issue.  I've added
> ZERO_EXTRACT to this treatment too, but wonder if, for the big-endian
> case, the third operand has to be taken into account, like this:
>
> GET_MODE_PRECISION(mode) - size - pos

Endianness in the BYTES_BIG_ENDIAN sense shouldn't be a problem AFAIK.
We just need to worry about BITS_BIG_ENDIAN.  For:

  ({sign,zero}_extract:m X len pos)

"pos" counts from the lsb if !BITS_BIG_ENDIAN and from the msb if
BITS_BIG_ENDIAN.  So I think the condition should be something like:

  pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (m) - len : 0)

Agreed that it makes sense to handle ZERO_EXTRACT in the same way.
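A quick sanity check of that condition in plain C (toy code, not GCC):

#include <assert.h>

/* For ({sign,zero}_extract:m X len pos), the position of an extraction
   that starts at the least significant bit.  */
static int
lsb_pos (int bits_big_endian, int precision, int len)
{
  return bits_big_endian ? precision - len : 0;
}

int
main (void)
{
  /* An 8-bit field at the lsb of a 32-bit value.  */
  assert (lsb_pos (0, 32, 8) == 0);   /* pos counts from the lsb.  */
  assert (lsb_pos (1, 32, 8) == 24);  /* pos counts from the msb.  */
  return 0;
}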

Thanks,
Richard


* Re: RFC: patch to build GCC for arm with LRA
  2013-09-10 19:32                 ` Richard Sandiford
@ 2013-09-11  7:22                   ` Yvan Roux
  2013-09-11  7:34                     ` Richard Sandiford
  0 siblings, 1 reply; 21+ messages in thread
From: Yvan Roux @ 2013-09-11  7:22 UTC (permalink / raw)
  To: Yvan Roux, Vladimir Makarov, Richard Earnshaw, gcc-patches,
	Marcus Shawcroft, Ramana Radhakrishnan, Matthew Gretton-Dann,
	Richard Henderson, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 590 bytes --]

> Endianness in the BYTES_BIG_ENDIAN sense shouldn't be a problem AFAIK.
> We just need to worry about BITS_BIG_ENDIAN.  For:
>
>   ({sign,zero}_extract:m X len pos)
>
> "pos" counts from the lsb if !BITS_BIG_ENDIAN and from the msb if
> BITS_BIG_ENDIAN.  So I think the condition should be something like:
>
>   pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (m) - len : 0)
>
> Agreed that it makes sense to handle ZERO_EXTRACT in the same way.

Agreed, here is the new patch which includes the new shifting code
predicate.  I'll post the rtl analysis part in another request.

Thanks,
Yvan

[-- Attachment #2: arm-lra.patch --]
[-- Type: application/octet-stream, Size: 6411 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index aed035a..f42cb75 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -109,6 +109,7 @@ enum aarch64_code_model aarch64_cmodel;
 #define TARGET_HAVE_TLS 1
 #endif
 
+static bool aarch64_lra_p (void);
 static bool aarch64_composite_type_p (const_tree, enum machine_mode);
 static bool aarch64_vfp_is_call_or_return_candidate (enum machine_mode,
 						     const_tree,
@@ -6083,6 +6084,13 @@ aapcs_vfp_sub_candidate (const_tree type, enum machine_mode *modep)
   return -1;
 }
 
+/* Return true if we use LRA instead of reload pass.  */
+static bool
+aarch64_lra_p (void)
+{
+  return aarch64_lra_flag;
+}
+
 /* Return TRUE if the type, as described by TYPE and MODE, is a composite
    type as described in AAPCS64 \S 4.3.  This includes aggregate, union and
    array types.  The C99 floating-point complex types are also considered
@@ -8208,6 +8216,9 @@ aarch64_vectorize_vec_perm_const_ok (enum machine_mode vmode,
 #undef TARGET_LIBGCC_CMP_RETURN_MODE
 #define TARGET_LIBGCC_CMP_RETURN_MODE aarch64_libgcc_cmp_return_mode
 
+#undef TARGET_LRA_P
+#define TARGET_LRA_P aarch64_lra_p
+
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE aarch64_mangle_type
 
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 8ff6ca1..0c31312 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -103,6 +103,10 @@ mabi=
 Target RejectNegative Joined Enum(aarch64_abi) Var(aarch64_abi) Init(AARCH64_ABI_DEFAULT)
 -mabi=ABI	Generate code that conforms to the specified ABI
 
+mlra
+Target Report Var(aarch64_lra_flag) Init(1) Save
+Use LRA instead of reload
+
 Enum
 Name(aarch64_abi) Type(int)
 Known AArch64 ABIs (for use with the -mabi= option):
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f731bb6..182b009 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -68,6 +68,7 @@ struct four_ints
 };
 
 /* Forward function declarations.  */
+static bool arm_lra_p (void);
 static bool arm_needs_doubleword_align (enum machine_mode, const_tree);
 static int arm_compute_static_chain_stack_bytes (void);
 static arm_stack_offsets *arm_get_frame_offsets (void);
@@ -338,6 +339,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_LEGITIMIZE_ADDRESS
 #define TARGET_LEGITIMIZE_ADDRESS arm_legitimize_address
 
+#undef TARGET_LRA_P
+#define TARGET_LRA_P arm_lra_p
+
 #undef  TARGET_ATTRIBUTE_TABLE
 #define TARGET_ATTRIBUTE_TABLE arm_attribute_table
 
@@ -4971,6 +4975,12 @@ arm_init_cumulative_args (CUMULATIVE_ARGS *pcum, tree fntype,
     }
 }
 
+/* Return true if we use LRA instead of reload pass.  */
+static bool
+arm_lra_p (void)
+{
+  return arm_lra_flag;
+}
 
 /* Return true if mode/type need doubleword alignment.  */
 static bool
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 1781b75..05a271e 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1266,11 +1266,12 @@ enum reg_class
 
 /* Must leave BASE_REGS reloads alone */
 #define THUMB_SECONDARY_INPUT_RELOAD_CLASS(CLASS, MODE, X)		\
+  (lra_in_progress ? NO_REGS : 						\
   ((CLASS) != LO_REGS && (CLASS) != BASE_REGS				\
    ? ((true_regnum (X) == -1 ? LO_REGS					\
        : (true_regnum (X) + HARD_REGNO_NREGS (0, MODE) > 8) ? LO_REGS	\
        : NO_REGS)) 							\
-   : NO_REGS)
+   : NO_REGS))
 
 #define THUMB_SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, X)		\
   ((CLASS) != LO_REGS && (CLASS) != BASE_REGS				\
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index b9ae2b0..7c9ea36 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -109,6 +109,10 @@ mfloat-abi=
 Target RejectNegative Joined Enum(float_abi_type) Var(arm_float_abi) Init(TARGET_DEFAULT_FLOAT_ABI)
 Specify if floating point hardware should be used
 
+mlra
+Target Report Var(arm_lra_flag) Init(1) Save
+Use LRA instead of reload
+
 Enum
 Name(float_abi_type) Type(enum float_abi_type)
 Known floating-point ABIs (for use with the -mfloat-abi= option):
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 95a314f..0032cb1 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5442,7 +5442,7 @@ split_double (rtx value, rtx *first, rtx *second)
    stripped expression there.
 
    "Mutations" either convert between modes or apply some kind of
-   alignment.  */
+   extension, truncation or alignment.  */
 
 rtx *
 strip_address_mutations (rtx *loc, enum rtx_code *outer_code)
@@ -5454,6 +5454,16 @@ strip_address_mutations (rtx *loc, enum rtx_code *outer_code)
 	/* Things like SIGN_EXTEND, ZERO_EXTEND and TRUNCATE can be
 	   used to convert between pointer sizes.  */
 	loc = &XEXP (*loc, 0);
+      else if (GET_RTX_CLASS (code) == RTX_BITFIELD_OPS)
+	{
+	  /* Bitfield operations [SIGN|ZERO]_EXTRACT can be used too.  */
+	  enum machine_mode mode = GET_MODE(*loc);
+	  unsigned HOST_WIDE_INT len = INTVAL (XEXP (*loc, 1));
+	  HOST_WIDE_INT pos = INTVAL (XEXP (*loc, 2));
+
+	  if (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len : 0))
+	    loc = &XEXP (*loc, 0);
+	}
       else if (code == AND && CONST_INT_P (XEXP (*loc, 1)))
 	/* (and ... (const_int -X)) is used to align to X bytes.  */
 	loc = &XEXP (*loc, 0);
@@ -5470,6 +5480,18 @@ strip_address_mutations (rtx *loc, enum rtx_code *outer_code)
     }
 }
 
+/* Return true if X is a shifting operation.  */
+
+static bool
+shift_code_p (rtx x)
+{
+  return (GET_CODE (x) == ASHIFT
+          || GET_CODE (x) == ASHIFTRT
+          || GET_CODE (x) == LSHIFTRT
+          || GET_CODE (x) == ROTATE
+          || GET_CODE (x) == ROTATERT);
+}
+
 /* Return true if X must be a base rather than an index.  */
 
 static bool
@@ -5483,7 +5505,7 @@ must_be_base_p (rtx x)
 static bool
 must_be_index_p (rtx x)
 {
-  return GET_CODE (x) == MULT || GET_CODE (x) == ASHIFT;
+  return GET_CODE (x) == MULT || shift_code_p (x);
 }
 
 /* Set the segment part of address INFO to LOC, given that INNER is the
@@ -5522,7 +5544,7 @@ set_address_base (struct address_info *info, rtx *loc, rtx *inner)
 static void
 set_address_index (struct address_info *info, rtx *loc, rtx *inner)
 {
-  if ((GET_CODE (*inner) == MULT || GET_CODE (*inner) == ASHIFT)
+  if ((GET_CODE (*inner) == MULT || shift_code_p (*inner))
       && CONSTANT_P (XEXP (*inner, 1)))
     inner = strip_address_mutations (&XEXP (*inner, 0));
   gcc_checking_assert (REG_P (*inner)


* Re: RFC: patch to build GCC for arm with LRA
  2013-09-11  7:22                   ` Yvan Roux
@ 2013-09-11  7:34                     ` Richard Sandiford
  2013-09-11 12:02                       ` Yvan Roux
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Sandiford @ 2013-09-11  7:34 UTC (permalink / raw)
  To: Yvan Roux
  Cc: Vladimir Makarov, Richard Earnshaw, gcc-patches,
	Marcus Shawcroft, Ramana Radhakrishnan, Matthew Gretton-Dann,
	Richard Henderson

Yvan Roux <yvan.roux@linaro.org> writes:
> @@ -5454,6 +5454,16 @@ strip_address_mutations (rtx *loc, enum rtx_code *outer_code)
>  	/* Things like SIGN_EXTEND, ZERO_EXTEND and TRUNCATE can be
>  	   used to convert between pointer sizes.  */
>  	loc = &XEXP (*loc, 0);
> +      else if (GET_RTX_CLASS (code) == RTX_BITFIELD_OPS)
> +	{
> +	  /* Bitfield operations [SIGN|ZERO]_EXTRACT can be used too.  */
> +	  enum machine_mode mode = GET_MODE(*loc);
> +	  unsigned HOST_WIDE_INT len = INTVAL (XEXP (*loc, 1));
> +	  HOST_WIDE_INT pos = INTVAL (XEXP (*loc, 2));
> +
> +	  if (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len : 0))
> +	    loc = &XEXP (*loc, 0);
> +	}

This means that the other values of "pos" bypass the:

      else
	return loc;

so you'll get an infinite loop.  I think it would be neater to split
this out into:

/* Return true if X is a sign_extract or zero_extract from the least
   significant bit.  */

static bool
lsb_bitfield_op_p (rtx X)
{
  ...;
}

    else if (lsb_bitfield_op_p (*loc))
      loc = &XEXP (*loc, 0);
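For concreteness, one possible body for that helper, combining the
len/pos extraction from the patch above with the position condition
given earlier in the thread (the details, e.g. which mode to query,
are an assumption here, not necessarily the committed version):

/* Return true if X is a sign_extract or zero_extract from the least
   significant bit.  */

static bool
lsb_bitfield_op_p (rtx x)
{
  if (GET_RTX_CLASS (GET_CODE (x)) == RTX_BITFIELD_OPS)
    {
      enum machine_mode mode = GET_MODE (XEXP (x, 0));
      HOST_WIDE_INT len = INTVAL (XEXP (x, 1));
      HOST_WIDE_INT pos = INTVAL (XEXP (x, 2));

      return (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len : 0));
    }
  return false;
}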

Looks good to me otherwise FWIW.

Thanks,
Richard


* Re: RFC: patch to build GCC for arm with LRA
  2013-09-11  7:34                     ` Richard Sandiford
@ 2013-09-11 12:02                       ` Yvan Roux
  2013-09-11 18:20                         ` Richard Sandiford
  0 siblings, 1 reply; 21+ messages in thread
From: Yvan Roux @ 2013-09-11 12:02 UTC (permalink / raw)
  To: Yvan Roux, Vladimir Makarov, Richard Earnshaw, gcc-patches,
	Marcus Shawcroft, Ramana Radhakrishnan, Matthew Gretton-Dann,
	Richard Henderson, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 2023 bytes --]

Yes indeed!  Here is a fixed patch.

In strip_address_mutations we now have 3 if/else-if statements with
the same body, which could be factorized into:

      if ((GET_RTX_CLASS (code) == RTX_UNARY)
          /* Things like SIGN_EXTEND, ZERO_EXTEND and TRUNCATE can be
             used to convert between pointer sizes.  */
          || (lsb_bitfield_op_p (*loc))
          /* Bitfield operations [SIGN|ZERO]_EXTRACT from the least significant
             bit can be used too.  */
          || (code == AND && CONST_INT_P (XEXP (*loc, 1))))
          /* (and ... (const_int -X)) is used to align to X bytes.  */
        loc = &XEXP (*loc, 0);

if you think that it doesn't affect the readability too much.

Many Thanks,
Yvan


[-- Attachment #2: arm-lra.patch --]
[-- Type: application/octet-stream, Size: 6817 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index aed035a..f42cb75 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -109,6 +109,7 @@ enum aarch64_code_model aarch64_cmodel;
 #define TARGET_HAVE_TLS 1
 #endif
 
+static bool aarch64_lra_p (void);
 static bool aarch64_composite_type_p (const_tree, enum machine_mode);
 static bool aarch64_vfp_is_call_or_return_candidate (enum machine_mode,
 						     const_tree,
@@ -6083,6 +6084,13 @@ aapcs_vfp_sub_candidate (const_tree type, enum machine_mode *modep)
   return -1;
 }
 
+/* Return true if we use LRA instead of reload pass.  */
+static bool
+aarch64_lra_p (void)
+{
+  return aarch64_lra_flag;
+}
+
 /* Return TRUE if the type, as described by TYPE and MODE, is a composite
    type as described in AAPCS64 \S 4.3.  This includes aggregate, union and
    array types.  The C99 floating-point complex types are also considered
@@ -8208,6 +8216,9 @@ aarch64_vectorize_vec_perm_const_ok (enum machine_mode vmode,
 #undef TARGET_LIBGCC_CMP_RETURN_MODE
 #define TARGET_LIBGCC_CMP_RETURN_MODE aarch64_libgcc_cmp_return_mode
 
+#undef TARGET_LRA_P
+#define TARGET_LRA_P aarch64_lra_p
+
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE aarch64_mangle_type
 
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 8ff6ca1..0c31312 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -103,6 +103,10 @@ mabi=
 Target RejectNegative Joined Enum(aarch64_abi) Var(aarch64_abi) Init(AARCH64_ABI_DEFAULT)
 -mabi=ABI	Generate code that conforms to the specified ABI
 
+mlra
+Target Report Var(aarch64_lra_flag) Init(1) Save
+Use LRA instead of reload
+
 Enum
 Name(aarch64_abi) Type(int)
 Known AArch64 ABIs (for use with the -mabi= option):
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f731bb6..182b009 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -68,6 +68,7 @@ struct four_ints
 };
 
 /* Forward function declarations.  */
+static bool arm_lra_p (void);
 static bool arm_needs_doubleword_align (enum machine_mode, const_tree);
 static int arm_compute_static_chain_stack_bytes (void);
 static arm_stack_offsets *arm_get_frame_offsets (void);
@@ -338,6 +339,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_LEGITIMIZE_ADDRESS
 #define TARGET_LEGITIMIZE_ADDRESS arm_legitimize_address
 
+#undef TARGET_LRA_P
+#define TARGET_LRA_P arm_lra_p
+
 #undef  TARGET_ATTRIBUTE_TABLE
 #define TARGET_ATTRIBUTE_TABLE arm_attribute_table
 
@@ -4971,6 +4975,12 @@ arm_init_cumulative_args (CUMULATIVE_ARGS *pcum, tree fntype,
     }
 }
 
+/* Return true if we use LRA instead of reload pass.  */
+static bool
+arm_lra_p (void)
+{
+  return arm_lra_flag;
+}
 
 /* Return true if mode/type need doubleword alignment.  */
 static bool
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 1781b75..05a271e 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1266,11 +1266,12 @@ enum reg_class
 
 /* Must leave BASE_REGS reloads alone */
 #define THUMB_SECONDARY_INPUT_RELOAD_CLASS(CLASS, MODE, X)		\
+  (lra_in_progress ? NO_REGS : 						\
   ((CLASS) != LO_REGS && (CLASS) != BASE_REGS				\
    ? ((true_regnum (X) == -1 ? LO_REGS					\
        : (true_regnum (X) + HARD_REGNO_NREGS (0, MODE) > 8) ? LO_REGS	\
        : NO_REGS)) 							\
-   : NO_REGS)
+   : NO_REGS))
 
 #define THUMB_SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, X)		\
   ((CLASS) != LO_REGS && (CLASS) != BASE_REGS				\
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index b9ae2b0..7c9ea36 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -109,6 +109,10 @@ mfloat-abi=
 Target RejectNegative Joined Enum(float_abi_type) Var(arm_float_abi) Init(TARGET_DEFAULT_FLOAT_ABI)
 Specify if floating point hardware should be used
 
+mlra
+Target Report Var(arm_lra_flag) Init(1) Save
+Use LRA instead of reload
+
 Enum
 Name(float_abi_type) Type(enum float_abi_type)
 Known floating-point ABIs (for use with the -mfloat-abi= option):
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 95a314f..0959ff5 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5437,12 +5437,29 @@ split_double (rtx value, rtx *first, rtx *second)
     }
 }
 
+/* Return true if X is a sign_extract or zero_extract from the least
+   significant bit.  */
+
+static bool
+lsb_bitfield_op_p (rtx x)
+{
+  if (GET_RTX_CLASS (GET_CODE (x)) == RTX_BITFIELD_OPS)
+    {
+      enum machine_mode mode = GET_MODE(x);
+      unsigned HOST_WIDE_INT len = INTVAL (XEXP (x, 1));
+      HOST_WIDE_INT pos = INTVAL (XEXP (x, 2));
+
+      return (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len : 0));
+    }
+  return 0;
+}
+
 /* Strip outer address "mutations" from LOC and return a pointer to the
    inner value.  If OUTER_CODE is nonnull, store the code of the innermost
    stripped expression there.
 
    "Mutations" either convert between modes or apply some kind of
-   alignment.  */
+   extension, truncation or alignment.  */
 
 rtx *
 strip_address_mutations (rtx *loc, enum rtx_code *outer_code)
@@ -5454,6 +5471,10 @@ strip_address_mutations (rtx *loc, enum rtx_code *outer_code)
 	/* Things like SIGN_EXTEND, ZERO_EXTEND and TRUNCATE can be
 	   used to convert between pointer sizes.  */
 	loc = &XEXP (*loc, 0);
+      else if (lsb_bitfield_op_p (*loc))
+	/* Bitfield operations [SIGN|ZERO]_EXTRACT from the least significant
+	   bit can be used too.  */
+	loc = &XEXP (*loc, 0);
       else if (code == AND && CONST_INT_P (XEXP (*loc, 1)))
 	/* (and ... (const_int -X)) is used to align to X bytes.  */
 	loc = &XEXP (*loc, 0);
@@ -5470,6 +5491,18 @@ strip_address_mutations (rtx *loc, enum rtx_code *outer_code)
     }
 }
 
+/* Return true if X is a shifting operation.  */
+
+static bool
+shift_code_p (rtx x)
+{
+  return (GET_CODE (x) == ASHIFT
+          || GET_CODE (x) == ASHIFTRT
+          || GET_CODE (x) == LSHIFTRT
+          || GET_CODE (x) == ROTATE
+          || GET_CODE (x) == ROTATERT);
+}
+
 /* Return true if X must be a base rather than an index.  */
 
 static bool
@@ -5483,7 +5516,7 @@ must_be_base_p (rtx x)
 static bool
 must_be_index_p (rtx x)
 {
-  return GET_CODE (x) == MULT || GET_CODE (x) == ASHIFT;
+  return GET_CODE (x) == MULT || shift_code_p (x);
 }
 
 /* Set the segment part of address INFO to LOC, given that INNER is the
@@ -5522,7 +5555,7 @@ set_address_base (struct address_info *info, rtx *loc, rtx *inner)
 static void
 set_address_index (struct address_info *info, rtx *loc, rtx *inner)
 {
-  if ((GET_CODE (*inner) == MULT || GET_CODE (*inner) == ASHIFT)
+  if ((GET_CODE (*inner) == MULT || shift_code_p (*inner))
       && CONSTANT_P (XEXP (*inner, 1)))
     inner = strip_address_mutations (&XEXP (*inner, 0));
   gcc_checking_assert (REG_P (*inner)


* Re: RFC: patch to build GCC for arm with LRA
  2013-09-11 12:02                       ` Yvan Roux
@ 2013-09-11 18:20                         ` Richard Sandiford
  2013-09-11 19:08                           ` Yvan Roux
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Sandiford @ 2013-09-11 18:20 UTC (permalink / raw)
  To: Yvan Roux
  Cc: Vladimir Makarov, Richard Earnshaw, gcc-patches,
	Marcus Shawcroft, Ramana Radhakrishnan, Matthew Gretton-Dann,
	Richard Henderson

Yvan Roux <yvan.roux@linaro.org> writes:
> Yes indeed ! here is a fixed patch.
>
> In strip_address_mutations we now have 3 if/else if statements with
> the same body which could be factorized in:
>
>       if ((GET_RTX_CLASS (code) == RTX_UNARY)
>           /* Things like SIGN_EXTEND, ZERO_EXTEND and TRUNCATE can be
>              used to convert between pointer sizes.  */
>           || (lsb_bitfield_op_p (*loc))
>           /* Bitfield operations [SIGN|ZERO]_EXTRACT from the least significant
>              bit can be used too.  */
>           || (code == AND && CONST_INT_P (XEXP (*loc, 1))))
>           /* (and ... (const_int -X)) is used to align to X bytes.  */
>         loc = &XEXP (*loc, 0);
>
> if you think that it doesn't affect too much the readability.

Yeah, good point.  TBH I prefer it with separate ifs though, because the
three cases are dealing with three different types of rtl (unary, binary
and ternary).  But I don't mind much either way.

The new patch looks good to me, thanks.  Just one minor style nit:
"return false" rather than "return 0" for the bool.  Maybe also change:

	/* Bitfield operations [SIGN|ZERO]_EXTRACT from the least significant
	   bit can be used too.  */

to something like:

	/* A [SIGN|ZERO]_EXTRACT from the least significant bit effectively
	   acts as a combined truncation and extension.  */

I really will try to make that my last comment and leave things open
for an official review :-)

Thanks,
Richard


* Re: RFC: patch to build GCC for arm with LRA
  2013-09-11 18:20                         ` Richard Sandiford
@ 2013-09-11 19:08                           ` Yvan Roux
  2013-09-23 11:19                             ` Yvan Roux
  0 siblings, 1 reply; 21+ messages in thread
From: Yvan Roux @ 2013-09-11 19:08 UTC (permalink / raw)
  To: Yvan Roux, Vladimir Makarov, Richard Earnshaw, gcc-patches,
	Marcus Shawcroft, Ramana Radhakrishnan, Matthew Gretton-Dann,
	Richard Henderson, rdsandiford

> Yeah, good point.  TBH I prefer it with separate ifs though, because the
> three cases are dealing with three different types of rtl (unary, binary
> and ternary).  But I don't mind much either way.

OK, that's fine with me too.

> The new patch looks good to me, thanks.  Just one minor style nit:
> "return false" rather than "return 0" for the bool.  Maybe also change:
>
>         /* Bitfield operations [SIGN|ZERO]_EXTRACT from the least significant
>            bit can be used too.  */
>
> to something like:
>
>         /* A [SIGN|ZERO]_EXTRACT from the least significant bit effectively
>            acts as a combined truncation and extension.  */

Yeah, it's clearer.  I'll post the new patch in the other thread.

> I really will try to make that my last comment and leave things open
> for an official review :-)

:-) once again many thanks for your help Richard.

Cheers,
Yvan


* Re: RFC: patch to build GCC for arm with LRA
  2013-09-11 19:08                           ` Yvan Roux
@ 2013-09-23 11:19                             ` Yvan Roux
  2013-09-23 15:20                               ` Ramana Radhakrishnan
  0 siblings, 1 reply; 21+ messages in thread
From: Yvan Roux @ 2013-09-23 11:19 UTC (permalink / raw)
  To: Yvan Roux, Vladimir Makarov, Richard Earnshaw, gcc-patches,
	Marcus Shawcroft, Ramana Radhakrishnan, Matthew Gretton-Dann,
	Richard Henderson, rdsandiford

Hi,

here is a status update on LRA for ARM, after Vladimir's patch and a
rebase of my branch:

AArch64:

No more issues in the testsuite.  The libstdc++ failures weren't
LRA-specific: they happened at the pass manager level, were due to PCH
inclusion, and were fixed by the trunk update.  Thus I'll post a patch
to enable LRA on AArch64, but the RTL analyser patch still needs to be
accepted first.

AArch32:

No more issues in libstdc++ either (for the same reasons as on
AArch64), and only 3 failures in the testsuite:
- The first one is invalid: the test scans the assembler for
"ldaex\tr\[0-9\]+..." and it fails because with LRA the chosen
register is r12, so the instruction is "ldaex  ip,..."
- The two others are the same bug, where LRA keeps some REG_NOTES (DEAD
and UNUSED) on a comparison pattern where reload removes them, with the
result that the comparison is deleted.  I'm currently working on this
issue (see the sketch after this list).
- Thumb still doesn't bootstrap.
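
For the REG_NOTES item, the problematic shape is roughly the following
(a made-up RTL fragment, insn and register numbers invented for
illustration):

    (insn 42 41 43 2 (set (reg:CC 24 cc)
            (compare:CC (reg:SI 134)
                (const_int 0)))
         (expr_list:REG_UNUSED (reg:CC 24 cc)
            (nil)))

The stale REG_UNUSED note claims the CC result is never read, so a
later pass feels free to delete the comparison even though the flags
are in fact consumed by a following conditional.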

Thanks,
Yvan




* Re: RFC: patch to build GCC for arm with LRA
  2013-09-23 11:19                             ` Yvan Roux
@ 2013-09-23 15:20                               ` Ramana Radhakrishnan
  2013-09-24  9:50                                 ` Yvan Roux
  0 siblings, 1 reply; 21+ messages in thread
From: Ramana Radhakrishnan @ 2013-09-23 15:20 UTC (permalink / raw)
  To: Yvan Roux
  Cc: Vladimir Makarov, Richard Earnshaw, gcc-patches,
	Marcus Shawcroft, Matthew Gretton-Dann, Richard Henderson,
	rdsandiford


>
> AArch32:
>
> No more issues in libstdc++ as well (same as reasons as AArch64), and
> only 3 failures in the testsuite:
> - The first one is invalid: the test scans the assembler for
> "ldaex\tr\[0-9\]+..." and it fails because with LRA the chosen
> register is r12, so the instruction is "ldaex  ip,..."

Fair enough - we should just fix the test and move on.
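
Something along these lines should do it (a hypothetical adjustment;
the actual pattern in the test may need more care):

    /* Accept the ip alias for r12 as well as the numbered names.  */
    /* { dg-final { scan-assembler "ldaex\t(r\[0-9\]+|ip)" } } */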

> - The two others are the same bug, where LRA keeps some REG_NOTES (DEAD
> and UNUSED) on a comparison pattern where reload removes them, with the
> result that the comparison is deleted.  I'm currently working on this
> issue.

> - Thumb still doesn't bootstrap.

In addition, I would suggest a transitional command-line option to
switch between LRA and reload, as a temporary measure, so that folks
can do some more experimenting on AArch32.
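
With the mlra entries from the patch (Init(1), so LRA stays the
default), the comparison is presumably just a flag flip, e.g.:

    arm-gcc -O2 -mlra test.c      # LRA instead of reload
    arm-gcc -O2 -mno-lra test.c   # classic reload, for comparison

(arm-gcc here stands for whichever arm-*-gcc driver is under test.)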

I don't see why such a command-line option is wrong for A64 in
principle, but that's something for Marcus / Richard to comment on.


regards
Ramana



* Re: RFC: patch to build GCC for arm with LRA
  2013-09-23 15:20                               ` Ramana Radhakrishnan
@ 2013-09-24  9:50                                 ` Yvan Roux
  0 siblings, 0 replies; 21+ messages in thread
From: Yvan Roux @ 2013-09-24  9:50 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: Vladimir Makarov, Richard Earnshaw, gcc-patches,
	Marcus Shawcroft, Matthew Gretton-Dann, Richard Henderson,
	rdsandiford

Hi,

> Fair enough - we should just fix the test and move on.

Done.

> I would suggest in addition a transitional command-line option to switch
> between LRA and reload as a temporary measure so that folks can do some more
> experimenting for AArch32.

I have a patch which fixes the REG_NOTE issues; it's still under
validation, but seems to work.  I'll add the -mlra option to switch
between LRA and reload, but my question is: will LRA be enabled by
default or not, as we still have the Thumb issue?  Maybe switching LRA
on only in ARM mode is a good trade-off, but in that case what is the
best way to do it?  (I'll run a full validation once we agree on that
point.)
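
A minimal sketch of what I have in mind for the ARM-only default,
assuming gating on TARGET_THUMB is acceptable (not validated yet):

    /* Return true if we use LRA instead of reload pass.  */
    static bool
    arm_lra_p (void)
    {
      /* Thumb doesn't bootstrap with LRA yet; keep reload there
         until that is fixed.  */
      return arm_lra_flag && !TARGET_THUMB;
    }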

Thanks,
Yvan

