RFA: another patch to fix PR61360

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* RFA: another patch to fix PR61360
@ 2014-09-23  1:26 Vladimir Makarov
  2014-09-23  6:07 ` Uros Bizjak
  0 siblings, 1 reply; 9+ messages in thread
From: Vladimir Makarov @ 2014-09-23  1:26 UTC (permalink / raw)
  To: GCC Patches, Uros Bizjak; +Cc: Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 1396 bytes --]

   The previous patch to solve PR61360 fixed the problem in IRA (it was 
easier for me to do as I know the code well)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360

   Although imo it was an ok fix, Richard expressed concerns with the 
patch and the practice to have different enable attribute values 
depending on the current pass.

   I don't understand why "x,m" alternative is better to "x,r" and "x,r" 
should be disabled.  Even if the path from general regs to sse regs is 
slow (usually such slow path is implemented internally by 
micro-architecture through cache).  "x,r" alternative results in only 
smaller insns (including number of insns) with probably the same time 
for the movement.  So "x,r" should be at least no slower, insn cache 
should have more locality, and less overhead for decoding/translating insns.

   Here I propose another solution avoiding to have different enable 
attribute values.

   The patch was successfully bootstrapped on x86/x86-64 and tested with 
and without -march=amdfam10 (actually the patch results in 2 less 
failures when -march=amdfam10 were used).

   Uros, is i386.md change ok for the trunk?

2014-09-22  Vladimir Makarov  <vmakarov@redhat.com>

         PR target/61360
         * lra.c (lra): Remove call of recog_init.
         * config/i386/i386.md (*float<SWI48:mode><MODEF:mode>2_sse):
         Always enable first alternative.


[-- Attachment #2: pr61360-2.patch --]
[-- Type: text/plain, Size: 1467 bytes --]

Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 215337)
+++ config/i386/i386.md	(working copy)
@@ -4795,14 +4795,6 @@
               (symbol_ref "TARGET_MIX_SSE_I387
                            && X87_ENABLE_FLOAT (<MODEF:MODE>mode,
                                                 <SWI48:MODE>mode)")
-            (eq_attr "alternative" "1")
-              /* ??? For sched1 we need constrain_operands to be able to
-                 select an alternative.  Leave this enabled before RA.  */
-              (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
-                           || optimize_function_for_size_p (cfun)
-                           || !(reload_completed
-                                || reload_in_progress
-                                || lra_in_progress)")
            ]
            (symbol_ref "true")))
    ])
Index: lra.c
===================================================================
--- lra.c	(revision 215358)
+++ lra.c	(working copy)
@@ -2135,11 +2135,6 @@ lra (FILE *f)
 
   lra_in_progress = 1;
 
-  /* The enable attributes can change their values as LRA starts
-     although it is a bad practice.  To prevent reuse of the outdated
-     values, clear them.  */
-  recog_init ();
-
   lra_live_range_iter = lra_coalesce_iter = 0;
   lra_constraint_iter = lra_constraint_iter_after_spill = 0;
   lra_inheritance_iter = lra_undo_inheritance_iter = 0;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFA: another patch to fix PR61360
  2014-09-23  1:26 RFA: another patch to fix PR61360 Vladimir Makarov
@ 2014-09-23  6:07 ` Uros Bizjak
  2014-09-23 14:52   ` Vladimir Makarov
  0 siblings, 1 reply; 9+ messages in thread
From: Uros Bizjak @ 2014-09-23  6:07 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: GCC Patches, Richard Sandiford

On Tue, Sep 23, 2014 at 3:26 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>   The previous patch to solve PR61360 fixed the problem in IRA (it was
> easier for me to do as I know the code well)
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360
>
>   Although imo it was an ok fix, Richard expressed concerns with the patch
> and the practice to have different enable attribute values depending on the
> current pass.
>
>   I don't understand why "x,m" alternative is better to "x,r" and "x,r"
> should be disabled.  Even if the path from general regs to sse regs is slow
> (usually such slow path is implemented internally by micro-architecture
> through cache).  "x,r" alternative results in only smaller insns (including
> number of insns) with probably the same time for the movement.  So "x,r"
> should be at least no slower, insn cache should have more locality, and less
> overhead for decoding/translating insns.
>
>   Here I propose another solution avoiding to have different enable
> attribute values.
>
>   The patch was successfully bootstrapped on x86/x86-64 and tested with and
> without -march=amdfam10 (actually the patch results in 2 less failures when
> -march=amdfam10 were used).
>
>   Uros, is i386.md change ok for the trunk?

I don't think so. This would be a regression, since 4.8 (and later
versions until Richard's patch) were able to handle this functionality
just fine. Please also note that there are a couple of other patterns
with the same problem, that is using ("nonimmediate_operand" "m")
constraint.

Please see PR 60704, comment 7 [1]. If LRA is able to fixup
("nonimmediate_operand" "m") to a memory from a register, then other
RTL infrastructure should also be updated for this functionality. IMO,
recog should be fixed/enhanced, so pseudo registers will also satisfy
("nonimmediate_operand" "m") constraint before LRA.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60704#c7

Uros.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFA: another patch to fix PR61360
  2014-09-23  6:07 ` Uros Bizjak
@ 2014-09-23 14:52   ` Vladimir Makarov
  2014-09-23 15:03     ` Uros Bizjak
  0 siblings, 1 reply; 9+ messages in thread
From: Vladimir Makarov @ 2014-09-23 14:52 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches, Richard Sandiford

On 09/23/2014 02:07 AM, Uros Bizjak wrote:
> On Tue, Sep 23, 2014 at 3:26 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>   The previous patch to solve PR61360 fixed the problem in IRA (it was
>> easier for me to do as I know the code well)
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360
>>
>>   Although imo it was an ok fix, Richard expressed concerns with the patch
>> and the practice to have different enable attribute values depending on the
>> current pass.
>>
>>   I don't understand why "x,m" alternative is better to "x,r" and "x,r"
>> should be disabled.  Even if the path from general regs to sse regs is slow
>> (usually such slow path is implemented internally by micro-architecture
>> through cache).  "x,r" alternative results in only smaller insns (including
>> number of insns) with probably the same time for the movement.  So "x,r"
>> should be at least no slower, insn cache should have more locality, and less
>> overhead for decoding/translating insns.
>>
>>   Here I propose another solution avoiding to have different enable
>> attribute values.
>>
>>   The patch was successfully bootstrapped on x86/x86-64 and tested with and
>> without -march=amdfam10 (actually the patch results in 2 less failures when
>> -march=amdfam10 were used).
>>
>>   Uros, is i386.md change ok for the trunk?
> I don't think so. This would be a regression, since 4.8 (and later
> versions until Richard's patch) were able to handle this functionality
> just fine. Please also note that there are a couple of other patterns
> with the same problem, that is using ("nonimmediate_operand" "m")
> constraint.
>
> Please see PR 60704, comment 7 [1]. If LRA is able to fixup
> ("nonimmediate_operand" "m") to a memory from a register, then other
> RTL infrastructure should also be updated for this functionality. IMO,
> recog should be fixed/enhanced, so pseudo registers will also satisfy
> ("nonimmediate_operand" "m") constraint before LRA.
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60704#c7
>
>
Uros, my patch does not result in PR60704 (I tested it before submitting
the patch).

Also I don't understand why you are mentioning only "m" alternative as I
enabled another alternative "x,r" in original pattern (alternative "1"
in other words alternative in the middle).

(define_insn "*float<SWI48:mode><MODEF:mode>2_sse"
  [(set (match_operand:MODEF 0 "register_operand" "=f,x,x")
	(float:MODEF
	  (match_operand:SWI48 1 "nonimmediate_operand" "m,r,m")))]
  "SSE_FLOAT_MODE_P (<MODEF:MODE>mode) && TARGET_SSE_MATH"
  "@
   fild%Z1\t%1
   %vcvtsi2<MODEF:ssemodesuffix><SWI48:rex64suffix>\t{%1, %d0|%d0, %1}
   %vcvtsi2<MODEF:ssemodesuffix><SWI48:rex64suffix>\t{%1, %d0|%d0, %1}"
  [(set_attr "type" "fmov,sseicvt,sseicvt")
   (set_attr "prefix" "orig,maybe_vex,maybe_vex")
   (set_attr "mode" "<MODEF:MODE>")
   (set (attr "prefix_rex")
     (if_then_else
       (and (eq_attr "prefix" "maybe_vex")
	    (match_test "<SWI48:MODE>mode == DImode"))
       (const_string "1")
       (const_string "*")))
   (set_attr "unit" "i387,*,*")
   (set_attr "athlon_decode" "*,double,direct")
   (set_attr "amdfam10_decode" "*,vector,double")
   (set_attr "bdver1_decode" "*,double,direct")
   (set_attr "fp_int_src" "true")
   (set (attr "enabled")
     (cond [(eq_attr "alternative" "0")
              (symbol_ref "TARGET_MIX_SSE_I387
                           && X87_ENABLE_FLOAT (<MODEF:MODE>mode,
                                                <SWI48:MODE>mode)")
            (eq_attr "alternative" "1")
              /* ??? For sched1 we need constrain_operands to be able to
                 select an alternative.  Leave this enabled before RA.  */
              (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
                           || optimize_function_for_size_p (cfun)
                           || !(reload_completed
                                || reload_in_progress
                                || lra_in_progress)")
           ]
           (symbol_ref "true")))
   ])

What you are saying would be true if I enabled "x,m" and it was single alternative in insn because constrain_operands rejects such instruction until pseudo is changed by memory.

You are right constrain_operands is not upto LRA possibilities and we should make the following change:

Index: recog.c
===================================================================
--- recog.c     (revision 215337)
+++ recog.c     (working copy)
@@ -2639,7 +2639,10 @@ constrain_operands (int strict)
                               || (strict < 0 && CONSTANT_P (op))
                               /* During reload, accept a pseudo  */
                               || (reload_in_progress && REG_P (op)
-                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)))
+                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)
+                              /* LRA can put reg value into memory if
+                                 it is necessary.  */
+                              || (strict <= 0 && targetm.lra_p () && REG_P (op)))
                    win = 1;
                  else if (insn_extra_address_constraint (cn)
                           /* Every address operand can be reloaded to fit.  */

But that is a different story (for insns with single alternative containing only "m").

I guess I should submit such change for recog.c as a separate patch.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFA: another patch to fix PR61360
  2014-09-23 14:52   ` Vladimir Makarov
@ 2014-09-23 15:03     ` Uros Bizjak
  2014-09-23 15:23       ` Vladimir Makarov
  2014-09-23 15:23       ` Uros Bizjak
  0 siblings, 2 replies; 9+ messages in thread
From: Uros Bizjak @ 2014-09-23 15:03 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: GCC Patches, Richard Sandiford

On Tue, Sep 23, 2014 at 4:52 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> On 09/23/2014 02:07 AM, Uros Bizjak wrote:
>> On Tue, Sep 23, 2014 at 3:26 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>>   The previous patch to solve PR61360 fixed the problem in IRA (it was
>>> easier for me to do as I know the code well)
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360
>>>
>>>   Although imo it was an ok fix, Richard expressed concerns with the patch
>>> and the practice to have different enable attribute values depending on the
>>> current pass.
>>>
>>>   I don't understand why "x,m" alternative is better to "x,r" and "x,r"
>>> should be disabled.  Even if the path from general regs to sse regs is slow
>>> (usually such slow path is implemented internally by micro-architecture
>>> through cache).  "x,r" alternative results in only smaller insns (including
>>> number of insns) with probably the same time for the movement.  So "x,r"
>>> should be at least no slower, insn cache should have more locality, and less
>>> overhead for decoding/translating insns.
>>>
>>>   Here I propose another solution avoiding to have different enable
>>> attribute values.
>>>
>>>   The patch was successfully bootstrapped on x86/x86-64 and tested with and
>>> without -march=amdfam10 (actually the patch results in 2 less failures when
>>> -march=amdfam10 were used).
>>>
>>>   Uros, is i386.md change ok for the trunk?
>> I don't think so. This would be a regression, since 4.8 (and later
>> versions until Richard's patch) were able to handle this functionality
>> just fine. Please also note that there are a couple of other patterns
>> with the same problem, that is using ("nonimmediate_operand" "m")
>> constraint.
>>
>> Please see PR 60704, comment 7 [1]. If LRA is able to fixup
>> ("nonimmediate_operand" "m") to a memory from a register, then other
>> RTL infrastructure should also be updated for this functionality. IMO,
>> recog should be fixed/enhanced, so pseudo registers will also satisfy
>> ("nonimmediate_operand" "m") constraint before LRA.
>>
>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60704#c7
>>
>>
> Uros, my patch does not result in PR60704 (I tested it before submitting
> the patch).

No, we didn't understand each other. The fix for PR60704 (enablement
of register alternative before LRA) caused PR61360. As I argued in the
referred comment of PR61360, the original fix was wrong, and should be
reverted. So, the condition should simply read:

>    (set (attr "enabled")
>      (cond [(eq_attr "alternative" "0")
>               (symbol_ref "TARGET_MIX_SSE_I387
>                            && X87_ENABLE_FLOAT (<MODEF:MODE>mode,
>                                                 <SWI48:MODE>mode)")
>             (eq_attr "alternative" "1")
>               (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
>                            || optimize_function_for_size_p (cfun)")
>            ]
>            (symbol_ref "true")))
>
> Also I don't understand why you are mentioning only "m" alternative as I
> enabled another alternative "x,r" in original pattern (alternative "1"
> in other words alternative in the middle).

This part of the comment refers to:

(define_insn "*float<SWI48x:mode><MODEF:mode>2_i387"

(as mention in the #c7 comment of PR60704).

> You are right constrain_operands is not upto LRA possibilities and we should make the following change:
>
> Index: recog.c
> ===================================================================
> --- recog.c     (revision 215337)
> +++ recog.c     (working copy)
> @@ -2639,7 +2639,10 @@ constrain_operands (int strict)
>                                || (strict < 0 && CONSTANT_P (op))
>                                /* During reload, accept a pseudo  */
>                                || (reload_in_progress && REG_P (op)
> -                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)))
> +                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)
> +                              /* LRA can put reg value into memory if
> +                                 it is necessary.  */
> +                              || (strict <= 0 && targetm.lra_p () && REG_P (op)))
>                     win = 1;
>                   else if (insn_extra_address_constraint (cn)
>                            /* Every address operand can be reloaded to fit.  */
>
> But that is a different story (for insns with single alternative containing only "m").
>
> I guess I should submit such change for recog.c as a separate patch.

I think that the above is the right approach to fix PR60704, so the
current PR60704 fix [1] should be reverted.

[1] https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.md?r1=208989&r2=208988&pathrev=208989

Uros.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFA: another patch to fix PR61360
  2014-09-23 15:03     ` Uros Bizjak
@ 2014-09-23 15:23       ` Vladimir Makarov
  2014-09-23 15:33         ` Uros Bizjak
  2014-09-23 15:23       ` Uros Bizjak
  1 sibling, 1 reply; 9+ messages in thread
From: Vladimir Makarov @ 2014-09-23 15:23 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: GCC Patches, Richard Sandiford

On 09/23/2014 11:02 AM, Uros Bizjak wrote:
> On Tue, Sep 23, 2014 at 4:52 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>> On 09/23/2014 02:07 AM, Uros Bizjak wrote:
>>>
>> Uros, my patch does not result in PR60704 (I tested it before submitting
>> the patch).
> No, we didn't understand each other. The fix for PR60704 (enablement
> of register alternative before LRA) caused PR61360. As I argued in the
> referred comment of PR61360, the original fix was wrong, and should be
> reverted. So, the condition should simply read:
>
>>    (set (attr "enabled")
>>      (cond [(eq_attr "alternative" "0")
>>               (symbol_ref "TARGET_MIX_SSE_I387
>>                            && X87_ENABLE_FLOAT (<MODEF:MODE>mode,
>>                                                 <SWI48:MODE>mode)")
>>             (eq_attr "alternative" "1")
>>               (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
>>                            || optimize_function_for_size_p (cfun)")
>>            ]
>>            (symbol_ref "true")))
>>
>> Also I don't understand why you are mentioning only "m" alternative as I
>> enabled another alternative "x,r" in original pattern (alternative "1"
>> in other words alternative in the middle).
> This part of the comment refers to:
>
> (define_insn "*float<SWI48x:mode><MODEF:mode>2_i387"
>
> (as mention in the #c7 comment of PR60704).

Ok, I see.
>> You are right constrain_operands is not upto LRA possibilities and we should make the following change:
>>
>> Index: recog.c
>> ===================================================================
>> --- recog.c     (revision 215337)
>> +++ recog.c     (working copy)
>> @@ -2639,7 +2639,10 @@ constrain_operands (int strict)
>>                                || (strict < 0 && CONSTANT_P (op))
>>                                /* During reload, accept a pseudo  */
>>                                || (reload_in_progress && REG_P (op)
>> -                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)))
>> +                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)
>> +                              /* LRA can put reg value into memory if
>> +                                 it is necessary.  */
>> +                              || (strict <= 0 && targetm.lra_p () && REG_P (op)))
>>                     win = 1;
>>                   else if (insn_extra_address_constraint (cn)
>>                            /* Every address operand can be reloaded to fit.  */
>>
>> But that is a different story (for insns with single alternative containing only "m").
>>
>> I guess I should submit such change for recog.c as a separate patch.
> I think that the above is the right approach to fix PR60704, so the
> current PR60704 fix [1] should be reverted.
>
> [1] https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.md?r1=208989&r2=208988&pathrev=208989
>
>
Ok. I can submit patch reverting it + the change in recog.c.

I have still a question: do we really need

(eq_attr "alternative" "1")
              (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
                           || optimize_function_for_size_p (cfun)")

As I wrote I'd always enable the alternative.  I don't expect performance improvement in disabling this alternative when path r->x is slow (as I heard it is implemented internally by moving through cache anyway).  Even it is slow I believe it is still not faster than r->m->x.  What do you think?



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFA: another patch to fix PR61360
  2014-09-23 15:23       ` Vladimir Makarov
@ 2014-09-23 15:33         ` Uros Bizjak
  2014-09-23 16:40           ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Uros Bizjak @ 2014-09-23 15:33 UTC (permalink / raw)
  To: Vladimir Makarov
  Cc: GCC Patches, Richard Sandiford, Gopalasubramanian, Ganesh

On Tue, Sep 23, 2014 at 5:22 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:

>>> You are right constrain_operands is not upto LRA possibilities and we should make the following change:
>>>
>>> Index: recog.c
>>> ===================================================================
>>> --- recog.c     (revision 215337)
>>> +++ recog.c     (working copy)
>>> @@ -2639,7 +2639,10 @@ constrain_operands (int strict)
>>>                                || (strict < 0 && CONSTANT_P (op))
>>>                                /* During reload, accept a pseudo  */
>>>                                || (reload_in_progress && REG_P (op)
>>> -                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)))
>>> +                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)
>>> +                              /* LRA can put reg value into memory if
>>> +                                 it is necessary.  */
>>> +                              || (strict <= 0 && targetm.lra_p () && REG_P (op)))
>>>                     win = 1;
>>>                   else if (insn_extra_address_constraint (cn)
>>>                            /* Every address operand can be reloaded to fit.  */
>>>
>>> But that is a different story (for insns with single alternative containing only "m").
>>>
>>> I guess I should submit such change for recog.c as a separate patch.
>> I think that the above is the right approach to fix PR60704, so the
>> current PR60704 fix [1] should be reverted.
>>
>> [1] https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.md?r1=208989&r2=208988&pathrev=208989
>>
>>
> Ok. I can submit patch reverting it + the change in recog.c.
>
> I have still a question: do we really need
>
> (eq_attr "alternative" "1")
>               (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
>                            || optimize_function_for_size_p (cfun)")
>
> As I wrote I'd always enable the alternative.  I don't expect performance improvement in disabling this alternative when path r->x is slow (as I heard it is implemented internally by moving through cache anyway).  Even it is slow I believe it is still not faster than r->m->x.  What do you think?

The "r->x" alternative results in "vector" decoding on amdfam10. This
is AMD-speak for microcoded instructions, and AMD optimization manual
strongly recommends avoiding them. I have CC'd Ganesh, maybe he can
provide more relevant data on the performance impact.

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFA: another patch to fix PR61360
  2014-09-23 15:33         ` Uros Bizjak
@ 2014-09-23 16:40           ` Richard Biener
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Biener @ 2014-09-23 16:40 UTC (permalink / raw)
  To: Uros Bizjak, Vladimir Makarov
  Cc: GCC Patches, Richard Sandiford, Gopalasubramanian, Ganesh

On September 23, 2014 5:33:35 PM CEST, Uros Bizjak <ubizjak@gmail.com> wrote:
>On Tue, Sep 23, 2014 at 5:22 PM, Vladimir Makarov <vmakarov@redhat.com>
>wrote:
>
>>>> You are right constrain_operands is not upto LRA possibilities and
>we should make the following change:
>>>>
>>>> Index: recog.c
>>>> ===================================================================
>>>> --- recog.c     (revision 215337)
>>>> +++ recog.c     (working copy)
>>>> @@ -2639,7 +2639,10 @@ constrain_operands (int strict)
>>>>                                || (strict < 0 && CONSTANT_P (op))
>>>>                                /* During reload, accept a pseudo 
>*/
>>>>                                || (reload_in_progress && REG_P (op)
>>>> -                                  && REGNO (op) >=
>FIRST_PSEUDO_REGISTER)))
>>>> +                                  && REGNO (op) >=
>FIRST_PSEUDO_REGISTER)
>>>> +                              /* LRA can put reg value into memory
>if
>>>> +                                 it is necessary.  */
>>>> +                              || (strict <= 0 && targetm.lra_p ()
>&& REG_P (op)))
>>>>                     win = 1;
>>>>                   else if (insn_extra_address_constraint (cn)
>>>>                            /* Every address operand can be reloaded
>to fit.  */
>>>>
>>>> But that is a different story (for insns with single alternative
>containing only "m").
>>>>
>>>> I guess I should submit such change for recog.c as a separate
>patch.
>>> I think that the above is the right approach to fix PR60704, so the
>>> current PR60704 fix [1] should be reverted.
>>>
>>> [1]
>https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.md?r1=208989&r2=208988&pathrev=208989
>>>
>>>
>> Ok. I can submit patch reverting it + the change in recog.c.
>>
>> I have still a question: do we really need
>>
>> (eq_attr "alternative" "1")
>>               (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
>>                            || optimize_function_for_size_p (cfun)")
>>
>> As I wrote I'd always enable the alternative.  I don't expect
>performance improvement in disabling this alternative when path r->x is
>slow (as I heard it is implemented internally by moving through cache
>anyway).  Even it is slow I believe it is still not faster than
>r->m->x.  What do you think?
>
>The "r->x" alternative results in "vector" decoding on amdfam10. This
>is AMD-speak for microcoded instructions, and AMD optimization manual
>strongly recommends avoiding them. I have CC'd Ganesh, maybe he can
>provide more relevant data on the performance impact.

IIRC a vector decoded instruction merely limits the frontend which can at most decode and dispatch one such insn at a time. So the performance impact depends on the context.

Richard.

>Thanks,
>Uros.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFA: another patch to fix PR61360
  2014-09-23 15:03     ` Uros Bizjak
  2014-09-23 15:23       ` Vladimir Makarov
@ 2014-09-23 15:23       ` Uros Bizjak
  1 sibling, 0 replies; 9+ messages in thread
From: Uros Bizjak @ 2014-09-23 15:23 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: GCC Patches, Richard Sandiford

On Tue, Sep 23, 2014 at 5:02 PM, Uros Bizjak <ubizjak@gmail.com> wrote:

>>> On Tue, Sep 23, 2014 at 3:26 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>>>   The previous patch to solve PR61360 fixed the problem in IRA (it was
>>>> easier for me to do as I know the code well)
>>>>
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360
>>>>
>>>>   Although imo it was an ok fix, Richard expressed concerns with the patch
>>>> and the practice to have different enable attribute values depending on the
>>>> current pass.
>>>>
>>>>   I don't understand why "x,m" alternative is better to "x,r" and "x,r"
>>>> should be disabled.  Even if the path from general regs to sse regs is slow
>>>> (usually such slow path is implemented internally by micro-architecture
>>>> through cache).  "x,r" alternative results in only smaller insns (including
>>>> number of insns) with probably the same time for the movement.  So "x,r"
>>>> should be at least no slower, insn cache should have more locality, and less
>>>> overhead for decoding/translating insns.
>>>>
>>>>   Here I propose another solution avoiding to have different enable
>>>> attribute values.
>>>>
>>>>   The patch was successfully bootstrapped on x86/x86-64 and tested with and
>>>> without -march=amdfam10 (actually the patch results in 2 less failures when
>>>> -march=amdfam10 were used).
>>>>
>>>>   Uros, is i386.md change ok for the trunk?
>>> I don't think so. This would be a regression, since 4.8 (and later
>>> versions until Richard's patch) were able to handle this functionality
>>> just fine. Please also note that there are a couple of other patterns
>>> with the same problem, that is using ("nonimmediate_operand" "m")
>>> constraint.
>>>
>>> Please see PR 60704, comment 7 [1]. If LRA is able to fixup
>>> ("nonimmediate_operand" "m") to a memory from a register, then other
>>> RTL infrastructure should also be updated for this functionality. IMO,
>>> recog should be fixed/enhanced, so pseudo registers will also satisfy
>>> ("nonimmediate_operand" "m") constraint before LRA.
>>>
>>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60704#c7
>>>
>>>
>> Uros, my patch does not result in PR60704 (I tested it before submitting
>> the patch).
>
> No, we didn't understand each other. The fix for PR60704 (enablement
> of register alternative before LRA) caused PR61360. As I argued in the
> referred comment of PR61360, the original fix was wrong, and should be
> reverted. So, the condition should simply read:
>
>>    (set (attr "enabled")
>>      (cond [(eq_attr "alternative" "0")
>>               (symbol_ref "TARGET_MIX_SSE_I387
>>                            && X87_ENABLE_FLOAT (<MODEF:MODE>mode,
>>                                                 <SWI48:MODE>mode)")
>>             (eq_attr "alternative" "1")
>>               (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
>>                            || optimize_function_for_size_p (cfun)")
>>            ]
>>            (symbol_ref "true")))
>>
>> Also I don't understand why you are mentioning only "m" alternative as I
>> enabled another alternative "x,r" in original pattern (alternative "1"
>> in other words alternative in the middle).
>
> This part of the comment refers to:
>
> (define_insn "*float<SWI48x:mode><MODEF:mode>2_i387"
>
> (as mention in the #c7 comment of PR60704).
>
>> You are right constrain_operands is not upto LRA possibilities and we should make the following change:
>>
>> Index: recog.c
>> ===================================================================
>> --- recog.c     (revision 215337)
>> +++ recog.c     (working copy)
>> @@ -2639,7 +2639,10 @@ constrain_operands (int strict)
>>                                || (strict < 0 && CONSTANT_P (op))
>>                                /* During reload, accept a pseudo  */
>>                                || (reload_in_progress && REG_P (op)
>> -                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)))
>> +                                  && REGNO (op) >= FIRST_PSEUDO_REGISTER)
>> +                              /* LRA can put reg value into memory if
>> +                                 it is necessary.  */
>> +                              || (strict <= 0 && targetm.lra_p () && REG_P (op)))
>>                     win = 1;
>>                   else if (insn_extra_address_constraint (cn)
>>                            /* Every address operand can be reloaded to fit.  */
>>
>> But that is a different story (for insns with single alternative containing only "m").

This is also the situation when the fix for PR60704 is reverted. The
"r" alternative in "*float<SWI48:mode><MODEF:
mode>2_sse" is unconditionally disabled and all remainting alternatives are "m".

Uros.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: RFA: another patch to fix PR61360
@ 2014-09-24 11:36 Gopalasubramanian, Ganesh
  0 siblings, 0 replies; 9+ messages in thread
From: Gopalasubramanian, Ganesh @ 2014-09-24 11:36 UTC (permalink / raw)
  To: Uros Bizjak, Vladimir Makarov; +Cc: GCC Patches, Richard Sandiford

>The "r->x" alternative results in "vector" decoding on amdfam10. This is AMD-speak for microcoded instructions, and AMD optimization manual strongly recommends avoiding them. I have CC'd Ganesh, maybe he >can provide more relevant data on the performance impact.

Thanks Uros!

Yes, the AMD SWOG recommends precisely what Uros mentions.
<snip from SWOG for BD>
When moving data from a GPR to an XMM register, use separate store and load instructions to move
the data first from the source register to a temporary location in memory and then from memory into
the destination register
</snip>

This is listed as an optimization too. This holds good for all amdfam10 and BD  family processors. 
I have to dig through the performance numbers will try to get them.

Regards
Ganesh

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-09-24 11:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-23  1:26 RFA: another patch to fix PR61360 Vladimir Makarov
2014-09-23  6:07 ` Uros Bizjak
2014-09-23 14:52   ` Vladimir Makarov
2014-09-23 15:03     ` Uros Bizjak
2014-09-23 15:23       ` Vladimir Makarov
2014-09-23 15:33         ` Uros Bizjak
2014-09-23 16:40           ` Richard Biener
2014-09-23 15:23       ` Uros Bizjak
2014-09-24 11:36 Gopalasubramanian, Ganesh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).