public inbox for gcc-patches@gcc.gnu.org
* RFA: another patch to fix PR61360
@ 2014-09-23  1:26 Vladimir Makarov
From: Vladimir Makarov @ 2014-09-23  1:26 UTC (permalink / raw)
  To: GCC Patches, Uros Bizjak; +Cc: Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 1396 bytes --]

   The previous patch to solve PR61360 fixed the problem in IRA (it was
easier for me to do as I know the code well):

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61360

   Although IMO it was an OK fix, Richard expressed concerns about the
patch and the practice of having different enable attribute values
depending on the current pass.
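
To illustrate the concern, here is a toy C model (not GCC code; every name in it is hypothetical) of a predicate whose value depends on a pass flag and is cached after first use. It is only a sketch of why a cache reset, such as the recog_init call in lra (), is needed for as long as an enable attribute can change between passes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not GCC's.  */
static bool ra_in_progress = false;   /* stands in for a flag like lra_in_progress */
static bool fast_inter_unit = false;  /* stands in for a target tuning flag */

/* A pass-dependent "enabled" predicate: true before register
   allocation, false once it starts unless the tuning flag is set.  */
static bool alt_enabled_uncached (void)
{
  return fast_inter_unit || !ra_in_progress;
}

static bool cache_valid = false;
static bool cache_value = false;

/* Compute the predicate once and reuse the result afterwards.  */
static bool alt_enabled_cached (void)
{
  if (!cache_valid)
    {
      cache_value = alt_enabled_uncached ();
      cache_valid = true;
    }
  return cache_value;
}

/* Forget the cached value so the next query recomputes it.  */
static void reset_enabled_cache (void)
{
  cache_valid = false;
}

/* Walk through the stale-cache scenario.  */
static void check_stale_cache (void)
{
  assert (alt_enabled_cached ());      /* cached while the flag is clear */
  ra_in_progress = true;
  assert (alt_enabled_cached ());      /* stale: still reports enabled */
  reset_enabled_cache ();
  assert (!alt_enabled_cached ());     /* recomputed after the reset */
}
```

In this model, a predicate that never depends on ra_in_progress would make reset_enabled_cache () unnecessary.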

   I don't understand why the "x,m" alternative is better than "x,r" and
why "x,r" should be disabled, even if the path from general regs to SSE
regs is slow (usually such a slow path is implemented internally by the
micro-architecture through the cache).  The "x,r" alternative only results
in smaller insns (and fewer of them) with probably the same time for the
movement.  So "x,r" should be at least no slower, the insn cache should
have better locality, and there is less overhead for decoding/translating
insns.
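
For reference, a minimal C sketch of source that would typically reach this pattern when SSE math is in use: a signed int or long converted to float or double. The function names are made up for illustration, and which alternative ("x,r" or "x,m") the compiler picks depends on the target and tuning options.

```c
#include <assert.h>

/* Hypothetical examples of signed integer to floating-point
   conversions.  */
double long_to_double (long x) { return (double) x; }
float int_to_float (int x) { return (float) x; }

/* Small self-check: small integers convert exactly.  */
void check_conversions (void)
{
  assert (long_to_double (3) == 3.0);
  assert (int_to_float (-2) == -2.0f);
}
```

Comparing the output of gcc -O2 -S with and without -march=amdfam10 should show whether the conversion takes its source from a general register or from memory.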

   Here I propose another solution that avoids having different enable
attribute values.

   The patch was successfully bootstrapped on x86/x86-64 and tested with
and without -march=amdfam10 (actually the patch results in 2 fewer
failures when -march=amdfam10 is used).

   Uros, is the i386.md change OK for the trunk?

2014-09-22  Vladimir Makarov  <vmakarov@redhat.com>

         PR target/61360
         * lra.c (lra): Remove call of recog_init.
         * config/i386/i386.md (*float<SWI48:mode><MODEF:mode>2_sse):
         Always enable first alternative.


[-- Attachment #2: pr61360-2.patch --]
[-- Type: text/plain, Size: 1467 bytes --]

Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 215337)
+++ config/i386/i386.md	(working copy)
@@ -4795,14 +4795,6 @@
               (symbol_ref "TARGET_MIX_SSE_I387
                            && X87_ENABLE_FLOAT (<MODEF:MODE>mode,
                                                 <SWI48:MODE>mode)")
-            (eq_attr "alternative" "1")
-              /* ??? For sched1 we need constrain_operands to be able to
-                 select an alternative.  Leave this enabled before RA.  */
-              (symbol_ref "TARGET_INTER_UNIT_CONVERSIONS
-                           || optimize_function_for_size_p (cfun)
-                           || !(reload_completed
-                                || reload_in_progress
-                                || lra_in_progress)")
            ]
            (symbol_ref "true")))
    ])
Index: lra.c
===================================================================
--- lra.c	(revision 215358)
+++ lra.c	(working copy)
@@ -2135,11 +2135,6 @@ lra (FILE *f)
 
   lra_in_progress = 1;
 
-  /* The enable attributes can change their values as LRA starts
-     although it is a bad practice.  To prevent reuse of the outdated
-     values, clear them.  */
-  recog_init ();
-
   lra_live_range_iter = lra_coalesce_iter = 0;
   lra_constraint_iter = lra_constraint_iter_after_spill = 0;
   lra_inheritance_iter = lra_undo_inheritance_iter = 0;

* RE: RFA: another patch to fix PR61360
@ 2014-09-24 11:36 Gopalasubramanian, Ganesh
From: Gopalasubramanian, Ganesh @ 2014-09-24 11:36 UTC (permalink / raw)
  To: Uros Bizjak, Vladimir Makarov; +Cc: GCC Patches, Richard Sandiford

> The "r->x" alternative results in "vector" decoding on amdfam10. This is
> AMD-speak for microcoded instructions, and AMD optimization manual
> strongly recommends avoiding them. I have CC'd Ganesh, maybe he can
> provide more relevant data on the performance impact.

Thanks Uros!

Yes, the AMD SWOG recommends precisely what Uros mentions.
<snip from SWOG for BD>
When moving data from a GPR to an XMM register, use separate store and load instructions to move
the data first from the source register to a temporary location in memory and then from memory into
the destination register
</snip>

This is listed as an optimization too. This holds for all amdfam10 and BD family processors.
I have to dig through the performance numbers; I will try to get them.
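
A source-level sketch in C of the store-then-load sequence the SWOG describes. The volatile temporary is only an illustration device (it forces a store and a reload at the source level); the function names are hypothetical, and the actual instruction selection remains up to the compiler.

```c
#include <assert.h>

/* Direct conversion: the compiler may use a GPR source.  */
double convert_direct (long x) { return (double) x; }

/* Conversion through a temporary memory location.  */
double convert_via_memory (long x)
{
  volatile long tmp = x;   /* store to memory */
  return (double) tmp;     /* reload from memory and convert */
}

/* Both routes must produce the same value.  */
void check_same_result (void)
{
  assert (convert_direct (42) == convert_via_memory (42));
}
```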

Regards
Ganesh
