public inbox for gcc-patches@gcc.gnu.org
* [PATCH][RFC] Add FRE in pass_vectorize
@ 2015-06-10 14:11 Richard Biener
  2015-06-23 20:22 ` Jeff Law
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2015-06-10 14:11 UTC (permalink / raw)
  To: gcc-patches


The following patch adds FRE after vectorization which is needed
for IVOPTs to remove redundant PHI nodes (well, I'm testing a
patch for FRE that will do it already there).

The patch also makes FRE preserve loop-closed SSA form and thus
makes it suitable for use in the loop pipeline.
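
To illustrate the condition the patch tests before it refuses to
eliminate a PHI, here is a minimal standalone model.  Loop, Block
and both helper functions are hypothetical stand-ins written for
this sketch, not GCC's real data structures; the exit-edge test
mirrors what I understand loop_exit_edge_p to check (source inside
the loop, destination outside it).

```cpp
#include <vector>

// Hypothetical, simplified stand-ins for GCC's loop tree and CFG.
struct Loop {
  const Loop *outer;  // enclosing loop, nullptr for the function body
};

struct Block {
  const Loop *loop_father;           // innermost loop containing the block
  std::vector<const Block *> preds;  // predecessor blocks
};

// True if INNER is OUTER itself or is nested somewhere inside OUTER.
inline bool loop_contains(const Loop *outer, const Loop *inner) {
  for (; inner; inner = inner->outer)
    if (inner == outer)
      return true;
  return false;
}

// An edge SRC->DEST exits LOOP if SRC is inside LOOP and DEST is not.
inline bool is_loop_exit_edge(const Loop *loop, const Block *src,
                              const Block *dest) {
  return loop_contains(loop, src->loop_father)
         && !loop_contains(loop, dest->loop_father);
}

// Model of the lc_phi computation: with loop-closed SSA active, a block
// whose single incoming edge leaves a loop holds loop-closed PHI nodes
// that must be kept.
inline bool has_loop_closed_phis(const Block *b, bool loop_closed_ssa) {
  if (!loop_closed_ssa || b->preds.size() != 1)
    return false;
  const Block *src = b->preds[0];
  return is_loop_exit_edge(src->loop_father, src, b);
}
```

In this model a block reached only from a loop header's exit edge is
flagged, while a block inside the loop or a merge block with two
predecessors is not.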

With the placement in the vectorizer sub-pass FRE will effectively
be enabled by -O3 only (well, or if one requests loop vectorization).
I've considered placing it after complete_unroll instead but that
would enable it at -O1 already.  I have no strong opinion on the
exact placement, but it should help all passes between vectorizing
and ivopts for vectorized loops.

Yeah, it adds yet another pass and thus I don't like it very much.
But, for example, it improves the code generated for
gfortran.dg/vect/fast-math-pr37021.f90
from

.L14:
        movupd  (%r11), %xmm3
        addl    $1, %ecx
        addq    %rax, %r11
        movupd  (%r8), %xmm0
        addq    %rax, %r8
        unpckhpd        %xmm3, %xmm3
        movupd  (%rdi), %xmm2
        unpcklpd        %xmm0, %xmm0
        addq    %rsi, %rdi
        movupd  (%rbx), %xmm1
        mulpd   %xmm3, %xmm2
        addq    %rsi, %rbx
        cmpl    %ecx, %ebp
        palignr $8, %xmm1, %xmm1
        mulpd   %xmm1, %xmm0
        movapd  %xmm2, %xmm1
        addpd   %xmm0, %xmm1
        subpd   %xmm2, %xmm0
        shufpd  $2, %xmm0, %xmm1
        addpd   %xmm1, %xmm4
        jne     .L14

to

.L14:
        movupd  (%r8), %xmm0
        addl    $1, %ecx
        addq    %rax, %r8
        movapd  %xmm0, %xmm2
        movupd  (%rdi), %xmm1
        addq    %rsi, %rdi
        cmpl    %ecx, %r11d
        unpckhpd        %xmm0, %xmm2
        unpcklpd        %xmm0, %xmm0
        mulpd   %xmm1, %xmm2
        palignr $8, %xmm1, %xmm1
        mulpd   %xmm1, %xmm0
        movapd  %xmm2, %xmm1
        addpd   %xmm0, %xmm1
        subpd   %xmm2, %xmm0
        shufpd  $2, %xmm0, %xmm1
        addpd   %xmm1, %xmm3
        jne     .L14

(yeah, the vectorizer happily generates redundant loads and one IV
for each such load)
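
As a source-level analogy only (the real redundancy is between
vector loads in GIMPLE, not in user code), the pattern looks
roughly like the first function below, and the effect of cleaning
it up looks like the second.  Both function names are made up for
this sketch.

```cpp
#include <cstddef>

// "Before": two induction variables walk the same addresses, so every
// iteration loads the same element twice.
inline double sum_squares_two_ivs(const double *a, std::size_t n,
                                  std::size_t stride) {
  const double *p = a;  // first IV
  const double *q = a;  // second IV, always equal to p
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += *p * *q;     // two loads from the same address
    p += stride;
    q += stride;
  }
  return sum;
}

// "After": once the second load is recognized as redundant, its IV is
// dead as well, leaving one load and one pointer increment.
inline double sum_squares_one_iv(const double *a, std::size_t n,
                                 std::size_t stride) {
  const double *p = a;
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    const double v = *p;  // single load, reused
    sum += v * v;
    p += stride;
  }
  return sum;
}
```

This matches the shape of the assembly above: the first loop has
four movupd loads with four pointer increments, the second has two
of each.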

Any other suggestions on pass placement?  I can of course key
that FRE run on -O3 explicitly.  Not sure if we at this point
want to start playing fancy games like setting a property
when a pass (likely) generated redundancies that are worth
fixing up and then key FRE on that one (it gets harder and
less predictable what transforms are run on code).
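
For concreteness, the "property" idea could look something like
the toy pass manager below.  Everything in it (FunctionState, Pass,
run_pipeline, make_pipeline and the may_have_redundancies flag) is
hypothetical and invented for this sketch; it is not how GCC's pass
manager or its pass properties are implemented.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-function state carried through the pipeline.
struct FunctionState {
  bool may_have_redundancies = false;  // set by passes that create cleanup work
  std::vector<std::string> log;        // names of the passes that actually ran
};

// Hypothetical pass: a gate deciding whether to run, and the work itself.
struct Pass {
  std::string name;
  std::function<bool(const FunctionState &)> gate;
  std::function<void(FunctionState &)> execute;
};

inline void run_pipeline(const std::vector<Pass> &passes, FunctionState &fn) {
  for (const Pass &p : passes)
    if (p.gate(fn)) {
      p.execute(fn);
      fn.log.push_back(p.name);
    }
}

// The vectorizer stand-in sets the flag only when it transformed a loop;
// the FRE stand-in is gated on the flag and clears it after running.
inline std::vector<Pass> make_pipeline(bool loop_was_vectorized) {
  return {
      {"vect", [](const FunctionState &) { return true; },
       [loop_was_vectorized](FunctionState &fn) {
         if (loop_was_vectorized)
           fn.may_have_redundancies = true;
       }},
      {"fre", [](const FunctionState &fn) { return fn.may_have_redundancies; },
       [](FunctionState &fn) { fn.may_have_redundancies = false; }},
      {"dce", [](const FunctionState &) { return true; },
       [](FunctionState &) {}},
  };
}
```

The downside mentioned above shows up directly: whether "fre"
appears in the log depends on what an earlier pass happened to do,
not only on the command-line options.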

Bootstrap / regtest running on x86_64-unknown-linux-gnu.  With
other placements I'd expect quite some testsuite fallout
eventually.

Thoughts?

Thanks,
Richard.

2015-06-10  Richard Biener  <rguenther@suse.de>

	* passes.def (pass_vectorize): Add pass_fre.
	* tree-ssa-pre.c (eliminate_dom_walker::before_dom_children):
	Preserve loop-closed SSA form.

Index: gcc/passes.def
===================================================================
*** gcc/passes.def	(revision 224324)
--- gcc/passes.def	(working copy)
*************** along with GCC; see the file COPYING3.
*** 252,257 ****
--- 252,258 ----
  	     Please do not add any other passes in between.  */
  	  NEXT_PASS (pass_vectorize);
            PUSH_INSERT_PASSES_WITHIN (pass_vectorize)
+ 	      NEXT_PASS (pass_fre);
  	      NEXT_PASS (pass_dce);
            POP_INSERT_PASSES ()
            NEXT_PASS (pass_predcom);
Index: gcc/tree-ssa-pre.c
===================================================================
*** gcc/tree-ssa-pre.c	(revision 224324)
--- gcc/tree-ssa-pre.c	(working copy)
*************** eliminate_dom_walker::before_dom_childre
*** 4013,4018 ****
--- 4013,4028 ----
       tailmerging.  Eventually we can reduce its reliance on SCCVN now
       that we fully copy/constant-propagate (most) things.  */
  
+   /* Compute whether this block has loop-closed PHI nodes we need
+      to preserve.  */
+   bool lc_phi = false;
+   edge e;
+   if (loops_state_satisfies_p (LOOP_CLOSED_SSA)
+       && single_pred_p (b)
+       && (e = single_pred_edge (b))
+       && loop_exit_edge_p (e->src->loop_father, e))
+     lc_phi = true;
+ 
    for (gphi_iterator gsi = gsi_start_phis (b); !gsi_end_p (gsi);)
      {
        gphi *phi = gsi.phi ();
*************** eliminate_dom_walker::before_dom_childre
*** 4026,4032 ****
  
        tree sprime = eliminate_avail (res);
        if (sprime
! 	  && sprime != res)
  	{
  	  if (dump_file && (dump_flags & TDF_DETAILS))
  	    {
--- 4036,4043 ----
  
        tree sprime = eliminate_avail (res);
        if (sprime
! 	  && sprime != res
! 	  && !lc_phi)
  	{
  	  if (dump_file && (dump_flags & TDF_DETAILS))
  	    {
*************** eliminate_dom_walker::before_dom_childre
*** 4466,4472 ****
  
    /* Replace destination PHI arguments.  */
    edge_iterator ei;
-   edge e;
    FOR_EACH_EDGE (e, ei, b->succs)
      {
        for (gphi_iterator gsi = gsi_start_phis (e->dest);
--- 4477,4482 ----


end of thread, other threads:[~2015-07-02 17:52 UTC | newest]

Thread overview: 7+ messages
2015-06-10 14:11 [PATCH][RFC] Add FRE in pass_vectorize Richard Biener
2015-06-23 20:22 ` Jeff Law
2015-06-24  8:16   ` Richard Biener
2015-06-25  3:40     ` Jeff Law
2015-07-02 11:40       ` Alan Lawrence
2015-07-02 12:11         ` Richard Biener
2015-07-02 17:52         ` Jeff Law
