public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] DSE: Use the constant source if possible
@ 2022-05-21  3:01 H.J. Lu
  2022-05-23 10:38 ` Richard Biener
  0 siblings, 1 reply; 21+ messages in thread
From: H.J. Lu @ 2022-05-21  3:01 UTC (permalink / raw)
  To: gcc-patches

When recording store for RTL dead store elimination, check if the source
register is set only once to a constant.  If yes, record the constant
as the store source.  It eliminates unrolled zero stores after memset 0
in a loop where a vector register is used as the zero store source.

gcc/

	PR rtl-optimization/105638
	* dse.cc (record_store): Use the constant source if the source
	register is set only once.

gcc/testsuite/

	PR rtl-optimization/105638
	* g++.target/i386/pr105638.C: New test.
---
 gcc/dse.cc                               | 19 ++++++++++
 gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
 2 files changed, 63 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C

diff --git a/gcc/dse.cc b/gcc/dse.cc
index 30c11cee034..0433dd3d846 100644
--- a/gcc/dse.cc
+++ b/gcc/dse.cc
@@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
 
 	  if (tem && CONSTANT_P (tem))
 	    const_rhs = tem;
+	  else
+	    {
+	      /* If RHS is set only once to a constant, set CONST_RHS
+		 to the constant.  */
+	      df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
+	      if (def != nullptr
+		  && !DF_REF_IS_ARTIFICIAL (def)
+		  && !DF_REF_NEXT_REG (def))
+		{
+		  rtx_insn *def_insn = DF_REF_INSN (def);
+		  rtx def_body = PATTERN (def_insn);
+		  if (GET_CODE (def_body) == SET)
+		    {
+		      rtx def_src = SET_SRC (def_body);
+		      if (CONSTANT_P (def_src))
+			const_rhs = def_src;
+		    }
+		}
+	    }
 	}
     }
 
diff --git a/gcc/testsuite/g++.target/i386/pr105638.C b/gcc/testsuite/g++.target/i386/pr105638.C
new file mode 100644
index 00000000000..ff40a459de1
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr105638.C
@@ -0,0 +1,44 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-std=gnu++20 -O2 -march=skylake" } */
+/* { dg-final { scan-assembler-not "vpxor" } } */
+
+#include <stdint.h>
+#include <vector>
+#include <tr1/array>
+
+class FastBoard {
+public:
+    typedef std::pair<int, int> movescore_t;
+    typedef std::tr1::array<movescore_t, 24> scoredlist_t;
+    
+protected:
+    std::vector<int> m_critical;
+
+    int m_boardsize;    
+};
+
+class FastState {
+public:        
+    FastBoard board;
+    
+    int movenum;              
+protected:
+    FastBoard::scoredlist_t scoredmoves;
+};
+
+class KoState : public FastState {
+private:         
+    std::vector<uint64_t> ko_hash_history;   
+    std::vector<uint64_t> hash_history;     
+};
+
+class GameState : public KoState {
+public:                    
+    void foo ();      
+private:
+    std::vector<KoState> game_history;                          
+};
+
+void GameState::foo() {
+    game_history.resize(movenum);
+}
-- 
2.36.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] DSE: Use the constant source if possible
  2022-05-21  3:01 [PATCH] DSE: Use the constant source if possible H.J. Lu
@ 2022-05-23 10:38 ` Richard Biener
  2022-05-23 18:34   ` [PATCH v2] DSE: Use the constant store " H.J. Lu
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Biener @ 2022-05-23 10:38 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches

On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> When recording store for RTL dead store elimination, check if the source
> register is set only once to a constant.  If yes, record the constant
> as the store source.  It eliminates unrolled zero stores after memset 0
> in a loop where a vector register is used as the zero store source.
>
> gcc/
>
>         PR rtl-optimization/105638
>         * dse.cc (record_store): Use the constant source if the source
>         register is set only once.
>
> gcc/testsuite/
>
>         PR rtl-optimization/105638
>         * g++.target/i386/pr105638.C: New test.
> ---
>  gcc/dse.cc                               | 19 ++++++++++
>  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>  2 files changed, 63 insertions(+)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>
> diff --git a/gcc/dse.cc b/gcc/dse.cc
> index 30c11cee034..0433dd3d846 100644
> --- a/gcc/dse.cc
> +++ b/gcc/dse.cc
> @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
>
>           if (tem && CONSTANT_P (tem))
>             const_rhs = tem;
> +         else
> +           {
> +             /* If RHS is set only once to a constant, set CONST_RHS
> +                to the constant.  */
> +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> +             if (def != nullptr
> +                 && !DF_REF_IS_ARTIFICIAL (def)
> +                 && !DF_REF_NEXT_REG (def))
> +               {
> +                 rtx_insn *def_insn = DF_REF_INSN (def);
> +                 rtx def_body = PATTERN (def_insn);
> +                 if (GET_CODE (def_body) == SET)
> +                   {
> +                     rtx def_src = SET_SRC (def_body);
> +                     if (CONSTANT_P (def_src))
> +                       const_rhs = def_src;

doesn't DSE have its own tracking of stored values?  Shouldn't we
improve _that_ if it is not enough?  I also wonder if you need to
verify the SET isn't partial?

Richard.

> +                   }
> +               }
> +           }
>         }
>      }
>
> diff --git a/gcc/testsuite/g++.target/i386/pr105638.C b/gcc/testsuite/g++.target/i386/pr105638.C
> new file mode 100644
> index 00000000000..ff40a459de1
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr105638.C
> @@ -0,0 +1,44 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-std=gnu++20 -O2 -march=skylake" } */
> +/* { dg-final { scan-assembler-not "vpxor" } } */
> +
> +#include <stdint.h>
> +#include <vector>
> +#include <tr1/array>
> +
> +class FastBoard {
> +public:
> +    typedef std::pair<int, int> movescore_t;
> +    typedef std::tr1::array<movescore_t, 24> scoredlist_t;
> +
> +protected:
> +    std::vector<int> m_critical;
> +
> +    int m_boardsize;
> +};
> +
> +class FastState {
> +public:
> +    FastBoard board;
> +
> +    int movenum;
> +protected:
> +    FastBoard::scoredlist_t scoredmoves;
> +};
> +
> +class KoState : public FastState {
> +private:
> +    std::vector<uint64_t> ko_hash_history;
> +    std::vector<uint64_t> hash_history;
> +};
> +
> +class GameState : public KoState {
> +public:
> +    void foo ();
> +private:
> +    std::vector<KoState> game_history;
> +};
> +
> +void GameState::foo() {
> +    game_history.resize(movenum);
> +}
> --
> 2.36.1
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2] DSE: Use the constant store source if possible
  2022-05-23 10:38 ` Richard Biener
@ 2022-05-23 18:34   ` H.J. Lu
  2022-05-24  6:42     ` Richard Biener
  2022-05-25  7:30     ` Richard Sandiford
  0 siblings, 2 replies; 21+ messages in thread
From: H.J. Lu @ 2022-05-23 18:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > When recording store for RTL dead store elimination, check if the source
> > register is set only once to a constant.  If yes, record the constant
> > as the store source.  It eliminates unrolled zero stores after memset 0
> > in a loop where a vector register is used as the zero store source.
> >
> > gcc/
> >
> >         PR rtl-optimization/105638
> >         * dse.cc (record_store): Use the constant source if the source
> >         register is set only once.
> >
> > gcc/testsuite/
> >
> >         PR rtl-optimization/105638
> >         * g++.target/i386/pr105638.C: New test.
> > ---
> >  gcc/dse.cc                               | 19 ++++++++++
> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
> >  2 files changed, 63 insertions(+)
> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >
> > diff --git a/gcc/dse.cc b/gcc/dse.cc
> > index 30c11cee034..0433dd3d846 100644
> > --- a/gcc/dse.cc
> > +++ b/gcc/dse.cc
> > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
> >
> >           if (tem && CONSTANT_P (tem))
> >             const_rhs = tem;
> > +         else
> > +           {
> > +             /* If RHS is set only once to a constant, set CONST_RHS
> > +                to the constant.  */
> > +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> > +             if (def != nullptr
> > +                 && !DF_REF_IS_ARTIFICIAL (def)
> > +                 && !DF_REF_NEXT_REG (def))
> > +               {
> > +                 rtx_insn *def_insn = DF_REF_INSN (def);
> > +                 rtx def_body = PATTERN (def_insn);
> > +                 if (GET_CODE (def_body) == SET)
> > +                   {
> > +                     rtx def_src = SET_SRC (def_body);
> > +                     if (CONSTANT_P (def_src))
> > +                       const_rhs = def_src;
> 
> doesn't DSE have its own tracking of stored values?  Shouldn't we

It tracks stored values only within the basic block.  When RTL loop
invariant motion hoists a constant initialization out of the loop into
a separate basic block, the constant store value becomes unknown
within the original basic block.

> improve _that_ if it is not enough?  I also wonder if you need to

My patch extends DSE stored value tracking to include the constant which
is set only once in another basic block.

> verify the SET isn't partial?
> 

Here is the v2 patch to check that the constant is set by a non-partial
unconditional load.

OK for master?

Thanks.

H.J.
---
RTL DSE tracks redundant constant stores within a basic block.  When RTL
loop invariant motion hoists a constant initialization out of the loop
into a separate basic block, the constant store value becomes unknown
within the original basic block.  When recording store for RTL DSE, check
if the source register is set only once to a constant by a non-partial
unconditional load.  If yes, record the constant as the constant store
source.  It eliminates unrolled zero stores after memset 0 in a loop
where a vector register is used as the zero store source.

gcc/

	PR rtl-optimization/105638
	* dse.cc (record_store): Use the constant source if the source
	register is set only once.

gcc/testsuite/

	PR rtl-optimization/105638
	* g++.target/i386/pr105638.C: New test.
---
 gcc/dse.cc                               | 22 ++++++++++++
 gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
 2 files changed, 66 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C

diff --git a/gcc/dse.cc b/gcc/dse.cc
index 30c11cee034..af8e88dac32 100644
--- a/gcc/dse.cc
+++ b/gcc/dse.cc
@@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
 
 	  if (tem && CONSTANT_P (tem))
 	    const_rhs = tem;
+	  else
+	    {
+	      /* If RHS is set only once to a constant, set CONST_RHS
+		 to the constant.  */
+	      df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
+	      if (def != nullptr
+		  && !DF_REF_IS_ARTIFICIAL (def)
+		  && !(DF_REF_FLAGS (def)
+		       & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
+		  && !DF_REF_NEXT_REG (def))
+		{
+		  rtx_insn *def_insn = DF_REF_INSN (def);
+		  rtx def_body = PATTERN (def_insn);
+		  if (GET_CODE (def_body) == SET)
+		    {
+		      rtx def_src = SET_SRC (def_body);
+		      if (CONSTANT_P (def_src)
+			  && GET_MODE (def_src) == GET_MODE (rhs))
+			const_rhs = def_src;
+		    }
+		}
+	    }
 	}
     }
 
diff --git a/gcc/testsuite/g++.target/i386/pr105638.C b/gcc/testsuite/g++.target/i386/pr105638.C
new file mode 100644
index 00000000000..ff40a459de1
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr105638.C
@@ -0,0 +1,44 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-std=gnu++20 -O2 -march=skylake" } */
+/* { dg-final { scan-assembler-not "vpxor" } } */
+
+#include <stdint.h>
+#include <vector>
+#include <tr1/array>
+
+class FastBoard {
+public:
+    typedef std::pair<int, int> movescore_t;
+    typedef std::tr1::array<movescore_t, 24> scoredlist_t;
+    
+protected:
+    std::vector<int> m_critical;
+
+    int m_boardsize;    
+};
+
+class FastState {
+public:        
+    FastBoard board;
+    
+    int movenum;              
+protected:
+    FastBoard::scoredlist_t scoredmoves;
+};
+
+class KoState : public FastState {
+private:         
+    std::vector<uint64_t> ko_hash_history;   
+    std::vector<uint64_t> hash_history;     
+};
+
+class GameState : public KoState {
+public:                    
+    void foo ();      
+private:
+    std::vector<KoState> game_history;                          
+};
+
+void GameState::foo() {
+    game_history.resize(movenum);
+}
-- 
2.36.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] DSE: Use the constant store source if possible
  2022-05-23 18:34   ` [PATCH v2] DSE: Use the constant store " H.J. Lu
@ 2022-05-24  6:42     ` Richard Biener
  2022-05-24 20:10       ` H.J. Lu
  2022-05-25  7:30     ` Richard Sandiford
  1 sibling, 1 reply; 21+ messages in thread
From: Richard Biener @ 2022-05-24  6:42 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches

On Mon, May 23, 2022 at 8:34 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
> > On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > When recording store for RTL dead store elimination, check if the source
> > > register is set only once to a constant.  If yes, record the constant
> > > as the store source.  It eliminates unrolled zero stores after memset 0
> > > in a loop where a vector register is used as the zero store source.
> > >
> > > gcc/
> > >
> > >         PR rtl-optimization/105638
> > >         * dse.cc (record_store): Use the constant source if the source
> > >         register is set only once.
> > >
> > > gcc/testsuite/
> > >
> > >         PR rtl-optimization/105638
> > >         * g++.target/i386/pr105638.C: New test.
> > > ---
> > >  gcc/dse.cc                               | 19 ++++++++++
> > >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
> > >  2 files changed, 63 insertions(+)
> > >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> > >
> > > diff --git a/gcc/dse.cc b/gcc/dse.cc
> > > index 30c11cee034..0433dd3d846 100644
> > > --- a/gcc/dse.cc
> > > +++ b/gcc/dse.cc
> > > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
> > >
> > >           if (tem && CONSTANT_P (tem))
> > >             const_rhs = tem;
> > > +         else
> > > +           {
> > > +             /* If RHS is set only once to a constant, set CONST_RHS
> > > +                to the constant.  */
> > > +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> > > +             if (def != nullptr
> > > +                 && !DF_REF_IS_ARTIFICIAL (def)
> > > +                 && !DF_REF_NEXT_REG (def))
> > > +               {
> > > +                 rtx_insn *def_insn = DF_REF_INSN (def);
> > > +                 rtx def_body = PATTERN (def_insn);
> > > +                 if (GET_CODE (def_body) == SET)
> > > +                   {
> > > +                     rtx def_src = SET_SRC (def_body);
> > > +                     if (CONSTANT_P (def_src))
> > > +                       const_rhs = def_src;
> >
> > doesn't DSE have its own tracking of stored values?  Shouldn't we
>
> It tracks stored values only within the basic block.  When RTL loop
> invariant motion hoists a constant initialization out of the loop into
> a separate basic block, the constant store value becomes unknown
> within the original basic block.
>
> > improve _that_ if it is not enough?  I also wonder if you need to
>
> My patch extends DSE stored value tracking to include the constant which
> is set only once in another basic block.
>
> > verify the SET isn't partial?
> >
>
> Here is the v2 patch to check that the constant is set by a non-partial
> unconditional load.
>
> OK for master?
>
> Thanks.
>
> H.J.
> ---
> RTL DSE tracks redundant constant stores within a basic block.  When RTL
> loop invariant motion hoists a constant initialization out of the loop
> into a separate basic block, the constant store value becomes unknown
> within the original basic block.  When recording store for RTL DSE, check
> if the source register is set only once to a constant by a non-partial
> unconditional load.  If yes, record the constant as the constant store
> source.  It eliminates unrolled zero stores after memset 0 in a loop
> where a vector register is used as the zero store source.
>
> gcc/
>
>         PR rtl-optimization/105638
>         * dse.cc (record_store): Use the constant source if the source
>         register is set only once.
>
> gcc/testsuite/
>
>         PR rtl-optimization/105638
>         * g++.target/i386/pr105638.C: New test.
> ---
>  gcc/dse.cc                               | 22 ++++++++++++
>  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>  2 files changed, 66 insertions(+)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>
> diff --git a/gcc/dse.cc b/gcc/dse.cc
> index 30c11cee034..af8e88dac32 100644
> --- a/gcc/dse.cc
> +++ b/gcc/dse.cc
> @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
>
>           if (tem && CONSTANT_P (tem))
>             const_rhs = tem;
> +         else
> +           {
> +             /* If RHS is set only once to a constant, set CONST_RHS
> +                to the constant.  */
> +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> +             if (def != nullptr
> +                 && !DF_REF_IS_ARTIFICIAL (def)
> +                 && !(DF_REF_FLAGS (def)
> +                      & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
> +                 && !DF_REF_NEXT_REG (def))

Can we really use df-chain here and rely that a single definition is
the only one?  If rhs is a hardreg does df-chain include implicit
sets of function argument registers for example?  Don't we need RD
here or at least verify the single df-chain definition dominates the
use here (if we can rely on the reg otherwise be uninitialized and thus
the use invoking undefined behavior we could use the constant even
in non-dominating context, but WRT the function args & hardregs I'm not
sure we can tell).

> +               {
> +                 rtx_insn *def_insn = DF_REF_INSN (def);
> +                 rtx def_body = PATTERN (def_insn);
> +                 if (GET_CODE (def_body) == SET)

I think you should be able to use

           rtx def_body = single_set (def_insn);
           if (def_body)

here to get at some more cases.

> +                   {
> +                     rtx def_src = SET_SRC (def_body);
> +                     if (CONSTANT_P (def_src)
> +                         && GET_MODE (def_src) == GET_MODE (rhs))

possibly verify SET_DEST (def_body) == rhs to rule out subregs and mode punning
instead?  Then def_src should be automatically OK.  I wonder whether it's worth
tracking (or if DSE even can) loads from the constant pool with constant
REG_EQUAL/EQUIV notes?

> +                       const_rhs = def_src;
> +                   }
> +               }
> +           }
>         }
>      }
>
> diff --git a/gcc/testsuite/g++.target/i386/pr105638.C b/gcc/testsuite/g++.target/i386/pr105638.C
> new file mode 100644
> index 00000000000..ff40a459de1
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr105638.C
> @@ -0,0 +1,44 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-std=gnu++20 -O2 -march=skylake" } */
> +/* { dg-final { scan-assembler-not "vpxor" } } */
> +
> +#include <stdint.h>
> +#include <vector>
> +#include <tr1/array>
> +
> +class FastBoard {
> +public:
> +    typedef std::pair<int, int> movescore_t;
> +    typedef std::tr1::array<movescore_t, 24> scoredlist_t;
> +
> +protected:
> +    std::vector<int> m_critical;
> +
> +    int m_boardsize;
> +};
> +
> +class FastState {
> +public:
> +    FastBoard board;
> +
> +    int movenum;
> +protected:
> +    FastBoard::scoredlist_t scoredmoves;
> +};
> +
> +class KoState : public FastState {
> +private:
> +    std::vector<uint64_t> ko_hash_history;
> +    std::vector<uint64_t> hash_history;
> +};
> +
> +class GameState : public KoState {
> +public:
> +    void foo ();
> +private:
> +    std::vector<KoState> game_history;
> +};
> +
> +void GameState::foo() {
> +    game_history.resize(movenum);
> +}
> --
> 2.36.1
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] DSE: Use the constant store source if possible
  2022-05-24  6:42     ` Richard Biener
@ 2022-05-24 20:10       ` H.J. Lu
  2022-05-25  9:22         ` Richard Biener
  0 siblings, 1 reply; 21+ messages in thread
From: H.J. Lu @ 2022-05-24 20:10 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On Mon, May 23, 2022 at 11:42 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, May 23, 2022 at 8:34 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
> > > On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > When recording store for RTL dead store elimination, check if the source
> > > > register is set only once to a constant.  If yes, record the constant
> > > > as the store source.  It eliminates unrolled zero stores after memset 0
> > > > in a loop where a vector register is used as the zero store source.
> > > >
> > > > gcc/
> > > >
> > > >         PR rtl-optimization/105638
> > > >         * dse.cc (record_store): Use the constant source if the source
> > > >         register is set only once.
> > > >
> > > > gcc/testsuite/
> > > >
> > > >         PR rtl-optimization/105638
> > > >         * g++.target/i386/pr105638.C: New test.
> > > > ---
> > > >  gcc/dse.cc                               | 19 ++++++++++
> > > >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
> > > >  2 files changed, 63 insertions(+)
> > > >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> > > >
> > > > diff --git a/gcc/dse.cc b/gcc/dse.cc
> > > > index 30c11cee034..0433dd3d846 100644
> > > > --- a/gcc/dse.cc
> > > > +++ b/gcc/dse.cc
> > > > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
> > > >
> > > >           if (tem && CONSTANT_P (tem))
> > > >             const_rhs = tem;
> > > > +         else
> > > > +           {
> > > > +             /* If RHS is set only once to a constant, set CONST_RHS
> > > > +                to the constant.  */
> > > > +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> > > > +             if (def != nullptr
> > > > +                 && !DF_REF_IS_ARTIFICIAL (def)
> > > > +                 && !DF_REF_NEXT_REG (def))
> > > > +               {
> > > > +                 rtx_insn *def_insn = DF_REF_INSN (def);
> > > > +                 rtx def_body = PATTERN (def_insn);
> > > > +                 if (GET_CODE (def_body) == SET)
> > > > +                   {
> > > > +                     rtx def_src = SET_SRC (def_body);
> > > > +                     if (CONSTANT_P (def_src))
> > > > +                       const_rhs = def_src;
> > >
> > > doesn't DSE have its own tracking of stored values?  Shouldn't we
> >
> > It tracks stored values only within the basic block.  When RTL loop
> > invariant motion hoists a constant initialization out of the loop into
> > a separate basic block, the constant store value becomes unknown
> > within the original basic block.
> >
> > > improve _that_ if it is not enough?  I also wonder if you need to
> >
> > My patch extends DSE stored value tracking to include the constant which
> > is set only once in another basic block.
> >
> > > verify the SET isn't partial?
> > >
> >
> > Here is the v2 patch to check that the constant is set by a non-partial
> > unconditional load.
> >
> > OK for master?
> >
> > Thanks.
> >
> > H.J.
> > ---
> > RTL DSE tracks redundant constant stores within a basic block.  When RTL
> > loop invariant motion hoists a constant initialization out of the loop
> > into a separate basic block, the constant store value becomes unknown
> > within the original basic block.  When recording store for RTL DSE, check
> > if the source register is set only once to a constant by a non-partial
> > unconditional load.  If yes, record the constant as the constant store
> > source.  It eliminates unrolled zero stores after memset 0 in a loop
> > where a vector register is used as the zero store source.
> >
> > gcc/
> >
> >         PR rtl-optimization/105638
> >         * dse.cc (record_store): Use the constant source if the source
> >         register is set only once.
> >
> > gcc/testsuite/
> >
> >         PR rtl-optimization/105638
> >         * g++.target/i386/pr105638.C: New test.
> > ---
> >  gcc/dse.cc                               | 22 ++++++++++++
> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
> >  2 files changed, 66 insertions(+)
> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >
> > diff --git a/gcc/dse.cc b/gcc/dse.cc
> > index 30c11cee034..af8e88dac32 100644
> > --- a/gcc/dse.cc
> > +++ b/gcc/dse.cc
> > @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
> >
> >           if (tem && CONSTANT_P (tem))
> >             const_rhs = tem;
> > +         else
> > +           {
> > +             /* If RHS is set only once to a constant, set CONST_RHS
> > +                to the constant.  */
> > +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> > +             if (def != nullptr
> > +                 && !DF_REF_IS_ARTIFICIAL (def)
> > +                 && !(DF_REF_FLAGS (def)
> > +                      & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
> > +                 && !DF_REF_NEXT_REG (def))
>
> Can we really use df-chain here and rely that a single definition is
> the only one?  If rhs is a hardreg does df-chain include implicit
> sets of function argument registers for example?  Don't we need RD
> here or at least verify the single df-chain definition dominates the
> use here (if we can rely on the reg otherwise be uninitialized and thus
> the use invoking undefined behavior we could use the constant even
> in non-dominating context, but WRT the function args & hardregs I'm not
> sure we can tell).

Does the hard register have a regular DEF?  Won't the function args &
hardregs have DF_REF_ARTIFICIAL?

>
> > +               {
> > +                 rtx_insn *def_insn = DF_REF_INSN (def);
> > +                 rtx def_body = PATTERN (def_insn);
> > +                 if (GET_CODE (def_body) == SET)
>
> I think you should be able to use
>
>            rtx def_body = single_set (def_insn);
>            if (def_body)
>
> here to get at some more cases.

Will change it.

> > +                   {
> > +                     rtx def_src = SET_SRC (def_body);
> > +                     if (CONSTANT_P (def_src)
> > +                         && GET_MODE (def_src) == GET_MODE (rhs))
>
> possibly verify SET_DEST (def_body) == rhs to rule out subregs and mode punning
> instead?  Then def_src should be automatically OK.  I wonder whether it's worth

Will change it.

> tracking (or if DSE even can) loads from the constant pool with constant.> REG_EQUAL/EQUIV notes?

CONSTANT_P returns true for a constant pool.   I am not
sure if DSE can handle a constant pool.

Thanks.

>
> > +                       const_rhs = def_src;
> > +                   }
> > +               }
> > +           }
> >         }
> >      }
> >
> > diff --git a/gcc/testsuite/g++.target/i386/pr105638.C b/gcc/testsuite/g++.target/i386/pr105638.C
> > new file mode 100644
> > index 00000000000..ff40a459de1
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.target/i386/pr105638.C
> > @@ -0,0 +1,44 @@
> > +/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-options "-std=gnu++20 -O2 -march=skylake" } */
> > +/* { dg-final { scan-assembler-not "vpxor" } } */
> > +
> > +#include <stdint.h>
> > +#include <vector>
> > +#include <tr1/array>
> > +
> > +class FastBoard {
> > +public:
> > +    typedef std::pair<int, int> movescore_t;
> > +    typedef std::tr1::array<movescore_t, 24> scoredlist_t;
> > +
> > +protected:
> > +    std::vector<int> m_critical;
> > +
> > +    int m_boardsize;
> > +};
> > +
> > +class FastState {
> > +public:
> > +    FastBoard board;
> > +
> > +    int movenum;
> > +protected:
> > +    FastBoard::scoredlist_t scoredmoves;
> > +};
> > +
> > +class KoState : public FastState {
> > +private:
> > +    std::vector<uint64_t> ko_hash_history;
> > +    std::vector<uint64_t> hash_history;
> > +};
> > +
> > +class GameState : public KoState {
> > +public:
> > +    void foo ();
> > +private:
> > +    std::vector<KoState> game_history;
> > +};
> > +
> > +void GameState::foo() {
> > +    game_history.resize(movenum);
> > +}
> > --
> > 2.36.1
> >



-- 
H.J.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] DSE: Use the constant store source if possible
  2022-05-23 18:34   ` [PATCH v2] DSE: Use the constant store " H.J. Lu
  2022-05-24  6:42     ` Richard Biener
@ 2022-05-25  7:30     ` Richard Sandiford
  2022-05-25 18:56       ` H.J. Lu
  1 sibling, 1 reply; 21+ messages in thread
From: Richard Sandiford @ 2022-05-25  7:30 UTC (permalink / raw)
  To: H.J. Lu via Gcc-patches

"H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
>> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
>> <gcc-patches@gcc.gnu.org> wrote:
>> >
>> > When recording store for RTL dead store elimination, check if the source
>> > register is set only once to a constant.  If yes, record the constant
>> > as the store source.  It eliminates unrolled zero stores after memset 0
>> > in a loop where a vector register is used as the zero store source.
>> >
>> > gcc/
>> >
>> >         PR rtl-optimization/105638
>> >         * dse.cc (record_store): Use the constant source if the source
>> >         register is set only once.
>> >
>> > gcc/testsuite/
>> >
>> >         PR rtl-optimization/105638
>> >         * g++.target/i386/pr105638.C: New test.
>> > ---
>> >  gcc/dse.cc                               | 19 ++++++++++
>> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>> >  2 files changed, 63 insertions(+)
>> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>> >
>> > diff --git a/gcc/dse.cc b/gcc/dse.cc
>> > index 30c11cee034..0433dd3d846 100644
>> > --- a/gcc/dse.cc
>> > +++ b/gcc/dse.cc
>> > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
>> >
>> >           if (tem && CONSTANT_P (tem))
>> >             const_rhs = tem;
>> > +         else
>> > +           {
>> > +             /* If RHS is set only once to a constant, set CONST_RHS
>> > +                to the constant.  */
>> > +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>> > +             if (def != nullptr
>> > +                 && !DF_REF_IS_ARTIFICIAL (def)
>> > +                 && !DF_REF_NEXT_REG (def))
>> > +               {
>> > +                 rtx_insn *def_insn = DF_REF_INSN (def);
>> > +                 rtx def_body = PATTERN (def_insn);
>> > +                 if (GET_CODE (def_body) == SET)
>> > +                   {
>> > +                     rtx def_src = SET_SRC (def_body);
>> > +                     if (CONSTANT_P (def_src))
>> > +                       const_rhs = def_src;
>> 
>> doesn't DSE have its own tracking of stored values?  Shouldn't we
>
> It tracks stored values only within the basic block.  When RTL loop
> invariant motion hoists a constant initialization out of the loop into
> a separate basic block, the constant store value becomes unknown
> within the original basic block.
>
>> improve _that_ if it is not enough?  I also wonder if you need to
>
> My patch extends DSE stored value tracking to include the constant which
> is set only once in another basic block.
>
>> verify the SET isn't partial?
>> 
>
> Here is the v2 patch to check that the constant is set by a non-partial
> unconditional load.
>
> OK for master?
>
> Thanks.
>
> H.J.
> ---
> RTL DSE tracks redundant constant stores within a basic block.  When RTL
> loop invariant motion hoists a constant initialization out of the loop
> into a separate basic block, the constant store value becomes unknown
> within the original basic block.  When recording store for RTL DSE, check
> if the source register is set only once to a constant by a non-partial
> unconditional load.  If yes, record the constant as the constant store
> source.  It eliminates unrolled zero stores after memset 0 in a loop
> where a vector register is used as the zero store source.
>
> gcc/
>
> 	PR rtl-optimization/105638
> 	* dse.cc (record_store): Use the constant source if the source
> 	register is set only once.
>
> gcc/testsuite/
>
> 	PR rtl-optimization/105638
> 	* g++.target/i386/pr105638.C: New test.
> ---
>  gcc/dse.cc                               | 22 ++++++++++++
>  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>  2 files changed, 66 insertions(+)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>
> diff --git a/gcc/dse.cc b/gcc/dse.cc
> index 30c11cee034..af8e88dac32 100644
> --- a/gcc/dse.cc
> +++ b/gcc/dse.cc
> @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
>  
>  	  if (tem && CONSTANT_P (tem))
>  	    const_rhs = tem;
> +	  else
> +	    {
> +	      /* If RHS is set only once to a constant, set CONST_RHS
> +		 to the constant.  */
> +	      df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> +	      if (def != nullptr
> +		  && !DF_REF_IS_ARTIFICIAL (def)
> +		  && !(DF_REF_FLAGS (def)
> +		       & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
> +		  && !DF_REF_NEXT_REG (def))

Can we introduce a helper for this?  There are already similar tests
in ira and loop-iv, and it seems a bit too complex to have to open-code
each time.

Thanks,
Richard

> +		{
> +		  rtx_insn *def_insn = DF_REF_INSN (def);
> +		  rtx def_body = PATTERN (def_insn);
> +		  if (GET_CODE (def_body) == SET)
> +		    {
> +		      rtx def_src = SET_SRC (def_body);
> +		      if (CONSTANT_P (def_src)
> +			  && GET_MODE (def_src) == GET_MODE (rhs))
> +			const_rhs = def_src;
> +		    }
> +		}
> +	    }
>  	}
>      }
>  
> diff --git a/gcc/testsuite/g++.target/i386/pr105638.C b/gcc/testsuite/g++.target/i386/pr105638.C
> new file mode 100644
> index 00000000000..ff40a459de1
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr105638.C
> @@ -0,0 +1,44 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-std=gnu++20 -O2 -march=skylake" } */
> +/* { dg-final { scan-assembler-not "vpxor" } } */
> +
> +#include <stdint.h>
> +#include <vector>
> +#include <tr1/array>
> +
> +class FastBoard {
> +public:
> +    typedef std::pair<int, int> movescore_t;
> +    typedef std::tr1::array<movescore_t, 24> scoredlist_t;
> +    
> +protected:
> +    std::vector<int> m_critical;
> +
> +    int m_boardsize;    
> +};
> +
> +class FastState {
> +public:        
> +    FastBoard board;
> +    
> +    int movenum;              
> +protected:
> +    FastBoard::scoredlist_t scoredmoves;
> +};
> +
> +class KoState : public FastState {
> +private:         
> +    std::vector<uint64_t> ko_hash_history;   
> +    std::vector<uint64_t> hash_history;     
> +};
> +
> +class GameState : public KoState {
> +public:                    
> +    void foo ();      
> +private:
> +    std::vector<KoState> game_history;                          
> +};
> +
> +void GameState::foo() {
> +    game_history.resize(movenum);
> +}

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] DSE: Use the constant store source if possible
  2022-05-24 20:10       ` H.J. Lu
@ 2022-05-25  9:22         ` Richard Biener
  2022-05-25  9:30           ` Richard Sandiford
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Biener @ 2022-05-25  9:22 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches

On Tue, May 24, 2022 at 10:11 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, May 23, 2022 at 11:42 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Mon, May 23, 2022 at 8:34 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
> > > > On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > When recording store for RTL dead store elimination, check if the source
> > > > > register is set only once to a constant.  If yes, record the constant
> > > > > as the store source.  It eliminates unrolled zero stores after memset 0
> > > > > in a loop where a vector register is used as the zero store source.
> > > > >
> > > > > gcc/
> > > > >
> > > > >         PR rtl-optimization/105638
> > > > >         * dse.cc (record_store): Use the constant source if the source
> > > > >         register is set only once.
> > > > >
> > > > > gcc/testsuite/
> > > > >
> > > > >         PR rtl-optimization/105638
> > > > >         * g++.target/i386/pr105638.C: New test.
> > > > > ---
> > > > >  gcc/dse.cc                               | 19 ++++++++++
> > > > >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
> > > > >  2 files changed, 63 insertions(+)
> > > > >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> > > > >
> > > > > diff --git a/gcc/dse.cc b/gcc/dse.cc
> > > > > index 30c11cee034..0433dd3d846 100644
> > > > > --- a/gcc/dse.cc
> > > > > +++ b/gcc/dse.cc
> > > > > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
> > > > >
> > > > >           if (tem && CONSTANT_P (tem))
> > > > >             const_rhs = tem;
> > > > > +         else
> > > > > +           {
> > > > > +             /* If RHS is set only once to a constant, set CONST_RHS
> > > > > +                to the constant.  */
> > > > > +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> > > > > +             if (def != nullptr
> > > > > +                 && !DF_REF_IS_ARTIFICIAL (def)
> > > > > +                 && !DF_REF_NEXT_REG (def))
> > > > > +               {
> > > > > +                 rtx_insn *def_insn = DF_REF_INSN (def);
> > > > > +                 rtx def_body = PATTERN (def_insn);
> > > > > +                 if (GET_CODE (def_body) == SET)
> > > > > +                   {
> > > > > +                     rtx def_src = SET_SRC (def_body);
> > > > > +                     if (CONSTANT_P (def_src))
> > > > > +                       const_rhs = def_src;
> > > >
> > > > doesn't DSE have its own tracking of stored values?  Shouldn't we
> > >
> > > It tracks stored values only within the basic block.  When RTL loop
> > > invariant motion hoists a constant initialization out of the loop into
> > > a separate basic block, the constant store value becomes unknown
> > > within the original basic block.
> > >
> > > > improve _that_ if it is not enough?  I also wonder if you need to
> > >
> > > My patch extends DSE stored value tracking to include the constant which
> > > is set only once in another basic block.
> > >
> > > > verify the SET isn't partial?
> > > >
> > >
> > > Here is the v2 patch to check that the constant is set by a non-partial
> > > unconditional load.
> > >
> > > OK for master?
> > >
> > > Thanks.
> > >
> > > H.J.
> > > ---
> > > RTL DSE tracks redundant constant stores within a basic block.  When RTL
> > > loop invariant motion hoists a constant initialization out of the loop
> > > into a separate basic block, the constant store value becomes unknown
> > > within the original basic block.  When recording store for RTL DSE, check
> > > if the source register is set only once to a constant by a non-partial
> > > unconditional load.  If yes, record the constant as the constant store
> > > source.  It eliminates unrolled zero stores after memset 0 in a loop
> > > where a vector register is used as the zero store source.
> > >
> > > gcc/
> > >
> > >         PR rtl-optimization/105638
> > >         * dse.cc (record_store): Use the constant source if the source
> > >         register is set only once.
> > >
> > > gcc/testsuite/
> > >
> > >         PR rtl-optimization/105638
> > >         * g++.target/i386/pr105638.C: New test.
> > > ---
> > >  gcc/dse.cc                               | 22 ++++++++++++
> > >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
> > >  2 files changed, 66 insertions(+)
> > >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> > >
> > > diff --git a/gcc/dse.cc b/gcc/dse.cc
> > > index 30c11cee034..af8e88dac32 100644
> > > --- a/gcc/dse.cc
> > > +++ b/gcc/dse.cc
> > > @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
> > >
> > >           if (tem && CONSTANT_P (tem))
> > >             const_rhs = tem;
> > > +         else
> > > +           {
> > > +             /* If RHS is set only once to a constant, set CONST_RHS
> > > +                to the constant.  */
> > > +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> > > +             if (def != nullptr
> > > +                 && !DF_REF_IS_ARTIFICIAL (def)
> > > +                 && !(DF_REF_FLAGS (def)
> > > +                      & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
> > > +                 && !DF_REF_NEXT_REG (def))
> >
> > Can we really use df-chain here and rely that a single definition is
> > the only one?  If rhs is a hardreg does df-chain include implicit
> > sets of function argument registers for example?  Don't we need RD
> > here or at least verify the single df-chain definition dominates the
> > use here (if we can rely on the reg otherwise be uninitialized and thus
> > the use invoking undefined behavior we could use the constant even
> > in non-dominating context, but WRT the function args & hardregs I'm not
> > sure we can tell).
>
> Does the hard register have a regular DEF?  Won't the function args &
> hardregs have DF_REF_ARTIFICIAL?

I don't know - do they?

> >
> > > +               {
> > > +                 rtx_insn *def_insn = DF_REF_INSN (def);
> > > +                 rtx def_body = PATTERN (def_insn);
> > > +                 if (GET_CODE (def_body) == SET)
> >
> > I think you should be able to use
> >
> >            rtx def_body = single_set (def_insn);
> >            if (def_body)
> >
> > here to get at some more cases.
>
> Will change it.
>
> > > +                   {
> > > +                     rtx def_src = SET_SRC (def_body);
> > > +                     if (CONSTANT_P (def_src)
> > > +                         && GET_MODE (def_src) == GET_MODE (rhs))
> >
> > possibly verify SET_DEST (def_body) == rhs to rule out subregs and mode punning
> > instead?  Then def_src should be automatically OK.  I wonder whether it's worth
>
> Will change it.
>
> > tracking (or if DSE even can) loads from the constant pool with constant.> REG_EQUAL/EQUIV notes?
>
> CONSTANT_P returns true for a constant pool.   I am not
> sure if DSE can handle a constant pool.
>
> Thanks.
>
> >
> > > +                       const_rhs = def_src;
> > > +                   }
> > > +               }
> > > +           }
> > >         }
> > >      }
> > >
> > > diff --git a/gcc/testsuite/g++.target/i386/pr105638.C b/gcc/testsuite/g++.target/i386/pr105638.C
> > > new file mode 100644
> > > index 00000000000..ff40a459de1
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.target/i386/pr105638.C
> > > @@ -0,0 +1,44 @@
> > > +/* { dg-do compile { target { ! ia32 } } } */
> > > +/* { dg-options "-std=gnu++20 -O2 -march=skylake" } */
> > > +/* { dg-final { scan-assembler-not "vpxor" } } */
> > > +
> > > +#include <stdint.h>
> > > +#include <vector>
> > > +#include <tr1/array>
> > > +
> > > +class FastBoard {
> > > +public:
> > > +    typedef std::pair<int, int> movescore_t;
> > > +    typedef std::tr1::array<movescore_t, 24> scoredlist_t;
> > > +
> > > +protected:
> > > +    std::vector<int> m_critical;
> > > +
> > > +    int m_boardsize;
> > > +};
> > > +
> > > +class FastState {
> > > +public:
> > > +    FastBoard board;
> > > +
> > > +    int movenum;
> > > +protected:
> > > +    FastBoard::scoredlist_t scoredmoves;
> > > +};
> > > +
> > > +class KoState : public FastState {
> > > +private:
> > > +    std::vector<uint64_t> ko_hash_history;
> > > +    std::vector<uint64_t> hash_history;
> > > +};
> > > +
> > > +class GameState : public KoState {
> > > +public:
> > > +    void foo ();
> > > +private:
> > > +    std::vector<KoState> game_history;
> > > +};
> > > +
> > > +void GameState::foo() {
> > > +    game_history.resize(movenum);
> > > +}
> > > --
> > > 2.36.1
> > >
>
>
>
> --
> H.J.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] DSE: Use the constant store source if possible
  2022-05-25  9:22         ` Richard Biener
@ 2022-05-25  9:30           ` Richard Sandiford
  2022-05-25 19:01             ` H.J. Lu
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Sandiford @ 2022-05-25  9:30 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches

Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Tue, May 24, 2022 at 10:11 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Mon, May 23, 2022 at 11:42 PM Richard Biener
>> <richard.guenther@gmail.com> wrote:
>> >
>> > On Mon, May 23, 2022 at 8:34 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>> > >
>> > > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
>> > > > On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
>> > > > <gcc-patches@gcc.gnu.org> wrote:
>> > > > >
>> > > > > When recording store for RTL dead store elimination, check if the source
>> > > > > register is set only once to a constant.  If yes, record the constant
>> > > > > as the store source.  It eliminates unrolled zero stores after memset 0
>> > > > > in a loop where a vector register is used as the zero store source.
>> > > > >
>> > > > > gcc/
>> > > > >
>> > > > >         PR rtl-optimization/105638
>> > > > >         * dse.cc (record_store): Use the constant source if the source
>> > > > >         register is set only once.
>> > > > >
>> > > > > gcc/testsuite/
>> > > > >
>> > > > >         PR rtl-optimization/105638
>> > > > >         * g++.target/i386/pr105638.C: New test.
>> > > > > ---
>> > > > >  gcc/dse.cc                               | 19 ++++++++++
>> > > > >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>> > > > >  2 files changed, 63 insertions(+)
>> > > > >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>> > > > >
>> > > > > diff --git a/gcc/dse.cc b/gcc/dse.cc
>> > > > > index 30c11cee034..0433dd3d846 100644
>> > > > > --- a/gcc/dse.cc
>> > > > > +++ b/gcc/dse.cc
>> > > > > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
>> > > > >
>> > > > >           if (tem && CONSTANT_P (tem))
>> > > > >             const_rhs = tem;
>> > > > > +         else
>> > > > > +           {
>> > > > > +             /* If RHS is set only once to a constant, set CONST_RHS
>> > > > > +                to the constant.  */
>> > > > > +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>> > > > > +             if (def != nullptr
>> > > > > +                 && !DF_REF_IS_ARTIFICIAL (def)
>> > > > > +                 && !DF_REF_NEXT_REG (def))
>> > > > > +               {
>> > > > > +                 rtx_insn *def_insn = DF_REF_INSN (def);
>> > > > > +                 rtx def_body = PATTERN (def_insn);
>> > > > > +                 if (GET_CODE (def_body) == SET)
>> > > > > +                   {
>> > > > > +                     rtx def_src = SET_SRC (def_body);
>> > > > > +                     if (CONSTANT_P (def_src))
>> > > > > +                       const_rhs = def_src;
>> > > >
>> > > > doesn't DSE have its own tracking of stored values?  Shouldn't we
>> > >
>> > > It tracks stored values only within the basic block.  When RTL loop
>> > > invariant motion hoists a constant initialization out of the loop into
>> > > a separate basic block, the constant store value becomes unknown
>> > > within the original basic block.
>> > >
>> > > > improve _that_ if it is not enough?  I also wonder if you need to
>> > >
>> > > My patch extends DSE stored value tracking to include the constant which
>> > > is set only once in another basic block.
>> > >
>> > > > verify the SET isn't partial?
>> > > >
>> > >
>> > > Here is the v2 patch to check that the constant is set by a non-partial
>> > > unconditional load.
>> > >
>> > > OK for master?
>> > >
>> > > Thanks.
>> > >
>> > > H.J.
>> > > ---
>> > > RTL DSE tracks redundant constant stores within a basic block.  When RTL
>> > > loop invariant motion hoists a constant initialization out of the loop
>> > > into a separate basic block, the constant store value becomes unknown
>> > > within the original basic block.  When recording store for RTL DSE, check
>> > > if the source register is set only once to a constant by a non-partial
>> > > unconditional load.  If yes, record the constant as the constant store
>> > > source.  It eliminates unrolled zero stores after memset 0 in a loop
>> > > where a vector register is used as the zero store source.
>> > >
>> > > gcc/
>> > >
>> > >         PR rtl-optimization/105638
>> > >         * dse.cc (record_store): Use the constant source if the source
>> > >         register is set only once.
>> > >
>> > > gcc/testsuite/
>> > >
>> > >         PR rtl-optimization/105638
>> > >         * g++.target/i386/pr105638.C: New test.
>> > > ---
>> > >  gcc/dse.cc                               | 22 ++++++++++++
>> > >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>> > >  2 files changed, 66 insertions(+)
>> > >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>> > >
>> > > diff --git a/gcc/dse.cc b/gcc/dse.cc
>> > > index 30c11cee034..af8e88dac32 100644
>> > > --- a/gcc/dse.cc
>> > > +++ b/gcc/dse.cc
>> > > @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
>> > >
>> > >           if (tem && CONSTANT_P (tem))
>> > >             const_rhs = tem;
>> > > +         else
>> > > +           {
>> > > +             /* If RHS is set only once to a constant, set CONST_RHS
>> > > +                to the constant.  */
>> > > +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>> > > +             if (def != nullptr
>> > > +                 && !DF_REF_IS_ARTIFICIAL (def)
>> > > +                 && !(DF_REF_FLAGS (def)
>> > > +                      & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
>> > > +                 && !DF_REF_NEXT_REG (def))
>> >
>> > Can we really use df-chain here and rely that a single definition is
>> > the only one?  If rhs is a hardreg does df-chain include implicit
>> > sets of function argument registers for example?  Don't we need RD
>> > here or at least verify the single df-chain definition dominates the
>> > use here (if we can rely on the reg otherwise be uninitialized and thus
>> > the use invoking undefined behavior we could use the constant even
>> > in non-dominating context, but WRT the function args & hardregs I'm not
>> > sure we can tell).
>>
>> Does the hard register have a regular DEF?  Won't the function args &
>> hardregs have DF_REF_ARTIFICIAL?
>
> I don't know - do they?

They won't be DF_REF_ARTIFICIAL.  I think the code should work ok for
hard registers though.  All sets of argument registers, and all clobbers
of registers by function calls, are recorded as df defs.  (This is one
of the things that bloats the representation compared to rtx insns,
where most call clobbers are implicit.  But it's also one of the things
that makes df easier to use & more reliable than ad hoc liveness tracking.)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] DSE: Use the constant store source if possible
  2022-05-25  7:30     ` Richard Sandiford
@ 2022-05-25 18:56       ` H.J. Lu
  2022-05-26 15:14         ` Richard Sandiford
  0 siblings, 1 reply; 21+ messages in thread
From: H.J. Lu @ 2022-05-25 18:56 UTC (permalink / raw)
  To: H.J. Lu via Gcc-patches, Richard Biener, H.J. Lu, Richard Sandiford

On Wed, May 25, 2022 at 12:30 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
> >> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
> >> <gcc-patches@gcc.gnu.org> wrote:
> >> >
> >> > When recording store for RTL dead store elimination, check if the source
> >> > register is set only once to a constant.  If yes, record the constant
> >> > as the store source.  It eliminates unrolled zero stores after memset 0
> >> > in a loop where a vector register is used as the zero store source.
> >> >
> >> > gcc/
> >> >
> >> >         PR rtl-optimization/105638
> >> >         * dse.cc (record_store): Use the constant source if the source
> >> >         register is set only once.
> >> >
> >> > gcc/testsuite/
> >> >
> >> >         PR rtl-optimization/105638
> >> >         * g++.target/i386/pr105638.C: New test.
> >> > ---
> >> >  gcc/dse.cc                               | 19 ++++++++++
> >> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
> >> >  2 files changed, 63 insertions(+)
> >> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >> >
> >> > diff --git a/gcc/dse.cc b/gcc/dse.cc
> >> > index 30c11cee034..0433dd3d846 100644
> >> > --- a/gcc/dse.cc
> >> > +++ b/gcc/dse.cc
> >> > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
> >> >
> >> >           if (tem && CONSTANT_P (tem))
> >> >             const_rhs = tem;
> >> > +         else
> >> > +           {
> >> > +             /* If RHS is set only once to a constant, set CONST_RHS
> >> > +                to the constant.  */
> >> > +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> >> > +             if (def != nullptr
> >> > +                 && !DF_REF_IS_ARTIFICIAL (def)
> >> > +                 && !DF_REF_NEXT_REG (def))
> >> > +               {
> >> > +                 rtx_insn *def_insn = DF_REF_INSN (def);
> >> > +                 rtx def_body = PATTERN (def_insn);
> >> > +                 if (GET_CODE (def_body) == SET)
> >> > +                   {
> >> > +                     rtx def_src = SET_SRC (def_body);
> >> > +                     if (CONSTANT_P (def_src))
> >> > +                       const_rhs = def_src;
> >>
> >> doesn't DSE have its own tracking of stored values?  Shouldn't we
> >
> > It tracks stored values only within the basic block.  When RTL loop
> > invariant motion hoists a constant initialization out of the loop into
> > a separate basic block, the constant store value becomes unknown
> > within the original basic block.
> >
> >> improve _that_ if it is not enough?  I also wonder if you need to
> >
> > My patch extends DSE stored value tracking to include the constant which
> > is set only once in another basic block.
> >
> >> verify the SET isn't partial?
> >>
> >
> > Here is the v2 patch to check that the constant is set by a non-partial
> > unconditional load.
> >
> > OK for master?
> >
> > Thanks.
> >
> > H.J.
> > ---
> > RTL DSE tracks redundant constant stores within a basic block.  When RTL
> > loop invariant motion hoists a constant initialization out of the loop
> > into a separate basic block, the constant store value becomes unknown
> > within the original basic block.  When recording store for RTL DSE, check
> > if the source register is set only once to a constant by a non-partial
> > unconditional load.  If yes, record the constant as the constant store
> > source.  It eliminates unrolled zero stores after memset 0 in a loop
> > where a vector register is used as the zero store source.
> >
> > gcc/
> >
> >       PR rtl-optimization/105638
> >       * dse.cc (record_store): Use the constant source if the source
> >       register is set only once.
> >
> > gcc/testsuite/
> >
> >       PR rtl-optimization/105638
> >       * g++.target/i386/pr105638.C: New test.
> > ---
> >  gcc/dse.cc                               | 22 ++++++++++++
> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
> >  2 files changed, 66 insertions(+)
> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >
> > diff --git a/gcc/dse.cc b/gcc/dse.cc
> > index 30c11cee034..af8e88dac32 100644
> > --- a/gcc/dse.cc
> > +++ b/gcc/dse.cc
> > @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
> >
> >         if (tem && CONSTANT_P (tem))
> >           const_rhs = tem;
> > +       else
> > +         {
> > +           /* If RHS is set only once to a constant, set CONST_RHS
> > +              to the constant.  */
> > +           df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> > +           if (def != nullptr
> > +               && !DF_REF_IS_ARTIFICIAL (def)
> > +               && !(DF_REF_FLAGS (def)
> > +                    & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
> > +               && !DF_REF_NEXT_REG (def))
>
> Can we introduce a helper for this?  There are already similar tests
> in ira and loop-iv, and it seems a bit too complex to have to open-code
> each time.

I can use find_single_def_src in loop-iv.cc:

/* If REGNO has a single definition, return its known value, otherwise return
   null.  */

rtx
find_single_def_src (unsigned int regno)

and do

             /* If RHS is set only once to a constant, set CONST_RHS
                 to the constant.  */
              rtx def_src = find_single_def_src (REGNO (rhs));
              if (def_src != nullptr && CONSTANT_P (def_src))
                {
                  df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
                  if (!(DF_REF_FLAGS (def)
                        & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
                  {
                      rtx_insn *def_insn = DF_REF_INSN (def);
                      rtx def_body = single_set (def_insn);
                      if (rhs == SET_DEST (def_body))
                        const_rhs = def_src;
                    }
                }
            }

Will it work?

Thanks.

> Thanks,
> Richard
>
> > +             {
> > +               rtx_insn *def_insn = DF_REF_INSN (def);
> > +               rtx def_body = PATTERN (def_insn);
> > +               if (GET_CODE (def_body) == SET)
> > +                 {
> > +                   rtx def_src = SET_SRC (def_body);
> > +                   if (CONSTANT_P (def_src)
> > +                       && GET_MODE (def_src) == GET_MODE (rhs))
> > +                     const_rhs = def_src;
> > +                 }
> > +             }
> > +         }
> >       }
> >      }
> >
> > diff --git a/gcc/testsuite/g++.target/i386/pr105638.C b/gcc/testsuite/g++.target/i386/pr105638.C
> > new file mode 100644
> > index 00000000000..ff40a459de1
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.target/i386/pr105638.C
> > @@ -0,0 +1,44 @@
> > +/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-options "-std=gnu++20 -O2 -march=skylake" } */
> > +/* { dg-final { scan-assembler-not "vpxor" } } */
> > +
> > +#include <stdint.h>
> > +#include <vector>
> > +#include <tr1/array>
> > +
> > +class FastBoard {
> > +public:
> > +    typedef std::pair<int, int> movescore_t;
> > +    typedef std::tr1::array<movescore_t, 24> scoredlist_t;
> > +
> > +protected:
> > +    std::vector<int> m_critical;
> > +
> > +    int m_boardsize;
> > +};
> > +
> > +class FastState {
> > +public:
> > +    FastBoard board;
> > +
> > +    int movenum;
> > +protected:
> > +    FastBoard::scoredlist_t scoredmoves;
> > +};
> > +
> > +class KoState : public FastState {
> > +private:
> > +    std::vector<uint64_t> ko_hash_history;
> > +    std::vector<uint64_t> hash_history;
> > +};
> > +
> > +class GameState : public KoState {
> > +public:
> > +    void foo ();
> > +private:
> > +    std::vector<KoState> game_history;
> > +};
> > +
> > +void GameState::foo() {
> > +    game_history.resize(movenum);
> > +}



-- 
H.J.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] DSE: Use the constant store source if possible
  2022-05-25  9:30           ` Richard Sandiford
@ 2022-05-25 19:01             ` H.J. Lu
  0 siblings, 0 replies; 21+ messages in thread
From: H.J. Lu @ 2022-05-25 19:01 UTC (permalink / raw)
  To: Richard Biener via Gcc-patches, H.J. Lu, Richard Biener,
	Richard Sandiford

On Wed, May 25, 2022 at 2:30 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > On Tue, May 24, 2022 at 10:11 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >>
> >> On Mon, May 23, 2022 at 11:42 PM Richard Biener
> >> <richard.guenther@gmail.com> wrote:
> >> >
> >> > On Mon, May 23, 2022 at 8:34 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >> > >
> >> > > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
> >> > > > On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
> >> > > > <gcc-patches@gcc.gnu.org> wrote:
> >> > > > >
> >> > > > > When recording store for RTL dead store elimination, check if the source
> >> > > > > register is set only once to a constant.  If yes, record the constant
> >> > > > > as the store source.  It eliminates unrolled zero stores after memset 0
> >> > > > > in a loop where a vector register is used as the zero store source.
> >> > > > >
> >> > > > > gcc/
> >> > > > >
> >> > > > >         PR rtl-optimization/105638
> >> > > > >         * dse.cc (record_store): Use the constant source if the source
> >> > > > >         register is set only once.
> >> > > > >
> >> > > > > gcc/testsuite/
> >> > > > >
> >> > > > >         PR rtl-optimization/105638
> >> > > > >         * g++.target/i386/pr105638.C: New test.
> >> > > > > ---
> >> > > > >  gcc/dse.cc                               | 19 ++++++++++
> >> > > > >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
> >> > > > >  2 files changed, 63 insertions(+)
> >> > > > >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >> > > > >
> >> > > > > diff --git a/gcc/dse.cc b/gcc/dse.cc
> >> > > > > index 30c11cee034..0433dd3d846 100644
> >> > > > > --- a/gcc/dse.cc
> >> > > > > +++ b/gcc/dse.cc
> >> > > > > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
> >> > > > >
> >> > > > >           if (tem && CONSTANT_P (tem))
> >> > > > >             const_rhs = tem;
> >> > > > > +         else
> >> > > > > +           {
> >> > > > > +             /* If RHS is set only once to a constant, set CONST_RHS
> >> > > > > +                to the constant.  */
> >> > > > > +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> >> > > > > +             if (def != nullptr
> >> > > > > +                 && !DF_REF_IS_ARTIFICIAL (def)
> >> > > > > +                 && !DF_REF_NEXT_REG (def))
> >> > > > > +               {
> >> > > > > +                 rtx_insn *def_insn = DF_REF_INSN (def);
> >> > > > > +                 rtx def_body = PATTERN (def_insn);
> >> > > > > +                 if (GET_CODE (def_body) == SET)
> >> > > > > +                   {
> >> > > > > +                     rtx def_src = SET_SRC (def_body);
> >> > > > > +                     if (CONSTANT_P (def_src))
> >> > > > > +                       const_rhs = def_src;
> >> > > >
> >> > > > doesn't DSE have its own tracking of stored values?  Shouldn't we
> >> > >
> >> > > It tracks stored values only within the basic block.  When RTL loop
> >> > > invariant motion hoists a constant initialization out of the loop into
> >> > > a separate basic block, the constant store value becomes unknown
> >> > > within the original basic block.
> >> > >
> >> > > > improve _that_ if it is not enough?  I also wonder if you need to
> >> > >
> >> > > My patch extends DSE stored value tracking to include the constant which
> >> > > is set only once in another basic block.
> >> > >
> >> > > > verify the SET isn't partial?
> >> > > >
> >> > >
> >> > > Here is the v2 patch to check that the constant is set by a non-partial
> >> > > unconditional load.
> >> > >
> >> > > OK for master?
> >> > >
> >> > > Thanks.
> >> > >
> >> > > H.J.
> >> > > ---
> >> > > RTL DSE tracks redundant constant stores within a basic block.  When RTL
> >> > > loop invariant motion hoists a constant initialization out of the loop
> >> > > into a separate basic block, the constant store value becomes unknown
> >> > > within the original basic block.  When recording store for RTL DSE, check
> >> > > if the source register is set only once to a constant by a non-partial
> >> > > unconditional load.  If yes, record the constant as the constant store
> >> > > source.  It eliminates unrolled zero stores after memset 0 in a loop
> >> > > where a vector register is used as the zero store source.
> >> > >
> >> > > gcc/
> >> > >
> >> > >         PR rtl-optimization/105638
> >> > >         * dse.cc (record_store): Use the constant source if the source
> >> > >         register is set only once.
> >> > >
> >> > > gcc/testsuite/
> >> > >
> >> > >         PR rtl-optimization/105638
> >> > >         * g++.target/i386/pr105638.C: New test.
> >> > > ---
> >> > >  gcc/dse.cc                               | 22 ++++++++++++
> >> > >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
> >> > >  2 files changed, 66 insertions(+)
> >> > >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >> > >
> >> > > diff --git a/gcc/dse.cc b/gcc/dse.cc
> >> > > index 30c11cee034..af8e88dac32 100644
> >> > > --- a/gcc/dse.cc
> >> > > +++ b/gcc/dse.cc
> >> > > @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
> >> > >
> >> > >           if (tem && CONSTANT_P (tem))
> >> > >             const_rhs = tem;
> >> > > +         else
> >> > > +           {
> >> > > +             /* If RHS is set only once to a constant, set CONST_RHS
> >> > > +                to the constant.  */
> >> > > +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> >> > > +             if (def != nullptr
> >> > > +                 && !DF_REF_IS_ARTIFICIAL (def)
> >> > > +                 && !(DF_REF_FLAGS (def)
> >> > > +                      & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
> >> > > +                 && !DF_REF_NEXT_REG (def))
> >> >
> >> > Can we really use df-chain here and rely that a single definition is
> >> > the only one?  If rhs is a hardreg does df-chain include implicit
> >> > sets of function argument registers for example?  Don't we need RD
> >> > here or at least verify the single df-chain definition dominates the
> >> > use here (if we can rely on the reg otherwise be uninitialized and thus
> >> > the use invoking undefined behavior we could use the constant even
> >> > in non-dominating context, but WRT the function args & hardregs I'm not
> >> > sure we can tell).
> >>
> >> Does the hard register have a regular DEF?  Won't the function args &
> >> hardregs have DF_REF_ARTIFICIAL?
> >
> > I don't know - do they?
>
> They won't be DF_REF_ARTIFICIAL.  I think the code should work ok for
> hard registers though.  All sets of argument registers, and all clobbers
> of registers by function calls, are recorded as df defs.  (This is one

In these cases, will SET_SRC be constant?

> of the things that bloats the representation compared to rtx insns,
> where most call clobbers are implicit.  But it's also one of the things
> that makes df easier to use & more reliable than ad hoc liveness tracking.)
>
> Thanks,
> Richard



-- 
H.J.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] DSE: Use the constant store source if possible
  2022-05-25 18:56       ` H.J. Lu
@ 2022-05-26 15:14         ` Richard Sandiford
  2022-05-26 20:43           ` [PATCH v3] " H.J. Lu
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Sandiford @ 2022-05-26 15:14 UTC (permalink / raw)
  To: H.J. Lu; +Cc: H.J. Lu via Gcc-patches, Richard Biener

"H.J. Lu" <hjl.tools@gmail.com> writes:
> On Wed, May 25, 2022 at 12:30 AM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>> > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
>> >> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
>> >> <gcc-patches@gcc.gnu.org> wrote:
>> >> >
>> >> > When recording store for RTL dead store elimination, check if the source
>> >> > register is set only once to a constant.  If yes, record the constant
>> >> > as the store source.  It eliminates unrolled zero stores after memset 0
>> >> > in a loop where a vector register is used as the zero store source.
>> >> >
>> >> > gcc/
>> >> >
>> >> >         PR rtl-optimization/105638
>> >> >         * dse.cc (record_store): Use the constant source if the source
>> >> >         register is set only once.
>> >> >
>> >> > gcc/testsuite/
>> >> >
>> >> >         PR rtl-optimization/105638
>> >> >         * g++.target/i386/pr105638.C: New test.
>> >> > ---
>> >> >  gcc/dse.cc                               | 19 ++++++++++
>> >> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>> >> >  2 files changed, 63 insertions(+)
>> >> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>> >> >
>> >> > diff --git a/gcc/dse.cc b/gcc/dse.cc
>> >> > index 30c11cee034..0433dd3d846 100644
>> >> > --- a/gcc/dse.cc
>> >> > +++ b/gcc/dse.cc
>> >> > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
>> >> >
>> >> >           if (tem && CONSTANT_P (tem))
>> >> >             const_rhs = tem;
>> >> > +         else
>> >> > +           {
>> >> > +             /* If RHS is set only once to a constant, set CONST_RHS
>> >> > +                to the constant.  */
>> >> > +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>> >> > +             if (def != nullptr
>> >> > +                 && !DF_REF_IS_ARTIFICIAL (def)
>> >> > +                 && !DF_REF_NEXT_REG (def))
>> >> > +               {
>> >> > +                 rtx_insn *def_insn = DF_REF_INSN (def);
>> >> > +                 rtx def_body = PATTERN (def_insn);
>> >> > +                 if (GET_CODE (def_body) == SET)
>> >> > +                   {
>> >> > +                     rtx def_src = SET_SRC (def_body);
>> >> > +                     if (CONSTANT_P (def_src))
>> >> > +                       const_rhs = def_src;
>> >>
>> >> doesn't DSE have its own tracking of stored values?  Shouldn't we
>> >
>> > It tracks stored values only within the basic block.  When RTL loop
>> > invariant motion hoists a constant initialization out of the loop into
>> > a separate basic block, the constant store value becomes unknown
>> > within the original basic block.
>> >
>> >> improve _that_ if it is not enough?  I also wonder if you need to
>> >
>> > My patch extends DSE stored value tracking to include the constant which
>> > is set only once in another basic block.
>> >
>> >> verify the SET isn't partial?
>> >>
>> >
>> > Here is the v2 patch to check that the constant is set by a non-partial
>> > unconditional load.
>> >
>> > OK for master?
>> >
>> > Thanks.
>> >
>> > H.J.
>> > ---
>> > RTL DSE tracks redundant constant stores within a basic block.  When RTL
>> > loop invariant motion hoists a constant initialization out of the loop
>> > into a separate basic block, the constant store value becomes unknown
>> > within the original basic block.  When recording store for RTL DSE, check
>> > if the source register is set only once to a constant by a non-partial
>> > unconditional load.  If yes, record the constant as the constant store
>> > source.  It eliminates unrolled zero stores after memset 0 in a loop
>> > where a vector register is used as the zero store source.
>> >
>> > gcc/
>> >
>> >       PR rtl-optimization/105638
>> >       * dse.cc (record_store): Use the constant source if the source
>> >       register is set only once.
>> >
>> > gcc/testsuite/
>> >
>> >       PR rtl-optimization/105638
>> >       * g++.target/i386/pr105638.C: New test.
>> > ---
>> >  gcc/dse.cc                               | 22 ++++++++++++
>> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>> >  2 files changed, 66 insertions(+)
>> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>> >
>> > diff --git a/gcc/dse.cc b/gcc/dse.cc
>> > index 30c11cee034..af8e88dac32 100644
>> > --- a/gcc/dse.cc
>> > +++ b/gcc/dse.cc
>> > @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
>> >
>> >         if (tem && CONSTANT_P (tem))
>> >           const_rhs = tem;
>> > +       else
>> > +         {
>> > +           /* If RHS is set only once to a constant, set CONST_RHS
>> > +              to the constant.  */
>> > +           df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>> > +           if (def != nullptr
>> > +               && !DF_REF_IS_ARTIFICIAL (def)
>> > +               && !(DF_REF_FLAGS (def)
>> > +                    & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
>> > +               && !DF_REF_NEXT_REG (def))
>>
>> Can we introduce a helper for this?  There are already similar tests
>> in ira and loop-iv, and it seems a bit too complex to have to open-code
>> each time.
>
> I can use find_single_def_src in loop-iv.cc:
>
> /* If REGNO has a single definition, return its known value, otherwise return
>    null.  */
>
> rtx
> find_single_def_src (unsigned int regno)

Yeah, reusing that sounds good.  Perhaps we should move it into df-core.cc,
alongside the df_reg_used group of functions.

I think the mode check in your original patch should be kept though,
so how about we change the parameter to an rtx reg and use rtx_equal in:

      rtx set = single_set (DF_REF_INSN (adef));
      if (set == NULL || !REG_P (SET_DEST (set))
	  || REGNO (SET_DEST (set)) != regno)
	return NULL_RTX;

rather than the existing !REG_P and REGNO checks.  We should also add:

>
> and do
>
>              /* If RHS is set only once to a constant, set CONST_RHS
>                  to the constant.  */
>               rtx def_src = find_single_def_src (REGNO (rhs));
>               if (def_src != nullptr && CONSTANT_P (def_src))
>                 {
>                   df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>                   if (!(DF_REF_FLAGS (def)
>                         & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))

…this check to the function, since it's needed for correctness even
in the loop-iv.cc use.

Thanks,
Richard

>                   {
>                       rtx_insn *def_insn = DF_REF_INSN (def);
>                       rtx def_body = single_set (def_insn);
>                       if (rhs == SET_DEST (def_body))
>                         const_rhs = def_src;
>                     }
>                 }
>             }
>
> Will it work?
>
> Thanks.
>
>> Thanks,
>> Richard
>>
>> > +             {
>> > +               rtx_insn *def_insn = DF_REF_INSN (def);
>> > +               rtx def_body = PATTERN (def_insn);
>> > +               if (GET_CODE (def_body) == SET)
>> > +                 {
>> > +                   rtx def_src = SET_SRC (def_body);
>> > +                   if (CONSTANT_P (def_src)
>> > +                       && GET_MODE (def_src) == GET_MODE (rhs))
>> > +                     const_rhs = def_src;
>> > +                 }
>> > +             }
>> > +         }
>> >       }
>> >      }
>> >
>> > diff --git a/gcc/testsuite/g++.target/i386/pr105638.C b/gcc/testsuite/g++.target/i386/pr105638.C
>> > new file mode 100644
>> > index 00000000000..ff40a459de1
>> > --- /dev/null
>> > +++ b/gcc/testsuite/g++.target/i386/pr105638.C
>> > @@ -0,0 +1,44 @@
>> > +/* { dg-do compile { target { ! ia32 } } } */
>> > +/* { dg-options "-std=gnu++20 -O2 -march=skylake" } */
>> > +/* { dg-final { scan-assembler-not "vpxor" } } */
>> > +
>> > +#include <stdint.h>
>> > +#include <vector>
>> > +#include <tr1/array>
>> > +
>> > +class FastBoard {
>> > +public:
>> > +    typedef std::pair<int, int> movescore_t;
>> > +    typedef std::tr1::array<movescore_t, 24> scoredlist_t;
>> > +
>> > +protected:
>> > +    std::vector<int> m_critical;
>> > +
>> > +    int m_boardsize;
>> > +};
>> > +
>> > +class FastState {
>> > +public:
>> > +    FastBoard board;
>> > +
>> > +    int movenum;
>> > +protected:
>> > +    FastBoard::scoredlist_t scoredmoves;
>> > +};
>> > +
>> > +class KoState : public FastState {
>> > +private:
>> > +    std::vector<uint64_t> ko_hash_history;
>> > +    std::vector<uint64_t> hash_history;
>> > +};
>> > +
>> > +class GameState : public KoState {
>> > +public:
>> > +    void foo ();
>> > +private:
>> > +    std::vector<KoState> game_history;
>> > +};
>> > +
>> > +void GameState::foo() {
>> > +    game_history.resize(movenum);
>> > +}

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3] DSE: Use the constant store source if possible
  2022-05-26 15:14         ` Richard Sandiford
@ 2022-05-26 20:43           ` H.J. Lu
  2022-05-28 18:37             ` Jeff Law
  2022-05-30  8:35             ` Richard Sandiford
  0 siblings, 2 replies; 21+ messages in thread
From: H.J. Lu @ 2022-05-26 20:43 UTC (permalink / raw)
  To: Gcc-patches, Richard Biener, richard.sandiford

On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote:
> "H.J. Lu" <hjl.tools@gmail.com> writes:
> > On Wed, May 25, 2022 at 12:30 AM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> >> > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
> >> >> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
> >> >> <gcc-patches@gcc.gnu.org> wrote:
> >> >> >
> >> >> > When recording store for RTL dead store elimination, check if the source
> >> >> > register is set only once to a constant.  If yes, record the constant
> >> >> > as the store source.  It eliminates unrolled zero stores after memset 0
> >> >> > in a loop where a vector register is used as the zero store source.
> >> >> >
> >> >> > gcc/
> >> >> >
> >> >> >         PR rtl-optimization/105638
> >> >> >         * dse.cc (record_store): Use the constant source if the source
> >> >> >         register is set only once.
> >> >> >
> >> >> > gcc/testsuite/
> >> >> >
> >> >> >         PR rtl-optimization/105638
> >> >> >         * g++.target/i386/pr105638.C: New test.
> >> >> > ---
> >> >> >  gcc/dse.cc                               | 19 ++++++++++
> >> >> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
> >> >> >  2 files changed, 63 insertions(+)
> >> >> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >> >> >
> >> >> > diff --git a/gcc/dse.cc b/gcc/dse.cc
> >> >> > index 30c11cee034..0433dd3d846 100644
> >> >> > --- a/gcc/dse.cc
> >> >> > +++ b/gcc/dse.cc
> >> >> > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
> >> >> >
> >> >> >           if (tem && CONSTANT_P (tem))
> >> >> >             const_rhs = tem;
> >> >> > +         else
> >> >> > +           {
> >> >> > +             /* If RHS is set only once to a constant, set CONST_RHS
> >> >> > +                to the constant.  */
> >> >> > +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> >> >> > +             if (def != nullptr
> >> >> > +                 && !DF_REF_IS_ARTIFICIAL (def)
> >> >> > +                 && !DF_REF_NEXT_REG (def))
> >> >> > +               {
> >> >> > +                 rtx_insn *def_insn = DF_REF_INSN (def);
> >> >> > +                 rtx def_body = PATTERN (def_insn);
> >> >> > +                 if (GET_CODE (def_body) == SET)
> >> >> > +                   {
> >> >> > +                     rtx def_src = SET_SRC (def_body);
> >> >> > +                     if (CONSTANT_P (def_src))
> >> >> > +                       const_rhs = def_src;
> >> >>
> >> >> doesn't DSE have its own tracking of stored values?  Shouldn't we
> >> >
> >> > It tracks stored values only within the basic block.  When RTL loop
> >> > invariant motion hoists a constant initialization out of the loop into
> >> > a separate basic block, the constant store value becomes unknown
> >> > within the original basic block.
> >> >
> >> >> improve _that_ if it is not enough?  I also wonder if you need to
> >> >
> >> > My patch extends DSE stored value tracking to include the constant which
> >> > is set only once in another basic block.
> >> >
> >> >> verify the SET isn't partial?
> >> >>
> >> >
> >> > Here is the v2 patch to check that the constant is set by a non-partial
> >> > unconditional load.
> >> >
> >> > OK for master?
> >> >
> >> > Thanks.
> >> >
> >> > H.J.
> >> > ---
> >> > RTL DSE tracks redundant constant stores within a basic block.  When RTL
> >> > loop invariant motion hoists a constant initialization out of the loop
> >> > into a separate basic block, the constant store value becomes unknown
> >> > within the original basic block.  When recording store for RTL DSE, check
> >> > if the source register is set only once to a constant by a non-partial
> >> > unconditional load.  If yes, record the constant as the constant store
> >> > source.  It eliminates unrolled zero stores after memset 0 in a loop
> >> > where a vector register is used as the zero store source.
> >> >
> >> > gcc/
> >> >
> >> >       PR rtl-optimization/105638
> >> >       * dse.cc (record_store): Use the constant source if the source
> >> >       register is set only once.
> >> >
> >> > gcc/testsuite/
> >> >
> >> >       PR rtl-optimization/105638
> >> >       * g++.target/i386/pr105638.C: New test.
> >> > ---
> >> >  gcc/dse.cc                               | 22 ++++++++++++
> >> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
> >> >  2 files changed, 66 insertions(+)
> >> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >> >
> >> > diff --git a/gcc/dse.cc b/gcc/dse.cc
> >> > index 30c11cee034..af8e88dac32 100644
> >> > --- a/gcc/dse.cc
> >> > +++ b/gcc/dse.cc
> >> > @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
> >> >
> >> >         if (tem && CONSTANT_P (tem))
> >> >           const_rhs = tem;
> >> > +       else
> >> > +         {
> >> > +           /* If RHS is set only once to a constant, set CONST_RHS
> >> > +              to the constant.  */
> >> > +           df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> >> > +           if (def != nullptr
> >> > +               && !DF_REF_IS_ARTIFICIAL (def)
> >> > +               && !(DF_REF_FLAGS (def)
> >> > +                    & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
> >> > +               && !DF_REF_NEXT_REG (def))
> >>
> >> Can we introduce a helper for this?  There are already similar tests
> >> in ira and loop-iv, and it seems a bit too complex to have to open-code
> >> each time.
> >
> > I can use find_single_def_src in loop-iv.cc:
> >
> > /* If REGNO has a single definition, return its known value, otherwise return
> >    null.  */
> >
> > rtx
> > find_single_def_src (unsigned int regno)
> 
> Yeah, reusing that sounds good.  Perhaps we should move it into df-core.cc,
> alongside the df_reg_used group of functions.
> 
> I think the mode check in your original patch should be kept though,
> so how about we change the parameter to an rtx reg and use rtx_equal in:
> 
>       rtx set = single_set (DF_REF_INSN (adef));
>       if (set == NULL || !REG_P (SET_DEST (set))
> 	  || REGNO (SET_DEST (set)) != regno)
> 	return NULL_RTX;

Fixed.

> rather than the existing !REG_P and REGNO checks.  We should also add:
> 
> >
> > and do
> >
> >              /* If RHS is set only once to a constant, set CONST_RHS
> >                  to the constant.  */
> >               rtx def_src = find_single_def_src (REGNO (rhs));
> >               if (def_src != nullptr && CONSTANT_P (def_src))
> >                 {
> >                   df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> >                   if (!(DF_REF_FLAGS (def)
> >                         & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
> 
> …this check to the function, since it's needed for correctness even
> in the loop-iv.cc use.

Fixed.

> 
> Thanks,
> Richard

Here is the v3 patch.  OK for master?

Thanks.

H.J.
---
RTL DSE tracks redundant constant stores within a basic block.  When RTL
loop invariant motion hoists a constant initialization out of the loop
into a separate basic block, the constant store value becomes unknown
within the original basic block.  When recording store for RTL DSE, check
if the source register is set only once to a constant by a non-partial
unconditional load.  If yes, record the constant as the constant store
source.  It eliminates unrolled zero stores after memset 0 in a loop
where a vector register is used as the zero store source.

Extract find_single_def_src from loop-iv.cc and move it to df-core.cc:

1. Rename to df_find_single_def_src.
2. Change the argument to rtx and use rtx_equal_p.
3. Return null for partial or conditional defs.

gcc/

	PR rtl-optimization/105638
	* df-core.cc (df_find_single_def_sr): Moved and renamed from
	find_single_def_src in loop-iv.cc.  Change the argument to rtx
	and use rtx_equal_p.  Return null for partial or conditional
	defs.
	* df.h (df_find_single_def_src): New prototype.
	* dse.cc (record_store): Use the constant source if the source
	register is set only once.
	* loop-iv.cc (find_single_def_src): Moved to df-core.cc.
	(replace_single_def_regs): Replace find_single_def_src with
	df_find_single_def_src.

gcc/testsuite/

	PR rtl-optimization/105638
	* g++.target/i386/pr105638.C: New test.
---
 gcc/df-core.cc                           | 44 +++++++++++++++++++++++
 gcc/df.h                                 |  1 +
 gcc/dse.cc                               | 14 ++++++++
 gcc/loop-iv.cc                           | 45 +-----------------------
 gcc/testsuite/g++.target/i386/pr105638.C | 44 +++++++++++++++++++++++
 5 files changed, 104 insertions(+), 44 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C

diff --git a/gcc/df-core.cc b/gcc/df-core.cc
index a901b84878f..f9b4de8eb7a 100644
--- a/gcc/df-core.cc
+++ b/gcc/df-core.cc
@@ -2009,6 +2009,50 @@ df_reg_used (rtx_insn *insn, rtx reg)
   return df_find_use (insn, reg) != NULL;
 }
 
+/* If REG has a single definition, return its known value, otherwise return
+   null.  */
+
+rtx
+df_find_single_def_src (rtx reg)
+{
+  rtx src = NULL_RTX;
+
+  /* Don't look through unbounded number of single definition REG copies,
+     there might be loops for sources with uninitialized variables.  */
+  for (int cnt = 0; cnt < 128; cnt++)
+    {
+      df_ref adef = DF_REG_DEF_CHAIN (REGNO (reg));
+      if (adef == NULL || DF_REF_NEXT_REG (adef) != NULL
+	  || DF_REF_IS_ARTIFICIAL (adef)
+	  || (DF_REF_FLAGS (adef)
+	      & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
+	return NULL_RTX;
+
+      rtx set = single_set (DF_REF_INSN (adef));
+      if (set == NULL || !rtx_equal_p (SET_DEST (set), reg))
+	return NULL_RTX;
+
+      rtx note = find_reg_equal_equiv_note (DF_REF_INSN (adef));
+      if (note && function_invariant_p (XEXP (note, 0)))
+	{
+	  src = XEXP (note, 0);
+	  break;
+	}
+      src = SET_SRC (set);
+
+      if (REG_P (src))
+	{
+	  reg = src;
+	  continue;
+	}
+      break;
+    }
+  if (!function_invariant_p (src))
+    return NULL_RTX;
+
+  return src;
+}
+
 \f
 /*----------------------------------------------------------------------------
    Debugging and printing functions.
diff --git a/gcc/df.h b/gcc/df.h
index bd329205d08..71e249ad20a 100644
--- a/gcc/df.h
+++ b/gcc/df.h
@@ -991,6 +991,7 @@ extern df_ref df_find_def (rtx_insn *, rtx);
 extern bool df_reg_defined (rtx_insn *, rtx);
 extern df_ref df_find_use (rtx_insn *, rtx);
 extern bool df_reg_used (rtx_insn *, rtx);
+extern rtx df_find_single_def_src (rtx);
 extern void df_worklist_dataflow (struct dataflow *,bitmap, int *, int);
 extern void df_print_regset (FILE *file, const_bitmap r);
 extern void df_print_word_regset (FILE *file, const_bitmap r);
diff --git a/gcc/dse.cc b/gcc/dse.cc
index 30c11cee034..c915266f025 100644
--- a/gcc/dse.cc
+++ b/gcc/dse.cc
@@ -1508,6 +1508,20 @@ record_store (rtx body, bb_info_t bb_info)
 
 	  if (tem && CONSTANT_P (tem))
 	    const_rhs = tem;
+	  else
+	    {
+	      /* If RHS is set only once to a constant, set CONST_RHS
+		 to the constant.  */
+	      rtx def_src = df_find_single_def_src (rhs);
+	      if (def_src != nullptr && CONSTANT_P (def_src))
+		{
+		  df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
+		  rtx_insn *def_insn = DF_REF_INSN (def);
+		  rtx def_body = single_set (def_insn);
+		  if (rhs == SET_DEST (def_body))
+		    const_rhs = def_src;
+		}
+	    }
 	}
     }
 
diff --git a/gcc/loop-iv.cc b/gcc/loop-iv.cc
index 0eafe7d2362..d639336445a 100644
--- a/gcc/loop-iv.cc
+++ b/gcc/loop-iv.cc
@@ -1378,49 +1378,6 @@ simple_rhs_p (rtx rhs)
     }
 }
 
-/* If REGNO has a single definition, return its known value, otherwise return
-   null.  */
-
-static rtx
-find_single_def_src (unsigned int regno)
-{
-  rtx src = NULL_RTX;
-
-  /* Don't look through unbounded number of single definition REG copies,
-     there might be loops for sources with uninitialized variables.  */
-  for (int cnt = 0; cnt < 128; cnt++)
-    {
-      df_ref adef = DF_REG_DEF_CHAIN (regno);
-      if (adef == NULL || DF_REF_NEXT_REG (adef) != NULL
-	  || DF_REF_IS_ARTIFICIAL (adef))
-	return NULL_RTX;
-
-      rtx set = single_set (DF_REF_INSN (adef));
-      if (set == NULL || !REG_P (SET_DEST (set))
-	  || REGNO (SET_DEST (set)) != regno)
-	return NULL_RTX;
-
-      rtx note = find_reg_equal_equiv_note (DF_REF_INSN (adef));
-      if (note && function_invariant_p (XEXP (note, 0)))
-	{
-	  src = XEXP (note, 0);
-	  break;
-	}
-      src = SET_SRC (set);
-
-      if (REG_P (src))
-	{
-	  regno = REGNO (src);
-	  continue;
-	}
-      break;
-    }
-  if (!function_invariant_p (src))
-    return NULL_RTX;
-
-  return src;
-}
-
 /* If any registers in *EXPR that have a single definition, try to replace
    them with the known-equivalent values.  */
 
@@ -1433,7 +1390,7 @@ replace_single_def_regs (rtx *expr)
     {
       rtx x = *iter;
       if (REG_P (x))
-	if (rtx new_x = find_single_def_src (REGNO (x)))
+	if (rtx new_x = df_find_single_def_src (x))
 	  {
 	    *expr = simplify_replace_rtx (*expr, x, new_x);
 	    goto repeat;
diff --git a/gcc/testsuite/g++.target/i386/pr105638.C b/gcc/testsuite/g++.target/i386/pr105638.C
new file mode 100644
index 00000000000..ff40a459de1
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr105638.C
@@ -0,0 +1,44 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-std=gnu++20 -O2 -march=skylake" } */
+/* { dg-final { scan-assembler-not "vpxor" } } */
+
+#include <stdint.h>
+#include <vector>
+#include <tr1/array>
+
+class FastBoard {
+public:
+    typedef std::pair<int, int> movescore_t;
+    typedef std::tr1::array<movescore_t, 24> scoredlist_t;
+    
+protected:
+    std::vector<int> m_critical;
+
+    int m_boardsize;    
+};
+
+class FastState {
+public:        
+    FastBoard board;
+    
+    int movenum;              
+protected:
+    FastBoard::scoredlist_t scoredmoves;
+};
+
+class KoState : public FastState {
+private:         
+    std::vector<uint64_t> ko_hash_history;   
+    std::vector<uint64_t> hash_history;     
+};
+
+class GameState : public KoState {
+public:                    
+    void foo ();      
+private:
+    std::vector<KoState> game_history;                          
+};
+
+void GameState::foo() {
+    game_history.resize(movenum);
+}
-- 
2.36.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] DSE: Use the constant store source if possible
  2022-05-26 20:43           ` [PATCH v3] " H.J. Lu
@ 2022-05-28 18:37             ` Jeff Law
  2022-05-29 21:43               ` H.J. Lu
  2022-05-30  8:35             ` Richard Sandiford
  1 sibling, 1 reply; 21+ messages in thread
From: Jeff Law @ 2022-05-28 18:37 UTC (permalink / raw)
  To: gcc-patches



On 5/26/2022 2:43 PM, H.J. Lu via Gcc-patches wrote:
> On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote:
>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>> On Wed, May 25, 2022 at 12:30 AM Richard Sandiford
>>> <richard.sandiford@arm.com> wrote:
>>>> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>>>>> On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
>>>>>> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>>> When recording store for RTL dead store elimination, check if the source
>>>>>>> register is set only once to a constant.  If yes, record the constant
>>>>>>> as the store source.  It eliminates unrolled zero stores after memset 0
>>>>>>> in a loop where a vector register is used as the zero store source.
>>>>>>>
>>>>>>> gcc/
>>>>>>>
>>>>>>>          PR rtl-optimization/105638
>>>>>>>          * dse.cc (record_store): Use the constant source if the source
>>>>>>>          register is set only once.
>>>>>>>
>>>>>>> gcc/testsuite/
>>>>>>>
>>>>>>>          PR rtl-optimization/105638
>>>>>>>          * g++.target/i386/pr105638.C: New test.
>>>>>>> ---
>>>>>>>   gcc/dse.cc                               | 19 ++++++++++
>>>>>>>   gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>>>>>>>   2 files changed, 63 insertions(+)
>>>>>>>   create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>>>>>>>
>>>>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc
>>>>>>> index 30c11cee034..0433dd3d846 100644
>>>>>>> --- a/gcc/dse.cc
>>>>>>> +++ b/gcc/dse.cc
>>>>>>> @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
>>>>>>>
>>>>>>>            if (tem && CONSTANT_P (tem))
>>>>>>>              const_rhs = tem;
>>>>>>> +         else
>>>>>>> +           {
>>>>>>> +             /* If RHS is set only once to a constant, set CONST_RHS
>>>>>>> +                to the constant.  */
>>>>>>> +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>>>>>>> +             if (def != nullptr
>>>>>>> +                 && !DF_REF_IS_ARTIFICIAL (def)
>>>>>>> +                 && !DF_REF_NEXT_REG (def))
>>>>>>> +               {
>>>>>>> +                 rtx_insn *def_insn = DF_REF_INSN (def);
>>>>>>> +                 rtx def_body = PATTERN (def_insn);
>>>>>>> +                 if (GET_CODE (def_body) == SET)
>>>>>>> +                   {
>>>>>>> +                     rtx def_src = SET_SRC (def_body);
>>>>>>> +                     if (CONSTANT_P (def_src))
>>>>>>> +                       const_rhs = def_src;
>>>>>> doesn't DSE have its own tracking of stored values?  Shouldn't we
>>>>> It tracks stored values only within the basic block.  When RTL loop
>>>>> invariant motion hoists a constant initialization out of the loop into
>>>>> a separate basic block, the constant store value becomes unknown
>>>>> within the original basic block.
>>>>>
>>>>>> improve _that_ if it is not enough?  I also wonder if you need to
>>>>> My patch extends DSE stored value tracking to include the constant which
>>>>> is set only once in another basic block.
>>>>>
>>>>>> verify the SET isn't partial?
>>>>>>
>>>>> Here is the v2 patch to check that the constant is set by a non-partial
>>>>> unconditional load.
>>>>>
>>>>> OK for master?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> H.J.
>>>>> ---
>>>>> RTL DSE tracks redundant constant stores within a basic block.  When RTL
>>>>> loop invariant motion hoists a constant initialization out of the loop
>>>>> into a separate basic block, the constant store value becomes unknown
>>>>> within the original basic block.  When recording store for RTL DSE, check
>>>>> if the source register is set only once to a constant by a non-partial
>>>>> unconditional load.  If yes, record the constant as the constant store
>>>>> source.  It eliminates unrolled zero stores after memset 0 in a loop
>>>>> where a vector register is used as the zero store source.
>>>>>
>>>>> gcc/
>>>>>
>>>>>        PR rtl-optimization/105638
>>>>>        * dse.cc (record_store): Use the constant source if the source
>>>>>        register is set only once.
>>>>>
>>>>> gcc/testsuite/
>>>>>
>>>>>        PR rtl-optimization/105638
>>>>>        * g++.target/i386/pr105638.C: New test.
>>>>> ---
>>>>>   gcc/dse.cc                               | 22 ++++++++++++
>>>>>   gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>>>>>   2 files changed, 66 insertions(+)
>>>>>   create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>>>>>
>>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc
>>>>> index 30c11cee034..af8e88dac32 100644
>>>>> --- a/gcc/dse.cc
>>>>> +++ b/gcc/dse.cc
>>>>> @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
>>>>>
>>>>>          if (tem && CONSTANT_P (tem))
>>>>>            const_rhs = tem;
>>>>> +       else
>>>>> +         {
>>>>> +           /* If RHS is set only once to a constant, set CONST_RHS
>>>>> +              to the constant.  */
>>>>> +           df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>>>>> +           if (def != nullptr
>>>>> +               && !DF_REF_IS_ARTIFICIAL (def)
>>>>> +               && !(DF_REF_FLAGS (def)
>>>>> +                    & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
>>>>> +               && !DF_REF_NEXT_REG (def))
>>>> Can we introduce a helper for this?  There are already similar tests
>>>> in ira and loop-iv, and it seems a bit too complex to have to open-code
>>>> each time.
>>> I can use find_single_def_src in loop-iv.cc:
>>>
>>> /* If REGNO has a single definition, return its known value, otherwise return
>>>     null.  */
>>>
>>> rtx
>>> find_single_def_src (unsigned int regno)
>> Yeah, reusing that sounds good.  Perhaps we should move it into df-core.cc,
>> alongside the df_reg_used group of functions.
>>
>> I think the mode check in your original patch should be kept though,
>> so how about we change the parameter to an rtx reg and use rtx_equal in:
>>
>>        rtx set = single_set (DF_REF_INSN (adef));
>>        if (set == NULL || !REG_P (SET_DEST (set))
>> 	  || REGNO (SET_DEST (set)) != regno)
>> 	return NULL_RTX;
> Fixed.
>
>> rather than the existing !REG_P and REGNO checks.  We should also add:
>>
>>> and do
>>>
>>>               /* If RHS is set only once to a constant, set CONST_RHS
>>>                   to the constant.  */
>>>                rtx def_src = find_single_def_src (REGNO (rhs));
>>>                if (def_src != nullptr && CONSTANT_P (def_src))
>>>                  {
>>>                    df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>>>                    if (!(DF_REF_FLAGS (def)
>>>                          & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
>> …this check to the function, since it's needed for correctness even
>> in the loop-iv.cc use.
> Fixed.
>
>> Thanks,
>> Richard
> Here is the v3 patch.  OK for master?
>
> Thanks.
>
> H.J.
> ---
> RTL DSE tracks redundant constant stores within a basic block.  When RTL
> loop invariant motion hoists a constant initialization out of the loop
> into a separate basic block, the constant store value becomes unknown
> within the original basic block.  When recording store for RTL DSE, check
> if the source register is set only once to a constant by a non-partial
> unconditional load.  If yes, record the constant as the constant store
> source.  It eliminates unrolled zero stores after memset 0 in a loop
> where a vector register is used as the zero store source.
>
> Extract find_single_def_src from loop-iv.cc and move it to df-core.cc:
>
> 1. Rename to df_find_single_def_src.
> 2. Change the argument to rtx and use rtx_equal_p.
> 3. Return null for partial or conditional defs.
>
> gcc/
>
> 	PR rtl-optimization/105638
> 	* df-core.cc (df_find_single_def_sr): Moved and renamed from
> 	find_single_def_src in loop-iv.cc.  Change the argument to rtx
> 	and use rtx_equal_p.  Return null for partial or conditional
> 	defs.
> 	* df.h (df_find_single_def_src): New prototype.
> 	* dse.cc (record_store): Use the constant source if the source
> 	register is set only once.
> 	* loop-iv.cc (find_single_def_src): Moved to df-core.cc.
> 	(replace_single_def_regs): Replace find_single_def_src with
> 	df_find_single_def_src.
>
> gcc/testsuite/
>
> 	PR rtl-optimization/105638
> 	* g++.target/i386/pr105638.C: New test.
Avoiding REG_EQUAL and only handling REG_EQUIV notes would be better 
here.  REG_EQUIV indicates the destination could be replaced with the 
source of the note at any point and semantically the code would still be 
valid.  REG_EQUAL doesn't give us that guarantee.

To allow REG_EQUAL you have to check that the block with the note 
dominates the use.

Jeff


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] DSE: Use the constant store source if possible
  2022-05-28 18:37             ` Jeff Law
@ 2022-05-29 21:43               ` H.J. Lu
  2022-05-29 22:55                 ` Jeff Law
  0 siblings, 1 reply; 21+ messages in thread
From: H.J. Lu @ 2022-05-29 21:43 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

On Sat, May 28, 2022 at 11:37 AM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 5/26/2022 2:43 PM, H.J. Lu via Gcc-patches wrote:
> > On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote:
> >> "H.J. Lu" <hjl.tools@gmail.com> writes:
> >>> On Wed, May 25, 2022 at 12:30 AM Richard Sandiford
> >>> <richard.sandiford@arm.com> wrote:
> >>>> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> >>>>> On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
> >>>>>> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
> >>>>>> <gcc-patches@gcc.gnu.org> wrote:
> >>>>>>> When recording store for RTL dead store elimination, check if the source
> >>>>>>> register is set only once to a constant.  If yes, record the constant
> >>>>>>> as the store source.  It eliminates unrolled zero stores after memset 0
> >>>>>>> in a loop where a vector register is used as the zero store source.
> >>>>>>>
> >>>>>>> gcc/
> >>>>>>>
> >>>>>>>          PR rtl-optimization/105638
> >>>>>>>          * dse.cc (record_store): Use the constant source if the source
> >>>>>>>          register is set only once.
> >>>>>>>
> >>>>>>> gcc/testsuite/
> >>>>>>>
> >>>>>>>          PR rtl-optimization/105638
> >>>>>>>          * g++.target/i386/pr105638.C: New test.
> >>>>>>> ---
> >>>>>>>   gcc/dse.cc                               | 19 ++++++++++
> >>>>>>>   gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
> >>>>>>>   2 files changed, 63 insertions(+)
> >>>>>>>   create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >>>>>>>
> >>>>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc
> >>>>>>> index 30c11cee034..0433dd3d846 100644
> >>>>>>> --- a/gcc/dse.cc
> >>>>>>> +++ b/gcc/dse.cc
> >>>>>>> @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
> >>>>>>>
> >>>>>>>            if (tem && CONSTANT_P (tem))
> >>>>>>>              const_rhs = tem;
> >>>>>>> +         else
> >>>>>>> +           {
> >>>>>>> +             /* If RHS is set only once to a constant, set CONST_RHS
> >>>>>>> +                to the constant.  */
> >>>>>>> +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> >>>>>>> +             if (def != nullptr
> >>>>>>> +                 && !DF_REF_IS_ARTIFICIAL (def)
> >>>>>>> +                 && !DF_REF_NEXT_REG (def))
> >>>>>>> +               {
> >>>>>>> +                 rtx_insn *def_insn = DF_REF_INSN (def);
> >>>>>>> +                 rtx def_body = PATTERN (def_insn);
> >>>>>>> +                 if (GET_CODE (def_body) == SET)
> >>>>>>> +                   {
> >>>>>>> +                     rtx def_src = SET_SRC (def_body);
> >>>>>>> +                     if (CONSTANT_P (def_src))
> >>>>>>> +                       const_rhs = def_src;
> >>>>>> doesn't DSE have its own tracking of stored values?  Shouldn't we
> >>>>> It tracks stored values only within the basic block.  When RTL loop
> >>>>> invariant motion hoists a constant initialization out of the loop into
> >>>>> a separate basic block, the constant store value becomes unknown
> >>>>> within the original basic block.
> >>>>>
> >>>>>> improve _that_ if it is not enough?  I also wonder if you need to
> >>>>> My patch extends DSE stored value tracking to include the constant which
> >>>>> is set only once in another basic block.
> >>>>>
> >>>>>> verify the SET isn't partial?
> >>>>>>
> >>>>> Here is the v2 patch to check that the constant is set by a non-partial
> >>>>> unconditional load.
> >>>>>
> >>>>> OK for master?
> >>>>>
> >>>>> Thanks.
> >>>>>
> >>>>> H.J.
> >>>>> ---
> >>>>> RTL DSE tracks redundant constant stores within a basic block.  When RTL
> >>>>> loop invariant motion hoists a constant initialization out of the loop
> >>>>> into a separate basic block, the constant store value becomes unknown
> >>>>> within the original basic block.  When recording store for RTL DSE, check
> >>>>> if the source register is set only once to a constant by a non-partial
> >>>>> unconditional load.  If yes, record the constant as the constant store
> >>>>> source.  It eliminates unrolled zero stores after memset 0 in a loop
> >>>>> where a vector register is used as the zero store source.
> >>>>>
> >>>>> gcc/
> >>>>>
> >>>>>        PR rtl-optimization/105638
> >>>>>        * dse.cc (record_store): Use the constant source if the source
> >>>>>        register is set only once.
> >>>>>
> >>>>> gcc/testsuite/
> >>>>>
> >>>>>        PR rtl-optimization/105638
> >>>>>        * g++.target/i386/pr105638.C: New test.
> >>>>> ---
> >>>>>   gcc/dse.cc                               | 22 ++++++++++++
> >>>>>   gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
> >>>>>   2 files changed, 66 insertions(+)
> >>>>>   create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >>>>>
> >>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc
> >>>>> index 30c11cee034..af8e88dac32 100644
> >>>>> --- a/gcc/dse.cc
> >>>>> +++ b/gcc/dse.cc
> >>>>> @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
> >>>>>
> >>>>>          if (tem && CONSTANT_P (tem))
> >>>>>            const_rhs = tem;
> >>>>> +       else
> >>>>> +         {
> >>>>> +           /* If RHS is set only once to a constant, set CONST_RHS
> >>>>> +              to the constant.  */
> >>>>> +           df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> >>>>> +           if (def != nullptr
> >>>>> +               && !DF_REF_IS_ARTIFICIAL (def)
> >>>>> +               && !(DF_REF_FLAGS (def)
> >>>>> +                    & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
> >>>>> +               && !DF_REF_NEXT_REG (def))
> >>>> Can we introduce a helper for this?  There are already similar tests
> >>>> in ira and loop-iv, and it seems a bit too complex to have to open-code
> >>>> each time.
> >>> I can use find_single_def_src in loop-iv.cc:
> >>>
> >>> /* If REGNO has a single definition, return its known value, otherwise return
> >>>     null.  */
> >>>
> >>> rtx
> >>> find_single_def_src (unsigned int regno)
> >> Yeah, reusing that sounds good.  Perhaps we should move it into df-core.cc,
> >> alongside the df_reg_used group of functions.
> >>
> >> I think the mode check in your original patch should be kept though,
> >> so how about we change the parameter to an rtx reg and use rtx_equal in:
> >>
> >>        rtx set = single_set (DF_REF_INSN (adef));
> >>        if (set == NULL || !REG_P (SET_DEST (set))
> >>        || REGNO (SET_DEST (set)) != regno)
> >>      return NULL_RTX;
> > Fixed.
> >
> >> rather than the existing !REG_P and REGNO checks.  We should also add:
> >>
> >>> and do
> >>>
> >>>               /* If RHS is set only once to a constant, set CONST_RHS
> >>>                   to the constant.  */
> >>>                rtx def_src = find_single_def_src (REGNO (rhs));
> >>>                if (def_src != nullptr && CONSTANT_P (def_src))
> >>>                  {
> >>>                    df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> >>>                    if (!(DF_REF_FLAGS (def)
> >>>                          & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
> >> …this check to the function, since it's needed for correctness even
> >> in the loop-iv.cc use.
> > Fixed.
> >
> >> Thanks,
> >> Richard
> > Here is the v3 patch.  OK for master?
> >
> > Thanks.
> >
> > H.J.
> > ---
> > RTL DSE tracks redundant constant stores within a basic block.  When RTL
> > loop invariant motion hoists a constant initialization out of the loop
> > into a separate basic block, the constant store value becomes unknown
> > within the original basic block.  When recording store for RTL DSE, check
> > if the source register is set only once to a constant by a non-partial
> > unconditional load.  If yes, record the constant as the constant store
> > source.  It eliminates unrolled zero stores after memset 0 in a loop
> > where a vector register is used as the zero store source.
> >
> > Extract find_single_def_src from loop-iv.cc and move it to df-core.cc:
> >
> > 1. Rename to df_find_single_def_src.
> > 2. Change the argument to rtx and use rtx_equal_p.
> > 3. Return null for partial or conditional defs.
> >
> > gcc/
> >
> >       PR rtl-optimization/105638
> >       * df-core.cc (df_find_single_def_sr): Moved and renamed from
> >       find_single_def_src in loop-iv.cc.  Change the argument to rtx
> >       and use rtx_equal_p.  Return null for partial or conditional
> >       defs.
> >       * df.h (df_find_single_def_src): New prototype.
> >       * dse.cc (record_store): Use the constant source if the source
> >       register is set only once.
> >       * loop-iv.cc (find_single_def_src): Moved to df-core.cc.
> >       (replace_single_def_regs): Replace find_single_def_src with
> >       df_find_single_def_src.
> >
> > gcc/testsuite/
> >
> >       PR rtl-optimization/105638
> >       * g++.target/i386/pr105638.C: New test.
> Avoiding REG_EQUAL and only handling REG_EQUIV notes would be better
> here.  REG_EQUIV indicates the destination could be replaced with the
> source of the note at any point and semantically the code would still be
> valid.  REG_EQUAL doesn't give us that guarantee.
>
> To allow REG_EQUAL you have to check that the block with the note
> dominates the use.

When a use only has one non-conditional def which doesn't dominate
the use, isn't its behavior undefined?

-- 
H.J.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] DSE: Use the constant store source if possible
  2022-05-29 21:43               ` H.J. Lu
@ 2022-05-29 22:55                 ` Jeff Law
  2022-05-30  8:28                   ` Richard Sandiford
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff Law @ 2022-05-29 22:55 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches



On 5/29/2022 3:43 PM, H.J. Lu wrote:
> On Sat, May 28, 2022 at 11:37 AM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>>
>> On 5/26/2022 2:43 PM, H.J. Lu via Gcc-patches wrote:
>>> On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote:
>>>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>>>> On Wed, May 25, 2022 at 12:30 AM Richard Sandiford
>>>>> <richard.sandiford@arm.com> wrote:
>>>>>> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>>>>>>> On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
>>>>>>>> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
>>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>>>>> When recording store for RTL dead store elimination, check if the source
>>>>>>>>> register is set only once to a constant.  If yes, record the constant
>>>>>>>>> as the store source.  It eliminates unrolled zero stores after memset 0
>>>>>>>>> in a loop where a vector register is used as the zero store source.
>>>>>>>>>
>>>>>>>>> gcc/
>>>>>>>>>
>>>>>>>>>           PR rtl-optimization/105638
>>>>>>>>>           * dse.cc (record_store): Use the constant source if the source
>>>>>>>>>           register is set only once.
>>>>>>>>>
>>>>>>>>> gcc/testsuite/
>>>>>>>>>
>>>>>>>>>           PR rtl-optimization/105638
>>>>>>>>>           * g++.target/i386/pr105638.C: New test.
>>>>>>>>> ---
>>>>>>>>>    gcc/dse.cc                               | 19 ++++++++++
>>>>>>>>>    gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>>>>>>>>>    2 files changed, 63 insertions(+)
>>>>>>>>>    create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>>>>>>>>>
>>>>>>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc
>>>>>>>>> index 30c11cee034..0433dd3d846 100644
>>>>>>>>> --- a/gcc/dse.cc
>>>>>>>>> +++ b/gcc/dse.cc
>>>>>>>>> @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
>>>>>>>>>
>>>>>>>>>             if (tem && CONSTANT_P (tem))
>>>>>>>>>               const_rhs = tem;
>>>>>>>>> +         else
>>>>>>>>> +           {
>>>>>>>>> +             /* If RHS is set only once to a constant, set CONST_RHS
>>>>>>>>> +                to the constant.  */
>>>>>>>>> +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>>>>>>>>> +             if (def != nullptr
>>>>>>>>> +                 && !DF_REF_IS_ARTIFICIAL (def)
>>>>>>>>> +                 && !DF_REF_NEXT_REG (def))
>>>>>>>>> +               {
>>>>>>>>> +                 rtx_insn *def_insn = DF_REF_INSN (def);
>>>>>>>>> +                 rtx def_body = PATTERN (def_insn);
>>>>>>>>> +                 if (GET_CODE (def_body) == SET)
>>>>>>>>> +                   {
>>>>>>>>> +                     rtx def_src = SET_SRC (def_body);
>>>>>>>>> +                     if (CONSTANT_P (def_src))
>>>>>>>>> +                       const_rhs = def_src;
>>>>>>>> doesn't DSE have its own tracking of stored values?  Shouldn't we
>>>>>>> It tracks stored values only within the basic block.  When RTL loop
>>>>>>> invariant motion hoists a constant initialization out of the loop into
>>>>>>> a separate basic block, the constant store value becomes unknown
>>>>>>> within the original basic block.
>>>>>>>
>>>>>>>> improve _that_ if it is not enough?  I also wonder if you need to
>>>>>>> My patch extends DSE stored value tracking to include the constant which
>>>>>>> is set only once in another basic block.
>>>>>>>
>>>>>>>> verify the SET isn't partial?
>>>>>>>>
>>>>>>> Here is the v2 patch to check that the constant is set by a non-partial
>>>>>>> unconditional load.
>>>>>>>
>>>>>>> OK for master?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> H.J.
>>>>>>> ---
>>>>>>> RTL DSE tracks redundant constant stores within a basic block.  When RTL
>>>>>>> loop invariant motion hoists a constant initialization out of the loop
>>>>>>> into a separate basic block, the constant store value becomes unknown
>>>>>>> within the original basic block.  When recording store for RTL DSE, check
>>>>>>> if the source register is set only once to a constant by a non-partial
>>>>>>> unconditional load.  If yes, record the constant as the constant store
>>>>>>> source.  It eliminates unrolled zero stores after memset 0 in a loop
>>>>>>> where a vector register is used as the zero store source.
>>>>>>>
>>>>>>> gcc/
>>>>>>>
>>>>>>>         PR rtl-optimization/105638
>>>>>>>         * dse.cc (record_store): Use the constant source if the source
>>>>>>>         register is set only once.
>>>>>>>
>>>>>>> gcc/testsuite/
>>>>>>>
>>>>>>>         PR rtl-optimization/105638
>>>>>>>         * g++.target/i386/pr105638.C: New test.
>>>>>>> ---
>>>>>>>    gcc/dse.cc                               | 22 ++++++++++++
>>>>>>>    gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>>>>>>>    2 files changed, 66 insertions(+)
>>>>>>>    create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>>>>>>>
>>>>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc
>>>>>>> index 30c11cee034..af8e88dac32 100644
>>>>>>> --- a/gcc/dse.cc
>>>>>>> +++ b/gcc/dse.cc
>>>>>>> @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
>>>>>>>
>>>>>>>           if (tem && CONSTANT_P (tem))
>>>>>>>             const_rhs = tem;
>>>>>>> +       else
>>>>>>> +         {
>>>>>>> +           /* If RHS is set only once to a constant, set CONST_RHS
>>>>>>> +              to the constant.  */
>>>>>>> +           df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>>>>>>> +           if (def != nullptr
>>>>>>> +               && !DF_REF_IS_ARTIFICIAL (def)
>>>>>>> +               && !(DF_REF_FLAGS (def)
>>>>>>> +                    & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
>>>>>>> +               && !DF_REF_NEXT_REG (def))
>>>>>> Can we introduce a helper for this?  There are already similar tests
>>>>>> in ira and loop-iv, and it seems a bit too complex to have to open-code
>>>>>> each time.
>>>>> I can use find_single_def_src in loop-iv.cc:
>>>>>
>>>>> /* If REGNO has a single definition, return its known value, otherwise return
>>>>>      null.  */
>>>>>
>>>>> rtx
>>>>> find_single_def_src (unsigned int regno)
>>>> Yeah, reusing that sounds good.  Perhaps we should move it into df-core.cc,
>>>> alongside the df_reg_used group of functions.
>>>>
>>>> I think the mode check in your original patch should be kept though,
>>>> so how about we change the parameter to an rtx reg and use rtx_equal in:
>>>>
>>>>         rtx set = single_set (DF_REF_INSN (adef));
>>>>         if (set == NULL || !REG_P (SET_DEST (set))
>>>>         || REGNO (SET_DEST (set)) != regno)
>>>>       return NULL_RTX;
>>> Fixed.
>>>
>>>> rather than the existing !REG_P and REGNO checks.  We should also add:
>>>>
>>>>> and do
>>>>>
>>>>>                /* If RHS is set only once to a constant, set CONST_RHS
>>>>>                    to the constant.  */
>>>>>                 rtx def_src = find_single_def_src (REGNO (rhs));
>>>>>                 if (def_src != nullptr && CONSTANT_P (def_src))
>>>>>                   {
>>>>>                     df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>>>>>                     if (!(DF_REF_FLAGS (def)
>>>>>                           & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
>>>> …this check to the function, since it's needed for correctness even
>>>> in the loop-iv.cc use.
>>> Fixed.
>>>
>>>> Thanks,
>>>> Richard
>>> Here is the v3 patch.  OK for master?
>>>
>>> Thanks.
>>>
>>> H.J.
>>> ---
>>> RTL DSE tracks redundant constant stores within a basic block.  When RTL
>>> loop invariant motion hoists a constant initialization out of the loop
>>> into a separate basic block, the constant store value becomes unknown
>>> within the original basic block.  When recording store for RTL DSE, check
>>> if the source register is set only once to a constant by a non-partial
>>> unconditional load.  If yes, record the constant as the constant store
>>> source.  It eliminates unrolled zero stores after memset 0 in a loop
>>> where a vector register is used as the zero store source.
>>>
>>> Extract find_single_def_src from loop-iv.cc and move it to df-core.cc:
>>>
>>> 1. Rename to df_find_single_def_src.
>>> 2. Change the argument to rtx and use rtx_equal_p.
>>> 3. Return null for partial or conditional defs.
>>>
>>> gcc/
>>>
>>>        PR rtl-optimization/105638
>>>        * df-core.cc (df_find_single_def_sr): Moved and renamed from
>>>        find_single_def_src in loop-iv.cc.  Change the argument to rtx
>>>        and use rtx_equal_p.  Return null for partial or conditional
>>>        defs.
>>>        * df.h (df_find_single_def_src): New prototype.
>>>        * dse.cc (record_store): Use the constant source if the source
>>>        register is set only once.
>>>        * loop-iv.cc (find_single_def_src): Moved to df-core.cc.
>>>        (replace_single_def_regs): Replace find_single_def_src with
>>>        df_find_single_def_src.
>>>
>>> gcc/testsuite/
>>>
>>>        PR rtl-optimization/105638
>>>        * g++.target/i386/pr105638.C: New test.
>> Avoiding REG_EQUAL and only handling REG_EQUIV notes would be better
>> here.  REG_EQUIV indicates the destination could be replaced with the
>> source of the note at any point and semantically the code would still be
>> valid.  REG_EQUAL doesn't give us that guarantee.
>>
>> To allow REG_EQUAL you have to check that the block with the note
>> dominates the use.
> When a use only has one non-conditional def which doesn't dominate
> the use, isn't its behavior undefined?
I think so, even for an irreducible loop.  Even so, it's the safest 
thing to do, particularly if someone tries to extend this code in the 
future.

jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] DSE: Use the constant store source if possible
  2022-05-29 22:55                 ` Jeff Law
@ 2022-05-30  8:28                   ` Richard Sandiford
  2022-05-30 22:58                     ` Jeff Law
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Sandiford @ 2022-05-30  8:28 UTC (permalink / raw)
  To: Jeff Law via Gcc-patches

Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On 5/29/2022 3:43 PM, H.J. Lu wrote:
>> On Sat, May 28, 2022 at 11:37 AM Jeff Law via Gcc-patches
>> <gcc-patches@gcc.gnu.org> wrote:
>>>
>>>
>>> On 5/26/2022 2:43 PM, H.J. Lu via Gcc-patches wrote:
>>>> On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote:
>>>>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>>>>> On Wed, May 25, 2022 at 12:30 AM Richard Sandiford
>>>>>> <richard.sandiford@arm.com> wrote:
>>>>>>> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>>>>>>>> On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
>>>>>>>>> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
>>>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>>>>>> When recording store for RTL dead store elimination, check if the source
>>>>>>>>>> register is set only once to a constant.  If yes, record the constant
>>>>>>>>>> as the store source.  It eliminates unrolled zero stores after memset 0
>>>>>>>>>> in a loop where a vector register is used as the zero store source.
>>>>>>>>>>
>>>>>>>>>> gcc/
>>>>>>>>>>
>>>>>>>>>>           PR rtl-optimization/105638
>>>>>>>>>>           * dse.cc (record_store): Use the constant source if the source
>>>>>>>>>>           register is set only once.
>>>>>>>>>>
>>>>>>>>>> gcc/testsuite/
>>>>>>>>>>
>>>>>>>>>>           PR rtl-optimization/105638
>>>>>>>>>>           * g++.target/i386/pr105638.C: New test.
>>>>>>>>>> ---
>>>>>>>>>>    gcc/dse.cc                               | 19 ++++++++++
>>>>>>>>>>    gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>>>>>>>>>>    2 files changed, 63 insertions(+)
>>>>>>>>>>    create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>>>>>>>>>>
>>>>>>>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc
>>>>>>>>>> index 30c11cee034..0433dd3d846 100644
>>>>>>>>>> --- a/gcc/dse.cc
>>>>>>>>>> +++ b/gcc/dse.cc
>>>>>>>>>> @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
>>>>>>>>>>
>>>>>>>>>>             if (tem && CONSTANT_P (tem))
>>>>>>>>>>               const_rhs = tem;
>>>>>>>>>> +         else
>>>>>>>>>> +           {
>>>>>>>>>> +             /* If RHS is set only once to a constant, set CONST_RHS
>>>>>>>>>> +                to the constant.  */
>>>>>>>>>> +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>>>>>>>>>> +             if (def != nullptr
>>>>>>>>>> +                 && !DF_REF_IS_ARTIFICIAL (def)
>>>>>>>>>> +                 && !DF_REF_NEXT_REG (def))
>>>>>>>>>> +               {
>>>>>>>>>> +                 rtx_insn *def_insn = DF_REF_INSN (def);
>>>>>>>>>> +                 rtx def_body = PATTERN (def_insn);
>>>>>>>>>> +                 if (GET_CODE (def_body) == SET)
>>>>>>>>>> +                   {
>>>>>>>>>> +                     rtx def_src = SET_SRC (def_body);
>>>>>>>>>> +                     if (CONSTANT_P (def_src))
>>>>>>>>>> +                       const_rhs = def_src;
>>>>>>>>> doesn't DSE have its own tracking of stored values?  Shouldn't we
>>>>>>>> It tracks stored values only within the basic block.  When RTL loop
>>>>>>>> invariant motion hoists a constant initialization out of the loop into
>>>>>>>> a separate basic block, the constant store value becomes unknown
>>>>>>>> within the original basic block.
>>>>>>>>
>>>>>>>>> improve _that_ if it is not enough?  I also wonder if you need to
>>>>>>>> My patch extends DSE stored value tracking to include the constant which
>>>>>>>> is set only once in another basic block.
>>>>>>>>
>>>>>>>>> verify the SET isn't partial?
>>>>>>>>>
>>>>>>>> Here is the v2 patch to check that the constant is set by a non-partial
>>>>>>>> unconditional load.
>>>>>>>>
>>>>>>>> OK for master?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> H.J.
>>>>>>>> ---
>>>>>>>> RTL DSE tracks redundant constant stores within a basic block.  When RTL
>>>>>>>> loop invariant motion hoists a constant initialization out of the loop
>>>>>>>> into a separate basic block, the constant store value becomes unknown
>>>>>>>> within the original basic block.  When recording store for RTL DSE, check
>>>>>>>> if the source register is set only once to a constant by a non-partial
>>>>>>>> unconditional load.  If yes, record the constant as the constant store
>>>>>>>> source.  It eliminates unrolled zero stores after memset 0 in a loop
>>>>>>>> where a vector register is used as the zero store source.
>>>>>>>>
>>>>>>>> gcc/
>>>>>>>>
>>>>>>>>         PR rtl-optimization/105638
>>>>>>>>         * dse.cc (record_store): Use the constant source if the source
>>>>>>>>         register is set only once.
>>>>>>>>
>>>>>>>> gcc/testsuite/
>>>>>>>>
>>>>>>>>         PR rtl-optimization/105638
>>>>>>>>         * g++.target/i386/pr105638.C: New test.
>>>>>>>> ---
>>>>>>>>    gcc/dse.cc                               | 22 ++++++++++++
>>>>>>>>    gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>>>>>>>>    2 files changed, 66 insertions(+)
>>>>>>>>    create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>>>>>>>>
>>>>>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc
>>>>>>>> index 30c11cee034..af8e88dac32 100644
>>>>>>>> --- a/gcc/dse.cc
>>>>>>>> +++ b/gcc/dse.cc
>>>>>>>> @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
>>>>>>>>
>>>>>>>>           if (tem && CONSTANT_P (tem))
>>>>>>>>             const_rhs = tem;
>>>>>>>> +       else
>>>>>>>> +         {
>>>>>>>> +           /* If RHS is set only once to a constant, set CONST_RHS
>>>>>>>> +              to the constant.  */
>>>>>>>> +           df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>>>>>>>> +           if (def != nullptr
>>>>>>>> +               && !DF_REF_IS_ARTIFICIAL (def)
>>>>>>>> +               && !(DF_REF_FLAGS (def)
>>>>>>>> +                    & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
>>>>>>>> +               && !DF_REF_NEXT_REG (def))
>>>>>>> Can we introduce a helper for this?  There are already similar tests
>>>>>>> in ira and loop-iv, and it seems a bit too complex to have to open-code
>>>>>>> each time.
>>>>>> I can use find_single_def_src in loop-iv.cc:
>>>>>>
>>>>>> /* If REGNO has a single definition, return its known value, otherwise return
>>>>>>      null.  */
>>>>>>
>>>>>> rtx
>>>>>> find_single_def_src (unsigned int regno)
>>>>> Yeah, reusing that sounds good.  Perhaps we should move it into df-core.cc,
>>>>> alongside the df_reg_used group of functions.
>>>>>
>>>>> I think the mode check in your original patch should be kept though,
>>>>> so how about we change the parameter to an rtx reg and use rtx_equal in:
>>>>>
>>>>>         rtx set = single_set (DF_REF_INSN (adef));
>>>>>         if (set == NULL || !REG_P (SET_DEST (set))
>>>>>         || REGNO (SET_DEST (set)) != regno)
>>>>>       return NULL_RTX;
>>>> Fixed.
>>>>
>>>>> rather than the existing !REG_P and REGNO checks.  We should also add:
>>>>>
>>>>>> and do
>>>>>>
>>>>>>                /* If RHS is set only once to a constant, set CONST_RHS
>>>>>>                    to the constant.  */
>>>>>>                 rtx def_src = find_single_def_src (REGNO (rhs));
>>>>>>                 if (def_src != nullptr && CONSTANT_P (def_src))
>>>>>>                   {
>>>>>>                     df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>>>>>>                     if (!(DF_REF_FLAGS (def)
>>>>>>                           & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
>>>>> …this check to the function, since it's needed for correctness even
>>>>> in the loop-iv.cc use.
>>>> Fixed.
>>>>
>>>>> Thanks,
>>>>> Richard
>>>> Here is the v3 patch.  OK for master?
>>>>
>>>> Thanks.
>>>>
>>>> H.J.
>>>> ---
>>>> RTL DSE tracks redundant constant stores within a basic block.  When RTL
>>>> loop invariant motion hoists a constant initialization out of the loop
>>>> into a separate basic block, the constant store value becomes unknown
>>>> within the original basic block.  When recording store for RTL DSE, check
>>>> if the source register is set only once to a constant by a non-partial
>>>> unconditional load.  If yes, record the constant as the constant store
>>>> source.  It eliminates unrolled zero stores after memset 0 in a loop
>>>> where a vector register is used as the zero store source.
>>>>
>>>> Extract find_single_def_src from loop-iv.cc and move it to df-core.cc:
>>>>
>>>> 1. Rename to df_find_single_def_src.
>>>> 2. Change the argument to rtx and use rtx_equal_p.
>>>> 3. Return null for partial or conditional defs.
>>>>
>>>> gcc/
>>>>
>>>>        PR rtl-optimization/105638
>>>>        * df-core.cc (df_find_single_def_sr): Moved and renamed from
>>>>        find_single_def_src in loop-iv.cc.  Change the argument to rtx
>>>>        and use rtx_equal_p.  Return null for partial or conditional
>>>>        defs.
>>>>        * df.h (df_find_single_def_src): New prototype.
>>>>        * dse.cc (record_store): Use the constant source if the source
>>>>        register is set only once.
>>>>        * loop-iv.cc (find_single_def_src): Moved to df-core.cc.
>>>>        (replace_single_def_regs): Replace find_single_def_src with
>>>>        df_find_single_def_src.
>>>>
>>>> gcc/testsuite/
>>>>
>>>>        PR rtl-optimization/105638
>>>>        * g++.target/i386/pr105638.C: New test.
>>> Avoiding REG_EQUAL and only handling REG_EQUIV notes would be better
>>> here.  REG_EQUIV indicates the destination could be replaced with the
>>> source of the note at any point and semantically the code would still be
>>> valid.  REG_EQUAL doesn't give us that guarantee.
>>>
>>> To allow REG_EQUAL you have to check that the block with the note
>>> dominates the use.
>> When a use only has one non-conditional def which doesn't dominate
>> the use, isn't its behavior undefined?
> I think so, even for an irreducible loop.  Even so, it's the safest 
> thing to do, particularly if someone tries to extend this code in the 
> future.

If any use of R was upwards exposed, R would have an artificial
definition in the entry block in addition to any “real” definition.
Since we're checking for exactly one definition in total, it should
follow that the definition dominates all uses.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] DSE: Use the constant store source if possible
  2022-05-26 20:43           ` [PATCH v3] " H.J. Lu
  2022-05-28 18:37             ` Jeff Law
@ 2022-05-30  8:35             ` Richard Sandiford
  2022-05-31 17:12               ` [PATCH v4] " H.J. Lu
  1 sibling, 1 reply; 21+ messages in thread
From: Richard Sandiford @ 2022-05-30  8:35 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Gcc-patches, Richard Biener

"H.J. Lu" <hjl.tools@gmail.com> writes:
> On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote:
>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>> > On Wed, May 25, 2022 at 12:30 AM Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> >>
>> >> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>> >> > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
>> >> >> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
>> >> >> <gcc-patches@gcc.gnu.org> wrote:
>> >> >> >
>> >> >> > When recording store for RTL dead store elimination, check if the source
>> >> >> > register is set only once to a constant.  If yes, record the constant
>> >> >> > as the store source.  It eliminates unrolled zero stores after memset 0
>> >> >> > in a loop where a vector register is used as the zero store source.
>> >> >> >
>> >> >> > gcc/
>> >> >> >
>> >> >> >         PR rtl-optimization/105638
>> >> >> >         * dse.cc (record_store): Use the constant source if the source
>> >> >> >         register is set only once.
>> >> >> >
>> >> >> > gcc/testsuite/
>> >> >> >
>> >> >> >         PR rtl-optimization/105638
>> >> >> >         * g++.target/i386/pr105638.C: New test.
>> >> >> > ---
>> >> >> >  gcc/dse.cc                               | 19 ++++++++++
>> >> >> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>> >> >> >  2 files changed, 63 insertions(+)
>> >> >> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>> >> >> >
>> >> >> > diff --git a/gcc/dse.cc b/gcc/dse.cc
>> >> >> > index 30c11cee034..0433dd3d846 100644
>> >> >> > --- a/gcc/dse.cc
>> >> >> > +++ b/gcc/dse.cc
>> >> >> > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
>> >> >> >
>> >> >> >           if (tem && CONSTANT_P (tem))
>> >> >> >             const_rhs = tem;
>> >> >> > +         else
>> >> >> > +           {
>> >> >> > +             /* If RHS is set only once to a constant, set CONST_RHS
>> >> >> > +                to the constant.  */
>> >> >> > +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>> >> >> > +             if (def != nullptr
>> >> >> > +                 && !DF_REF_IS_ARTIFICIAL (def)
>> >> >> > +                 && !DF_REF_NEXT_REG (def))
>> >> >> > +               {
>> >> >> > +                 rtx_insn *def_insn = DF_REF_INSN (def);
>> >> >> > +                 rtx def_body = PATTERN (def_insn);
>> >> >> > +                 if (GET_CODE (def_body) == SET)
>> >> >> > +                   {
>> >> >> > +                     rtx def_src = SET_SRC (def_body);
>> >> >> > +                     if (CONSTANT_P (def_src))
>> >> >> > +                       const_rhs = def_src;
>> >> >>
>> >> >> doesn't DSE have its own tracking of stored values?  Shouldn't we
>> >> >
>> >> > It tracks stored values only within the basic block.  When RTL loop
>> >> > invariant motion hoists a constant initialization out of the loop into
>> >> > a separate basic block, the constant store value becomes unknown
>> >> > within the original basic block.
>> >> >
>> >> >> improve _that_ if it is not enough?  I also wonder if you need to
>> >> >
>> >> > My patch extends DSE stored value tracking to include the constant which
>> >> > is set only once in another basic block.
>> >> >
>> >> >> verify the SET isn't partial?
>> >> >>
>> >> >
>> >> > Here is the v2 patch to check that the constant is set by a non-partial
>> >> > unconditional load.
>> >> >
>> >> > OK for master?
>> >> >
>> >> > Thanks.
>> >> >
>> >> > H.J.
>> >> > ---
>> >> > RTL DSE tracks redundant constant stores within a basic block.  When RTL
>> >> > loop invariant motion hoists a constant initialization out of the loop
>> >> > into a separate basic block, the constant store value becomes unknown
>> >> > within the original basic block.  When recording store for RTL DSE, check
>> >> > if the source register is set only once to a constant by a non-partial
>> >> > unconditional load.  If yes, record the constant as the constant store
>> >> > source.  It eliminates unrolled zero stores after memset 0 in a loop
>> >> > where a vector register is used as the zero store source.
>> >> >
>> >> > gcc/
>> >> >
>> >> >       PR rtl-optimization/105638
>> >> >       * dse.cc (record_store): Use the constant source if the source
>> >> >       register is set only once.
>> >> >
>> >> > gcc/testsuite/
>> >> >
>> >> >       PR rtl-optimization/105638
>> >> >       * g++.target/i386/pr105638.C: New test.
>> >> > ---
>> >> >  gcc/dse.cc                               | 22 ++++++++++++
>> >> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>> >> >  2 files changed, 66 insertions(+)
>> >> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>> >> >
>> >> > diff --git a/gcc/dse.cc b/gcc/dse.cc
>> >> > index 30c11cee034..af8e88dac32 100644
>> >> > --- a/gcc/dse.cc
>> >> > +++ b/gcc/dse.cc
>> >> > @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
>> >> >
>> >> >         if (tem && CONSTANT_P (tem))
>> >> >           const_rhs = tem;
>> >> > +       else
>> >> > +         {
>> >> > +           /* If RHS is set only once to a constant, set CONST_RHS
>> >> > +              to the constant.  */
>> >> > +           df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>> >> > +           if (def != nullptr
>> >> > +               && !DF_REF_IS_ARTIFICIAL (def)
>> >> > +               && !(DF_REF_FLAGS (def)
>> >> > +                    & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
>> >> > +               && !DF_REF_NEXT_REG (def))
>> >>
>> >> Can we introduce a helper for this?  There are already similar tests
>> >> in ira and loop-iv, and it seems a bit too complex to have to open-code
>> >> each time.
>> >
>> > I can use find_single_def_src in loop-iv.cc:
>> >
>> > /* If REGNO has a single definition, return its known value, otherwise return
>> >    null.  */
>> >
>> > rtx
>> > find_single_def_src (unsigned int regno)
>> 
>> Yeah, reusing that sounds good.  Perhaps we should move it into df-core.cc,
>> alongside the df_reg_used group of functions.
>> 
>> I think the mode check in your original patch should be kept though,
>> so how about we change the parameter to an rtx reg and use rtx_equal in:
>> 
>>       rtx set = single_set (DF_REF_INSN (adef));
>>       if (set == NULL || !REG_P (SET_DEST (set))
>> 	  || REGNO (SET_DEST (set)) != regno)
>> 	return NULL_RTX;
>
> Fixed.
>
>> rather than the existing !REG_P and REGNO checks.  We should also add:
>> 
>> >
>> > and do
>> >
>> >              /* If RHS is set only once to a constant, set CONST_RHS
>> >                  to the constant.  */
>> >               rtx def_src = find_single_def_src (REGNO (rhs));
>> >               if (def_src != nullptr && CONSTANT_P (def_src))
>> >                 {
>> >                   df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>> >                   if (!(DF_REF_FLAGS (def)
>> >                         & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
>> 
>> …this check to the function, since it's needed for correctness even
>> in the loop-iv.cc use.
>
> Fixed.
>
>> 
>> Thanks,
>> Richard
>
> Here is the v3 patch.  OK for master?
>
> Thanks.
>
> H.J.
> ---
> RTL DSE tracks redundant constant stores within a basic block.  When RTL
> loop invariant motion hoists a constant initialization out of the loop
> into a separate basic block, the constant store value becomes unknown
> within the original basic block.  When recording store for RTL DSE, check
> if the source register is set only once to a constant by a non-partial
> unconditional load.  If yes, record the constant as the constant store
> source.  It eliminates unrolled zero stores after memset 0 in a loop
> where a vector register is used as the zero store source.
>
> Extract find_single_def_src from loop-iv.cc and move it to df-core.cc:
>
> 1. Rename to df_find_single_def_src.
> 2. Change the argument to rtx and use rtx_equal_p.
> 3. Return null for partial or conditional defs.
>
> gcc/
>
> 	PR rtl-optimization/105638
> 	* df-core.cc (df_find_single_def_sr): Moved and renamed from
> 	find_single_def_src in loop-iv.cc.  Change the argument to rtx
> 	and use rtx_equal_p.  Return null for partial or conditional
> 	defs.
> 	* df.h (df_find_single_def_src): New prototype.
> 	* dse.cc (record_store): Use the constant source if the source
> 	register is set only once.
> 	* loop-iv.cc (find_single_def_src): Moved to df-core.cc.
> 	(replace_single_def_regs): Replace find_single_def_src with
> 	df_find_single_def_src.
>
> gcc/testsuite/
>
> 	PR rtl-optimization/105638
> 	* g++.target/i386/pr105638.C: New test.
> ---
>  gcc/df-core.cc                           | 44 +++++++++++++++++++++++
>  gcc/df.h                                 |  1 +
>  gcc/dse.cc                               | 14 ++++++++
>  gcc/loop-iv.cc                           | 45 +-----------------------
>  gcc/testsuite/g++.target/i386/pr105638.C | 44 +++++++++++++++++++++++
>  5 files changed, 104 insertions(+), 44 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>
> diff --git a/gcc/df-core.cc b/gcc/df-core.cc
> index a901b84878f..f9b4de8eb7a 100644
> --- a/gcc/df-core.cc
> +++ b/gcc/df-core.cc
> @@ -2009,6 +2009,50 @@ df_reg_used (rtx_insn *insn, rtx reg)
>    return df_find_use (insn, reg) != NULL;
>  }
>  
> +/* If REG has a single definition, return its known value, otherwise return
> +   null.  */
> +
> +rtx
> +df_find_single_def_src (rtx reg)
> +{
> +  rtx src = NULL_RTX;
> +
> +  /* Don't look through unbounded number of single definition REG copies,
> +     there might be loops for sources with uninitialized variables.  */
> +  for (int cnt = 0; cnt < 128; cnt++)
> +    {
> +      df_ref adef = DF_REG_DEF_CHAIN (REGNO (reg));
> +      if (adef == NULL || DF_REF_NEXT_REG (adef) != NULL
> +	  || DF_REF_IS_ARTIFICIAL (adef)
> +	  || (DF_REF_FLAGS (adef)
> +	      & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
> +	return NULL_RTX;
> +
> +      rtx set = single_set (DF_REF_INSN (adef));
> +      if (set == NULL || !rtx_equal_p (SET_DEST (set), reg))
> +	return NULL_RTX;
> +
> +      rtx note = find_reg_equal_equiv_note (DF_REF_INSN (adef));
> +      if (note && function_invariant_p (XEXP (note, 0)))
> +	{
> +	  src = XEXP (note, 0);
> +	  break;
> +	}

Seems simpler to return this directly, rather than break and then
check function_invariant_p again.

> +      src = SET_SRC (set);
> +
> +      if (REG_P (src))
> +	{
> +	  reg = src;
> +	  continue;
> +	}
> +      break;
> +    }
> +  if (!function_invariant_p (src))
> +    return NULL_RTX;
> +
> +  return src;
> +}
> +
>  \f
>  /*----------------------------------------------------------------------------
>     Debugging and printing functions.
> diff --git a/gcc/df.h b/gcc/df.h
> index bd329205d08..71e249ad20a 100644
> --- a/gcc/df.h
> +++ b/gcc/df.h
> @@ -991,6 +991,7 @@ extern df_ref df_find_def (rtx_insn *, rtx);
>  extern bool df_reg_defined (rtx_insn *, rtx);
>  extern df_ref df_find_use (rtx_insn *, rtx);
>  extern bool df_reg_used (rtx_insn *, rtx);
> +extern rtx df_find_single_def_src (rtx);
>  extern void df_worklist_dataflow (struct dataflow *,bitmap, int *, int);
>  extern void df_print_regset (FILE *file, const_bitmap r);
>  extern void df_print_word_regset (FILE *file, const_bitmap r);
> diff --git a/gcc/dse.cc b/gcc/dse.cc
> index 30c11cee034..c915266f025 100644
> --- a/gcc/dse.cc
> +++ b/gcc/dse.cc
> @@ -1508,6 +1508,20 @@ record_store (rtx body, bb_info_t bb_info)
>  
>  	  if (tem && CONSTANT_P (tem))
>  	    const_rhs = tem;
> +	  else
> +	    {
> +	      /* If RHS is set only once to a constant, set CONST_RHS
> +		 to the constant.  */
> +	      rtx def_src = df_find_single_def_src (rhs);
> +	      if (def_src != nullptr && CONSTANT_P (def_src))
> +		{
> +		  df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> +		  rtx_insn *def_insn = DF_REF_INSN (def);
> +		  rtx def_body = single_set (def_insn);
> +		  if (rhs == SET_DEST (def_body))
> +		    const_rhs = def_src;
> +		}
> +	    }

This shouldn't be necessary now: we can rely on def_src being correct.

I.e. I think this can just be:

	      rtx def_src = df_find_single_def_src (rhs);
	      if (def_src != nullptr && CONSTANT_P (def_src))
   		const_rhs = def_src;

Thanks,
Richard

>  	}
>      }
>  
> diff --git a/gcc/loop-iv.cc b/gcc/loop-iv.cc
> index 0eafe7d2362..d639336445a 100644
> --- a/gcc/loop-iv.cc
> +++ b/gcc/loop-iv.cc
> @@ -1378,49 +1378,6 @@ simple_rhs_p (rtx rhs)
>      }
>  }
>  
> -/* If REGNO has a single definition, return its known value, otherwise return
> -   null.  */
> -
> -static rtx
> -find_single_def_src (unsigned int regno)
> -{
> -  rtx src = NULL_RTX;
> -
> -  /* Don't look through unbounded number of single definition REG copies,
> -     there might be loops for sources with uninitialized variables.  */
> -  for (int cnt = 0; cnt < 128; cnt++)
> -    {
> -      df_ref adef = DF_REG_DEF_CHAIN (regno);
> -      if (adef == NULL || DF_REF_NEXT_REG (adef) != NULL
> -	  || DF_REF_IS_ARTIFICIAL (adef))
> -	return NULL_RTX;
> -
> -      rtx set = single_set (DF_REF_INSN (adef));
> -      if (set == NULL || !REG_P (SET_DEST (set))
> -	  || REGNO (SET_DEST (set)) != regno)
> -	return NULL_RTX;
> -
> -      rtx note = find_reg_equal_equiv_note (DF_REF_INSN (adef));
> -      if (note && function_invariant_p (XEXP (note, 0)))
> -	{
> -	  src = XEXP (note, 0);
> -	  break;
> -	}
> -      src = SET_SRC (set);
> -
> -      if (REG_P (src))
> -	{
> -	  regno = REGNO (src);
> -	  continue;
> -	}
> -      break;
> -    }
> -  if (!function_invariant_p (src))
> -    return NULL_RTX;
> -
> -  return src;
> -}
> -
>  /* If any registers in *EXPR that have a single definition, try to replace
>     them with the known-equivalent values.  */
>  
> @@ -1433,7 +1390,7 @@ replace_single_def_regs (rtx *expr)
>      {
>        rtx x = *iter;
>        if (REG_P (x))
> -	if (rtx new_x = find_single_def_src (REGNO (x)))
> +	if (rtx new_x = df_find_single_def_src (x))
>  	  {
>  	    *expr = simplify_replace_rtx (*expr, x, new_x);
>  	    goto repeat;
> diff --git a/gcc/testsuite/g++.target/i386/pr105638.C b/gcc/testsuite/g++.target/i386/pr105638.C
> new file mode 100644
> index 00000000000..ff40a459de1
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr105638.C
> @@ -0,0 +1,44 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-std=gnu++20 -O2 -march=skylake" } */
> +/* { dg-final { scan-assembler-not "vpxor" } } */
> +
> +#include <stdint.h>
> +#include <vector>
> +#include <tr1/array>
> +
> +class FastBoard {
> +public:
> +    typedef std::pair<int, int> movescore_t;
> +    typedef std::tr1::array<movescore_t, 24> scoredlist_t;
> +    
> +protected:
> +    std::vector<int> m_critical;
> +
> +    int m_boardsize;    
> +};
> +
> +class FastState {
> +public:        
> +    FastBoard board;
> +    
> +    int movenum;              
> +protected:
> +    FastBoard::scoredlist_t scoredmoves;
> +};
> +
> +class KoState : public FastState {
> +private:         
> +    std::vector<uint64_t> ko_hash_history;   
> +    std::vector<uint64_t> hash_history;     
> +};
> +
> +class GameState : public KoState {
> +public:                    
> +    void foo ();      
> +private:
> +    std::vector<KoState> game_history;                          
> +};
> +
> +void GameState::foo() {
> +    game_history.resize(movenum);
> +}

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] DSE: Use the constant store source if possible
  2022-05-30  8:28                   ` Richard Sandiford
@ 2022-05-30 22:58                     ` Jeff Law
  0 siblings, 0 replies; 21+ messages in thread
From: Jeff Law @ 2022-05-30 22:58 UTC (permalink / raw)
  To: Jeff Law via Gcc-patches, H.J. Lu, richard.sandiford



On 5/30/2022 2:28 AM, Richard Sandiford wrote:
> Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> On 5/29/2022 3:43 PM, H.J. Lu wrote:
>>> On Sat, May 28, 2022 at 11:37 AM Jeff Law via Gcc-patches
>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>
>>>> On 5/26/2022 2:43 PM, H.J. Lu via Gcc-patches wrote:
>>>>> On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote:
>>>>>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>>>>>> On Wed, May 25, 2022 at 12:30 AM Richard Sandiford
>>>>>>> <richard.sandiford@arm.com> wrote:
>>>>>>>> "H.J. Lu via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
>>>>>>>>> On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
>>>>>>>>>> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
>>>>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>>>>>>> When recording store for RTL dead store elimination, check if the source
>>>>>>>>>>> register is set only once to a constant.  If yes, record the constant
>>>>>>>>>>> as the store source.  It eliminates unrolled zero stores after memset 0
>>>>>>>>>>> in a loop where a vector register is used as the zero store source.
>>>>>>>>>>>
>>>>>>>>>>> gcc/
>>>>>>>>>>>
>>>>>>>>>>>            PR rtl-optimization/105638
>>>>>>>>>>>            * dse.cc (record_store): Use the constant source if the source
>>>>>>>>>>>            register is set only once.
>>>>>>>>>>>
>>>>>>>>>>> gcc/testsuite/
>>>>>>>>>>>
>>>>>>>>>>>            PR rtl-optimization/105638
>>>>>>>>>>>            * g++.target/i386/pr105638.C: New test.
>>>>>>>>>>> ---
>>>>>>>>>>>     gcc/dse.cc                               | 19 ++++++++++
>>>>>>>>>>>     gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>>>>>>>>>>>     2 files changed, 63 insertions(+)
>>>>>>>>>>>     create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc
>>>>>>>>>>> index 30c11cee034..0433dd3d846 100644
>>>>>>>>>>> --- a/gcc/dse.cc
>>>>>>>>>>> +++ b/gcc/dse.cc
>>>>>>>>>>> @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
>>>>>>>>>>>
>>>>>>>>>>>              if (tem && CONSTANT_P (tem))
>>>>>>>>>>>                const_rhs = tem;
>>>>>>>>>>> +         else
>>>>>>>>>>> +           {
>>>>>>>>>>> +             /* If RHS is set only once to a constant, set CONST_RHS
>>>>>>>>>>> +                to the constant.  */
>>>>>>>>>>> +             df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>>>>>>>>>>> +             if (def != nullptr
>>>>>>>>>>> +                 && !DF_REF_IS_ARTIFICIAL (def)
>>>>>>>>>>> +                 && !DF_REF_NEXT_REG (def))
>>>>>>>>>>> +               {
>>>>>>>>>>> +                 rtx_insn *def_insn = DF_REF_INSN (def);
>>>>>>>>>>> +                 rtx def_body = PATTERN (def_insn);
>>>>>>>>>>> +                 if (GET_CODE (def_body) == SET)
>>>>>>>>>>> +                   {
>>>>>>>>>>> +                     rtx def_src = SET_SRC (def_body);
>>>>>>>>>>> +                     if (CONSTANT_P (def_src))
>>>>>>>>>>> +                       const_rhs = def_src;
>>>>>>>>>> doesn't DSE have its own tracking of stored values?  Shouldn't we
>>>>>>>>> It tracks stored values only within the basic block.  When RTL loop
>>>>>>>>> invariant motion hoists a constant initialization out of the loop into
>>>>>>>>> a separate basic block, the constant store value becomes unknown
>>>>>>>>> within the original basic block.
>>>>>>>>>
>>>>>>>>>> improve _that_ if it is not enough?  I also wonder if you need to
>>>>>>>>> My patch extends DSE stored value tracking to include the constant which
>>>>>>>>> is set only once in another basic block.
>>>>>>>>>
>>>>>>>>>> verify the SET isn't partial?
>>>>>>>>>>
>>>>>>>>> Here is the v2 patch to check that the constant is set by a non-partial
>>>>>>>>> unconditional load.
>>>>>>>>>
>>>>>>>>> OK for master?
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> H.J.
>>>>>>>>> ---
>>>>>>>>> RTL DSE tracks redundant constant stores within a basic block.  When RTL
>>>>>>>>> loop invariant motion hoists a constant initialization out of the loop
>>>>>>>>> into a separate basic block, the constant store value becomes unknown
>>>>>>>>> within the original basic block.  When recording store for RTL DSE, check
>>>>>>>>> if the source register is set only once to a constant by a non-partial
>>>>>>>>> unconditional load.  If yes, record the constant as the constant store
>>>>>>>>> source.  It eliminates unrolled zero stores after memset 0 in a loop
>>>>>>>>> where a vector register is used as the zero store source.
>>>>>>>>>
>>>>>>>>> gcc/
>>>>>>>>>
>>>>>>>>>          PR rtl-optimization/105638
>>>>>>>>>          * dse.cc (record_store): Use the constant source if the source
>>>>>>>>>          register is set only once.
>>>>>>>>>
>>>>>>>>> gcc/testsuite/
>>>>>>>>>
>>>>>>>>>          PR rtl-optimization/105638
>>>>>>>>>          * g++.target/i386/pr105638.C: New test.
>>>>>>>>> ---
>>>>>>>>>     gcc/dse.cc                               | 22 ++++++++++++
>>>>>>>>>     gcc/testsuite/g++.target/i386/pr105638.C | 44 ++++++++++++++++++++++++
>>>>>>>>>     2 files changed, 66 insertions(+)
>>>>>>>>>     create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>>>>>>>>>
>>>>>>>>> diff --git a/gcc/dse.cc b/gcc/dse.cc
>>>>>>>>> index 30c11cee034..af8e88dac32 100644
>>>>>>>>> --- a/gcc/dse.cc
>>>>>>>>> +++ b/gcc/dse.cc
>>>>>>>>> @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
>>>>>>>>>
>>>>>>>>>            if (tem && CONSTANT_P (tem))
>>>>>>>>>              const_rhs = tem;
>>>>>>>>> +       else
>>>>>>>>> +         {
>>>>>>>>> +           /* If RHS is set only once to a constant, set CONST_RHS
>>>>>>>>> +              to the constant.  */
>>>>>>>>> +           df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>>>>>>>>> +           if (def != nullptr
>>>>>>>>> +               && !DF_REF_IS_ARTIFICIAL (def)
>>>>>>>>> +               && !(DF_REF_FLAGS (def)
>>>>>>>>> +                    & (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
>>>>>>>>> +               && !DF_REF_NEXT_REG (def))
>>>>>>>> Can we introduce a helper for this?  There are already similar tests
>>>>>>>> in ira and loop-iv, and it seems a bit too complex to have to open-code
>>>>>>>> each time.
>>>>>>> I can use find_single_def_src in loop-iv.cc:
>>>>>>>
>>>>>>> /* If REGNO has a single definition, return its known value, otherwise return
>>>>>>>       null.  */
>>>>>>>
>>>>>>> rtx
>>>>>>> find_single_def_src (unsigned int regno)
>>>>>> Yeah, reusing that sounds good.  Perhaps we should move it into df-core.cc,
>>>>>> alongside the df_reg_used group of functions.
>>>>>>
>>>>>> I think the mode check in your original patch should be kept though,
>>>>>> so how about we change the parameter to an rtx reg and use rtx_equal in:
>>>>>>
>>>>>>          rtx set = single_set (DF_REF_INSN (adef));
>>>>>>          if (set == NULL || !REG_P (SET_DEST (set))
>>>>>>          || REGNO (SET_DEST (set)) != regno)
>>>>>>        return NULL_RTX;
>>>>> Fixed.
>>>>>
>>>>>> rather than the existing !REG_P and REGNO checks.  We should also add:
>>>>>>
>>>>>>> and do
>>>>>>>
>>>>>>>                 /* If RHS is set only once to a constant, set CONST_RHS
>>>>>>>                     to the constant.  */
>>>>>>>                  rtx def_src = find_single_def_src (REGNO (rhs));
>>>>>>>                  if (def_src != nullptr && CONSTANT_P (def_src))
>>>>>>>                    {
>>>>>>>                      df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>>>>>>>                      if (!(DF_REF_FLAGS (def)
>>>>>>>                            & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
>>>>>> …this check to the function, since it's needed for correctness even
>>>>>> in the loop-iv.cc use.
>>>>> Fixed.
>>>>>
>>>>>> Thanks,
>>>>>> Richard
>>>>> Here is the v3 patch.  OK for master?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> H.J.
>>>>> ---
>>>>> RTL DSE tracks redundant constant stores within a basic block.  When RTL
>>>>> loop invariant motion hoists a constant initialization out of the loop
>>>>> into a separate basic block, the constant store value becomes unknown
>>>>> within the original basic block.  When recording store for RTL DSE, check
>>>>> if the source register is set only once to a constant by a non-partial
>>>>> unconditional load.  If yes, record the constant as the constant store
>>>>> source.  It eliminates unrolled zero stores after memset 0 in a loop
>>>>> where a vector register is used as the zero store source.
>>>>>
>>>>> Extract find_single_def_src from loop-iv.cc and move it to df-core.cc:
>>>>>
>>>>> 1. Rename to df_find_single_def_src.
>>>>> 2. Change the argument to rtx and use rtx_equal_p.
>>>>> 3. Return null for partial or conditional defs.
>>>>>
>>>>> gcc/
>>>>>
>>>>>         PR rtl-optimization/105638
>>>>>         * df-core.cc (df_find_single_def_sr): Moved and renamed from
>>>>>         find_single_def_src in loop-iv.cc.  Change the argument to rtx
>>>>>         and use rtx_equal_p.  Return null for partial or conditional
>>>>>         defs.
>>>>>         * df.h (df_find_single_def_src): New prototype.
>>>>>         * dse.cc (record_store): Use the constant source if the source
>>>>>         register is set only once.
>>>>>         * loop-iv.cc (find_single_def_src): Moved to df-core.cc.
>>>>>         (replace_single_def_regs): Replace find_single_def_src with
>>>>>         df_find_single_def_src.
>>>>>
>>>>> gcc/testsuite/
>>>>>
>>>>>         PR rtl-optimization/105638
>>>>>         * g++.target/i386/pr105638.C: New test.
>>>> Avoiding REG_EQUAL and only handling REG_EQUIV notes would be better
>>>> here.  REG_EQUIV indicates the destination could be replaced with the
>>>> source of the note at any point and semantically the code would still be
>>>> valid.  REG_EQUAL doesn't give us that guarantee.
>>>>
>>>> To allow REG_EQUAL you have to check that the block with the note
>>>> dominates the use.
>>> When a use only has one non-conditional def which doesn't dominate
>>> the use, isn't its behavior undefined?
>> I think so, even for an irreducible loop.  Even so, it's the safest
>> thing to do, particularly if someone tries to extend this code in the
>> future.
> If any use of R was upwards exposed, R would have an artificial
> definition in the entry block in addition to any “real” definition.
> Since we're checking for exactly one definition in total, it should
> follow that the definition dominates all uses.
Oh, duh.  My mind was stuck in the old reg_n_{refs,defs} way of 
thinking, not the DF world.  My bad.

If you're OK with the patch, so am I.

jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4] DSE: Use the constant store source if possible
  2022-05-30  8:35             ` Richard Sandiford
@ 2022-05-31 17:12               ` H.J. Lu
  2022-06-01  7:20                 ` Richard Sandiford
  0 siblings, 1 reply; 21+ messages in thread
From: H.J. Lu @ 2022-05-31 17:12 UTC (permalink / raw)
  To: Gcc-patches, Richard Biener, richard.sandiford

On Mon, May 30, 2022 at 09:35:43AM +0100, Richard Sandiford wrote:
> "H.J. Lu" <hjl.tools@gmail.com> writes:
> > ---
> > RTL DSE tracks redundant constant stores within a basic block.  When RTL
> > loop invariant motion hoists a constant initialization out of the loop
> > into a separate basic block, the constant store value becomes unknown
> > within the original basic block.  When recording store for RTL DSE, check
> > if the source register is set only once to a constant by a non-partial
> > unconditional load.  If yes, record the constant as the constant store
> > source.  It eliminates unrolled zero stores after memset 0 in a loop
> > where a vector register is used as the zero store source.
> >
> > Extract find_single_def_src from loop-iv.cc and move it to df-core.cc:
> >
> > 1. Rename to df_find_single_def_src.
> > 2. Change the argument to rtx and use rtx_equal_p.
> > 3. Return null for partial or conditional defs.
> >
> > gcc/
> >
> > 	PR rtl-optimization/105638
> > 	* df-core.cc (df_find_single_def_sr): Moved and renamed from
> > 	find_single_def_src in loop-iv.cc.  Change the argument to rtx
> > 	and use rtx_equal_p.  Return null for partial or conditional
> > 	defs.
> > 	* df.h (df_find_single_def_src): New prototype.
> > 	* dse.cc (record_store): Use the constant source if the source
> > 	register is set only once.
> > 	* loop-iv.cc (find_single_def_src): Moved to df-core.cc.
> > 	(replace_single_def_regs): Replace find_single_def_src with
> > 	df_find_single_def_src.
> >
> > gcc/testsuite/
> >
> > 	PR rtl-optimization/105638
> > 	* g++.target/i386/pr105638.C: New test.
> > ---
> >  gcc/df-core.cc                           | 44 +++++++++++++++++++++++
> >  gcc/df.h                                 |  1 +
> >  gcc/dse.cc                               | 14 ++++++++
> >  gcc/loop-iv.cc                           | 45 +-----------------------
> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 +++++++++++++++++++++++
> >  5 files changed, 104 insertions(+), 44 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >
> > diff --git a/gcc/df-core.cc b/gcc/df-core.cc
> > index a901b84878f..f9b4de8eb7a 100644
> > --- a/gcc/df-core.cc
> > +++ b/gcc/df-core.cc
> > @@ -2009,6 +2009,50 @@ df_reg_used (rtx_insn *insn, rtx reg)
> >    return df_find_use (insn, reg) != NULL;
> >  }
> >  
> > +/* If REG has a single definition, return its known value, otherwise return
> > +   null.  */
> > +
> > +rtx
> > +df_find_single_def_src (rtx reg)
> > +{
> > +  rtx src = NULL_RTX;
> > +
> > +  /* Don't look through unbounded number of single definition REG copies,
> > +     there might be loops for sources with uninitialized variables.  */
> > +  for (int cnt = 0; cnt < 128; cnt++)
> > +    {
> > +      df_ref adef = DF_REG_DEF_CHAIN (REGNO (reg));
> > +      if (adef == NULL || DF_REF_NEXT_REG (adef) != NULL
> > +	  || DF_REF_IS_ARTIFICIAL (adef)
> > +	  || (DF_REF_FLAGS (adef)
> > +	      & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
> > +	return NULL_RTX;
> > +
> > +      rtx set = single_set (DF_REF_INSN (adef));
> > +      if (set == NULL || !rtx_equal_p (SET_DEST (set), reg))
> > +	return NULL_RTX;
> > +
> > +      rtx note = find_reg_equal_equiv_note (DF_REF_INSN (adef));
> > +      if (note && function_invariant_p (XEXP (note, 0)))
> > +	{
> > +	  src = XEXP (note, 0);
> > +	  break;
> > +	}
> 
> Seems simpler to return this directly, rather than break and then
> check function_invariant_p again.

Fixed.

> 
> > +      src = SET_SRC (set);
> > +
> > +      if (REG_P (src))
> > +	{
> > +	  reg = src;
> > +	  continue;
> > +	}
> > +      break;
> > +    }
> > +  if (!function_invariant_p (src))
> > +    return NULL_RTX;
> > +
> > +  return src;
> > +}
> > +
> >
> >  /*----------------------------------------------------------------------------
> >     Debugging and printing functions.
> > diff --git a/gcc/df.h b/gcc/df.h
> > index bd329205d08..71e249ad20a 100644
> > --- a/gcc/df.h
> > +++ b/gcc/df.h
> > @@ -991,6 +991,7 @@ extern df_ref df_find_def (rtx_insn *, rtx);
> >  extern bool df_reg_defined (rtx_insn *, rtx);
> >  extern df_ref df_find_use (rtx_insn *, rtx);
> >  extern bool df_reg_used (rtx_insn *, rtx);
> > +extern rtx df_find_single_def_src (rtx);
> >  extern void df_worklist_dataflow (struct dataflow *,bitmap, int *, int);
> >  extern void df_print_regset (FILE *file, const_bitmap r);
> >  extern void df_print_word_regset (FILE *file, const_bitmap r);
> > diff --git a/gcc/dse.cc b/gcc/dse.cc
> > index 30c11cee034..c915266f025 100644
> > --- a/gcc/dse.cc
> > +++ b/gcc/dse.cc
> > @@ -1508,6 +1508,20 @@ record_store (rtx body, bb_info_t bb_info)
> >  
> >  	  if (tem && CONSTANT_P (tem))
> >  	    const_rhs = tem;
> > +	  else
> > +	    {
> > +	      /* If RHS is set only once to a constant, set CONST_RHS
> > +		 to the constant.  */
> > +	      rtx def_src = df_find_single_def_src (rhs);
> > +	      if (def_src != nullptr && CONSTANT_P (def_src))
> > +		{
> > +		  df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> > +		  rtx_insn *def_insn = DF_REF_INSN (def);
> > +		  rtx def_body = single_set (def_insn);
> > +		  if (rhs == SET_DEST (def_body))
> > +		    const_rhs = def_src;
> > +		}
> > +	    }
> 
> This shouldn't be necessary now: we can rely on def_src being correct.
> 
> I.e. I think this can just be:
> 
> 	      rtx def_src = df_find_single_def_src (rhs);
> 	      if (def_src != nullptr && CONSTANT_P (def_src))
>    		const_rhs = def_src;

Fixed.

> 
> Thanks,
> Richard
> 

Here is the v4 patch.  OK for master?

Thanks.


H.J.
---
RTL DSE tracks redundant constant stores within a basic block.  When RTL
loop invariant motion hoists a constant initialization out of the loop
into a separate basic block, the constant store value becomes unknown
within the original basic block.  When recording store for RTL DSE, check
if the source register is set only once to a constant by a non-partial
unconditional load.  If yes, record the constant as the constant store
source.  It eliminates unrolled zero stores after memset 0 in a loop
where a vector register is used as the zero store source.

gcc/

	PR rtl-optimization/105638
	* df-core.cc (df_find_single_def_src): Moved and renamed from
	find_single_def_src in loop-iv.cc.  Change the argument to rtx
	and use rtx_equal_p.  Return null for partial or conditional
	defs.
	* df.h (df_find_single_def_src): New prototype.
	* dse.cc (record_store): Use the constant source if the source
	register is set only once.
	* loop-iv.cc (find_single_def_src): Moved to df-core.cc.
	(replace_single_def_regs): Replace find_single_def_src with
	df_find_single_def_src.

gcc/testsuite/

	PR rtl-optimization/105638
	* g++.target/i386/pr105638.C: New test.
---
 gcc/df-core.cc                           | 44 +++++++++++++++++++++++
 gcc/df.h                                 |  1 +
 gcc/dse.cc                               |  8 +++++
 gcc/loop-iv.cc                           | 45 +-----------------------
 gcc/testsuite/g++.target/i386/pr105638.C | 44 +++++++++++++++++++++++
 5 files changed, 98 insertions(+), 44 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C

diff --git a/gcc/df-core.cc b/gcc/df-core.cc
index a901b84878f..e56d9bcf760 100644
--- a/gcc/df-core.cc
+++ b/gcc/df-core.cc
@@ -2009,6 +2009,50 @@ df_reg_used (rtx_insn *insn, rtx reg)
   return df_find_use (insn, reg) != NULL;
 }
 
+/* If REG has a single definition, return its known value, otherwise return
+   null.  */
+
+rtx
+df_find_single_def_src (rtx reg)
+{
+  rtx src = NULL_RTX;
+
+  /* Don't look through unbounded number of single definition REG copies,
+     there might be loops for sources with uninitialized variables.  */
+  for (int cnt = 0; cnt < 128; cnt++)
+    {
+      df_ref adef = DF_REG_DEF_CHAIN (REGNO (reg));
+      if (adef == NULL || DF_REF_NEXT_REG (adef) != NULL
+	  || DF_REF_IS_ARTIFICIAL (adef)
+	  || (DF_REF_FLAGS (adef)
+	      & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
+	return NULL_RTX;
+
+      rtx set = single_set (DF_REF_INSN (adef));
+      if (set == NULL || !rtx_equal_p (SET_DEST (set), reg))
+	return NULL_RTX;
+
+      rtx note = find_reg_equal_equiv_note (DF_REF_INSN (adef));
+      if (note && function_invariant_p (XEXP (note, 0)))
+	{
+	  src = XEXP (note, 0);
+	  return src;
+	}
+      src = SET_SRC (set);
+
+      if (REG_P (src))
+	{
+	  reg = src;
+	  continue;
+	}
+      break;
+    }
+  if (!function_invariant_p (src))
+    return NULL_RTX;
+
+  return src;
+}
+
 \f
 /*----------------------------------------------------------------------------
    Debugging and printing functions.
diff --git a/gcc/df.h b/gcc/df.h
index bd329205d08..71e249ad20a 100644
--- a/gcc/df.h
+++ b/gcc/df.h
@@ -991,6 +991,7 @@ extern df_ref df_find_def (rtx_insn *, rtx);
 extern bool df_reg_defined (rtx_insn *, rtx);
 extern df_ref df_find_use (rtx_insn *, rtx);
 extern bool df_reg_used (rtx_insn *, rtx);
+extern rtx df_find_single_def_src (rtx);
 extern void df_worklist_dataflow (struct dataflow *,bitmap, int *, int);
 extern void df_print_regset (FILE *file, const_bitmap r);
 extern void df_print_word_regset (FILE *file, const_bitmap r);
diff --git a/gcc/dse.cc b/gcc/dse.cc
index 30c11cee034..994c60dc189 100644
--- a/gcc/dse.cc
+++ b/gcc/dse.cc
@@ -1508,6 +1508,14 @@ record_store (rtx body, bb_info_t bb_info)
 
 	  if (tem && CONSTANT_P (tem))
 	    const_rhs = tem;
+	  else
+	    {
+	      /* If RHS is set only once to a constant, set CONST_RHS
+		 to the constant.  */
+	      rtx def_src = df_find_single_def_src (rhs);
+	      if (def_src != nullptr && CONSTANT_P (def_src))
+		const_rhs = def_src;
+	    }
 	}
     }
 
diff --git a/gcc/loop-iv.cc b/gcc/loop-iv.cc
index 0eafe7d2362..d639336445a 100644
--- a/gcc/loop-iv.cc
+++ b/gcc/loop-iv.cc
@@ -1378,49 +1378,6 @@ simple_rhs_p (rtx rhs)
     }
 }
 
-/* If REGNO has a single definition, return its known value, otherwise return
-   null.  */
-
-static rtx
-find_single_def_src (unsigned int regno)
-{
-  rtx src = NULL_RTX;
-
-  /* Don't look through unbounded number of single definition REG copies,
-     there might be loops for sources with uninitialized variables.  */
-  for (int cnt = 0; cnt < 128; cnt++)
-    {
-      df_ref adef = DF_REG_DEF_CHAIN (regno);
-      if (adef == NULL || DF_REF_NEXT_REG (adef) != NULL
-	  || DF_REF_IS_ARTIFICIAL (adef))
-	return NULL_RTX;
-
-      rtx set = single_set (DF_REF_INSN (adef));
-      if (set == NULL || !REG_P (SET_DEST (set))
-	  || REGNO (SET_DEST (set)) != regno)
-	return NULL_RTX;
-
-      rtx note = find_reg_equal_equiv_note (DF_REF_INSN (adef));
-      if (note && function_invariant_p (XEXP (note, 0)))
-	{
-	  src = XEXP (note, 0);
-	  break;
-	}
-      src = SET_SRC (set);
-
-      if (REG_P (src))
-	{
-	  regno = REGNO (src);
-	  continue;
-	}
-      break;
-    }
-  if (!function_invariant_p (src))
-    return NULL_RTX;
-
-  return src;
-}
-
 /* If any registers in *EXPR that have a single definition, try to replace
    them with the known-equivalent values.  */
 
@@ -1433,7 +1390,7 @@ replace_single_def_regs (rtx *expr)
     {
       rtx x = *iter;
       if (REG_P (x))
-	if (rtx new_x = find_single_def_src (REGNO (x)))
+	if (rtx new_x = df_find_single_def_src (x))
 	  {
 	    *expr = simplify_replace_rtx (*expr, x, new_x);
 	    goto repeat;
diff --git a/gcc/testsuite/g++.target/i386/pr105638.C b/gcc/testsuite/g++.target/i386/pr105638.C
new file mode 100644
index 00000000000..ff40a459de1
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr105638.C
@@ -0,0 +1,44 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-std=gnu++20 -O2 -march=skylake" } */
+/* { dg-final { scan-assembler-not "vpxor" } } */
+
+#include <stdint.h>
+#include <vector>
+#include <tr1/array>
+
+class FastBoard {
+public:
+    typedef std::pair<int, int> movescore_t;
+    typedef std::tr1::array<movescore_t, 24> scoredlist_t;
+    
+protected:
+    std::vector<int> m_critical;
+
+    int m_boardsize;    
+};
+
+class FastState {
+public:        
+    FastBoard board;
+    
+    int movenum;              
+protected:
+    FastBoard::scoredlist_t scoredmoves;
+};
+
+class KoState : public FastState {
+private:         
+    std::vector<uint64_t> ko_hash_history;   
+    std::vector<uint64_t> hash_history;     
+};
+
+class GameState : public KoState {
+public:                    
+    void foo ();      
+private:
+    std::vector<KoState> game_history;                          
+};
+
+void GameState::foo() {
+    game_history.resize(movenum);
+}
-- 
2.36.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4] DSE: Use the constant store source if possible
  2022-05-31 17:12               ` [PATCH v4] " H.J. Lu
@ 2022-06-01  7:20                 ` Richard Sandiford
  2022-06-01 21:07                   ` H.J. Lu
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Sandiford @ 2022-06-01  7:20 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Gcc-patches, Richard Biener, jeffreyalaw

"H.J. Lu" <hjl.tools@gmail.com> writes:
> On Mon, May 30, 2022 at 09:35:43AM +0100, Richard Sandiford wrote:
>> "H.J. Lu" <hjl.tools@gmail.com> writes:
>> > ---
>> > RTL DSE tracks redundant constant stores within a basic block.  When RTL
>> > loop invariant motion hoists a constant initialization out of the loop
>> > into a separate basic block, the constant store value becomes unknown
>> > within the original basic block.  When recording store for RTL DSE, check
>> > if the source register is set only once to a constant by a non-partial
>> > unconditional load.  If yes, record the constant as the constant store
>> > source.  It eliminates unrolled zero stores after memset 0 in a loop
>> > where a vector register is used as the zero store source.
>> >
>> > Extract find_single_def_src from loop-iv.cc and move it to df-core.cc:
>> >
>> > 1. Rename to df_find_single_def_src.
>> > 2. Change the argument to rtx and use rtx_equal_p.
>> > 3. Return null for partial or conditional defs.
>> >
>> > gcc/
>> >
>> > 	PR rtl-optimization/105638
>> > 	* df-core.cc (df_find_single_def_sr): Moved and renamed from
>> > 	find_single_def_src in loop-iv.cc.  Change the argument to rtx
>> > 	and use rtx_equal_p.  Return null for partial or conditional
>> > 	defs.
>> > 	* df.h (df_find_single_def_src): New prototype.
>> > 	* dse.cc (record_store): Use the constant source if the source
>> > 	register is set only once.
>> > 	* loop-iv.cc (find_single_def_src): Moved to df-core.cc.
>> > 	(replace_single_def_regs): Replace find_single_def_src with
>> > 	df_find_single_def_src.
>> >
>> > gcc/testsuite/
>> >
>> > 	PR rtl-optimization/105638
>> > 	* g++.target/i386/pr105638.C: New test.
>> > ---
>> >  gcc/df-core.cc                           | 44 +++++++++++++++++++++++
>> >  gcc/df.h                                 |  1 +
>> >  gcc/dse.cc                               | 14 ++++++++
>> >  gcc/loop-iv.cc                           | 45 +-----------------------
>> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 +++++++++++++++++++++++
>> >  5 files changed, 104 insertions(+), 44 deletions(-)
>> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>> >
>> > diff --git a/gcc/df-core.cc b/gcc/df-core.cc
>> > index a901b84878f..f9b4de8eb7a 100644
>> > --- a/gcc/df-core.cc
>> > +++ b/gcc/df-core.cc
>> > @@ -2009,6 +2009,50 @@ df_reg_used (rtx_insn *insn, rtx reg)
>> >    return df_find_use (insn, reg) != NULL;
>> >  }
>> >  
>> > +/* If REG has a single definition, return its known value, otherwise return
>> > +   null.  */
>> > +
>> > +rtx
>> > +df_find_single_def_src (rtx reg)
>> > +{
>> > +  rtx src = NULL_RTX;
>> > +
>> > +  /* Don't look through unbounded number of single definition REG copies,
>> > +     there might be loops for sources with uninitialized variables.  */
>> > +  for (int cnt = 0; cnt < 128; cnt++)
>> > +    {
>> > +      df_ref adef = DF_REG_DEF_CHAIN (REGNO (reg));
>> > +      if (adef == NULL || DF_REF_NEXT_REG (adef) != NULL
>> > +	  || DF_REF_IS_ARTIFICIAL (adef)
>> > +	  || (DF_REF_FLAGS (adef)
>> > +	      & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
>> > +	return NULL_RTX;
>> > +
>> > +      rtx set = single_set (DF_REF_INSN (adef));
>> > +      if (set == NULL || !rtx_equal_p (SET_DEST (set), reg))
>> > +	return NULL_RTX;
>> > +
>> > +      rtx note = find_reg_equal_equiv_note (DF_REF_INSN (adef));
>> > +      if (note && function_invariant_p (XEXP (note, 0)))
>> > +	{
>> > +	  src = XEXP (note, 0);
>> > +	  break;
>> > +	}
>> 
>> Seems simpler to return this directly, rather than break and then
>> check function_invariant_p again.
>
> Fixed.
>
>> 
>> > +      src = SET_SRC (set);
>> > +
>> > +      if (REG_P (src))
>> > +	{
>> > +	  reg = src;
>> > +	  continue;
>> > +	}
>> > +      break;
>> > +    }
>> > +  if (!function_invariant_p (src))
>> > +    return NULL_RTX;
>> > +
>> > +  return src;
>> > +}
>> > +
>> >
>> >  /*----------------------------------------------------------------------------
>> >     Debugging and printing functions.
>> > diff --git a/gcc/df.h b/gcc/df.h
>> > index bd329205d08..71e249ad20a 100644
>> > --- a/gcc/df.h
>> > +++ b/gcc/df.h
>> > @@ -991,6 +991,7 @@ extern df_ref df_find_def (rtx_insn *, rtx);
>> >  extern bool df_reg_defined (rtx_insn *, rtx);
>> >  extern df_ref df_find_use (rtx_insn *, rtx);
>> >  extern bool df_reg_used (rtx_insn *, rtx);
>> > +extern rtx df_find_single_def_src (rtx);
>> >  extern void df_worklist_dataflow (struct dataflow *,bitmap, int *, int);
>> >  extern void df_print_regset (FILE *file, const_bitmap r);
>> >  extern void df_print_word_regset (FILE *file, const_bitmap r);
>> > diff --git a/gcc/dse.cc b/gcc/dse.cc
>> > index 30c11cee034..c915266f025 100644
>> > --- a/gcc/dse.cc
>> > +++ b/gcc/dse.cc
>> > @@ -1508,6 +1508,20 @@ record_store (rtx body, bb_info_t bb_info)
>> >  
>> >  	  if (tem && CONSTANT_P (tem))
>> >  	    const_rhs = tem;
>> > +	  else
>> > +	    {
>> > +	      /* If RHS is set only once to a constant, set CONST_RHS
>> > +		 to the constant.  */
>> > +	      rtx def_src = df_find_single_def_src (rhs);
>> > +	      if (def_src != nullptr && CONSTANT_P (def_src))
>> > +		{
>> > +		  df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>> > +		  rtx_insn *def_insn = DF_REF_INSN (def);
>> > +		  rtx def_body = single_set (def_insn);
>> > +		  if (rhs == SET_DEST (def_body))
>> > +		    const_rhs = def_src;
>> > +		}
>> > +	    }
>> 
>> This shouldn't be necessary now: we can rely on def_src being correct.
>> 
>> I.e. I think this can just be:
>> 
>> 	      rtx def_src = df_find_single_def_src (rhs);
>> 	      if (def_src != nullptr && CONSTANT_P (def_src))
>>    		const_rhs = def_src;
>
> Fixed.
>
>> 
>> Thanks,
>> Richard
>> 
>
> Here is the v4 patch.  OK for master?
>
> Thanks.
>
>
> H.J.
> ---
> RTL DSE tracks redundant constant stores within a basic block.  When RTL
> loop invariant motion hoists a constant initialization out of the loop
> into a separate basic block, the constant store value becomes unknown
> within the original basic block.  When recording store for RTL DSE, check
> if the source register is set only once to a constant by a non-partial
> unconditional load.  If yes, record the constant as the constant store
> source.  It eliminates unrolled zero stores after memset 0 in a loop
> where a vector register is used as the zero store source.
>
> gcc/
>
> 	PR rtl-optimization/105638
> 	* df-core.cc (df_find_single_def_src): Moved and renamed from
> 	find_single_def_src in loop-iv.cc.  Change the argument to rtx
> 	and use rtx_equal_p.  Return null for partial or conditional
> 	defs.
> 	* df.h (df_find_single_def_src): New prototype.
> 	* dse.cc (record_store): Use the constant source if the source
> 	register is set only once.
> 	* loop-iv.cc (find_single_def_src): Moved to df-core.cc.
> 	(replace_single_def_regs): Replace find_single_def_src with
> 	df_find_single_def_src.
>
> gcc/testsuite/
>
> 	PR rtl-optimization/105638
> 	* g++.target/i386/pr105638.C: New test.
> ---
>  gcc/df-core.cc                           | 44 +++++++++++++++++++++++
>  gcc/df.h                                 |  1 +
>  gcc/dse.cc                               |  8 +++++
>  gcc/loop-iv.cc                           | 45 +-----------------------
>  gcc/testsuite/g++.target/i386/pr105638.C | 44 +++++++++++++++++++++++
>  5 files changed, 98 insertions(+), 44 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>
> diff --git a/gcc/df-core.cc b/gcc/df-core.cc
> index a901b84878f..e56d9bcf760 100644
> --- a/gcc/df-core.cc
> +++ b/gcc/df-core.cc
> @@ -2009,6 +2009,50 @@ df_reg_used (rtx_insn *insn, rtx reg)
>    return df_find_use (insn, reg) != NULL;
>  }
>  
> +/* If REG has a single definition, return its known value, otherwise return
> +   null.  */
> +
> +rtx
> +df_find_single_def_src (rtx reg)
> +{
> +  rtx src = NULL_RTX;
> +
> +  /* Don't look through unbounded number of single definition REG copies,
> +     there might be loops for sources with uninitialized variables.  */
> +  for (int cnt = 0; cnt < 128; cnt++)
> +    {
> +      df_ref adef = DF_REG_DEF_CHAIN (REGNO (reg));
> +      if (adef == NULL || DF_REF_NEXT_REG (adef) != NULL
> +	  || DF_REF_IS_ARTIFICIAL (adef)
> +	  || (DF_REF_FLAGS (adef)
> +	      & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
> +	return NULL_RTX;
> +
> +      rtx set = single_set (DF_REF_INSN (adef));
> +      if (set == NULL || !rtx_equal_p (SET_DEST (set), reg))
> +	return NULL_RTX;
> +
> +      rtx note = find_reg_equal_equiv_note (DF_REF_INSN (adef));
> +      if (note && function_invariant_p (XEXP (note, 0)))
> +	{
> +	  src = XEXP (note, 0);
> +	  return src;

Seems simpler without the assignment:

    return XEXP (note, 0);

OK with that change, thanks.

Richard

> +	}
> +      src = SET_SRC (set);
> +
> +      if (REG_P (src))
> +	{
> +	  reg = src;
> +	  continue;
> +	}
> +      break;
> +    }
> +  if (!function_invariant_p (src))
> +    return NULL_RTX;
> +
> +  return src;
> +}
> +
>  \f
>  /*----------------------------------------------------------------------------
>     Debugging and printing functions.
> diff --git a/gcc/df.h b/gcc/df.h
> index bd329205d08..71e249ad20a 100644
> --- a/gcc/df.h
> +++ b/gcc/df.h
> @@ -991,6 +991,7 @@ extern df_ref df_find_def (rtx_insn *, rtx);
>  extern bool df_reg_defined (rtx_insn *, rtx);
>  extern df_ref df_find_use (rtx_insn *, rtx);
>  extern bool df_reg_used (rtx_insn *, rtx);
> +extern rtx df_find_single_def_src (rtx);
>  extern void df_worklist_dataflow (struct dataflow *,bitmap, int *, int);
>  extern void df_print_regset (FILE *file, const_bitmap r);
>  extern void df_print_word_regset (FILE *file, const_bitmap r);
> diff --git a/gcc/dse.cc b/gcc/dse.cc
> index 30c11cee034..994c60dc189 100644
> --- a/gcc/dse.cc
> +++ b/gcc/dse.cc
> @@ -1508,6 +1508,14 @@ record_store (rtx body, bb_info_t bb_info)
>  
>  	  if (tem && CONSTANT_P (tem))
>  	    const_rhs = tem;
> +	  else
> +	    {
> +	      /* If RHS is set only once to a constant, set CONST_RHS
> +		 to the constant.  */
> +	      rtx def_src = df_find_single_def_src (rhs);
> +	      if (def_src != nullptr && CONSTANT_P (def_src))
> +		const_rhs = def_src;
> +	    }
>  	}
>      }
>  
> diff --git a/gcc/loop-iv.cc b/gcc/loop-iv.cc
> index 0eafe7d2362..d639336445a 100644
> --- a/gcc/loop-iv.cc
> +++ b/gcc/loop-iv.cc
> @@ -1378,49 +1378,6 @@ simple_rhs_p (rtx rhs)
>      }
>  }
>  
> -/* If REGNO has a single definition, return its known value, otherwise return
> -   null.  */
> -
> -static rtx
> -find_single_def_src (unsigned int regno)
> -{
> -  rtx src = NULL_RTX;
> -
> -  /* Don't look through unbounded number of single definition REG copies,
> -     there might be loops for sources with uninitialized variables.  */
> -  for (int cnt = 0; cnt < 128; cnt++)
> -    {
> -      df_ref adef = DF_REG_DEF_CHAIN (regno);
> -      if (adef == NULL || DF_REF_NEXT_REG (adef) != NULL
> -	  || DF_REF_IS_ARTIFICIAL (adef))
> -	return NULL_RTX;
> -
> -      rtx set = single_set (DF_REF_INSN (adef));
> -      if (set == NULL || !REG_P (SET_DEST (set))
> -	  || REGNO (SET_DEST (set)) != regno)
> -	return NULL_RTX;
> -
> -      rtx note = find_reg_equal_equiv_note (DF_REF_INSN (adef));
> -      if (note && function_invariant_p (XEXP (note, 0)))
> -	{
> -	  src = XEXP (note, 0);
> -	  break;
> -	}
> -      src = SET_SRC (set);
> -
> -      if (REG_P (src))
> -	{
> -	  regno = REGNO (src);
> -	  continue;
> -	}
> -      break;
> -    }
> -  if (!function_invariant_p (src))
> -    return NULL_RTX;
> -
> -  return src;
> -}
> -
>  /* If any registers in *EXPR that have a single definition, try to replace
>     them with the known-equivalent values.  */
>  
> @@ -1433,7 +1390,7 @@ replace_single_def_regs (rtx *expr)
>      {
>        rtx x = *iter;
>        if (REG_P (x))
> -	if (rtx new_x = find_single_def_src (REGNO (x)))
> +	if (rtx new_x = df_find_single_def_src (x))
>  	  {
>  	    *expr = simplify_replace_rtx (*expr, x, new_x);
>  	    goto repeat;
> diff --git a/gcc/testsuite/g++.target/i386/pr105638.C b/gcc/testsuite/g++.target/i386/pr105638.C
> new file mode 100644
> index 00000000000..ff40a459de1
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr105638.C
> @@ -0,0 +1,44 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-std=gnu++20 -O2 -march=skylake" } */
> +/* { dg-final { scan-assembler-not "vpxor" } } */
> +
> +#include <stdint.h>
> +#include <vector>
> +#include <tr1/array>
> +
> +class FastBoard {
> +public:
> +    typedef std::pair<int, int> movescore_t;
> +    typedef std::tr1::array<movescore_t, 24> scoredlist_t;
> +    
> +protected:
> +    std::vector<int> m_critical;
> +
> +    int m_boardsize;    
> +};
> +
> +class FastState {
> +public:        
> +    FastBoard board;
> +    
> +    int movenum;              
> +protected:
> +    FastBoard::scoredlist_t scoredmoves;
> +};
> +
> +class KoState : public FastState {
> +private:         
> +    std::vector<uint64_t> ko_hash_history;   
> +    std::vector<uint64_t> hash_history;     
> +};
> +
> +class GameState : public KoState {
> +public:                    
> +    void foo ();      
> +private:
> +    std::vector<KoState> game_history;                          
> +};
> +
> +void GameState::foo() {
> +    game_history.resize(movenum);
> +}

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4] DSE: Use the constant store source if possible
  2022-06-01  7:20                 ` Richard Sandiford
@ 2022-06-01 21:07                   ` H.J. Lu
  0 siblings, 0 replies; 21+ messages in thread
From: H.J. Lu @ 2022-06-01 21:07 UTC (permalink / raw)
  To: Gcc-patches, Richard Biener, Jeff Law, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 9596 bytes --]

On Wed, Jun 1, 2022 at 12:20 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> "H.J. Lu" <hjl.tools@gmail.com> writes:
> > On Mon, May 30, 2022 at 09:35:43AM +0100, Richard Sandiford wrote:
> >> "H.J. Lu" <hjl.tools@gmail.com> writes:
> >> > ---
> >> > RTL DSE tracks redundant constant stores within a basic block.  When RTL
> >> > loop invariant motion hoists a constant initialization out of the loop
> >> > into a separate basic block, the constant store value becomes unknown
> >> > within the original basic block.  When recording store for RTL DSE, check
> >> > if the source register is set only once to a constant by a non-partial
> >> > unconditional load.  If yes, record the constant as the constant store
> >> > source.  It eliminates unrolled zero stores after memset 0 in a loop
> >> > where a vector register is used as the zero store source.
> >> >
> >> > Extract find_single_def_src from loop-iv.cc and move it to df-core.cc:
> >> >
> >> > 1. Rename to df_find_single_def_src.
> >> > 2. Change the argument to rtx and use rtx_equal_p.
> >> > 3. Return null for partial or conditional defs.
> >> >
> >> > gcc/
> >> >
> >> >    PR rtl-optimization/105638
> >> >    * df-core.cc (df_find_single_def_sr): Moved and renamed from
> >> >    find_single_def_src in loop-iv.cc.  Change the argument to rtx
> >> >    and use rtx_equal_p.  Return null for partial or conditional
> >> >    defs.
> >> >    * df.h (df_find_single_def_src): New prototype.
> >> >    * dse.cc (record_store): Use the constant source if the source
> >> >    register is set only once.
> >> >    * loop-iv.cc (find_single_def_src): Moved to df-core.cc.
> >> >    (replace_single_def_regs): Replace find_single_def_src with
> >> >    df_find_single_def_src.
> >> >
> >> > gcc/testsuite/
> >> >
> >> >    PR rtl-optimization/105638
> >> >    * g++.target/i386/pr105638.C: New test.
> >> > ---
> >> >  gcc/df-core.cc                           | 44 +++++++++++++++++++++++
> >> >  gcc/df.h                                 |  1 +
> >> >  gcc/dse.cc                               | 14 ++++++++
> >> >  gcc/loop-iv.cc                           | 45 +-----------------------
> >> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 +++++++++++++++++++++++
> >> >  5 files changed, 104 insertions(+), 44 deletions(-)
> >> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >> >
> >> > diff --git a/gcc/df-core.cc b/gcc/df-core.cc
> >> > index a901b84878f..f9b4de8eb7a 100644
> >> > --- a/gcc/df-core.cc
> >> > +++ b/gcc/df-core.cc
> >> > @@ -2009,6 +2009,50 @@ df_reg_used (rtx_insn *insn, rtx reg)
> >> >    return df_find_use (insn, reg) != NULL;
> >> >  }
> >> >
> >> > +/* If REG has a single definition, return its known value, otherwise return
> >> > +   null.  */
> >> > +
> >> > +rtx
> >> > +df_find_single_def_src (rtx reg)
> >> > +{
> >> > +  rtx src = NULL_RTX;
> >> > +
> >> > +  /* Don't look through unbounded number of single definition REG copies,
> >> > +     there might be loops for sources with uninitialized variables.  */
> >> > +  for (int cnt = 0; cnt < 128; cnt++)
> >> > +    {
> >> > +      df_ref adef = DF_REG_DEF_CHAIN (REGNO (reg));
> >> > +      if (adef == NULL || DF_REF_NEXT_REG (adef) != NULL
> >> > +    || DF_REF_IS_ARTIFICIAL (adef)
> >> > +    || (DF_REF_FLAGS (adef)
> >> > +        & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
> >> > +  return NULL_RTX;
> >> > +
> >> > +      rtx set = single_set (DF_REF_INSN (adef));
> >> > +      if (set == NULL || !rtx_equal_p (SET_DEST (set), reg))
> >> > +  return NULL_RTX;
> >> > +
> >> > +      rtx note = find_reg_equal_equiv_note (DF_REF_INSN (adef));
> >> > +      if (note && function_invariant_p (XEXP (note, 0)))
> >> > +  {
> >> > +    src = XEXP (note, 0);
> >> > +    break;
> >> > +  }
> >>
> >> Seems simpler to return this directly, rather than break and then
> >> check function_invariant_p again.
> >
> > Fixed.
> >
> >>
> >> > +      src = SET_SRC (set);
> >> > +
> >> > +      if (REG_P (src))
> >> > +  {
> >> > +    reg = src;
> >> > +    continue;
> >> > +  }
> >> > +      break;
> >> > +    }
> >> > +  if (!function_invariant_p (src))
> >> > +    return NULL_RTX;
> >> > +
> >> > +  return src;
> >> > +}
> >> > +
> >> >
> >> >  /*----------------------------------------------------------------------------
> >> >     Debugging and printing functions.
> >> > diff --git a/gcc/df.h b/gcc/df.h
> >> > index bd329205d08..71e249ad20a 100644
> >> > --- a/gcc/df.h
> >> > +++ b/gcc/df.h
> >> > @@ -991,6 +991,7 @@ extern df_ref df_find_def (rtx_insn *, rtx);
> >> >  extern bool df_reg_defined (rtx_insn *, rtx);
> >> >  extern df_ref df_find_use (rtx_insn *, rtx);
> >> >  extern bool df_reg_used (rtx_insn *, rtx);
> >> > +extern rtx df_find_single_def_src (rtx);
> >> >  extern void df_worklist_dataflow (struct dataflow *,bitmap, int *, int);
> >> >  extern void df_print_regset (FILE *file, const_bitmap r);
> >> >  extern void df_print_word_regset (FILE *file, const_bitmap r);
> >> > diff --git a/gcc/dse.cc b/gcc/dse.cc
> >> > index 30c11cee034..c915266f025 100644
> >> > --- a/gcc/dse.cc
> >> > +++ b/gcc/dse.cc
> >> > @@ -1508,6 +1508,20 @@ record_store (rtx body, bb_info_t bb_info)
> >> >
> >> >      if (tem && CONSTANT_P (tem))
> >> >        const_rhs = tem;
> >> > +    else
> >> > +      {
> >> > +        /* If RHS is set only once to a constant, set CONST_RHS
> >> > +           to the constant.  */
> >> > +        rtx def_src = df_find_single_def_src (rhs);
> >> > +        if (def_src != nullptr && CONSTANT_P (def_src))
> >> > +          {
> >> > +            df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> >> > +            rtx_insn *def_insn = DF_REF_INSN (def);
> >> > +            rtx def_body = single_set (def_insn);
> >> > +            if (rhs == SET_DEST (def_body))
> >> > +              const_rhs = def_src;
> >> > +          }
> >> > +      }
> >>
> >> This shouldn't be necessary now: we can rely on def_src being correct.
> >>
> >> I.e. I think this can just be:
> >>
> >>            rtx def_src = df_find_single_def_src (rhs);
> >>            if (def_src != nullptr && CONSTANT_P (def_src))
> >>              const_rhs = def_src;
> >
> > Fixed.
> >
> >>
> >> Thanks,
> >> Richard
> >>
> >
> > Here is the v4 patch.  OK for master?
> >
> > Thanks.
> >
> >
> > H.J.
> > ---
> > RTL DSE tracks redundant constant stores within a basic block.  When RTL
> > loop invariant motion hoists a constant initialization out of the loop
> > into a separate basic block, the constant store value becomes unknown
> > within the original basic block.  When recording store for RTL DSE, check
> > if the source register is set only once to a constant by a non-partial
> > unconditional load.  If yes, record the constant as the constant store
> > source.  It eliminates unrolled zero stores after memset 0 in a loop
> > where a vector register is used as the zero store source.
> >
> > gcc/
> >
> >       PR rtl-optimization/105638
> >       * df-core.cc (df_find_single_def_src): Moved and renamed from
> >       find_single_def_src in loop-iv.cc.  Change the argument to rtx
> >       and use rtx_equal_p.  Return null for partial or conditional
> >       defs.
> >       * df.h (df_find_single_def_src): New prototype.
> >       * dse.cc (record_store): Use the constant source if the source
> >       register is set only once.
> >       * loop-iv.cc (find_single_def_src): Moved to df-core.cc.
> >       (replace_single_def_regs): Replace find_single_def_src with
> >       df_find_single_def_src.
> >
> > gcc/testsuite/
> >
> >       PR rtl-optimization/105638
> >       * g++.target/i386/pr105638.C: New test.
> > ---
> >  gcc/df-core.cc                           | 44 +++++++++++++++++++++++
> >  gcc/df.h                                 |  1 +
> >  gcc/dse.cc                               |  8 +++++
> >  gcc/loop-iv.cc                           | 45 +-----------------------
> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 +++++++++++++++++++++++
> >  5 files changed, 98 insertions(+), 44 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >
> > diff --git a/gcc/df-core.cc b/gcc/df-core.cc
> > index a901b84878f..e56d9bcf760 100644
> > --- a/gcc/df-core.cc
> > +++ b/gcc/df-core.cc
> > @@ -2009,6 +2009,50 @@ df_reg_used (rtx_insn *insn, rtx reg)
> >    return df_find_use (insn, reg) != NULL;
> >  }
> >
> > +/* If REG has a single definition, return its known value, otherwise return
> > +   null.  */
> > +
> > +rtx
> > +df_find_single_def_src (rtx reg)
> > +{
> > +  rtx src = NULL_RTX;
> > +
> > +  /* Don't look through unbounded number of single definition REG copies,
> > +     there might be loops for sources with uninitialized variables.  */
> > +  for (int cnt = 0; cnt < 128; cnt++)
> > +    {
> > +      df_ref adef = DF_REG_DEF_CHAIN (REGNO (reg));
> > +      if (adef == NULL || DF_REF_NEXT_REG (adef) != NULL
> > +       || DF_REF_IS_ARTIFICIAL (adef)
> > +       || (DF_REF_FLAGS (adef)
> > +           & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
> > +     return NULL_RTX;
> > +
> > +      rtx set = single_set (DF_REF_INSN (adef));
> > +      if (set == NULL || !rtx_equal_p (SET_DEST (set), reg))
> > +     return NULL_RTX;
> > +
> > +      rtx note = find_reg_equal_equiv_note (DF_REF_INSN (adef));
> > +      if (note && function_invariant_p (XEXP (note, 0)))
> > +     {
> > +       src = XEXP (note, 0);
> > +       return src;
>
> Seems simpler without the assignment:
>
>     return XEXP (note, 0);
>
> OK with that change, thanks.

Fixed.

> Richard
>
>

This is the patch I am checking in.

Thanks.

-- 
H.J.

[-- Attachment #2: 0001-DSE-Use-the-constant-store-source-if-possible.patch --]
[-- Type: text/x-patch, Size: 7146 bytes --]

From 99d1fe3790275e0eaa588637f90f0ff1e5b8d117 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Wed, 18 May 2022 13:00:47 -0700
Subject: [PATCH] DSE: Use the constant store source if possible

RTL DSE tracks redundant constant stores within a basic block.  When RTL
loop invariant motion hoists a constant initialization out of the loop
into a separate basic block, the constant store value becomes unknown
within the original basic block.  When recording store for RTL DSE, check
if the source register is set only once to a constant by a non-partial
unconditional load.  If yes, record the constant as the constant store
source.  It eliminates unrolled zero stores after memset 0 in a loop
where a vector register is used as the zero store source.

gcc/

	PR rtl-optimization/105638
	* df-core.cc (df_find_single_def_src): Moved and renamed from
	find_single_def_src in loop-iv.cc.  Change the argument to rtx
	and use rtx_equal_p.  Return null for partial or conditional
	defs.
	* df.h (df_find_single_def_src): New prototype.
	* dse.cc (record_store): Use the constant source if the source
	register is set only once.
	* loop-iv.cc (find_single_def_src): Moved to df-core.cc.
	(replace_single_def_regs): Replace find_single_def_src with
	df_find_single_def_src.

gcc/testsuite/

	PR rtl-optimization/105638
	* g++.target/i386/pr105638.C: New test.
---
 gcc/df-core.cc                           | 41 +++++++++++++++++++++
 gcc/df.h                                 |  1 +
 gcc/dse.cc                               |  8 +++++
 gcc/loop-iv.cc                           | 45 +-----------------------
 gcc/testsuite/g++.target/i386/pr105638.C | 44 +++++++++++++++++++++++
 5 files changed, 95 insertions(+), 44 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C

diff --git a/gcc/df-core.cc b/gcc/df-core.cc
index a901b84878f..e3a56bf6a9f 100644
--- a/gcc/df-core.cc
+++ b/gcc/df-core.cc
@@ -2009,6 +2009,47 @@ df_reg_used (rtx_insn *insn, rtx reg)
   return df_find_use (insn, reg) != NULL;
 }
 
+/* If REG has a single definition, return its known value, otherwise return
+   null.  */
+
+rtx
+df_find_single_def_src (rtx reg)
+{
+  rtx src = NULL_RTX;
+
+  /* Don't look through unbounded number of single definition REG copies,
+     there might be loops for sources with uninitialized variables.  */
+  for (int cnt = 0; cnt < 128; cnt++)
+    {
+      df_ref adef = DF_REG_DEF_CHAIN (REGNO (reg));
+      if (adef == NULL || DF_REF_NEXT_REG (adef) != NULL
+	  || DF_REF_IS_ARTIFICIAL (adef)
+	  || (DF_REF_FLAGS (adef)
+	      & (DF_REF_PARTIAL | DF_REF_CONDITIONAL)))
+	return NULL_RTX;
+
+      rtx set = single_set (DF_REF_INSN (adef));
+      if (set == NULL || !rtx_equal_p (SET_DEST (set), reg))
+	return NULL_RTX;
+
+      rtx note = find_reg_equal_equiv_note (DF_REF_INSN (adef));
+      if (note && function_invariant_p (XEXP (note, 0)))
+	return XEXP (note, 0);
+      src = SET_SRC (set);
+
+      if (REG_P (src))
+	{
+	  reg = src;
+	  continue;
+	}
+      break;
+    }
+  if (!function_invariant_p (src))
+    return NULL_RTX;
+
+  return src;
+}
+
 \f
 /*----------------------------------------------------------------------------
    Debugging and printing functions.
diff --git a/gcc/df.h b/gcc/df.h
index bd329205d08..71e249ad20a 100644
--- a/gcc/df.h
+++ b/gcc/df.h
@@ -991,6 +991,7 @@ extern df_ref df_find_def (rtx_insn *, rtx);
 extern bool df_reg_defined (rtx_insn *, rtx);
 extern df_ref df_find_use (rtx_insn *, rtx);
 extern bool df_reg_used (rtx_insn *, rtx);
+extern rtx df_find_single_def_src (rtx);
 extern void df_worklist_dataflow (struct dataflow *,bitmap, int *, int);
 extern void df_print_regset (FILE *file, const_bitmap r);
 extern void df_print_word_regset (FILE *file, const_bitmap r);
diff --git a/gcc/dse.cc b/gcc/dse.cc
index 30c11cee034..994c60dc189 100644
--- a/gcc/dse.cc
+++ b/gcc/dse.cc
@@ -1508,6 +1508,14 @@ record_store (rtx body, bb_info_t bb_info)
 
 	  if (tem && CONSTANT_P (tem))
 	    const_rhs = tem;
+	  else
+	    {
+	      /* If RHS is set only once to a constant, set CONST_RHS
+		 to the constant.  */
+	      rtx def_src = df_find_single_def_src (rhs);
+	      if (def_src != nullptr && CONSTANT_P (def_src))
+		const_rhs = def_src;
+	    }
 	}
     }
 
diff --git a/gcc/loop-iv.cc b/gcc/loop-iv.cc
index 0eafe7d2362..d639336445a 100644
--- a/gcc/loop-iv.cc
+++ b/gcc/loop-iv.cc
@@ -1378,49 +1378,6 @@ simple_rhs_p (rtx rhs)
     }
 }
 
-/* If REGNO has a single definition, return its known value, otherwise return
-   null.  */
-
-static rtx
-find_single_def_src (unsigned int regno)
-{
-  rtx src = NULL_RTX;
-
-  /* Don't look through unbounded number of single definition REG copies,
-     there might be loops for sources with uninitialized variables.  */
-  for (int cnt = 0; cnt < 128; cnt++)
-    {
-      df_ref adef = DF_REG_DEF_CHAIN (regno);
-      if (adef == NULL || DF_REF_NEXT_REG (adef) != NULL
-	  || DF_REF_IS_ARTIFICIAL (adef))
-	return NULL_RTX;
-
-      rtx set = single_set (DF_REF_INSN (adef));
-      if (set == NULL || !REG_P (SET_DEST (set))
-	  || REGNO (SET_DEST (set)) != regno)
-	return NULL_RTX;
-
-      rtx note = find_reg_equal_equiv_note (DF_REF_INSN (adef));
-      if (note && function_invariant_p (XEXP (note, 0)))
-	{
-	  src = XEXP (note, 0);
-	  break;
-	}
-      src = SET_SRC (set);
-
-      if (REG_P (src))
-	{
-	  regno = REGNO (src);
-	  continue;
-	}
-      break;
-    }
-  if (!function_invariant_p (src))
-    return NULL_RTX;
-
-  return src;
-}
-
 /* If any registers in *EXPR that have a single definition, try to replace
    them with the known-equivalent values.  */
 
@@ -1433,7 +1390,7 @@ replace_single_def_regs (rtx *expr)
     {
       rtx x = *iter;
       if (REG_P (x))
-	if (rtx new_x = find_single_def_src (REGNO (x)))
+	if (rtx new_x = df_find_single_def_src (x))
 	  {
 	    *expr = simplify_replace_rtx (*expr, x, new_x);
 	    goto repeat;
diff --git a/gcc/testsuite/g++.target/i386/pr105638.C b/gcc/testsuite/g++.target/i386/pr105638.C
new file mode 100644
index 00000000000..ff40a459de1
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr105638.C
@@ -0,0 +1,44 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-std=gnu++20 -O2 -march=skylake" } */
+/* { dg-final { scan-assembler-not "vpxor" } } */
+
+#include <stdint.h>
+#include <vector>
+#include <tr1/array>
+
+class FastBoard {
+public:
+    typedef std::pair<int, int> movescore_t;
+    typedef std::tr1::array<movescore_t, 24> scoredlist_t;
+    
+protected:
+    std::vector<int> m_critical;
+
+    int m_boardsize;    
+};
+
+class FastState {
+public:        
+    FastBoard board;
+    
+    int movenum;              
+protected:
+    FastBoard::scoredlist_t scoredmoves;
+};
+
+class KoState : public FastState {
+private:         
+    std::vector<uint64_t> ko_hash_history;   
+    std::vector<uint64_t> hash_history;     
+};
+
+class GameState : public KoState {
+public:                    
+    void foo ();      
+private:
+    std::vector<KoState> game_history;                          
+};
+
+void GameState::foo() {
+    game_history.resize(movenum);
+}
-- 
2.36.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2022-06-01 21:08 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-21  3:01 [PATCH] DSE: Use the constant source if possible H.J. Lu
2022-05-23 10:38 ` Richard Biener
2022-05-23 18:34   ` [PATCH v2] DSE: Use the constant store " H.J. Lu
2022-05-24  6:42     ` Richard Biener
2022-05-24 20:10       ` H.J. Lu
2022-05-25  9:22         ` Richard Biener
2022-05-25  9:30           ` Richard Sandiford
2022-05-25 19:01             ` H.J. Lu
2022-05-25  7:30     ` Richard Sandiford
2022-05-25 18:56       ` H.J. Lu
2022-05-26 15:14         ` Richard Sandiford
2022-05-26 20:43           ` [PATCH v3] " H.J. Lu
2022-05-28 18:37             ` Jeff Law
2022-05-29 21:43               ` H.J. Lu
2022-05-29 22:55                 ` Jeff Law
2022-05-30  8:28                   ` Richard Sandiford
2022-05-30 22:58                     ` Jeff Law
2022-05-30  8:35             ` Richard Sandiford
2022-05-31 17:12               ` [PATCH v4] " H.J. Lu
2022-06-01  7:20                 ` Richard Sandiford
2022-06-01 21:07                   ` H.J. Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).