public inbox for gcc-patches@gcc.gnu.org
* [PATCH] xtensa: Add workaround for pSRAM cache issue in ESP32
@ 2022-10-12 19:23 Alexey Lapshin
From: Alexey Lapshin @ 2022-10-12 19:23 UTC (permalink / raw)
  To: gcc-patches; +Cc: Alexey Gerenkov, Anton Maklakov, Ivan Grokhotkov

From a2b425031f5b06dd51cd3ca34fe4f3620b93a944 Mon Sep 17 00:00:00 2001
From: Jeroen Domburg <jeroen@espressif.com>
Date: Sat, 12 Aug 2017 23:10:12 +0800
Subject: [PATCH] xtensa: Add workaround for pSRAM cache issue in ESP32

Xtensa does a load/store inversion when a load and a store to the same
address are found in the 5 affected stages of the pipeline: with a load
done _after_ the store in code, the Xtensa will move it _before_ the
store in execution.
Unfortunately, the ESP32 pSRAM cache mishandles these reordered
accesses when an interrupt happens during them. This reorg step inserts
NOPs between loads and stores so this never occurs.

Workarounds:

  ESP32_PSRAM_FIX_NOPS:
   The handling issue also shows up when doing a store to an 8- or
   16-bit memory location followed by a larger (16- or 32-bit) load
   from that location within the time it takes to grab a cache line
   from external RAM (which is at least 80 cycles). The cache will
   confuse the load and the store, resulting in the bytes not set by
   the store being read back as garbage. To fix this, we insert a
   memory barrier of NOP instructions after each 8/16-bit store that
   isn't followed by another store.

  ESP32_PSRAM_FIX_MEMW (default):
   Explicitly insert a memory barrier instead of nops.
   Slower than nops, but faster than just adding memws everywhere.

  ESP32_PSRAM_FIX_DUPLDST:
    Explicitly insert a load after every store:
    - Instruction is s32i:
        Insert an l32i from that address to the source register
        immediately after, plus a duplicated s32i after that.
    - Instruction is s8i/s16i:
        Note it and insert a memw before the next load.
        (The same as ESP32_PSRAM_FIX_MEMW)
    - If any of the args are volatile, don't touch it:
        The memw resulting from the volatile access will fix everything.
---
 gcc/config.gcc                      |   5 +
 gcc/config/xtensa/t-esp32-psram-fix |  22 ++
 gcc/config/xtensa/xtensa-opts.h     |  34 +++
 gcc/config/xtensa/xtensa.cc         | 444 ++++++++++++++++++++++++++++
 gcc/config/xtensa/xtensa.h          |   1 +
 gcc/config/xtensa/xtensa.md         |  46 ++-
 gcc/config/xtensa/xtensa.opt        |  31 ++
 7 files changed, 580 insertions(+), 3 deletions(-)
 create mode 100644 gcc/config/xtensa/t-esp32-psram-fix
 create mode 100644 gcc/config/xtensa/xtensa-opts.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index e73cb848c2d..a407e8407f0 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3457,6 +3457,11 @@ xstormy16-*-elf)
 	extra_options=stormy16/stormy16.opt
 	tmake_file="stormy16/t-stormy16"
 	;;
+xtensa*-esp32-elf*)
+	tm_file="${tm_file} elfos.h newlib-stdint.h xtensa/elf.h"
+	tmake_file="${tmake_file} xtensa/t-esp32-psram-fix"
+	extra_options="${extra_options} xtensa/elf.opt"
+	;;
 xtensa*-*-elf*)
 	tm_file="${tm_file} elfos.h newlib-stdint.h xtensa/elf.h"
 	extra_options="${extra_options} xtensa/elf.opt"
diff --git a/gcc/config/xtensa/t-esp32-psram-fix b/gcc/config/xtensa/t-esp32-psram-fix
new file mode 100644
index 00000000000..78fe54d4852
--- /dev/null
+++ b/gcc/config/xtensa/t-esp32-psram-fix
@@ -0,0 +1,22 @@
+# Copyright (C) 2022 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+$(out_object_file): gt-xtensa.h
+
+MULTILIB_OPTIONS = mfix-esp32-psram-cache-issue
+MULTILIB_DIRNAMES = esp32-psram
diff --git a/gcc/config/xtensa/xtensa-opts.h b/gcc/config/xtensa/xtensa-opts.h
new file mode 100644
index 00000000000..73c2015a016
--- /dev/null
+++ b/gcc/config/xtensa/xtensa-opts.h
@@ -0,0 +1,34 @@
+/* Definitions of option handling for Tensilica's Xtensa target machine.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Espressif <jeroen@espressif.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+
+
+#ifndef XTENSA_OPTS_H
+#define XTENSA_OPTS_H
+
+enum esp32_psram_fix_type
+{
+  ESP32_PSRAM_FIX_DUPLDST,
+  ESP32_PSRAM_FIX_MEMW,
+  ESP32_PSRAM_FIX_NOPS
+};
+
+
+#endif /* XTENSA_OPTS_H */
diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 828c7642b7c..61ef14b1c57 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -55,6 +55,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "hw-doloop.h"
 #include "rtl-iter.h"
+#include "tree-pass.h"
+#include "context.h"
 #include "insn-attr.h"
 
 /* This file should be included last.  */
@@ -2636,6 +2638,435 @@ xtensa_return_in_msb (const_tree valtype)
 }
 
 
+#define USEFUL_INSN_P(INSN)						\
+  (NONDEBUG_INSN_P (INSN) && GET_CODE (PATTERN (INSN)) != USE		\
+   && GET_CODE (PATTERN (INSN)) != CLOBBER)
+
+/* If INSN is a delayed branch sequence, return the first instruction
+   in the sequence, otherwise return INSN itself.  */
+#define SEQ_BEGIN(INSN)							\
+  (INSN_P (INSN) && GET_CODE (PATTERN (INSN)) == SEQUENCE		\
+       ? as_a<rtx_insn *> (XVECEXP (PATTERN (INSN), 0, 0))		\
+       : (INSN))
+
+/* Likewise for the last instruction in a delayed branch sequence.  */
+#define SEQ_END(INSN)							\
+  (INSN_P (INSN) && GET_CODE (PATTERN (INSN)) == SEQUENCE ? as_a<rtx_insn *> \
+      (XVECEXP (PATTERN (INSN), 0, XVECLEN (PATTERN (INSN), 0) - 1)) : (INSN))
+
+
+/* Execute the following loop body with SUBINSN set to each instruction
+   between SEQ_BEGIN (INSN) and SEQ_END (INSN) inclusive.  */
+#define FOR_EACH_SUBINSN(SUBINSN, INSN)					\
+  for ((SUBINSN) = SEQ_BEGIN (INSN); (SUBINSN) != NEXT_INSN (SEQ_END (INSN)); \
+       (SUBINSN) = NEXT_INSN (SUBINSN))
+
+
+/* Xtensa does a load/store inversion when a load and a store to the
+   same address are found in the 5 affected stages of the pipeline:
+   with a load done _after_ the store in code, the Xtensa will move it
+   _before_ the store in execution.
+   Unfortunately, the ESP32 PSRAM cache mishandles these reordered
+   accesses when an interrupt happens during them.  This reorg step
+   inserts NOPs between loads and stores so this never occurs.
+
+   The handling issue also shows up when doing a store to an 8- or
+   16-bit memory location followed by a larger (16- or 32-bit) load
+   from that location within the time it takes to grab a cache line
+   from external RAM (which is at least 80 cycles).  The cache will
+   confuse the load and the store, resulting in the bytes not set by
+   the store being read back as garbage.  To fix this, we insert a
+   memory barrier after each 8/16-bit store that isn't followed by
+   another store.  */
+
+/* The affected piece of the pipeline is 5 entries long;
+   the load/store itself fills one.  */
+#define LOAD_STORE_OFF 4
+
+static int insns_since_store = 0;
+static rtx_insn *store_insn = NULL;
+static rtx_insn *last_hiqi_store = NULL;
+
+static void
+handle_fix_reorg_insn (rtx_insn *insn)
+{
+  enum attr_type attr_type = get_attr_type (insn);
+  if (attr_type == TYPE_STORE || attr_type == TYPE_FSTORE)
+    {
+      rtx x = XEXP (PATTERN (insn), 0);
+      /* Store  */
+      insns_since_store = 0;
+      store_insn = insn;
+      if (attr_type == TYPE_STORE
+          && (GET_MODE (x) == HImode || GET_MODE (x) == QImode))
+        {
+          /* This is an 8/16-bit store, record it.  */
+          last_hiqi_store = insn;
+        }
+      else
+        {
+          /* 32-bit store.  This store undoes the possibility of
+             badness in earlier 8/16-bit stores because it forces
+             those stores to finish.  */
+          last_hiqi_store = NULL;
+        }
+    }
+  else if (attr_type == TYPE_LOAD || attr_type == TYPE_FLOAD)
+    {
+      /* Load  */
+      if (store_insn)
+        {
+          while (insns_since_store++ < LOAD_STORE_OFF)
+            {
+              emit_insn_before (gen_nop (), insn);
+            }
+        }
+    }
+  else if (attr_type == TYPE_JUMP || attr_type == TYPE_CALL)
+    {
+      enum attr_condjmp attr_condjmp = get_attr_condjmp (insn);
+      if (attr_condjmp == CONDJMP_UNCOND)
+        {
+          /* Pipeline gets cleared; any load is inconsequential.  */
+          store_insn = NULL;
+        }
+    }
+  else
+    {
+      insns_since_store++;
+    }
+  if (attr_type == TYPE_LOAD || attr_type == TYPE_FLOAD
+      || attr_type == TYPE_JUMP || attr_type == TYPE_CALL)
+    {
+      if (last_hiqi_store)
+        {
+          /* Need to memory barrier the s8i/s16i instruction.  */
+          emit_insn_after (gen_memory_barrier (), last_hiqi_store);
+          last_hiqi_store = NULL;
+        }
+    }
+}
+
+static void
+xtensa_psram_cache_fix_nop_reorg ()
+{
+  rtx_insn *insn, *subinsn, *next_insn;
+  for (insn = get_insns (); insn != 0; insn = next_insn)
+    {
+      next_insn = NEXT_INSN (insn);
+      int length = get_attr_length (insn);
+
+      if (USEFUL_INSN_P (insn) && length > 0)
+        {
+          FOR_EACH_SUBINSN (subinsn, insn)
+          {
+            handle_fix_reorg_insn (subinsn);
+          }
+        }
+    }
+}
+
+/* Alternative fix to xtensa_psram_cache_fix_nop_reorg.  Tries to solve
+   the 32-bit load/store inversion by explicitly inserting a memory
+   barrier instead of nops.
+   Slower than nops, but faster than just adding memws everywhere.  */
+
+static void
+handle_fix_reorg_memw (rtx_insn *insn)
+{
+  enum attr_type attr_type = get_attr_type (insn);
+  rtx x = XEXP (PATTERN (insn), 0);
+  if (attr_type == TYPE_STORE || attr_type == TYPE_FSTORE)
+    {
+      /* Store  */
+      insns_since_store = 0;
+      store_insn = insn;
+      if (attr_type == TYPE_STORE
+          && (GET_MODE (x) == HImode || GET_MODE (x) == QImode))
+        {
+          /* This is an 8/16-bit store, record it if it's not volatile
+             already.  */
+          if (!MEM_VOLATILE_P (x))
+            last_hiqi_store = insn;
+        }
+    }
+  else if (attr_type == TYPE_LOAD || attr_type == TYPE_FLOAD)
+    {
+      /* Load  */
+      if (MEM_P (x) && (!MEM_VOLATILE_P (x)))
+        {
+          if (store_insn)
+            {
+              emit_insn_before (gen_memory_barrier (), insn);
+              store_insn = NULL;
+            }
+        }
+    }
+  else if (attr_type == TYPE_JUMP || attr_type == TYPE_CALL)
+    {
+      enum attr_condjmp attr_condjmp = get_attr_condjmp (insn);
+      if (attr_condjmp == CONDJMP_UNCOND)
+        {
+          /* Jump or return.
+             Unconditional jumps seem not to clear the pipeline, and
+             there may be a load after.  Need a memw if earlier code
+             had a store.  */
+          if (store_insn)
+            {
+              emit_insn_before (gen_memory_barrier (), insn);
+              store_insn = NULL;
+            }
+        }
+    }
+  else
+    {
+      insns_since_store++;
+    }
+  if (attr_type == TYPE_LOAD || attr_type == TYPE_FLOAD
+      || attr_type == TYPE_JUMP || attr_type == TYPE_CALL)
+    {
+      if (last_hiqi_store)
+        {
+          /* Need to memory barrier the s8i/s16i instruction.  */
+          emit_insn_after (gen_memory_barrier (), last_hiqi_store);
+          last_hiqi_store = NULL;
+        }
+    }
+}
+
+static void
+xtensa_psram_cache_fix_memw_reorg ()
+{
+  rtx_insn *insn, *subinsn, *next_insn;
+  for (insn = get_insns (); insn != 0; insn = next_insn)
+    {
+      next_insn = NEXT_INSN (insn);
+      int length = get_attr_length (insn);
+
+      if (USEFUL_INSN_P (insn) && length > 0)
+        {
+          FOR_EACH_SUBINSN (subinsn, insn)
+          {
+            handle_fix_reorg_memw (subinsn);
+          }
+        }
+    }
+}
+
+/* Alternative fix to xtensa_psram_cache_fix_nop_reorg.  Tries to solve
+   the 32-bit load/store inversion by explicitly inserting a load after
+   every store.
+
+   For now, the logic is:
+   - Instruction is s32i:
+       Insert an l32i from that address to the source register
+       immediately after, plus a duplicated s32i after that.
+   - Instruction is s8i/s16i:
+       Note it and insert a memw before the next load.
+       (The same as xtensa_psram_cache_fix_memw_reorg)
+   - If any of the args are volatile, don't touch it:
+       The memw resulting from the volatile access will fix everything.
+
+   Note: debug_rtx (insn) can dump an insn in lisp-like format.  */
+
+static void
+handle_fix_dupldst_store (rtx_insn *insn, enum attr_type attr_type)
+{
+  rtx x = XEXP (PATTERN (insn), 0);
+  /* Store  */
+  if (attr_type == TYPE_STORE
+      && (GET_MODE (x) == HImode || GET_MODE (x) == QImode))
+    {
+      /* This is an 8/16-bit store, record it if it's not volatile
+         already.  */
+      if (!MEM_VOLATILE_P (x))
+        last_hiqi_store = insn;
+    }
+  else
+    {
+      /* 32-bit store.
+         Add a load-after-store to fix psram issues *if* var is not
+         volatile.  */
+      if (MEM_P (x) && (!MEM_VOLATILE_P (x)))
+        {
+          rtx y = XEXP (PATTERN (insn), 1);
+          if (REG_P (y) && REGNO (y) == 1)
+            {
+              /* Store SP in mem? Can't movsi that back.
+                 Insert memory barrier instead.  */
+              emit_insn_after (gen_memory_barrier (), insn);
+            }
+          else
+            {
+              /* Add the load/store.
+                 Note: the instructions will be added in the OPPOSITE
+                 order, since each is inserted between the s32i and
+                 the next instruction:
+                 1:
+                   s32i(insn), s32i;
+                 2:
+                   s32i(insn), l32i, s32i;  */
+              /* Store again  */
+              emit_insn_after (gen_movsi (x, y), insn);
+              /* Load the value back (this becomes the l32i)  */
+              emit_insn_after (gen_movsi (y, x), insn);
+            }
+        }
+    }
+}
+
+static void
+handle_fix_dupldst_reorg (rtx_insn *insn)
+{
+  enum attr_type attr_type = get_attr_type (insn);
+  if (attr_type == TYPE_STORE || attr_type == TYPE_FSTORE)
+    {
+      handle_fix_dupldst_store (insn, attr_type);
+    }
+
+  if (attr_type == TYPE_LOAD || attr_type == TYPE_FLOAD
+      || attr_type == TYPE_JUMP || attr_type == TYPE_CALL)
+    {
+      if (last_hiqi_store)
+        {
+          /* Need to memory barrier the s8i/s16i instruction.  */
+          emit_insn_after (gen_memory_barrier (), last_hiqi_store);
+          last_hiqi_store = NULL;
+        }
+    }
+}
+
+static void
+xtensa_psram_cache_fix_dupldst_reorg ()
+{
+  rtx_insn *insn, *subinsn, *next_insn;
+  last_hiqi_store = NULL;
+  for (insn = get_insns (); insn != 0; insn = next_insn)
+    {
+      next_insn = NEXT_INSN (insn);
+      int length = get_attr_length (insn);
+
+      if (USEFUL_INSN_P (insn) && length > 0)
+        {
+          FOR_EACH_SUBINSN (subinsn, insn)
+          {
+            handle_fix_dupldst_reorg (subinsn);
+          }
+        }
+    }
+}
+
+/* Emits a memw before every load and after every store instruction.
+   Heavy-handed approach to get rid of any pipeline/memory issues.  */
+static void
+xtensa_insert_memw_reorg ()
+{
+  rtx_insn *insn, *subinsn, *next_insn;
+  int had_memw = 0;
+  for (insn = get_insns (); insn != 0; insn = next_insn)
+    {
+      next_insn = NEXT_INSN (insn);
+      int length = get_attr_length (insn);
+
+      if (USEFUL_INSN_P (insn) && length > 0)
+        {
+          FOR_EACH_SUBINSN (subinsn, insn)
+          {
+            rtx x = XEXP (PATTERN (subinsn), 0);
+            enum attr_type attr_type = get_attr_type (subinsn);
+            if (attr_type == TYPE_STORE)
+              {
+                if (MEM_P (x) && (!MEM_VOLATILE_P (x)))
+                  {
+                    emit_insn_after (gen_memory_barrier (), subinsn);
+                  }
+                had_memw = 1;
+              }
+            else if (attr_type == TYPE_LOAD)
+              {
+                if (MEM_P (x) && (!MEM_VOLATILE_P (x)) && !had_memw)
+                  {
+                    emit_insn_before (gen_memory_barrier (), subinsn);
+                  }
+                had_memw = 0;
+              }
+            else
+              {
+                had_memw = 0;
+              }
+          }
+        }
+    }
+}
+
+static unsigned int
+xtensa_machine_reorg (void)
+{
+  if (TARGET_ESP32_ALWAYS_MEMBARRIER)
+    {
+      xtensa_insert_memw_reorg ();
+    }
+  if (TARGET_ESP32_PSRAM_FIX_ENA)
+    {
+      if (esp32_psram_fix_strat == ESP32_PSRAM_FIX_DUPLDST)
+        {
+          xtensa_psram_cache_fix_dupldst_reorg ();
+        }
+      else if (esp32_psram_fix_strat == ESP32_PSRAM_FIX_MEMW)
+        {
+          xtensa_psram_cache_fix_memw_reorg ();
+        }
+      else if (esp32_psram_fix_strat == ESP32_PSRAM_FIX_NOPS)
+        {
+          xtensa_psram_cache_fix_nop_reorg ();
+        }
+      else
+        {
+          /* default to memw (note: 5.2.x defaulted to nops)  */
+          xtensa_psram_cache_fix_memw_reorg ();
+        }
+    }
+  return 0;
+}
+
+namespace
+{
+
+const pass_data pass_data_xtensa_psram_nops =
+{
+  RTL_PASS,           /* type */
+  "xtensa-psram-adj", /* name */
+  OPTGROUP_NONE,      /* optinfo_flags */
+  TV_MACH_DEP,        /* tv_id */
+  0,                  /* properties_required */
+  0,                  /* properties_provided */
+  0,                  /* properties_destroyed */
+  0,                  /* todo_flags_start */
+  0,                  /* todo_flags_finish */
+};
+
+class pass_xtensa_psram_nops : public rtl_opt_pass
+{
+public:
+  pass_xtensa_psram_nops (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_xtensa_psram_nops, ctxt)
+  {
+  }
+
+  /* opt_pass methods: */
+  virtual unsigned int
+  execute (function *)
+  {
+    return xtensa_machine_reorg ();
+  }
+
+}; /* class pass_xtensa_psram_nops  */
+
+} /* anon namespace  */
+
+rtl_opt_pass *
+make_pass_xtensa_psram_nops (gcc::context *ctxt)
+{
+  return new pass_xtensa_psram_nops (ctxt);
+}
+
+
 static void
 xtensa_option_override (void)
 {
@@ -2707,6 +3138,19 @@ xtensa_option_override (void)
   if (flag_pic && !flag_pie)
     flag_shlib = 1;
 
+  /* Register machine specific reorg for optional nop insertion to
+     fix psram cache bug on esp32 v0/v1 silicon.  */
+  opt_pass *new_pass = make_pass_xtensa_psram_nops (g);
+  struct register_pass_info insert_pass_xtensa_psram_nops =
+    {
+      new_pass,		/* pass */
+      "dbr",			/* reference_pass_name */
+      1,			/* ref_pass_instance_number */
+      PASS_POS_INSERT_AFTER	/* pos_op */
+    };
+  register_pass (&insert_pass_xtensa_psram_nops);
+
+
   /* Hot/cold partitioning does not work on this architecture, because of
      constant pools (the load instruction cannot necessarily reach that far).
      Therefore disable it on this architecture.  */
diff --git a/gcc/config/xtensa/xtensa.h b/gcc/config/xtensa/xtensa.h
index 16e3d55e896..21c038ca3d7 100644
--- a/gcc/config/xtensa/xtensa.h
+++ b/gcc/config/xtensa/xtensa.h
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
 
 /* Get Xtensa configuration settings */
 #include "xtensa-config.h"
+#include "xtensa-opts.h"
 
 /* External variables defined in xtensa.cc.  */
 
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 608110c20bc..e8013987dbf 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -97,6 +97,10 @@
   "unknown,none,QI,HI,SI,DI,SF,DF,BL"
   (const_string "unknown"))
 
+(define_attr "condjmp"
+  "na,cond,uncond"
+  (const_string "na"))
+
 (define_attr "length" "" (const_int 1))
 
 ;; Describe a user's asm statement.
@@ -115,14 +119,38 @@
 ;; reservations in the pipeline description below.  The Xtensa can
;; issue one instruction per cycle, so defining CPU units is unnecessary.
 
+(define_cpu_unit "loadstore")
+
 (define_insn_reservation "xtensa_any_insn" 1
-			 (eq_attr "type" "!load,fload,rsr,mul16,mul32,fmadd,fconv")
+			 (eq_attr "type" "!load,fload,store,fstore,rsr,mul16,mul32,fmadd,fconv")
+			 "nothing")
+
+(define_insn_reservation "xtensa_memory_load" 2
+			 (and (not (match_test "TARGET_ESP32_PSRAM_FIX_ENA"))
+			 (eq_attr "type" "load,fload"))
 			 "nothing")
 
-(define_insn_reservation "xtensa_memory" 2
-			 (eq_attr "type" "load,fload")
+(define_insn_reservation "xtensa_memory_store" 1
+			 (and (not (match_test "TARGET_ESP32_PSRAM_FIX_ENA"))
+			 (eq_attr "type" "store,fstore"))
 			 "nothing")
 
+;; If psram cache issue needs fixing, it's better to keep
+;; stores far from loads from the same address. We cannot encode
+;; that behaviour entirely here (or maybe we can, but at least
+;; not easily), but we can try to get everything that smells like
+;; load or store up to a pipeline length apart from each other.
+
+(define_insn_reservation "xtensa_memory_load_psram_fix" 2
+			 (and (match_test "TARGET_ESP32_PSRAM_FIX_ENA")
+			 (eq_attr "type" "load,fload"))
+			 "loadstore*5")
+
+(define_insn_reservation "xtensa_memory_store_psram_fix" 1
+			 (and (match_test "TARGET_ESP32_PSRAM_FIX_ENA")
+			 (eq_attr "type" "store,fstore"))
+			 "loadstore*5")
+
 (define_insn_reservation "xtensa_sreg" 2
 			 (eq_attr "type" "rsr")
 			 "nothing")
@@ -1616,6 +1644,7 @@
 }
   [(set_attr "type"	"jump,jump")
    (set_attr "mode"	"none")
+   (set_attr "condjmp" "cond")
    (set_attr "length"	"3,3")])
 
 (define_insn "*ubtrue"
@@ -1631,6 +1660,7 @@
 }
   [(set_attr "type"	"jump,jump")
    (set_attr "mode"	"none")
+   (set_attr "condjmp" "cond")
    (set_attr "length"	"3,3")])
 
 ;; Branch patterns for bit testing
@@ -1665,6 +1695,7 @@
 }
   [(set_attr "type"	"jump")
    (set_attr "mode"	"none")
+   (set_attr "condjmp" "cond")
    (set_attr "length"	"3")])
 
 (define_insn "*masktrue"
@@ -1686,6 +1717,7 @@
 }
   [(set_attr "type"	"jump")
    (set_attr "mode"	"none")
+   (set_attr "condjmp" "cond")
    (set_attr "length"	"3")])
 
 (define_insn "*masktrue_bitcmpl"
@@ -1707,6 +1739,7 @@
 }
   [(set_attr "type"	"jump")
    (set_attr "mode"	"none")
+   (set_attr "condjmp" "cond")
    (set_attr "length"	"3")])
 
 (define_insn_and_split "*masktrue_const_bitcmpl"
@@ -1932,6 +1965,7 @@
   "loop\t%0, %l1_LEND"
   [(set_attr "type"	"jump")
    (set_attr "mode"	"none")
+   (set_attr "condjmp" "cond")
    (set_attr "length"	"3")])
 
 (define_insn "zero_cost_loop_end"
@@ -1949,6 +1983,7 @@
   "#"
   [(set_attr "type"	"jump")
    (set_attr "mode"	"none")
+   (set_attr "condjmp" "cond")
    (set_attr "length"	"0")])
 
 (define_insn "loop_end"
@@ -1968,6 +2003,7 @@
 }
   [(set_attr "type"	"jump")
    (set_attr "mode"	"none")
+   (set_attr "condjmp" "cond")
    (set_attr "length"	"0")])
 
 (define_split
@@ -2303,6 +2339,7 @@
 }
   [(set_attr "type"	"call")
    (set_attr "mode"	"none")
+   (set_attr "condjmp" "uncond")
    (set_attr "length"	"3")])
 
 (define_expand "untyped_call"
@@ -2347,6 +2384,7 @@
 }
   [(set_attr "type"	"jump")
    (set_attr "mode"	"none")
+   (set_attr "condjmp" "uncond")
    (set (attr "length")
 	(if_then_else (match_test "TARGET_DENSITY")
 		      (const_int 2)
@@ -2653,6 +2691,7 @@
 }
   [(set_attr "type"	"jump")
    (set_attr "mode"	"none")
+   (set_attr "condjmp" "cond")
    (set_attr "length"	"3")])
 
 (define_insn "*boolfalse"
@@ -2671,6 +2710,7 @@
 }
   [(set_attr "type"	"jump")
    (set_attr "mode"	"none")
+   (set_attr "condjmp" "cond")
    (set_attr "length"	"3")])
 
 \f
diff --git a/gcc/config/xtensa/xtensa.opt b/gcc/config/xtensa/xtensa.opt
index 08338e39060..3696a7dd5fe 100644
--- a/gcc/config/xtensa/xtensa.opt
+++ b/gcc/config/xtensa/xtensa.opt
@@ -18,6 +18,9 @@
 ; along with GCC; see the file COPYING3.  If not see
 ; <http://www.gnu.org/licenses/>.
 
+HeaderInclude
+config/xtensa/xtensa-opts.h
+
 mconst16
 Target Mask(CONST16)
 Use CONST16 instruction to load constants.
@@ -60,3 +63,31 @@ Use call0 ABI.
 mabi=windowed
 Target RejectNegative Var(xtensa_windowed_abi, 1)
 Use windowed registers ABI.
+
+malways-memw
+Target Mask(ESP32_ALWAYS_MEMBARRIER)
+Always emit a MEMW before a load and after a store operation.  Used to debug memory coherency issues.
+
+mfix-esp32-psram-cache-issue
+Target Mask(ESP32_PSRAM_FIX_ENA)
+Work around a PSRAM cache issue in the ESP32 ECO1 chips.
+
+mfix-esp32-psram-cache-strategy=
+Target RejectNegative JoinedOrMissing Enum(esp32_psram_fix_type) Var(esp32_psram_fix_strat) Init(ESP32_PSRAM_FIX_MEMW)
+Specify a psram cache fix strategy.
+
+Enum
+Name(esp32_psram_fix_type) Type(enum esp32_psram_fix_type)
+Psram cache fix strategies (for use with the -mfix-esp32-psram-cache-strategy= option):
+
+EnumValue
+Enum(esp32_psram_fix_type) String(dupldst) Value(ESP32_PSRAM_FIX_DUPLDST)
+Fix esp32 psram cache issue by duplicating stores and non-word loads.
+
+EnumValue
+Enum(esp32_psram_fix_type) String(memw) Value(ESP32_PSRAM_FIX_MEMW)
+Fix esp32 psram cache issue by inserting memory barriers in critical places.  Default workaround.
+
+EnumValue
+Enum(esp32_psram_fix_type) String(nops) Value(ESP32_PSRAM_FIX_NOPS)
+Fix esp32 psram cache issue by inserting NOPs in critical places.
-- 
2.34.1

