[PATCH 2/2] Aarch64: Add branch diluter pass

From: Andrea Corallo <andrea.corallo@arm.com>
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Kyrylo Tkachov <kyrylo.tkachov@arm.com>,
	Richard Earnshaw <richard.earnshaw@arm.com>,
	Richard Sandiford <richard.sandiford@arm.com>
Subject: [PATCH 2/2] Aarch64: Add branch diluter pass
Date: Wed, 22 Jul 2020 12:09:08 +0200
Message-ID: <gkrtuxzyi7v.fsf@arm.com>
In-Reply-To: <gkry2nbyiiu.fsf@arm.com> (Andrea Corallo's message of "Wed, 22 Jul 2020 12:02:33 +0200")

[-- Attachment #1: Type: text/plain, Size: 4219 bytes --]

Hi all,

this second patch implements the AArch64-specific back-end pass
'branch-dilution', controlled by the following command-line options:

-mbranch-dilution

--param=aarch64-branch-dilution-granularity={num}

--param=aarch64-branch-dilution-max-branches={num}

Cores known to benefit from this pass have been given default tuning
values for their granularity and max-branches (for example, Cortex-A72
uses a granule of 4 instructions with at most 2 branches per granule).
Each affected core has a very specific granule size and associated
max-branch limit, so this is a microarchitecture-specific optimization:
typical usage is -mbranch-dilution together with a specific -mcpu.
Cores whose granularity is tuned to 0 are left untouched, as the pass
does nothing for them.  The two --param options are provided for
experimentation and override the per-core defaults.
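
As a rough illustration of how the effective limits are picked, here is a
small C++ sketch.  It is not code from the patch: DilutionTuning,
DilutionConfig and resolve_dilution_config are hypothetical names.  It
assumes an explicitly given --param value overrides the per-core default, a
granularity of 0 disables the pass, and the max-branch value must be at
least 1 and strictly below the granularity:

```cpp
#include <cassert>
#include <optional>

// Per-core defaults, modelled on the bdilution_gsize/bdilution_maxb
// tune_params fields (this struct itself is hypothetical).
struct DilutionTuning {
  unsigned gsize;
  unsigned maxb;
};

enum class DilutionMode { Disabled, Invalid, Enabled };

struct DilutionConfig {
  DilutionMode mode;
  unsigned granule_size;
  unsigned max_branch;
};

// An explicitly given --param value wins over the core's default; an empty
// optional stands for "not given on the command line".
DilutionConfig resolve_dilution_config(const DilutionTuning &core,
                                       std::optional<unsigned> param_gsize,
                                       std::optional<unsigned> param_maxb) {
  unsigned gsize = param_gsize.value_or(core.gsize);
  unsigned maxb = param_maxb.value_or(core.maxb);
  if (gsize < 1)
    return {DilutionMode::Disabled, gsize, maxb};  // granularity 0: no-op
  if (maxb < 1 || maxb >= gsize)
    return {DilutionMode::Invalid, gsize, maxb};   // reported with sorry ()
  return {DilutionMode::Enabled, gsize, maxb};
}
```

With the Cortex-A72 defaults (4, 2) the pass is enabled; with the generic
defaults (0, 0) it only runs if both parameters are supplied explicitly.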

Observed performance improvements on Neoverse N1 for SPEC CPU 2006 were
up to ~+3% (xalancbmk) and ~+1.5% (sjeng).  The average code size
increase across the whole testsuite was ~0.4%.

* Algorithm and Heuristic

The pass takes a very simple 'sliding window' approach to the problem.
We walk through each instruction (starting at the first branch) and
keep track of the number of branches within the current "granule" (or
window).  When this exceeds the max-branch value, the pass dilutes the
current granule by inserting a nop after a chosen branch, pushing some
of the following instructions (and possibly branches) out of the
granule; those are then scanned again.  The heuristic favors
unconditional branches (for performance reasons), or branches that are
between two other branches (in order to decrease the likelihood of
another dilution call being needed).
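
The following standalone C++ model sketches the sliding window and the
scoring heuristic on a toy instruction stream.  It is not the pass itself:
Kind, is_branch_kind, score and dilute are hypothetical names, labels,
barriers and all RTL/CFG details are ignored, and every dilution is modelled
as a single nop inserted after the chosen branch.  Like the pass, it
dilutes at most once per newly scanned instruction, so it does not promise
that every window ends up under the limit:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction kinds (hypothetical; the pass works on RTL insns).
enum class Kind { Other, CondBranch, UncondBranch, Call, Nop };

bool is_branch_kind(Kind k) {
  return k == Kind::CondBranch || k == Kind::UncondBranch || k == Kind::Call;
}

// Score a dilution candidate: -1 for non-branches and for the newest insn
// in the window, +2 for unconditional branches, +1 for a branch that sits
// between two other branches.
int score(const std::vector<Kind> &code, std::size_t i, std::size_t hi) {
  if (!is_branch_kind(code[i]) || i == hi)
    return -1;
  int value = code[i] == Kind::UncondBranch ? 2 : 0;
  if (i > 0 && i + 1 < code.size() && is_branch_kind(code[i - 1]) &&
      is_branch_kind(code[i + 1]))
    value++;
  return value;
}

// Scan CODE with a window of at most GRANULE insns starting at the first
// branch.  When a newly scanned insn leaves more than MAX_BRANCH branches in
// the window, insert one nop after the best-scoring branch and trim the
// window back to GRANULE insns so pushed-out insns are scanned again.
// Assumes 1 <= MAX_BRANCH < GRANULE.
std::vector<Kind> dilute(std::vector<Kind> code, std::size_t granule,
                         std::size_t max_branch) {
  auto first = std::find_if(code.begin(), code.end(), is_branch_kind);
  if (first == code.end())
    return code;
  std::size_t lo = static_cast<std::size_t>(first - code.begin());
  for (std::size_t hi = lo; hi < code.size(); ++hi) {
    if (hi - lo + 1 > granule)
      lo = hi + 1 - granule;  // slide: drop the oldest insn
    auto branches = std::count_if(code.begin() + lo, code.begin() + hi + 1,
                                  is_branch_kind);
    if (static_cast<std::size_t>(branches) <= max_branch)
      continue;
    // Default to the first branch; only a strictly better score wins.
    std::size_t best = lo;
    while (!is_branch_kind(code[best]))
      ++best;
    int best_score = 0;
    for (std::size_t i = best; i <= hi; ++i) {
      int s = score(code, i, hi);
      if (s > best_score) {
        best_score = s;
        best = i;
      }
    }
    code.insert(code.begin() + best + 1, Kind::Nop);
    // The scanned insn moved to hi + 1; keep at most GRANULE insns in the
    // window so anything pushed out is rescanned by the next iteration.
    hi = std::min(hi + 1, lo + granule - 1);
  }
  return code;
}
```

With a granule of 4 and at most 2 branches, three back-to-back conditional
branches get a nop after the middle one (it sits between two branches),
while an unconditional branch at the start of such a run is preferred over
its neighbours.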

Each branch type requires a different method for nop insertion due to
RTL/basic_block restrictions:

- Calls that return do not end a basic block, so they can be handled by
  emitting a generic nop.  Sibling calls and noreturn calls do end their
  block and get the same "nop block" treatment as conditional branches.

- Unconditional branches must be the end of a basic block, and nops
  cannot be outside of a basic block.  Hence the need for FILLER_INSN,
  which allows placement outside of a basic block and translates to a
  nop.  Return jumps are handled the same way, with the filler emitted
  after their barrier.

- For most conditional branches we've taken a simple approach and only
  handle the fallthru edge, which we do by inserting a "nop block" of
  nops on the fallthru edge, mapping that back to the original
  destination block.

- asm gotos and other pc_set jumps are tricky to analyse from a
  dilution perspective, so they are ignored at present.
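
The per-branch-type choices can be summarised as a small decision
function.  This is a hedged sketch: BranchKind, NopPlacement and
choose_placement are hypothetical names, and the predicates named in the
comments are the ones the dilute () method of the pass tests to tell the
cases apart:

```cpp
#include <cassert>

// Hypothetical classification of the branch chosen for dilution.
enum class BranchKind {
  CondJump,         // any_condjump_p
  Call,             // CALL_P that does not end its basic block
  ControlFlowCall,  // sibling or noreturn call (control_flow_insn_p)
  ReturnJump,       // returnjump_p
  UncondJump,       // any_uncondjump_p
  OtherJump         // asm goto and other pc_set jumps
};

// Where the nop ends up for each kind of branch.
enum class NopPlacement {
  PlainNopAfterInsn,   // ordinary nop right after the call
  NopBlockOnFallthru,  // new basic block on the fallthru edge
  FillerAfterInsn,     // filler insn, allowed outside a basic block
  FillerAfterBarrier,  // filler insn after the return's barrier
  Ignored              // branch is no longer counted
};

NopPlacement choose_placement(BranchKind kind, bool has_fallthru_edge) {
  switch (kind) {
  case BranchKind::Call:
    return NopPlacement::PlainNopAfterInsn;
  case BranchKind::CondJump:
  case BranchKind::ControlFlowCall:
    // The nop block needs a fallthru edge; without one (e.g. a call to
    // exit) the pass falls back to a filler right after the branch.
    return has_fallthru_edge ? NopPlacement::NopBlockOnFallthru
                             : NopPlacement::FillerAfterInsn;
  case BranchKind::ReturnJump:
    return NopPlacement::FillerAfterBarrier;
  case BranchKind::UncondJump:
    return NopPlacement::FillerAfterInsn;
  case BranchKind::OtherJump:
    return NopPlacement::Ignored;
  }
  return NopPlacement::Ignored;  // not reached for valid kinds
}
```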

* Testing

The two patches have been tested together on top of current master on
aarch64-unknown-linux-gnu as follows:

- Successful 3-stage bootstrap with the pass forced on (for stages 2
  and 3)

- No additional compilation failures (SPEC CPU 2006 and SPEC CPU 2017)

- No 'make check' regressions


Regards

  Andrea

gcc/ChangeLog

2020-07-17  Andrea Corallo  <andrea.corallo@arm.com>
	    Carey Williams  <carey.williams@arm.com>

	* config.gcc (extra_objs): Add aarch64-branch-dilution.o.
	* config/aarch64/aarch64-branch-dilution.c: New file.
	* config/aarch64/aarch64-passes.def (branch-dilution): Register
	pass.
	* config/aarch64/aarch64-protos.h (struct tune_params): Declare
	tuning parameters bdilution_gsize and bdilution_maxb.
	(make_pass_branch_dilution): New declaration.
	* config/aarch64/aarch64.c (generic_tunings, cortexa35_tunings)
	(cortexa53_tunings, cortexa57_tunings, cortexa72_tunings)
	(cortexa73_tunings, exynosm1_tunings, thunderxt88_tunings)
	(thunderx_tunings, tsv110_tunings, xgene1_tunings, emag_tunings)
	(qdf24xx_tunings, saphira_tunings, thunderx2t99_tunings)
	(neoversen1_tunings): Provide default tunings for bdilution_gsize
	and bdilution_maxb.
	* config/aarch64/aarch64.md (filler_insn): Define new insn.
	* config/aarch64/aarch64.opt (-mbranch-dilution)
	(--param=aarch64-branch-dilution-granularity)
	(--param=aarch64-branch-dilution-max-branches): Add new options.
	* config/aarch64/t-aarch64 (aarch64-branch-dilution.c): New rule
	for aarch64-branch-dilution.c.
	* doc/invoke.texi (-mbranch-dilution)
	(--param=aarch64-branch-dilution-granularity)
	(--param=aarch64-branch-dilution-max-branches): Document branch
	dilution options.

gcc/testsuite/ChangeLog

2020-07-17  Andrea Corallo  <andrea.corallo@arm.com>
	    Carey Williams  <carey.williams@arm.com>

	* gcc.target/aarch64/branch-dilution-off.c: New file.
	* gcc.target/aarch64/branch-dilution-on.c: New file.

[-- Attachment #2: 0002-Aarch64-Add-branch-diluter-pass.patch --]
[-- Type: text/x-diff, Size: 37485 bytes --]

From 386b3a3131d5f03a4c9fb8ee47b321009f17fab5 Mon Sep 17 00:00:00 2001
From: Andrea Corallo <andrea.corallo@arm.com>
Date: Thu, 16 Jul 2020 09:24:33 +0100
Subject: [PATCH 2/2] Aarch64: Add branch diluter pass

gcc/ChangeLog

2020-07-17  Andrea Corallo  <andrea.corallo@arm.com>
	    Carey Williams  <carey.williams@arm.com>

	* config.gcc (extra_objs): Add aarch64-branch-dilution.o.
	* config/aarch64/aarch64-branch-dilution.c: New file.
	* config/aarch64/aarch64-passes.def (branch-dilution): Register
	pass.
	* config/aarch64/aarch64-protos.h (struct tune_params): Declare
	tuning parameters bdilution_gsize and bdilution_maxb.
	(make_pass_branch_dilution): New declaration.
	* config/aarch64/aarch64.c (generic_tunings, cortexa35_tunings)
	(cortexa53_tunings, cortexa57_tunings, cortexa72_tunings)
	(cortexa73_tunings, exynosm1_tunings, thunderxt88_tunings)
	(thunderx_tunings, tsv110_tunings, xgene1_tunings, emag_tunings)
	(qdf24xx_tunings, saphira_tunings, thunderx2t99_tunings)
	(neoversen1_tunings): Provide default tunings for bdilution_gsize
	and bdilution_maxb.
	* config/aarch64/aarch64.md (filler_insn): Define new insn.
	* config/aarch64/aarch64.opt (-mbranch-dilution)
	(--param=aarch64-branch-dilution-granularity)
	(--param=aarch64-branch-dilution-max-branches): Add new options.
	* config/aarch64/t-aarch64 (aarch64-branch-dilution.c): New rule
	for aarch64-branch-dilution.c.
	* doc/invoke.texi (-mbranch-dilution)
	(--param=aarch64-branch-dilution-granularity)
	(--param=aarch64-branch-dilution-max-branches): Document branch
	dilution options.

gcc/testsuite/ChangeLog

2020-07-17  Andrea Corallo  <andrea.corallo@arm.com>
	    Carey Williams  <carey.williams@arm.com>

	* gcc.target/aarch64/branch-dilution-off.c: New file.
	* gcc.target/aarch64/branch-dilution-on.c: New file.
---
 gcc/config.gcc                                |   2 +-
 gcc/config/aarch64/aarch64-branch-dilution.c  | 648 ++++++++++++++++++
 gcc/config/aarch64/aarch64-passes.def         |   1 +
 gcc/config/aarch64/aarch64-protos.h           |   4 +
 gcc/config/aarch64/aarch64.c                  |  34 +
 gcc/config/aarch64/aarch64.md                 |   7 +
 gcc/config/aarch64/aarch64.opt                |  11 +
 gcc/config/aarch64/t-aarch64                  |   6 +
 gcc/doc/invoke.texi                           |  17 +-
 .../gcc.target/aarch64/branch-dilution-off.c  |  57 ++
 .../gcc.target/aarch64/branch-dilution-on.c   |  58 ++
 11 files changed, 843 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-branch-dilution.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/branch-dilution-off.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/branch-dilution-on.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 30b51c3dc81e..0b6b26d973f2 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -321,7 +321,7 @@ aarch64*-*-*)
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
 	d_target_objs="aarch64-d.o"
-	extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch64-bti-insert.o"
+	extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch64-bti-insert.o aarch64-branch-dilution.o"
 	target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c \$(srcdir)/config/aarch64/aarch64-sve-builtins.h \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
 	target_has_targetm_common=yes
 	;;
diff --git a/gcc/config/aarch64/aarch64-branch-dilution.c b/gcc/config/aarch64/aarch64-branch-dilution.c
new file mode 100644
index 000000000000..3535384640fa
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-branch-dilution.c
@@ -0,0 +1,648 @@
+/* Branch dilution optimization pass for AArch64.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by Arm Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#define INCLUDE_LIST
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "rtl.h"
+#include "df.h"
+#include "insn-config.h"
+#include "regs.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "recog.h"
+#include "cfganal.h"
+#include "insn-attr.h"
+#include "context.h"
+#include "tree-pass.h"
+#include "function-abi.h"
+#include "regrename.h"
+#include "aarch64-protos.h"
+#include "cfghooks.h"
+#include "cfgrtl.h"
+#include "cfgbuild.h"
+#include "errors.h"
+#include "diagnostic.h"
+
+namespace {
+
+unsigned max_branch = 0;
+unsigned granule_size = 0;
+
+static bool
+is_branch (rtx_insn *insn)
+{
+  if (insn == NULL)
+    return false;
+
+  return JUMP_P (insn) || CALL_P (insn) || ANY_RETURN_P (insn);
+}
+
+const pass_data
+pass_data_branch_dilution =
+  { RTL_PASS,      /* type.  */
+    "branch-dilution",   /* name.  */
+    OPTGROUP_NONE, /* optinfo_flags.  */
+    TV_NONE,       /* tv_id.  */
+    0,		 /* properties_required.  */
+    0,		 /* properties_provided.  */
+    0,		 /* properties_destroyed.  */
+    0,		 /* todo_flags_start.  */
+    0,		 /* todo_flags_finish.  */
+  };
+
+/* RTL pass that inserts nops to dilute dense sequences of branches.  */
+
+class pass_branch_dilution : public rtl_opt_pass
+{
+ public:
+  pass_branch_dilution (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_branch_dilution, ctxt)
+    {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+  {
+    return optimize && aarch64_bdilution;
+  }
+
+  virtual unsigned execute (function *);
+};
+
+/* Simple wrapper for RTX insns added to a granule.
+   It helps aid analysis and manipulation.  */
+struct insn_info
+{
+  insn_info (rtx_insn *);
+  rtx_insn *rtx;		/* underlying gcc rtx insn.  */
+  insn_info *next;		/* next insn in the granule.  */
+  insn_info *prev;		/* prev insn in the granule.  */
+  unsigned index;		/* current position in the granule.  */
+  bool is_branch;		/* denotes a branch insn.  */
+  bool is_unconditional;	/* denotes an unconditional branch.  */
+  bool is_nop;			/* denotes a nop insn.  */
+  bool ignore;			/* to ignore unsupported branch types.  */
+};
+
+insn_info::insn_info (rtx_insn *i)
+: rtx (i), next (NULL), prev (NULL), index (-1),
+  is_branch (false), is_unconditional (false), is_nop (false),
+  ignore (false)
+{}
+
+/* A 'sliding window' like abstraction that represents
+   the current view of instructions that are being
+   considered for branch dilution.  */
+class insn_granule
+{
+ public:
+  insn_granule ();
+  /* Debug method.  */
+  void dump (const char *desc = "General");
+  /* Attempt to dilute/pad granule with nops.  */
+  int dilute ();
+  /* Returns true if the granule needs diluting.  */
+  bool saturated ();
+  /* Utility functions for inserting instructions into
+     the granule.  */
+  void insert_insn_after (insn_info *, insn_info *);
+  void add_insn (insn_info *);
+ private:
+  void remove_oldest_insn ();
+  void remove_newest_insn ();
+  void update_indexes ();
+  /* Utility functions for handling special nop
+     insertion.  */
+  void insert_nop_block_after (insn_info *insn);
+  basic_block create_nop_block_after_insn (insn_info *, int);
+  /* Dilution heuristics.  */
+  insn_info *get_best_branch ();
+  int branch_heuristic (insn_info *);
+
+  /* Pointers to the first/last instructions in granule.  */
+  insn_info *m_first = NULL;
+  insn_info *m_last = NULL;
+
+  /* Current counts of each interesting insn type in the granule.  */
+  unsigned m_insn_count;
+  unsigned m_branch_count;
+  unsigned m_ubranch_count;
+};
+
+insn_granule::insn_granule ()
+: m_insn_count (0), m_branch_count (0), m_ubranch_count (0)
+{}
+
+/* Create a new basic block after the block of INSN (which must end in a
+   branch), populate it with NOP_COUNT nops and return it.  */
+
+basic_block
+insn_granule::create_nop_block_after_insn (insn_info *insn, int nop_count)
+{
+  gcc_assert (nop_count > 0);
+  basic_block bb = BLOCK_FOR_INSN (insn->rtx);
+  rtx_insn *nop_ptr = NULL;
+  gcc_assert (is_branch (BB_END (bb)));
+  nop_ptr = emit_insn_after (gen_nop (), BB_END (bb));
+  insn_info *nop_insn = new insn_info (nop_ptr);
+  nop_insn->is_nop = true;
+  insert_insn_after (nop_insn, insn);
+  basic_block new_bb = create_basic_block (nop_ptr, NULL, bb);
+  set_block_for_insn (nop_ptr, new_bb);
+  for (int i = 0; i < (nop_count - 1); i++)
+    {
+      nop_ptr = emit_insn_after (gen_nop (), nop_ptr);
+      nop_insn = new insn_info (nop_ptr);
+      set_block_for_insn (nop_ptr, new_bb);
+      nop_insn->is_nop = true;
+      insert_insn_after (nop_insn, insn);
+    }
+  /* Fix up block endings.  */
+  BB_END (new_bb) = nop_ptr;
+  BB_END (bb) = insn->rtx;
+
+  return new_bb;
+}
+
+/* Dump a textual representation of the current state
+   of the insn_granule in the following format:
+
+   "===== GRANULE ====="
+   "====> {desc} <===="
+   "---> INS:?, BRA:? (?), FIRST: ?, LAST: ?"
+   "0. jump_insn  > NEXT = (?), PREV = (?) -- UID: ?"
+   "1. insn (nop)  > NEXT = (?), PREV = (?) -- UID: ?"
+   "2. insn  > NEXT = (?), PREV = (?) -- UID: ?"
+   "3. jump_insn  > NEXT = (?), PREV = (?) -- UID: ?"
+
+   Used only for debugging with fdump.
+*/
+
+void
+insn_granule::dump (const char *desc)
+{
+  insn_info *insn = m_first;
+  dump_printf (MSG_NOTE, "===== GRANULE =====\n====> %s <====\n", desc);
+  dump_printf (MSG_NOTE, "---> INS:%d, BRA:%d (%d), FIRST: %d, LAST: %d\n",
+	       m_insn_count, m_branch_count, m_ubranch_count,
+	       m_first->index, m_last->index);
+  while (insn)
+    {
+      dump_printf (MSG_NOTE,
+		   "%d. %s%s%s  > NEXT = (%d), PREV = (%d) -- UID: %d\n",
+		   insn->index, GET_RTX_NAME (GET_CODE (insn->rtx)),
+		   any_uncondjump_p (insn->rtx) ? " (ubranch)" : "",
+		   insn->is_nop ? " (nop)" : "",
+		   insn->next ? insn->next->index : -1,
+		   insn->prev ? insn->prev->index : -1, INSN_UID (insn->rtx));
+      insn = insn->next;
+    }
+}
+
+/* Simple heuristic used to favor certain types of branches to pad after,
+   e.g. we prefer unconditional branches or branches surrounded by other
+   branches.  Return -1 for insns that must not be padded after.  */
+
+int
+insn_granule::branch_heuristic (insn_info *insn)
+{
+  int value = 0;
+  if (!is_branch (insn->rtx))
+    {
+      if (dump_file)
+	dump_printf (MSG_NOTE,
+		     "--> Ignoring insn: %d as it's not a branch\n",
+		     insn->index);
+      return -1;
+    }
+  if (insn->ignore)
+    {
+      if (dump_file)
+	dump_printf (MSG_NOTE,
+		     "--> Ignoring branch insn: %d (unsupported)\n",
+		     insn->index);
+      return -1;
+    }
+  if (insn == m_last)
+    {
+      if (dump_file)
+	dump_printf (MSG_NOTE,
+		     "--> Ignoring insn: %d as it's the last granule insn\n",
+		     insn->index);
+      return -1;
+    }
+  if (insn->is_unconditional)
+    value += 2;
+  if (is_branch (prev_real_nondebug_insn (insn->rtx))
+      && is_branch (next_real_nondebug_insn (insn->rtx)))
+    value++;
+
+  return value;
+}
+
+/* Iterate over the granule and test a heuristic against each insn.
+   Return the branch that seems most promising to pad after, or the first
+   branch in the granule when no other branch scores higher.  */
+
+insn_info *
+insn_granule::get_best_branch ()
+{
+  insn_info *current_insn = m_first;
+  insn_info *best_insn = m_first;
+  int current_score = 0;
+  int best_score = 0;
+  /* Make sure we start with a branch.  */
+  while (current_insn && !current_insn->is_branch)
+    {
+      current_insn = current_insn->next;
+    }
+  best_insn = current_insn;
+  while (current_insn)
+    {
+      current_score = branch_heuristic (current_insn);
+      if (dump_file)
+	dump_printf (MSG_NOTE, "Evaluating insn %d (%s), score = %d\n",
+		     current_insn->index,
+		     GET_RTX_NAME (GET_CODE (current_insn->rtx)),
+		     current_score);
+      if (current_score > best_score)
+	{
+	  best_score = current_score;
+	  best_insn = current_insn;
+	}
+      current_insn = current_insn->next;
+    }
+  if (dump_file)
+    dump_printf (MSG_NOTE, "Returning best insn: %d %s.\n", best_insn->index,
+		 GET_RTX_NAME (GET_CODE (best_insn->rtx)));
+
+  return best_insn;
+}
+
+/* Insert the instruction into the granule, after the given instruction.  */
+
+void
+insn_granule::insert_insn_after (insn_info *new_insn, insn_info *current_insn)
+{
+  gcc_assert (current_insn != m_last);
+  if (dump_file)
+    dump_printf (MSG_NOTE, "Inserting nop after insn %d, at position %d\n",
+		 current_insn->index, current_insn->index + 1);
+  new_insn->index = current_insn->index + 1;
+
+  current_insn->next->prev = new_insn;
+  new_insn->next = current_insn->next;
+  current_insn->next = new_insn;
+  new_insn->prev = current_insn;
+
+  m_insn_count++;
+  update_indexes ();
+}
+
+/* Insert a new basic block containing a nop.  */
+
+void
+insn_granule::insert_nop_block_after (insn_info *insn)
+{
+  edge e;
+  basic_block bb = BLOCK_FOR_INSN (insn->rtx);
+  e = find_fallthru_edge (bb->succs);
+  if (e)
+    {
+      basic_block new_bb = create_nop_block_after_insn (insn, 1);
+      /* Wire up the edges and preserve the partition.  */
+      make_edge (new_bb, e->dest, EDGE_FALLTHRU);
+      BB_COPY_PARTITION (new_bb, bb);
+      redirect_edge_succ_nodup (e, new_bb);
+    }
+  else
+    {
+      /* Special case: no fallthru edge (e.g. a call to a noreturn
+	 function such as exit), so emit a filler after the branch.  */
+      insn_info *nop_insn =
+	new insn_info (emit_filler_after (gen_filler_insn (), insn->rtx));
+      nop_insn->is_nop = true;
+      insert_insn_after (nop_insn, insn);
+    }
+}
+
+/* Analyse the granule's branches and insert a nop to dilute.  */
+
+int
+insn_granule::dilute ()
+{
+  if (dump_file)
+    dump_printf (MSG_NOTE, "> Starting dilution.\n");
+  insn_info *branch_insn = NULL;
+  branch_insn = get_best_branch ();
+  /* If the granule is saturated then branches should be available.
+     Inserting nops after the granule isn't going to help dilute it.  */
+  gcc_assert (branch_insn
+	      && (branch_insn->is_branch && (branch_insn != m_last)));
+  if (any_condjump_p (branch_insn->rtx))
+    {
+      /* We can insert a nop via a 'nop block'
+	 attached to the fallthru edge.  */
+      insert_nop_block_after (branch_insn);
+    }
+  else if (CALL_P (branch_insn->rtx))
+    {
+      /* Standard calls do not end the basic block,
+	 so a simple emit will suffice.  */
+      if (!control_flow_insn_p (branch_insn->rtx))
+	{
+	  insn_info *nop_insn = new insn_info
+	    (emit_insn_after (gen_nop (), branch_insn->rtx));
+	  nop_insn->is_nop = true;
+	  insert_insn_after (nop_insn, branch_insn);
+	}
+      else
+	/* Sibling calls and no-return calls do.  */
+	insert_nop_block_after (branch_insn);
+    }
+  else if (returnjump_p (branch_insn->rtx))
+    {
+      /* Return jumps must be followed by their barrier,
+	 so we emit the filler after that.  */
+      insn_info *nop_insn =
+	new insn_info (
+	  emit_filler_after (gen_filler_insn (),
+			     next_nonnote_nondebug_insn (branch_insn->rtx)));
+      nop_insn->is_nop = true;
+      insert_insn_after (nop_insn, branch_insn);
+    }
+  else if (any_uncondjump_p (branch_insn->rtx))
+    {
+      /* Any remaining unconditionals can be handled by a filler.  */
+      insn_info *nop_insn =
+	new insn_info (emit_filler_after (gen_filler_insn (),
+					  branch_insn->rtx));
+      nop_insn->is_nop = true;
+      insert_insn_after (nop_insn, branch_insn);
+    }
+  else if (pc_set (branch_insn->rtx) || JUMP_P (branch_insn->rtx))
+    {
+      /* TODO handle pc_set and asm _goto (s).
+	 For now we'll just pretend they're not branches.  */
+      branch_insn->is_branch = false;
+      m_branch_count--;
+      if (branch_insn->is_unconditional)
+	{
+	  branch_insn->is_unconditional = false;
+	  m_ubranch_count--;
+	}
+      branch_insn->ignore = true;
+    }
+  else
+    {
+      /* Unhandled branch type.  */
+      if (dump_file)
+	{
+	  dump_printf (MSG_NOTE, "Error: unhandled branch type:\n");
+	  print_rtl_single (dump_file, branch_insn->rtx);
+	}
+      gcc_unreachable ();
+    }
+
+  /* Trim the granule back to size after nop insertion.  */
+  int rollback = 0;
+  while (m_insn_count > granule_size)
+    {
+      remove_newest_insn ();
+      rollback++;
+    }
+  if (dump_file)
+    dump_printf (MSG_NOTE, "< End dilution.\n");
+
+  return rollback;
+}
+
+/* Return true if the granule needs diluting.  */
+
+bool
+insn_granule::saturated ()
+{
+  return m_branch_count > max_branch;
+}
+
+/* Update index in each insn_info in the current window.  */
+
+void
+insn_granule::update_indexes ()
+{
+  insn_info *i = m_first;
+  int n = 0;
+  i->index = n++;
+  while ((i = i->next))
+    {
+      i->index = n++;
+    }
+}
+
+/* Add an instruction to the granule.  */
+
+void
+insn_granule::add_insn (insn_info *insn)
+{
+  if (m_insn_count == 0)
+    m_first = m_last = insn;
+  else
+    {
+      if (m_insn_count >= granule_size)
+	remove_oldest_insn ();
+      m_last->next = insn;
+      insn->prev = m_last;
+      m_last = insn;
+    }
+  insn->index = m_insn_count++;
+  if (is_branch (insn->rtx))
+    {
+      m_branch_count++;
+      insn->is_branch = true;
+      if (any_uncondjump_p (insn->rtx))
+	{
+	  insn->is_unconditional = true;
+	  m_ubranch_count++;
+	}
+    }
+  update_indexes ();
+}
+
+/* Remove the last added instruction from the granule.  */
+
+void
+insn_granule::remove_newest_insn ()
+{
+  insn_info *to_delete = m_last;
+  if (dump_file)
+    dump_printf (MSG_NOTE, "to_delete: %d UID: %d\n", to_delete->index,
+		 INSN_UID (to_delete->rtx));
+  m_last = to_delete->prev;
+  m_last->next = NULL;
+  if (is_branch (to_delete->rtx) && !to_delete->ignore)
+    {
+      m_branch_count--;
+      if (any_uncondjump_p (to_delete->rtx))
+	m_ubranch_count--;
+    }
+  m_insn_count--;
+  delete to_delete;
+  update_indexes ();
+}
+
+/* Remove the first added instruction from the granule.  */
+
+void
+insn_granule::remove_oldest_insn ()
+{
+  insn_info *to_delete = m_first;
+  m_first = to_delete->next;
+  m_first->prev = NULL;
+  if (is_branch (to_delete->rtx) && !to_delete->ignore)
+    {
+      m_branch_count--;
+      if (any_uncondjump_p (to_delete->rtx))
+	m_ubranch_count--;
+    }
+  m_insn_count--;
+  delete to_delete;
+  update_indexes ();
+}
+
+/* Return the next instruction after START that is a "real" instruction
+   i.e. not barriers, code labels, debug insns etc.  */
+
+static rtx_insn *
+next_space_consuming_insn (rtx_insn *start)
+{
+  rtx_insn *insn = next_real_nondebug_insn (start);
+  while (insn && (recog_memoized (insn) < 0))
+    insn = next_real_nondebug_insn (insn);
+  return insn;
+}
+
+/* Find the next branch insn in the instruction stream after START,
+   excluding START itself and skipping insns that do not occupy space
+   (see next_space_consuming_insn).  Return that insn, or NULL if none
+   is found.  */
+
+rtx_insn *
+find_next_branch (rtx_insn *start)
+{
+  rtx_insn *insn = next_space_consuming_insn (start);
+  while (insn)
+    {
+      if (is_branch (insn))
+	return insn;
+      insn = next_space_consuming_insn (insn);
+    }
+
+  return NULL;
+}
+
+/* Execute the branch dilution pass.  */
+
+unsigned int
+pass_branch_dilution::execute (ATTRIBUTE_UNUSED function *fun)
+{
+  granule_size = global_options_set.x_aarch64_bdilution_gsize
+    ? (unsigned)aarch64_bdilution_gsize
+    : aarch64_tune_params.bdilution_gsize;
+
+  max_branch = global_options_set.x_aarch64_bdilution_maxb
+    ? (unsigned)aarch64_bdilution_maxb
+    : aarch64_tune_params.bdilution_maxb;
+
+  if (dump_file)
+    dump_printf (MSG_NOTE, "BDILUTE OPTIONS: %d, %d\n", granule_size, max_branch);
+
+  /* Disabled.  */
+  if (granule_size < 1)
+    return 0;
+  /* Invalid: diagnose and give up rather than running with bad limits.  */
+  if (max_branch < 1 || max_branch >= granule_size)
+    {
+      sorry ("branch dilution: max branches (%u) must be greater than "
+	     "zero and less than granule size (%u)", max_branch, granule_size);
+      return 0;
+    }
+  if (dump_file)
+    dump_printf (MSG_NOTE, "branch-dilution start:\n");
+
+  /* Start scanning from the first occurring branch.  */
+  rtx_insn *curr_insn = find_next_branch (get_insns ());
+
+  if (!curr_insn)
+    return 0;
+
+  insn_granule granule;
+  granule.add_insn (new insn_info (curr_insn));
+
+  if (dump_file)
+    granule.dump ();
+
+  /* Iterate over the rest of the instruction stream.  */
+  while ((curr_insn = next_nonnote_nondebug_insn (curr_insn)))
+    {
+      if (!LABEL_P (curr_insn) && !BARRIER_P (curr_insn))
+	{
+	  granule.add_insn (new insn_info (curr_insn));
+	  if (dump_file)
+	    granule.dump ();
+	}
+      else
+	{
+	  /* Only check for saturation if changes have been made.  */
+	  continue;
+	}
+      if (granule.saturated ())
+	{
+	  if (dump_file)
+	    dump_printf (MSG_NOTE, "Granule is saturated.\n");
+	  int rollback = granule.dilute ();
+	  /* Now we need to re-scan any pushed out instructions.  */
+	  while (rollback > 0)
+	    {
+	      curr_insn = previous_insn (curr_insn);
+	      rollback--;
+	    }
+	  if (dump_file)
+	    granule.dump ();
+	}
+    }
+  if (dump_file)
+    dump_printf (MSG_NOTE, "branch-dilution end.\n");
+  return 0;
+}
+
+} // anon namespace
+
+/* Create the branch dilution pass.  */
+
+rtl_opt_pass *
+make_pass_branch_dilution (gcc::context *ctxt)
+{
+  return new pass_branch_dilution (ctxt);
+}
diff --git a/gcc/config/aarch64/aarch64-passes.def b/gcc/config/aarch64/aarch64-passes.def
index 223f78590568..9383858020af 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -21,4 +21,5 @@
 INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
 INSERT_PASS_BEFORE (pass_reorder_blocks, 1, pass_track_speculation);
 INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
+INSERT_PASS_BEFORE (pass_compute_alignments, 1, pass_branch_dilution);
 INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 839f801a31b8..d02d2882df4c 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -267,6 +267,9 @@ struct tune_params
   int vec_reassoc_width;
   int min_div_recip_mul_sf;
   int min_div_recip_mul_df;
+  /* Branch dilution.  */
+  unsigned int bdilution_gsize;
+  unsigned int bdilution_maxb;
   /* Value for aarch64_case_values_threshold; or 0 for the default.  */
   unsigned int max_case_values;
 /* An enum specifying how to take into account CPU autoprefetch capabilities
@@ -761,6 +764,7 @@ rtl_opt_pass *make_pass_fma_steering (gcc::context *);
 rtl_opt_pass *make_pass_track_speculation (gcc::context *);
 rtl_opt_pass *make_pass_tag_collision_avoidance (gcc::context *);
 rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt);
+rtl_opt_pass *make_pass_branch_dilution (gcc::context *ctxt);
 
 poly_uint64 aarch64_regmode_natural_size (machine_mode);
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6ef2e397d393..23f98d471b32 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -888,6 +888,8 @@ static const struct tune_params generic_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* bdilution_gsize.  */
+  0,	/* bdilution_maxb.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
@@ -915,6 +917,8 @@ static const struct tune_params cortexa35_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* bdilution_gsize.  */
+  0,	/* bdilution_maxb.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
@@ -942,6 +946,8 @@ static const struct tune_params cortexa53_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* bdilution_gsize.  */
+  0,	/* bdilution_maxb.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
@@ -969,6 +975,8 @@ static const struct tune_params cortexa57_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* bdilution_gsize.  */
+  0,	/* bdilution_maxb.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS),	/* tune_flags.  */
@@ -996,6 +1004,8 @@ static const struct tune_params cortexa72_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  4,	/* bdilution_gsize.  */
+  2,	/* bdilution_maxb.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
@@ -1023,6 +1033,8 @@ static const struct tune_params cortexa73_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* bdilution_gsize.  */
+  0,	/* bdilution_maxb.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
@@ -1051,6 +1063,8 @@ static const struct tune_params exynosm1_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* bdilution_gsize.  */
+  0,	/* bdilution_maxb.  */
   48,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
@@ -1077,6 +1091,8 @@ static const struct tune_params thunderxt88_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* bdilution_gsize.  */
+  0,	/* bdilution_maxb.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW),	/* tune_flags.  */
@@ -1103,6 +1119,8 @@ static const struct tune_params thunderx_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* bdilution_gsize.  */
+  0,	/* bdilution_maxb.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW
@@ -1131,6 +1149,8 @@ static const struct tune_params tsv110_tunings =
   1,    /* vec_reassoc_width.  */
   2,    /* min_div_recip_mul_sf.  */
   2,    /* min_div_recip_mul_df.  */
+  0,	/* bdilution_gsize.  */
+  0,	/* bdilution_maxb.  */
   0,    /* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,     /* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),     /* tune_flags.  */
@@ -1158,6 +1178,8 @@ static const struct tune_params xgene1_tunings =
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* bdilution_gsize.  */
+  0,	/* bdilution_maxb.  */
   17,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),	/* tune_flags.  */
   &xgene1_prefetch_tune
@@ -1183,6 +1205,8 @@ static const struct tune_params emag_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* bdilution_gsize.  */
+  0,	/* bdilution_maxb.  */
   17,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),	/* tune_flags.  */
@@ -1210,6 +1234,8 @@ static const struct tune_params qdf24xx_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* bdilution_gsize.  */
+  0,	/* bdilution_maxb.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS, /* tune_flags.  */
@@ -1239,6 +1265,8 @@ static const struct tune_params saphira_tunings =
   1,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* bdilution_gsize.  */
+  0,	/* bdilution_maxb.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),		/* tune_flags.  */
@@ -1266,6 +1294,8 @@ static const struct tune_params thunderx2t99_tunings =
   2,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* bdilution_gsize.  */
+  0,	/* bdilution_maxb.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
@@ -1293,6 +1323,8 @@ static const struct tune_params thunderx3t110_tunings =
   2,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  0,	/* bdilution_gsize.  */
+  0,	/* bdilution_maxb.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
@@ -1319,6 +1351,8 @@ static const struct tune_params neoversen1_tunings =
   2,	/* vec_reassoc_width.  */
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
+  8,	/* bdilution_gsize.  */
+  4,	/* bdilution_maxb.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index d5ca1898c02e..1bee7df74048 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -899,6 +899,13 @@
    (set_attr "sls_length" "retbr")]
 )
 
+(define_insn "filler_insn"
+  [(filler_insn)]
+  ""
+  "nop"
+  [(set_attr "type" "no_insn")
+   (set_attr "length" "4")])
+
 (define_insn "*cb<optab><mode>1"
   [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r")
 				(const_int 0))
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 5170361fd5e5..cd2cb5715da4 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -164,6 +164,10 @@ msign-return-address=
 Target WarnRemoved RejectNegative Joined Enum(aarch64_ra_sign_scope_t) Var(aarch64_ra_sign_scope) Init(AARCH64_FUNCTION_NONE) Save
 Select return address signing scope.
 
+mbranch-dilution
+Target Report RejectNegative Var(aarch64_bdilution) Init(0) Save
+Run the branch dilution pass.
+
 Enum
 Name(aarch64_ra_sign_scope_t) Type(enum aarch64_function_type)
 Supported AArch64 return address signing scope (for use with -msign-return-address= option):
@@ -275,3 +279,10 @@ The number of Newton iterations for calculating the reciprocal for float type.
 Target Joined UInteger Var(aarch64_double_recp_precision) Init(2) IntegerRange(1, 5) Param
 The number of Newton iterations for calculating the reciprocal for double type.  The precision of division is proportional to this param when division approximation is enabled.  The default value is 2.
 
+-param=aarch64-branch-dilution-granularity=
+Target RejectNegative Joined UInteger Var(aarch64_bdilution_gsize) Init(0)
+Size of the instruction-stream granules (windows) considered by branch dilution.
+
+-param=aarch64-branch-dilution-max-branches=
+Target RejectNegative Joined UInteger Var(aarch64_bdilution_maxb) Init(0)
+Max number of branches that should appear within an instruction granule.
diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
index 11d20b7be140..23f4d417b246 100644
--- a/gcc/config/aarch64/t-aarch64
+++ b/gcc/config/aarch64/t-aarch64
@@ -158,6 +158,12 @@ aarch64-bti-insert.o: $(srcdir)/config/aarch64/aarch64-bti-insert.c \
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/aarch64/aarch64-bti-insert.c
 
+aarch64-branch-dilution.o: $(srcdir)/config/aarch64/aarch64-branch-dilution.c \
+    $(CONFIG_H) $(SYSTEM_H) $(RTL_BASE_H) \
+    $(srcdir)/config/aarch64/aarch64-protos.h
+	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+		$(srcdir)/config/aarch64/aarch64-branch-dilution.c
+
 comma=,
 MULTILIB_OPTIONS    = $(subst $(comma),/, $(patsubst %, mabi=%, $(subst $(comma),$(comma)mabi=,$(TM_MULTILIB_CONFIG))))
 MULTILIB_DIRNAMES   = $(subst $(comma), ,$(TM_MULTILIB_CONFIG))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 825fd669a757..739446a19e3b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -705,7 +705,7 @@ Objective-C and Objective-C++ Dialects}.
 -moverride=@var{string}  -mverbose-cost-dump @gol
 -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{sysreg} @gol
 -mstack-protector-guard-offset=@var{offset} -mtrack-speculation @gol
--moutline-atomics }
+-moutline-atomics -mbranch-dilution }
 
 @emph{Adapteva Epiphany Options}
 @gccoptlist{-mhalf-reg-file  -mprefer-short-insn-regs @gol
@@ -13604,6 +13604,16 @@ The number of Newton iterations for calculating the reciprocal for double type.
 The precision of division is propotional to this param when division
 approximation is enabled.  The default value is 2.
 
+@item aarch64-branch-dilution-granularity
+Specify the size of the granules (instruction windows) to be considered
+for branch dilution.  When omitted, the tuning for the specified @option{-mcpu}
+will be used.
+
+@item aarch64-branch-dilution-max-branches
+Specify the maximum number of branches a granule may contain before it is
+considered saturated, requiring branch dilution.  When omitted, the tuning
+for the specified @option{-mcpu} will be used.
+
 @end table
 
 @end table
@@ -17425,6 +17435,12 @@ functions.  The optional argument @samp{b-key} can be used to sign the functions
 with the B-key instead of the A-key.
 @samp{bti} turns on branch target identification mechanism.
 
+@item -mbranch-dilution
+@opindex mbranch-dilution
+Enable the branch dilution pass, which inserts @code{nop} instructions to
+limit the number of branches within each instruction granule.  This can
+improve performance on cores that are sensitive to high branch density.
+
 @item -mharden-sls=@var{opts}
 @opindex mharden-sls
 Enable compiler hardening against straight line speculation (SLS).
diff --git a/gcc/testsuite/gcc.target/aarch64/branch-dilution-off.c b/gcc/testsuite/gcc.target/aarch64/branch-dilution-off.c
new file mode 100644
index 000000000000..46c771bc3889
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/branch-dilution-off.c
@@ -0,0 +1,57 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -mcpu=cortex-a72 --param case-values-threshold=50" } */
+/* { dg-final { scan-assembler-not  "\\s*b.*\n\\snop\n" } } */
+#include <stdlib.h>
+
+void
+branch (int* n)
+{
+  while ((*n % 2 != 0))
+    *n = *n * *n;
+}
+
+int
+main ()
+{
+  int *i = malloc (sizeof (int));
+  *i = 5;
+  while (*i < 1000) {
+    switch (*i) {
+      case 1:
+        *i += 1;
+        branch (i);
+	break;
+      case 2:
+        *i += 2;
+        branch (i);
+	break;
+      case 3:
+        *i += 3;
+	branch (i);
+	break;
+      case 4:
+        *i += 4;
+        branch (i);
+	break;
+      case 5:
+        *i += 5;
+	branch (i);
+	break;
+      case 6:
+        *i += 6;
+        branch (i);
+        break;
+      case 7:
+        *i += 7;
+        branch (i);
+	break;
+      case 8:
+        *i += 8;
+        branch (i);
+	break;
+      default:
+        branch (i);
+    }
+  }
+  return *i;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/branch-dilution-on.c b/gcc/testsuite/gcc.target/aarch64/branch-dilution-on.c
new file mode 100644
index 000000000000..25d5c60e38c3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/branch-dilution-on.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -mbranch-dilution -mcpu=cortex-a72 --param case-values-threshold=50 -fdump-rtl-branch-dilution" } */
+/* { dg-final { scan-assembler  "\\s*b.*\n\\snop\n" } } */
+/* { dg-final { scan-rtl-dump "filler_insn" "branch-dilution"} } */
+#include <stdlib.h>
+
+void
+branch (int* n)
+{
+  while ((*n % 2 != 0))
+    *n = *n * *n;
+}
+
+int
+main ()
+{
+  int *i = malloc (sizeof (int));
+  *i = 5;
+  while (*i < 1000) {
+    switch (*i) {
+      case 1:
+        *i += 1;
+        branch (i);
+	break;
+      case 2:
+        *i += 2;
+        branch (i);
+	break;
+      case 3:
+        *i += 3;
+	branch (i);
+	break;
+      case 4:
+        *i += 4;
+        branch (i);
+	break;
+      case 5:
+        *i += 5;
+	branch (i);
+	break;
+      case 6:
+        *i += 6;
+        branch (i);
+        break;
+      case 7:
+        *i += 7;
+        branch (i);
+	break;
+      case 8:
+        *i += 8;
+        branch (i);
+	break;
+      default:
+        branch (i);
+    }
+  }
+  return *i;
+}
-- 
2.17.1
