public inbox for gcc@gcc.gnu.org
From: Aldy Hernandez <aldyh@redhat.com>
To: GCC Mailing List <gcc@gcc.gnu.org>
Cc: "MacLeod, Andrew" <amacleod@redhat.com>
Subject: 1.76% performance loss in VRP due to inlining
Date: Fri, 26 Apr 2024 11:42:56 +0200	[thread overview]
Message-ID: <CAGm3qMVXdshX8T+ioAmEf2=5D8tzCHtZrmhzs0xcSTdUo0qCFg@mail.gmail.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 1231 bytes --]

Hi folks!

In implementing prange (pointer ranges), I have found a 1.74% slowdown
in VRP, even though no code path actually uses the new class yet.  I
have tracked this down to irange::get_bitmask() being compiled
differently with and without the bare-bones patch.  With the patch,
irange::get_bitmask() has a lot of code inlined into it, particularly
get_bitmask_from_range() and, consequently, the wide_int_storage code.
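
For reference, here is the call site in question after the first patch
(condensed from the attached diff):

  irange_bitmask
  irange::get_bitmask () const
  {
    // ...
    // This call to the now file-static get_bitmask_from_range(), together
    // with the wide_int_storage temporaries it needs, is what ends up
    // inlined into the caller.
    irange_bitmask bm
      = get_bitmask_from_range (type (), lower_bound (), upper_bound ());
    if (!m_bitmask.unknown_p ())
      bm.intersect (m_bitmask);
    return bm;
  }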

I don't know whether this is expected behavior, and if it is, how to
mitigate it.  I have tried declaring get_bitmask_from_range() inline,
but that didn't help.  On the other hand, __attribute__((always_inline))
helps a bit, but not entirely.  What does help is inlining
irange::get_bitmask() entirely, but that seems like a big hammer.
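
For what it's worth, here is a minimal, self-contained sketch of the
three spellings mentioned above.  The type and function names are made
up for illustration; this is not the GCC code itself, just a toy showing
how the attributes steer the inliner:

  // toy.cc -- compare the assembly at -O2 to see which calls survive.
  #include <cstdio>

  // Stand-in for a heavyweight value type such as wide_int_storage.
  struct wide_value
  {
    unsigned long v[8];
  };

  // Plain "inline" is only a hint; the inliner is still free to say no.
  static inline wide_value
  make_value_hint (unsigned long x)
  {
    wide_value w {};
    w.v[0] = x;
    return w;
  }

  // always_inline forces the body into every caller (or errors out if
  // inlining is impossible).
  static inline __attribute__ ((always_inline)) wide_value
  make_value_forced (unsigned long x)
  {
    wide_value w {};
    w.v[0] = x;
    return w;
  }

  // noinline does the opposite: it keeps the function out of line, which
  // is one way to stop a hot caller from growing.
  static __attribute__ ((noinline)) wide_value
  make_value_outlined (unsigned long x)
  {
    wide_value w {};
    w.v[0] = x;
    return w;
  }

  int
  main ()
  {
    std::printf ("%lu %lu %lu\n",
                 make_value_hint (1).v[0],
                 make_value_forced (2).v[0],
                 make_value_outlined (3).v[0]);
    return 0;
  }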

The overall compile-time slowdown is 0.26%, because VRP is a
relatively fast pass, but a measurable slowdown in any single pass is
something we'd like to avoid.

What's the recommended approach here?

For the curious, I am attaching before and after copies of
value-range.s.  I am also attaching the two patches needed to
reproduce the problem on mainline.  The first patch is merely setup.
It is the second patch that exhibits the problem.  Notice there are no
uses of prange yet.

Thanks.
Aldy

[-- Attachment #2: 0001-Move-get_bitmask_from_range-out-of-irange-class.patch --]
[-- Type: text/x-patch, Size: 2802 bytes --]

From ee63833c5f56064ef47c2bb9debd485f77d00171 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez <aldyh@redhat.com>
Date: Tue, 19 Mar 2024 18:04:55 +0100
Subject: [PATCH] Move get_bitmask_from_range out of irange class.

---
 gcc/value-range.cc | 52 +++++++++++++++++++++++-----------------------
 gcc/value-range.h  |  1 -
 2 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 70375f7abf9..0f81ce32615 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -31,6 +31,30 @@ along with GCC; see the file COPYING3.  If not see
 #include "fold-const.h"
 #include "gimple-range.h"
 
+// Return the bitmask inherent in a range.
+
+static irange_bitmask
+get_bitmask_from_range (tree type,
+			const wide_int &min, const wide_int &max)
+{
+  unsigned prec = TYPE_PRECISION (type);
+
+  // All the bits of a singleton are known.
+  if (min == max)
+    {
+      wide_int mask = wi::zero (prec);
+      wide_int value = min;
+      return irange_bitmask (value, mask);
+    }
+
+  wide_int xorv = min ^ max;
+
+  if (xorv != 0)
+    xorv = wi::mask (prec - wi::clz (xorv), false, prec);
+
+  return irange_bitmask (wi::zero (prec), min | xorv);
+}
+
 void
 irange::accept (const vrange_visitor &v) const
 {
@@ -1832,31 +1856,6 @@ irange::invert ()
     verify_range ();
 }
 
-// Return the bitmask inherent in the range.
-
-irange_bitmask
-irange::get_bitmask_from_range () const
-{
-  unsigned prec = TYPE_PRECISION (type ());
-  wide_int min = lower_bound ();
-  wide_int max = upper_bound ();
-
-  // All the bits of a singleton are known.
-  if (min == max)
-    {
-      wide_int mask = wi::zero (prec);
-      wide_int value = lower_bound ();
-      return irange_bitmask (value, mask);
-    }
-
-  wide_int xorv = min ^ max;
-
-  if (xorv != 0)
-    xorv = wi::mask (prec - wi::clz (xorv), false, prec);
-
-  return irange_bitmask (wi::zero (prec), min | xorv);
-}
-
 // Remove trailing ranges that this bitmask indicates can't exist.
 
 void
@@ -1978,7 +1977,8 @@ irange::get_bitmask () const
   // in the mask.
   //
   // See also the note in irange_bitmask::intersect.
-  irange_bitmask bm = get_bitmask_from_range ();
+  irange_bitmask bm
+    = get_bitmask_from_range (type (), lower_bound (), upper_bound ());
   if (!m_bitmask.unknown_p ())
     bm.intersect (m_bitmask);
   return bm;
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 9531df56988..dc5b153a83e 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -351,7 +351,6 @@ private:
   bool varying_compatible_p () const;
   bool intersect_bitmask (const irange &r);
   bool union_bitmask (const irange &r);
-  irange_bitmask get_bitmask_from_range () const;
   bool set_range_from_bitmask ();
 
   bool intersect (const wide_int& lb, const wide_int& ub);
-- 
2.44.0


[-- Attachment #3: 0002-Implement-minimum-prange-class-exhibiting-VRP-slowdo.patch --]
[-- Type: text/x-patch, Size: 13611 bytes --]

From 03c70de43177a97ec5e9c243aafc798c1f37e6d8 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez <aldyh@redhat.com>
Date: Wed, 20 Mar 2024 06:25:52 +0100
Subject: [PATCH] Implement minimum prange class exhibiting VRP slowdown.

---
 gcc/value-range-pretty-print.cc |  25 +++
 gcc/value-range-pretty-print.h  |   1 +
 gcc/value-range.cc              | 274 ++++++++++++++++++++++++++++++++
 gcc/value-range.h               | 196 +++++++++++++++++++++++
 4 files changed, 496 insertions(+)

diff --git a/gcc/value-range-pretty-print.cc b/gcc/value-range-pretty-print.cc
index c75cbea3955..154253e047f 100644
--- a/gcc/value-range-pretty-print.cc
+++ b/gcc/value-range-pretty-print.cc
@@ -113,6 +113,31 @@ vrange_printer::print_irange_bitmasks (const irange &r) const
   pp_string (pp, p);
 }
 
+void
+vrange_printer::visit (const prange &r) const
+{
+  pp_string (pp, "[prange] ");
+  if (r.undefined_p ())
+    {
+      pp_string (pp, "UNDEFINED");
+      return;
+    }
+  dump_generic_node (pp, r.type (), 0, TDF_NONE | TDF_NOUID, false);
+  pp_character (pp, ' ');
+  if (r.varying_p ())
+    {
+      pp_string (pp, "VARYING");
+      return;
+    }
+
+  pp_character (pp, '[');
+  //print_int_bound (pp, r.lower_bound (), r.type ());
+  pp_string (pp, ", ");
+  //print_int_bound (pp, r.upper_bound (), r.type ());
+  pp_character (pp, ']');
+  //print_irange_bitmasks (pp, r.m_bitmask);
+}
+
 void
 vrange_printer::print_real_value (tree type, const REAL_VALUE_TYPE &r) const
 {
diff --git a/gcc/value-range-pretty-print.h b/gcc/value-range-pretty-print.h
index ca85fd6157c..54ee0cf8c26 100644
--- a/gcc/value-range-pretty-print.h
+++ b/gcc/value-range-pretty-print.h
@@ -27,6 +27,7 @@ public:
   vrange_printer (pretty_printer *pp_) : pp (pp_) { }
   void visit (const unsupported_range &) const override;
   void visit (const irange &) const override;
+  void visit (const prange &) const override;
   void visit (const frange &) const override;
 private:
   void print_irange_bound (const wide_int &w, tree type) const;
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 0f81ce32615..06ab1a616bf 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -377,6 +377,280 @@ irange::set_nonnegative (tree type)
        wi::to_wide (TYPE_MAX_VALUE (type)));
 }
 
+// Prange implementation.
+
+void
+prange::accept (const vrange_visitor &v) const
+{
+  v.visit (*this);
+}
+
+void
+prange::set_nonnegative (tree type)
+{
+  set (type,
+       wi::zero (TYPE_PRECISION (type)),
+       wi::max_value (TYPE_PRECISION (type), UNSIGNED));
+}
+
+void
+prange::set (tree min, tree max, value_range_kind kind)
+{
+  return set (TREE_TYPE (min), wi::to_wide (min), wi::to_wide (max), kind);
+}
+
+void
+prange::set (tree type, const wide_int &min, const wide_int &max,
+	     value_range_kind kind)
+{
+  if (kind == VR_UNDEFINED)
+    {
+      set_undefined ();
+      return;
+    }
+  if (kind == VR_VARYING)
+    {
+      set_varying (type);
+      return;
+    }
+  if (kind == VR_ANTI_RANGE)
+    {
+      gcc_checking_assert (min == 0 && max == 0);
+      set_nonzero (type);
+      return;
+    }
+  m_type = type;
+  m_min = min;
+  m_max = max;
+  if (m_min == 0 && m_max == -1)
+    {
+      m_kind = VR_VARYING;
+      m_bitmask.set_unknown (TYPE_PRECISION (type));
+      if (flag_checking)
+	verify_range ();
+      return;
+    }
+
+  m_kind = VR_RANGE;
+  m_bitmask = get_bitmask_from_range (type, min, max);
+  if (flag_checking)
+    verify_range ();
+}
+
+bool
+prange::contains_p (const wide_int &w) const
+{
+  if (undefined_p ())
+    return false;
+
+  if (varying_p ())
+    return true;
+
+  return (wi::le_p (lower_bound (), w, UNSIGNED)
+	  && wi::ge_p (upper_bound (), w, UNSIGNED));
+}
+
+bool
+prange::singleton_p (tree *result) const
+{
+  if (m_kind == VR_RANGE && lower_bound () == upper_bound ())
+    {
+      if (result)
+	*result = wide_int_to_tree (type (), m_min);
+      return true;
+    }
+  return false;
+}
+
+bool
+prange::union_ (const vrange &v)
+{
+  const prange &r = as_a <prange> (v);
+
+  if (r.undefined_p ())
+    return false;
+  if (undefined_p ())
+    {
+      *this = r;
+      if (flag_checking)
+	verify_range ();
+      return true;
+    }
+  if (varying_p ())
+    return false;
+  if (r.varying_p ())
+    {
+      set_varying (type ());
+      return true;
+    }
+
+  wide_int new_lb = wi::min (r.lower_bound (), lower_bound (), UNSIGNED);
+  wide_int new_ub = wi::max (r.upper_bound (), upper_bound (), UNSIGNED);
+  prange new_range (type (), new_lb, new_ub);
+  new_range.m_bitmask.union_ (m_bitmask);
+  new_range.m_bitmask.union_ (r.m_bitmask);
+  if (new_range.varying_compatible_p ())
+    {
+      set_varying (type ());
+      return true;
+    }
+  if (flag_checking)
+    new_range.verify_range ();
+  if (new_range == *this)
+    return false;
+  *this = new_range;
+  return true;
+}
+
+bool
+prange::intersect (const vrange &v)
+{
+  const prange &r = as_a <prange> (v);
+  gcc_checking_assert (undefined_p () || r.undefined_p ()
+		       || range_compatible_p (type (), r.type ()));
+
+  if (undefined_p ())
+    return false;
+  if (r.undefined_p ())
+    {
+      set_undefined ();
+      return true;
+    }
+  if (r.varying_p ())
+    return false;
+  if (varying_p ())
+    {
+      *this = r;
+      return true;
+    }
+
+  prange save = *this;
+  m_min = wi::max (r.lower_bound (), lower_bound (), UNSIGNED);
+  m_max = wi::min (r.upper_bound (), upper_bound (), UNSIGNED);
+  if (wi::gt_p (m_min, m_max, UNSIGNED))
+    {
+      set_undefined ();
+      return true;
+    }
+
+  // Intersect all bitmasks: the old one, the new one, and the other operand's.
+  irange_bitmask new_bitmask = get_bitmask_from_range (m_type, m_min, m_max);
+  m_bitmask.intersect (new_bitmask);
+  m_bitmask.intersect (r.m_bitmask);
+
+  if (flag_checking)
+    verify_range ();
+  if (*this == save)
+    return false;
+  return true;
+}
+
+prange &
+prange::operator= (const prange &src)
+{
+  m_type = src.m_type;
+  m_kind = src.m_kind;
+  m_min = src.m_min;
+  m_max = src.m_max;
+  m_bitmask = src.m_bitmask;
+  if (flag_checking)
+    verify_range ();
+  return *this;
+}
+
+bool
+prange::operator== (const prange &src) const
+{
+  if (m_kind == src.m_kind)
+    {
+      if (undefined_p ())
+	return true;
+
+      if (varying_p ())
+	return types_compatible_p (type (), src.type ());
+
+      return (m_min == src.m_min && m_max == src.m_max
+	      && m_bitmask == src.m_bitmask);
+    }
+  return false;
+}
+
+void
+prange::invert ()
+{
+  gcc_checking_assert (!undefined_p () && !varying_p ());
+
+  wide_int new_lb, new_ub;
+  unsigned prec = TYPE_PRECISION (type ());
+  wide_int type_min = wi::zero (prec);
+  wide_int type_max = wi::max_value (prec, UNSIGNED);
+  wi::overflow_type ovf;
+
+  if (lower_bound () == type_min)
+    {
+      new_lb = wi::add (upper_bound (), 1, UNSIGNED, &ovf);
+      if (ovf)
+	new_lb = type_min;
+      new_ub = type_max;
+      set (type (), new_lb, new_ub);
+    }
+  else if (upper_bound () == type_max)
+    {
+      wi::overflow_type ovf;
+      new_lb = type_min;
+      new_ub = wi::sub (lower_bound (), 1, UNSIGNED, &ovf);
+      if (ovf)
+	new_ub = type_max;
+      set (type (), new_lb, new_ub);
+    }
+  else
+    set_varying (type ());
+}
+
+void
+prange::verify_range () const
+{
+  gcc_checking_assert (m_discriminator == VR_PRANGE);
+
+  if (m_kind == VR_UNDEFINED)
+    return;
+
+  gcc_checking_assert (supports_p (type ()));
+
+  if (m_kind == VR_VARYING)
+    {
+      gcc_checking_assert (varying_compatible_p ());
+      return;
+    }
+  gcc_checking_assert (!varying_compatible_p ());
+  gcc_checking_assert (m_kind == VR_RANGE);
+}
+
+void
+prange::update_bitmask (const irange_bitmask &bm)
+{
+  gcc_checking_assert (!undefined_p ());
+
+  // If all the bits are known, this is a singleton.
+  if (bm.mask () == 0)
+    {
+      set (type (), bm.value (), bm.value ());
+      return;
+    }
+
+  // Drop VARYINGs with known bits to a plain range.
+  if (m_kind == VR_VARYING && !bm.unknown_p ())
+    m_kind = VR_RANGE;
+
+  m_bitmask = bm;
+  if (varying_compatible_p ())
+    m_kind = VR_VARYING;
+
+  if (flag_checking)
+    verify_range ();
+}
+
+
 void
 frange::accept (const vrange_visitor &v) const
 {
diff --git a/gcc/value-range.h b/gcc/value-range.h
index dc5b153a83e..9fac89a2f98 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -47,6 +47,8 @@ enum value_range_discriminator
 {
   // Range holds an integer or pointer.
   VR_IRANGE,
+  // Pointer range.
+  VR_PRANGE,
   // Floating point range.
   VR_FRANGE,
   // Range holds an unsupported type.
@@ -389,6 +391,54 @@ private:
   wide_int m_ranges[N*2];
 };
 
+class prange : public vrange
+{
+  friend class prange_storage;
+  friend class vrange_printer;
+public:
+  prange ();
+  prange (const prange &);
+  prange (tree type);
+  prange (tree type, const wide_int &, const wide_int &,
+	  value_range_kind = VR_RANGE);
+  static bool supports_p (const_tree type);
+  virtual bool supports_type_p (const_tree type) const final override;
+  virtual void accept (const vrange_visitor &v) const final override;
+  virtual void set_undefined () final override;
+  virtual void set_varying (tree type) final override;
+  virtual void set_nonzero (tree type) final override;
+  virtual void set_zero (tree type) final override;
+  virtual void set_nonnegative (tree type) final override;
+  virtual bool contains_p (tree cst) const final override;
+  virtual bool fits_p (const vrange &v) const final override;
+  virtual bool singleton_p (tree *result = NULL) const final override;
+  virtual bool zero_p () const final override;
+  virtual bool nonzero_p () const final override;
+  virtual void set (tree, tree, value_range_kind = VR_RANGE) final override;
+  virtual tree type () const final override;
+  virtual bool union_ (const vrange &v) final override;
+  virtual bool intersect (const vrange &v) final override;
+
+  prange& operator= (const prange &);
+  bool operator== (const prange &) const;
+  void set (tree type, const wide_int &, const wide_int &,
+	    value_range_kind = VR_RANGE);
+  void invert ();
+  bool contains_p (const wide_int &) const;
+  wide_int lower_bound () const;
+  wide_int upper_bound () const;
+  void verify_range () const;
+  irange_bitmask get_bitmask () const;
+  void update_bitmask (const irange_bitmask &);
+protected:
+  bool varying_compatible_p () const;
+
+  tree m_type;
+  wide_int m_min;
+  wide_int m_max;
+  irange_bitmask m_bitmask;
+};
+
 // Unsupported temporaries may be created by ranger before it's known
 // they're unsupported, or by vr_values::get_value_range.
 
@@ -667,6 +717,7 @@ class vrange_visitor
 {
 public:
   virtual void visit (const irange &) const { }
+  virtual void visit (const prange &) const { }
   virtual void visit (const frange &) const { }
   virtual void visit (const unsupported_range &) const { }
 };
@@ -1196,6 +1247,151 @@ irange_val_max (const_tree type)
   return wi::max_value (TYPE_PRECISION (type), TYPE_SIGN (type));
 }
 
+inline
+prange::prange ()
+  : vrange (VR_PRANGE)
+{
+  set_undefined ();
+}
+
+inline
+prange::prange (const prange &r)
+  : vrange (VR_PRANGE)
+{
+  *this = r;
+}
+
+inline
+prange::prange (tree type)
+  : vrange (VR_PRANGE)
+{
+  set_varying (type);
+}
+
+inline
+prange::prange (tree type, const wide_int &lb, const wide_int &ub,
+		value_range_kind kind)
+  : vrange (VR_PRANGE)
+{
+  set (type, lb, ub, kind);
+}
+
+inline bool
+prange::supports_p (const_tree type)
+{
+  return POINTER_TYPE_P (type);
+}
+
+inline bool
+prange::supports_type_p (const_tree type) const
+{
+  return POINTER_TYPE_P (type);
+}
+
+inline void
+prange::set_undefined ()
+{
+  m_kind = VR_UNDEFINED;
+}
+
+inline void
+prange::set_varying (tree type)
+{
+  m_kind = VR_VARYING;
+  m_type = type;
+  m_min = wi::zero (TYPE_PRECISION (type));
+  m_max = wi::max_value (TYPE_PRECISION (type), UNSIGNED);
+  m_bitmask.set_unknown (TYPE_PRECISION (type));
+
+  if (flag_checking)
+    verify_range ();
+}
+
+inline void
+prange::set_nonzero (tree type)
+{
+  m_kind = VR_RANGE;
+  m_type = type;
+  m_min = wi::one (TYPE_PRECISION (type));
+  m_max = wi::max_value (TYPE_PRECISION (type), UNSIGNED);
+  m_bitmask.set_unknown (TYPE_PRECISION (type));
+
+  if (flag_checking)
+    verify_range ();
+}
+
+inline void
+prange::set_zero (tree type)
+{
+  m_kind = VR_RANGE;
+  m_type = type;
+  wide_int zero = wi::zero (TYPE_PRECISION (type));
+  m_min = m_max = zero;
+  m_bitmask = irange_bitmask (zero, zero);
+
+  if (flag_checking)
+    verify_range ();
+}
+
+inline bool
+prange::contains_p (tree cst) const
+{
+  return contains_p (wi::to_wide (cst));
+}
+
+inline bool
+prange::zero_p () const
+{
+  return m_kind == VR_RANGE && m_min == 0 && m_max == 0;
+}
+
+inline bool
+prange::nonzero_p () const
+{
+  return m_kind == VR_RANGE && m_min == 1 && m_max == -1;
+}
+
+inline tree
+prange::type () const
+{
+  gcc_checking_assert (!undefined_p ());
+  return m_type;
+}
+
+inline wide_int
+prange::lower_bound () const
+{
+  gcc_checking_assert (!undefined_p ());
+  return m_min;
+}
+
+inline wide_int
+prange::upper_bound () const
+{
+  gcc_checking_assert (!undefined_p ());
+  return m_max;
+}
+
+inline bool
+prange::varying_compatible_p () const
+{
+  return (!undefined_p ()
+	  && m_min == 0 && m_max == -1 && get_bitmask ().unknown_p ());
+}
+
+inline irange_bitmask
+prange::get_bitmask () const
+{
+  return m_bitmask;
+}
+
+inline bool
+prange::fits_p (const vrange &) const
+{
+  return true;
+}
+
+
 inline
 frange::frange ()
   : vrange (VR_FRANGE)
-- 
2.44.0


[-- Attachment #4: irange_get_bitmask_with_patch.s --]
[-- Type: application/octet-stream, Size: 2925 bytes --]

	.globl	_ZNK6irange11get_bitmaskEv
	.type	_ZNK6irange11get_bitmaskEv, @function
_ZNK6irange11get_bitmaskEv:
.LFB3242:
	.cfi_startproc
	pushq	%r13
	.cfi_def_cfa_offset 16
	.cfi_offset 13, -16
	movq	%rdi, %r13
	pushq	%r12
	.cfi_def_cfa_offset 24
	.cfi_offset 12, -24
	pushq	%rbp
	.cfi_def_cfa_offset 32
	.cfi_offset 6, -32
	pushq	%rbx
	.cfi_def_cfa_offset 40
	.cfi_offset 3, -40
	movq	%rsi, %rbx
	subq	$168, %rsp
	.cfi_def_cfa_offset 208
	movzbl	10(%rsi), %eax
	movq	184(%rsi), %r12
	leal	-1(%rax,%rax), %eax
	leaq	(%rax,%rax,4), %rbp
	salq	$4, %rbp
	addq	%r12, %rbp
	movdqu	0(%rbp), %xmm0
	movaps	%xmm0, 80(%rsp)
	movdqu	16(%rbp), %xmm0
	movaps	%xmm0, 96(%rsp)
	movdqu	32(%rbp), %xmm0
	movaps	%xmm0, 112(%rsp)
	movdqu	48(%rbp), %xmm0
	movaps	%xmm0, 128(%rsp)
	movdqu	64(%rbp), %xmm0
	movaps	%xmm0, 144(%rsp)
	movl	156(%rsp), %eax
	cmpl	$576, %eax
	ja	.L2460
.L2448:
	movdqu	(%r12), %xmm0
	movaps	%xmm0, (%rsp)
	movdqu	16(%r12), %xmm0
	movaps	%xmm0, 16(%rsp)
	movdqu	32(%r12), %xmm0
	movaps	%xmm0, 32(%rsp)
	movdqu	48(%r12), %xmm0
	movaps	%xmm0, 48(%rsp)
	movdqu	64(%r12), %xmm0
	movaps	%xmm0, 64(%rsp)
	movl	76(%rsp), %eax
	cmpl	$576, %eax
	ja	.L2461
.L2449:
	movq	(%rbx), %rax
	movq	16(%rax), %rax
	cmpq	$_ZNK6irange4typeEv, %rax
	jne	.L2450
	movq	16(%rbx), %rax
.L2451:
	movzwl	54(%rax), %esi
	leaq	80(%rsp), %rcx
	movq	%rsp, %rdx
	movq	%r13, %rdi
	call	_ZL22get_bitmask_from_rangeP9tree_nodeRK16generic_wide_intI16wide_int_storageES5_.isra.0
	cmpl	$576, 76(%rsp)
	ja	.L2462
	cmpl	$576, 156(%rsp)
	ja	.L2463
.L2453:
	cmpl	$576, 180(%rbx)
	movl	176(%rbx), %eax
	leaq	104(%rbx), %rdx
	ja	.L2464
.L2455:
	cmpl	$1, %eax
	jne	.L2458
	cmpq	$-1, (%rdx)
	je	.L2447
.L2458:
	leaq	24(%rbx), %rsi
	movq	%r13, %rdi
	call	_ZN14irange_bitmask9intersectERKS_
.L2447:
	addq	$168, %rsp
	.cfi_remember_state
	.cfi_def_cfa_offset 40
	movq	%r13, %rax
	popq	%rbx
	.cfi_def_cfa_offset 32
	popq	%rbp
	.cfi_def_cfa_offset 24
	popq	%r12
	.cfi_def_cfa_offset 16
	popq	%r13
	.cfi_def_cfa_offset 8
	ret
	.p2align 4,,10
	.p2align 3
.L2450:
	.cfi_restore_state
	movq	%rbx, %rdi
	call	*%rax
	jmp	.L2451
	.p2align 4,,10
	.p2align 3
.L2461:
	leal	63(%rax), %edi
	shrl	$6, %edi
	salq	$3, %rdi
	call	xmalloc
	movl	72(%rsp), %edx
	movq	(%r12), %rsi
	movq	%rax, %rdi
	movq	%rax, (%rsp)
	salq	$3, %rdx
	call	memcpy
	jmp	.L2449
	.p2align 4,,10
	.p2align 3
.L2462:
	movq	(%rsp), %rdi
	call	free
	cmpl	$576, 156(%rsp)
	jbe	.L2453
	.p2align 4,,10
	.p2align 3
.L2463:
	movq	80(%rsp), %rdi
	call	free
	jmp	.L2453
	.p2align 4,,10
	.p2align 3
.L2464:
	movq	104(%rbx), %rdx
	jmp	.L2455
	.p2align 4,,10
	.p2align 3
.L2460:
	leal	63(%rax), %edi
	shrl	$6, %edi
	salq	$3, %rdi
	call	xmalloc
	movl	152(%rsp), %edx
	movq	0(%rbp), %rsi
	movq	%rax, %rdi
	movq	%rax, 80(%rsp)
	salq	$3, %rdx
	call	memcpy
	movq	184(%rbx), %r12
	jmp	.L2448
	.cfi_endproc
.LFE3242:
	.size	_ZNK6irange11get_bitmaskEv, .-_ZNK6irange11get_bitmaskEv
	.section	.rodata.str1.1
.LC38:
	.string	"add_vrange"

[-- Attachment #5: irange_get_bitmask_without_patch.s --]
[-- Type: application/octet-stream, Size: 10013 bytes --]

	.globl	_ZNK6irange11get_bitmaskEv
	.type	_ZNK6irange11get_bitmaskEv, @function
_ZNK6irange11get_bitmaskEv:
.LFB3197:
	.cfi_startproc
	pushq	%r15
	.cfi_def_cfa_offset 16
	.cfi_offset 15, -16
	pushq	%r14
	.cfi_def_cfa_offset 24
	.cfi_offset 14, -24
	pushq	%r13
	.cfi_def_cfa_offset 32
	.cfi_offset 13, -32
	pushq	%r12
	.cfi_def_cfa_offset 40
	.cfi_offset 12, -40
	pushq	%rbp
	.cfi_def_cfa_offset 48
	.cfi_offset 6, -48
	movq	%rdi, %rbp
	pushq	%rbx
	.cfi_def_cfa_offset 56
	.cfi_offset 3, -56
	movq	%rsi, %rbx
	subq	$584, %rsp
	.cfi_def_cfa_offset 640
	movzbl	10(%rsi), %eax
	movq	184(%rsi), %r13
	leal	-1(%rax,%rax), %eax
	leaq	(%rax,%rax,4), %r12
	salq	$4, %r12
	addq	%r13, %r12
	movdqu	(%r12), %xmm0
	movaps	%xmm0, 96(%rsp)
	movdqu	16(%r12), %xmm0
	movaps	%xmm0, 112(%rsp)
	movdqu	32(%r12), %xmm0
	movaps	%xmm0, 128(%rsp)
	movdqu	48(%r12), %xmm0
	movaps	%xmm0, 144(%rsp)
	movdqu	64(%r12), %xmm0
	movaps	%xmm0, 160(%rsp)
	movl	172(%rsp), %eax
	cmpl	$576, %eax
	ja	.L1610
.L1532:
	movdqu	0(%r13), %xmm0
	movaps	%xmm0, 16(%rsp)
	movdqu	16(%r13), %xmm0
	movaps	%xmm0, 32(%rsp)
	movdqu	32(%r13), %xmm0
	movaps	%xmm0, 48(%rsp)
	movdqu	48(%r13), %xmm0
	movaps	%xmm0, 64(%rsp)
	movdqu	64(%r13), %xmm0
	movaps	%xmm0, 80(%rsp)
	movl	92(%rsp), %eax
	cmpl	$576, %eax
	ja	.L1611
.L1533:
	movq	(%rbx), %rax
	movq	16(%rax), %rax
	cmpq	$_ZNK6irange4typeEv, %rax
	jne	.L1534
	movq	16(%rbx), %rax
.L1535:
	movl	92(%rsp), %r9d
	movzwl	54(%rax), %r14d
	movl	88(%rsp), %esi
	movl	172(%rsp), %r11d
	movl	168(%rsp), %r8d
	cmpl	$576, %r9d
	ja	.L1536
	cmpl	$576, %r11d
	ja	.L1612
	cmpl	%r8d, %esi
	je	.L1587
	movl	%r9d, 252(%rsp)
	leaq	16(%rsp), %r12
	leaq	96(%rsp), %rcx
	leaq	176(%rsp), %r15
	movq	%r12, %rax
	movq	%r15, %rdi
.L1541:
	leal	(%rsi,%r8), %edx
	cmpl	$2, %edx
	jne	.L1555
	movl	252(%rsp), %edx
	movq	(%rax), %rax
	xorq	(%rcx), %rax
	movq	%r15, %rcx
	movq	%rax, (%rdi)
	movl	$1, 248(%rsp)
	cmpl	$576, %edx
	ja	.L1613
.L1556:
	cmpq	$0, (%rcx)
	jne	.L1593
	leaq	416(%rsp), %rax
	leaq	496(%rsp), %r13
	movq	%rax, (%rsp)
.L1565:
	movq	(%rsp), %rdi
	movq	%r15, %rdx
	movq	%r12, %rsi
	call	_ZN2wi6bit_orI16generic_wide_intI16wide_int_storageES3_EENS_13binary_traitsIT_T0_XsrNS_10int_traitsIS5_EE14precision_typeEXsrNS7_IS6_EE14precision_typeEE11result_typeERKS5_RKS6_
	leaq	352(%rsp), %rax
	movq	$0, 352(%rsp)
	movq	%rax, 336(%rsp)
	movl	$1, 344(%rsp)
	movl	%r14d, 348(%rsp)
	movl	%r14d, 572(%rsp)
	cmpl	$576, %r14d
	ja	.L1614
	movq	$0, 496(%rsp)
	movl	$1, %esi
.L1582:
	movl	$0, 76(%rbp)
	movq	%rbp, %rdi
	movl	$0, 156(%rbp)
	movl	%esi, 568(%rsp)
	movq	%r13, %rsi
	call	_ZN16wide_int_storageaSERKS_.isra.0
	movq	(%rsp), %rsi
	leaq	80(%rbp), %rdi
	call	_ZN16wide_int_storageaSERKS_.isra.0
	movl	global_options+3536(%rip), %eax
	testl	%eax, %eax
	je	.L1569
	movl	156(%rbp), %eax
	cmpl	%eax, 76(%rbp)
	jne	.L1570
.L1569:
	cmpl	$576, 572(%rsp)
	ja	.L1615
.L1571:
	cmpl	$576, 492(%rsp)
	ja	.L1616
.L1572:
	cmpl	$576, 252(%rsp)
	jbe	.L1552
	movq	176(%rsp), %rdi
	call	free
	jmp	.L1552
	.p2align 4,,10
	.p2align 3
.L1587:
	leaq	16(%rsp), %r12
	leaq	96(%rsp), %rdi
	movq	%r12, %rcx
.L1540:
	xorl	%eax, %eax
	jmp	.L1545
	.p2align 4,,10
	.p2align 3
.L1618:
	addl	$1, %eax
	cmpl	%eax, %esi
	je	.L1617
.L1545:
	movl	%eax, %edx
	movq	(%rdi,%rdx,8), %r10
	cmpq	%r10, (%rcx,%rdx,8)
	je	.L1618
	movl	%r9d, 252(%rsp)
	cmpl	$576, %r9d
	ja	.L1543
	leaq	176(%rsp), %r15
	movq	%r12, %rax
	movq	%r15, %rdi
	.p2align 4,,10
	.p2align 3
.L1554:
	leaq	96(%rsp), %rcx
	cmpl	$576, %r11d
	jbe	.L1541
	.p2align 4,,10
	.p2align 3
.L1539:
	movq	96(%rsp), %rcx
	jmp	.L1541
	.p2align 4,,10
	.p2align 3
.L1617:
	leaq	512(%rsp), %rax
	movl	%r14d, 508(%rsp)
	movq	$0, 512(%rsp)
	movq	%rax, 496(%rsp)
	movl	$1, 504(%rsp)
	movl	%r14d, 332(%rsp)
	cmpl	$576, %r14d
	ja	.L1619
	movl	$1, %esi
	leaq	256(%rsp), %r13
	movq	$0, 256(%rsp)
.L1583:
	leaq	336(%rsp), %r14
	movl	%esi, 328(%rsp)
	movq	%r12, %rsi
	movq	%r14, %rdi
	call	_ZN16wide_int_storageC2ERKS_
	movl	$0, 76(%rbp)
	movq	%r14, %rsi
	movq	%rbp, %rdi
	movl	$0, 156(%rbp)
	call	_ZN16wide_int_storageaSERKS_.isra.0
	leaq	80(%rbp), %rdi
	movq	%r13, %rsi
	call	_ZN16wide_int_storageaSERKS_.isra.0
	movl	global_options+3536(%rip), %edx
	testl	%edx, %edx
	je	.L1549
	movl	156(%rbp), %eax
	cmpl	%eax, 76(%rbp)
	jne	.L1570
.L1549:
	cmpl	$576, 412(%rsp)
	ja	.L1620
.L1550:
	cmpl	$576, 332(%rsp)
	ja	.L1621
.L1552:
	cmpl	$576, 92(%rsp)
	ja	.L1622
.L1574:
	cmpl	$576, 172(%rsp)
	ja	.L1623
.L1575:
	cmpl	$576, 180(%rbx)
	movl	176(%rbx), %eax
	leaq	104(%rbx), %rdx
	ja	.L1624
.L1577:
	cmpl	$1, %eax
	jne	.L1580
	cmpq	$-1, (%rdx)
	je	.L1531
.L1580:
	leaq	24(%rbx), %rsi
	movq	%rbp, %rdi
	call	_ZN14irange_bitmask9intersectERKS_
.L1531:
	addq	$584, %rsp
	.cfi_remember_state
	.cfi_def_cfa_offset 56
	movq	%rbp, %rax
	popq	%rbx
	.cfi_def_cfa_offset 48
	popq	%rbp
	.cfi_def_cfa_offset 40
	popq	%r12
	.cfi_def_cfa_offset 32
	popq	%r13
	.cfi_def_cfa_offset 24
	popq	%r14
	.cfi_def_cfa_offset 16
	popq	%r15
	.cfi_def_cfa_offset 8
	ret
	.p2align 4,,10
	.p2align 3
.L1534:
	.cfi_restore_state
	movq	%rbx, %rdi
	call	*%rax
	jmp	.L1535
	.p2align 4,,10
	.p2align 3
.L1593:
	movl	$1, %eax
.L1564:
	movl	%eax, 424(%rsp)
	leaq	416(%rsp), %rax
	leaq	496(%rsp), %r13
	movq	%rax, %rdi
	movq	%rcx, 416(%rsp)
	movl	%edx, 428(%rsp)
	movq	%rax, (%rsp)
	call	_ZN2wi3clzERK16generic_wide_intI20wide_int_ref_storageILb0ELb1EEE
	movl	%r14d, %esi
	movl	%r14d, 572(%rsp)
	subl	%eax, %esi
	cmpl	$576, %r14d
	ja	.L1625
.L1558:
	movq	%r13, %rdi
.L1559:
	movl	%r14d, %ecx
	xorl	%edx, %edx
	call	_ZN2wi4maskEPljbj
	movl	572(%rsp), %edx
	movl	%eax, %ecx
	movl	%eax, 568(%rsp)
	sall	$6, %ecx
	cmpl	%ecx, %edx
	jnb	.L1560
	movq	%r13, %rcx
	cmpl	$576, %edx
	ja	.L1626
.L1561:
	subl	$1, %eax
	andl	$63, %edx
	leaq	(%rcx,%rax,8), %rsi
	movl	$64, %ecx
	movq	(%rsi), %rax
	subl	%edx, %ecx
	salq	%cl, %rax
	sarq	%cl, %rax
	movq	%rax, (%rsi)
.L1560:
	movq	%r13, %rsi
	movq	%r15, %rdi
	call	_ZN16wide_int_storageaSERKS_.isra.0
	cmpl	$576, 572(%rsp)
	jbe	.L1565
	movq	496(%rsp), %rdi
	call	free
	jmp	.L1565
	.p2align 4,,10
	.p2align 3
.L1536:
	movq	16(%rsp), %rcx
	cmpl	$576, %r11d
	ja	.L1627
	cmpl	%r8d, %esi
	jne	.L1581
	leaq	96(%rsp), %rdi
	leaq	16(%rsp), %r12
	jmp	.L1540
.L1627:
	leaq	16(%rsp), %r12
	cmpl	%r8d, %esi
	je	.L1538
	.p2align 4,,10
	.p2align 3
.L1581:
	movl	%r9d, 252(%rsp)
	leaq	16(%rsp), %r12
.L1543:
	leal	63(%r9), %edi
	leaq	176(%rsp), %r15
	shrl	$6, %edi
	salq	$3, %rdi
	call	xmalloc
	movl	88(%rsp), %esi
	movl	252(%rsp), %r9d
	movq	%rax, 176(%rsp)
	movl	92(%rsp), %edx
	movq	%rax, %rdi
	movl	168(%rsp), %r8d
	movl	172(%rsp), %r11d
	cmpl	$576, %r9d
	jbe	.L1607
.L1553:
	movq	%r12, %rax
	cmpl	$576, %edx
	jbe	.L1554
	movq	16(%rsp), %rax
	jmp	.L1554
	.p2align 4,,10
	.p2align 3
.L1624:
	movq	104(%rbx), %rdx
	jmp	.L1577
	.p2align 4,,10
	.p2align 3
.L1623:
	movq	96(%rsp), %rdi
	call	free
	jmp	.L1575
	.p2align 4,,10
	.p2align 3
.L1622:
	movq	16(%rsp), %rdi
	call	free
	jmp	.L1574
	.p2align 4,,10
	.p2align 3
.L1610:
	leal	63(%rax), %edi
	shrl	$6, %edi
	salq	$3, %rdi
	call	xmalloc
	movl	168(%rsp), %edx
	movq	(%r12), %rsi
	movq	%rax, %rdi
	movq	%rax, 96(%rsp)
	salq	$3, %rdx
	call	memcpy
	movq	184(%rbx), %r13
	jmp	.L1532
	.p2align 4,,10
	.p2align 3
.L1611:
	leal	63(%rax), %edi
	shrl	$6, %edi
	salq	$3, %rdi
	call	xmalloc
	movl	88(%rsp), %edx
	movq	0(%r13), %rsi
	movq	%rax, %rdi
	movq	%rax, 16(%rsp)
	salq	$3, %rdx
	call	memcpy
	jmp	.L1533
	.p2align 4,,10
	.p2align 3
.L1612:
	cmpl	%r8d, %esi
	je	.L1586
	leaq	176(%rsp), %r15
	leaq	16(%rsp), %r12
	movl	%r9d, 252(%rsp)
	movq	%r15, %rdi
	movq	%r12, %rax
	jmp	.L1539
	.p2align 4,,10
	.p2align 3
.L1614:
	leal	63(%r14), %edi
	shrl	$6, %edi
	salq	$3, %rdi
	call	xmalloc
	movl	344(%rsp), %esi
	cmpl	$576, 572(%rsp)
	movq	%rax, 496(%rsp)
	movq	336(%rsp), %rdi
	jbe	.L1628
.L1567:
	xorl	%edx, %edx
	.p2align 4,,10
	.p2align 3
.L1568:
	movq	(%rdi,%rdx,8), %rcx
	movq	%rcx, (%rax,%rdx,8)
	addq	$1, %rdx
	cmpl	%esi, %edx
	jb	.L1568
	jmp	.L1582
	.p2align 4,,10
	.p2align 3
.L1621:
	movq	256(%rsp), %rdi
	call	free
	jmp	.L1552
	.p2align 4,,10
	.p2align 3
.L1620:
	movq	336(%rsp), %rdi
	call	free
	jmp	.L1550
	.p2align 4,,10
	.p2align 3
.L1619:
	leal	63(%r14), %edi
	leaq	256(%rsp), %r13
	shrl	$6, %edi
	salq	$3, %rdi
	call	xmalloc
	movl	504(%rsp), %esi
	cmpl	$576, 332(%rsp)
	movq	%rax, 256(%rsp)
	movq	496(%rsp), %rdi
	jbe	.L1606
.L1547:
	xorl	%edx, %edx
	.p2align 4,,10
	.p2align 3
.L1548:
	movq	(%rdi,%rdx,8), %rcx
	movq	%rcx, (%rax,%rdx,8)
	addq	$1, %rdx
	cmpl	%esi, %edx
	jb	.L1548
	jmp	.L1583
	.p2align 4,,10
	.p2align 3
.L1613:
	movq	176(%rsp), %rcx
	jmp	.L1556
	.p2align 4,,10
	.p2align 3
.L1555:
	movl	%esi, %edx
	movq	%rax, %rsi
	call	_ZN2wi9xor_largeEPlPKljS2_jj
	movl	252(%rsp), %edx
	movl	%eax, 248(%rsp)
	cmpl	$576, %edx
	ja	.L1629
	movq	%r15, %rcx
.L1557:
	cmpl	$1, %eax
	jne	.L1564
	jmp	.L1556
	.p2align 4,,10
	.p2align 3
.L1616:
	movq	416(%rsp), %rdi
	call	free
	jmp	.L1572
	.p2align 4,,10
	.p2align 3
.L1615:
	movq	496(%rsp), %rdi
	call	free
	jmp	.L1571
.L1586:
	leaq	16(%rsp), %r12
	movq	%r12, %rcx
.L1538:
	movq	96(%rsp), %rdi
	jmp	.L1540
.L1625:
	leal	63(%r14), %edi
	movl	%esi, 12(%rsp)
	shrl	$6, %edi
	salq	$3, %rdi
	call	xmalloc
	movl	12(%rsp), %esi
	cmpl	$576, 572(%rsp)
	movq	%rax, 496(%rsp)
	movq	%rax, %rdi
	ja	.L1559
	jmp	.L1558
	.p2align 4,,10
	.p2align 3
.L1626:
	movq	496(%rsp), %rcx
	jmp	.L1561
.L1607:
	movq	%r15, %rdi
	jmp	.L1553
.L1628:
	movq	%r13, %rax
	jmp	.L1567
.L1629:
	movq	176(%rsp), %rcx
	jmp	.L1557
.L1606:
	movq	%r13, %rax
	jmp	.L1547
	.cfi_endproc
	.section	.text.unlikely
	.cfi_startproc
	.type	_ZNK6irange11get_bitmaskEv.cold, @function
_ZNK6irange11get_bitmaskEv.cold:
.LFSB3197:
.L1570:
	.cfi_def_cfa_offset 640
	.cfi_offset 3, -56
	.cfi_offset 6, -48
	.cfi_offset 12, -40
	.cfi_offset 13, -32
	.cfi_offset 14, -24
	.cfi_offset 15, -16
	call	_ZNK14irange_bitmask11verify_maskEv.part.0
	.cfi_endproc
.LFE3197:
	.text
	.size	_ZNK6irange11get_bitmaskEv, .-_ZNK6irange11get_bitmaskEv
	.section	.text.unlikely
	.size	_ZNK6irange11get_bitmaskEv.cold, .-_ZNK6irange11get_bitmaskEv.cold
.LCOLDE33:
	.text
.LHOTE33:
	.section	.rodata.str1.1
.LC34:
	.string	"add_vrange"

Thread overview: 11+ messages
2024-04-26  9:42 Aldy Hernandez [this message]
2024-04-30  7:57 ` Richard Biener
2024-04-30  8:21   ` Aldy Hernandez
2024-04-30  8:53 ` Martin Jambor
2024-04-30 19:09 ` Jason Merrill
2024-04-30 19:15   ` Richard Biener
2024-04-30 21:48     ` Building libgccjit with -fno-semantic-interposition? ( was Re: 1.76% performance loss in VRP due to inlining) David Malcolm
2024-05-02  7:40       ` Andrea Corallo
2024-04-30 19:22   ` 1.76% performance loss in VRP due to inlining Jakub Jelinek
2024-04-30 21:37     ` Jason Merrill
2024-05-03  8:55       ` Aldy Hernandez
