* 1.76% performance loss in VRP due to inlining
From: Aldy Hernandez @ 2024-04-26 9:42 UTC
To: GCC Mailing List; +Cc: MacLeod, Andrew

[-- Attachment #1: Type: text/plain, Size: 1231 bytes --]

Hi folks!

In implementing prange (pointer ranges), I have found a 1.74% slowdown
in VRP, even without any code path actually using the code. I have
tracked this down to irange::get_bitmask() being compiled differently
with and without the bare bones patch. With the patch,
irange::get_bitmask() has a lot of code inlined into it, particularly
get_bitmask_from_range() and consequently the wide_int_storage code.

I don't know whether this is expected behavior, and if it is, how to
mitigate it. I have tried declaring get_bitmask_from_range() inline,
but that didn't help. OTOH, using __attribute__((always_inline))
helps a bit, but not entirely. What does help is inlining
irange::get_bitmask() entirely, but that seems like a big hammer.

The overall slowdown in compilation is 0.26%, because VRP is a
relatively fast pass, but a measurable pass slowdown is something we'd
like to avoid.

What's the recommended approach here?

For the curious, I am attaching before and after copies of
value-range.s. I am also attaching the two patches needed to
reproduce the problem on mainline. The first patch is merely setup.
It is the second patch that exhibits the problem. Notice there are no
uses of prange yet.

Thanks.
Aldy

[-- Attachment #2: 0001-Move-get_bitmask_from_range-out-of-irange-class.patch --]
[-- Type: text/x-patch, Size: 2802 bytes --]

From ee63833c5f56064ef47c2bb9debd485f77d00171 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez <aldyh@redhat.com>
Date: Tue, 19 Mar 2024 18:04:55 +0100
Subject: [PATCH] Move get_bitmask_from_range out of irange class.
--- gcc/value-range.cc | 52 +++++++++++++++++++++++----------------------- gcc/value-range.h | 1 - 2 files changed, 26 insertions(+), 27 deletions(-) diff --git a/gcc/value-range.cc b/gcc/value-range.cc index 70375f7abf9..0f81ce32615 100644 --- a/gcc/value-range.cc +++ b/gcc/value-range.cc @@ -31,6 +31,30 @@ along with GCC; see the file COPYING3. If not see #include "fold-const.h" #include "gimple-range.h" +// Return the bitmask inherent in a range. + +static irange_bitmask +get_bitmask_from_range (tree type, + const wide_int &min, const wide_int &max) +{ + unsigned prec = TYPE_PRECISION (type); + + // All the bits of a singleton are known. + if (min == max) + { + wide_int mask = wi::zero (prec); + wide_int value = min; + return irange_bitmask (value, mask); + } + + wide_int xorv = min ^ max; + + if (xorv != 0) + xorv = wi::mask (prec - wi::clz (xorv), false, prec); + + return irange_bitmask (wi::zero (prec), min | xorv); +} + void irange::accept (const vrange_visitor &v) const { @@ -1832,31 +1856,6 @@ irange::invert () verify_range (); } -// Return the bitmask inherent in the range. - -irange_bitmask -irange::get_bitmask_from_range () const -{ - unsigned prec = TYPE_PRECISION (type ()); - wide_int min = lower_bound (); - wide_int max = upper_bound (); - - // All the bits of a singleton are known. - if (min == max) - { - wide_int mask = wi::zero (prec); - wide_int value = lower_bound (); - return irange_bitmask (value, mask); - } - - wide_int xorv = min ^ max; - - if (xorv != 0) - xorv = wi::mask (prec - wi::clz (xorv), false, prec); - - return irange_bitmask (wi::zero (prec), min | xorv); -} - // Remove trailing ranges that this bitmask indicates can't exist. void @@ -1978,7 +1977,8 @@ irange::get_bitmask () const // in the mask. // // See also the note in irange_bitmask::intersect. 
- irange_bitmask bm = get_bitmask_from_range (); + irange_bitmask bm + = get_bitmask_from_range (type (), lower_bound (), upper_bound ()); if (!m_bitmask.unknown_p ()) bm.intersect (m_bitmask); return bm; diff --git a/gcc/value-range.h b/gcc/value-range.h index 9531df56988..dc5b153a83e 100644 --- a/gcc/value-range.h +++ b/gcc/value-range.h @@ -351,7 +351,6 @@ private: bool varying_compatible_p () const; bool intersect_bitmask (const irange &r); bool union_bitmask (const irange &r); - irange_bitmask get_bitmask_from_range () const; bool set_range_from_bitmask (); bool intersect (const wide_int& lb, const wide_int& ub); -- 2.44.0 [-- Attachment #3: 0002-Implement-minimum-prange-class-exhibiting-VRP-slowdo.patch --] [-- Type: text/x-patch, Size: 13611 bytes --] From 03c70de43177a97ec5e9c243aafc798c1f37e6d8 Mon Sep 17 00:00:00 2001 From: Aldy Hernandez <aldyh@redhat.com> Date: Wed, 20 Mar 2024 06:25:52 +0100 Subject: [PATCH] Implement minimum prange class exhibiting VRP slowdown. --- gcc/value-range-pretty-print.cc | 25 +++ gcc/value-range-pretty-print.h | 1 + gcc/value-range.cc | 274 ++++++++++++++++++++++++++++++++ gcc/value-range.h | 196 +++++++++++++++++++++++ 4 files changed, 496 insertions(+) diff --git a/gcc/value-range-pretty-print.cc b/gcc/value-range-pretty-print.cc index c75cbea3955..154253e047f 100644 --- a/gcc/value-range-pretty-print.cc +++ b/gcc/value-range-pretty-print.cc @@ -113,6 +113,31 @@ vrange_printer::print_irange_bitmasks (const irange &r) const pp_string (pp, p); } +void +vrange_printer::visit (const prange &r) const +{ + pp_string (pp, "[prange] "); + if (r.undefined_p ()) + { + pp_string (pp, "UNDEFINED"); + return; + } + dump_generic_node (pp, r.type (), 0, TDF_NONE | TDF_NOUID, false); + pp_character (pp, ' '); + if (r.varying_p ()) + { + pp_string (pp, "VARYING"); + return; + } + + pp_character (pp, '['); + //print_int_bound (pp, r.lower_bound (), r.type ()); + pp_string (pp, ", "); + //print_int_bound (pp, r.upper_bound (), r.type ()); + 
pp_character (pp, ']'); + //print_irange_bitmasks (pp, r.m_bitmask); +} + void vrange_printer::print_real_value (tree type, const REAL_VALUE_TYPE &r) const { diff --git a/gcc/value-range-pretty-print.h b/gcc/value-range-pretty-print.h index ca85fd6157c..54ee0cf8c26 100644 --- a/gcc/value-range-pretty-print.h +++ b/gcc/value-range-pretty-print.h @@ -27,6 +27,7 @@ public: vrange_printer (pretty_printer *pp_) : pp (pp_) { } void visit (const unsupported_range &) const override; void visit (const irange &) const override; + void visit (const prange &) const override; void visit (const frange &) const override; private: void print_irange_bound (const wide_int &w, tree type) const; diff --git a/gcc/value-range.cc b/gcc/value-range.cc index 0f81ce32615..06ab1a616bf 100644 --- a/gcc/value-range.cc +++ b/gcc/value-range.cc @@ -377,6 +377,280 @@ irange::set_nonnegative (tree type) wi::to_wide (TYPE_MAX_VALUE (type))); } +// Prange implementation. + +void +prange::accept (const vrange_visitor &v) const +{ + v.visit (*this); +} + +void +prange::set_nonnegative (tree type) +{ + set (type, + wi::zero (TYPE_PRECISION (type)), + wi::max_value (TYPE_PRECISION (type), UNSIGNED)); +} + +void +prange::set (tree min, tree max, value_range_kind kind) +{ + return set (TREE_TYPE (min), wi::to_wide (min), wi::to_wide (max), kind); +} + +void +prange::set (tree type, const wide_int &min, const wide_int &max, + value_range_kind kind) +{ + if (kind == VR_UNDEFINED) + { + set_undefined (); + return; + } + if (kind == VR_VARYING) + { + set_varying (type); + return; + } + if (kind == VR_ANTI_RANGE) + { + gcc_checking_assert (min == 0 && max == 0); + set_nonzero (type); + return; + } + m_type = type; + m_min = min; + m_max = max; + if (m_min == 0 && m_max == -1) + { + m_kind = VR_VARYING; + m_bitmask.set_unknown (TYPE_PRECISION (type)); + if (flag_checking) + verify_range (); + return; + } + + m_kind = VR_RANGE; + m_bitmask = get_bitmask_from_range (type, min, max); + if (flag_checking) + 
verify_range (); +} + +bool +prange::contains_p (const wide_int &w) const +{ + if (undefined_p ()) + return false; + + if (varying_p ()) + return true; + + return (wi::le_p (lower_bound (), w, UNSIGNED) + && wi::ge_p (upper_bound (), w, UNSIGNED)); +} + +bool +prange::singleton_p (tree *result) const +{ + if (m_kind == VR_RANGE && lower_bound () == upper_bound ()) + { + if (result) + *result = wide_int_to_tree (type (), m_min); + return true; + } + return false; +} + +bool +prange::union_ (const vrange &v) +{ + const prange &r = as_a <prange> (v); + + if (r.undefined_p ()) + return false; + if (undefined_p ()) + { + *this = r; + if (flag_checking) + verify_range (); + return true; + } + if (varying_p ()) + return false; + if (r.varying_p ()) + { + set_varying (type ()); + return true; + } + + wide_int new_lb = wi::min (r.lower_bound (), lower_bound (), UNSIGNED); + wide_int new_ub = wi::max (r.upper_bound (), upper_bound (), UNSIGNED); + prange new_range (type (), new_lb, new_ub); + new_range.m_bitmask.union_ (m_bitmask); + new_range.m_bitmask.union_ (r.m_bitmask); + if (new_range.varying_compatible_p ()) + { + set_varying (type ()); + return true; + } + if (flag_checking) + new_range.verify_range (); + if (new_range == *this) + return false; + *this = new_range; + return true; +} + +bool +prange::intersect (const vrange &v) +{ + const prange &r = as_a <prange> (v); + gcc_checking_assert (undefined_p () || r.undefined_p () + || range_compatible_p (type (), r.type ())); + + if (undefined_p ()) + return false; + if (r.undefined_p ()) + { + set_undefined (); + return true; + } + if (r.varying_p ()) + return false; + if (varying_p ()) + { + *this = r; + return true; + } + + prange save = *this; + m_min = wi::max (r.lower_bound (), lower_bound (), UNSIGNED); + m_max = wi::min (r.upper_bound (), upper_bound (), UNSIGNED); + if (wi::gt_p (m_min, m_max, UNSIGNED)) + { + set_undefined (); + return true; + } + + // Intersect all bitmasks: the old one, the new one, and the 
other operand's. + irange_bitmask new_bitmask = get_bitmask_from_range (m_type, m_min, m_max); + m_bitmask.intersect (new_bitmask); + m_bitmask.intersect (r.m_bitmask); + + if (flag_checking) + verify_range (); + if (*this == save) + return false; + return true; +} + +prange & +prange::operator= (const prange &src) +{ + m_type = src.m_type; + m_kind = src.m_kind; + m_min = src.m_min; + m_max = src.m_max; + m_bitmask = src.m_bitmask; + if (flag_checking) + verify_range (); + return *this; +} + +bool +prange::operator== (const prange &src) const +{ + if (m_kind == src.m_kind) + { + if (undefined_p ()) + return true; + + if (varying_p ()) + return types_compatible_p (type (), src.type ()); + + return (m_min == src.m_min && m_max == src.m_max + && m_bitmask == src.m_bitmask); + } + return false; +} + +void +prange::invert () +{ + gcc_checking_assert (!undefined_p () && !varying_p ()); + + wide_int new_lb, new_ub; + unsigned prec = TYPE_PRECISION (type ()); + wide_int type_min = wi::zero (prec); + wide_int type_max = wi::max_value (prec, UNSIGNED); + wi::overflow_type ovf; + + if (lower_bound () == type_min) + { + new_lb = wi::add (upper_bound (), 1, UNSIGNED, &ovf); + if (ovf) + new_lb = type_min; + new_ub = type_max; + set (type (), new_lb, new_ub); + } + else if (upper_bound () == type_max) + { + wi::overflow_type ovf; + new_lb = type_min; + new_ub = wi::sub (lower_bound (), 1, UNSIGNED, &ovf); + if (ovf) + new_ub = type_max; + set (type (), new_lb, new_ub); + } + else + set_varying (type ()); +} + +void +prange::verify_range () const +{ + gcc_checking_assert (m_discriminator == VR_PRANGE); + + if (m_kind == VR_UNDEFINED) + return; + + gcc_checking_assert (supports_p (type ())); + + if (m_kind == VR_VARYING) + { + gcc_checking_assert (varying_compatible_p ()); + return; + } + gcc_checking_assert (!varying_compatible_p ()); + gcc_checking_assert (m_kind == VR_RANGE); +} + +void +prange::update_bitmask (const irange_bitmask &bm) +{ + gcc_checking_assert (!undefined_p 
()); + + // If all the bits are known, this is a singleton. + if (bm.mask () == 0) + { + set (type (), m_bitmask.value (), m_bitmask.value ()); + return; + } + + // Drop VARYINGs with known bits to a plain range. + if (m_kind == VR_VARYING && !bm.unknown_p ()) + m_kind = VR_RANGE; + + m_bitmask = bm; + if (varying_compatible_p ()) + m_kind = VR_VARYING; + + if (flag_checking) + verify_range (); +} + + void frange::accept (const vrange_visitor &v) const { diff --git a/gcc/value-range.h b/gcc/value-range.h index dc5b153a83e..9fac89a2f98 100644 --- a/gcc/value-range.h +++ b/gcc/value-range.h @@ -47,6 +47,8 @@ enum value_range_discriminator { // Range holds an integer or pointer. VR_IRANGE, + // Pointer range. + VR_PRANGE, // Floating point range. VR_FRANGE, // Range holds an unsupported type. @@ -389,6 +391,54 @@ private: wide_int m_ranges[N*2]; }; +class prange : public vrange +{ + friend class prange_storage; + friend class vrange_printer; +public: + prange (); + prange (const prange &); + prange (tree type); + prange (tree type, const wide_int &, const wide_int &, + value_range_kind = VR_RANGE); + static bool supports_p (const_tree type); + virtual bool supports_type_p (const_tree type) const final override; + virtual void accept (const vrange_visitor &v) const final override; + virtual void set_undefined () final override; + virtual void set_varying (tree type) final override; + virtual void set_nonzero (tree type) final override; + virtual void set_zero (tree type) final override; + virtual void set_nonnegative (tree type) final override; + virtual bool contains_p (tree cst) const final override; + virtual bool fits_p (const vrange &v) const final override; + virtual bool singleton_p (tree *result = NULL) const final override; + virtual bool zero_p () const final override; + virtual bool nonzero_p () const final override; + virtual void set (tree, tree, value_range_kind = VR_RANGE) final override; + virtual tree type () const final override; + virtual bool union_ 
(const vrange &v) final override; + virtual bool intersect (const vrange &v) final override; + + prange& operator= (const prange &); + bool operator== (const prange &) const; + void set (tree type, const wide_int &, const wide_int &, + value_range_kind = VR_RANGE); + void invert (); + bool contains_p (const wide_int &) const; + wide_int lower_bound () const; + wide_int upper_bound () const; + void verify_range () const; + irange_bitmask get_bitmask () const; + void update_bitmask (const irange_bitmask &); +protected: + bool varying_compatible_p () const; + + tree m_type; + wide_int m_min; + wide_int m_max; + irange_bitmask m_bitmask; +}; + // Unsupported temporaries may be created by ranger before it's known // they're unsupported, or by vr_values::get_value_range. @@ -667,6 +717,7 @@ class vrange_visitor { public: virtual void visit (const irange &) const { } + virtual void visit (const prange &) const { } virtual void visit (const frange &) const { } virtual void visit (const unsupported_range &) const { } }; @@ -1196,6 +1247,151 @@ irange_val_max (const_tree type) return wi::max_value (TYPE_PRECISION (type), TYPE_SIGN (type)); } +inline +prange::prange () + : vrange (VR_PRANGE) +{ + set_undefined (); +} + +inline +prange::prange (const prange &r) + : vrange (VR_PRANGE) +{ + *this = r; +} + +inline +prange::prange (tree type) + : vrange (VR_PRANGE) +{ + set_varying (type); +} + +inline +prange::prange (tree type, const wide_int &lb, const wide_int &ub, + value_range_kind kind) + : vrange (VR_PRANGE) +{ + set (type, lb, ub, kind); +} + +inline bool +prange::supports_p (const_tree type) +{ + return POINTER_TYPE_P (type); +} + +inline bool +prange::supports_type_p (const_tree type) const +{ + return POINTER_TYPE_P (type); +} + +inline void +prange::set_undefined () +{ + m_kind = VR_UNDEFINED; +} + +inline void +prange::set_varying (tree type) +{ + m_kind = VR_VARYING; + m_type = type; + m_min = wi::zero (TYPE_PRECISION (type)); + m_max = wi::max_value 
(TYPE_PRECISION (type), UNSIGNED); + m_bitmask.set_unknown (TYPE_PRECISION (type)); + + if (flag_checking) + verify_range (); +} + +inline void +prange::set_nonzero (tree type) +{ + m_kind = VR_RANGE; + m_type = type; + m_min = wi::one (TYPE_PRECISION (type)); + m_max = wi::max_value (TYPE_PRECISION (type), UNSIGNED); + m_bitmask.set_unknown (TYPE_PRECISION (type)); + + if (flag_checking) + verify_range (); +} + +inline void +prange::set_zero (tree type) +{ + m_kind = VR_RANGE; + m_type = type; + wide_int zero = wi::zero (TYPE_PRECISION (type)); + m_min = m_max = zero; + m_bitmask = irange_bitmask (zero, zero); + + if (flag_checking) + verify_range (); +} + +inline bool +prange::contains_p (tree cst) const +{ + return contains_p (wi::to_wide (cst)); +} + +inline bool +prange::zero_p () const +{ + return m_kind == VR_RANGE && m_min == 0 && m_max == 0; +} + +inline bool +prange::nonzero_p () const +{ + return m_kind == VR_RANGE && m_min == 1 && m_max == -1; +} + +inline tree +prange::type () const +{ + gcc_checking_assert (!undefined_p ()); + return m_type; +} + +inline wide_int +prange::lower_bound () const +{ + gcc_checking_assert (!undefined_p ()); + return m_min; +} + +inline wide_int +prange::upper_bound () const +{ + gcc_checking_assert (!undefined_p ()); + return m_max; +} + +inline bool +prange::varying_compatible_p () const +{ + return (!undefined_p () + && m_min == 0 && m_max == -1 && get_bitmask ().unknown_p ()); +} + +inline irange_bitmask +prange::get_bitmask () const +{ + return m_bitmask; +} + +inline bool +prange::fits_p (const vrange &) const +{ + return true; +} + + inline frange::frange () : vrange (VR_FRANGE) -- 2.44.0 [-- Attachment #4: irange_get_bitmask_with_patch.s --] [-- Type: application/octet-stream, Size: 2925 bytes --] .globl _ZNK6irange11get_bitmaskEv .type _ZNK6irange11get_bitmaskEv, @function _ZNK6irange11get_bitmaskEv: .LFB3242: .cfi_startproc pushq %r13 .cfi_def_cfa_offset 16 .cfi_offset 13, -16 movq %rdi, %r13 pushq %r12 
.cfi_def_cfa_offset 24 .cfi_offset 12, -24 pushq %rbp .cfi_def_cfa_offset 32 .cfi_offset 6, -32 pushq %rbx .cfi_def_cfa_offset 40 .cfi_offset 3, -40 movq %rsi, %rbx subq $168, %rsp .cfi_def_cfa_offset 208 movzbl 10(%rsi), %eax movq 184(%rsi), %r12 leal -1(%rax,%rax), %eax leaq (%rax,%rax,4), %rbp salq $4, %rbp addq %r12, %rbp movdqu 0(%rbp), %xmm0 movaps %xmm0, 80(%rsp) movdqu 16(%rbp), %xmm0 movaps %xmm0, 96(%rsp) movdqu 32(%rbp), %xmm0 movaps %xmm0, 112(%rsp) movdqu 48(%rbp), %xmm0 movaps %xmm0, 128(%rsp) movdqu 64(%rbp), %xmm0 movaps %xmm0, 144(%rsp) movl 156(%rsp), %eax cmpl $576, %eax ja .L2460 .L2448: movdqu (%r12), %xmm0 movaps %xmm0, (%rsp) movdqu 16(%r12), %xmm0 movaps %xmm0, 16(%rsp) movdqu 32(%r12), %xmm0 movaps %xmm0, 32(%rsp) movdqu 48(%r12), %xmm0 movaps %xmm0, 48(%rsp) movdqu 64(%r12), %xmm0 movaps %xmm0, 64(%rsp) movl 76(%rsp), %eax cmpl $576, %eax ja .L2461 .L2449: movq (%rbx), %rax movq 16(%rax), %rax cmpq $_ZNK6irange4typeEv, %rax jne .L2450 movq 16(%rbx), %rax .L2451: movzwl 54(%rax), %esi leaq 80(%rsp), %rcx movq %rsp, %rdx movq %r13, %rdi call _ZL22get_bitmask_from_rangeP9tree_nodeRK16generic_wide_intI16wide_int_storageES5_.isra.0 cmpl $576, 76(%rsp) ja .L2462 cmpl $576, 156(%rsp) ja .L2463 .L2453: cmpl $576, 180(%rbx) movl 176(%rbx), %eax leaq 104(%rbx), %rdx ja .L2464 .L2455: cmpl $1, %eax jne .L2458 cmpq $-1, (%rdx) je .L2447 .L2458: leaq 24(%rbx), %rsi movq %r13, %rdi call _ZN14irange_bitmask9intersectERKS_ .L2447: addq $168, %rsp .cfi_remember_state .cfi_def_cfa_offset 40 movq %r13, %rax popq %rbx .cfi_def_cfa_offset 32 popq %rbp .cfi_def_cfa_offset 24 popq %r12 .cfi_def_cfa_offset 16 popq %r13 .cfi_def_cfa_offset 8 ret .p2align 4,,10 .p2align 3 .L2450: .cfi_restore_state movq %rbx, %rdi call *%rax jmp .L2451 .p2align 4,,10 .p2align 3 .L2461: leal 63(%rax), %edi shrl $6, %edi salq $3, %rdi call xmalloc movl 72(%rsp), %edx movq (%r12), %rsi movq %rax, %rdi movq %rax, (%rsp) salq $3, %rdx call memcpy jmp .L2449 .p2align 4,,10 .p2align 3 
.L2462: movq (%rsp), %rdi call free cmpl $576, 156(%rsp) jbe .L2453 .p2align 4,,10 .p2align 3 .L2463: movq 80(%rsp), %rdi call free jmp .L2453 .p2align 4,,10 .p2align 3 .L2464: movq 104(%rbx), %rdx jmp .L2455 .p2align 4,,10 .p2align 3 .L2460: leal 63(%rax), %edi shrl $6, %edi salq $3, %rdi call xmalloc movl 152(%rsp), %edx movq 0(%rbp), %rsi movq %rax, %rdi movq %rax, 80(%rsp) salq $3, %rdx call memcpy movq 184(%rbx), %r12 jmp .L2448 .cfi_endproc .LFE3242: .size _ZNK6irange11get_bitmaskEv, .-_ZNK6irange11get_bitmaskEv .section .rodata.str1.1 .LC38: .string "add_vrange" [-- Attachment #5: irange_get_bitmask_without_patch.s --] [-- Type: application/octet-stream, Size: 10013 bytes --] .globl _ZNK6irange11get_bitmaskEv .type _ZNK6irange11get_bitmaskEv, @function _ZNK6irange11get_bitmaskEv: .LFB3197: .cfi_startproc pushq %r15 .cfi_def_cfa_offset 16 .cfi_offset 15, -16 pushq %r14 .cfi_def_cfa_offset 24 .cfi_offset 14, -24 pushq %r13 .cfi_def_cfa_offset 32 .cfi_offset 13, -32 pushq %r12 .cfi_def_cfa_offset 40 .cfi_offset 12, -40 pushq %rbp .cfi_def_cfa_offset 48 .cfi_offset 6, -48 movq %rdi, %rbp pushq %rbx .cfi_def_cfa_offset 56 .cfi_offset 3, -56 movq %rsi, %rbx subq $584, %rsp .cfi_def_cfa_offset 640 movzbl 10(%rsi), %eax movq 184(%rsi), %r13 leal -1(%rax,%rax), %eax leaq (%rax,%rax,4), %r12 salq $4, %r12 addq %r13, %r12 movdqu (%r12), %xmm0 movaps %xmm0, 96(%rsp) movdqu 16(%r12), %xmm0 movaps %xmm0, 112(%rsp) movdqu 32(%r12), %xmm0 movaps %xmm0, 128(%rsp) movdqu 48(%r12), %xmm0 movaps %xmm0, 144(%rsp) movdqu 64(%r12), %xmm0 movaps %xmm0, 160(%rsp) movl 172(%rsp), %eax cmpl $576, %eax ja .L1610 .L1532: movdqu 0(%r13), %xmm0 movaps %xmm0, 16(%rsp) movdqu 16(%r13), %xmm0 movaps %xmm0, 32(%rsp) movdqu 32(%r13), %xmm0 movaps %xmm0, 48(%rsp) movdqu 48(%r13), %xmm0 movaps %xmm0, 64(%rsp) movdqu 64(%r13), %xmm0 movaps %xmm0, 80(%rsp) movl 92(%rsp), %eax cmpl $576, %eax ja .L1611 .L1533: movq (%rbx), %rax movq 16(%rax), %rax cmpq $_ZNK6irange4typeEv, %rax jne .L1534 movq 
16(%rbx), %rax .L1535: movl 92(%rsp), %r9d movzwl 54(%rax), %r14d movl 88(%rsp), %esi movl 172(%rsp), %r11d movl 168(%rsp), %r8d cmpl $576, %r9d ja .L1536 cmpl $576, %r11d ja .L1612 cmpl %r8d, %esi je .L1587 movl %r9d, 252(%rsp) leaq 16(%rsp), %r12 leaq 96(%rsp), %rcx leaq 176(%rsp), %r15 movq %r12, %rax movq %r15, %rdi .L1541: leal (%rsi,%r8), %edx cmpl $2, %edx jne .L1555 movl 252(%rsp), %edx movq (%rax), %rax xorq (%rcx), %rax movq %r15, %rcx movq %rax, (%rdi) movl $1, 248(%rsp) cmpl $576, %edx ja .L1613 .L1556: cmpq $0, (%rcx) jne .L1593 leaq 416(%rsp), %rax leaq 496(%rsp), %r13 movq %rax, (%rsp) .L1565: movq (%rsp), %rdi movq %r15, %rdx movq %r12, %rsi call _ZN2wi6bit_orI16generic_wide_intI16wide_int_storageES3_EENS_13binary_traitsIT_T0_XsrNS_10int_traitsIS5_EE14precision_typeEXsrNS7_IS6_EE14precision_typeEE11result_typeERKS5_RKS6_ leaq 352(%rsp), %rax movq $0, 352(%rsp) movq %rax, 336(%rsp) movl $1, 344(%rsp) movl %r14d, 348(%rsp) movl %r14d, 572(%rsp) cmpl $576, %r14d ja .L1614 movq $0, 496(%rsp) movl $1, %esi .L1582: movl $0, 76(%rbp) movq %rbp, %rdi movl $0, 156(%rbp) movl %esi, 568(%rsp) movq %r13, %rsi call _ZN16wide_int_storageaSERKS_.isra.0 movq (%rsp), %rsi leaq 80(%rbp), %rdi call _ZN16wide_int_storageaSERKS_.isra.0 movl global_options+3536(%rip), %eax testl %eax, %eax je .L1569 movl 156(%rbp), %eax cmpl %eax, 76(%rbp) jne .L1570 .L1569: cmpl $576, 572(%rsp) ja .L1615 .L1571: cmpl $576, 492(%rsp) ja .L1616 .L1572: cmpl $576, 252(%rsp) jbe .L1552 movq 176(%rsp), %rdi call free jmp .L1552 .p2align 4,,10 .p2align 3 .L1587: leaq 16(%rsp), %r12 leaq 96(%rsp), %rdi movq %r12, %rcx .L1540: xorl %eax, %eax jmp .L1545 .p2align 4,,10 .p2align 3 .L1618: addl $1, %eax cmpl %eax, %esi je .L1617 .L1545: movl %eax, %edx movq (%rdi,%rdx,8), %r10 cmpq %r10, (%rcx,%rdx,8) je .L1618 movl %r9d, 252(%rsp) cmpl $576, %r9d ja .L1543 leaq 176(%rsp), %r15 movq %r12, %rax movq %r15, %rdi .p2align 4,,10 .p2align 3 .L1554: leaq 96(%rsp), %rcx cmpl $576, %r11d jbe .L1541 
.p2align 4,,10 .p2align 3 .L1539: movq 96(%rsp), %rcx jmp .L1541 .p2align 4,,10 .p2align 3 .L1617: leaq 512(%rsp), %rax movl %r14d, 508(%rsp) movq $0, 512(%rsp) movq %rax, 496(%rsp) movl $1, 504(%rsp) movl %r14d, 332(%rsp) cmpl $576, %r14d ja .L1619 movl $1, %esi leaq 256(%rsp), %r13 movq $0, 256(%rsp) .L1583: leaq 336(%rsp), %r14 movl %esi, 328(%rsp) movq %r12, %rsi movq %r14, %rdi call _ZN16wide_int_storageC2ERKS_ movl $0, 76(%rbp) movq %r14, %rsi movq %rbp, %rdi movl $0, 156(%rbp) call _ZN16wide_int_storageaSERKS_.isra.0 leaq 80(%rbp), %rdi movq %r13, %rsi call _ZN16wide_int_storageaSERKS_.isra.0 movl global_options+3536(%rip), %edx testl %edx, %edx je .L1549 movl 156(%rbp), %eax cmpl %eax, 76(%rbp) jne .L1570 .L1549: cmpl $576, 412(%rsp) ja .L1620 .L1550: cmpl $576, 332(%rsp) ja .L1621 .L1552: cmpl $576, 92(%rsp) ja .L1622 .L1574: cmpl $576, 172(%rsp) ja .L1623 .L1575: cmpl $576, 180(%rbx) movl 176(%rbx), %eax leaq 104(%rbx), %rdx ja .L1624 .L1577: cmpl $1, %eax jne .L1580 cmpq $-1, (%rdx) je .L1531 .L1580: leaq 24(%rbx), %rsi movq %rbp, %rdi call _ZN14irange_bitmask9intersectERKS_ .L1531: addq $584, %rsp .cfi_remember_state .cfi_def_cfa_offset 56 movq %rbp, %rax popq %rbx .cfi_def_cfa_offset 48 popq %rbp .cfi_def_cfa_offset 40 popq %r12 .cfi_def_cfa_offset 32 popq %r13 .cfi_def_cfa_offset 24 popq %r14 .cfi_def_cfa_offset 16 popq %r15 .cfi_def_cfa_offset 8 ret .p2align 4,,10 .p2align 3 .L1534: .cfi_restore_state movq %rbx, %rdi call *%rax jmp .L1535 .p2align 4,,10 .p2align 3 .L1593: movl $1, %eax .L1564: movl %eax, 424(%rsp) leaq 416(%rsp), %rax leaq 496(%rsp), %r13 movq %rax, %rdi movq %rcx, 416(%rsp) movl %edx, 428(%rsp) movq %rax, (%rsp) call _ZN2wi3clzERK16generic_wide_intI20wide_int_ref_storageILb0ELb1EEE movl %r14d, %esi movl %r14d, 572(%rsp) subl %eax, %esi cmpl $576, %r14d ja .L1625 .L1558: movq %r13, %rdi .L1559: movl %r14d, %ecx xorl %edx, %edx call _ZN2wi4maskEPljbj movl 572(%rsp), %edx movl %eax, %ecx movl %eax, 568(%rsp) sall $6, %ecx cmpl %ecx, 
%edx jnb .L1560 movq %r13, %rcx cmpl $576, %edx ja .L1626 .L1561: subl $1, %eax andl $63, %edx leaq (%rcx,%rax,8), %rsi movl $64, %ecx movq (%rsi), %rax subl %edx, %ecx salq %cl, %rax sarq %cl, %rax movq %rax, (%rsi) .L1560: movq %r13, %rsi movq %r15, %rdi call _ZN16wide_int_storageaSERKS_.isra.0 cmpl $576, 572(%rsp) jbe .L1565 movq 496(%rsp), %rdi call free jmp .L1565 .p2align 4,,10 .p2align 3 .L1536: movq 16(%rsp), %rcx cmpl $576, %r11d ja .L1627 cmpl %r8d, %esi jne .L1581 leaq 96(%rsp), %rdi leaq 16(%rsp), %r12 jmp .L1540 .L1627: leaq 16(%rsp), %r12 cmpl %r8d, %esi je .L1538 .p2align 4,,10 .p2align 3 .L1581: movl %r9d, 252(%rsp) leaq 16(%rsp), %r12 .L1543: leal 63(%r9), %edi leaq 176(%rsp), %r15 shrl $6, %edi salq $3, %rdi call xmalloc movl 88(%rsp), %esi movl 252(%rsp), %r9d movq %rax, 176(%rsp) movl 92(%rsp), %edx movq %rax, %rdi movl 168(%rsp), %r8d movl 172(%rsp), %r11d cmpl $576, %r9d jbe .L1607 .L1553: movq %r12, %rax cmpl $576, %edx jbe .L1554 movq 16(%rsp), %rax jmp .L1554 .p2align 4,,10 .p2align 3 .L1624: movq 104(%rbx), %rdx jmp .L1577 .p2align 4,,10 .p2align 3 .L1623: movq 96(%rsp), %rdi call free jmp .L1575 .p2align 4,,10 .p2align 3 .L1622: movq 16(%rsp), %rdi call free jmp .L1574 .p2align 4,,10 .p2align 3 .L1610: leal 63(%rax), %edi shrl $6, %edi salq $3, %rdi call xmalloc movl 168(%rsp), %edx movq (%r12), %rsi movq %rax, %rdi movq %rax, 96(%rsp) salq $3, %rdx call memcpy movq 184(%rbx), %r13 jmp .L1532 .p2align 4,,10 .p2align 3 .L1611: leal 63(%rax), %edi shrl $6, %edi salq $3, %rdi call xmalloc movl 88(%rsp), %edx movq 0(%r13), %rsi movq %rax, %rdi movq %rax, 16(%rsp) salq $3, %rdx call memcpy jmp .L1533 .p2align 4,,10 .p2align 3 .L1612: cmpl %r8d, %esi je .L1586 leaq 176(%rsp), %r15 leaq 16(%rsp), %r12 movl %r9d, 252(%rsp) movq %r15, %rdi movq %r12, %rax jmp .L1539 .p2align 4,,10 .p2align 3 .L1614: leal 63(%r14), %edi shrl $6, %edi salq $3, %rdi call xmalloc movl 344(%rsp), %esi cmpl $576, 572(%rsp) movq %rax, 496(%rsp) movq 336(%rsp), %rdi jbe 
.L1628 .L1567: xorl %edx, %edx .p2align 4,,10 .p2align 3 .L1568: movq (%rdi,%rdx,8), %rcx movq %rcx, (%rax,%rdx,8) addq $1, %rdx cmpl %esi, %edx jb .L1568 jmp .L1582 .p2align 4,,10 .p2align 3 .L1621: movq 256(%rsp), %rdi call free jmp .L1552 .p2align 4,,10 .p2align 3 .L1620: movq 336(%rsp), %rdi call free jmp .L1550 .p2align 4,,10 .p2align 3 .L1619: leal 63(%r14), %edi leaq 256(%rsp), %r13 shrl $6, %edi salq $3, %rdi call xmalloc movl 504(%rsp), %esi cmpl $576, 332(%rsp) movq %rax, 256(%rsp) movq 496(%rsp), %rdi jbe .L1606 .L1547: xorl %edx, %edx .p2align 4,,10 .p2align 3 .L1548: movq (%rdi,%rdx,8), %rcx movq %rcx, (%rax,%rdx,8) addq $1, %rdx cmpl %esi, %edx jb .L1548 jmp .L1583 .p2align 4,,10 .p2align 3 .L1613: movq 176(%rsp), %rcx jmp .L1556 .p2align 4,,10 .p2align 3 .L1555: movl %esi, %edx movq %rax, %rsi call _ZN2wi9xor_largeEPlPKljS2_jj movl 252(%rsp), %edx movl %eax, 248(%rsp) cmpl $576, %edx ja .L1629 movq %r15, %rcx .L1557: cmpl $1, %eax jne .L1564 jmp .L1556 .p2align 4,,10 .p2align 3 .L1616: movq 416(%rsp), %rdi call free jmp .L1572 .p2align 4,,10 .p2align 3 .L1615: movq 496(%rsp), %rdi call free jmp .L1571 .L1586: leaq 16(%rsp), %r12 movq %r12, %rcx .L1538: movq 96(%rsp), %rdi jmp .L1540 .L1625: leal 63(%r14), %edi movl %esi, 12(%rsp) shrl $6, %edi salq $3, %rdi call xmalloc movl 12(%rsp), %esi cmpl $576, 572(%rsp) movq %rax, 496(%rsp) movq %rax, %rdi ja .L1559 jmp .L1558 .p2align 4,,10 .p2align 3 .L1626: movq 496(%rsp), %rcx jmp .L1561 .L1607: movq %r15, %rdi jmp .L1553 .L1628: movq %r13, %rax jmp .L1567 .L1629: movq 176(%rsp), %rcx jmp .L1557 .L1606: movq %r13, %rax jmp .L1547 .cfi_endproc .section .text.unlikely .cfi_startproc .type _ZNK6irange11get_bitmaskEv.cold, @function _ZNK6irange11get_bitmaskEv.cold: .LFSB3197: .L1570: .cfi_def_cfa_offset 640 .cfi_offset 3, -56 .cfi_offset 6, -48 .cfi_offset 12, -40 .cfi_offset 13, -32 .cfi_offset 14, -24 .cfi_offset 15, -16 call _ZNK14irange_bitmask11verify_maskEv.part.0 .cfi_endproc .LFE3197: .text .size 
_ZNK6irange11get_bitmaskEv, .-_ZNK6irange11get_bitmaskEv .section .text.unlikely .size _ZNK6irange11get_bitmaskEv.cold, .-_ZNK6irange11get_bitmaskEv.cold .LCOLDE33: .text .LHOTE33: .section .rodata.str1.1 .LC34: .string "add_vrange"
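For readers following the first patch, here is a minimal standalone sketch of the computation that get_bitmask_from_range() performs. It assumes a fixed 64-bit precision and uses plain uint64_t where GCC uses wide_int, so it shows none of the wide_int_storage overhead discussed in this thread. The names bitmask_sketch, clz64 and bitmask_from_range_sketch are hypothetical and are not part of GCC. As in the patch, a set bit in the mask means "unknown", which is why the singleton case returns an all-zero mask.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for irange_bitmask: bits set in `mask` are
// unknown; for the remaining bits, `value` holds the known bit.
struct bitmask_sketch
{
  uint64_t value;
  uint64_t mask;
};

// Count leading zero bits of a 64-bit value (returns 64 for zero).
static unsigned
clz64 (uint64_t x)
{
  unsigned n = 0;
  for (uint64_t bit = 1ULL << 63; bit != 0 && !(x & bit); bit >>= 1)
    ++n;
  return n;
}

// Sketch of the bitmask inherent in the range [min, max], assuming
// a precision of exactly 64 bits.
bitmask_sketch
bitmask_from_range_sketch (uint64_t min, uint64_t max)
{
  // All the bits of a singleton are known.
  if (min == max)
    return { min, 0 };

  // Every bit at or below the highest differing bit may vary.
  uint64_t xorv = min ^ max;
  unsigned width = 64 - clz64 (xorv);
  uint64_t low = width == 64 ? ~0ULL : ((1ULL << width) - 1);

  return { 0, min | low };
}
```

For example, the range [8, 11] gives a mask of 0b1011: bit 2 and every bit from 4 upwards are known to be zero, which matches the values 8, 9, 10 and 11.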
* Re: 1.76% performance loss in VRP due to inlining
From: Richard Biener @ 2024-04-30 7:57 UTC
To: Aldy Hernandez; +Cc: GCC Mailing List, MacLeod, Andrew

On Fri, Apr 26, 2024 at 11:45 AM Aldy Hernandez via Gcc <gcc@gcc.gnu.org> wrote:
>
> Hi folks!
>
> In implementing prange (pointer ranges), I have found a 1.74% slowdown
> in VRP, even without any code path actually using the code. I have
> tracked this down to irange::get_bitmask() being compiled differently
> with and without the bare bones patch. With the patch,
> irange::get_bitmask() has a lot of code inlined into it, particularly
> get_bitmask_from_range() and consequently the wide_int_storage code.
>
> I don't know whether this is expected behavior, and if it is, how to
> mitigate it. I have tried declaring get_bitmask_from_range() inline,
> but that didn't help. OTOH, using __attribute__((always_inline))
> helps a bit, but not entirely. What does help is inlining
> irange::get_bitmask() entirely, but that seems like a big hammer.

You can use -Winline to see why we don't inline an inline declared
function. I would guess the unit-growth limit kicks in?

Did you check a release checking compiler? That might still
inline things.

> The overall slowdown in compilation is 0.26%, because VRP is a
> relatively fast pass, but a measurable pass slowdown is something we'd
> like to avoid.
>
> What's the recommended approach here?
>
> For the curious, I am attaching before and after copies of
> value-range.s. I am also attaching the two patches needed to
> reproduce the problem on mainline. The first patch is merely setup.
> It is the second patch that exhibits the problem. Notice there are no
> uses of prange yet.
>
> Thanks.
> Aldy
* Re: 1.76% performance loss in VRP due to inlining
From: Aldy Hernandez @ 2024-04-30 8:21 UTC
To: Richard Biener; +Cc: GCC Mailing List, MacLeod, Andrew

On Tue, Apr 30, 2024 at 9:58 AM Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Fri, Apr 26, 2024 at 11:45 AM Aldy Hernandez via Gcc <gcc@gcc.gnu.org> wrote:
> >
> > Hi folks!
> >
> > In implementing prange (pointer ranges), I have found a 1.74% slowdown
> > in VRP, even without any code path actually using the code. I have
> > tracked this down to irange::get_bitmask() being compiled differently
> > with and without the bare bones patch. With the patch,
> > irange::get_bitmask() has a lot of code inlined into it, particularly
> > get_bitmask_from_range() and consequently the wide_int_storage code.
> >
> > I don't know whether this is expected behavior, and if it is, how to
> > mitigate it. I have tried declaring get_bitmask_from_range() inline,
> > but that didn't help. OTOH, using __attribute__((always_inline))
> > helps a bit, but not entirely. What does help is inlining
> > irange::get_bitmask() entirely, but that seems like a big hammer.
>
> You can use -Winline to see why we don't inline an inline declared
> function. I would guess the unit-growth limit kicks in?

Ah, will do. Thanks.

>
> Did you check a release checking compiler? That might still
> inline things.

Yes, we only measure performance with release builds.

Aldy
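As a small illustration of the knobs mentioned in this exchange, the sketch below contrasts a plain inline hint, always_inline, and noinline. It assumes GCC or Clang, since the attributes are compiler extensions, and the function names are made up for the example; none of this is taken from value-range.cc.

```cpp
#include <cassert>

// Plain `inline` is only a hint: the inliner's heuristics and growth
// limits still decide whether the body is copied into callers.
inline int
hinted_add (int a, int b)
{
  return a + b;
}

// GCC/Clang extension: request inlining regardless of the usual
// heuristics.
__attribute__ ((always_inline)) inline int
forced_add (int a, int b)
{
  return a + b;
}

// GCC/Clang extension: keep this function out of line, which is handy
// when comparing inlined and non-inlined builds of the same code.
__attribute__ ((noinline)) int
outlined_add (int a, int b)
{
  return a + b;
}

// Caller exercising all three variants.
int
sum_three_ways (int a, int b)
{
  return hinted_add (a, b) + forced_add (a, b) + outlined_add (a, b);
}
```

Building a translation unit like this with something along the lines of g++ -O2 -Winline asks GCC to warn when a function declared inline is not inlined. In a file this small no warning is expected; the flag becomes informative in a large unit such as value-range.cc.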
* Re: 1.76% performance loss in VRP due to inlining 2024-04-26 9:42 1.76% performance loss in VRP due to inlining Aldy Hernandez 2024-04-30 7:57 ` Richard Biener @ 2024-04-30 8:53 ` Martin Jambor 2024-04-30 19:09 ` Jason Merrill 2 siblings, 0 replies; 11+ messages in thread From: Martin Jambor @ 2024-04-30 8:53 UTC (permalink / raw) To: Aldy Hernandez, GCC Mailing List; +Cc: MacLeod, Andrew Hi, On Fri, Apr 26 2024, Aldy Hernandez via Gcc wrote: > Hi folks! > > In implementing prange (pointer ranges), I have found a 1.74% slowdown > in VRP, even without any code path actually using the code. I have > tracked this down to irange::get_bitmask() being compiled differently > with and without the bare bones patch. With the patch, > irange::get_bitmask() has a lot of code inlined into it, particularly > get_bitmask_from_range() and consequently the wide_int_storage code. > > I don't know whether this is expected behavior, and if it is, how to > mitigate it. I have tried declaring get_bitmask_from_range() inline, > but that didn't help. OTOH, using __attribute__((always_inline)) > helps a bit, but not entirely. What does help is inlining > irange::get_bitmask() entirely, but that seems like a big hammer. > > The overall slowdown in compilation is 0.26%, because VRP is a > relatively fast pass, but a measurable pass slowdown is something we'd > like to avoid. > > What's the recommended approach here? I'm afraid that the right approach (not sure if that also means the recommended approach) is to figure out why inlining irange::get_bitmask() helps, i.e. what unnecessary computations or memory accesses it avoids or which other subsequent optimizations it enables, etc. Then we can have a look whether IPA could facilitate this without inlining (or if eventually code shrinks to a reasonable size, how to teach the inliner to predict this). Martin > > For the curious, I am attaching before and after copies of > value-range.s. 
I am also attaching the two patches needed to > reproduce the problem on mainline. The first patch is merely setup. > It is the second patch that exhibits the problem. Notice there are no > uses of prange yet. > > Thanks. > Aldy > From ee63833c5f56064ef47c2bb9debd485f77d00171 Mon Sep 17 00:00:00 2001 > From: Aldy Hernandez <aldyh@redhat.com> > Date: Tue, 19 Mar 2024 18:04:55 +0100 > Subject: [PATCH] Move get_bitmask_from_range out of irange class. > > --- > gcc/value-range.cc | 52 +++++++++++++++++++++++----------------------- > gcc/value-range.h | 1 - > 2 files changed, 26 insertions(+), 27 deletions(-) > > diff --git a/gcc/value-range.cc b/gcc/value-range.cc > index 70375f7abf9..0f81ce32615 100644 > --- a/gcc/value-range.cc > +++ b/gcc/value-range.cc > @@ -31,6 +31,30 @@ along with GCC; see the file COPYING3. If not see > #include "fold-const.h" > #include "gimple-range.h" > > +// Return the bitmask inherent in a range. > + > +static irange_bitmask > +get_bitmask_from_range (tree type, > + const wide_int &min, const wide_int &max) > +{ > + unsigned prec = TYPE_PRECISION (type); > + > + // All the bits of a singleton are known. > + if (min == max) > + { > + wide_int mask = wi::zero (prec); > + wide_int value = min; > + return irange_bitmask (value, mask); > + } > + > + wide_int xorv = min ^ max; > + > + if (xorv != 0) > + xorv = wi::mask (prec - wi::clz (xorv), false, prec); > + > + return irange_bitmask (wi::zero (prec), min | xorv); > +} > + > void > irange::accept (const vrange_visitor &v) const > { > @@ -1832,31 +1856,6 @@ irange::invert () > verify_range (); > } > > -// Return the bitmask inherent in the range. > - > -irange_bitmask > -irange::get_bitmask_from_range () const > -{ > - unsigned prec = TYPE_PRECISION (type ()); > - wide_int min = lower_bound (); > - wide_int max = upper_bound (); > - > - // All the bits of a singleton are known. 
> - if (min == max) > - { > - wide_int mask = wi::zero (prec); > - wide_int value = lower_bound (); > - return irange_bitmask (value, mask); > - } > - > - wide_int xorv = min ^ max; > - > - if (xorv != 0) > - xorv = wi::mask (prec - wi::clz (xorv), false, prec); > - > - return irange_bitmask (wi::zero (prec), min | xorv); > -} > - > // Remove trailing ranges that this bitmask indicates can't exist. > > void > @@ -1978,7 +1977,8 @@ irange::get_bitmask () const > // in the mask. > // > // See also the note in irange_bitmask::intersect. > - irange_bitmask bm = get_bitmask_from_range (); > + irange_bitmask bm > + = get_bitmask_from_range (type (), lower_bound (), upper_bound ()); > if (!m_bitmask.unknown_p ()) > bm.intersect (m_bitmask); > return bm; > diff --git a/gcc/value-range.h b/gcc/value-range.h > index 9531df56988..dc5b153a83e 100644 > --- a/gcc/value-range.h > +++ b/gcc/value-range.h > @@ -351,7 +351,6 @@ private: > bool varying_compatible_p () const; > bool intersect_bitmask (const irange &r); > bool union_bitmask (const irange &r); > - irange_bitmask get_bitmask_from_range () const; > bool set_range_from_bitmask (); > > bool intersect (const wide_int& lb, const wide_int& ub); > -- > 2.44.0 > > From 03c70de43177a97ec5e9c243aafc798c1f37e6d8 Mon Sep 17 00:00:00 2001 > From: Aldy Hernandez <aldyh@redhat.com> > Date: Wed, 20 Mar 2024 06:25:52 +0100 > Subject: [PATCH] Implement minimum prange class exhibiting VRP slowdown. 
> > --- > gcc/value-range-pretty-print.cc | 25 +++ > gcc/value-range-pretty-print.h | 1 + > gcc/value-range.cc | 274 ++++++++++++++++++++++++++++++++ > gcc/value-range.h | 196 +++++++++++++++++++++++ > 4 files changed, 496 insertions(+) > > diff --git a/gcc/value-range-pretty-print.cc b/gcc/value-range-pretty-print.cc > index c75cbea3955..154253e047f 100644 > --- a/gcc/value-range-pretty-print.cc > +++ b/gcc/value-range-pretty-print.cc > @@ -113,6 +113,31 @@ vrange_printer::print_irange_bitmasks (const irange &r) const > pp_string (pp, p); > } > > +void > +vrange_printer::visit (const prange &r) const > +{ > + pp_string (pp, "[prange] "); > + if (r.undefined_p ()) > + { > + pp_string (pp, "UNDEFINED"); > + return; > + } > + dump_generic_node (pp, r.type (), 0, TDF_NONE | TDF_NOUID, false); > + pp_character (pp, ' '); > + if (r.varying_p ()) > + { > + pp_string (pp, "VARYING"); > + return; > + } > + > + pp_character (pp, '['); > + //print_int_bound (pp, r.lower_bound (), r.type ()); > + pp_string (pp, ", "); > + //print_int_bound (pp, r.upper_bound (), r.type ()); > + pp_character (pp, ']'); > + //print_irange_bitmasks (pp, r.m_bitmask); > +} > + > void > vrange_printer::print_real_value (tree type, const REAL_VALUE_TYPE &r) const > { > diff --git a/gcc/value-range-pretty-print.h b/gcc/value-range-pretty-print.h > index ca85fd6157c..54ee0cf8c26 100644 > --- a/gcc/value-range-pretty-print.h > +++ b/gcc/value-range-pretty-print.h > @@ -27,6 +27,7 @@ public: > vrange_printer (pretty_printer *pp_) : pp (pp_) { } > void visit (const unsupported_range &) const override; > void visit (const irange &) const override; > + void visit (const prange &) const override; > void visit (const frange &) const override; > private: > void print_irange_bound (const wide_int &w, tree type) const; > diff --git a/gcc/value-range.cc b/gcc/value-range.cc > index 0f81ce32615..06ab1a616bf 100644 > --- a/gcc/value-range.cc > +++ b/gcc/value-range.cc > @@ -377,6 +377,280 @@ 
irange::set_nonnegative (tree type) > wi::to_wide (TYPE_MAX_VALUE (type))); > } > > +// Prange implementation. > + > +void > +prange::accept (const vrange_visitor &v) const > +{ > + v.visit (*this); > +} > + > +void > +prange::set_nonnegative (tree type) > +{ > + set (type, > + wi::zero (TYPE_PRECISION (type)), > + wi::max_value (TYPE_PRECISION (type), UNSIGNED)); > +} > + > +void > +prange::set (tree min, tree max, value_range_kind kind) > +{ > + return set (TREE_TYPE (min), wi::to_wide (min), wi::to_wide (max), kind); > +} > + > +void > +prange::set (tree type, const wide_int &min, const wide_int &max, > + value_range_kind kind) > +{ > + if (kind == VR_UNDEFINED) > + { > + set_undefined (); > + return; > + } > + if (kind == VR_VARYING) > + { > + set_varying (type); > + return; > + } > + if (kind == VR_ANTI_RANGE) > + { > + gcc_checking_assert (min == 0 && max == 0); > + set_nonzero (type); > + return; > + } > + m_type = type; > + m_min = min; > + m_max = max; > + if (m_min == 0 && m_max == -1) > + { > + m_kind = VR_VARYING; > + m_bitmask.set_unknown (TYPE_PRECISION (type)); > + if (flag_checking) > + verify_range (); > + return; > + } > + > + m_kind = VR_RANGE; > + m_bitmask = get_bitmask_from_range (type, min, max); > + if (flag_checking) > + verify_range (); > +} > + > +bool > +prange::contains_p (const wide_int &w) const > +{ > + if (undefined_p ()) > + return false; > + > + if (varying_p ()) > + return true; > + > + return (wi::le_p (lower_bound (), w, UNSIGNED) > + && wi::ge_p (upper_bound (), w, UNSIGNED)); > +} > + > +bool > +prange::singleton_p (tree *result) const > +{ > + if (m_kind == VR_RANGE && lower_bound () == upper_bound ()) > + { > + if (result) > + *result = wide_int_to_tree (type (), m_min); > + return true; > + } > + return false; > +} > + > +bool > +prange::union_ (const vrange &v) > +{ > + const prange &r = as_a <prange> (v); > + > + if (r.undefined_p ()) > + return false; > + if (undefined_p ()) > + { > + *this = r; > + if (flag_checking) > 
+ verify_range (); > + return true; > + } > + if (varying_p ()) > + return false; > + if (r.varying_p ()) > + { > + set_varying (type ()); > + return true; > + } > + > + wide_int new_lb = wi::min (r.lower_bound (), lower_bound (), UNSIGNED); > + wide_int new_ub = wi::max (r.upper_bound (), upper_bound (), UNSIGNED); > + prange new_range (type (), new_lb, new_ub); > + new_range.m_bitmask.union_ (m_bitmask); > + new_range.m_bitmask.union_ (r.m_bitmask); > + if (new_range.varying_compatible_p ()) > + { > + set_varying (type ()); > + return true; > + } > + if (flag_checking) > + new_range.verify_range (); > + if (new_range == *this) > + return false; > + *this = new_range; > + return true; > +} > + > +bool > +prange::intersect (const vrange &v) > +{ > + const prange &r = as_a <prange> (v); > + gcc_checking_assert (undefined_p () || r.undefined_p () > + || range_compatible_p (type (), r.type ())); > + > + if (undefined_p ()) > + return false; > + if (r.undefined_p ()) > + { > + set_undefined (); > + return true; > + } > + if (r.varying_p ()) > + return false; > + if (varying_p ()) > + { > + *this = r; > + return true; > + } > + > + prange save = *this; > + m_min = wi::max (r.lower_bound (), lower_bound (), UNSIGNED); > + m_max = wi::min (r.upper_bound (), upper_bound (), UNSIGNED); > + if (wi::gt_p (m_min, m_max, UNSIGNED)) > + { > + set_undefined (); > + return true; > + } > + > + // Intersect all bitmasks: the old one, the new one, and the other operand's. 
> + irange_bitmask new_bitmask = get_bitmask_from_range (m_type, m_min, m_max); > + m_bitmask.intersect (new_bitmask); > + m_bitmask.intersect (r.m_bitmask); > + > + if (flag_checking) > + verify_range (); > + if (*this == save) > + return false; > + return true; > +} > + > +prange & > +prange::operator= (const prange &src) > +{ > + m_type = src.m_type; > + m_kind = src.m_kind; > + m_min = src.m_min; > + m_max = src.m_max; > + m_bitmask = src.m_bitmask; > + if (flag_checking) > + verify_range (); > + return *this; > +} > + > +bool > +prange::operator== (const prange &src) const > +{ > + if (m_kind == src.m_kind) > + { > + if (undefined_p ()) > + return true; > + > + if (varying_p ()) > + return types_compatible_p (type (), src.type ()); > + > + return (m_min == src.m_min && m_max == src.m_max > + && m_bitmask == src.m_bitmask); > + } > + return false; > +} > + > +void > +prange::invert () > +{ > + gcc_checking_assert (!undefined_p () && !varying_p ()); > + > + wide_int new_lb, new_ub; > + unsigned prec = TYPE_PRECISION (type ()); > + wide_int type_min = wi::zero (prec); > + wide_int type_max = wi::max_value (prec, UNSIGNED); > + wi::overflow_type ovf; > + > + if (lower_bound () == type_min) > + { > + new_lb = wi::add (upper_bound (), 1, UNSIGNED, &ovf); > + if (ovf) > + new_lb = type_min; > + new_ub = type_max; > + set (type (), new_lb, new_ub); > + } > + else if (upper_bound () == type_max) > + { > + wi::overflow_type ovf; > + new_lb = type_min; > + new_ub = wi::sub (lower_bound (), 1, UNSIGNED, &ovf); > + if (ovf) > + new_ub = type_max; > + set (type (), new_lb, new_ub); > + } > + else > + set_varying (type ()); > +} > + > +void > +prange::verify_range () const > +{ > + gcc_checking_assert (m_discriminator == VR_PRANGE); > + > + if (m_kind == VR_UNDEFINED) > + return; > + > + gcc_checking_assert (supports_p (type ())); > + > + if (m_kind == VR_VARYING) > + { > + gcc_checking_assert (varying_compatible_p ()); > + return; > + } > + gcc_checking_assert 
(!varying_compatible_p ()); > + gcc_checking_assert (m_kind == VR_RANGE); > +} > + > +void > +prange::update_bitmask (const irange_bitmask &bm) > +{ > + gcc_checking_assert (!undefined_p ()); > + > + // If all the bits are known, this is a singleton. > + if (bm.mask () == 0) > + { > + set (type (), m_bitmask.value (), m_bitmask.value ()); > + return; > + } > + > + // Drop VARYINGs with known bits to a plain range. > + if (m_kind == VR_VARYING && !bm.unknown_p ()) > + m_kind = VR_RANGE; > + > + m_bitmask = bm; > + if (varying_compatible_p ()) > + m_kind = VR_VARYING; > + > + if (flag_checking) > + verify_range (); > +} > + > + > void > frange::accept (const vrange_visitor &v) const > { > diff --git a/gcc/value-range.h b/gcc/value-range.h > index dc5b153a83e..9fac89a2f98 100644 > --- a/gcc/value-range.h > +++ b/gcc/value-range.h > @@ -47,6 +47,8 @@ enum value_range_discriminator > { > // Range holds an integer or pointer. > VR_IRANGE, > + // Pointer range. > + VR_PRANGE, > // Floating point range. > VR_FRANGE, > // Range holds an unsupported type. 
> @@ -389,6 +391,54 @@ private: > wide_int m_ranges[N*2]; > }; > > +class prange : public vrange > +{ > + friend class prange_storage; > + friend class vrange_printer; > +public: > + prange (); > + prange (const prange &); > + prange (tree type); > + prange (tree type, const wide_int &, const wide_int &, > + value_range_kind = VR_RANGE); > + static bool supports_p (const_tree type); > + virtual bool supports_type_p (const_tree type) const final override; > + virtual void accept (const vrange_visitor &v) const final override; > + virtual void set_undefined () final override; > + virtual void set_varying (tree type) final override; > + virtual void set_nonzero (tree type) final override; > + virtual void set_zero (tree type) final override; > + virtual void set_nonnegative (tree type) final override; > + virtual bool contains_p (tree cst) const final override; > + virtual bool fits_p (const vrange &v) const final override; > + virtual bool singleton_p (tree *result = NULL) const final override; > + virtual bool zero_p () const final override; > + virtual bool nonzero_p () const final override; > + virtual void set (tree, tree, value_range_kind = VR_RANGE) final override; > + virtual tree type () const final override; > + virtual bool union_ (const vrange &v) final override; > + virtual bool intersect (const vrange &v) final override; > + > + prange& operator= (const prange &); > + bool operator== (const prange &) const; > + void set (tree type, const wide_int &, const wide_int &, > + value_range_kind = VR_RANGE); > + void invert (); > + bool contains_p (const wide_int &) const; > + wide_int lower_bound () const; > + wide_int upper_bound () const; > + void verify_range () const; > + irange_bitmask get_bitmask () const; > + void update_bitmask (const irange_bitmask &); > +protected: > + bool varying_compatible_p () const; > + > + tree m_type; > + wide_int m_min; > + wide_int m_max; > + irange_bitmask m_bitmask; > +}; > + > // Unsupported temporaries may be created by 
ranger before it's known > // they're unsupported, or by vr_values::get_value_range. > > @@ -667,6 +717,7 @@ class vrange_visitor > { > public: > virtual void visit (const irange &) const { } > + virtual void visit (const prange &) const { } > virtual void visit (const frange &) const { } > virtual void visit (const unsupported_range &) const { } > }; > @@ -1196,6 +1247,151 @@ irange_val_max (const_tree type) > return wi::max_value (TYPE_PRECISION (type), TYPE_SIGN (type)); > } > > +inline > +prange::prange () > + : vrange (VR_PRANGE) > +{ > + set_undefined (); > +} > + > +inline > +prange::prange (const prange &r) > + : vrange (VR_PRANGE) > +{ > + *this = r; > +} > + > +inline > +prange::prange (tree type) > + : vrange (VR_PRANGE) > +{ > + set_varying (type); > +} > + > +inline > +prange::prange (tree type, const wide_int &lb, const wide_int &ub, > + value_range_kind kind) > + : vrange (VR_PRANGE) > +{ > + set (type, lb, ub, kind); > +} > + > +inline bool > +prange::supports_p (const_tree type) > +{ > + return POINTER_TYPE_P (type); > +} > + > +inline bool > +prange::supports_type_p (const_tree type) const > +{ > + return POINTER_TYPE_P (type); > +} > + > +inline void > +prange::set_undefined () > +{ > + m_kind = VR_UNDEFINED; > +} > + > +inline void > +prange::set_varying (tree type) > +{ > + m_kind = VR_VARYING; > + m_type = type; > + m_min = wi::zero (TYPE_PRECISION (type)); > + m_max = wi::max_value (TYPE_PRECISION (type), UNSIGNED); > + m_bitmask.set_unknown (TYPE_PRECISION (type)); > + > + if (flag_checking) > + verify_range (); > +} > + > +inline void > +prange::set_nonzero (tree type) > +{ > + m_kind = VR_RANGE; > + m_type = type; > + m_min = wi::one (TYPE_PRECISION (type)); > + m_max = wi::max_value (TYPE_PRECISION (type), UNSIGNED); > + m_bitmask.set_unknown (TYPE_PRECISION (type)); > + > + if (flag_checking) > + verify_range (); > +} > + > +inline void > +prange::set_zero (tree type) > +{ > + m_kind = VR_RANGE; > + m_type = type; > + wide_int zero = 
wi::zero (TYPE_PRECISION (type)); > + m_min = m_max = zero; > + m_bitmask = irange_bitmask (zero, zero); > + > + if (flag_checking) > + verify_range (); > +} > + > +inline bool > +prange::contains_p (tree cst) const > +{ > + return contains_p (wi::to_wide (cst)); > +} > + > +inline bool > +prange::zero_p () const > +{ > + return m_kind == VR_RANGE && m_min == 0 && m_max == 0; > +} > + > +inline bool > +prange::nonzero_p () const > +{ > + return m_kind == VR_RANGE && m_min == 1 && m_max == -1; > +} > + > +inline tree > +prange::type () const > +{ > + gcc_checking_assert (!undefined_p ()); > + return m_type; > +} > + > +inline wide_int > +prange::lower_bound () const > +{ > + gcc_checking_assert (!undefined_p ()); > + return m_min; > +} > + > +inline wide_int > +prange::upper_bound () const > +{ > + gcc_checking_assert (!undefined_p ()); > + return m_max; > +} > + > +inline bool > +prange::varying_compatible_p () const > +{ > + return (!undefined_p () > + && m_min == 0 && m_max == -1 && get_bitmask ().unknown_p ()); > +} > + > +inline irange_bitmask > +prange::get_bitmask () const > +{ > + return m_bitmask; > +} > + > +inline bool > +prange::fits_p (const vrange &) const > +{ > + return true; > +} > + > + > inline > frange::frange () > : vrange (VR_FRANGE) > -- > 2.44.0 > > .globl _ZNK6irange11get_bitmaskEv > .type _ZNK6irange11get_bitmaskEv, @function > _ZNK6irange11get_bitmaskEv: > .LFB3242: > .cfi_startproc > pushq %r13 > .cfi_def_cfa_offset 16 > .cfi_offset 13, -16 > movq %rdi, %r13 > pushq %r12 > .cfi_def_cfa_offset 24 > .cfi_offset 12, -24 > pushq %rbp > .cfi_def_cfa_offset 32 > .cfi_offset 6, -32 > pushq %rbx > .cfi_def_cfa_offset 40 > .cfi_offset 3, -40 > movq %rsi, %rbx > subq $168, %rsp > .cfi_def_cfa_offset 208 > movzbl 10(%rsi), %eax > movq 184(%rsi), %r12 > leal -1(%rax,%rax), %eax > leaq (%rax,%rax,4), %rbp > salq $4, %rbp > addq %r12, %rbp > movdqu 0(%rbp), %xmm0 > movaps %xmm0, 80(%rsp) > movdqu 16(%rbp), %xmm0 > movaps %xmm0, 96(%rsp) > movdqu 
32(%rbp), %xmm0 > movaps %xmm0, 112(%rsp) > movdqu 48(%rbp), %xmm0 > movaps %xmm0, 128(%rsp) > movdqu 64(%rbp), %xmm0 > movaps %xmm0, 144(%rsp) > movl 156(%rsp), %eax > cmpl $576, %eax > ja .L2460 > .L2448: > movdqu (%r12), %xmm0 > movaps %xmm0, (%rsp) > movdqu 16(%r12), %xmm0 > movaps %xmm0, 16(%rsp) > movdqu 32(%r12), %xmm0 > movaps %xmm0, 32(%rsp) > movdqu 48(%r12), %xmm0 > movaps %xmm0, 48(%rsp) > movdqu 64(%r12), %xmm0 > movaps %xmm0, 64(%rsp) > movl 76(%rsp), %eax > cmpl $576, %eax > ja .L2461 > .L2449: > movq (%rbx), %rax > movq 16(%rax), %rax > cmpq $_ZNK6irange4typeEv, %rax > jne .L2450 > movq 16(%rbx), %rax > .L2451: > movzwl 54(%rax), %esi > leaq 80(%rsp), %rcx > movq %rsp, %rdx > movq %r13, %rdi > call _ZL22get_bitmask_from_rangeP9tree_nodeRK16generic_wide_intI16wide_int_storageES5_.isra.0 > cmpl $576, 76(%rsp) > ja .L2462 > cmpl $576, 156(%rsp) > ja .L2463 > .L2453: > cmpl $576, 180(%rbx) > movl 176(%rbx), %eax > leaq 104(%rbx), %rdx > ja .L2464 > .L2455: > cmpl $1, %eax > jne .L2458 > cmpq $-1, (%rdx) > je .L2447 > .L2458: > leaq 24(%rbx), %rsi > movq %r13, %rdi > call _ZN14irange_bitmask9intersectERKS_ > .L2447: > addq $168, %rsp > .cfi_remember_state > .cfi_def_cfa_offset 40 > movq %r13, %rax > popq %rbx > .cfi_def_cfa_offset 32 > popq %rbp > .cfi_def_cfa_offset 24 > popq %r12 > .cfi_def_cfa_offset 16 > popq %r13 > .cfi_def_cfa_offset 8 > ret > .p2align 4,,10 > .p2align 3 > .L2450: > .cfi_restore_state > movq %rbx, %rdi > call *%rax > jmp .L2451 > .p2align 4,,10 > .p2align 3 > .L2461: > leal 63(%rax), %edi > shrl $6, %edi > salq $3, %rdi > call xmalloc > movl 72(%rsp), %edx > movq (%r12), %rsi > movq %rax, %rdi > movq %rax, (%rsp) > salq $3, %rdx > call memcpy > jmp .L2449 > .p2align 4,,10 > .p2align 3 > .L2462: > movq (%rsp), %rdi > call free > cmpl $576, 156(%rsp) > jbe .L2453 > .p2align 4,,10 > .p2align 3 > .L2463: > movq 80(%rsp), %rdi > call free > jmp .L2453 > .p2align 4,,10 > .p2align 3 > .L2464: > movq 104(%rbx), %rdx > jmp .L2455 > .p2align 
4,,10 > .p2align 3 > .L2460: > leal 63(%rax), %edi > shrl $6, %edi > salq $3, %rdi > call xmalloc > movl 152(%rsp), %edx > movq 0(%rbp), %rsi > movq %rax, %rdi > movq %rax, 80(%rsp) > salq $3, %rdx > call memcpy > movq 184(%rbx), %r12 > jmp .L2448 > .cfi_endproc > .LFE3242: > .size _ZNK6irange11get_bitmaskEv, .-_ZNK6irange11get_bitmaskEv > .section .rodata.str1.1 > .LC38: > .string "add_vrange" > .globl _ZNK6irange11get_bitmaskEv > .type _ZNK6irange11get_bitmaskEv, @function > _ZNK6irange11get_bitmaskEv: > .LFB3197: > .cfi_startproc > pushq %r15 > .cfi_def_cfa_offset 16 > .cfi_offset 15, -16 > pushq %r14 > .cfi_def_cfa_offset 24 > .cfi_offset 14, -24 > pushq %r13 > .cfi_def_cfa_offset 32 > .cfi_offset 13, -32 > pushq %r12 > .cfi_def_cfa_offset 40 > .cfi_offset 12, -40 > pushq %rbp > .cfi_def_cfa_offset 48 > .cfi_offset 6, -48 > movq %rdi, %rbp > pushq %rbx > .cfi_def_cfa_offset 56 > .cfi_offset 3, -56 > movq %rsi, %rbx > subq $584, %rsp > .cfi_def_cfa_offset 640 > movzbl 10(%rsi), %eax > movq 184(%rsi), %r13 > leal -1(%rax,%rax), %eax > leaq (%rax,%rax,4), %r12 > salq $4, %r12 > addq %r13, %r12 > movdqu (%r12), %xmm0 > movaps %xmm0, 96(%rsp) > movdqu 16(%r12), %xmm0 > movaps %xmm0, 112(%rsp) > movdqu 32(%r12), %xmm0 > movaps %xmm0, 128(%rsp) > movdqu 48(%r12), %xmm0 > movaps %xmm0, 144(%rsp) > movdqu 64(%r12), %xmm0 > movaps %xmm0, 160(%rsp) > movl 172(%rsp), %eax > cmpl $576, %eax > ja .L1610 > .L1532: > movdqu 0(%r13), %xmm0 > movaps %xmm0, 16(%rsp) > movdqu 16(%r13), %xmm0 > movaps %xmm0, 32(%rsp) > movdqu 32(%r13), %xmm0 > movaps %xmm0, 48(%rsp) > movdqu 48(%r13), %xmm0 > movaps %xmm0, 64(%rsp) > movdqu 64(%r13), %xmm0 > movaps %xmm0, 80(%rsp) > movl 92(%rsp), %eax > cmpl $576, %eax > ja .L1611 > .L1533: > movq (%rbx), %rax > movq 16(%rax), %rax > cmpq $_ZNK6irange4typeEv, %rax > jne .L1534 > movq 16(%rbx), %rax > .L1535: > movl 92(%rsp), %r9d > movzwl 54(%rax), %r14d > movl 88(%rsp), %esi > movl 172(%rsp), %r11d > movl 168(%rsp), %r8d > cmpl $576, %r9d > ja 
.L1536 > cmpl $576, %r11d > ja .L1612 > cmpl %r8d, %esi > je .L1587 > movl %r9d, 252(%rsp) > leaq 16(%rsp), %r12 > leaq 96(%rsp), %rcx > leaq 176(%rsp), %r15 > movq %r12, %rax > movq %r15, %rdi > .L1541: > leal (%rsi,%r8), %edx > cmpl $2, %edx > jne .L1555 > movl 252(%rsp), %edx > movq (%rax), %rax > xorq (%rcx), %rax > movq %r15, %rcx > movq %rax, (%rdi) > movl $1, 248(%rsp) > cmpl $576, %edx > ja .L1613 > .L1556: > cmpq $0, (%rcx) > jne .L1593 > leaq 416(%rsp), %rax > leaq 496(%rsp), %r13 > movq %rax, (%rsp) > .L1565: > movq (%rsp), %rdi > movq %r15, %rdx > movq %r12, %rsi > call _ZN2wi6bit_orI16generic_wide_intI16wide_int_storageES3_EENS_13binary_traitsIT_T0_XsrNS_10int_traitsIS5_EE14precision_typeEXsrNS7_IS6_EE14precision_typeEE11result_typeERKS5_RKS6_ > leaq 352(%rsp), %rax > movq $0, 352(%rsp) > movq %rax, 336(%rsp) > movl $1, 344(%rsp) > movl %r14d, 348(%rsp) > movl %r14d, 572(%rsp) > cmpl $576, %r14d > ja .L1614 > movq $0, 496(%rsp) > movl $1, %esi > .L1582: > movl $0, 76(%rbp) > movq %rbp, %rdi > movl $0, 156(%rbp) > movl %esi, 568(%rsp) > movq %r13, %rsi > call _ZN16wide_int_storageaSERKS_.isra.0 > movq (%rsp), %rsi > leaq 80(%rbp), %rdi > call _ZN16wide_int_storageaSERKS_.isra.0 > movl global_options+3536(%rip), %eax > testl %eax, %eax > je .L1569 > movl 156(%rbp), %eax > cmpl %eax, 76(%rbp) > jne .L1570 > .L1569: > cmpl $576, 572(%rsp) > ja .L1615 > .L1571: > cmpl $576, 492(%rsp) > ja .L1616 > .L1572: > cmpl $576, 252(%rsp) > jbe .L1552 > movq 176(%rsp), %rdi > call free > jmp .L1552 > .p2align 4,,10 > .p2align 3 > .L1587: > leaq 16(%rsp), %r12 > leaq 96(%rsp), %rdi > movq %r12, %rcx > .L1540: > xorl %eax, %eax > jmp .L1545 > .p2align 4,,10 > .p2align 3 > .L1618: > addl $1, %eax > cmpl %eax, %esi > je .L1617 > .L1545: > movl %eax, %edx > movq (%rdi,%rdx,8), %r10 > cmpq %r10, (%rcx,%rdx,8) > je .L1618 > movl %r9d, 252(%rsp) > cmpl $576, %r9d > ja .L1543 > leaq 176(%rsp), %r15 > movq %r12, %rax > movq %r15, %rdi > .p2align 4,,10 > .p2align 3 > .L1554: > 
leaq 96(%rsp), %rcx > cmpl $576, %r11d > jbe .L1541 > .p2align 4,,10 > .p2align 3 > .L1539: > movq 96(%rsp), %rcx > jmp .L1541 > .p2align 4,,10 > .p2align 3 > .L1617: > leaq 512(%rsp), %rax > movl %r14d, 508(%rsp) > movq $0, 512(%rsp) > movq %rax, 496(%rsp) > movl $1, 504(%rsp) > movl %r14d, 332(%rsp) > cmpl $576, %r14d > ja .L1619 > movl $1, %esi > leaq 256(%rsp), %r13 > movq $0, 256(%rsp) > .L1583: > leaq 336(%rsp), %r14 > movl %esi, 328(%rsp) > movq %r12, %rsi > movq %r14, %rdi > call _ZN16wide_int_storageC2ERKS_ > movl $0, 76(%rbp) > movq %r14, %rsi > movq %rbp, %rdi > movl $0, 156(%rbp) > call _ZN16wide_int_storageaSERKS_.isra.0 > leaq 80(%rbp), %rdi > movq %r13, %rsi > call _ZN16wide_int_storageaSERKS_.isra.0 > movl global_options+3536(%rip), %edx > testl %edx, %edx > je .L1549 > movl 156(%rbp), %eax > cmpl %eax, 76(%rbp) > jne .L1570 > .L1549: > cmpl $576, 412(%rsp) > ja .L1620 > .L1550: > cmpl $576, 332(%rsp) > ja .L1621 > .L1552: > cmpl $576, 92(%rsp) > ja .L1622 > .L1574: > cmpl $576, 172(%rsp) > ja .L1623 > .L1575: > cmpl $576, 180(%rbx) > movl 176(%rbx), %eax > leaq 104(%rbx), %rdx > ja .L1624 > .L1577: > cmpl $1, %eax > jne .L1580 > cmpq $-1, (%rdx) > je .L1531 > .L1580: > leaq 24(%rbx), %rsi > movq %rbp, %rdi > call _ZN14irange_bitmask9intersectERKS_ > .L1531: > addq $584, %rsp > .cfi_remember_state > .cfi_def_cfa_offset 56 > movq %rbp, %rax > popq %rbx > .cfi_def_cfa_offset 48 > popq %rbp > .cfi_def_cfa_offset 40 > popq %r12 > .cfi_def_cfa_offset 32 > popq %r13 > .cfi_def_cfa_offset 24 > popq %r14 > .cfi_def_cfa_offset 16 > popq %r15 > .cfi_def_cfa_offset 8 > ret > .p2align 4,,10 > .p2align 3 > .L1534: > .cfi_restore_state > movq %rbx, %rdi > call *%rax > jmp .L1535 > .p2align 4,,10 > .p2align 3 > .L1593: > movl $1, %eax > .L1564: > movl %eax, 424(%rsp) > leaq 416(%rsp), %rax > leaq 496(%rsp), %r13 > movq %rax, %rdi > movq %rcx, 416(%rsp) > movl %edx, 428(%rsp) > movq %rax, (%rsp) > call 
_ZN2wi3clzERK16generic_wide_intI20wide_int_ref_storageILb0ELb1EEE > movl %r14d, %esi > movl %r14d, 572(%rsp) > subl %eax, %esi > cmpl $576, %r14d > ja .L1625 > .L1558: > movq %r13, %rdi > .L1559: > movl %r14d, %ecx > xorl %edx, %edx > call _ZN2wi4maskEPljbj > movl 572(%rsp), %edx > movl %eax, %ecx > movl %eax, 568(%rsp) > sall $6, %ecx > cmpl %ecx, %edx > jnb .L1560 > movq %r13, %rcx > cmpl $576, %edx > ja .L1626 > .L1561: > subl $1, %eax > andl $63, %edx > leaq (%rcx,%rax,8), %rsi > movl $64, %ecx > movq (%rsi), %rax > subl %edx, %ecx > salq %cl, %rax > sarq %cl, %rax > movq %rax, (%rsi) > .L1560: > movq %r13, %rsi > movq %r15, %rdi > call _ZN16wide_int_storageaSERKS_.isra.0 > cmpl $576, 572(%rsp) > jbe .L1565 > movq 496(%rsp), %rdi > call free > jmp .L1565 > .p2align 4,,10 > .p2align 3 > .L1536: > movq 16(%rsp), %rcx > cmpl $576, %r11d > ja .L1627 > cmpl %r8d, %esi > jne .L1581 > leaq 96(%rsp), %rdi > leaq 16(%rsp), %r12 > jmp .L1540 > .L1627: > leaq 16(%rsp), %r12 > cmpl %r8d, %esi > je .L1538 > .p2align 4,,10 > .p2align 3 > .L1581: > movl %r9d, 252(%rsp) > leaq 16(%rsp), %r12 > .L1543: > leal 63(%r9), %edi > leaq 176(%rsp), %r15 > shrl $6, %edi > salq $3, %rdi > call xmalloc > movl 88(%rsp), %esi > movl 252(%rsp), %r9d > movq %rax, 176(%rsp) > movl 92(%rsp), %edx > movq %rax, %rdi > movl 168(%rsp), %r8d > movl 172(%rsp), %r11d > cmpl $576, %r9d > jbe .L1607 > .L1553: > movq %r12, %rax > cmpl $576, %edx > jbe .L1554 > movq 16(%rsp), %rax > jmp .L1554 > .p2align 4,,10 > .p2align 3 > .L1624: > movq 104(%rbx), %rdx > jmp .L1577 > .p2align 4,,10 > .p2align 3 > .L1623: > movq 96(%rsp), %rdi > call free > jmp .L1575 > .p2align 4,,10 > .p2align 3 > .L1622: > movq 16(%rsp), %rdi > call free > jmp .L1574 > .p2align 4,,10 > .p2align 3 > .L1610: > leal 63(%rax), %edi > shrl $6, %edi > salq $3, %rdi > call xmalloc > movl 168(%rsp), %edx > movq (%r12), %rsi > movq %rax, %rdi > movq %rax, 96(%rsp) > salq $3, %rdx > call memcpy > movq 184(%rbx), %r13 > jmp .L1532 > .p2align 
4,,10 > .p2align 3 > .L1611: > leal 63(%rax), %edi > shrl $6, %edi > salq $3, %rdi > call xmalloc > movl 88(%rsp), %edx > movq 0(%r13), %rsi > movq %rax, %rdi > movq %rax, 16(%rsp) > salq $3, %rdx > call memcpy > jmp .L1533 > .p2align 4,,10 > .p2align 3 > .L1612: > cmpl %r8d, %esi > je .L1586 > leaq 176(%rsp), %r15 > leaq 16(%rsp), %r12 > movl %r9d, 252(%rsp) > movq %r15, %rdi > movq %r12, %rax > jmp .L1539 > .p2align 4,,10 > .p2align 3 > .L1614: > leal 63(%r14), %edi > shrl $6, %edi > salq $3, %rdi > call xmalloc > movl 344(%rsp), %esi > cmpl $576, 572(%rsp) > movq %rax, 496(%rsp) > movq 336(%rsp), %rdi > jbe .L1628 > .L1567: > xorl %edx, %edx > .p2align 4,,10 > .p2align 3 > .L1568: > movq (%rdi,%rdx,8), %rcx > movq %rcx, (%rax,%rdx,8) > addq $1, %rdx > cmpl %esi, %edx > jb .L1568 > jmp .L1582 > .p2align 4,,10 > .p2align 3 > .L1621: > movq 256(%rsp), %rdi > call free > jmp .L1552 > .p2align 4,,10 > .p2align 3 > .L1620: > movq 336(%rsp), %rdi > call free > jmp .L1550 > .p2align 4,,10 > .p2align 3 > .L1619: > leal 63(%r14), %edi > leaq 256(%rsp), %r13 > shrl $6, %edi > salq $3, %rdi > call xmalloc > movl 504(%rsp), %esi > cmpl $576, 332(%rsp) > movq %rax, 256(%rsp) > movq 496(%rsp), %rdi > jbe .L1606 > .L1547: > xorl %edx, %edx > .p2align 4,,10 > .p2align 3 > .L1548: > movq (%rdi,%rdx,8), %rcx > movq %rcx, (%rax,%rdx,8) > addq $1, %rdx > cmpl %esi, %edx > jb .L1548 > jmp .L1583 > .p2align 4,,10 > .p2align 3 > .L1613: > movq 176(%rsp), %rcx > jmp .L1556 > .p2align 4,,10 > .p2align 3 > .L1555: > movl %esi, %edx > movq %rax, %rsi > call _ZN2wi9xor_largeEPlPKljS2_jj > movl 252(%rsp), %edx > movl %eax, 248(%rsp) > cmpl $576, %edx > ja .L1629 > movq %r15, %rcx > .L1557: > cmpl $1, %eax > jne .L1564 > jmp .L1556 > .p2align 4,,10 > .p2align 3 > .L1616: > movq 416(%rsp), %rdi > call free > jmp .L1572 > .p2align 4,,10 > .p2align 3 > .L1615: > movq 496(%rsp), %rdi > call free > jmp .L1571 > .L1586: > leaq 16(%rsp), %r12 > movq %r12, %rcx > .L1538: > movq 96(%rsp), %rdi > jmp 
.L1540 > .L1625: > leal 63(%r14), %edi > movl %esi, 12(%rsp) > shrl $6, %edi > salq $3, %rdi > call xmalloc > movl 12(%rsp), %esi > cmpl $576, 572(%rsp) > movq %rax, 496(%rsp) > movq %rax, %rdi > ja .L1559 > jmp .L1558 > .p2align 4,,10 > .p2align 3 > .L1626: > movq 496(%rsp), %rcx > jmp .L1561 > .L1607: > movq %r15, %rdi > jmp .L1553 > .L1628: > movq %r13, %rax > jmp .L1567 > .L1629: > movq 176(%rsp), %rcx > jmp .L1557 > .L1606: > movq %r13, %rax > jmp .L1547 > .cfi_endproc > .section .text.unlikely > .cfi_startproc > .type _ZNK6irange11get_bitmaskEv.cold, @function > _ZNK6irange11get_bitmaskEv.cold: > .LFSB3197: > .L1570: > .cfi_def_cfa_offset 640 > .cfi_offset 3, -56 > .cfi_offset 6, -48 > .cfi_offset 12, -40 > .cfi_offset 13, -32 > .cfi_offset 14, -24 > .cfi_offset 15, -16 > call _ZNK14irange_bitmask11verify_maskEv.part.0 > .cfi_endproc > .LFE3197: > .text > .size _ZNK6irange11get_bitmaskEv, .-_ZNK6irange11get_bitmaskEv > .section .text.unlikely > .size _ZNK6irange11get_bitmaskEv.cold, .-_ZNK6irange11get_bitmaskEv.cold > .LCOLDE33: > .text > .LHOTE33: > .section .rodata.str1.1 > .LC34: > .string "add_vrange" ^ permalink raw reply [flat|nested] 11+ messages in thread
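For readers who want to follow what the inlined code computes, the get_bitmask_from_range() logic from the first patch quoted in this message can be sketched on plain 64-bit integers. This is a simplified illustration, not the GCC implementation: bitmask64, clz_prec and bitmask_from_range are hypothetical names, std::uint64_t stands in for wide_int, and the bounds are assumed to fit in prec bits. As in the patch, a set bit in the mask means "unknown".

```cpp
#include <cassert>
#include <cstdint>

// Known-bits pair in the style of irange_bitmask: a set bit in `mask` is
// unknown; where `mask` is clear, the bit equals the matching bit of `value`.
struct bitmask64 {
  std::uint64_t value;
  std::uint64_t mask;
};

// Count leading zero bits of X when viewed as a PREC-bit quantity.
static unsigned clz_prec(std::uint64_t x, unsigned prec) {
  unsigned n = 0;
  while (n < prec && !((x >> (prec - 1 - n)) & 1))
    ++n;
  return n;
}

// Sketch of the patch's logic: derive known bits from the range [min, max].
static bitmask64 bitmask_from_range(unsigned prec, std::uint64_t min,
                                    std::uint64_t max) {
  assert(prec >= 1 && prec <= 64);

  // All the bits of a singleton are known.
  if (min == max)
    return {min, 0};

  // Bits at and below the highest differing bit may vary within the range.
  std::uint64_t xorv = min ^ max;
  if (xorv != 0) {
    unsigned width = prec - clz_prec(xorv, prec);
    xorv = width == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << width) - 1;
  }

  // Mirrors `irange_bitmask (wi::zero (prec), min | xorv)` in the patch.
  return {0, min | xorv};
}
```

For the 8-bit range [0x10, 0x17] this returns value 0 and mask 0x17, following the patch's `min | xorv` expression; a singleton such as [5, 5] returns value 5 with an all-zero mask.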
* Re: 1.76% performance loss in VRP due to inlining 2024-04-26 9:42 1.76% performance loss in VRP due to inlining Aldy Hernandez 2024-04-30 7:57 ` Richard Biener 2024-04-30 8:53 ` Martin Jambor @ 2024-04-30 19:09 ` Jason Merrill 2024-04-30 19:15 ` Richard Biener 2024-04-30 19:22 ` 1.76% performance loss in VRP due to inlining Jakub Jelinek 2 siblings, 2 replies; 11+ messages in thread From: Jason Merrill @ 2024-04-30 19:09 UTC (permalink / raw) To: Aldy Hernandez; +Cc: GCC Mailing List, MacLeod, Andrew On Fri, Apr 26, 2024 at 5:44 AM Aldy Hernandez via Gcc <gcc@gcc.gnu.org> wrote: > > In implementing prange (pointer ranges), I have found a 1.74% slowdown > in VRP, even without any code path actually using the code. I have > tracked this down to irange::get_bitmask() being compiled differently > with and without the bare bones patch. With the patch, > irange::get_bitmask() has a lot of code inlined into it, particularly > get_bitmask_from_range() and consequently the wide_int_storage code. ... > +static irange_bitmask > +get_bitmask_from_range (tree type, > + const wide_int &min, const wide_int &max) ... > -irange_bitmask > -irange::get_bitmask_from_range () const My guess is that this is the relevant change: the old function has external linkage, and is therefore interposable, which inhibits inlining. The new function has internal linkage, which allows inlining. Relatedly, I wonder if we want to build GCC with -fno-semantic-interposition? Jason ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: 1.76% performance loss in VRP due to inlining 2024-04-30 19:09 ` Jason Merrill @ 2024-04-30 19:15 ` Richard Biener 2024-04-30 21:48 ` Building libgccjit with -fno-semantic-interposition? ( was Re: 1.76% performance loss in VRP due to inlining) David Malcolm 2024-04-30 19:22 ` 1.76% performance loss in VRP due to inlining Jakub Jelinek 1 sibling, 1 reply; 11+ messages in thread From: Richard Biener @ 2024-04-30 19:15 UTC (permalink / raw) To: Jason Merrill; +Cc: Aldy Hernandez, GCC Mailing List, MacLeod, Andrew > Am 30.04.2024 um 21:11 schrieb Jason Merrill via Gcc <gcc@gcc.gnu.org>: > > On Fri, Apr 26, 2024 at 5:44 AM Aldy Hernandez via Gcc <gcc@gcc.gnu.org> wrote: >> >> In implementing prange (pointer ranges), I have found a 1.74% slowdown >> in VRP, even without any code path actually using the code. I have >> tracked this down to irange::get_bitmask() being compiled differently >> with and without the bare bones patch. With the patch, >> irange::get_bitmask() has a lot of code inlined into it, particularly >> get_bitmask_from_range() and consequently the wide_int_storage code. > ... >> +static irange_bitmask >> +get_bitmask_from_range (tree type, >> + const wide_int &min, const wide_int &max) > ... >> -irange_bitmask >> -irange::get_bitmask_from_range () const > > My guess is that this is the relevant change: the old function has > external linkage, and is therefore interposable, which inhibits > inlining. The new function has internal linkage, which allows > inlining. > > Relatedly, I wonder if we want to build GCC with -fno-semantic-interposition? I guess that’s a good idea, though it’s already implied when doing LTO bootstrap and building cc1 and friends? (But not for libgccjit?) Richard > > Jason > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Building libgccjit with -fno-semantic-interposition? ( was Re: 1.76% performance loss in VRP due to inlining) 2024-04-30 19:15 ` Richard Biener @ 2024-04-30 21:48 ` David Malcolm 2024-05-02 7:40 ` Andrea Corallo 0 siblings, 1 reply; 11+ messages in thread From: David Malcolm @ 2024-04-30 21:48 UTC (permalink / raw) To: Richard Biener, Jason Merrill Cc: Aldy Hernandez, GCC Mailing List, MacLeod, Andrew, Antoni Boucher, Andrea Corallo, jit On Tue, 2024-04-30 at 21:15 +0200, Richard Biener via Gcc wrote: > > > > On 30.04.2024 at 21:11, Jason Merrill via Gcc > > <gcc@gcc.gnu.org> wrote: > > > > On Fri, Apr 26, 2024 at 5:44 AM Aldy Hernandez via Gcc > > <gcc@gcc.gnu.org> wrote: > > > > > > In implementing prange (pointer ranges), I have found a 1.74% > > > slowdown > > > in VRP, even without any code path actually using the code. I > > > have > > > tracked this down to irange::get_bitmask() being compiled > > > differently > > > with and without the bare bones patch. With the patch, > > > irange::get_bitmask() has a lot of code inlined into it, > > > particularly > > > get_bitmask_from_range() and consequently the wide_int_storage > > > code. > > ... > > > +static irange_bitmask > > > +get_bitmask_from_range (tree type, > > > + const wide_int &min, const wide_int &max) > > ... > > > -irange_bitmask > > > -irange::get_bitmask_from_range () const > > > > My guess is that this is the relevant change: the old function has > > external linkage, and is therefore interposable, which inhibits > > inlining. The new function has internal linkage, which allows > > inlining. > > > > Relatedly, I wonder if we want to build GCC with > > -fno-semantic-interposition? > > I guess that’s a good idea, though it’s already implied when doing > LTO bootstrap and building cc1 and friends? (But not for libgccjit?) [CCing jit mailing list] FWIW I've no idea if any libgccjit users are using semantic interposition; I suspect the answer is "no one is using it". 
Antoyo, Andrea [also CCed]: are either of you using semantic interposition of symbols within libgccjit? If not, we *might* get a slightly faster libgccjit by building it with -fno-semantic-interposition. Or maybe not... Dave > > Richard > > > > > Jason > > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Building libgccjit with -fno-semantic-interposition? ( was Re: 1.76% performance loss in VRP due to inlining) 2024-04-30 21:48 ` Building libgccjit with -fno-semantic-interposition? ( was Re: 1.76% performance loss in VRP due to inlining) David Malcolm @ 2024-05-02 7:40 ` Andrea Corallo 0 siblings, 0 replies; 11+ messages in thread From: Andrea Corallo @ 2024-05-02 7:40 UTC (permalink / raw) To: David Malcolm, Richard Biener, Jason Merrill Cc: Aldy Hernandez, GCC Mailing List, MacLeod, Andrew, Antoni Boucher, jit > FWIW I've no idea if any libgccjit users are using semantic > interposition; I suspect the answer is "no one is using it". > > Antoyo, Andrea [also CCed]: are either of you using semantic > interposition of symbols within libgccjit? Hi David, AFAIU in Emacs we are not relying on interposition of symbols. Thanks Andrea ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: 1.76% performance loss in VRP due to inlining 2024-04-30 19:09 ` Jason Merrill 2024-04-30 19:15 ` Richard Biener @ 2024-04-30 19:22 ` Jakub Jelinek 2024-04-30 21:37 ` Jason Merrill 1 sibling, 1 reply; 11+ messages in thread From: Jakub Jelinek @ 2024-04-30 19:22 UTC (permalink / raw) To: Jason Merrill; +Cc: Aldy Hernandez, GCC Mailing List, MacLeod, Andrew On Tue, Apr 30, 2024 at 03:09:51PM -0400, Jason Merrill via Gcc wrote: > On Fri, Apr 26, 2024 at 5:44 AM Aldy Hernandez via Gcc <gcc@gcc.gnu.org> wrote: > > > > In implementing prange (pointer ranges), I have found a 1.74% slowdown > > in VRP, even without any code path actually using the code. I have > > tracked this down to irange::get_bitmask() being compiled differently > > with and without the bare bones patch. With the patch, > > irange::get_bitmask() has a lot of code inlined into it, particularly > > get_bitmask_from_range() and consequently the wide_int_storage code. > ... > > +static irange_bitmask > > +get_bitmask_from_range (tree type, > > + const wide_int &min, const wide_int &max) > ... > > -irange_bitmask > > -irange::get_bitmask_from_range () const > > My guess is that this is the relevant change: the old function has > external linkage, and is therefore interposable, which inhibits > inlining. The new function has internal linkage, which allows > inlining. Even when a function is exported, if it is not compiled with -fpic/-fPIC and we know the function is defined in the current TU, it can't be interposed. Try int foo (int x) { return x + 1; } int bar (int x, int y) { return foo (x) + foo (y); } with -O2 -fpic -fno-semantic-interposition vs. -O2 -fpic vs. -O2 -fpie vs. -O2. > Relatedly, I wonder if we want to build GCC with -fno-semantic-interposition? It could be useful just for libgccjit. And I'm not sure whether libgccjit users might want to interpose something. Jakub ^ permalink raw reply [flat|nested] 11+ messages in thread
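To make the experiment in Jakub's message easier to reproduce, here is his two-function test case laid out as a standalone file. The file name interpose.cc and the comments are assumptions added for illustration; the flag combinations are the ones listed in the message. Compiling with -S under each combination and looking for calls to foo inside bar should show whether the compiler treated foo as interposable. The usual expectation is that only plain -O2 -fpic keeps the calls, but that is worth confirming against the assembly from your own toolchain.

```cpp
// interpose.cc (hypothetical file name): Jakub's test case, annotated.
//
// Suggested comparison, using the flag sets from the message:
//   g++ -S -O2 -fpic -fno-semantic-interposition interpose.cc
//   g++ -S -O2 -fpic interpose.cc
//   g++ -S -O2 -fpie interpose.cc
//   g++ -S -O2 interpose.cc
// Then inspect bar in interpose.s for call instructions targeting foo.

// foo has external linkage.  Whether bar may inline it depends on
// whether this definition could be replaced (interposed) at load time.
int foo (int x)
{
  return x + 1;
}

// bar is the caller whose generated code is worth comparing.
int bar (int x, int y)
{
  return foo (x) + foo (y);
}
```

The result of bar is the same under every flag set; only the generated code for the two calls to foo is expected to differ.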
* Re: 1.76% performance loss in VRP due to inlining 2024-04-30 19:22 ` 1.76% performance loss in VRP due to inlining Jakub Jelinek @ 2024-04-30 21:37 ` Jason Merrill 2024-05-03 8:55 ` Aldy Hernandez 0 siblings, 1 reply; 11+ messages in thread From: Jason Merrill @ 2024-04-30 21:37 UTC (permalink / raw) To: Jakub Jelinek; +Cc: Aldy Hernandez, GCC Mailing List, MacLeod, Andrew On 4/30/24 12:22, Jakub Jelinek wrote: > On Tue, Apr 30, 2024 at 03:09:51PM -0400, Jason Merrill via Gcc wrote: >> On Fri, Apr 26, 2024 at 5:44 AM Aldy Hernandez via Gcc <gcc@gcc.gnu.org> wrote: >>> >>> In implementing prange (pointer ranges), I have found a 1.74% slowdown >>> in VRP, even without any code path actually using the code. I have >>> tracked this down to irange::get_bitmask() being compiled differently >>> with and without the bare bones patch. With the patch, >>> irange::get_bitmask() has a lot of code inlined into it, particularly >>> get_bitmask_from_range() and consequently the wide_int_storage code. >> ... >>> +static irange_bitmask >>> +get_bitmask_from_range (tree type, >>> + const wide_int &min, const wide_int &max) >> ... >>> -irange_bitmask >>> -irange::get_bitmask_from_range () const >> >> My guess is that this is the relevant change: the old function has >> external linkage, and is therefore interposable, which inhibits >> inlining. The new function has internal linkage, which allows >> inlining. > > Even when a function is exported, when not compiled with -fpic/-fPIC > if we know the function is defined in current TU, it can't be interposed, Ah, I was misremembering the effect of the change. Rather, it's that if we see that a function with internal linkage has only a single caller, we try harder to inline it. Jason ^ permalink raw reply [flat|nested] 11+ messages in thread
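Jason's corrected explanation can be illustrated with a scaled-down sketch. The helper below mirrors the logic of get_bitmask_from_range from the first patch, but on plain 32-bit integers instead of wide_int. The names simple_bitmask, bitmask_from_range, and get_bitmask are hypothetical stand-ins, not GCC's types, and __builtin_clz is the GCC builtin. Because the helper is static and has exactly one caller, GCC's called-once heuristic (see -finline-functions-called-once) may integrate it into that caller even though it is not marked inline, which is the kind of effect described in the message.

```cpp
#include <cstdint>

// Hypothetical 32-bit stand-in for irange_bitmask.  A bit set in `mask`
// is unknown; where `mask` is clear, the bit equals the one in `value`.
struct simple_bitmask
{
  uint32_t value;
  uint32_t mask;
};

// Internal linkage and a single caller: a candidate for called-once
// inlining.  The logic follows get_bitmask_from_range in the patch,
// with the precision fixed at 32 bits (an assumption of this sketch).
static simple_bitmask
bitmask_from_range (uint32_t min, uint32_t max)
{
  // All the bits of a singleton are known.
  if (min == max)
    return { min, 0 };

  // min != max here, so xorv is nonzero and __builtin_clz is defined.
  uint32_t xorv = min ^ max;
  unsigned width = 32 - __builtin_clz (xorv);

  // Set the low `width` bits, avoiding an undefined 32-bit shift.
  uint32_t low = width >= 32 ? ~uint32_t (0) : (uint32_t (1) << width) - 1;

  return { 0, min | low };
}

// Externally visible function that is the helper's only caller.
simple_bitmask
get_bitmask (uint32_t min, uint32_t max)
{
  return bitmask_from_range (min, max);
}
```

For the range [4, 7] this yields value 0 and mask 7, the same conservative answer the original formula min | xorv would give. Giving the helper external linkage, or adding a second caller, would be one way to check whether the inlining decision changes.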
* Re: 1.76% performance loss in VRP due to inlining 2024-04-30 21:37 ` Jason Merrill @ 2024-05-03 8:55 ` Aldy Hernandez 0 siblings, 0 replies; 11+ messages in thread From: Aldy Hernandez @ 2024-05-03 8:55 UTC (permalink / raw) To: Jason Merrill; +Cc: Jakub Jelinek, GCC Mailing List, MacLeod, Andrew [-- Attachment #1: Type: text/plain, Size: 6348 bytes --] After some very painful analysis, I was able to reduce the degradation we are experiencing in VRP to a handful of lines in the new implementation of prange. What happens is that any series of small changes to a new prange class causes changes in the inlining of wide_int_storage elsewhere. With the attached patch, one difference lies in irange::singleton_p(tree *). Note that this is in irange, which is completely unrelated to the new (unused) code. Using trunk as the stage1 compiler, we can see the assembly for irange::singleton_p(tree *) in value-range.cc is different with and without my patch. The number of calls into wide_int within irange::singleton_p(tree *) changes: awk '/^_ZNK6irange11singleton_pEPP9tree_node/,/endproc/' value-range.s | grep call.*wide_int With mainline sources: call _ZN16wide_int_storageC2ERKS_ call _Z16wide_int_to_treeP9tree_nodeRK8poly_intILj1E16generic_wide_intI20wide_int_ref_storageILb0ELb1EEEE With the attached patch: call _ZN16wide_int_storageC2ERKS_ call _ZN16wide_int_storageC2ERKS_ call _Z16wide_int_to_treeP9tree_nodeRK8poly_intILj1E16generic_wide_intI20wide_int_ref_storageILb0ELb1EEEE call _ZN16wide_int_storageC2ERKS_ The additional calls correspond to the wide_int_storage constructor: $ c++filt _ZN16wide_int_storageC2ERKS_ wide_int_storage::wide_int_storage(wide_int_storage const&) Using -fno-semantic-interposition makes no difference. 
Here are the relevant bits in the difference from -Winline with and without my patch: > inlined from ‘virtual bool irange::singleton_p(tree_node**) const’ at /home/aldyh/src/gcc/gcc/value-range.cc:1254:40: > /home/aldyh/src/gcc/gcc/wide-int.h:1196:8: warning: inlining failed in call to ‘wide_int_storage::wide_int_storage(const wide_int_storage&)’: --param inline-unit-growth limit reached [-Winline] > 1196 | inline wide_int_storage::wide_int_storage (const wide_int_storage &x) > | ^~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/wide-int.h:775:7: note: called from here > 775 | class GTY(()) generic_wide_int : public storage > | ^~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/wide-int.h:1196:8: warning: inlining failed in call to ‘wide_int_storage::wide_int_storage(const wide_int_storage&)’: --param inline-unit-growth limit reached [-Winline] > 1196 | inline wide_int_storage::wide_int_storage (const wide_int_storage &x) > | ^~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/wide-int.h:775:7: note: called from here > 775 | class GTY(()) generic_wide_int : public storage > | ^~~~~~~~~~~~~~~~ > In copy constructor ‘generic_wide_int<wide_int_storage>::generic_wide_int(const generic_wide_int<wide_int_storage>&)’, > inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1122:25, Note that this is just one example. There are also inlining differences to irange::get_bitmask(), irange::union_bitmask(), irange::operator=, among others. Most of the inlining failures seem to be related to wide_int_storage. I am attaching the difference in -Winline for the curious. Tracking this down is tricky because the slightest change in the patch causes different inlining in irange. Even using a slightly different stage1 compiler produces different changes. For example, using GCC 13 as the stage1 compiler, VRP exhibits a slowdown of 2% with the full prange class. 
Although this is virtually identical to the slowdown for using trunk as the stage1 compiler, the inlining failures are a tad different. I am tempted to commit the attached to mainline, which slows down VRP by only 0.3% but is measurable enough to analyze, just so we have a base commit-point from where to do the analysis. My wife is about to give birth any day now, so I'm afraid if I drop off for a few months, we'll lose the analysis and the point in time from where to do it. One final thing. The full prange class patch, even when disabled, slows VRP by 2%. I tried to implement the class in small increments, and every small change caused a further slowdown. I don't know if this 2% is final, or if further tweaks in this space will slow us down more. On a positive note, with the entirety of prange implemented (not just the base class but range-ops implemented and prange enabled), there is no overall change to VRP, and IPA-cp speeds up by 7%. This is because holding pointers in prange is a net win that overcomes the 2% handicap the inliner is hitting us with. I would love to hear thoughts, and if y'all agree that committing a small skeleton now can help us track this down in the future. Aldy On Tue, Apr 30, 2024 at 11:37 PM Jason Merrill <jason@redhat.com> wrote: > > On 4/30/24 12:22, Jakub Jelinek wrote: > > On Tue, Apr 30, 2024 at 03:09:51PM -0400, Jason Merrill via Gcc wrote: > >> On Fri, Apr 26, 2024 at 5:44 AM Aldy Hernandez via Gcc <gcc@gcc.gnu.org> wrote: > >>> > >>> In implementing prange (pointer ranges), I have found a 1.74% slowdown > >>> in VRP, even without any code path actually using the code. I have > >>> tracked this down to irange::get_bitmask() being compiled differently > >>> with and without the bare bones patch. With the patch, > >>> irange::get_bitmask() has a lot of code inlined into it, particularly > >>> get_bitmask_from_range() and consequently the wide_int_storage code. > >> ... 
> >>> +static irange_bitmask > >>> +get_bitmask_from_range (tree type, > >>> + const wide_int &min, const wide_int &max) > >> ... > >>> -irange_bitmask > >>> -irange::get_bitmask_from_range () const > >> > >> My guess is that this is the relevant change: the old function has > >> external linkage, and is therefore interposable, which inhibits > >> inlining. The new function has internal linkage, which allows > >> inlining. > > > > Even when a function is exported, when not compiled with -fpic/-fPIC > > if we know the function is defined in current TU, it can't be interposed, > > Ah, I was misremembering the effect of the change. Rather, it's that if > we see that a function with internal linkage has only a single caller, > we try harder to inline it. > > Jason > [-- Attachment #2: minimal-prange.diff --] [-- Type: text/x-patch, Size: 2488 bytes --] diff --git a/gcc/value-range.h b/gcc/value-range.h index f1c638f8cd0..178a690f551 100644 --- a/gcc/value-range.h +++ b/gcc/value-range.h @@ -378,6 +378,39 @@ private: wide_int m_ranges[N*2]; }; +class prange : public vrange +{ +public: + static bool supports_p (const_tree) { return false; } + virtual bool supports_type_p (const_tree) const final override { return false; } + virtual void accept (const vrange_visitor &) const final override {} + virtual void set_undefined () final override {} + virtual void set_varying (tree) final override {} + virtual void set_nonzero (tree) final override {} + virtual void set_zero (tree) final override; + virtual void set_nonnegative (tree) final override {} + virtual bool contains_p (tree) const final override { return false; } + virtual bool fits_p (const vrange &) const final override { return false; } + virtual bool singleton_p (tree * = NULL) const final override { return false; } + virtual bool zero_p () const final override { return false; } + virtual bool nonzero_p () const final override { return false; } + virtual void set (tree, tree, value_range_kind = VR_RANGE) final 
override {} + virtual tree type () const final override { return NULL; } + virtual bool union_ (const vrange &) final override { return false; } + virtual bool intersect (const vrange &) final override { return false; } + virtual tree lbound () const final override { return NULL; } + virtual tree ubound () const final override { return NULL; } + + wide_int lower_bound () const; + wide_int upper_bound () const; + irange_bitmask get_bitmask () const final override; + void update_bitmask (const irange_bitmask &) final override {} +private: + wide_int m_min; + wide_int m_max; + irange_bitmask m_bitmask; +}; + // Unsupported temporaries may be created by ranger before it's known // they're unsupported, or by vr_values::get_value_range. @@ -1187,6 +1220,32 @@ irange_val_max (const_tree type) return wi::max_value (TYPE_PRECISION (type), TYPE_SIGN (type)); } +inline void +prange::set_zero (tree type) +{ + wide_int zero = wi::zero (TYPE_PRECISION (type)); + m_min = m_max = zero; + m_bitmask = irange_bitmask (zero, zero); +} + +inline wide_int +prange::lower_bound () const +{ + return m_min; +} + +inline wide_int +prange::upper_bound () const +{ + return m_max; +} + +inline irange_bitmask +prange::get_bitmask () const +{ + return m_bitmask; +} + inline frange::frange () : vrange (VR_FRANGE) [-- Attachment #3: winline.diff --] [-- Type: text/x-patch, Size: 37489 bytes --] 4c4 < inlined from ‘virtual void irange::set_varying(tree)’ at /home/aldyh/src/gcc/gcc/value-range.h:1074:115: --- > inlined from ‘virtual void irange::set_varying(tree)’ at /home/aldyh/src/gcc/gcc/value-range.h:1107:115: 12,13c12,13 < inlined from ‘virtual void irange::set_varying(tree)’ at /home/aldyh/src/gcc/gcc/value-range.h:1075:115, < inlined from ‘virtual void irange::set_varying(tree)’ at /home/aldyh/src/gcc/gcc/value-range.h:1063:1: --- > inlined from ‘virtual void irange::set_varying(tree)’ at /home/aldyh/src/gcc/gcc/value-range.h:1108:115, > inlined from ‘virtual void irange::set_varying(tree)’ at 
/home/aldyh/src/gcc/gcc/value-range.h:1096:1: 21c21,36 < inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1089:25, --- > inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1122:25, > inlined from ‘virtual bool irange::singleton_p(tree_node**) const’ at /home/aldyh/src/gcc/gcc/value-range.cc:1254:40: > /home/aldyh/src/gcc/gcc/wide-int.h:1196:8: warning: inlining failed in call to ‘wide_int_storage::wide_int_storage(const wide_int_storage&)’: --param inline-unit-growth limit reached [-Winline] > 1196 | inline wide_int_storage::wide_int_storage (const wide_int_storage &x) > | ^~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/wide-int.h:775:7: note: called from here > 775 | class GTY(()) generic_wide_int : public storage > | ^~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/wide-int.h:1196:8: warning: inlining failed in call to ‘wide_int_storage::wide_int_storage(const wide_int_storage&)’: --param inline-unit-growth limit reached [-Winline] > 1196 | inline wide_int_storage::wide_int_storage (const wide_int_storage &x) > | ^~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/wide-int.h:775:7: note: called from here > 775 | class GTY(()) generic_wide_int : public storage > | ^~~~~~~~~~~~~~~~ > In copy constructor ‘generic_wide_int<wide_int_storage>::generic_wide_int(const generic_wide_int<wide_int_storage>&)’, > inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1122:25, 30,31c45,46 < inlined from ‘wide_int irange::upper_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1100:29, < inlined from ‘virtual bool irange::zero_p() const’ at /home/aldyh/src/gcc/gcc/value-range.h:953:19: --- > inlined from ‘wide_int irange::upper_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1133:29, > inlined from ‘virtual bool irange::zero_p() const’ at /home/aldyh/src/gcc/gcc/value-range.h:986:19: 215,217c230,235 < 
/home/aldyh/src/gcc/gcc/wide-int.h:1213:1: warning: inlining failed in call to ‘wide_int_storage& wide_int_storage::operator=(const wide_int_storage&)’: --param inline-unit-growth limit reached [-Winline] < 1213 | wide_int_storage::operator = (const wide_int_storage &x) < | ^~~~~~~~~~~~~~~~ --- > In copy constructor ‘generic_wide_int<wide_int_storage>::generic_wide_int(const generic_wide_int<wide_int_storage>&)’, > inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1122:25, > inlined from ‘bool irange::singleton_p(wide_int&) const’ at /home/aldyh/src/gcc/gcc/value-range.cc:1268:23: > /home/aldyh/src/gcc/gcc/wide-int.h:1196:8: warning: inlining failed in call to ‘wide_int_storage::wide_int_storage(const wide_int_storage&)’: --param inline-unit-growth limit reached [-Winline] > 1196 | inline wide_int_storage::wide_int_storage (const wide_int_storage &x) > | ^~~~~~~~~~~~~~~~ 222,223c240,241 < /home/aldyh/src/gcc/gcc/value-range.h:1063:1: warning: inlining failed in call to ‘virtual void irange::set_varying(tree)’: --param max-inline-insns-single limit reached [-Winline] < 1063 | irange::set_varying (tree type) --- > /home/aldyh/src/gcc/gcc/value-range.h:1096:1: warning: inlining failed in call to ‘virtual void irange::set_varying(tree)’: --param max-inline-insns-single limit reached [-Winline] > 1096 | irange::set_varying (tree type) 225,226c243,244 < /home/aldyh/src/gcc/gcc/value-range.h:1029:15: note: called from here < 1029 | set_varying (type); --- > /home/aldyh/src/gcc/gcc/value-range.h:1062:15: note: called from here > 1062 | set_varying (type); 229,230c247,248 < /home/aldyh/src/gcc/gcc/value-range.h:1063:1: warning: inlining failed in call to ‘virtual void irange::set_varying(tree)’: --param max-inline-insns-single limit reached [-Winline] < 1063 | irange::set_varying (tree type) --- > /home/aldyh/src/gcc/gcc/value-range.h:1096:1: warning: inlining failed in call to ‘virtual void irange::set_varying(tree)’: 
--param max-inline-insns-single limit reached [-Winline] > 1096 | irange::set_varying (tree type) 232,233c250,251 < /home/aldyh/src/gcc/gcc/value-range.h:1029:15: note: called from here < 1029 | set_varying (type); --- > /home/aldyh/src/gcc/gcc/value-range.h:1062:15: note: called from here > 1062 | set_varying (type); 236,237c254,255 < /home/aldyh/src/gcc/gcc/value-range.h:1063:1: warning: inlining failed in call to ‘virtual void irange::set_varying(tree)’: --param max-inline-insns-single limit reached [-Winline] < 1063 | irange::set_varying (tree type) --- > /home/aldyh/src/gcc/gcc/value-range.h:1096:1: warning: inlining failed in call to ‘virtual void irange::set_varying(tree)’: --param max-inline-insns-single limit reached [-Winline] > 1096 | irange::set_varying (tree type) 239,240c257,258 < /home/aldyh/src/gcc/gcc/value-range.h:1029:15: note: called from here < 1029 | set_varying (type); --- > /home/aldyh/src/gcc/gcc/value-range.h:1062:15: note: called from here > 1062 | set_varying (type); 243,244c261,262 < /home/aldyh/src/gcc/gcc/value-range.h:1063:1: warning: inlining failed in call to ‘virtual void irange::set_varying(tree)’: --param max-inline-insns-single limit reached [-Winline] < 1063 | irange::set_varying (tree type) --- > /home/aldyh/src/gcc/gcc/value-range.h:1096:1: warning: inlining failed in call to ‘virtual void irange::set_varying(tree)’: --param max-inline-insns-single limit reached [-Winline] > 1096 | irange::set_varying (tree type) 246,247c264,265 < /home/aldyh/src/gcc/gcc/value-range.h:1029:15: note: called from here < 1029 | set_varying (type); --- > /home/aldyh/src/gcc/gcc/value-range.h:1062:15: note: called from here > 1062 | set_varying (type); 278a297,311 > In copy constructor ‘generic_wide_int<wide_int_storage>::generic_wide_int(const generic_wide_int<wide_int_storage>&)’, > inlined from ‘irange_bitmask get_bitmask_from_range(tree, const wide_int&, const wide_int&)’ at /home/aldyh/src/gcc/gcc/value-range.cc:46:24, > inlined from 
‘virtual irange_bitmask irange::get_bitmask() const’ at /home/aldyh/src/gcc/gcc/value-range.cc:2034:70: > /home/aldyh/src/gcc/gcc/wide-int.h:1196:8: warning: inlining failed in call to ‘wide_int_storage::wide_int_storage(const wide_int_storage&)’: --param inline-unit-growth limit reached [-Winline] > 1196 | inline wide_int_storage::wide_int_storage (const wide_int_storage &x) > | ^~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/wide-int.h:775:7: note: called from here > 775 | class GTY(()) generic_wide_int : public storage > | ^~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/wide-int.h:1196:8: warning: inlining failed in call to ‘wide_int_storage::wide_int_storage(const wide_int_storage&)’: --param inline-unit-growth limit reached [-Winline] > 1196 | inline wide_int_storage::wide_int_storage (const wide_int_storage &x) > | ^~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/wide-int.h:775:7: note: called from here > 775 | class GTY(()) generic_wide_int : public storage > | ^~~~~~~~~~~~~~~~ 351,352c384,385 < inlined from ‘bool irange::varying_compatible_p() const’ at /home/aldyh/src/gcc/gcc/value-range.h:931:11, < inlined from ‘void irange::normalize_kind()’ at /home/aldyh/src/gcc/gcc/value-range.h:1155:33: --- > inlined from ‘bool irange::varying_compatible_p() const’ at /home/aldyh/src/gcc/gcc/value-range.h:963:15, > inlined from ‘void irange::normalize_kind()’ at /home/aldyh/src/gcc/gcc/value-range.h:1188:33: 358a392,418 > In function ‘typename wi::binary_traits<T1, T2>::predicate_result operator==(const T1&, const T2&) [with T1 = generic_wide_int<wide_int_storage>; T2 = generic_wide_int<wide_int_storage>]’, > inlined from ‘bool irange::varying_compatible_p() const’ at /home/aldyh/src/gcc/gcc/value-range.h:964:11, > inlined from ‘void irange::normalize_kind()’ at /home/aldyh/src/gcc/gcc/value-range.h:1188:33: > /home/aldyh/src/gcc/gcc/wide-int.h:2246:1: warning: inlining failed in call to ‘bool wi::eq_p(const T1&, const T2&) [with T1 = generic_wide_int<wide_int_storage>; T2 = 
generic_wide_int<wide_int_storage>]’: --param inline-unit-growth limit reached [-Winline] > 2246 | wi::eq_p (const T1 &x, const T2 &y) > | ^~ > /home/aldyh/src/gcc/gcc/wide-int.h:3852:147: note: called from here > 3852 | BINARY_PREDICATE (operator ==, eq_p) > | ^ > In copy constructor ‘generic_wide_int<wide_int_storage>::generic_wide_int(const generic_wide_int<wide_int_storage>&)’, > inlined from ‘irange_bitmask::irange_bitmask(const irange_bitmask&)’ at /home/aldyh/src/gcc/gcc/value-range.h:131:7, > inlined from ‘bool irange::union_bitmask(const irange&)’ at /home/aldyh/src/gcc/gcc/value-range.cc:2106:25: > /home/aldyh/src/gcc/gcc/wide-int.h:1196:8: warning: inlining failed in call to ‘wide_int_storage::wide_int_storage(const wide_int_storage&)’: --param inline-unit-growth limit reached [-Winline] > 1196 | inline wide_int_storage::wide_int_storage (const wide_int_storage &x) > | ^~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/wide-int.h:775:7: note: called from here > 775 | class GTY(()) generic_wide_int : public storage > | ^~~~~~~~~~~~~~~~ > In copy constructor ‘generic_wide_int<wide_int_storage>::generic_wide_int(const generic_wide_int<wide_int_storage>&)’, > inlined from ‘irange_bitmask::irange_bitmask(const irange_bitmask&)’ at /home/aldyh/src/gcc/gcc/value-range.h:131:7, > inlined from ‘bool irange::union_bitmask(const irange&)’ at /home/aldyh/src/gcc/gcc/value-range.cc:2106:25: > /home/aldyh/src/gcc/gcc/wide-int.h:1196:8: warning: inlining failed in call to ‘wide_int_storage::wide_int_storage(const wide_int_storage&)’: --param inline-unit-growth limit reached [-Winline] > 1196 | inline wide_int_storage::wide_int_storage (const wide_int_storage &x) > | ^~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/wide-int.h:775:7: note: called from here > 775 | class GTY(()) generic_wide_int : public storage > | ^~~~~~~~~~~~~~~~ 367a428,448 > In function ‘typename wi::binary_traits<T1, T2>::operator_result operator|(const T1&, const T2&) [with T1 = 
generic_wide_int<wide_int_storage>; T2 = generic_wide_int<wide_int_storage>]’, > inlined from ‘void irange_bitmask::union_(const irange_bitmask&)’ at /home/aldyh/src/gcc/gcc/value-range.h:239:58, > inlined from ‘bool irange::union_bitmask(const irange&)’ at /home/aldyh/src/gcc/gcc/value-range.cc:2107:13: > /home/aldyh/src/gcc/gcc/wide-int.h:2798:1: warning: inlining failed in call to ‘typename wi::binary_traits<T1, T2>::result_type wi::bit_or(const T1&, const T2&) [with T1 = generic_wide_int<wide_int_storage>; T2 = generic_wide_int<wide_int_storage>]’: --param inline-unit-growth limit reached [-Winline] > 2798 | wi::bit_or (const T1 &x, const T2 &y) > | ^~ > /home/aldyh/src/gcc/gcc/wide-int.h:3855:152: note: called from here > 3855 | BINARY_OPERATOR (operator |, bit_or) > | ^ > In function ‘typename wi::binary_traits<T1, T2>::operator_result operator|(const T1&, const T2&) [with T1 = generic_wide_int<wide_int_storage>; T2 = generic_wide_int<wide_int_storage>]’, > inlined from ‘void irange_bitmask::union_(const irange_bitmask&)’ at /home/aldyh/src/gcc/gcc/value-range.h:239:26, > inlined from ‘bool irange::union_bitmask(const irange&)’ at /home/aldyh/src/gcc/gcc/value-range.cc:2107:13: > /home/aldyh/src/gcc/gcc/wide-int.h:2798:1: warning: inlining failed in call to ‘typename wi::binary_traits<T1, T2>::result_type wi::bit_or(const T1&, const T2&) [with T1 = generic_wide_int<wide_int_storage>; T2 = generic_wide_int<wide_int_storage>]’: --param inline-unit-growth limit reached [-Winline] > 2798 | wi::bit_or (const T1 &x, const T2 &y) > | ^~ > /home/aldyh/src/gcc/gcc/wide-int.h:3855:152: note: called from here > 3855 | BINARY_OPERATOR (operator |, bit_or) > | ^ > In member function ‘generic_wide_int<wide_int_storage>& generic_wide_int<wide_int_storage>::operator=(generic_wide_int<wide_int_storage>&&)’, > inlined from ‘void irange_bitmask::union_(const irange_bitmask&)’ at /home/aldyh/src/gcc/gcc/value-range.h:239:58, > inlined from ‘bool irange::union_bitmask(const 
irange&)’ at /home/aldyh/src/gcc/gcc/value-range.cc:2107:13: 373a455,472 > In function ‘typename wi::binary_traits<T1, T2>::operator_result operator|(const T1&, const T2&) [with T1 = generic_wide_int<wide_int_storage>; T2 = generic_wide_int<wide_int_storage>]’, > inlined from ‘void irange_bitmask::union_(const irange_bitmask&)’ at /home/aldyh/src/gcc/gcc/value-range.h:239:58, > inlined from ‘bool irange::union_bitmask(const irange&)’ at /home/aldyh/src/gcc/gcc/value-range.cc:2107:13: > /home/aldyh/src/gcc/gcc/wide-int.h:2798:1: warning: inlining failed in call to ‘typename wi::binary_traits<T1, T2>::result_type wi::bit_or(const T1&, const T2&) [with T1 = generic_wide_int<wide_int_storage>; T2 = generic_wide_int<wide_int_storage>]’: --param inline-unit-growth limit reached [-Winline] > 2798 | wi::bit_or (const T1 &x, const T2 &y) > | ^~ > /home/aldyh/src/gcc/gcc/wide-int.h:3855:152: note: called from here > 3855 | BINARY_OPERATOR (operator |, bit_or) > | ^ > In function ‘typename wi::binary_traits<T1, T2>::operator_result operator|(const T1&, const T2&) [with T1 = generic_wide_int<wide_int_storage>; T2 = generic_wide_int<wide_int_storage>]’, > inlined from ‘void irange_bitmask::union_(const irange_bitmask&)’ at /home/aldyh/src/gcc/gcc/value-range.h:239:26, > inlined from ‘bool irange::union_bitmask(const irange&)’ at /home/aldyh/src/gcc/gcc/value-range.cc:2107:13: > /home/aldyh/src/gcc/gcc/wide-int.h:2798:1: warning: inlining failed in call to ‘typename wi::binary_traits<T1, T2>::result_type wi::bit_or(const T1&, const T2&) [with T1 = generic_wide_int<wide_int_storage>; T2 = generic_wide_int<wide_int_storage>]’: --param inline-unit-growth limit reached [-Winline] > 2798 | wi::bit_or (const T1 &x, const T2 &y) > | ^~ > /home/aldyh/src/gcc/gcc/wide-int.h:3855:152: note: called from here > 3855 | BINARY_OPERATOR (operator |, bit_or) > | ^ 408,409c507,508 < /home/aldyh/src/gcc/gcc/value-range.h:1151:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: 
--param max-inline-insns-single limit reached [-Winline] < 1151 | irange::normalize_kind () --- > /home/aldyh/src/gcc/gcc/value-range.h:1184:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] > 1184 | irange::normalize_kind () 413a513,520 > In member function ‘generic_wide_int<wide_int_storage>& generic_wide_int<wide_int_storage>::operator=(const generic_wide_int<wide_int_storage>&)’, > inlined from ‘irange& irange::operator=(const irange&)’ at /home/aldyh/src/gcc/gcc/value-range.cc:1022:56: > /home/aldyh/src/gcc/gcc/wide-int.h:1213:1: warning: inlining failed in call to ‘wide_int_storage& wide_int_storage::operator=(const wide_int_storage&)’: --param inline-unit-growth limit reached [-Winline] > 1213 | wide_int_storage::operator = (const wide_int_storage &x) > | ^~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/wide-int.h:775:7: note: called from here > 775 | class GTY(()) generic_wide_int : public storage > | ^~~~~~~~~~~~~~~~ 415,416c522,523 < /home/aldyh/src/gcc/gcc/value-range.h:1151:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] < 1151 | irange::normalize_kind () --- > /home/aldyh/src/gcc/gcc/value-range.h:1184:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] > 1184 | irange::normalize_kind () 420a528,536 > In copy constructor ‘generic_wide_int<wide_int_storage>::generic_wide_int(const generic_wide_int<wide_int_storage>&)’, > inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1122:25, > inlined from ‘void irange::invert()’ at /home/aldyh/src/gcc/gcc/value-range.cc:1863:42: > /home/aldyh/src/gcc/gcc/wide-int.h:1196:8: warning: inlining failed in call to ‘wide_int_storage::wide_int_storage(const wide_int_storage&)’: --param inline-unit-growth limit reached [-Winline] > 1196 | 
inline wide_int_storage::wide_int_storage (const wide_int_storage &x) > | ^~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/wide-int.h:775:7: note: called from here > 775 | class GTY(()) generic_wide_int : public storage > | ^~~~~~~~~~~~~~~~ 429,430c545,546 < inlined from ‘void irange::maybe_resize(int)’ at /home/aldyh/src/gcc/gcc/value-range.h:655:22, < inlined from ‘void irange::maybe_resize(int)’ at /home/aldyh/src/gcc/gcc/value-range.h:644:1, --- > inlined from ‘void irange::maybe_resize(int)’ at /home/aldyh/src/gcc/gcc/value-range.h:688:22, > inlined from ‘void irange::maybe_resize(int)’ at /home/aldyh/src/gcc/gcc/value-range.h:677:1, 437a554,562 > In copy constructor ‘generic_wide_int<wide_int_storage>::generic_wide_int(const generic_wide_int<wide_int_storage>&)’, > inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1122:25, > inlined from ‘void irange::invert()’ at /home/aldyh/src/gcc/gcc/value-range.cc:1863:42: > /home/aldyh/src/gcc/gcc/wide-int.h:1196:8: warning: inlining failed in call to ‘wide_int_storage::wide_int_storage(const wide_int_storage&)’: --param inline-unit-growth limit reached [-Winline] > 1196 | inline wide_int_storage::wide_int_storage (const wide_int_storage &x) > | ^~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/wide-int.h:775:7: note: called from here > 775 | class GTY(()) generic_wide_int : public storage > | ^~~~~~~~~~~~~~~~ 447c572 < inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1089:25, --- > inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1122:25, 545a671,677 > /home/aldyh/src/gcc/gcc/value-range.h: In member function ‘void irange::invert()’: > /home/aldyh/src/gcc/gcc/value-range.h:695:1: warning: inlining failed in call to ‘int_range<N, RESIZABLE>::~int_range() noexcept [with unsigned int N = 3; bool RESIZABLE = true]’: --param inline-unit-growth limit reached [-Winline] > 
695 | int_range<N, RESIZABLE>::~int_range () > | ^~~~~~~~~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/value-range.cc:1910:1: note: called from here > 1910 | } > | ^ 576,577c708,709 < /home/aldyh/src/gcc/gcc/value-range.h:1151:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] < 1151 | irange::normalize_kind () --- > /home/aldyh/src/gcc/gcc/value-range.h:1184:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] > 1184 | irange::normalize_kind () 704c836 < inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1089:25, --- > inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1122:25, 713,714c845,846 < inlined from ‘wide_int irange::upper_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1100:29, < inlined from ‘wide_int irange::upper_bound() const’ at /home/aldyh/src/gcc/gcc/value-range.h:1110:32, --- > inlined from ‘wide_int irange::upper_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1133:29, > inlined from ‘wide_int irange::upper_bound() const’ at /home/aldyh/src/gcc/gcc/value-range.h:1143:32, 723c855 < inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1089:25, --- > inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1122:25, 740,741c872,873 < inlined from ‘wide_int irange::upper_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1100:29, < inlined from ‘wide_int irange::upper_bound() const’ at /home/aldyh/src/gcc/gcc/value-range.h:1110:32, --- > inlined from ‘wide_int irange::upper_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1133:29, > inlined from ‘wide_int irange::upper_bound() const’ at 
/home/aldyh/src/gcc/gcc/value-range.h:1143:32, 750c882 < inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1089:25, --- > inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1122:25, 765c897 < inlined from ‘wide_int irange::upper_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1100:29, --- > inlined from ‘wide_int irange::upper_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1133:29, 774c906 < inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1089:25, --- > inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1122:25, 783,784c915,916 < inlined from ‘wide_int irange::upper_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1100:29, < inlined from ‘wide_int irange::upper_bound() const’ at /home/aldyh/src/gcc/gcc/value-range.h:1110:32, --- > inlined from ‘wide_int irange::upper_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1133:29, > inlined from ‘wide_int irange::upper_bound() const’ at /home/aldyh/src/gcc/gcc/value-range.h:1143:32, 862a995,1006 > In constructor ‘generic_wide_int<T>::generic_wide_int(const T&) [with T = wi::hwi_with_prec; storage = wide_int_storage]’, > inlined from ‘bool contains_zero_p(const irange&)’ at /home/aldyh/src/gcc/gcc/value-range.h:1205:67, > inlined from ‘bool irange::set_range_from_bitmask()’ at /home/aldyh/src/gcc/gcc/value-range.cc:1968:39: > /home/aldyh/src/gcc/gcc/wide-int.h:1184:8: warning: inlining failed in call to ‘wide_int_storage::wide_int_storage(const T&) [with T = wi::hwi_with_prec]’: --param inline-unit-growth limit reached [-Winline] > 1184 | inline wide_int_storage::wide_int_storage (const T &x) > | ^~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/wide-int.h:847:15: note: called from here > 847 | : storage (x) > | ^ > In function 
‘typename wi::binary_traits<T1, T2>::operator_result operator|(const T1&, const T2&) [with T1 = generic_wide_int<wide_int_storage>; T2 = generic_wide_int<wide_int_storage>]’, > inlined from ‘wide_int irange_bitmask::get_nonzero_bits() const’ at /home/aldyh/src/gcc/gcc/value-range.h:200:20, > inlined from ‘bool irange::set_range_from_bitmask()’ at /home/aldyh/src/gcc/gcc/value-range.cc:1969:49: 885,886c1029,1036 < /home/aldyh/src/gcc/gcc/value-range.h:1142:1: warning: inlining failed in call to ‘virtual void irange::set_zero(tree)’: --param inline-unit-growth limit reached [-Winline] < 1142 | irange::set_zero (tree type) --- > /home/aldyh/src/gcc/gcc/value-range.h:695:1: warning: inlining failed in call to ‘int_range<N, RESIZABLE>::~int_range() noexcept [with unsigned int N = 2; bool RESIZABLE = false]’: --param inline-unit-growth limit reached [-Winline] > 695 | int_range<N, RESIZABLE>::~int_range () > | ^~~~~~~~~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/value-range.cc:1977:9: note: called from here > 1977 | } > | ^ > /home/aldyh/src/gcc/gcc/value-range.h:1175:1: warning: inlining failed in call to ‘virtual void irange::set_zero(tree)’: --param inline-unit-growth limit reached [-Winline] > 1175 | irange::set_zero (tree type) 892,893c1042,1043 < /home/aldyh/src/gcc/gcc/value-range.h:1151:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] < 1151 | irange::normalize_kind () --- > /home/aldyh/src/gcc/gcc/value-range.h:1184:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] > 1184 | irange::normalize_kind () 906a1057,1066 > In function ‘bool wi::les_p(const T1&, const T2&) [with T1 = generic_wide_int<wide_int_storage>; T2 = generic_wide_int<wide_int_storage>]’, > inlined from ‘bool wi::le_p(const T1&, const T2&, signop) [with T1 = generic_wide_int<wide_int_storage>; T2 = generic_wide_int<wide_int_storage>]’ at 
/home/aldyh/src/gcc/gcc/wide-int.h:2381:18, > inlined from ‘bool wi::le_p(const T1&, const T2&, signop) [with T1 = generic_wide_int<wide_int_storage>; T2 = generic_wide_int<wide_int_storage>]’ at /home/aldyh/src/gcc/gcc/wide-int.h:2378:1, > inlined from ‘bool irange::irange_contains_p(const irange&) const’ at /home/aldyh/src/gcc/gcc/value-range.cc:1570:16: > /home/aldyh/src/gcc/gcc/wide-int.h:2293:1: warning: inlining failed in call to ‘bool wi::lts_p(const T1&, const T2&) [with T1 = generic_wide_int<wide_int_storage>; T2 = generic_wide_int<wide_int_storage>]’: --param inline-unit-growth limit reached [-Winline] > 2293 | wi::lts_p (const T1 &x, const T2 &y) > | ^~ > /home/aldyh/src/gcc/gcc/wide-int.h:2364:17: note: called from here > 2364 | return !lts_p (y, x); > | ~~~~~~^~~~~~ 957c1117 < inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1089:25, --- > inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1122:25, 985,986c1145,1146 < inlined from ‘wide_int irange::upper_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1100:29, < inlined from ‘wide_int irange::upper_bound() const’ at /home/aldyh/src/gcc/gcc/value-range.h:1110:32, --- > inlined from ‘wide_int irange::upper_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1133:29, > inlined from ‘wide_int irange::upper_bound() const’ at /home/aldyh/src/gcc/gcc/value-range.h:1143:32, 1079,1080c1239,1240 < /home/aldyh/src/gcc/gcc/value-range.h:662:1: warning: inlining failed in call to ‘int_range<N, RESIZABLE>::~int_range() noexcept [with unsigned int N = 3; bool RESIZABLE = true]’: --param inline-unit-growth limit reached [-Winline] < 662 | int_range<N, RESIZABLE>::~int_range () --- > /home/aldyh/src/gcc/gcc/value-range.h:695:1: warning: inlining failed in call to ‘int_range<N, RESIZABLE>::~int_range() noexcept [with unsigned int N = 3; bool RESIZABLE = true]’: --param 
inline-unit-growth limit reached [-Winline] > 695 | int_range<N, RESIZABLE>::~int_range () 1085,1086c1245,1255 < /home/aldyh/src/gcc/gcc/value-range.h:662:1: warning: inlining failed in call to ‘int_range<N, RESIZABLE>::~int_range() noexcept [with unsigned int N = 3; bool RESIZABLE = true]’: --param inline-unit-growth limit reached [-Winline] < 662 | int_range<N, RESIZABLE>::~int_range () --- > In constructor ‘generic_wide_int<T>::generic_wide_int(const T&) [with T = wi::hwi_with_prec; storage = wide_int_storage]’, > inlined from ‘void irange_bitmask::adjust_range(irange&) const’ at /home/aldyh/src/gcc/gcc/value-range.cc:1929:54: > /home/aldyh/src/gcc/gcc/wide-int.h:1184:8: warning: inlining failed in call to ‘wide_int_storage::wide_int_storage(const T&) [with T = wi::hwi_with_prec]’: --param inline-unit-growth limit reached [-Winline] > 1184 | inline wide_int_storage::wide_int_storage (const T &x) > | ^~~~~~~~~~~~~~~~ > /home/aldyh/src/gcc/gcc/wide-int.h:847:15: note: called from here > 847 | : storage (x) > | ^ > /home/aldyh/src/gcc/gcc/value-range.h: In member function ‘void irange_bitmask::adjust_range(irange&) const’: > /home/aldyh/src/gcc/gcc/value-range.h:695:1: warning: inlining failed in call to ‘int_range<N, RESIZABLE>::~int_range() noexcept [with unsigned int N = 3; bool RESIZABLE = true]’: --param inline-unit-growth limit reached [-Winline] > 695 | int_range<N, RESIZABLE>::~int_range () 1091,1092c1260,1261 < /home/aldyh/src/gcc/gcc/value-range.h:662:1: warning: inlining failed in call to ‘int_range<N, RESIZABLE>::~int_range() noexcept [with unsigned int N = 3; bool RESIZABLE = true]’: --param inline-unit-growth limit reached [-Winline] < 662 | int_range<N, RESIZABLE>::~int_range () --- > /home/aldyh/src/gcc/gcc/value-range.h:695:1: warning: inlining failed in call to ‘int_range<N, RESIZABLE>::~int_range() noexcept [with unsigned int N = 3; bool RESIZABLE = true]’: --param inline-unit-growth limit reached [-Winline] > 695 | int_range<N, 
RESIZABLE>::~int_range () 1097,1098c1266,1267 < /home/aldyh/src/gcc/gcc/value-range.h:662:1: warning: inlining failed in call to ‘int_range<N, RESIZABLE>::~int_range() noexcept [with unsigned int N = 3; bool RESIZABLE = true]’: --param inline-unit-growth limit reached [-Winline] < 662 | int_range<N, RESIZABLE>::~int_range () --- > /home/aldyh/src/gcc/gcc/value-range.h:695:1: warning: inlining failed in call to ‘int_range<N, RESIZABLE>::~int_range() noexcept [with unsigned int N = 3; bool RESIZABLE = true]’: --param inline-unit-growth limit reached [-Winline] > 695 | int_range<N, RESIZABLE>::~int_range () 1135,1136c1304,1305 < /home/aldyh/src/gcc/gcc/value-range.h:1151:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] < 1151 | irange::normalize_kind () --- > /home/aldyh/src/gcc/gcc/value-range.h:1184:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] > 1184 | irange::normalize_kind () 1142c1311 < inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1089:25, --- > inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1122:25, 1151,1152c1320,1321 < inlined from ‘wide_int irange::upper_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1100:29, < inlined from ‘wide_int irange::upper_bound() const’ at /home/aldyh/src/gcc/gcc/value-range.h:1110:32, --- > inlined from ‘wide_int irange::upper_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1133:29, > inlined from ‘wide_int irange::upper_bound() const’ at /home/aldyh/src/gcc/gcc/value-range.h:1143:32, 1161,1162c1330,1331 < /home/aldyh/src/gcc/gcc/value-range.h:1151:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] < 1151 | irange::normalize_kind () --- > 
/home/aldyh/src/gcc/gcc/value-range.h:1184:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] > 1184 | irange::normalize_kind () 1168,1169c1337,1338 < inlined from ‘void irange::maybe_resize(int)’ at /home/aldyh/src/gcc/gcc/value-range.h:655:22, < inlined from ‘void irange::maybe_resize(int)’ at /home/aldyh/src/gcc/gcc/value-range.h:644:1, --- > inlined from ‘void irange::maybe_resize(int)’ at /home/aldyh/src/gcc/gcc/value-range.h:688:22, > inlined from ‘void irange::maybe_resize(int)’ at /home/aldyh/src/gcc/gcc/value-range.h:677:1, 1308,1309c1477,1478 < /home/aldyh/src/gcc/gcc/value-range.h:1151:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] < 1151 | irange::normalize_kind () --- > /home/aldyh/src/gcc/gcc/value-range.h:1184:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] > 1184 | irange::normalize_kind () 1314,1315c1483,1484 < /home/aldyh/src/gcc/gcc/value-range.h:662:1: warning: inlining failed in call to ‘int_range<N, RESIZABLE>::~int_range() noexcept [with unsigned int N = 3; bool RESIZABLE = true]’: --param inline-unit-growth limit reached [-Winline] < 662 | int_range<N, RESIZABLE>::~int_range () --- > /home/aldyh/src/gcc/gcc/value-range.h:695:1: warning: inlining failed in call to ‘int_range<N, RESIZABLE>::~int_range() noexcept [with unsigned int N = 3; bool RESIZABLE = true]’: --param inline-unit-growth limit reached [-Winline] > 695 | int_range<N, RESIZABLE>::~int_range () 1414,1415c1583,1584 < /home/aldyh/src/gcc/gcc/value-range.h:1151:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] < 1151 | irange::normalize_kind () --- > /home/aldyh/src/gcc/gcc/value-range.h:1184:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: 
--param max-inline-insns-single limit reached [-Winline] > 1184 | irange::normalize_kind () 1428,1430c1597,1601 < /home/aldyh/src/gcc/gcc/wide-int.h:1213:1: warning: inlining failed in call to ‘wide_int_storage& wide_int_storage::operator=(const wide_int_storage&)’: --param inline-unit-growth limit reached [-Winline] < 1213 | wide_int_storage::operator = (const wide_int_storage &x) < | ^~~~~~~~~~~~~~~~ --- > In copy constructor ‘generic_wide_int<wide_int_storage>::generic_wide_int(const generic_wide_int<wide_int_storage>&)’, > inlined from ‘bool irange::irange_single_pair_union(const irange&)’ at /home/aldyh/src/gcc/gcc/value-range.cc:1343:25: > /home/aldyh/src/gcc/gcc/wide-int.h:1196:8: warning: inlining failed in call to ‘wide_int_storage::wide_int_storage(const wide_int_storage&)’: --param inline-unit-growth limit reached [-Winline] > 1196 | inline wide_int_storage::wide_int_storage (const wide_int_storage &x) > | ^~~~~~~~~~~~~~~~ 1517,1518c1688,1689 < /home/aldyh/src/gcc/gcc/value-range.h:1151:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] < 1151 | irange::normalize_kind () --- > /home/aldyh/src/gcc/gcc/value-range.h:1184:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] > 1184 | irange::normalize_kind () 1524c1695 < inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1089:25, --- > inlined from ‘wide_int irange::lower_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1122:25, 1533,1534c1704,1705 < inlined from ‘wide_int irange::upper_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1100:29, < inlined from ‘wide_int irange::upper_bound() const’ at /home/aldyh/src/gcc/gcc/value-range.h:1110:32, --- > inlined from ‘wide_int irange::upper_bound(unsigned int) const’ at /home/aldyh/src/gcc/gcc/value-range.h:1133:29, > 
inlined from ‘wide_int irange::upper_bound() const’ at /home/aldyh/src/gcc/gcc/value-range.h:1143:32, 1772,1773c1943,1944 < /home/aldyh/src/gcc/gcc/value-range.h:1151:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] < 1151 | irange::normalize_kind () --- > /home/aldyh/src/gcc/gcc/value-range.h:1184:1: warning: inlining failed in call to ‘void irange::normalize_kind()’: --param max-inline-insns-single limit reached [-Winline] > 1184 | irange::normalize_kind ()
Thread overview: 11+ messages
2024-04-26 9:42 1.76% performance loss in VRP due to inlining Aldy Hernandez
2024-04-30 7:57 ` Richard Biener
2024-04-30 8:21 ` Aldy Hernandez
2024-04-30 8:53 ` Martin Jambor
2024-04-30 19:09 ` Jason Merrill
2024-04-30 19:15 ` Richard Biener
2024-04-30 21:48 ` Building libgccjit with -fno-semantic-interposition? ( was Re: 1.76% performance loss in VRP due to inlining) David Malcolm
2024-05-02 7:40 ` Andrea Corallo
2024-04-30 19:22 ` 1.76% performance loss in VRP due to inlining Jakub Jelinek
2024-04-30 21:37 ` Jason Merrill
2024-05-03 8:55 ` Aldy Hernandez