public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Martin Sebor <msebor@gmail.com>
To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: Re: [07/23] Add a class that multiplexes two pointer types
Date: Wed, 25 Nov 2020 16:33:26 -0700	[thread overview]
Message-ID: <c2079f07-0d25-a604-0ab9-ab3314704b12@gmail.com> (raw)
In-Reply-To: <mptmtzl8yh1.fsf@arm.com>

On 11/13/20 1:14 AM, Richard Sandiford via Gcc-patches wrote:
> This patch adds a pointer_mux<T1, T2> class that provides similar
> functionality to:
> 
>      union { T1 *a; T2 *b; };
>      ...
>      bool is_b_rather_than_a;
> 
> except that the is_b_rather_than_a tag is stored in the low bit
> of the pointer.  See the comments in the patch for a comparison
> between the two approaches and why this one can be more efficient.
> 
> I've tried to microoptimise the class a fair bit, since a later
> patch uses it extensively in order to keep the sizes of data
> structures down.

I've been reading these changes mostly out of interest than to
provide comments.  I like your use of C++ -- you clearly know
the language very well.  I also appreciate the extensive
commentary.  It makes understanding the code (and the changes)
much easier.  Thank you for doing that!  We should all aspire
to follow your example! :)

I do have one concern: the tendency to prioritize efficiency
over safety (this can be said about most GCC code). Specifically
in this class, the address bit twiddling makes me uneasy.  I don't
think the object model in either language (certainly not C but
I don't have the impression C++ either) makes it unequivocally
valid.  On the contrary, I'd say many of us interpret the current
rules as leaving it undefined.  There are efforts to sanction
this sort of thing under some conditions (e.g, the C object
model proposal) but they have not been adopted yet.  I think
we should try to avoid exploiting these dark corners in new
code.

I'm not too concerned that it will break with some compilers
(it might, but code like this is out there already and works).
What, I worry is that it will either prevent or make much more
difficult any access checking that might otherwise be possible.
I also worry that it will encourage people who look to GCC code
for examples to duplicate these tricks in their own code, making
it in turn harder for us to help them detect bugs in it.

Having said that, I looked for tests that verify this new utility
class (and the others in this series), partly to get a better
idea of how it's meant to be used.  I couldn't find any.  I'd
expect every nontrivial, general-purpose utility class to come
with tests.  (Having a library of these components might make
testing easier.)

Martin

> 
> gcc/
> 	* mux-utils.h: New file.
> ---
>   gcc/mux-utils.h | 248 ++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 248 insertions(+)
>   create mode 100644 gcc/mux-utils.h
> 
> diff --git a/gcc/mux-utils.h b/gcc/mux-utils.h
> new file mode 100644
> index 00000000000..17ced49cd22
> --- /dev/null
> +++ b/gcc/mux-utils.h
> @@ -0,0 +1,248 @@
> +// Multiplexer utilities
> +// Copyright (C) 2020 Free Software Foundation, Inc.
> +//
> +// This file is part of GCC.
> +//
> +// GCC is free software; you can redistribute it and/or modify it under
> +// the terms of the GNU General Public License as published by the Free
> +// Software Foundation; either version 3, or (at your option) any later
> +// version.
> +//
> +// GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +// WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +// FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +// for more details.
> +//
> +// You should have received a copy of the GNU General Public License
> +// along with GCC; see the file COPYING3.  If not see
> +// <http://www.gnu.org/licenses/>.
> +
> +#ifndef GCC_MUX_UTILS_H
> +#define GCC_MUX_UTILS_H 1
> +
> +// A class that stores a choice "A or B", where A has type T1 * and B has
> +// type T2 *.  Both T1 and T2 must have an alignment greater than 1, since
> +// the low bit is used to identify B over A.  T1 and T2 can be the same.
> +//
> +// A can be a null pointer but B cannot.
> +//
> +// Barring the requirement that B must be nonnull, using the class is
> +// equivalent to using:
> +//
> +//     union { T1 *A; T2 *B; };
> +//
> +// and having a separate tag bit to indicate which alternative is active.
> +// However, using this class can have two advantages over a union:
> +//
> +// - It avoides the need to find somewhere to store the tag bit.
> +//
> +// - The compiler is aware that B cannot be null, which can make checks
> +//   of the form:
> +//
> +//       if (auto *B = mux.dyn_cast<T2 *> ())
> +//
> +//   more efficient.  With a union-based representation, the dyn_cast
> +//   check could fail either because MUX is an A or because MUX is a
> +//   null B, both of which require a run-time test.  With a pointer_mux,
> +//   only a check for MUX being A is needed.
> +template<typename T1, typename T2 = T1>
> +class pointer_mux
> +{
> +public:
> +  // Return an A pointer with the given value.
> +  static pointer_mux first (T1 *);
> +
> +  // Return a B pointer with the given (nonnull) value.
> +  static pointer_mux second (T2 *);
> +
> +  pointer_mux () = default;
> +
> +  // Create a null A pointer.
> +  pointer_mux (std::nullptr_t) : m_ptr (nullptr) {}
> +
> +  // Create an A or B pointer with the given value.  This is only valid
> +  // if T1 and T2 are distinct and if T can be resolved to exactly one
> +  // of them.
> +  template<typename T,
> +	   typename Enable = typename
> +	     std::enable_if<std::is_convertible<T *, T1 *>::value
> +			    != std::is_convertible<T *, T2 *>::value>::type>
> +  pointer_mux (T *ptr);
> +
> +  // Return true unless the pointer is a null A pointer.
> +  explicit operator bool () const { return m_ptr; }
> +
> +  // Assign A and B pointers respectively.
> +  void set_first (T1 *ptr) { *this = first (ptr); }
> +  void set_second (T2 *ptr) { *this = second (ptr); }
> +
> +  // Return true if the pointer is an A pointer.
> +  bool is_first () const { return !(uintptr_t (m_ptr) & 1); }
> +
> +  // Return true if the pointer is a B pointer.
> +  bool is_second () const { return uintptr_t (m_ptr) & 1; }
> +
> +  // Return the contents of the pointer, given that it is known to be
> +  // an A pointer.
> +  T1 *known_first () const { return reinterpret_cast<T1 *> (m_ptr); }
> +
> +  // Return the contents of the pointer, given that it is known to be
> +  // a B pointer.
> +  T2 *known_second () const { return reinterpret_cast<T2 *> (m_ptr - 1); }
> +
> +  // If the pointer is an A pointer, return its contents, otherwise
> +  // return null.  Thus a null return can mean that the pointer is
> +  // either a null A pointer or a B pointer.
> +  //
> +  // If all A pointers are nonnull, it is more efficient to use:
> +  //
> +  //    if (ptr.is_first ())
> +  //      ...use ptr.known_first ()...
> +  //
> +  // over:
> +  //
> +  //    if (T1 *a = ptr.first_or_null ())
> +  //      ...use a...
> +  T1 *first_or_null () const;
> +
> +  // If the pointer is a B pointer, return its contents, otherwise
> +  // return null.  Using:
> +  //
> +  //    if (T1 *b = ptr.second_or_null ())
> +  //      ...use b...
> +  //
> +  // should be at least as efficient as:
> +  //
> +  //    if (ptr.is_second ())
> +  //      ...use ptr.known_second ()...
> +  T2 *second_or_null () const;
> +
> +  // Return true if the pointer is a T.
> +  //
> +  // This is only valid if T1 and T2 are distinct and if T can be
> +  // resolved to exactly one of them.  The condition is checked using
> +  // a static assertion rather than SFINAE because it gives a clearer
> +  // error message.
> +  template<typename T>
> +  bool is_a () const;
> +
> +  // Assert that the pointer is a T and return it as such.  See is_a
> +  // for the restrictions on T.
> +  template<typename T>
> +  T as_a () const;
> +
> +  // If the pointer is a T, return it as such, otherwise return null.
> +  // See is_a for the restrictions on T.
> +  template<typename T>
> +  T dyn_cast () const;
> +
> +private:
> +  pointer_mux (char *ptr) : m_ptr (ptr) {}
> +
> +  // The pointer value for A pointers, or the pointer value + 1 for B pointers.
> +  // Using a pointer rather than a uintptr_t tells the compiler that second ()
> +  // can never return null, and that second_or_null () is only null if
> +  // is_first ().
> +  char *m_ptr;
> +};
> +
> +template<typename T1, typename T2>
> +inline pointer_mux<T1, T2>
> +pointer_mux<T1, T2>::first (T1 *ptr)
> +{
> +  gcc_checking_assert (!(uintptr_t (ptr) & 1));
> +  return reinterpret_cast<char *> (ptr);
> +}
> +
> +template<typename T1, typename T2>
> +inline pointer_mux<T1, T2>
> +pointer_mux<T1, T2>::second (T2 *ptr)
> +{
> +  gcc_checking_assert (!(uintptr_t (ptr) & 1));
> +  return reinterpret_cast<char *> (ptr) + 1;
> +}
> +
> +template<typename T1, typename T2>
> +template<typename T, typename Enable>
> +inline pointer_mux<T1, T2>::pointer_mux (T *ptr)
> +  : m_ptr (reinterpret_cast<char *> (ptr))
> +{
> +  if (std::is_convertible<T *, T2 *>::value)
> +    m_ptr += 1;
> +}
> +
> +template<typename T1, typename T2>
> +inline T1 *
> +pointer_mux<T1, T2>::first_or_null () const
> +{
> +  return is_first () ? known_first () : nullptr;
> +}
> +
> +template<typename T1, typename T2>
> +inline T2 *
> +pointer_mux<T1, T2>::second_or_null () const
> +{
> +  // Micro optimization that's effective as of GCC 11: compute the value
> +  // of the second pointer as an integer and test that, so that the integer
> +  // result can be reused as the pointer and so that all computation can
> +  // happen before a branch on null.  This reduces the number of branches
> +  // needed for loops.
> +  return uintptr_t (m_ptr - 1) & 1 ? nullptr : known_second ();
> +}
> +
> +template<typename T1, typename T2>
> +template<typename T>
> +inline bool
> +pointer_mux<T1, T2>::is_a () const
> +{
> +  static_assert (std::is_convertible<T1 *, T>::value
> +		 != std::is_convertible<T2 *, T>::value,
> +		 "Ambiguous pointer type");
> +  if (std::is_convertible<T2 *, T>::value)
> +    return is_second ();
> +  else
> +    return is_first ();
> +}
> +
> +template<typename T1, typename T2>
> +template<typename T>
> +inline T
> +pointer_mux<T1, T2>::as_a () const
> +{
> +  static_assert (std::is_convertible<T1 *, T>::value
> +		 != std::is_convertible<T2 *, T>::value,
> +		 "Ambiguous pointer type");
> +  if (std::is_convertible<T2 *, T>::value)
> +    {
> +      gcc_checking_assert (is_second ());
> +      return reinterpret_cast<T> (m_ptr - 1);
> +    }
> +  else
> +    {
> +      gcc_checking_assert (is_first ());
> +      return reinterpret_cast<T> (m_ptr);
> +    }
> +}
> +
> +template<typename T1, typename T2>
> +template<typename T>
> +inline T
> +pointer_mux<T1, T2>::dyn_cast () const
> +{
> +  static_assert (std::is_convertible<T1 *, T>::value
> +		 != std::is_convertible<T2 *, T>::value,
> +		 "Ambiguous pointer type");
> +  if (std::is_convertible<T2 *, T>::value)
> +    {
> +      if (is_second ())
> +	return reinterpret_cast<T> (m_ptr - 1);
> +    }
> +  else
> +    {
> +      if (is_first ())
> +	return reinterpret_cast<T> (m_ptr);
> +    }
> +  return nullptr;
> +}
> +
> +#endif
> 


  parent reply	other threads:[~2020-11-25 23:33 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-13  8:10 [00/23] Make fwprop use an on-the-side RTL SSA representation Richard Sandiford
2020-11-13  8:11 ` [01/23] vec: Silence clang warning Richard Sandiford
2020-11-25 19:58   ` Jeff Law
2020-11-13  8:12 ` [02/23] rtlanal: Remove noop_move_p REG_EQUAL condition Richard Sandiford
2020-11-25 20:00   ` Jeff Law
2020-11-13  8:12 ` [03/23] reginfo: Add a global_reg_set Richard Sandiford
2020-11-25 20:01   ` Jeff Law
2020-11-13  8:13 ` [04/23] Move iterator_range to a new iterator-utils.h file Richard Sandiford
2020-11-25 20:02   ` Jeff Law
2020-11-13  8:13 ` [05/23] Add more iterator utilities Richard Sandiford
2020-11-25 20:12   ` Jeff Law
2020-11-13  8:14 ` [06/23] Add an RAII class for managing obstacks Richard Sandiford
2020-11-25 20:15   ` Jeff Law
2020-11-13  8:14 ` [07/23] Add a class that multiplexes two pointer types Richard Sandiford
2020-11-25 20:23   ` Jeff Law
2020-11-26 16:15     ` Richard Sandiford
2020-11-30  1:28       ` Jeff Law
2020-11-25 23:33   ` Martin Sebor [this message]
2020-11-26 17:06     ` Richard Sandiford
2020-11-27 18:12       ` Richard Sandiford
2020-11-28  0:17       ` Martin Sebor
2020-12-17  0:17         ` Richard Sandiford
2020-12-17 14:21           ` Tom Tromey
2020-12-17 15:38             ` Richard Sandiford
2020-12-17 15:44               ` Nathan Sidwell
2021-01-04 15:32                 ` Jeff Law
2020-11-13  8:15 ` [08/23] Add an alternative splay tree implementation Richard Sandiford
2020-12-02 20:36   ` Jeff Law
2020-12-17  0:29     ` Richard Sandiford
2021-01-04 15:27       ` Jeff Law
2021-01-01  8:25   ` Andreas Schwab
2021-01-04 14:53     ` Richard Sandiford
2021-01-04 15:02       ` Andreas Schwab
2021-01-04 15:42         ` Richard Sandiford
2021-01-05 12:13           ` Richard Biener
2020-11-13  8:15 ` [09/23] Add a cut-down version of std::span (array_slice) Richard Sandiford
2020-11-30 19:56   ` Jeff Law
2022-08-03 15:13   ` Martin Jambor
2022-08-03 15:31     ` Richard Sandiford
2022-08-10 16:03   ` Martin Jambor
2022-08-11  6:58     ` Richard Biener
2022-08-16  7:59       ` Richard Sandiford
2020-11-13  8:16 ` [10/23] Tweak the way that is_a is implemented Richard Sandiford
2020-12-02  5:15   ` Jeff Law
2020-11-13  8:16 ` [11/23] Split update_cfg_for_uncondjump out of combine Richard Sandiford
2020-11-30  6:14   ` Jeff Law
2020-11-13  8:17 ` [12/23] Export print-rtl.c:print_insn_with_notes Richard Sandiford
2020-11-25 20:24   ` Jeff Law
2020-11-13  8:18 ` [13/23] recog: Split out a register_asm_p function Richard Sandiford
2020-11-25 20:24   ` Jeff Law
2020-11-13  8:18 ` [14/23] simplify-rtx: Put simplify routines into a class Richard Sandiford
2020-11-30 19:54   ` Jeff Law
2020-11-13  8:19 ` [15/23] recog: Add a validate_change_xveclen function Richard Sandiford
2020-11-30 20:03   ` Jeff Law
2020-11-13  8:19 ` [16/23] recog: Add a way of temporarily undoing changes Richard Sandiford
2020-11-25 20:27   ` Jeff Law
2020-12-17  0:22     ` Richard Sandiford
2020-11-13  8:20 ` [17/23] recog: Add a class for propagating into insns Richard Sandiford
2020-12-03 22:32   ` Jeff Law
2020-11-13  8:20 ` [18/23] recog: Add an RAII class for undoing insn changes Richard Sandiford
2020-11-25 20:27   ` Jeff Law
2020-11-13  8:20 ` [19/23] rtlanal: Add some new helper classes Richard Sandiford
2020-12-13 17:30   ` Jeff Law
2020-12-14 16:37     ` Richard Sandiford
2020-12-14 20:02       ` Jeff Law
2020-11-13  8:21 ` [20/23] rtlanal: Add simple_regno_set Richard Sandiford
2020-11-25 20:31   ` Jeff Law
2020-12-17  0:47     ` Richard Sandiford
2021-01-04 15:28       ` Jeff Law
2020-11-13  8:22 ` [21/23] doc: Add documentation for rtl-ssa Richard Sandiford
2020-11-30  6:26   ` Jeff Law
2020-11-13  8:23 ` [PATCH 22/23] Add rtl-ssa Richard Sandiford
2020-12-16  3:31   ` Jeff Law
2020-12-17  0:33     ` Richard Sandiford
2020-12-19 20:01       ` Jeff Law
2020-11-13  8:24 ` [PATCH 23/23] fwprop: Rewrite to use RTL SSA Richard Sandiford
2020-12-16  3:52   ` Jeff Law
2020-12-17  0:34     ` Richard Sandiford
2020-11-25 19:58 ` [00/23] Make fwprop use an on-the-side RTL SSA representation Jeff Law
2020-11-26 16:03   ` Richard Sandiford
2020-11-27 15:56     ` Michael Matz
2020-11-27 16:31       ` Richard Sandiford
2020-11-30 21:13         ` Jeff Law
2020-12-01  0:03           ` Michael Matz
2020-12-01 10:15             ` Richard Sandiford
2020-12-02  0:25             ` Jeff Law
2020-11-30  6:45     ` Jeff Law
2020-11-30 14:12       ` Richard Sandiford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c2079f07-0d25-a604-0ab9-ab3314704b12@gmail.com \
    --to=msebor@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).