From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt1-x831.google.com (mail-qt1-x831.google.com [IPv6:2607:f8b0:4864:20::831]) by sourceware.org (Postfix) with ESMTPS id 045753857C4F for ; Wed, 25 Nov 2020 23:33:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 045753857C4F Received: by mail-qt1-x831.google.com with SMTP id m65so2883095qte.11 for ; Wed, 25 Nov 2020 15:33:29 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=kPHXn3lrz+lLp+/i6AqPy4YDSAnOriucv1gje6bUE9c=; b=VVbRIReTlxb7aAr5epRYMpkiHQ4pW8Ty5LSZ9Oj51vEZCLGTMQqbHOEHOGNplmXJOH V7fkxX1kgnnNmTEIWGchoPXOHKzwIb8SEWA5sjtNh8Szc/EqRpV0GnvOJTm09fksAoIu G7CnBtbLGrU+w0s5+rbiit1CL5VsUE0Dyjrm/tZYPzpllwm9AGbD0p9SqmIO54kMpJ5N axl1MLjxUsC1Q+W8jwflOJHzxXDPb1sKFH8R7ISa1X+7FywWr/K4ORUOQyUj9xLFfEcZ 23G5V1pdy26kIla+cTyjUwepX8XUx4x+5zPdP0Li93sWSR/qpmpdy2FrRUuvl7nm1zVT Fnjg== X-Gm-Message-State: AOAM530+bJroS4A0H6QE3BMqZKr/AWrEIrEUgSPopZ0ktH7Fs8nlIpe0 RCdHfqFzv+CKQxVEx9vUk+2jnY1aHDw= X-Google-Smtp-Source: ABdhPJznLutebs15H+6xl9aI9cg79nicdnoiVtNgPKzh97bwZi9vn51rbVl9u9UZ4odMU1Dthga1oQ== X-Received: by 2002:ac8:41d4:: with SMTP id o20mr399967qtm.313.1606347208508; Wed, 25 Nov 2020 15:33:28 -0800 (PST) Received: from [192.168.0.41] (75-166-106-198.hlrn.qwest.net. [75.166.106.198]) by smtp.gmail.com with ESMTPSA id z88sm905718qtd.46.2020.11.25.15.33.27 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 25 Nov 2020 15:33:27 -0800 (PST) Subject: Re: [07/23] Add a class that multiplexes two pointer types To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com References: From: Martin Sebor Message-ID: Date: Wed, 25 Nov 2020 16:33:26 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-10.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Nov 2020 23:33:49 -0000 On 11/13/20 1:14 AM, Richard Sandiford via Gcc-patches wrote: > This patch adds a pointer_mux class that provides similar > functionality to: > > union { T1 *a; T2 *b; }; > ... > bool is_b_rather_than_a; > > except that the is_b_rather_than_a tag is stored in the low bit > of the pointer. See the comments in the patch for a comparison > between the two approaches and why this one can be more efficient. > > I've tried to microoptimise the class a fair bit, since a later > patch uses it extensively in order to keep the sizes of data > structures down. I've been reading these changes mostly out of interest than to provide comments. I like your use of C++ -- you clearly know the language very well. I also appreciate the extensive commentary. It makes understanding the code (and the changes) much easier. Thank you for doing that! We should all aspire to follow your example! :) I do have one concern: the tendency to prioritize efficiency over safety (this can be said about most GCC code). Specifically in this class, the address bit twiddling makes me uneasy. I don't think the object model in either language (certainly not C but I don't have the impression C++ either) makes it unequivocally valid. On the contrary, I'd say many of us interpret the current rules as leaving it undefined. There are efforts to sanction this sort of thing under some conditions (e.g, the C object model proposal) but they have not been adopted yet. I think we should try to avoid exploiting these dark corners in new code. I'm not too concerned that it will break with some compilers (it might, but code like this is out there already and works). What, I worry is that it will either prevent or make much more difficult any access checking that might otherwise be possible. I also worry that it will encourage people who look to GCC code for examples to duplicate these tricks in their own code, making it in turn harder for us to help them detect bugs in it. Having said that, I looked for tests that verify this new utility class (and the others in this series), partly to get a better idea of how it's meant to be used. I couldn't find any. I'd expect every nontrivial, general-purpose utility class to come with tests. (Having a library of these components might make testing easier.) Martin > > gcc/ > * mux-utils.h: New file. > --- > gcc/mux-utils.h | 248 ++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 248 insertions(+) > create mode 100644 gcc/mux-utils.h > > diff --git a/gcc/mux-utils.h b/gcc/mux-utils.h > new file mode 100644 > index 00000000000..17ced49cd22 > --- /dev/null > +++ b/gcc/mux-utils.h > @@ -0,0 +1,248 @@ > +// Multiplexer utilities > +// Copyright (C) 2020 Free Software Foundation, Inc. > +// > +// This file is part of GCC. > +// > +// GCC is free software; you can redistribute it and/or modify it under > +// the terms of the GNU General Public License as published by the Free > +// Software Foundation; either version 3, or (at your option) any later > +// version. > +// > +// GCC is distributed in the hope that it will be useful, but WITHOUT ANY > +// WARRANTY; without even the implied warranty of MERCHANTABILITY or > +// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License > +// for more details. > +// > +// You should have received a copy of the GNU General Public License > +// along with GCC; see the file COPYING3. If not see > +// . > + > +#ifndef GCC_MUX_UTILS_H > +#define GCC_MUX_UTILS_H 1 > + > +// A class that stores a choice "A or B", where A has type T1 * and B has > +// type T2 *. Both T1 and T2 must have an alignment greater than 1, since > +// the low bit is used to identify B over A. T1 and T2 can be the same. > +// > +// A can be a null pointer but B cannot. > +// > +// Barring the requirement that B must be nonnull, using the class is > +// equivalent to using: > +// > +// union { T1 *A; T2 *B; }; > +// > +// and having a separate tag bit to indicate which alternative is active. > +// However, using this class can have two advantages over a union: > +// > +// - It avoides the need to find somewhere to store the tag bit. > +// > +// - The compiler is aware that B cannot be null, which can make checks > +// of the form: > +// > +// if (auto *B = mux.dyn_cast ()) > +// > +// more efficient. With a union-based representation, the dyn_cast > +// check could fail either because MUX is an A or because MUX is a > +// null B, both of which require a run-time test. With a pointer_mux, > +// only a check for MUX being A is needed. > +template > +class pointer_mux > +{ > +public: > + // Return an A pointer with the given value. > + static pointer_mux first (T1 *); > + > + // Return a B pointer with the given (nonnull) value. > + static pointer_mux second (T2 *); > + > + pointer_mux () = default; > + > + // Create a null A pointer. > + pointer_mux (std::nullptr_t) : m_ptr (nullptr) {} > + > + // Create an A or B pointer with the given value. This is only valid > + // if T1 and T2 are distinct and if T can be resolved to exactly one > + // of them. > + template + typename Enable = typename > + std::enable_if::value > + != std::is_convertible::value>::type> > + pointer_mux (T *ptr); > + > + // Return true unless the pointer is a null A pointer. > + explicit operator bool () const { return m_ptr; } > + > + // Assign A and B pointers respectively. > + void set_first (T1 *ptr) { *this = first (ptr); } > + void set_second (T2 *ptr) { *this = second (ptr); } > + > + // Return true if the pointer is an A pointer. > + bool is_first () const { return !(uintptr_t (m_ptr) & 1); } > + > + // Return true if the pointer is a B pointer. > + bool is_second () const { return uintptr_t (m_ptr) & 1; } > + > + // Return the contents of the pointer, given that it is known to be > + // an A pointer. > + T1 *known_first () const { return reinterpret_cast (m_ptr); } > + > + // Return the contents of the pointer, given that it is known to be > + // a B pointer. > + T2 *known_second () const { return reinterpret_cast (m_ptr - 1); } > + > + // If the pointer is an A pointer, return its contents, otherwise > + // return null. Thus a null return can mean that the pointer is > + // either a null A pointer or a B pointer. > + // > + // If all A pointers are nonnull, it is more efficient to use: > + // > + // if (ptr.is_first ()) > + // ...use ptr.known_first ()... > + // > + // over: > + // > + // if (T1 *a = ptr.first_or_null ()) > + // ...use a... > + T1 *first_or_null () const; > + > + // If the pointer is a B pointer, return its contents, otherwise > + // return null. Using: > + // > + // if (T1 *b = ptr.second_or_null ()) > + // ...use b... > + // > + // should be at least as efficient as: > + // > + // if (ptr.is_second ()) > + // ...use ptr.known_second ()... > + T2 *second_or_null () const; > + > + // Return true if the pointer is a T. > + // > + // This is only valid if T1 and T2 are distinct and if T can be > + // resolved to exactly one of them. The condition is checked using > + // a static assertion rather than SFINAE because it gives a clearer > + // error message. > + template > + bool is_a () const; > + > + // Assert that the pointer is a T and return it as such. See is_a > + // for the restrictions on T. > + template > + T as_a () const; > + > + // If the pointer is a T, return it as such, otherwise return null. > + // See is_a for the restrictions on T. > + template > + T dyn_cast () const; > + > +private: > + pointer_mux (char *ptr) : m_ptr (ptr) {} > + > + // The pointer value for A pointers, or the pointer value + 1 for B pointers. > + // Using a pointer rather than a uintptr_t tells the compiler that second () > + // can never return null, and that second_or_null () is only null if > + // is_first (). > + char *m_ptr; > +}; > + > +template > +inline pointer_mux > +pointer_mux::first (T1 *ptr) > +{ > + gcc_checking_assert (!(uintptr_t (ptr) & 1)); > + return reinterpret_cast (ptr); > +} > + > +template > +inline pointer_mux > +pointer_mux::second (T2 *ptr) > +{ > + gcc_checking_assert (!(uintptr_t (ptr) & 1)); > + return reinterpret_cast (ptr) + 1; > +} > + > +template > +template > +inline pointer_mux::pointer_mux (T *ptr) > + : m_ptr (reinterpret_cast (ptr)) > +{ > + if (std::is_convertible::value) > + m_ptr += 1; > +} > + > +template > +inline T1 * > +pointer_mux::first_or_null () const > +{ > + return is_first () ? known_first () : nullptr; > +} > + > +template > +inline T2 * > +pointer_mux::second_or_null () const > +{ > + // Micro optimization that's effective as of GCC 11: compute the value > + // of the second pointer as an integer and test that, so that the integer > + // result can be reused as the pointer and so that all computation can > + // happen before a branch on null. This reduces the number of branches > + // needed for loops. > + return uintptr_t (m_ptr - 1) & 1 ? nullptr : known_second (); > +} > + > +template > +template > +inline bool > +pointer_mux::is_a () const > +{ > + static_assert (std::is_convertible::value > + != std::is_convertible::value, > + "Ambiguous pointer type"); > + if (std::is_convertible::value) > + return is_second (); > + else > + return is_first (); > +} > + > +template > +template > +inline T > +pointer_mux::as_a () const > +{ > + static_assert (std::is_convertible::value > + != std::is_convertible::value, > + "Ambiguous pointer type"); > + if (std::is_convertible::value) > + { > + gcc_checking_assert (is_second ()); > + return reinterpret_cast (m_ptr - 1); > + } > + else > + { > + gcc_checking_assert (is_first ()); > + return reinterpret_cast (m_ptr); > + } > +} > + > +template > +template > +inline T > +pointer_mux::dyn_cast () const > +{ > + static_assert (std::is_convertible::value > + != std::is_convertible::value, > + "Ambiguous pointer type"); > + if (std::is_convertible::value) > + { > + if (is_second ()) > + return reinterpret_cast (m_ptr - 1); > + } > + else > + { > + if (is_first ()) > + return reinterpret_cast (m_ptr); > + } > + return nullptr; > +} > + > +#endif >