From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ot1-x332.google.com (mail-ot1-x332.google.com [IPv6:2607:f8b0:4864:20::332]) by sourceware.org (Postfix) with ESMTPS id E0C81385703F for ; Mon, 7 Jun 2021 22:17:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E0C81385703F Received: by mail-ot1-x332.google.com with SMTP id o17-20020a9d76510000b02903eabfc221a9so4629528otl.0 for ; Mon, 07 Jun 2021 15:17:11 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=1TA1glVAhwVk4DG6JysM05eLTisRc3orkKIVpKh23jg=; b=E6pzErqxOb8uGJbNVDX82y5iiltYifRjS+TucVjQsKNaqsZhdWQM0zfandO0iDOcvH ZDFvqJ200cVq1sXkapqND5qpvRQzrP+wB9pRlcsd81JK5aKUtfbABqWXNxpdJhP2Mbw9 rLOYhfOQu+vMxCkT+DB1N+XSOETN7kDLgboteS3bzJl4ibhnfIyzTCw6yk7aOWu8Pgr/ fEJSjBQQS209GRe/PWqQ0A3Q4ayj7cl4iJK8fukNnA9SUXuEDOJSSL+IZrKi+RaYKI3h Zg7pP2pOMLr32PE11cVMN7+/BgV9Z6rzdHy+ah0MJL34ERljD/rFCjOiKRDCraNbCah4 vDWw== X-Gm-Message-State: AOAM530SsdRzj87VQCdz5dTwSEwizth8+0F0BfKnAHSWxnbqwzCGQ1v8 c/6HKzf3+Bk17CtRoiYecWE4QXfGjV0= X-Google-Smtp-Source: ABdhPJwwv1SlHeo5l0piuEM7EyWQtv6DgJm2XLWmgtmhJBsXCEX5aYY9M9tqdaRnNYeDfdXN1JYUiA== X-Received: by 2002:a05:6830:110:: with SMTP id i16mr15559333otp.249.1623104230884; Mon, 07 Jun 2021 15:17:10 -0700 (PDT) Received: from [192.168.0.41] (184-96-226-253.hlrn.qwest.net. [184.96.226.253]) by smtp.gmail.com with ESMTPSA id y7sm2523586oix.36.2021.06.07.15.17.10 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 07 Jun 2021 15:17:10 -0700 (PDT) Subject: Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904) To: Trevor Saunders Cc: Richard Biener , Jonathan Wakely , gcc-patches References: <91545a73-12af-33b2-c6e7-119b5a21de60@gmail.com> <4d503394-4e82-1d36-41ca-34315042775b@redhat.com> <372f08c6-9466-3bdb-a246-0815e1b84047@gmail.com> From: Martin Sebor Message-ID: <9297e9ff-cb66-4800-30bc-24d5042d558a@gmail.com> Date: Mon, 7 Jun 2021 16:17:09 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-4.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jun 2021 22:17:22 -0000 On 6/3/21 2:29 AM, Trevor Saunders wrote: > On Wed, Jun 02, 2021 at 10:04:03AM -0600, Martin Sebor via Gcc-patches wrote: >> On 6/2/21 12:55 AM, Richard Biener wrote: >>> On Tue, Jun 1, 2021 at 9:56 PM Martin Sebor wrote: >>>> >>>> On 5/27/21 2:53 PM, Jason Merrill wrote: >>>>> On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote: >>>>>> On 4/27/21 8:04 AM, Richard Biener wrote: >>>>>>> On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor wrote: >>>>>>>> >>>>>>>> On 4/27/21 1:58 AM, Richard Biener wrote: >>>>>>>>> On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> PR 90904 notes that auto_vec is unsafe to copy and assign because >>>>>>>>>> the class manages its own memory but doesn't define (or delete) >>>>>>>>>> either special function. Since I first ran into the problem, >>>>>>>>>> auto_vec has grown a move ctor and move assignment from >>>>>>>>>> a dynamically-allocated vec but still no copy ctor or copy >>>>>>>>>> assignment operator. >>>>>>>>>> >>>>>>>>>> The attached patch adds the two special functions to auto_vec along >>>>>>>>>> with a few simple tests. It makes auto_vec safe to use in containers >>>>>>>>>> that expect copyable and assignable element types and passes >>>>>>>>>> bootstrap >>>>>>>>>> and regression testing on x86_64-linux. >>>>>>>>> >>>>>>>>> The question is whether we want such uses to appear since those >>>>>>>>> can be quite inefficient? Thus the option is to delete those >>>>>>>>> operators? >>>>>>>> >>>>>>>> I would strongly prefer the generic vector class to have the properties >>>>>>>> expected of any other generic container: copyable and assignable. If >>>>>>>> we also want another vector type with this restriction I suggest to add >>>>>>>> another "noncopyable" type and make that property explicit in its name. >>>>>>>> I can submit one in a followup patch if you think we need one. >>>>>>> >>>>>>> I'm not sure (and not strictly against the copy and assign). Looking >>>>>>> around >>>>>>> I see that vec<> does not do deep copying. Making auto_vec<> do it >>>>>>> might be surprising (I added the move capability to match how vec<> >>>>>>> is used - as "reference" to a vector) >>>>>> >>>>>> The vec base classes are special: they have no ctors at all (because >>>>>> of their use in unions). That's something we might have to live with >>>>>> but it's not a model to follow in ordinary containers. >>>>> >>>>> I don't think we have to live with it anymore, now that we're writing >>>>> C++11. >>>>> >>>>>> The auto_vec class was introduced to fill the need for a conventional >>>>>> sequence container with a ctor and dtor. The missing copy ctor and >>>>>> assignment operators were an oversight, not a deliberate feature. >>>>>> This change fixes that oversight. > > I've been away a while, but trying to get back into this, sorry. It was > definitely an oversight to leave these undefined for the compiler to > provide a default definition of, but I agree with Richi, the better > thing to have done, or do now would be to mark them as deleted and make > auto_vec move only (with copy() for when you really need a deep copy. >>>>>> >>>>>> The revised patch also adds a copy ctor/assignment to the auto_vec >>>>>> primary template (that's also missing it). In addition, it adds >>>>>> a new class called auto_vec_ncopy that disables copying and >>>>>> assignment as you prefer. >>>>> >>>>> Hmm, adding another class doesn't really help with the confusion richi >>>>> mentions. And many uses of auto_vec will pass them as vec, which will >>>>> still do a shallow copy. I think it's probably better to disable the >>>>> copy special members for auto_vec until we fix vec<>. >>>> >>>> There are at least a couple of problems that get in the way of fixing >>>> all of vec to act like a well-behaved C++ container: >>>> >>>> 1) The embedded vec has a trailing "flexible" array member with its >>>> instances having different size. They're initialized by memset and >>>> copied by memcpy. The class can't have copy ctors or assignments >>>> but it should disable/delete them instead. >>>> >>>> 2) The heap-based vec is used throughout GCC with the assumption of >>>> shallow copy semantics (not just as function arguments but also as >>>> members of other such POD classes). This can be changed by providing >>>> copy and move ctors and assignment operators for it, and also for >>>> some of the classes in which it's a member and that are used with >>>> the same assumption. >>>> >>>> 3) The heap-based vec::block_remove() assumes its elements are PODs. >>>> That breaks in VEC_ORDERED_REMOVE_IF (used in gcc/dwarf2cfi.c:2862 >>>> and tree-vect-patterns.c). >>>> >>>> I took a stab at both and while (1) is easy, (2) is shaping up to >>>> be a big and tricky project. Tricky because it involves using >>>> std::move in places where what's moved is subsequently still used. >>>> I can keep plugging away at it but it won't change the fact that >>>> the embedded and heap-based vecs have different requirements. >>> >>> So you figured that neither vec<> nor auto_vec<> are a container like >>> std::vector. >> >> That's obvious from glancing at their definitions. I didn't go >> through the exercise to figure that out. >> >>> >>> I'm not sure it makes sense to try to make it so since obviously vec<> >>> was designed to match the actual needs of GCC. auto_vec<> was added >>> to make a RAII (like auto_bitmap, etc.) wrapper, plus it got the ability >>> to provide initial stack storage. >> >> The goal was to see if the two vec instances could be made safer >> to use but taking advantage of C++ 11 features. As I mentioned >> recently, creating a copy of a vec and modifying it changes it as >> well as the original (e.g., by changing a vec argument passed to >> it by value a function changes the actual argument in the caller). >> That's surprising to most C++ programmers. > > It can probably be improved now with c++11, but while very unfortunate > There is hard requirements on how vec works from existing code using it. > >> My conclusion from the exercise is that although some of the problems >> with vec can, and IMO should, be solved, making the heap-based one >> a well-behaved C++ 11 container will take considerable effort and >> is impossible for the embedded vec. > > Yes, fortunately things using embedded vec do not at all expect a c++ > container, and so don't really mismatch it. You probably should not be > creating them yourself unless you are creating a new object with an > embedded vector, and you probably don't want to do that. > >>> >>>> It doesn't seem to me that having a safely copyable auto_vec needs >>>> to be put on hold until the rats nest above is untangled. It won't >>>> make anything worse than it is. (I have a project that depends on >>>> a sane auto_vec working). >>> >>> So how does your usage look like? I can't really figure who'd need >>> deep copying of a container - note there's vec<>::copy at your >>> discretion. >>> >>>> A couple of alternatives to solving this are to use std::vector or >>>> write an equivalent vector class just for GCC. > > imho one of the significant advantages to having our own datastructures > rather than using the standard library is the ability to have a > different API that is less constrained by history, and can make better > choices than standard containers like deleting operators that would > otherwise require deep coppies. Though certainly they don't always live > up to that like the oversight here of not defining the copy / assignment > operators at all. Perhaps there's an argument to be made for the > standard containers doing deep coppies that it makes the language easier > to use, but its not all that much easier than .copy(), if that's your > priority c++ probably isn't the right tool for the job, and I doubt it > makes sense for gcc in particular. Having special containers just for GCC is mainly a liability. They are different, sometimes gratuitously, from the standard containers that most C++ programmers are familiar with, and commonly even inconsistent with one another. Not just their APs but also their guarantees and effects. They're harder to use correctly and easy to make mistakes with. They're also less well tested and and much less well documented. >>> As said, can you show the usage that's impossible to do with >>> the current vec<>/auto_vec<>? >> >> The test case in PR 90904 shows a trivial example. More generally, >> using an auto_vec that depends on it being assignable (e.g., storing >> an auto_vec in another container like hash_map or auto_vec itself) >> is impossible. Using a plain vec requires manual memory management >> and so is error-prone. > > Certainly deleting the copy constructor and assignment operator means > that you can't use them, but can you show real code where it is a > significant imposition to have to call .copy() rather than using them? > Certainly its a little longer, but deep copies are a bit of a > performance footgun, especially when you have vectors linear in the size > of the function all over, and your goal is to be no worse than > O(N log(N)), meaning you can copy the vector at most log(N) times at > worst. > > I would think storing move only objects in auto_vec and hash_* should > work, and if it doesn't should be fixable without introducing overly > easy ways to make deep coppies. > >> But more important, as a C++ code base, GCC should follow the best >> practices for the language. Among the essential ones are using RAII >> to manage resources and the Rule of Three (or Five in C++ 11): a class >> that defines a dtor should also define a copy ctor and copy assignment >> (and move ctor and move assignment in C++). > > When discussing the rule of 3/5 at least > https://en.cppreference.com/w/cpp/language/rule_of_three considers > deleting the member to be a form of definition, see the part about non > copiable members and deleting both copy constructor and assignment, The Rule of Three was coined sometime in the early '90s, well before C++ 11 where deleted function were introduced. The original intent was to emphasize that a class that defines one of the three special member functions should almost certainly define all of them, to make the basic operations safe. The reason for the rule was that programmers commonly forgot to define the full set and ended up setting a trap for users of their classes who inadvertently came to make use of their shallow copy semantics. In his 2001 Dr. Dobb's article about the rule, Andy Koenig shows a poster child for this mistake that looks like a spitting image of auto_vec: https://www.drdobbs.com/c-made-easier-the-rule-of-three/184401400#1 Despite its notoriety, the mistake is still common today, including in GCC. Not just in auto_vec, but also elsewhere, including the recently added auto_string_vec, or hash_table, now "fixed" by providing a copy ctor but deleting copy assignment. Thanks to that, hash-based containers are safely copyable but cannot be assigned. > in > this case to make the class move only. Strictly speaking, I suppose its > true that an array of 10k items is copiable, but its also very likely > something to be avoided if at all possible, and doesn't need to be made > easy. Inefficient copying should certainly be avoided, not just in GCC but in all code. But it's not (or should not be) a design goal of a general purpose API to make copying difficult just because some copies might be excessive. APIs should be easy and safe to work with first. Efficiency comes second, and shouldn't come at the expense of maintainability or safety. Martin > > Trev > >> >> Martin