From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qv1-xf2d.google.com (mail-qv1-xf2d.google.com [IPv6:2607:f8b0:4864:20::f2d]) by sourceware.org (Postfix) with ESMTPS id AB9C7398545A for ; Sat, 28 Nov 2020 00:17:14 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org AB9C7398545A Received: by mail-qv1-xf2d.google.com with SMTP id n9so3057948qvp.5 for ; Fri, 27 Nov 2020 16:17:14 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=iM0/bCkEYyJhlpMPrYGXsRr7NJcQ/oATki3NgdwxmFA=; b=CI4OufDSnqNiaigla0agXUM2ZDu9QEc7h/dqXXSofHvgWc1u9Rx0gWCYNAxZDPWURU O0K0VLHzWB1GRY5BvnsaGL4aD+t6e7l5mBl5YLgBRf+8U3cZgS1y8zpTVw2oFQnLZs9z Q2/4Wdv/frfyi8dDtZb7t99pd8755uyS0ucBbu83KyLKQRCnjS+dbXS/L03D6+nasuDV PImdSwvxuRno6VId/zG3CGGt1O0IvK+0ZknX5QklAu7gMgFHi0rhKmaRkStziaN32HBw Uux1I8ayiiWJRjsainszIExZKvLD2XgwJ4+mkYybwyXKoR0NKr1V+i4AvtswfRWb5T7J lVzQ== X-Gm-Message-State: AOAM533eQn32deaCoYNm8BvB+jas22IsYI2imJRBSFmALU0Qqc0Tt+4X 4fo2LU2qJLhJyBusqOOvmrXgG4Vuuy8= X-Google-Smtp-Source: ABdhPJx0IqBqYfa21E1RD8BB4VtZIrCxy3eYhpceoUetwYVLSH78BP+JKsdfHbkskQnLwtrg20iVFA== X-Received: by 2002:ad4:476b:: with SMTP id d11mr10954928qvx.57.1606522634277; Fri, 27 Nov 2020 16:17:14 -0800 (PST) Received: from [192.168.0.41] (75-166-106-198.hlrn.qwest.net. [75.166.106.198]) by smtp.gmail.com with ESMTPSA id e126sm7791158qkb.90.2020.11.27.16.17.13 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 27 Nov 2020 16:17:13 -0800 (PST) Subject: Re: [07/23] Add a class that multiplexes two pointer types To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com References: From: Martin Sebor Message-ID: Date: Fri, 27 Nov 2020 17:17:12 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 28 Nov 2020 00:17:16 -0000 On 11/26/20 10:06 AM, Richard Sandiford wrote: > Martin Sebor writes: >> I do have one concern: the tendency to prioritize efficiency >> over safety (this can be said about most GCC code). Specifically >> in this class, the address bit twiddling makes me uneasy. I don't >> think the object model in either language (certainly not C but >> I don't have the impression C++ either) makes it unequivocally >> valid. On the contrary, I'd say many of us interpret the current >> rules as leaving it undefined. There are efforts to sanction >> this sort of thing under some conditions (e.g, the C object >> model proposal) but they have not been adopted yet. I think >> we should try to avoid exploiting these dark corners in new >> code. > > I'd tried to stick to operations that I thought were well-defined. > The primitives being used are really: > > (1) convert a T1* or T2* to char* > (2) increment an unincremented char* > (3) decrement an incremented char* > (4) convert a char* back to T1* or T2* > (5) convert a char* to an intptr_t in order to test its low bit All those are valid as long as the pointer points into the same object, both before and after. > I thought (1) and (4) were allowed. At least, [basic.compound] says > that void* must be able to hold any object pointer and that it must have > the same representation as char*, so I thought the conversion in (1) was > guaranteed to be representable. And (4) only ever undoes (1): it only > converts the result of (1) back to the original pointer type. > > For (2) and (3), the incremented pointer will still be within the > containing object, so I thought it would be well-defined. Here too, > (3) only ever undoes (2): it only decrements a pointer that had > previously been incremented. > > One thing I'd deliberately tried to avoid was converting integers > “back” to pointers, because that seemed like a more dangerous thing. > That's why: > >>> +template >>> +inline T2 * >>> +pointer_mux::second_or_null () const >>> +{ >>> + // Micro optimization that's effective as of GCC 11: compute the value >>> + // of the second pointer as an integer and test that, so that the integer >>> + // result can be reused as the pointer and so that all computation can >>> + // happen before a branch on null. This reduces the number of branches >>> + // needed for loops. >>> + return uintptr_t (m_ptr - 1) & 1 ? nullptr : known_second (); This is only valid if m_ptr points to the second byte of an object. If it points to the first byte of A then it's invalid. This would make the test valid but the result strictly unspecified (though in practice I'd expect it to do what you expect): return (uintptr_t (m_ptr) - 1) & 1 ? nullptr : known_second (); >>> +} > > is written in a somewhat indirect way. > > Are your concerns with the primitives above, or is the problem with > something else? My initial impression was that the code stored information in the least significant bits of the pointer. Now that I've looked at it again I still think it does that, except not directly but indirectly, by storing a pointer to either the first byte of one object (A) or to the second byte of another (B). Correct? (If so, I would recommend to expand the documentation to make this clearer.) It's clever (a little too clever for my taste) but other than the m_ptr - 1 part above I can't think of anything undefined about it. My main concern also wasn't with the bit twiddling as such but with hiding the identity of the objects by manipulating the representation of the pointers via integer operations. Since (if) the code doesn't really do that, it may be less of a problem than I thought. Martin