Subject: Re: [001/nnn] poly_int: add poly-int.h
To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org
From: Martin Sebor
Date: Thu, 09 Nov 2017 09:10:00 -0000
In-Reply-To: <87r2t8pqv3.fsf@linaro.org>
References: <871sltvm7r.fsf@linaro.org> <87vaj5u7id.fsf@linaro.org>
 <87po8t5d55.fsf@linaro.org> <1b724f75-7515-08f7-a783-d86fb1fe8bf8@gmail.com>
 <87375oww6b.fsf@linaro.org> <52db2aa0-43cb-ab73-d5ff-0347a4cfee10@gmail.com>
 <87r2t8pqv3.fsf@linaro.org>

On 11/08/2017 11:28 AM, Richard Sandiford wrote:
> Martin Sebor writes:
>> On 11/08/2017 09:51 AM, Richard Sandiford wrote:
>>> Martin Sebor writes:
>>>> On 11/08/2017 02:32 AM, Richard Sandiford wrote:
>>>>> Martin Sebor writes:
>>>>>> I haven't done anywhere near a thorough review, but the dtor
>>>>>> followed by the placement new in the POLY_SET_COEFF() macro caught
>>>>>> my eye, so I thought I'd ask sooner rather than later.  Given the
>>>>>> macro definition:
>>>>>>
>>>>>> +   The dummy comparison against a null C * is just a way of checking
>>>>>> +   that C gives the right type.  */
>>>>>> +#define POLY_SET_COEFF(C, RES, I, VALUE) \
>>>>>> +  ((void) (&(RES).coeffs[0] == (C *) 0), \
>>>>>> +   wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \
>>>>>> +   ? (void) ((RES).coeffs[I] = VALUE) \
>>>>>> +   : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
>>>>>>
>>>>>> is the following use well-defined?
>>>>>>
>>>>>> +template<unsigned int N, typename C>
>>>>>> +inline poly_int_pod<N, C>&
>>>>>> +poly_int_pod<N, C>::operator <<= (unsigned int a)
>>>>>> +{
>>>>>> +  POLY_SET_COEFF (C, *this, 0, this->coeffs[0] << a);
>>>>>>
>>>>>> It looks to me as though the VALUE argument in the ctor invoked
>>>>>> by the placement new expression is evaluated after the dtor has
>>>>>> destroyed the very array element the VALUE argument expands to.
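To make the concern concrete: expanding the macro by hand for that call
(a manual expansion, so take the exact spelling with a grain of salt),
the non-FLEXIBLE_PRECISION branch comes out as roughly

  /* The dtor runs first...  */
  this->coeffs[0].~C (),
  /* ...and only then does the new-expression evaluate its initializer,
     which reads the already-destroyed coeffs[0].  */
  new (&this->coeffs[0]) C (this->coeffs[0] << a)

so the shifted value is computed from an object whose lifetime has
already ended.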
>>>>>
>>>>> Good catch!  It should simply have been doing <<= on each
>>>>> coefficient -- I must have got carried away when converting to
>>>>> POLY_SET_COEFF.
>>>>>
>>>>> I double-checked the other uses and think that's the only one.
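Just to spell out the fix you describe -- doing <<= on each coefficient
in place rather than destroying and re-creating it -- I assume it ends
up looking something like this (my reading of your description, not the
actual updated patch):

  template<unsigned int N, typename C>
  inline poly_int_pod<N, C>&
  poly_int_pod<N, C>::operator <<= (unsigned int a)
  {
    /* Shift each coefficient in place; nothing is destroyed and
       re-constructed, so the evaluation-order problem above goes away.  */
    for (unsigned int i = 0; i < N; i++)
      this->coeffs[i] <<= a;
    return *this;
  }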
>>>>>> Whether or not it is, in fact, a problem, it seems to me that
>>>>>> using a function template rather than a macro would be a clearer
>>>>>> and safer way to do the same thing.  (Safer in that the macro
>>>>>> also evaluates its arguments multiple times, which is often a
>>>>>> source of subtle bugs.)
>>>>>
>>>>> That would slow down -O0 builds though, by introducing an extra
>>>>> function call and set of temporaries even when the coefficients
>>>>> are primitive integers.
>>>>
>>>> Would decorating the function template with attribute always_inline
>>>> help?
>>>
>>> It would remove the call itself, but we'd still have the extra
>>> temporary objects that were the function argument and return value.
>>
>> Sorry, I do not want to get into another long discussion about
>> trade-offs between safety and efficiency, but I'm not sure I see
>> what extra temporaries it would create.  It seems to me that an
>> inline function template that took arguments of user-defined types
>> by reference and others by value should be just as efficient as
>> a macro.
>>
>> From GCC's own manual:
>>
>> 6.43 An Inline Function is As Fast As a Macro
>> https://gcc.gnu.org/onlinedocs/gcc/Inline.html
>
> You can see the difference with something like:
>
>   inline
>   void __attribute__((always_inline))
>   f(int &dst, const int &src) { dst = src; }
>
>   int g1(const int &y) { int x; f(x, y); return x; }
>   int g2(const int &y) { int x; x = y; return x; }

Let me say at the outset that I struggle to comprehend how a few
instructions can even be a consideration when not optimizing,
especially in light of the bug the macro caused that would have been
prevented by using a function instead.  But...

...I don't think your example above is representative of using the
POLY_SET_COEFF macro.  The function template I'm suggesting might look
something like this:

  template<unsigned int N, typename C>
  inline void __attribute__ ((always_inline))
  poly_set_coeff (poly_int_pod<N, C> *p, unsigned idx, C val)
  {
    ((void) (&(*p).coeffs[0] == (C *) 0),
     wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION
     ? (void) ((*p).coeffs[0] = val)
     : (void) ((*p).coeffs[0].~C (), new (&(*p).coeffs[0]) C (val)));

    if (N >= 2)
      for (unsigned int i = 1; i < N; i++)
        ((void) (&(*p).coeffs[0] == (C *) 0),
         wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION
         ? (void) ((*p).coeffs[i] = val)
         : (void) ((*p).coeffs[i].~C (), new (&(*p).coeffs[i]) C (val)));
  }

To compare apples to apples I suggest instead comparing the shift
operator (or any other poly_int function that uses the macro) that
doesn't suffer from the bug against one that makes use of the function
template.  I see a difference of 2 instructions on x86_64 (21 vs 23)
for operator<<=.  Are two assembly instructions even worth talking
about?

>> If that's not the case and there is a significant performance
>> penalty associated with inline functions at -O0 then GCC should
>> be fixed to avoid it.
>
> I think those docs are really talking about inline functions being as
> fast as macros when optimisation is enabled.  I don't think we make
> any guarantees about -O0 code quality.

Sure, but you are using unsafe macros in preference to a safer inline
function even with optimization, introducing a bug as a result, and
making an argument that the performance impact of a few instructions
when not using optimization is what should drive the decision between
one and the other in all situations.  With all due respect, I fail to
see the logic in this line of reasoning.  By that argument we would
never be able to define any inline functions.

That being said, if the performance implications of using inline
functions with no optimization are so serious here then I suggest you
should be concerned about introducing the poly_int API in its current
form at all: every access to the class is an inline function.

On a more serious/constructive note, if you really are worried about
efficiency at this level then introducing an intrinsic primitive into
the compiler instead of a set of classes might be worth thinking
about.  It will not only benefit GCC but might also lay a foundation
for all sorts of infinite precision integer classes (including the C++
proposal that was pointed out in the other thread).

Martin