Subject: Re: [001/nnn] poly_int: add poly-int.h
To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org
From: Martin Sebor
Date: Thu, 09 Nov 2017 09:10:00 -0000
In-Reply-To: <87r2t8pqv3.fsf@linaro.org>
References: <871sltvm7r.fsf@linaro.org> <87vaj5u7id.fsf@linaro.org>
 <87po8t5d55.fsf@linaro.org> <1b724f75-7515-08f7-a783-d86fb1fe8bf8@gmail.com>
 <87375oww6b.fsf@linaro.org> <52db2aa0-43cb-ab73-d5ff-0347a4cfee10@gmail.com>
 <87r2t8pqv3.fsf@linaro.org>

On 11/08/2017 11:28 AM, Richard Sandiford wrote:
> Martin Sebor writes:
>> On 11/08/2017 09:51 AM, Richard Sandiford wrote:
>>> Martin Sebor writes:
>>>> On 11/08/2017 02:32 AM, Richard Sandiford wrote:
>>>>> Martin Sebor writes:
>>>>>> I haven't done anywhere near a thorough review, but the dtor
>>>>>> followed by the placement new in the POLY_SET_COEFF() macro caught
>>>>>> my eye, so I thought I'd ask sooner rather than later.  Given the
>>>>>> macro definition:
>>>>>>
>>>>>> +   The dummy comparison against a null C * is just a way of checking
>>>>>> +   that C gives the right type.  */
>>>>>> +#define POLY_SET_COEFF(C, RES, I, VALUE) \
>>>>>> +  ((void) (&(RES).coeffs[0] == (C *) 0), \
>>>>>> +   wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \
>>>>>> +   ? (void) ((RES).coeffs[I] = VALUE) \
>>>>>> +   : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
>>>>>>
>>>>>> is the following use well-defined?
>>>>>>
>>>>>> +template<unsigned int N, typename C>
>>>>>> +inline poly_int_pod<N, C>&
>>>>>> +poly_int_pod<N, C>::operator <<= (unsigned int a)
>>>>>> +{
>>>>>> +  POLY_SET_COEFF (C, *this, 0, this->coeffs[0] << a);
>>>>>>
>>>>>> It looks to me as though the VALUE argument in the ctor invoked
>>>>>> by the placement new expression is evaluated after the dtor has
>>>>>> destroyed the very array element the VALUE argument expands to.
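To make the concern concrete: expanding the macro by hand for that call
(a manual expansion, so take the exact spelling with a grain of salt),
the non-FLEXIBLE_PRECISION branch comes out as roughly

  /* The dtor runs first...  */
  this->coeffs[0].~C (),
  /* ...and only then does the new-expression evaluate its initializer,
     which reads the already-destroyed coeffs[0].  */
  new (&this->coeffs[0]) C (this->coeffs[0] << a)

so the shifted value is computed from an object whose lifetime has
already ended.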
>>>>>
>>>>> Good catch!  It should simply have been doing <<= on each
>>>>> coefficient -- I must have got carried away when converting to
>>>>> POLY_SET_COEFF.
>>>>>
>>>>> I double-checked the other uses and think that's the only one.
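Just to spell out the fix you describe -- doing <<= on each coefficient
in place rather than destroying and re-creating it -- I assume it ends
up looking something like this (my reading of your description, not the
actual updated patch):

  template<unsigned int N, typename C>
  inline poly_int_pod<N, C>&
  poly_int_pod<N, C>::operator <<= (unsigned int a)
  {
    /* Shift each coefficient in place; nothing is destroyed and
       re-constructed, so the evaluation-order problem above goes away.  */
    for (unsigned int i = 0; i < N; i++)
      this->coeffs[i] <<= a;
    return *this;
  }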
>>>>>> Whether or not it is, in fact, a problem, it seems to me that
>>>>>> using a function template rather than a macro would be a clearer
>>>>>> and safer way to do the same thing.  (Safer in that the macro
>>>>>> also evaluates its arguments multiple times, which is often a
>>>>>> source of subtle bugs.)
>>>>>
>>>>> That would slow down -O0 builds though, by introducing an extra
>>>>> function call and set of temporaries even when the coefficients
>>>>> are primitive integers.
>>>>
>>>> Would decorating the function template with attribute always_inline
>>>> help?
>>>
>>> It would remove the call itself, but we'd still have the extra
>>> temporary objects that were the function argument and return value.
>>
>> Sorry, I do not want to get into another long discussion about
>> trade-offs between safety and efficiency, but I'm not sure I see
>> what extra temporaries it would create.  It seems to me that an
>> inline function template that took arguments of user-defined types
>> by reference and others by value should be just as efficient as
>> a macro.
>>
>> From GCC's own manual:
>>
>> 6.43 An Inline Function is As Fast As a Macro
>> https://gcc.gnu.org/onlinedocs/gcc/Inline.html
>
> You can see the difference with something like:
>
>   inline
>   void __attribute__((always_inline))
>   f(int &dst, const int &src) { dst = src; }
>
>   int g1(const int &y) { int x; f(x, y); return x; }
>   int g2(const int &y) { int x; x = y; return x; }

Let me say at the outset that I struggle to comprehend how a few
instructions can even be a consideration when not optimizing,
especially in light of the bug the macro caused that would have been
prevented by using a function instead.  But...

...I don't think your example above is representative of using the
POLY_SET_COEFF macro.  The function template I'm suggesting might look
something like this:

  template<unsigned int N, typename C>
  inline void __attribute__ ((always_inline))
  poly_set_coeff (poly_int_pod<N, C> *p, unsigned idx, C val)
  {
    ((void) (&(*p).coeffs[0] == (C *) 0),
     wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION
     ? (void) ((*p).coeffs[0] = val)
     : (void) ((*p).coeffs[0].~C (), new (&(*p).coeffs[0]) C (val)));

    if (N >= 2)
      for (unsigned int i = 1; i < N; i++)
        ((void) (&(*p).coeffs[0] == (C *) 0),
         wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION
         ? (void) ((*p).coeffs[i] = val)
         : (void) ((*p).coeffs[i].~C (), new (&(*p).coeffs[i]) C (val)));
  }

To compare apples to apples I suggest instead comparing the shift
operator (or any other poly_int function that uses the macro) that
doesn't suffer from the bug against one that makes use of the function
template.  I see a difference of 2 instructions on x86_64 (21 vs 23)
for operator<<=.  Are two assembly instructions even worth talking
about?

>> If that's not the case and there is a significant performance
>> penalty associated with inline functions at -O0 then GCC should
>> be fixed to avoid it.
>
> I think those docs are really talking about inline functions being as
> fast as macros when optimisation is enabled.  I don't think we make
> any guarantees about -O0 code quality.

Sure, but you are using unsafe macros in preference to a safer inline
function even with optimization, introducing a bug as a result, and
making an argument that the performance impact of a few instructions
when not using optimization is what should drive the decision between
one and the other in all situations.  With all due respect, I fail to
see the logic in this line of reasoning.  By that argument we would
never be able to define any inline functions.

That being said, if the performance implications of using inline
functions with no optimization are so serious here then I suggest you
should be concerned about introducing the poly_int API in its current
form at all: every access to the class is an inline function.

On a more serious/constructive note, if you really are worried about
efficiency at this level then introducing an intrinsic primitive into
the compiler instead of a set of classes might be worth thinking
about.  It will not only benefit GCC but might also lay a foundation
for all sorts of infinite precision integer classes (including the C++
proposal that was pointed out in the other thread).

Martin