From: Richard Sandiford <richard.sandiford@linaro.org>
To: Martin Sebor
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [001/nnn] poly_int: add poly-int.h
Date: Thu, 09 Nov 2017 11:14:00 -0000
Message-ID: <87lgjfpv78.fsf@linaro.org>

Martin Sebor writes:
> On 11/08/2017 11:28 AM, Richard Sandiford wrote:
>> Martin Sebor writes:
>>> On 11/08/2017 09:51 AM, Richard Sandiford wrote:
>>>> Martin Sebor writes:
>>>>> On 11/08/2017 02:32 AM, Richard Sandiford wrote:
>>>>>> Martin Sebor writes:
>>>>>>> I haven't done nearly a thorough review but the dtor followed by
>>>>>>> the placement new in the POLY_SET_COEFF() macro caught my eye so
>>>>>>> I thought I'd ask sooner rather than later.  Given the macro
>>>>>>> definition:
>>>>>>>
>>>>>>> +   The dummy comparison against a null C * is just a way of checking
>>>>>>> +   that C gives the right type.  */
>>>>>>> +#define POLY_SET_COEFF(C, RES, I, VALUE) \
>>>>>>> +  ((void) (&(RES).coeffs[0] == (C *) 0), \
>>>>>>> +   wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \
>>>>>>> +   ? (void) ((RES).coeffs[I] = VALUE) \
>>>>>>> +   : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
>>>>>>>
>>>>>>> is the following use well-defined?
>>>>>>>
>>>>>>> +template<unsigned int N, typename C>
>>>>>>> +inline poly_int_pod<N, C>&
>>>>>>> +poly_int_pod<N, C>::operator <<= (unsigned int a)
>>>>>>> +{
>>>>>>> +  POLY_SET_COEFF (C, *this, 0, this->coeffs[0] << a);
>>>>>>>
>>>>>>> It looks to me as though the VALUE argument in the ctor invoked
>>>>>>> by the placement new expression is evaluated after the dtor has
>>>>>>> destroyed the very array element the VALUE argument expands to.
>>>>>>
>>>>>> Good catch!  It should simply have been doing <<= on each
>>>>>> coefficient -- I must have got carried away when converting to
>>>>>> POLY_SET_COEFF.
>>>>>>
>>>>>> I double-checked the other uses and think that's the only one.
>>>>>>
>>>>>>> Whether or not it is, in fact, a problem, it seems to me that using
>>>>>>> a function template rather than a macro would be a clearer and
>>>>>>> safer way to do the same thing.  (Safer in that the macro also
>>>>>>> evaluates its arguments multiple times, which is often a source
>>>>>>> of subtle bugs.)
>>>>>>
>>>>>> That would slow down -O0 builds though, by introducing an extra
>>>>>> function call and set of temporaries even when the coefficients
>>>>>> are primitive integers.
>>>>>
>>>>> Would decorating the function template with attribute always_inline
>>>>> help?
>>>>
>>>> It would remove the call itself, but we'd still have the extra temporary
>>>> objects that were the function argument and return value.
>>>
>>> Sorry, I do not want to get into another long discussion about
>>> trade-offs between safety and efficiency but I'm not sure I see
>>> what extra temporaries it would create.  It seems to me that
>>> an inline function template that took arguments of user-defined
>>> types by reference and others by value should be just as efficient
>>> as a macro.
>>>
>>> From GCC's own manual:
>>>
>>> 6.43 An Inline Function is As Fast As a Macro
>>> https://gcc.gnu.org/onlinedocs/gcc/Inline.html
>>
>> You can see the difference with something like:
>>
>> inline
>> void __attribute__((always_inline))
>> f(int &dst, const int &src) { dst = src; }
>>
>> int g1(const int &y) { int x; f(x, y); return x; }
>> int g2(const int &y) { int x; x = y; return x; }
>
> Let me say at the outset that I struggle to comprehend that a few
> instructions is even a consideration when not optimizing, especially
> in light of the bug the macro caused that would have been prevented
> by using a function instead.  But...

Many people still build at -O0 though.  One of the things I was asked
for was the time it takes to build stage 2 with an -O0 stage 1 (where
stage 1 would usually be built by the host compiler).

> ...I don't think your example above is representative of using
> the POLY_SET_COEFF macro.  The function template I'm suggesting
> might look something like this:
>
> template <unsigned int N, typename C>
> inline void __attribute__ ((always_inline))
> poly_set_coeff (poly_int_pod<N, C> *p, unsigned idx, C val)
> {
>   ((void) (&(*p).coeffs[0] == (C *) 0),
>    wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION
>    ? (void) ((*p).coeffs[0] = val)
>    : (void) ((*p).coeffs[0].~C (), new (&(*p).coeffs[0]) C (val)));
>
>   if (N >= 2)
>     for (unsigned int i = 1; i < N; i++)
>       ((void) (&(*p).coeffs[0] == (C *) 0),
>        wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION
>        ? (void) ((*p).coeffs[i] = val)
>        : (void) ((*p).coeffs[i].~C (), new (&(*p).coeffs[i]) C (val)));
> }
That ignores the idx parameter and sets all coefficients to val.
Did you mean something like:

template<unsigned int N, typename C1, typename C2>
inline void __attribute__ ((always_inline))
poly_set_coeff (poly_int_pod<N, C1> *p, unsigned idx, C2 val)
{
  wi::int_traits<C1>::precision_type == wi::FLEXIBLE_PRECISION
    ? (void) ((*p).coeffs[idx] = val)
    : (void) ((*p).coeffs[idx].~C1 (), new (&(*p).coeffs[idx]) C1 (val));
}

?  If so...

> To compare apples to apples I suggest to instead compare the shift
> operator (or any other poly_int function that uses the macro) that
> doesn't suffer from the bug vs one that makes use of the function
> template.  I see a difference of 2 instructions on x86_64 (21 vs
> 23) for operator<<=.
>
> Are two assembly instructions even worth talking about?

...the problem is that passing C by value defeats the point of the
optimisation:

  /* RES is a poly_int result that has coefficients of type C and that
     is being built up a coefficient at a time.  Set coefficient number I
     to VALUE in the most efficient way possible.

     For primitive C it is better to assign directly, since it avoids
     any further calls and so is more efficient when the compiler is
     built at -O0.  But for wide-int based C it is better to construct
     the value in-place.  This means that calls out to a wide-int.cc
     routine can take the address of RES rather than the address of
     a temporary.

With the inline function, the wide-int.cc routines will be taking the
address of the temporary "val" object, which will then be used to
initialise the target object via a copy.  The macro was there to avoid
the copy.

E.g. for a normal --enable-checking=release build of current sources
on x86_64, mem_ref_offset is:

  0000000000000034 T mem_ref_offset(tree_node const*)

With the POLY_SET_COEFF macro it's the same size (and code) with
poly-int.h:

  0000000000000034 T mem_ref_offset(tree_node const*)

But using the function above gives:

  0000000000000058 T mem_ref_offset(tree_node const*)

which is very similar to what we'd get by assigning to the coefficients
normally.  This kind of thing happened in quite a few other places.
mem_ref_offset is just a nice example because it's so self-contained.
And it did have a measurable effect on the speed of the compiler.

That's why the cut-down version quoted above passed the source by
reference too.  Doing that, i.e.:

template<unsigned int N, typename C1, typename C2>
inline void __attribute__ ((always_inline))
poly_set_coeff (poly_int_pod<N, C1> *p, unsigned idx, const C2 &val)

gives:

  0000000000000052 T mem_ref_offset(tree_node const*)

But the use of this inline function in <<= would be just as incorrect
as using the macro.

[These are all sizes for normally-optimised release builds]
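To make the extra copy concrete outside GCC itself, here is a small
standalone sketch -- not from the patch; Counter, pod_sketch,
SET_COEFF_SKETCH and set_coeff_sketch_fn are made-up names purely for
illustration.  It just counts copy-constructor calls for a macro-style
in-place construction versus an always_inline function that takes its
value argument by value:

#include <cstdio>
#include <new>

/* Made-up stand-in for a wide-int-style class, where an extra copy is
   worth counting.  */
struct Counter
{
  static int copies;
  int value;
  Counter () : value (0) {}
  Counter (int v) : value (v) {}
  Counter (const Counter &other) : value (other.value) { ++copies; }
};
int Counter::copies = 0;

template<unsigned int N, typename C>
struct pod_sketch
{
  C coeffs[N];
};

/* Macro style: VALUE is evaluated directly as the placement-new
   constructor argument, so no named parameter object is created.  */
#define SET_COEFF_SKETCH(C, RES, I, VALUE) \
  ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE))

/* Function style: the argument is first materialised as the parameter
   VAL and then copied into the coefficient.  */
template<unsigned int N, typename C1, typename C2>
inline void __attribute__ ((always_inline))
set_coeff_sketch_fn (pod_sketch<N, C1> *p, unsigned int idx, C2 val)
{
  p->coeffs[idx].~C1 ();
  new (&p->coeffs[idx]) C1 (val);
}

int
main ()
{
  pod_sketch<2, Counter> res;

  Counter::copies = 0;
  SET_COEFF_SKETCH (Counter, res, 0, Counter (42));
  std::printf ("macro style copies: %d\n", Counter::copies);

  Counter::copies = 0;
  set_coeff_sketch_fn (&res, 1, Counter (42));
  std::printf ("function style copies: %d\n", Counter::copies);
  return 0;
}

The by-value call has to copy the parameter into the coefficient, and
that copy cannot be elided; the macro-style construction can build the
value directly in the coefficient's storage.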
>>> If that's not the case and there is a significant performance
>>> penalty associated with inline functions at -O0 then GCC should
>>> be fixed to avoid it.
>>
>> I think those docs are really talking about inline functions being as
>> fast as macros when optimisation is enabled.  I don't think we make
>> any guarantees about -O0 code quality.
>
> Sure, but you are using unsafe macros in preference to a safer
> inline function even with optimization, introducing a bug as
> a result, and making an argument that the performance impact
> of a few instructions when not using optimization is what should
> drive the decision between one and the other in all situations.
> With all respect, I fail to see the logic in this line of
> reasoning.  By that argument we would never be able to define
> any inline functions.
>
> That being said, if the performance implications of using inline
> functions with no optimization are so serious here then I suggest
> you should be concerned about introducing the poly_int API in its
> current form at all: every access to the class is an inline
> function.

It's a trade-off.  It would be very difficult to do poly-int.h via
macros without making the changes even more invasive.  But with a case
like this, where we *can* do something common via a macro, I think
using the macro makes sense.  Especially when it's local to the file
rather than a "public" interface.

> On a more serious/constructive note, if you really are worried
> about efficiency at this level then introducing an intrinsic
> primitive into the compiler instead of a set of classes might
> be worth thinking about.  It will only benefit GCC but it might
> lay a foundation for all sorts of infinite precision integer
> classes (including the C++ proposal that was pointed out in
> the other thread).

This has to work with host compilers other than GCC though.

Thanks,
Richard
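P.S. For concreteness, a standalone sketch of the per-coefficient <<=
that the fix amounts to -- illustrative only, using a simplified
poly_sketch stand-in rather than the real poly-int.h code:

template<unsigned int N, typename C>
struct poly_sketch
{
  C coeffs[N];

  poly_sketch &
  operator <<= (unsigned int a)
  {
    /* Shift every coefficient in place, instead of rebuilding
       coefficient 0 from its own already-destroyed value via
       destroy-then-placement-new.  */
    for (unsigned int i = 0; i < N; i++)
      coeffs[i] <<= a;
    return *this;
  }
};

/* Example: { 3, 5 } <<= 2 gives { 12, 20 }.  */
int
test_shift ()
{
  poly_sketch<2, int> p = { { 3, 5 } };
  p <<= 2;
  return p.coeffs[0] + p.coeffs[1];
}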