From: Richard Sandiford <richard.sandiford@linaro.org>
To: Martin Sebor
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [001/nnn] poly_int: add poly-int.h
Date: Thu, 09 Nov 2017 11:14:00 -0000
Message-ID: <87lgjfpv78.fsf@linaro.org>

Martin Sebor writes:
> On 11/08/2017 11:28 AM, Richard Sandiford wrote:
>> Martin Sebor writes:
>>> On 11/08/2017 09:51 AM, Richard Sandiford wrote:
>>>> Martin Sebor writes:
>>>>> On 11/08/2017 02:32 AM, Richard Sandiford wrote:
>>>>>> Martin Sebor writes:
>>>>>>> I haven't done nearly a thorough review but the dtor followed by
>>>>>>> the placement new in the POLY_SET_COEFF() macro caught my eye so
>>>>>>> I thought I'd ask sooner rather than later.  Given the macro
>>>>>>> definition:
>>>>>>>
>>>>>>> +   The dummy comparison against a null C * is just a way of checking
>>>>>>> +   that C gives the right type.  */
>>>>>>> +#define POLY_SET_COEFF(C, RES, I, VALUE) \
>>>>>>> +  ((void) (&(RES).coeffs[0] == (C *) 0), \
>>>>>>> +   wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \
>>>>>>> +   ? (void) ((RES).coeffs[I] = VALUE) \
>>>>>>> +   : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
>>>>>>>
>>>>>>> is the following use well-defined?
>>>>>>>
>>>>>>> +template<unsigned int N, typename C>
>>>>>>> +inline poly_int_pod<N, C>&
>>>>>>> +poly_int_pod<N, C>::operator <<= (unsigned int a)
>>>>>>> +{
>>>>>>> +  POLY_SET_COEFF (C, *this, 0, this->coeffs[0] << a);
>>>>>>>
>>>>>>> It looks to me as though the VALUE argument in the ctor invoked
>>>>>>> by the placement new expression is evaluated after the dtor has
>>>>>>> destroyed the very array element the VALUE argument expands to.
>>>>>>
>>>>>> Good catch!  It should simply have been doing <<= on each
>>>>>> coefficient -- I must have got carried away when converting to
>>>>>> POLY_SET_COEFF.
>>>>>>
>>>>>> I double-checked the other uses and think that's the only one.
>>>>>>
>>>>>>> Whether or not it is, in fact, a problem, it seems to me that using
>>>>>>> a function template rather than a macro would be a clearer and
>>>>>>> safer way to do the same thing.  (Safer in that the macro also
>>>>>>> evaluates its arguments multiple times, which is often a source
>>>>>>> of subtle bugs.)
>>>>>>
>>>>>> That would slow down -O0 builds though, by introducing an extra
>>>>>> function call and set of temporaries even when the coefficients
>>>>>> are primitive integers.
>>>>>
>>>>> Would decorating the function template with attribute always_inline
>>>>> help?
>>>>
>>>> It would remove the call itself, but we'd still have the extra temporary
>>>> objects that were the function argument and return value.
>>>
>>> Sorry, I do not want to get into another long discussion about
>>> trade-offs between safety and efficiency but I'm not sure I see
>>> what extra temporaries it would create.  It seems to me that
>>> an inline function template that took arguments of user-defined
>>> types by reference and others by value should be just as efficient
>>> as a macro.
>>>
>>> From GCC's own manual:
>>>
>>> 6.43 An Inline Function is As Fast As a Macro
>>> https://gcc.gnu.org/onlinedocs/gcc/Inline.html
>>
>> You can see the difference with something like:
>>
>> inline
>> void __attribute__((always_inline))
>> f(int &dst, const int &src) { dst = src; }
>>
>> int g1(const int &y) { int x; f(x, y); return x; }
>> int g2(const int &y) { int x; x = y; return x; }
>
> Let me say at the outset that I struggle to comprehend that a few
> instructions is even a consideration when not optimizing, especially
> in light of the bug the macro caused that would have been prevented
> by using a function instead.  But...

Many people still build at -O0 though.  One of the things I was asked
for was the time it takes to build stage 2 with an -O0 stage 1 (where
stage 1 would usually be built by the host compiler).

> ...I don't think your example above is representative of using
> the POLY_SET_COEFF macro.  The function template I'm suggesting
> might look something like this:
>
> template <unsigned int N, typename C>
> inline void __attribute__ ((always_inline))
> poly_set_coeff (poly_int_pod<N, C> *p, unsigned idx, C val)
> {
>   ((void) (&(*p).coeffs[0] == (C *) 0),
>    wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION
>    ? (void) ((*p).coeffs[0] = val)
>    : (void) ((*p).coeffs[0].~C (), new (&(*p).coeffs[0]) C (val)));
>
>   if (N >= 2)
>     for (unsigned int i = 1; i < N; i++)
>       ((void) (&(*p).coeffs[0] == (C *) 0),
>        wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION
>        ? (void) ((*p).coeffs[i] = val)
>        : (void) ((*p).coeffs[i].~C (), new (&(*p).coeffs[i]) C (val)));
> }
That ignores the idx parameter and sets all coefficients to val.
Did you mean something like:

template<unsigned int N, typename C1, typename C2>
inline void __attribute__ ((always_inline))
poly_set_coeff (poly_int_pod<N, C1> *p, unsigned idx, C2 val)
{
  wi::int_traits<C1>::precision_type == wi::FLEXIBLE_PRECISION
    ? (void) ((*p).coeffs[idx] = val)
    : (void) ((*p).coeffs[idx].~C1 (), new (&(*p).coeffs[idx]) C1 (val));
}

?  If so...

> To compare apples to apples I suggest to instead compare the shift
> operator (or any other poly_int function that uses the macro) that
> doesn't suffer from the bug vs one that makes use of the function
> template.  I see a difference of 2 instructions on x86_64 (21 vs
> 23) for operator<<=.
>
> Are two assembly instructions even worth talking about?

...the problem is that passing C by value defeats the point of the
optimisation:

  /* RES is a poly_int result that has coefficients of type C and that
     is being built up a coefficient at a time.  Set coefficient number I
     to VALUE in the most efficient way possible.

     For primitive C it is better to assign directly, since it avoids
     any further calls and so is more efficient when the compiler is
     built at -O0.  But for wide-int based C it is better to construct
     the value in-place.  This means that calls out to a wide-int.cc
     routine can take the address of RES rather than the address of
     a temporary.

With the inline function, the wide-int.cc routines will be taking the
address of the temporary "val" object, which will then be used to
initialise the target object via a copy.  The macro was there to avoid
the copy.

E.g. for a normal --enable-checking=release build of current sources
on x86_64, mem_ref_offset is:

  0000000000000034 T mem_ref_offset(tree_node const*)

With the POLY_SET_COEFF macro it's the same size (and code) with
poly-int.h:

  0000000000000034 T mem_ref_offset(tree_node const*)

But using the function above gives:

  0000000000000058 T mem_ref_offset(tree_node const*)

which is very similar to what we'd get by assigning to the coefficients
normally.  This kind of thing happened in quite a few other places.
mem_ref_offset is just a nice example because it's so self-contained.
And it did have a measurable effect on the speed of the compiler.

That's why the cut-down version quoted above passed the source by
reference too.  Doing that, i.e.:

template<unsigned int N, typename C1, typename C2>
inline void __attribute__ ((always_inline))
poly_set_coeff (poly_int_pod<N, C1> *p, unsigned idx, const C2 &val)

gives:

  0000000000000052 T mem_ref_offset(tree_node const*)

But the use of this inline function in <<= would be just as incorrect
as using the macro.

[These are all sizes for normally-optimised release builds]
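To make the extra copy concrete outside GCC itself, here is a small
standalone sketch -- not from the patch; Counter, pod_sketch,
SET_COEFF_SKETCH and set_coeff_sketch_fn are made-up names purely for
illustration.  It just counts copy-constructor calls for a macro-style
in-place construction versus an always_inline function that takes its
value argument by value:

#include <cstdio>
#include <new>

/* Made-up stand-in for a wide-int-style class, where an extra copy is
   worth counting.  */
struct Counter
{
  static int copies;
  int value;
  Counter () : value (0) {}
  Counter (int v) : value (v) {}
  Counter (const Counter &other) : value (other.value) { ++copies; }
};
int Counter::copies = 0;

template<unsigned int N, typename C>
struct pod_sketch
{
  C coeffs[N];
};

/* Macro style: VALUE is evaluated directly as the placement-new
   constructor argument, so no named parameter object is created.  */
#define SET_COEFF_SKETCH(C, RES, I, VALUE) \
  ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE))

/* Function style: the argument is first materialised as the parameter
   VAL and then copied into the coefficient.  */
template<unsigned int N, typename C1, typename C2>
inline void __attribute__ ((always_inline))
set_coeff_sketch_fn (pod_sketch<N, C1> *p, unsigned int idx, C2 val)
{
  p->coeffs[idx].~C1 ();
  new (&p->coeffs[idx]) C1 (val);
}

int
main ()
{
  pod_sketch<2, Counter> res;

  Counter::copies = 0;
  SET_COEFF_SKETCH (Counter, res, 0, Counter (42));
  std::printf ("macro style copies: %d\n", Counter::copies);

  Counter::copies = 0;
  set_coeff_sketch_fn (&res, 1, Counter (42));
  std::printf ("function style copies: %d\n", Counter::copies);
  return 0;
}

The by-value call has to copy the parameter into the coefficient, and
that copy cannot be elided; the macro-style construction can build the
value directly in the coefficient's storage.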
>>> If that's not the case and there is a significant performance
>>> penalty associated with inline functions at -O0 then GCC should
>>> be fixed to avoid it.
>>
>> I think those docs are really talking about inline functions being as
>> fast as macros when optimisation is enabled.  I don't think we make
>> any guarantees about -O0 code quality.
>
> Sure, but you are using unsafe macros in preference to a safer
> inline function even with optimization, introducing a bug as
> a result, and making an argument that the performance impact
> of a few instructions when not using optimization is what should
> drive the decision between one and the other in all situations.
> With all respect, I fail to see the logic in this line of
> reasoning.  By that argument we would never be able to define
> any inline functions.
>
> That being said, if the performance implications of using inline
> functions with no optimization are so serious here then I suggest
> you should be concerned about introducing the poly_int API in its
> current form at all: every access to the class is an inline
> function.

It's a trade-off.  It would be very difficult to do poly-int.h via
macros without making the changes even more invasive.  But with a case
like this, where we *can* do something common via a macro, I think
using the macro makes sense.  Especially when it's local to the file
rather than a "public" interface.

> On a more serious/constructive note, if you really are worried
> about efficiency at this level then introducing an intrinsic
> primitive into the compiler instead of a set of classes might
> be worth thinking about.  It will only benefit GCC but it might
> lay a foundation for all sorts of infinite precision integer
> classes (including the C++ proposal that was pointed out in
> the other thread).

This has to work with host compilers other than GCC though.

Thanks,
Richard
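P.S. For concreteness, a standalone sketch of the per-coefficient <<=
that the fix amounts to -- illustrative only, using a simplified
poly_sketch stand-in rather than the real poly-int.h code:

template<unsigned int N, typename C>
struct poly_sketch
{
  C coeffs[N];

  poly_sketch &
  operator <<= (unsigned int a)
  {
    /* Shift every coefficient in place, instead of rebuilding
       coefficient 0 from its own already-destroyed value via
       destroy-then-placement-new.  */
    for (unsigned int i = 0; i < N; i++)
      coeffs[i] <<= a;
    return *this;
  }
};

/* Example: { 3, 5 } <<= 2 gives { 12, 20 }.  */
int
test_shift ()
{
  poly_sketch<2, int> p = { { 3, 5 } };
  p <<= 2;
  return p.coeffs[0] + p.coeffs[1];
}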