From: Richard Sandiford
To: Richard Biener
Cc: Barnaby Wilks, "gcc-patches@gcc.gnu.org", nd, "law@redhat.com", "ian@airs.com", Tamar Christina, Wilco Dijkstra
Subject: Re: [PATCH][GCC] Simplify to single precision where possible for binary/builtin maths operations.
References: <571395fe-921b-5a68-ec8d-84850a732253@arm.com>
Date: Tue, 03 Sep 2019 14:19:00 -0000
In-Reply-To: (Richard Biener's message of "Tue, 3 Sep 2019 10:23:33 +0200 (CEST)")

Richard Biener writes:
> On Mon, 2 Sep 2019, Barnaby Wilks wrote:
>
>> Hello,
>>
>> This patch introduces an optimization for narrowing binary and builtin
>> math operations to the smallest type when unsafe math optimizations are
>> enabled (typically -Ofast or -ffast-math).
>>
>> Consider the example:
>>
>>   float f (float x) {
>>     return 1.0 / sqrt (x);
>>   }
>>
>>   f:
>>     fcvt    d0, s0
>>     fmov    d1, 1.0e+0
>>     fsqrt   d0, d0
>>     fdiv    d0, d1, d0
>>     fcvt    s0, d0
>>     ret
>>
>> Given that all outputs are of float type, we can do the whole
>> calculation in single precision and avoid any potentially expensive
>> conversions between single and double precision.
>>
>> Aka the expression would end up looking more like
>>
>>   float f (float x) {
>>     return 1.0f / sqrtf (x);
>>   }
>>
>>   f:
>>     fsqrt   s0, s0
>>     fmov    s1, 1.0e+0
>>     fdiv    s0, s1, s0
>>     ret
>>
>> This optimization will narrow casts around math builtins, and also
>> not try to find the widest type for calculations when processing binary
>> math operations (if unsafe math optimizations are enabled).
>>
>> Added tests to verify that narrower math builtins are chosen and
>> no unnecessary casts are introduced when appropriate.
>>
>> Bootstrapped and regtested on aarch64 and x86_64 with no regressions.
>>
>> I don't have write access, so if OK for trunk then can someone commit on
>> my behalf?
> [...]
>
> Now - as a general comment I think adding this kind of narrowing is
> good but doing it via match.pd patterns is quite limiting - eventually
> the backprop pass would be a fit for propagating "needed precision"
> and narrowing feeding stmts accordingly in a more general way?
> Richard can probably tell quickest if it is feasible in that framework.

Yeah, I think it would be a good fit, and would for example cope with
cases in which we select between two double results before doing the
truncation to float.  I'd wanted to do something similar for integer
truncation but never found the time...

At the moment, backprop handles a single piece of information: whether
the sign of the value matters.  This is (over?)generalised to be one bit
of information in a word of flags.  I guess we could take the same
approach here and have flags for certain well-known floating-point types,
but it might be cleaner to instead have a field that records the widest
mode that users of the result want.

I think to do this we'd need to build an array that maps floating-point
machine_modes to their order in the FOR_EACH_MODE_IN_CLASS chain.
That'll give us a total ordering over floating-point modes and mean that
operator & (the usage_info confluence function) can just take whichever
of the input usage_info modes has the highest index in this chain.

Thanks,
Richard
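
For illustration, the mode-ordering idea described above could be sketched
roughly as follows.  This is a standalone toy model, not GCC code: the
fp_mode enum, the MODE_* names, mode_chain, mode_rank and the widest_mode
field are all hypothetical stand-ins, and usage_info is reduced to a plain
bool plus the proposed field.  The chain is assumed to list modes from
narrowest to widest, as a walk over the floating-point mode class would.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for machine_mode, limited to four float modes.
// The enumerator values are deliberately not in width order, which is
// why a separate rank table is needed.
enum fp_mode { MODE_DF, MODE_HF, MODE_TF, MODE_SF, NUM_FP_MODES };

// Assumed iteration order of the mode chain: narrowest mode first.
static constexpr std::array<fp_mode, NUM_FP_MODES> mode_chain
  = { MODE_HF, MODE_SF, MODE_DF, MODE_TF };

// Build the array that maps each mode to its position in the chain.
static std::array<int, NUM_FP_MODES>
build_mode_rank ()
{
  std::array<int, NUM_FP_MODES> rank{};
  for (std::size_t i = 0; i < mode_chain.size (); ++i)
    rank[mode_chain[i]] = static_cast<int> (i);
  return rank;
}

static const std::array<int, NUM_FP_MODES> mode_rank = build_mode_rank ();

// Simplified model of the per-value usage information.
struct usage_info
{
  bool ignore_sign;     // True if no user cares about the sign.
  fp_mode widest_mode;  // Hypothetical: widest mode any user wants.
};

// Confluence of two users' requirements: the sign can be ignored only
// if both users ignore it, and the result must be at least as wide as
// the wider of the two requests.
usage_info
operator & (const usage_info &a, const usage_info &b)
{
  usage_info result;
  result.ignore_sign = a.ignore_sign && b.ignore_sign;
  result.widest_mode
    = (mode_rank[a.widest_mode] >= mode_rank[b.widest_mode]
       ? a.widest_mode : b.widest_mode);
  return result;
}
```

With this model, combining a user that only needs MODE_SF with one that
needs MODE_DF yields MODE_DF, so a definition would be narrowed to float
only when every user is content with float.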