From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-x531.google.com (mail-ed1-x531.google.com [IPv6:2a00:1450:4864:20::531]) by sourceware.org (Postfix) with ESMTPS id D0D1D3892450 for ; Mon, 9 Aug 2021 09:58:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org D0D1D3892450 Received: by mail-ed1-x531.google.com with SMTP id b7so23711554edu.3 for ; Mon, 09 Aug 2021 02:58:29 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=Hr83vr8j1kA8sVAnnJVIOC5tYmPxqmlFL3xKC4tkG6U=; b=I2PZFT9q5kU/qcjIBLg2QzYtK18Bm/a/JrFchLRKviPk/XSkxdc04dpkH7o8mNy4tz hVBhVKw87umyZXTnhBtX+frbg8MDJjYA/zpFIDmAqs1bYtDfo5xfVG8vlrrQp1AEpE3d bUPiK/AUBXDIPeyM0fNMIHxAEen1w5A/cFVKbpRjbtLDqp3+XiFhG56VA56mfiRwfkg1 H/E89nHvOJP2Myb/2P69hB4VSrYlcyc9S3w+J0w/0Mc6eZ7MbsV8lvkYJbW52oxjovLe QbrCNrDYNPFt4zhHGZ24gutCnSmPw9ltkASnduxDdphprQ4GyUaMvzRv0MqLIaSGIsaF YW1w== X-Gm-Message-State: AOAM530qmict9eE7OceQpSFvhYZfddDFIpHADB/URK0M18C1qzSgO4Bj GwG4aGXgJVDvAWpwGCUT0A4eA5sbzVmMpZ1+FvM= X-Google-Smtp-Source: ABdhPJwCOW2BGg1soBFzgIv5zMEc1tTIP2aZoPonjf1lOXw9Hb6LPKzlzVRqd6SIXJpqFzhPqGwzDMa4k9RzVICS/5A= X-Received: by 2002:a05:6402:1603:: with SMTP id f3mr28707315edv.274.1628503108912; Mon, 09 Aug 2021 02:58:28 -0700 (PDT) MIME-Version: 1.0 References: <1e409dd-c5a8-a59f-d7ba-86ae805ab54f@hippo.saclay.inria.fr> <688987e-6fa7-9526-5068-5fae14b0c56b@hippo.saclay.inria.fr> In-Reply-To: <688987e-6fa7-9526-5068-5fae14b0c56b@hippo.saclay.inria.fr> From: Richard Biener Date: Mon, 9 Aug 2021 11:58:18 +0200 Message-ID: Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176] To: Marc Glisse Cc: Victor Tong , "gcc-patches@gcc.gnu.org" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-2.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Aug 2021 09:58:32 -0000 On Sat, Aug 7, 2021 at 12:49 AM Marc Glisse wrote: > > On Fri, 6 Aug 2021, Victor Tong wrote: > > > Thanks for the feedback. Here's the updated pattern: > > > > /* X - (X - Y) --> Y */ > > (simplify > > (minus (convert1? @0) (convert2? (minus@2 (convert3? @@0) @1))) > > (if (ANY_INTEGRAL_TYPE_P (type) > > && TYPE_OVERFLOW_UNDEFINED(type) > > && !TYPE_OVERFLOW_SANITIZED(type) > > && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@2)) > > && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@2)) > > && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@2)) > > && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@0)) > > && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0)) > > && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0)) > > && TYPE_PRECISION (TREE_TYPE (@2)) <=3D TYPE_PRECISION (type) > > && TYPE_PRECISION (TREE_TYPE (@0)) <=3D TYPE_PRECISION (type)) > > (convert @1))) > > > > I kept the TYPE_OVERFLOW_SANITIZED checks because I saw other patterns = that leverage undefined overflows check for it. I think this new pattern sh= ouldn't be applied if overflow sanitizer checks are enabled. > > > >>> why is this testing TREE_TYPE (@0)? > > > > I'm checking the type of @0 because I'm concerned that there could be a= case where @0's type isn't an integer type with undefined overflow. I trie= d creating a test case and couldn't seem to create one where this is violat= ed but I kept the checks to avoid causing a regression. If I'm being overca= utious and you feel that the type checks on @0 aren't needed, I can remove = them. I think the precision check on TREE_TYPE(@0) is needed to avoid trunc= ation cases though. > > It doesn't matter if @0 has undefined overflow, but it can matter that it > be signed (yes, the 2 are correlated...) if it has the same precision as > @2. Otherwise (int64_t)(-1u)-(int64_t)((int)(-1u)-0) is definitely not 0 > and it has type:int64_t, @2:int, @0:unsigned. > > Ignoring the sanitizer, the confusing double matching of constants, and > restricting to scalars, I think the tightest condition (without vrp) that > works is > > TYPE_PRECISION (type) <=3D TYPE_PRECISION (TREE_TYPE (@2)) || > TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@2)) && > (TYPE_PRECISION (TREE_TYPE (@0)) < TYPE_PRECISION (TREE_TYPE (@2)) || > TYPE_PRECISION (TREE_TYPE (@0)) =3D=3D TYPE_PRECISION (TREE_TYPE (@2)= ) && > !TYPE_UNSIGNED (TREE_TYPE (@0)) > ) > > (where implicitly undefined =3D> signed) but of course it is ok to handle > only a subset. It is too late for me to think about @0 vs @@0. > > The patch you posted seems to accept the wrong > (int128t)LLONG_MAX-(int128_t)((int)LLONG_MAX-0) -> (int128_t)0 > > Using ANY_INTEGRAL_TYPE_P allows vectors, and TYPE_PRECISION mustn't be > used for vectors (maybe we should add more checking to catch such uses), > and they may not be very happy with convert either... > > >>> Once we'd "inline" nop_convert genmatch would complain about this. > > Maybe the new transform could be about scalars, and we could restrict the > old one to vectors, to simplify the code, Hmm, I guess that might indeed be a good idea. > but that wouldn't help genmatch > with the fact that the pattern x-(x-y) would appear twice. Is it really > that bad? It is suspicious, but can be justified. I think genmatch will either complain or might end up generating unreachable code. Note with the present patch there's probably order forcing patterns inbetwe= en so it could simply work. Otherwise whether it works or not (as expected) depends on placing positions of the patterns ... So no, it's not inherently "bad" but it's not designed to work. > > Is someone working on inlining nop_convert? I'd like to avoid breaking = someone else's work if that's being worked on right now. > > > > I've also added some extra tests to cover this new pattern. I've attach= ed a patch with my latest changes. > > > > > > From: Richard Biener > > Sent: Wednesday, July 28, 2021 2:59 AM > > To: Victor Tong > > Cc: Marc Glisse ; gcc-patches@gcc.gnu.org > > Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize divisio= n followed by multiply [PR95176] > > > > On Tue, Jun 29, 2021 at 1:10 AM Victor Tong wrot= e: > >> > >> Thanks Richard and Marc. > >> > >> I wrote the following test case to compare the outputs of fn1() and fn= 1NoOpt() below with my extra pattern being applied. I tested the two functi= ons with all of the integers from INT_MIN to INT_MAX. > >> > >> long > >> fn1 (int x) > >> { > >> return 42L - (long)(42 - x); > >> } > >> > >> #pragma GCC push_options > >> #pragma GCC optimize ("O0") > >> long > >> fn1NoOpt (int x) > >> { > >> volatile int y =3D (42 - x); > >> return 42L - (long)y; > >> } > >> #pragma GCC pop_options > >> > >> int main () > >> { > >> for (long i=3DINT_MIN; i<=3DINT_MAX;i++) > >> { > >> auto valNoOpt =3D fn1NoOpt(i); > >> auto valOpt =3D fn1(i); > >> if (valNoOpt !=3D valOpt) > >> printf("valOpt=3D%ld, valNoOpt=3D%ld\n", valO= pt, valNoOpt); > >> } > >> return 0; > >> } > >> > >> I saw that the return values of fn1() and fn1NoOpt() differed when the= input was between INT_MIN and INT_MIN+42 inclusive. When passing values in= this range to fn1NoOpt(), a signed overflow is triggered which causes the = value to differ (undefined behavior). This seems to go in line with what Ma= rc described and I think the transformation is correct in the scenario abov= e. I do think that type casts that result in truncation (i.e. from a higher= precision to a lower one) or with unsigned types will result in an incorre= ct transformation so those scenarios need to be avoided. > >> > >> Given that the extra pattern I'm adding is taking advantage the undefi= ned behavior of signed integer overflow, I'm considering keeping the existi= ng nop_convert pattern in place and adding a new pattern to cover these new= cases. I'd also like to avoid touching nop_convert given that it's used in= a number of other patterns. > >> > >> This is the pattern I have currently: > >> > >> (simplify > >> (minus (convert1? @0) (convert2? (minus (convert3? @2) @1))) > >> (if (operand_equal_p(@0, @2, 0) > > > > The operand_equal_p should be reflected by using @0 in place of @2. > > > >> && INTEGRAL_TYPE_P (type) > >> && TYPE_OVERFLOW_UNDEFINED(type) > >> && !TYPE_OVERFLOW_SANITIZED(type) > >> && INTEGRAL_TYPE_P (TREE_TYPE(@1)) > >> && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@1)) > >> && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@1)) > >> && !TYPE_UNSIGNED (TREE_TYPE (@1)) > >> && !TYPE_UNSIGNED (type) > > > > please group checks on common argument. I think a single test > > on INTEGRAL_TYPE_P (type) is enough, it could be ANY_INTEGRAL_TYPE_P > > to include vector and complex types. > > > >> && TYPE_PRECISION (TREE_TYPE (@1)) <=3D TYPE_PRECISION (type) > >> && INTEGRAL_TYPE_P (TREE_TYPE(@0)) > >> && TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@0)) > > > > why is this testing TREE_TYPE (@0)? The outer subtract is using 'type'= , > > the inner subtract uses TREE_TYPE (@1) though you could place > > a capture on the minus to make what you test more obvious, like > > > > (minus (convert1? @0) (convert2? (minus@3 (convert3? @2) @1))) > > > > TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@3)) > > > > there's one set of checks too much I think. > > > >> && !TYPE_OVERFLOW_SANITIZED(TREE_TYPE(@0)) > >> && !TYPE_UNSIGNED (TREE_TYPE (@0)) > > > > we only ever have TYPE_OVERFLOW_UNDEFINED on singed types so > > the !TYPE_UNSIGNED checks are superfluous. > > > >> && TYPE_PRECISION (TREE_TYPE (@0)) <=3D TYPE_PRECISION (type) > >> && TREE_TYPE(@1) =3D=3D TREE_TYPE(@2)) > > > > This check means that convert3 is never present since a MINUS requires > > compatible types. > > > > Sorry for the late reply. > > > > Note the pattern will necessarily be partly redundant with the > > > > (simplify > > (minus (nop_convert1? (minus (nop_convert2? @0) @1)) @0) > > (if (!ANY_INTEGRAL_TYPE_P (type) > > || TYPE_OVERFLOW_WRAPS (type)) > > (negate (view_convert @1)) > > (view_convert (negate @1)))) > > > > pattern. Once we'd "inline" nop_convert genmatch would complain > > about this. > > > >> (convert @1))) > >> > >> Is there a more concise/better way of writing the pattern? I was looki= ng for similar checks in match.pd and I couldn't find anything that I could= leverage. > >> > >> I also kept my pattern to the specific scenario I'm seeing with the re= gression to lower the risk of something breaking. I've limited @1 and @2 to= have the same type. > >> > >> I'm also in favor of adding/running computer verification to make sure= the transformation is legal. I've written some tests to verify that the pa= ttern is being applied in the right scenarios and not being applied in othe= rs, but I think there are too many possibilities to manually write them all= . Is there anything in GCC that can be used to verify that match.pd transfo= rmations are correct? I'm thinking of something like Alive https://nam06.sa= felinks.protection.outlook.com/?url=3Dhttps%3A%2F%2Fgithub.com%2FAliveToolk= it%2Falive2&data=3D04%7C01%7Cvitong%40microsoft.com%7C64334bd5c79443af0= 4ec08d951ae6117%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C63763063167823= 1097%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik= 1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=3Dx9Heq%2Bjgb%2Fy1PJFZ0gT9yfkQuzOz0rz= WnZsl7VOymp0%3D&reserved=3D0. > >> > >> Thanks, > >> Victor > >> > >> > >> > >> From: Richard Biener > >> Sent: Monday, June 21, 2021 12:08 AM > >> To: Marc Glisse > >> Cc: Victor Tong ; gcc-patches@gcc.gnu.org > >> Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize divisi= on followed by multiply [PR95176] > >> > >> On Sat, Jun 19, 2021 at 7:05 PM Marc Glisse wro= te: > >> > > >> > On Fri, 18 Jun 2021, Richard Biener wrote: > >> > > >> > >> Option 2: Add a new pattern to support scenarios that the existin= g nop_convert pattern bails out on. > >> > >> > >> > >> Existing pattern: > >> > >> > >> > >> (simplify > >> > >> (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3?= @@0) @1))) > >> > >> (view_convert @1)) > >> > > >> > I tried to check with a program when > >> > > >> > T3 x; > >> > T1 y; > >> > (T2)x-(T2)((T1)x-y) > >> > > >> > can be safely replaced with > >> > > >> > (T2)y > >> > > >> > From the output, it looks like this is safe when T1 is at least as l= arge > >> > as T2. It is wrong when T1 is unsigned and smaller than T2. And when= T1 is > >> > signed and smaller than T2, it is ok if T3 is the same type as T1 (s= igned > >> > then) or has strictly less precision (any sign), and not in other ca= ses. > >> > > >> > Note that this is when signed implies undefined overflow and unsigne= d > >> > implies wrapping, and I wouldn't put too much faith in this recently > >> > dusted program. And it doesn't say how to write the match.pd pattern= with > >> > '?', "@@", disabling it if TYPE_OVERFLOW_SANITIZED, etc. > >> > > >> > Mostly, I wanted to say that if we are going to go handle more than > >> > nop_convert for more than just 1 or 2 easy transformations, I think = some > >> > kind of computer verification would be useful, it would save a lot o= f time > >> > and headaches. > >> > >> True. I wonder if auto-generating such tests from match.pd rules woul= d > >> be a good project to work on (GSoC?). When there's define_match > >> involved things get a little tricky, but then one item on the TODO lis= t > >> is "inlining" those at the use site (exploding the decision tree even = more). > >> > >> Richard. > >> > >> > (I just check by brute force all possible precisions (from 1 to 6) a= nd > >> > signedness for T1, T2 and T3, all possible values for x and y, compu= te > >> > the before and after formulas, accepting if there is UB before, reje= cting > >> > if there is UB after (and not before), and manually try to see a pat= tern > >> > in the list of types that work) > >> > > >> > -- > >> > Marc Glisse > > -- > Marc Glisse