From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 108333 invoked by alias); 6 Sep 2017 05:21:55 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 107924 invoked by uid 89); 6 Sep 2017 05:21:54 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 06 Sep 2017 05:21:52 +0000 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 30362C0587D4; Wed, 6 Sep 2017 05:21:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 30362C0587D4 Authentication-Results: ext-mx08.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx08.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=law@redhat.com Received: from localhost.localdomain (ovpn-112-2.rdu2.redhat.com [10.10.112.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id DC05F197F7; Wed, 6 Sep 2017 05:21:49 +0000 (UTC) Subject: Re: [RFA] [PATCH][PR tree-optimization/64910] Fix reassociation of binary bitwise operations with 3 operands From: Jeff Law To: Christophe Lyon Cc: Richard Biener , GCC Patches References: <56788DA6.80904@redhat.com> <5690444A.2070404@redhat.com> <56948AD4.1020909@redhat.com> <5695F13E.6030507@redhat.com> <0ae5510d-3dc4-7038-ebb2-7291abd22f27@redhat.com> <67d0b934-75dd-afdd-2558-55d6f5f5b83a@redhat.com> Message-ID: <9611166b-76e6-c28f-32bf-03e03eae9194@redhat.com> Date: Wed, 06 Sep 2017 05:21:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: <67d0b934-75dd-afdd-2558-55d6f5f5b83a@redhat.com> Content-Type: multipart/mixed; boundary="------------E37BF97FBEAF7DB29E1CC98D" X-IsSubscribed: yes X-SW-Source: 2017-09/txt/msg00311.txt.bz2 This is a multi-part message in MIME format. --------------E37BF97FBEAF7DB29E1CC98D Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Content-length: 8863 On 09/05/2017 11:26 AM, Jeff Law wrote: > On 09/05/2017 12:38 AM, Christophe Lyon wrote: >> Hi Jeff, >> >> >> On 3 September 2017 at 16:44, Jeff Law wrote: >>> On 01/13/2016 05:30 AM, Richard Biener wrote: >>>> On Wed, Jan 13, 2016 at 7:39 AM, Jeff Law wrote: >>>>> On 01/12/2016 08:11 AM, Richard Biener wrote: >>>>>> >>>>>> On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law wrote: >>>>>>> >>>>>>> On 01/11/2016 03:32 AM, Richard Biener wrote: >>>>>>> >>>>>>>> >>>>>>>> Yeah, reassoc is largely about canonicalization. >>>>>>>> >>>>>>>>> Plus doing it in TER is almost certainly more complex than getting it >>>>>>>>> right >>>>>>>>> in reassoc to begin with. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I guess canonicalizing differently is ok but you'll still create >>>>>>>> ((a & b) & 1) & c then if you only change the above place. >>>>>>> >>>>>>> >>>>>>> What's best for that expression would depend on factors like whether or >>>>>>> not >>>>>>> the target can exploit ILP. ie (a & b) & (1 & c) exposes more >>>>>>> parallelism >>>>>>> while (((a & b) & c) & 1) is not good for parallelism, but does expose >>>>>>> the >>>>>>> bit test. >>>>>>> >>>>>>> reassoc currently generates ((a & 1) & b) & c which is dreadful as >>>>>>> there's >>>>>>> no ILP or chance of creating a bit test. My patch shuffles things >>>>>>> around, >>>>>>> but still doesn't expose the ILP or bit test in the 4 operand case. >>>>>>> Based >>>>>>> on the comments in reassoc, it didn't seem like the author thought >>>>>>> anything >>>>>>> beyond the 3-operand case was worth handling. So my patch just handles >>>>>>> the >>>>>>> 3-operand case. >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> So I'm not sure what pattern the backend is looking for? >>>>>>> >>>>>>> >>>>>>> It just wants the constant last in the sequence. That exposes bit clear, >>>>>>> set, flip, test, etc idioms. >>>>>> >>>>>> >>>>>> But those don't feed another bit operation, right? Thus we'd like to see >>>>>> ((a & b) & c) & 1, not ((a & b) & 1) & c? It sounds like the instructions >>>>>> are designed to feed conditionals (aka CC consuming ops)? >>>>> >>>>> At the gimple level they could feed a conditional, or be part of a series of >>>>> ops on an SSA_NAME that eventually gets stored to memory, etc. At the RTL >>>>> level they'll feed CC consumers and bit manipulations of pseudos or memory. >>>>> >>>>> For the 3-op case, we always want the constant last. For the 4-op case it's >>>>> less clear. Though ((a & b) & c) & 1 is certainly better than ((a & b) & 1) >>>>> & c. >>>> >>>> Ok, so handling it in swap_ops_for_binary_stmt is merely a convenient place >>>> to special-case bitwise ops. The "real" fix to the sorting heuristic would be >>>> to sort constants at the opposite end. >>>> >>>> That might be too invasive right now but there is another "convenient" place: >>>> >>>> /* If the operand vector is now empty, all operands were >>>> consumed by the __builtin_powi optimization. */ >>>> ... >>>> else >>>> { >>>> machine_mode mode = TYPE_MODE (TREE_TYPE (lhs)); >>>> int ops_num = ops.length (); >>>> int width = get_reassociation_width (ops_num, rhs_code, mode); >>>> tree new_lhs = lhs; >>>> >>>> if (dump_file && (dump_flags & TDF_DETAILS)) >>>> fprintf (dump_file, >>>> "Width = %d was chosen for >>>> reassociation\n", width); >>>> >>>> at this point you can check rhs_code and move the (single) constant >>>> entry in ops (if there is any constant entry) from .last () to the beginning. >>>> >>>> That'll get the 4 operand case correct as well and properly models >>>> "constant last" for the specified operation kind. >>> Resurrecting an old thread... Just now getting around to flushing this >>> out of the queue. >>> >>> To recap, given something like (x & y) & C reassociation will turn that >>> into (x & C) & y. It's functionally equivalent, but it will inhibit >>> generation of bit test instructions. >>> >>> I originally hacked up swap_ops_for_binary_stmt. You requested that >>> change be made in reassociate_bb so that it would apply to cases where >>> there are more than 3 args. >>> >>> So that's what this patch does. OK for the trunk now? >>> >>> Bootstrapped and regression tested on x86_64. Also tested the new >>> testcase on m68k. >>> >>> >>> commit c10ae0339674c27c89a1fa1904217a55bf530cb3 >>> Author: Jeff Law >>> Date: Sun Sep 3 10:42:30 2017 -0400 >>> >>> 2017-09-03 Jeff Law >>> >>> PR tree-optimization/64910 >>> * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops, >>> swap the first and last operand if the last is a constant. >>> >>> PR tree-optimization/64910 >>> * gcc.dg/tree-ssa/pr64910-2.c: New test. >>> >>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog >>> index 3f632ca31c2..2c9a8c8265a 100644 >>> --- a/gcc/ChangeLog >>> +++ b/gcc/ChangeLog >>> @@ -1,3 +1,9 @@ >>> +2017-09-03 Jeff Law >>> + >>> + PR tree-optimization/64910 >>> + * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops, >>> + swap the first and last operand if the last is a constant. >>> + >>> 2017-09-01 Segher Boessenkool >>> >>> PR rtl-optimization/82024 >>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog >>> index 4ead57edfa2..766677c899b 100644 >>> --- a/gcc/testsuite/ChangeLog >>> +++ b/gcc/testsuite/ChangeLog >>> @@ -1,3 +1,8 @@ >>> +2017-09-03 Jeff Law >>> + >>> + PR tree-optimization/64910 >>> + * gcc.dg/tree-ssa/pr64910-2.c: New test. >>> + >>> 2017-09-01 Jakub Jelinek >>> >>> PR target/81766 >>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c >>> new file mode 100644 >>> index 00000000000..2e3d6790776 >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c >>> @@ -0,0 +1,85 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-options "-O2 -fdump-tree-reassoc1" } */ >>> + >>> +/* We want to make sure that we reassociate in a way that has the >>> + constant last. With the constant last, it's more likely to result >>> + in a bitfield test on targets with such capabilities. */ >>> + >>> +extern void boo (); >>> + >>> +int b2b_uc (unsigned char u, unsigned char w) >>> +{ >>> + if ((u & w) & 0x20) >>> + boo (); >>> +} >>> + >>> +int b2b_us (unsigned short u, unsigned short w) >>> +{ >>> + if ((u & w) & 0x20) >>> + boo (); >>> +} >>> + >>> +int b2b_ui (unsigned int u, unsigned int w) >>> +{ >>> + if ((u & w) & 0x20) >>> + boo (); >>> +} >>> +int b2b_ul (unsigned long u, unsigned long w) >>> +{ >>> + if ((u & w) & 0x20) >>> + boo (); >>> +} >>> +int b2b_ull (unsigned long long u, unsigned long long w) >>> +{ >>> + if ((u & w) & 0x20) >>> + boo (); >>> +} >>> + >>> +int b2b_sc (signed char u, signed char w) >>> +{ >>> + if ((u & w) & 0x20) >>> + boo (); >>> +} >>> + >>> +int b2b_ss (signed short u, signed short w) >>> +{ >>> + if ((u & w) & 0x20) >>> + boo (); >>> +} >>> + >>> +int b2b_si (signed int u, signed int w) >>> +{ >>> + if ((u & w) & 0x20) >>> + boo (); >>> +} >>> +int b2b_sl (signed long u, signed long w) >>> +{ >>> + if ((u & w) & 0x20) >>> + boo (); >>> +} >>> +int b2b_sll (signed long long u, signed long long w) >>> +{ >>> + if ((u & w) & 0x20) >>> + boo (); >>> +} >>> + >>> +/* The AND of U & W should go into a temporary, when is then ANDed >>> + with the constant. >>> + >>> + First verify that we have the right number of ANDs between U and W. */ >>> +/* { dg-final { scan-tree-dump-times "\[uw\]_\[0-9\]+.D. \& \[uw\]_\[0-9\]+.D.;" 10 "reassoc1"} } */ >>> + >>> +/* Then verify that we have the right number of ANDS between a temporary >>> + and the constant. */ >>> +/* { dg-final { scan-tree-dump-times "_\[0-9]+ \& 32;" 10 "reassoc1"} } */ >> >> This part of the new testcase fails on aarch64 & arm: >> FAIL: gcc.dg/tree-ssa/pr64910-2.c scan-tree-dump-times reassoc1 >> "_[0-9]+ & 32;" 10 > Thanks. I'm investigating. So when there are precisely 2 operands the most recent change will create non-canonical gimple. Specifically we can end up with cst OP ssa_name On some targets (like aarch64, arm, likely others). Nothing actually cared -- the form got rewritten at some point and aarch64 at least generates the desired assembly code. But since the test was looking at the .reassoc dump, it failed. The fix is easy. Just restrict the operand swapping to cases where there are 3 or more ops. Bootstrapped and regression tested on aarch64. Committing. Jeff --------------E37BF97FBEAF7DB29E1CC98D Content-Type: text/plain; charset=UTF-8; name="P" Content-Transfer-Encoding: base64 Content-Disposition: inline; filename="P" Content-length: 2766 Y29tbWl0IDA3N2NmODgzYzNlOTgzMzY5ZTY4YjI3MzA3ZWM0OTQ0YzcwOTcz OTMKQXV0aG9yOiBsYXcgPGxhd0AxMzhiYzc1ZC0wZDA0LTA0MTAtOTYxZi04 MmVlNzJiMDU0YTQ+CkRhdGU6ICAgV2VkIFNlcCA2IDA1OjIwOjI1IDIwMTcg KzAwMDAKCiAgICAgICAgICAgIFBSIHRyZWUtb3B0aW1pemF0aW9uLzY0OTEw CiAgICAgICAgICAgICogdHJlZS1zc2EtcmVhc3NvYy5jIChyZWFzc29jaWF0 ZV9iYik6IFJlc3RyaWN0IGxhc3QgY2hhbmdlIHRvCiAgICAgICAgICAgIGNh c2VzIHdoZXJlIHdlIGhhdmUgMyBvciBtb3JlIG9wZXJhbmRzLgogICAgCiAg ICBnaXQtc3ZuLWlkOiBzdm4rc3NoOi8vZ2NjLmdudS5vcmcvc3ZuL2djYy90 cnVua0AyNTE3NTEgMTM4YmM3NWQtMGQwNC0wNDEwLTk2MWYtODJlZTcyYjA1 NGE0CgpkaWZmIC0tZ2l0IGEvZ2NjL0NoYW5nZUxvZyBiL2djYy9DaGFuZ2VM b2cKaW5kZXggNTM2Mzc5MDU3MjIuLjFiM2JkZGJiODA2IDEwMDY0NAotLS0g YS9nY2MvQ2hhbmdlTG9nCisrKyBiL2djYy9DaGFuZ2VMb2cKQEAgLTEsMyAr MSw5IEBACisyMDE3LTA5LTA1ICBKZWZmIExhdyAgPGxhd0ByZWRoYXQuY29t PgorCisJUFIgdHJlZS1vcHRpbWl6YXRpb24vNjQ5MTAKKwkqIHRyZWUtc3Nh LXJlYXNzb2MuYyAocmVhc3NvY2lhdGVfYmIpOiBSZXN0cmljdCBsYXN0IGNo YW5nZSB0bworCWNhc2VzIHdoZXJlIHdlIGhhdmUgMyBvciBtb3JlIG9wZXJh bmRzLgorCiAyMDE3LTA5LTA1ICBKYWt1YiBKZWxpbmVrICA8amFrdWJAcmVk aGF0LmNvbT4KIAogCVBSIG1pZGRsZS1lbmQvODE3NjgKZGlmZiAtLWdpdCBh L2djYy90cmVlLXNzYS1yZWFzc29jLmMgYi9nY2MvdHJlZS1zc2EtcmVhc3Nv Yy5jCmluZGV4IDc2MDQ4MTk2YjI3Li4yZmI2YWVmNTFkNyAxMDA2NDQKLS0t IGEvZ2NjL3RyZWUtc3NhLXJlYXNzb2MuYworKysgYi9nY2MvdHJlZS1zc2Et cmVhc3NvYy5jCkBAIC01NzYzLDE0ICs1NzYzLDE1IEBAIHJlYXNzb2NpYXRl X2JiIChiYXNpY19ibG9jayBiYikKIAkJCSAgICAgIldpZHRoID0gJWQgd2Fz IGNob3NlbiBmb3IgcmVhc3NvY2lhdGlvblxuIiwgd2lkdGgpOwogCiAKLQkJ ICAvKiBGb3IgYmluYXJ5IGJpdCBvcGVyYXRpb25zLCBpZiB0aGUgbGFzdCBv cGVyYW5kIGluCi0JCSAgICAgT1BTIGlzIGEgY29uc3RhbnQsIG1vdmUgaXQg dG8gdGhlIGZyb250LiAgVGhpcwotCQkgICAgIGhlbHBzIGVuc3VyZSB0aGF0 IHdlIGdlbmVyYXRlIChYICYgWSkgJiBDIHJhdGhlcgotCQkgICAgIHRoYW4g KFggJiBDKSAmIFkuICBUaGUgZm9ybWVyIHdpbGwgb2Z0ZW4gbWF0Y2gKLQkJ ICAgICBhIGNhbm9uaWNhbCBiaXQgdGVzdCB3aGVuIHdlIGdldCB0byBSVEwu ICAqLwotCQkgIGlmICgocmhzX2NvZGUgPT0gQklUX0FORF9FWFBSCi0JCSAg ICAgICB8fCByaHNfY29kZSA9PSBCSVRfSU9SX0VYUFIKLQkJICAgICAgIHx8 IHJoc19jb2RlID09IEJJVF9YT1JfRVhQUikKKwkJICAvKiBGb3IgYmluYXJ5 IGJpdCBvcGVyYXRpb25zLCBpZiB0aGVyZSBhcmUgYXQgbGVhc3QgMworCQkg ICAgIG9wZXJhbmRzIGFuZCB0aGUgbGFzdCBsYXN0IG9wZXJhbmQgaW4gT1BT IGlzIGEgY29uc3RhbnQsCisJCSAgICAgbW92ZSBpdCB0byB0aGUgZnJvbnQu ICBUaGlzIGhlbHBzIGVuc3VyZSB0aGF0IHdlIGdlbmVyYXRlCisJCSAgICAg KFggJiBZKSAmIEMgcmF0aGVyIHRoYW4gKFggJiBDKSAmIFkuICBUaGUgZm9y bWVyIHdpbGwKKwkJICAgICBvZnRlbiBtYXRjaCBhIGNhbm9uaWNhbCBiaXQg dGVzdCB3aGVuIHdlIGdldCB0byBSVEwuICAqLworCQkgIGlmIChvcHMubGVu Z3RoICgpICE9IDIKKwkJICAgICAgJiYgKHJoc19jb2RlID09IEJJVF9BTkRf RVhQUgorCQkgICAgICAgICAgfHwgcmhzX2NvZGUgPT0gQklUX0lPUl9FWFBS CisJCSAgICAgICAgICB8fCByaHNfY29kZSA9PSBCSVRfWE9SX0VYUFIpCiAJ CSAgICAgICYmIFRSRUVfQ09ERSAob3BzLmxhc3QgKCktPm9wKSA9PSBJTlRF R0VSX0NTVCkKIAkJICAgIHN0ZDo6c3dhcCAoKm9wc1swXSwgKm9wc1tvcHNf bnVtIC0gMV0pOwogCg== --------------E37BF97FBEAF7DB29E1CC98D--