From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 6E7E93858C36 for ; Mon, 15 May 2023 15:03:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6E7E93858C36 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1684163026; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=6OlwOzrosR3OLibEzdoUDDnCrhjXH7tdJn3/IK+Cvms=; b=SMJ5RHE3zdBZoOv0KZyBeDSpYoEQevz7bVf751SvJGS9EcB+TOe5bHOZjXp8E+qeyCufdG t+iRIqfbnGWRIUxBrOZHstxtVkz7JypMeoZN7JT4HtZlr2VOnxNpFUO4FK02QatmtyI4v/ 2L4GMAzjcvFb3CUi1GGav+83BQEoDos= Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-130-U99wpU8qNJumC98vzK35qw-1; Mon, 15 May 2023 11:03:44 -0400 X-MC-Unique: U99wpU8qNJumC98vzK35qw-1 Received: by mail-wm1-f71.google.com with SMTP id 5b1f17b1804b1-3f42b36733aso26205365e9.3 for ; Mon, 15 May 2023 08:03:44 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684163023; x=1686755023; h=in-reply-to:from:references:cc:to:content-language:subject :user-agent:mime-version:date:message-id:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=1W51UNGyqqBdN0rVNUuto+OHnLTWLoAsJwuqWNouJS0=; b=OVyzWuIXVSZC33/uVsBh/kGPftMrgfHlLCZLwRCS0lpiTna5SAzgMIhXdBdYkGtviy ZUlmSk1hgzDwOLQFDJnJe2QVmWgMTSScx3VQC2ENH4cYZHXEVmxWSiVYB3FJfGFSpRaJ 1r3IoFgKiZ+e5/gFV8HLOcBgn1+PKszmQUlz2d56dTiTlE7vHLmqm53uDXciNyHf4HEw C65pGh0pvadihI/TJ7FCVyhLE1fwOLdpw3dTAzLUpya1eMLxTokaZgeT5BIrx/8qcLa+ B+UKyvQ7rBDZ7v+4QluFRyGG0/QfF7pkkKYxFMsCbi7/B4TomuB4pfBKt4rU0dIGZDAP 22WA== X-Gm-Message-State: AC+VfDyNNlIHspT90U1TTQjiMfvCU5Bmfwvy3b1JiXqK0zJOnVgWra8K JEG5GJv0aKIOQZp+GAliafCvlQ148sY/fE9nbrzgfw0MoehHD8SOZcYgv4XHqIgbWHPKI04hqKE XMaufz69Am643uuhQDg== X-Received: by 2002:a1c:7205:0:b0:3f5:176:c398 with SMTP id n5-20020a1c7205000000b003f50176c398mr4527774wmc.31.1684163023277; Mon, 15 May 2023 08:03:43 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7qAfIxJCI/wyKWu8XnaqNw0YtmAounuYAMy6M2gh5niAWXdJceOEVS1xnF44Mhtl9HtuQuQA== X-Received: by 2002:a1c:7205:0:b0:3f5:176:c398 with SMTP id n5-20020a1c7205000000b003f50176c398mr4527756wmc.31.1684163022875; Mon, 15 May 2023 08:03:42 -0700 (PDT) Received: from [192.168.1.201] ([139.47.42.170]) by smtp.gmail.com with ESMTPSA id j15-20020a05600c1c0f00b003f1738d0d13sm28979776wms.1.2023.05.15.08.03.42 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 15 May 2023 08:03:42 -0700 (PDT) Message-ID: <27b0ef44-02d3-fed3-f5b6-6651d4293826@redhat.com> Date: Mon, 15 May 2023 17:03:41 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0 Subject: Re: [PATCH] Add auto-resizing capability to irange's [PR109695] To: Richard Biener Cc: GCC patches , Andrew MacLeod , Jakub Jelinek , mjambor@suse.cz References: <20230515103523.100412-1-aldyh@redhat.com> From: Aldy Hernandez In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: multipart/mixed; boundary="------------vaDOPzrOLKonpYYi8Cpe5wet" Content-Language: en-US X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,NICE_REPLY_A,RCVD_IN_BARRACUDACENTRAL,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: This is a multi-part message in MIME format. --------------vaDOPzrOLKonpYYi8Cpe5wet Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 5/15/23 13:08, Richard Biener wrote: > On Mon, May 15, 2023 at 12:35 PM Aldy Hernandez wrote: >> >> >> We can now have int_range for automatically >> resizable ranges. int_range_max is now int_range<3, true> >> for a 69X reduction in size from current trunk, and 6.9X reduction from >> GCC12. This incurs a 5% performance penalty for VRP that is more than >> covered by our > 13% improvements recently. >> >> >> int_range_max is the temporary range object we use in the ranger for >> integers. With the conversion to wide_int, this structure bloated up >> significantly because wide_ints are huge (80 bytes a piece) and are >> about 10 times as big as a plain tree. Since the temporary object >> requires 255 sub-ranges, that's 255 * 80 * 2, plus the control word. >> This means the structure grew from 4112 bytes to 40912 bytes. >> >> This patch adds the ability to resize ranges as needed, defaulting to >> no resizing, while int_range_max now defaults to 3 sub-ranges (instead >> of 255) and grows to 255 when the range being calculated does not fit. >> >> For example: >> >> int_range<1> foo; // 1 sub-range with no resizing. >> int_range<5> foo; // 5 sub-ranges with no resizing. >> int_range<5, true> foo; // 5 sub-ranges with resizing. >> >> I ran some tests and found that 3 sub-ranges cover 99% of cases, so >> I've set the int_range_max default to that: >> >> typedef int_range<3, /*RESIZABLE=*/true> int_range_max; >> >> We don't bother growing incrementally, since the default covers most >> cases and we have a 255 hard-limit. This hard limit could be reduced >> to 128, since my tests never saw a range needing more than 124, but we >> could do that as a follow-up if needed. >> >> With 3-subranges, int_range_max is now 592 bytes versus 40912 for >> trunk, and versus 4112 bytes for GCC12! The penalty is 5.04% for VRP >> and 3.02% for threading, with no noticeable change in overall >> compilation (0.27%). This is more than covered by our 13.26% >> improvements for the legacy removal + wide_int conversion. > > Thanks for doing this. > >> I think this approach is a good alternative, while providing us with >> flexibility going forward. For example, we could try defaulting to a >> 8 sub-ranges for a noticeable improvement in VRP. We could also use >> large sub-ranges for switch analysis to avoid resizing. >> >> Another approach I tried was always resizing. With this, we could >> drop the whole int_range nonsense, and have irange just hold a >> resizable range. This simplified things, but incurred a 7% penalty on >> ipa_cp. This was hard to pinpoint, and I'm not entirely convinced >> this wasn't some artifact of valgrind. However, until we're sure, >> let's avoid massive changes, especially since IPA changes are coming >> up. >> >> For the curious, a particular hot spot for IPA in this area was: >> >> ipcp_vr_lattice::meet_with_1 (const value_range *other_vr) >> { >> ... >> ... >> value_range save (m_vr); >> m_vr.union_ (*other_vr); >> return m_vr != save; >> } >> >> The problem isn't the resizing (since we do that at most once) but the >> fact that for some functions with lots of callers we end up a huge >> range that gets copied and compared for every meet operation. Maybe >> the IPA algorithm could be adjusted somehow??. > > Well, the above just wants to know whether the union_ operation changed > the range. I suppose that would be an interesting (and easy to compute?) > secondary output of union_ and it seems it already computes that (but > maybe not correctly?). So I suggest to change the above to union_ returns a value specifically for that, which Andrew uses for cache optimization. For that matter, your suggestion was my first approach, but I quickly found out we were being overly pessimistic in some cases, and I was too lazy to figure out why. > > bool res; > if (flag_checking) > { > value_range save (m_vr); > res = m_vr.union_ (*other_vr); > gcc_assert (res == (m_vr != save)); > } > else > res = m_vr.union (*other_vr); > return res; With your suggested sanity check I chased the problem to a minor inconsistency when unioning nonzero masks. The issue wasn't a bug, just a pessimization. I'm attaching a patch that corrects the oversight (well, not oversight, everything was more expensive with trees)... It yields a 6.89% improvement to the ipa-cp pass!!! Thanks. I'll push it if it passes tests. BTW, without the annoying IPA-cp performance regression, this paves the way for nuking int_range in favor of just irange, and have everything resize as needed. I'll wait for Andrew to chime in when he returns from PTO, since we may want to leave int_range around since it does provide flexibility (at the expensive of fugly looking declarations). Aldy --------------vaDOPzrOLKonpYYi8Cpe5wet Content-Type: text/x-patch; charset=UTF-8; name="0001-Only-return-changed-true-in-union_nonzero-when-appro.patch" Content-Disposition: attachment; filename*0="0001-Only-return-changed-true-in-union_nonzero-when-appro.pa"; filename*1="tch" Content-Transfer-Encoding: base64 RnJvbSA2YTczNTRkMzQ5NDY2NWQ0NmY4Y2JmYzcxYzU4Zjc4NGMwMjE0MmZmIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBBbGR5IEhlcm5hbmRleiA8YWxkeWhAcmVkaGF0LmNvbT4KRGF0 ZTogTW9uLCAxNSBNYXkgMjAyMyAxNToxMDoxMSArMDIwMApTdWJqZWN0OiBbUEFUQ0hdIE9ubHkg cmV0dXJuIGNoYW5nZWQ9dHJ1ZSBpbiB1bmlvbl9ub256ZXJvIHdoZW4gYXBwcm9wcmlhdGUuCgpp cmFuZ2U6OnVuaW9uXyB3YXMgYmVpbmcgb3Zlcmx5IHBlc3NpbWlzdGljIGluIGl0cyByZXR1cm4g dmFsdWUuICBJdAp3YXMgcmV0dXJuaW5nIGZhbHNlIHdoZW4gdGhlIG5vbnplcm8gbWFzayB3YXMg cG9zc2libHkgdGhlIHNhbWUuCgpUaGUgcmVhc29uIGZvciB0aGlzIGlzIGJlY2F1c2UgdGhlIG5v bnplcm8gbWFzayBpcyBub3QgZW50aXJlbHkga2VwdAp1cCB0byBkYXRlLiAgV2UgYXZvaWQgc2V0 dGluZyBpdCB1cCB3aGVuIGEgbmV3IHJhbmdlIGlzIHNldCAoZnJvbSBhCnNldCwgaW50ZXJzZWN0 LCB1bmlvbiwgZXRjKSwgYmVjYXVzZSBjYWxjdWxhdGluZyBhIG1hc2sgZnJvbSBhIHJhbmdlCmlz IG1lYXN1cmFibHkgZXhwZW5zaXZlLiAgSG93ZXZlciwgaXJhbmdlOjpnZXRfbm9uemVyb19iaXRz KCkgd2lsbAphbHdheXMgcmV0dXJuIHRoZSBjb3JyZWN0IG1hc2sgYmVjYXVzZSBpdCB3aWxsIGNh bGN1bGF0ZSB0aGUgbm9uemVybwptYXNrIGluaGVyaXQgaW4gdGhlIG1hc2sgb24gdGhlIGZseSBh bmQgYml0d2lzZSBvciBpdCB3aXRoIHRoZSBzYXZlZAptYXNrLiAgVGhpcyB3YXMgYW4gb3B0aW1p emF0aW9uIGJlY2F1c2UgbGFzdCByZWxlYXNlIGl0IHdhcyBhIGJpZwpwZW5hbHR5IHRvIGtlZXAg dGhlIG1hc2sgdXAgdG8gZGF0ZS4gIFRoaXMgbWF5IG5vdCBuZWNlc3NhcmlseSBiZSB0aGUKY2Fz ZSB3aXRoIHRoZSBjb252ZXJzaW9uIHRvIHdpZGVfaW50J3MuICBXZSBzaG91bGQgaW52ZXN0aWdh dGUuCgpKdXN0IHRvIGJlIGNsZWFyLCB0aGUgcmVzdWx0IGZyb20gZ2V0X25vbnplcm9fYml0cygp IGlzIGFsd2F5cyBjb3JyZWN0CmFzIHNlZW4gYnkgdGhlIHVzZXIsIGJ1dCB0aGUgd2lkZV9pbnQg aW4gdGhlIGlyYW5nZSBkb2VzIG5vdCBjb250YWluCmFsbCB0aGUgaW5mb3JtYXRpb24sIHNpbmNl IHBhcnQgb2YgdGhlIG5vbnplcm8gYml0cyBjYW4gYmUgZGV0ZXJtaW5lZApieSB0aGUgcmFuZ2Ug aXRzZWxmLCBvbiB0aGUgZmx5LgoKVGhlIGZpeCBoZXJlIGlzIHRvIG1ha2Ugc3VyZSB0aGF0IHRo ZSByZXN1bHQgdGhlIHVzZXIgc2VlcyAoY2FsbGVycyBvZgpnZXRfbm9uemVyb19iaXRzKCkpIGNo YW5nZWQgd2hlbiB1bmlvbmluZyBiaXRzLiAgVGhpcyBhbGxvd3MKaXBjcF92cl9sYXR0aWNlOjpt ZWV0X3dpdGhfMSB0byBhdm9pZCB1bm5lY2Vzc2FyeSBjb3BpZXMgd2hlbgpkZXRlcm1pbmluZyBp ZiBhIHJhbmdlIGNoYW5nZWQuCgpUaGlzIHBhdGNoIHlpZWxkcyBhbiA2Ljg5JSBpbXByb3ZlbWVu dCB0byB0aGUgaXBhX2NwIHBhc3MuICBJJ20KaW5jbHVkaW5nIHRoZSBJUEEgY2hhbmdlcyBpbiB0 aGlzIHBhdGNoLCBhcyBpdCdzIGEgdGVzdGNhc2Ugb2Ygc29ydHMgZm9yCnRoZSBjaGFuZ2UuCgpn Y2MvQ2hhbmdlTG9nOgoKCSogaXBhLWNwLmNjIChpcGNwX3ZyX2xhdHRpY2U6Om1lZXRfd2l0aF8x KTogQXZvaWQgdW5uZWNlc3NhcnkKCXJhbmdlIGNvcHlpbmcKCSogdmFsdWUtcmFuZ2UuY2MgKGly YW5nZTo6dW5pb25fbm9uemVyb19iaXRzKTogUmV0dXJuIFRSVUUgb25seQoJd2hlbiByYW5nZSBj aGFuZ2VkLgotLS0KIGdjYy9pcGEtY3AuY2MgICAgICB8IDEzICsrKysrKysrKystLS0KIGdjYy92 YWx1ZS1yYW5nZS5jYyB8ICA1ICsrKy0tCiAyIGZpbGVzIGNoYW5nZWQsIDEzIGluc2VydGlvbnMo KyksIDUgZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvZ2NjL2lwYS1jcC5jYyBiL2djYy9pcGEt Y3AuY2MKaW5kZXggMWY1ZTBlMTM4NzIuLjhjZDBmYTJjYWU3IDEwMDY0NAotLS0gYS9nY2MvaXBh LWNwLmNjCisrKyBiL2djYy9pcGEtY3AuY2MKQEAgLTEwNDAsOSArMTA0MCwxNiBAQCBpcGNwX3Zy X2xhdHRpY2U6Om1lZXRfd2l0aF8xIChjb25zdCB2YWx1ZV9yYW5nZSAqb3RoZXJfdnIpCiAgIGlm IChvdGhlcl92ci0+dmFyeWluZ19wICgpKQogICAgIHJldHVybiBzZXRfdG9fYm90dG9tICgpOwog Ci0gIHZhbHVlX3JhbmdlIHNhdmUgKG1fdnIpOwotICBtX3ZyLnVuaW9uXyAoKm90aGVyX3ZyKTsK LSAgcmV0dXJuIG1fdnIgIT0gc2F2ZTsKKyAgYm9vbCByZXM7CisgIGlmIChmbGFnX2NoZWNraW5n KQorICAgIHsKKyAgICAgIHZhbHVlX3JhbmdlIHNhdmUgKG1fdnIpOworICAgICAgcmVzID0gbV92 ci51bmlvbl8gKCpvdGhlcl92cik7CisgICAgICBnY2NfYXNzZXJ0IChyZXMgPT0gKG1fdnIgIT0g c2F2ZSkpOworICAgIH0KKyAgZWxzZQorICAgIHJlcyA9IG1fdnIudW5pb25fICgqb3RoZXJfdnIp OworICByZXR1cm4gcmVzOwogfQogCiAvKiBSZXR1cm4gdHJ1ZSBpZiB2YWx1ZSByYW5nZSBpbmZv cm1hdGlvbiBpbiB0aGUgbGF0dGljZSBpcyB5ZXQgdW5rbm93bi4gICovCmRpZmYgLS1naXQgYS9n Y2MvdmFsdWUtcmFuZ2UuY2MgYi9nY2MvdmFsdWUtcmFuZ2UuY2MKaW5kZXggZGVmOTI5OWRjMGUu LmEzNDFjZWNlODZkIDEwMDY0NAotLS0gYS9nY2MvdmFsdWUtcmFuZ2UuY2MKKysrIGIvZ2NjL3Zh bHVlLXJhbmdlLmNjCkBAIC0xODU5LDEyICsxODU5LDEzIEBAIGlyYW5nZTo6dW5pb25fbm9uemVy b19iaXRzIChjb25zdCBpcmFuZ2UgJnIpCiAgIGJvb2wgY2hhbmdlZCA9IGZhbHNlOwogICBpZiAo bV9ub256ZXJvX21hc2sgIT0gci5tX25vbnplcm9fbWFzaykKICAgICB7Ci0gICAgICBtX25vbnpl cm9fbWFzayA9IGdldF9ub256ZXJvX2JpdHMgKCkgfCByLmdldF9ub256ZXJvX2JpdHMgKCk7Cisg ICAgICB3aWRlX2ludCBzYXZlID0gZ2V0X25vbnplcm9fYml0cyAoKTsKKyAgICAgIG1fbm9uemVy b19tYXNrID0gc2F2ZSB8IHIuZ2V0X25vbnplcm9fYml0cyAoKTsKICAgICAgIC8vIE5vIG5lZWQg dG8gY2FsbCBzZXRfcmFuZ2VfZnJvbV9ub256ZXJvX2JpdHMsIGJlY2F1c2Ugd2UnbGwKICAgICAg IC8vIG5ldmVyIG5hcnJvdyB0aGUgcmFuZ2UuICBCZXNpZGVzLCBpdCB3b3VsZCBjYXVzZSBlbmRs ZXNzCiAgICAgICAvLyByZWN1cnNpb24gYmVjYXVzZSBvZiB0aGUgdW5pb25fIGluCiAgICAgICAv LyBzZXRfcmFuZ2VfZnJvbV9ub256ZXJvX2JpdHMuCi0gICAgICBjaGFuZ2VkID0gdHJ1ZTsKKyAg ICAgIGNoYW5nZWQgPSBtX25vbnplcm9fbWFzayAhPSBzYXZlOwogICAgIH0KICAgbm9ybWFsaXpl X2tpbmQgKCk7CiAgIGlmIChmbGFnX2NoZWNraW5nKQotLSAKMi40MC4wCgo= --------------vaDOPzrOLKonpYYi8Cpe5wet--