From: Aldy Hernandez
To: Michael Matz
Cc: Jeff Law, Richard Biener, GCC Mailing List, Andrew MacLeod
Subject: Re: More aggressive threading causing loop-interchange-9.c regression
Date: Wed, 8 Sep 2021 12:44:47 +0200
List-Id: Gcc mailing list

First of all.  Thanks so much for this incredibly useful explanation.  It
helps a lot.

I'm still trying to wrap my head around this, so please bear with me.
On 9/7/21 4:45 PM, Michael Matz wrote:
> Hello,
>
> On Tue, 7 Sep 2021, Aldy Hernandez via Gcc wrote:
>
>> The regression comes from the simple_reduc_1() function in
>> tree-ssa/loop-interchange-9.c, and it involves the following path:
>>
>> =========== BB 5 ============
>> Imports: n_10(D)  j_19
>> Exports: n_10(D)  j_13  j_19
>>          j_13 : j_19(I)
>> n_10(D)  int [1, 257]
>> j_19     int [0, 256]
>> Relational : (j_13 > j_19)
>>     [local count: 118111600]:
>>     # sum_21 = PHI <sum_14(4), sum_11(3)>
>>     c[j_19] = sum_21;
>>     j_13 = j_19 + 1;
>>     if (n_10(D) > j_13)
>>       goto <bb 9>; [89.00%]
>>     else
>>       goto <bb 6>; [11.00%]
>
> So, this is the outer loop's exit block ...
>
>> =========== BB 9 ============
>> n_10(D)  int [2, 257]
>> j_13     int [1, 256]
>> Relational : (n_10(D) > j_19)
>> Relational : (n_10(D) > j_13)
>>     [local count: 105119324]:
>>     goto <bb 3>; [100.00%]
>
> ... this the outer loop's latch block ...
>
>> =========== BB 3 ============
>> Imports: n_10(D)
>> Exports: n_10(D)
>> n_10(D)  int [1, +INF]
>>     [local count: 118111600]:
>>     # j_19 = PHI <j_13(9), 0(7)>
>>     sum_11 = c[j_19];
>>     if (n_10(D) > 0)
>>       goto <bb 8>; [89.00%]
>>     else
>>       goto <bb 5>; [11.00%]
>
> ... and this the outer loop's header, as well as the inner loop's
> pre-header.

Actually, the inner loop's pre-header seems to be BB8, which is empty, and
leads into BB4, which is the header of the inner loop.

> The attempted thread hence involves a back-edge (of the outer loop) and a
> loop-entry edge (bb3->bb8) of the inner loop.  Doing that threading would
> destroy some properties that our loop optimizers rely on, e.g. that the
> loop header of a natural loop is only reached by two edges: entry edge and
> back edge, and that the latch blocks are empty and that there's only a
> single latch.  (What exactly we require depends on several flags in
> loops_state_satisfies_p).

But threading 3->8 doesn't involve a loop-entry edge.  It involves a new
edge into the *pre-header* of the inner loop.

I am attaching the IL right after threading and before we clean up the
CFG (il-after-threading.txt).
After threading we have rewritten the 5->9 edge into 5->11:

  <bb 5> [local count: 118111600]:
  # sum_21 = PHI <sum_14(4), sum_11(3)>
  c[j_19] = sum_21;
  j_13 = j_19 + 1;
  if (n_10(D) > j_13)
    goto <bb 11>; [89.00%]
  else
    goto <bb 6>; [11.00%]

  <bb 11> [local count: 105119324]:

  <bb 12> [local count: 105119324]:
  # j_7 = PHI <j_13(11)>
  sum_6 = c[j_19];
  goto <bb 8>; [100.00%]

  <bb 9> [local count: 0]:       ;; This becomes unreachable.
  goto <bb 3>; [100.00%]

Notice BB9 becomes unreachable, so we're basically eliding the latch in
BB9.

Question: could BB12 not be considered the loop latch now, and BB8 be the
outer loop's entry?  This would also mean that BB3 is now outside of the
outer loop, which means the threader peeled off the first iteration of
the loop.  Or is it a requirement that the latch be empty?

>> With knowledge from previous passes (SSA_NAME_RANGE_INFO), we know that
>> the loop cannot legally iterate outside the size of c[256].  So, j_13
>> lies within [1, 257] and n_10 is [2, +INF] at the end of the path.
>> This allows us to thread BB3 to BB8.
>
> So, IIUC doing this threading would create a new entry to BB8: it would
> then be entered by its natural entry (bb3), by its natural back edge
> (whatever bb that is now) and the new entry from the threaded outer back
> edge (bb9 or bb5?).

As I mentioned, BB8 looks like the pre-header to the inner loop.  But
yes, it now has multiple entries: the edge from BB3, and the back edge
from BB12 (which seems like it could be a new latch, since it's the only
back edge).

> The inner loop wouldn't then be recognized as natural anymore and the
> whole nest not as perfect, and hence loop interchange can't easily happen
> anymore.  Most other structural loop optimizers of ours would have
> problems with that situation as well.

If I clean up the CFG and call loop_optimizer_init() again at this point,
what is destroyed is the outer loop, not the inner loop.
If you look at the IL at this point
(il-after-cleanup-and-rerunning-loop-init.txt), we have a latch in BB7
going back up to BB4, though the conditional is now in BB6 (eeech):

  ...

  [local count: 118111600]:
  # sum_21 = PHI <sum_14(5), sum_11(3)>
  # j_18 = PHI <j_17(5), j_19(3)>
  c[j_18] = sum_21;
  j_13 = j_18 + 1;
  if (n_10(D) > j_13)
    goto <bb 7>; [89.00%]
  else
    goto <bb 8>; [11.00%]

  [local count: 105119324]:
  # j_7 = PHI <j_13(6)>
  sum_6 = c[j_7];
  goto <bb 4>; [100.00%]

Perhaps this looks sufficiently different that the loop optimizer can't
recognize it as a loop?

>> All the blocks lie within the same loop, and the path passes all the
>> tests in path_profitable_p().
>>
>> Is there something about the shape of this path that should make it
>> invalid in the backward threader, or should the test be marked with
>> -fdisable-tree-threadN (or something else entirely)?
>
> This is a functionality test checking that the very necessary interchange
> in this test does happen with default plus -floop-interchange (with the
> intention of it being enabled by O3 or with profile info).  So no
> additional flags can be added without changing the meaning of this test.

Agreed.

We're shuffling things around.  What I'm missing here is why we're unable
to recognize this new form as a loop.

That being said, I am not above figuring out what the magic incantation
is to keep the threader from doing this transformation.  However, I just
want to make sure we're not papering over something else.

>> Note that the backward threader is allowed to peel through loop
>> headers.
>
> Something needs to give way in the path threaders before loop
> optimizations: either threading through back edges, through loop latches
> or through loop headers needs to be disabled.  I think traditionally at
> least threading through latches should be disabled, because doing so
> usually destroys simple loop structure.
> I see that profitable_path_p() of the backward threader wants to be
> careful in some situations involving loops and latches; possibly it's
> not careful enough yet for the additional power brought by ranger.
>
> See also the additional tests tree-cfgcleanup.c:tree_forwarder_block_p
> is doing when loops are active.
>
> After the structural loop optimizers the threader can go wild and
> thread everything it finds.

Thanks again.  This is very helpful.

Aldy

[Attachment: il-after-threading.txt]

__attribute__((noinline))
void simple_reduc_1 (int n)
{
  int i;
  int sum;
  int j;
  int _1;
  int _2;
  int _3;

  <bb 2> [local count: 14598063]:
  if (n_10(D) > 0)
    goto <bb 7>; [89.00%]
  else
    goto <bb 6>; [11.00%]

  <bb 7> [local count: 12992276]:

  <bb 3> [local count: 12992276]:
  # j_19 = PHI <j_13(9), 0(7)>
  sum_11 = c[j_19];
  if (n_10(D) > 0)
    goto <bb 8>; [66.92%]
  else
    goto <bb 5>; [33.08%]

  <bb 8> [local count: 105119324]:

  <bb 4> [local count: 955630225]:
  # sum_20 = PHI <sum_14(10), sum_11(8)>
  # i_22 = PHI <i_15(10), 0(8)>
  _1 = a[i_22][j_19];
  _2 = b[i_22][j_19];
  _3 = _1 * _2;
  sum_14 = _3 + sum_20;
  i_15 = i_22 + 1;
  if (n_10(D) > i_15)
    goto <bb 10>; [89.00%]
  else
    goto <bb 5>; [11.00%]

  <bb 10> [local count: 850510901]:
  goto <bb 4>; [100.00%]

  <bb 5> [local count: 118111600]:
  # sum_21 = PHI <sum_14(4), sum_11(3)>
  c[j_19] = sum_21;
  j_13 = j_19 + 1;
  if (n_10(D) > j_13)
    goto <bb 11>; [89.00%]
  else
    goto <bb 6>; [11.00%]

  <bb 11> [local count: 105119324]:

  <bb 12> [local count: 105119324]:
  # j_7 = PHI <j_13(11)>
  sum_6 = c[j_19];
  goto <bb 8>; [100.00%]

  <bb 9> [local count: 0]:
  goto <bb 3>; [100.00%]

  <bb 6> [local count: 14598063]:
  return;

}

[Attachment: il-after-cleanup-and-rerunning-loop-init.txt]

void simple_reduc_1 (int n)
{
  int i;
  int sum;
  int j;
  int _1;
  int _2;
  int _3;

  <bb 2> [local count: 14598063]:
  if (n_10(D) > 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 8>; [11.00%]

  <bb 3> [local count: 12992276]:
  # j_19 = PHI <0(2)>
  sum_11 = c[j_19];
  if (n_10(D) > 0)
    goto <bb 4>; [66.92%]
  else
    goto <bb 6>; [33.08%]

  <bb 4> [local count: 105119324]:
  # sum_5 = PHI <sum_11(3), sum_6(7)>
  # j_17 = PHI <j_19(3), j_7(7)>

  <bb 5> [local count: 955630225]:
  # sum_20 = PHI <sum_14(9), sum_5(4)>
  # i_22 = PHI <i_15(9), 0(4)>
  _1 = a[i_22][j_17];
  _2 = b[i_22][j_17];
  _3 = _1 * _2;
  sum_14 = _3 + sum_20;
  i_15 = i_22 + 1;
  if (n_10(D) > i_15)
    goto <bb 9>; [89.00%]
  else
    goto <bb 6>; [11.00%]

  <bb 9> [local count: 850510901]:
  goto <bb 5>; [100.00%]

  <bb 6> [local count: 118111600]:
  # sum_21 = PHI <sum_14(5), sum_11(3)>
  # j_18 = PHI <j_17(5), j_19(3)>
  c[j_18] = sum_21;
  j_13 = j_18 + 1;
  if (n_10(D) > j_13)
    goto <bb 7>; [89.00%]
  else
    goto <bb 8>; [11.00%]

  <bb 7> [local count: 105119324]:
  # j_7 = PHI <j_13(6)>
  sum_6 = c[j_7];
  goto <bb 4>; [100.00%]

  <bb 8> [local count: 14598063]:
  return;

}