Date: Tue, 28 Nov 2023 11:53:15 +0100
From: Jakub Jelinek
To: Manolis Tsamis
Cc: Jeff Law, gcc-patches@gcc.gnu.org, Richard Biener, Vineet Gupta, Philipp Tomsich
Subject: Re: [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets.
References: <20231016180107.2019608-1-manolis.tsamis@vrull.eu>

On Tue, Nov 28, 2023 at 11:45:58AM +0200, Manolis Tsamis wrote:
> > But, while get_single_def_in_bb checks for
> >       if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
> >         return NULL;
> > (OT, why not DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_chain->ref))
> > instead of DF_INSN_LUID (def), then it doesn't need to look up
> > DF_INSN_INFO_GET (def)?), nothing when walking all uses of def does such
> > a luid check.
>
> I didn't actually know about the difference between DF_INSN_LUID (def)
> and DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_chain->ref)), so I just
> used the more concise version.
> Thanks for pointing this out, I could make this a separate change.

Note, in the end in the actually tested patch (see the follow-up) I've
just used DF_INSN_LUID, after looking up that DF_INSN_INFO_GET is fairly
cheap:
  #define DF_INSN_INFO_GET(INSN) (df->insns[(INSN_UID (INSN))])
I was afraid it could be a hash table lookup.  Because, while using
DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_chain->ref)) etc. is a tiny bit
faster (it doesn't have to read the u2.insn_uid from ref and look it up
in the table), it is less readable, and because the difference is just a
tiny bit, I guess readability/maintainability should win.

> Indeed. Your observations got me thinking about this issue and I see
> two main weaknesses with my current implementation:

> > --- gcc/fold-mem-offsets.cc.jj	2023-11-02 07:49:17.060865772 +0100
> > +++ gcc/fold-mem-offsets.cc	2023-11-27 21:35:35.089007365 +0100
> > @@ -511,6 +511,7 @@ fold_offsets (rtx_insn *insn, rtx reg, b
> >    if (!success)
> >      return 0;
> >
> > +  unsigned luid = DF_INSN_LUID (def);
> >    for (ref_link = uses; ref_link; ref_link = ref_link->next)
> >      {
> >        rtx_insn *use = DF_REF_INSN (ref_link->ref);
> > @@ -534,6 +535,11 @@ fold_offsets (rtx_insn *insn, rtx reg, b
> >        if (use_set && MEM_P (SET_DEST (use_set))
> > 	  && reg_mentioned_p (dest, SET_SRC (use_set)))
> > 	return 0;
> > +
> > +      /* Punt if use appears before def in the basic block.  See
> > +	 PR111601.  */
> > +      if (DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_link->ref)) < luid)
> > +	return 0;
> I think it would be fitting to put this condition within the get_uses
> function? This way it would reflect what exists in
> get_single_def_in_bb.

get_uses is where I initially added it, but to do it there one needs to
also copy the DEBUG_INSN_P check in there (because we shouldn't be doing
such tests on debug insns).
If we put it into get_uses, maybe we should also check, in sync with
get_single_def_in_bb, that the use is in the same bb.

So, like this (so far untested)?
Note, the earlier posted patch passed bootstrap/regtest on
{powerpc64le,x86_64,i686}-linux.

2023-11-28  Jakub Jelinek

	PR bootstrap/111601
	* fold-mem-offsets.cc (get_uses): Ignore DEBUG_INSN uses.  Otherwise,
	punt if use is in a different basic block from INSN or appears before
	INSN in the same basic block.  Formatting fixes.
	(get_single_def_in_bb): Formatting fixes.
	(fold_offsets_1, pass_fold_mem_offsets::execute): Comment formatting
	fixes.

	* g++.dg/opt/pr111601.C: New test.
--- gcc/fold-mem-offsets.cc.jj	2023-11-02 07:49:17.060865772 +0100
+++ gcc/fold-mem-offsets.cc	2023-11-28 11:47:34.941679105 +0100
@@ -154,7 +154,7 @@ static int stats_fold_count;
    The definition is desired for REG used in INSN.
    Return the definition insn or NULL if there's no definition with
    the desired criteria.  */
-static rtx_insn*
+static rtx_insn *
 get_single_def_in_bb (rtx_insn *insn, rtx reg)
 {
   df_ref use;
@@ -205,11 +205,10 @@ get_single_def_in_bb (rtx_insn *insn, rt
 /* Get all uses of REG which is set in INSN.  Return the use list or NULL if a
    use is missing / irregular.  If SUCCESS is not NULL then set it to false if
    there are missing / irregular uses and true otherwise.  */
-static struct df_link*
+static df_link *
 get_uses (rtx_insn *insn, rtx reg, bool *success)
 {
   df_ref def;
-  struct df_link *ref_chain, *ref_link;
 
   if (success)
     *success = false;
@@ -221,18 +220,30 @@ get_uses (rtx_insn *insn, rtx reg, bool 
   if (!def)
     return NULL;
 
-  ref_chain = DF_REF_CHAIN (def);
+  df_link *ref_chain = DF_REF_CHAIN (def);
+  int insn_luid = DF_INSN_LUID (insn);
+  basic_block insn_bb = BLOCK_FOR_INSN (insn);
 
-  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
+  for (df_link *ref_link = ref_chain; ref_link; ref_link = ref_link->next)
     {
       /* Problem getting a use for this instruction.  */
       if (ref_link->ref == NULL)
	return NULL;
+
+      rtx_insn *use = DF_REF_INSN (ref_link->ref);
+      if (DEBUG_INSN_P (use))
+	continue;
+
       if (DF_REF_CLASS (ref_link->ref) != DF_REF_REGULAR)
	return NULL;
       /* We do not handle REG_EQUIV/REG_EQ notes for now.  */
       if (DF_REF_FLAGS (ref_link->ref) & DF_REF_IN_NOTE)
	return NULL;
+      if (BLOCK_FOR_INSN (use) != insn_bb)
+	return NULL;
+      /* Punt if use appears before def in the basic block.  See
+	 PR111601.  */
+      if (DF_INSN_LUID (use) < insn_luid)
+	return NULL;
     }
 
   if (success)
@@ -255,8 +266,7 @@ fold_offsets (rtx_insn *insn, rtx reg, b
    If DO_RECURSION is true and ANALYZE is false then offset that would result
    from folding is computed and is returned through the pointer OFFSET_OUT.
-   The instructions that can be folded are recorded in FOLDABLE_INSNS.
-*/
+   The instructions that can be folded are recorded in FOLDABLE_INSNS.  */
 static bool
 fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
		HOST_WIDE_INT *offset_out, bitmap foldable_insns)
@@ -846,8 +856,8 @@ pass_fold_mem_offsets::execute (function
   FOR_ALL_BB_FN (bb, fn)
     {
       /* There is a conflict between this pass and RISCV's shorten-memrefs
-	  pass.  For now disable folding if optimizing for size because
-	  otherwise this cancels the effects of shorten-memrefs.  */
+	 pass.  For now disable folding if optimizing for size because
+	 otherwise this cancels the effects of shorten-memrefs.  */
       if (optimize_bb_for_size_p (bb))
	continue;
--- gcc/testsuite/g++.dg/opt/pr111601.C.jj	2023-11-27 21:33:12.605006881 +0100
+++ gcc/testsuite/g++.dg/opt/pr111601.C	2023-11-27 21:34:47.267678510 +0100
@@ -0,0 +1,86 @@
+// PR bootstrap/111601
+// { dg-do run { target c++11 } }
+// { dg-options "-O2 -fno-exceptions -fno-rtti -fprofile-generate" }
+// { dg-require-profiling "-fprofile-generate" }
+// { dg-final { cleanup-coverage-files } }
+
+struct tree_base
+{
+  int code:16;
+};
+struct saved_scope
+{
+  void *pad[14];
+  int x_processing_template_decl;
+};
+struct saved_scope *scope_chain;
+struct z_candidate
+{
+  tree_base *fn;
+  void *pad[11];
+  z_candidate *next;
+  int viable;
+  int flags;
+};
+
+__attribute__((noipa)) struct z_candidate *
+splice_viable (struct z_candidate *cands, bool strict_p, bool *any_viable_p)
+{
+  struct z_candidate *viable;
+  struct z_candidate **last_viable;
+  struct z_candidate **cand;
+  bool found_strictly_viable = false;
+  if (scope_chain->x_processing_template_decl)
+    strict_p = true;
+  viable = (z_candidate *) 0;
+  last_viable = &viable;
+  *any_viable_p = false;
+  cand = &cands;
+  while (*cand)
+    {
+      struct z_candidate *c = *cand;
+      if (!strict_p && (c->viable == 1 || ((int) (c->fn)->code) == 273))
+	{
+	  strict_p = true;
+	  if (viable && !found_strictly_viable)
+	    {
+	      *any_viable_p = false;
+	      *last_viable = cands;
+	      cands = viable;
+	      viable = (z_candidate *) 0;
+	      last_viable = &viable;
+	    }
+	}
+      if (strict_p ? c->viable == 1 : c->viable)
+	{
+	  *last_viable = c;
+	  *cand = c->next;
+	  c->next = (z_candidate *) 0;
+	  last_viable = &c->next;
+	  *any_viable_p = true;
+	  if (c->viable == 1)
+	    found_strictly_viable = true;
+	}
+      else
+	cand = &c->next;
+    }
+  return viable ? viable : cands;
+}
+
+int
+main ()
+{
+  saved_scope s{};
+  scope_chain = &s;
+  z_candidate z[4] = {};
+  z[0].next = &z[1];
+  z[1].viable = 1;
+  z[1].next = &z[2];
+  z[2].viable = 1;
+  z[2].next = &z[3];
+  bool b;
+  z_candidate *c = splice_viable (&z[0], true, &b);
+  if (c != &z[1] || z[1].next != &z[2] || z[2].next)
+    __builtin_abort ();
+  return 0;
+}

	Jakub