Subject: Re: [PATCH] PR tree-optimization/108697 - Create a lazy ssa_cache
Date: Thu, 16 Feb 2023 10:36:18 +0100
From: Aldy Hernandez
To: Richard Biener, Andrew MacLeod
Cc: gcc-patches

On 
2/16/23 08:55, Richard Biener wrote:
> On Wed, Feb 15, 2023 at 6:07 PM Andrew MacLeod via Gcc-patches wrote:
>>
>> This patch implements the suggestion that we have an alternative
>> ssa-cache which does not zero memory, and instead uses a bitmap to
>> track whether a value is currently set or not.  It roughly mimics what
>> path_range_query was doing internally.
>>
>> For sparsely used cases, especially in large programs, this is more
>> efficient.  I changed path_range_query to use this, removed its old
>> bitmap (and a hack or two around PHI calculations), and also used it
>> in the assume_query class.
>>
>> Performance wise, the patch doesn't affect VRP (since that still uses
>> the original version).  Switching to the lazy version caused a
>> slowdown of 2.5% across VRP.
>>
>> There was a noticeable improvement elsewhere: across 230 GCC source
>> files, threading ran over 12% faster!  Overall compilation improved by
>> 0.3%.  Not sure it makes much difference in compiler.i, but it
>> shouldn't hurt.
>>
>> Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK for trunk?
>> Or do you want to wait for the next release...
>
> I see
>
> @@ -365,16 +335,8 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
>
>        Value_Range r (TREE_TYPE (name));
>        if (range_defined_in_block (r, name, bb))
> -       {
> -         unsigned v = SSA_NAME_VERSION (name);
> -         set_cache (r, name);
> -         bitmap_set_bit (phi_set, v);
> -         // Pretend we don't have a cache entry for this name until
> -         // we're done with all PHIs.
> -         bitmap_clear_bit (m_has_cache_entry, v);
> -       }
> +       m_cache.set_global_range (name, r);
>      }
> -  bitmap_ior_into (m_has_cache_entry, phi_set);
>  }
>
> which I think is not correct - if we have
>
>   # _1 = PHI <..., _2>
>   # _2 = PHI <..., _1>
>
> then their effects are supposed to be executed in parallel, that is,
> both PHI argument _2 and _1 are supposed to see the "old" version.
> The previous code tried to make sure the range of the new _1 doesn't
> get seen when processing the argument _1 in the definition of _2.

Yes, the effects should appear in parallel, but ssa_range_in_phi(),
which is the only thing range_defined_in_block() does for PHIs, is
guaranteed not to do any additional cache lookups.  The comment there
should be adjusted to make this clear:

  // Since PHIs are calculated in parallel at the beginning of the
  // block, we must be careful to never save anything to the cache here.
  // It is the caller's responsibility to adjust the cache.  Also,
  // calculating the PHI's range must not trigger additional lookups.

We should instead say: "we must be careful to never set or access the
cache here"...

This was the original intent, but a subtle access to the cache crept in
here:

  // Try to fold the phi exclusively with global or cached values.
  // This will get things like PHI <5(99), 6(88)>.  We do this by
  // calling range_of_expr with no context.
  unsigned nargs = gimple_phi_num_args (phi);
  Value_Range arg_range (TREE_TYPE (name));
  r.set_undefined ();
  for (size_t i = 0; i < nargs; ++i)
    {
      tree arg = gimple_phi_arg_def (phi, i);
      if (range_of_expr (arg_range, arg, /*stmt=*/NULL))

This range_of_expr call will indeed access the cache incorrectly, but
Andrew fixed that here:

@@ -264,7 +236,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
   for (size_t i = 0; i < nargs; ++i)
     {
       tree arg = gimple_phi_arg_def (phi, i);
-      if (range_of_expr (arg_range, arg, /*stmt=*/NULL))
+      if (m_ranger.range_of_expr (arg_range, arg, /*stmt=*/NULL))
         r.union_ (arg_range);
       else
         {

...thus ensuring that function never uses the cache.  All the lookups
are done with the global ranger, either at the path entry or globally
as above (with stmt=NULL).

I believe the switch from range_of_expr to m_ranger.range_of_expr is
safe, as the original code was added to handle silly things like
PHI <5(99), 6(88)>, which shouldn't need path-aware ranges.
As you've found out, the update to the cache in this case was not
obvious at all.  Perhaps it should also be commented: "It is safe to
set the cache here, as range_defined_in_block for PHIs
(ssa_range_in_phi) is guaranteed not to do any cache lookups."

> The new version drops this, possibly resulting in wrong-code.
>
> While I think it's appropriate to sort out compile-time issues like
> this during stage4, at least the above makes me think it should be
> deferred to next stage1.

I defer to the release managers as to whether this is safe in light of
my explanation above :).

Aldy