Subject: Re: [PATCH] PR tree-optimization/108697 - Create a lazy ssa_cache
Date: Thu, 16 Feb 2023 10:36:18 +0100
From: Aldy Hernandez
To: Richard Biener, Andrew MacLeod
Cc: gcc-patches

On 
2/16/23 08:55, Richard Biener wrote:
> On Wed, Feb 15, 2023 at 6:07 PM Andrew MacLeod via Gcc-patches wrote:
>>
>> This patch implements the suggestion that we have an alternative
>> ssa-cache which does not zero memory, and instead uses a bitmap to
>> track whether a value is currently set or not.  It roughly mimics what
>> path_range_query was doing internally.
>>
>> For sparsely used cases, especially in large programs, this is more
>> efficient.  I changed path_range_query to use this, removed its old
>> bitmap (and a hack or two around PHI calculations), and also used it
>> in the assume_query class.
>>
>> Performance wise, the patch doesn't affect VRP (since that still uses
>> the original version).  Switching to the lazy version caused a
>> slowdown of 2.5% across VRP.
>>
>> There was a noticeable improvement elsewhere: across 230 GCC source
>> files, threading ran over 12% faster!  Overall compilation improved by
>> 0.3%.  Not sure it makes much difference in compiler.i, but it
>> shouldn't hurt.
>>
>> Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK for trunk?
>> Or do you want to wait for the next release...
>
> I see
>
> @@ -365,16 +335,8 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
>
>        Value_Range r (TREE_TYPE (name));
>        if (range_defined_in_block (r, name, bb))
> -       {
> -         unsigned v = SSA_NAME_VERSION (name);
> -         set_cache (r, name);
> -         bitmap_set_bit (phi_set, v);
> -         // Pretend we don't have a cache entry for this name until
> -         // we're done with all PHIs.
> -         bitmap_clear_bit (m_has_cache_entry, v);
> -       }
> +       m_cache.set_global_range (name, r);
>      }
> -  bitmap_ior_into (m_has_cache_entry, phi_set);
>  }
>
> which I think is not correct - if we have
>
>   # _1 = PHI <..., _2>
>   # _2 = PHI <..., _1>
>
> then their effects are supposed to be executed in parallel, that is,
> both PHI argument _2 and _1 are supposed to see the "old" version.
> The previous code tried to make sure the range of the new _1 doesn't
> get seen when processing the argument _1 in the definition of _2.

Yes, the effects should appear in parallel, but ssa_range_in_phi(),
which is the only thing range_defined_in_block() does for PHIs, is
guaranteed not to do any additional cache lookups.  The comment there
should be adjusted to make this clear:

  // Since PHIs are calculated in parallel at the beginning of the
  // block, we must be careful to never save anything to the cache here.
  // It is the caller's responsibility to adjust the cache.  Also,
  // calculating the PHI's range must not trigger additional lookups.

We should instead say: "we must be careful to never set or access the
cache here"...

This was the original intent, but a subtle access to the cache crept in
here:

  // Try to fold the phi exclusively with global or cached values.
  // This will get things like PHI <5(99), 6(88)>.  We do this by
  // calling range_of_expr with no context.
  unsigned nargs = gimple_phi_num_args (phi);
  Value_Range arg_range (TREE_TYPE (name));
  r.set_undefined ();
  for (size_t i = 0; i < nargs; ++i)
    {
      tree arg = gimple_phi_arg_def (phi, i);
      if (range_of_expr (arg_range, arg, /*stmt=*/NULL))

This range_of_expr call will indeed access the cache incorrectly, but
Andrew fixed that here:

@@ -264,7 +236,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
   for (size_t i = 0; i < nargs; ++i)
     {
       tree arg = gimple_phi_arg_def (phi, i);
-      if (range_of_expr (arg_range, arg, /*stmt=*/NULL))
+      if (m_ranger.range_of_expr (arg_range, arg, /*stmt=*/NULL))
         r.union_ (arg_range);
       else
         {

...thus ensuring that function never uses the cache.  All the lookups
are done with the global ranger, either at the path entry or globally
as above (with stmt=NULL).

I believe the switch from range_of_expr to m_ranger.range_of_expr is
safe, as the original code was added to handle silly things like
PHI <5(99), 6(88)>, which shouldn't need path-aware ranges.
As you've found out, the update to the cache in this case was not
obvious at all.  Perhaps it should also be commented: "It is safe to
set the cache here, as range_defined_in_block for PHIs
(ssa_range_in_phi) is guaranteed not to do any cache lookups."

> The new version drops this, possibly resulting in wrong-code.
>
> While I think it's appropriate to sort out compile-time issues like
> this during stage4, at least the above makes me think it should be
> deferred to next stage1.

I defer to the release managers as to whether this is safe in light of
my explanation above :).

Aldy