Date: Mon, 11 Sep 2023 18:47:44 -0600
Subject: Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.
To: Manolis Tsamis, gcc-patches@gcc.gnu.org
Cc: Philipp Tomsich, Vineet Gupta, Richard Biener
References: <20230909084652.2655745-1-manolis.tsamis@vrull.eu>
From: Jeff Law
In-Reply-To: <20230909084652.2655745-1-manolis.tsamis@vrull.eu>

On 9/9/23 02:46, Manolis Tsamis wrote:
> This is a new RTL pass that tries to optimize memory offset calculations
> by moving them from add immediate instructions to the memory loads/stores.
> For example it can transform this:
>
>   addi t4,sp,16
>   add  t2,a6,t4
>   shl  t3,t2,1
>   ld   a2,0(t3)
>   addi a2,1
>   sd   a2,8(t2)
>
> into the following (one instruction less):
>
>   add  t2,a6,sp
>   shl  t3,t2,1
>   ld   a2,32(t3)
>   addi a2,1
>   sd   a2,24(t2)
>
> Although there are places where this is done already, this pass is more
> powerful and can handle the more difficult cases that are currently not
> optimized.  Also, it runs late enough and can optimize away unnecessary
> stack pointer calculations.
>
> gcc/ChangeLog:
>
>     * Makefile.in: Add fold-mem-offsets.o.
>     * passes.def: Schedule a new pass.
>     * tree-pass.h (make_pass_fold_mem_offsets): Declare.
>     * common.opt: New options.
>     * doc/invoke.texi: Document new option.
>     * fold-mem-offsets.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.target/riscv/fold-mem-offsets-1.c: New test.
>     * gcc.target/riscv/fold-mem-offsets-2.c: New test.
>     * gcc.target/riscv/fold-mem-offsets-3.c: New test.
>
> Signed-off-by: Manolis Tsamis
> ---
>
> Changes in v5:
>   - Introduce new helper function fold_offsets_1.
>   - Fix bug because constants could be partially propagated
>     through instructions that weren't understood.
>   - Introduce helper class fold_mem_info that stores f-m-o
>     info for an instruction.
>   - Calculate fold_offsets only once with do_fold_info_calculation.
>   - Fix correctness issue by introducing compute_validity_closure.
>   - Propagate in more cases for PLUS/MINUS with constant.
>
> Changes in v4:
>   - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
>   - Remove fold_mem_offsets_driver and enum fold_mem_phase.
>   - Call recog when patching offsets in do_commit_offset.
>   - Restore INSN_CODE after modifying insn in do_check_validity.
>
> Changes in v3:
>   - Added propagation for more codes:
>     sub, neg, mul.
>   - Added folding / elimination for sub and
>     const int moves.
>   - For the validity check of the generated addresses
>     also test memory_address_addr_space_p.
>   - Replaced GEN_INT with gen_int_mode.
>   - Replaced some bitmap_head with auto_bitmap.
>   - Refactor each phase into own function for readability.
>   - Add dump details.
>   - Replace rtx iteration with reg_mentioned_p.
>   - Return early for codes that we can't propagate through.
>
> Changes in v2:
>   - Made the pass target-independent instead of RISC-V specific.
>   - Fixed a number of bugs.
>   - Add code to handle more ADD patterns as found
>     in other targets (x86, aarch64).
>   - Improved naming and comments.
>   - Fixed bitmap memory leak.

> +
> +/* Get the single reaching definition of an instruction inside a BB.
> +   The definition is desired for REG used in INSN.
> +   Return the definition insn or NULL if there's no definition with
> +   the desired criteria.  */
> +static rtx_insn*
> +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> +{
> +  df_ref use;
> +  struct df_link *ref_chain, *ref_link;
> +
> +  FOR_EACH_INSN_USE (use, insn)
> +    {
> +      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> +        return NULL;
> +      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> +        break;
> +    }
> +
> +  if (!use)
> +    return NULL;
> +
> +  ref_chain = DF_REF_CHAIN (use);

So what if there are two uses of REG in INSN?  I don't think it'd be
common at all, but probably better safe and reject than sorry, right?
Or is that case filtered out earlier?

> +
> +  rtx_insn* def = DF_REF_INSN (ref_chain->ref);

Formatting nit.  The '*' should be next to the variable, not the type.

> +
> +
> +static HOST_WIDE_INT
> +fold_offsets (rtx_insn* insn, rtx reg, bool analyze, bitmap foldable_insns);
> +
> +/* Helper function for fold_offsets.
> +
> +   If DO_RECURSION is false and ANALYZE is true this function returns true iff
> +   it understands the structure of INSN and knows how to propagate constants
> +   through it.  In this case OFFSET_OUT and FOLDABLE_INSNS are unused.
> +
> +   If DO_RECURSION is true then it also calls fold_offsets for each recognised
> +   part of INSN with the appropriate arguments.
> +
> +   If DO_RECURSION is true and ANALYZE is false then the offset that would
> +   result from folding is computed and is returned through the pointer OFFSET_OUT.
> +   The instructions that can be folded are recorded in FOLDABLE_INSNS.
> +*/
> +static bool fold_offsets_1 (rtx_insn* insn, bool analyze, bool do_recursion,
> +                            HOST_WIDE_INT *offset_out, bitmap foldable_insns)

Nit.  Linkage and return type on separate line.  That makes the
function name start at the beginning of a line.

> +
> +/* Test if INSN is a memory load / store that can have an offset folded to it.
> +   Return true iff INSN is such an instruction and return through MEM_OUT,
> +   REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that is
> +   used as a base address and the offset accordingly.
> +   All of the out pointers may be NULL in which case they will be ignored.  */
> +bool
> +get_fold_mem_root (rtx_insn* insn, rtx *mem_out, rtx *reg_out,
> +                   HOST_WIDE_INT *offset_out)
> +{
> +  rtx set = single_set (insn);
> +  rtx mem = NULL_RTX;
> +
> +  if (set != NULL_RTX)
> +    {
> +      rtx src = SET_SRC (set);
> +      rtx dest = SET_DEST (set);
> +
> +      /* Don't fold when we have unspec / volatile.  */
> +      if (GET_CODE (src) == UNSPEC
> +          || GET_CODE (src) == UNSPEC_VOLATILE
> +          || GET_CODE (dest) == UNSPEC
> +          || GET_CODE (dest) == UNSPEC_VOLATILE)
> +        return false;
> +
> +      if (MEM_P (src))
> +        mem = src;
> +      else if (MEM_P (dest))
> +        mem = dest;
> +      else if ((GET_CODE (src) == SIGN_EXTEND
> +                || GET_CODE (src) == ZERO_EXTEND)
> +               && MEM_P (XEXP (src, 0)))

Note some architectures allow both a source and destination memory.
It looks like your code will prefer the source operand in that case.
That's fine, just pointing it out.

> +
> +static bool
> +compute_validity_closure (fold_info_map *fold_info)
> +{
> +  /* Let's say we have an arbitrary chain of foldable instructions xN = xN + C
> +     and memory operations rN that use xN as shown below.  If folding x1 in r1
> +     turns out to be invalid for whatever reason then it's also invalid to fold
> +     any of the other xN into any rN.  That means that we need the transitive
> +     closure of validity to determine whether we can fold a xN instruction.
> +
> +   +--------------+     +-------------------+     +-------------------+
> +   | r1 = mem[x1] |     | r2 = mem[x1 + x2] |     | r3 = mem[x2 + x3] | ...
> +   +--------------+     +-------------------+     +-------------------+
> +          ^               ^        ^               ^        ^
> +          |              /         |              /         |   ...
> +          |             /          |             /          |
> +   +-------------+     /    +-------------+     /    +-------------+
> +   | x1 = x1 + 1 |-----+    | x2 = x2 + 1 |-----+    | x3 = x3 + 1 |--- ...
> +   +-------------+          +-------------+          +-------------+
> +          ^                        ^                        ^
> +          |                        |                        |
> +         ...                      ...                      ...
> +  */
> +
> +  int max_iters = 5;
> +  for (int i = 0; i < max_iters; i++)
> +    {
> +      bool made_changes = false;
> +      for (fold_info_map::iterator iter = fold_info->begin ();
> +           iter != fold_info->end (); ++iter)
> +        {
> +          fold_mem_info *info = (*iter).second;
> +          if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
> +            made_changes |= bitmap_ior_into (&cannot_fold_insns,
> +                                             info->fold_insns);
> +        }
> +
> +      if (!made_changes)
> +        return true;
> +    }
> +
> +  return false;

So how was the magic value of "5" determined here?  In general we try
not to have magic #s like that and instead find a better way to
control iterations, falling back to a PARAM when all else fails.

> +}

> +
> +  machine_mode mode = GET_MODE (XEXP (mem, 0));
> +  XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
> +  INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
> +  df_insn_rescan (insn);

Don't we need to check if NEW_OFFSET is zero and if so generate simple
register indirect addressing rather than indirect + displacement?

This looks *so* close ;-)  But I think we need a few questions
answered and a few minor adjustments.

On a positive note, m68k passed with this version of the f-m-o patch :-)

Jeff
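
P.S.  On the NEW_OFFSET == 0 question above, here's roughly what I had
in mind -- purely an untested sketch against the hunk quoted above,
reusing the variables from your patch:

  /* Untested sketch: when the folded offset ends up being zero, use
     plain register indirect addressing instead of emitting
     (plus (reg) (const_int 0)).  */
  machine_mode mode = GET_MODE (XEXP (mem, 0));
  if (new_offset == 0)
    XEXP (mem, 0) = reg;
  else
    XEXP (mem, 0) = gen_rtx_PLUS (mode, reg,
                                  gen_int_mode (new_offset, mode));
  INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
  df_insn_rescan (insn);

Either way recog and df_insn_rescan still need to run afterwards so
the insn code and the DF information stay consistent.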