Date: Mon, 11 Sep 2023 18:47:44 -0600
Subject: Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.
To: Manolis Tsamis, gcc-patches@gcc.gnu.org
Cc: Philipp Tomsich, Vineet Gupta, Richard Biener
References: <20230909084652.2655745-1-manolis.tsamis@vrull.eu>
From: Jeff Law
In-Reply-To: <20230909084652.2655745-1-manolis.tsamis@vrull.eu>

On 9/9/23 02:46, Manolis Tsamis wrote:
> This is a new RTL pass that tries to optimize memory offset calculations
> by moving them from add immediate instructions to the memory loads/stores.
> For example it can transform this:
>
>   addi t4,sp,16
>   add  t2,a6,t4
>   shl  t3,t2,1
>   ld   a2,0(t3)
>   addi a2,1
>   sd   a2,8(t2)
>
> into the following (one instruction less):
>
>   add  t2,a6,sp
>   shl  t3,t2,1
>   ld   a2,32(t3)
>   addi a2,1
>   sd   a2,24(t2)
>
> Although there are places where this is done already, this pass is more
> powerful and can handle the more difficult cases that are currently not
> optimized.  Also, it runs late enough and can optimize away unnecessary
> stack pointer calculations.
>
> gcc/ChangeLog:
>
>     * Makefile.in: Add fold-mem-offsets.o.
>     * passes.def: Schedule a new pass.
>     * tree-pass.h (make_pass_fold_mem_offsets): Declare.
>     * common.opt: New options.
>     * doc/invoke.texi: Document new option.
>     * fold-mem-offsets.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.target/riscv/fold-mem-offsets-1.c: New test.
>     * gcc.target/riscv/fold-mem-offsets-2.c: New test.
>     * gcc.target/riscv/fold-mem-offsets-3.c: New test.
>
> Signed-off-by: Manolis Tsamis
> ---
>
> Changes in v5:
>   - Introduce new helper function fold_offsets_1.
>   - Fix bug because constants could be partially propagated
>     through instructions that weren't understood.
>   - Introduce helper class fold_mem_info that stores f-m-o
>     info for an instruction.
>   - Calculate fold_offsets only once with do_fold_info_calculation.
>   - Fix correctness issue by introducing compute_validity_closure.
>   - Propagate in more cases for PLUS/MINUS with constant.
>
> Changes in v4:
>   - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
>   - Remove fold_mem_offsets_driver and enum fold_mem_phase.
>   - Call recog when patching offsets in do_commit_offset.
>   - Restore INSN_CODE after modifying insn in do_check_validity.
>
> Changes in v3:
>   - Added propagation for more codes:
>     sub, neg, mul.
>   - Added folding / elimination for sub and
>     const int moves.
>   - For the validity check of the generated addresses
>     also test memory_address_addr_space_p.
>   - Replaced GEN_INT with gen_int_mode.
>   - Replaced some bitmap_head with auto_bitmap.
>   - Refactor each phase into own function for readability.
>   - Add dump details.
>   - Replace rtx iteration with reg_mentioned_p.
>   - Return early for codes that we can't propagate through.
>
> Changes in v2:
>   - Made the pass target-independent instead of RISC-V specific.
>   - Fixed a number of bugs.
>   - Add code to handle more ADD patterns as found
>     in other targets (x86, aarch64).
>   - Improved naming and comments.
>   - Fixed bitmap memory leak.

> +
> +/* Get the single reaching definition of an instruction inside a BB.
> +   The definition is desired for REG used in INSN.
> +   Return the definition insn or NULL if there's no definition with
> +   the desired criteria.  */
> +static rtx_insn*
> +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> +{
> +  df_ref use;
> +  struct df_link *ref_chain, *ref_link;
> +
> +  FOR_EACH_INSN_USE (use, insn)
> +    {
> +      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> +        return NULL;
> +      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> +        break;
> +    }
> +
> +  if (!use)
> +    return NULL;
> +
> +  ref_chain = DF_REF_CHAIN (use);

So what if there are two uses of REG in INSN?  I don't think it'd be
common at all, but probably better safe and reject than sorry, right?
Or is that case filtered out earlier?

> +
> +  rtx_insn* def = DF_REF_INSN (ref_chain->ref);

Formatting nit.  The '*' should be next to the variable, not the type.

> +
> +
> +static HOST_WIDE_INT
> +fold_offsets (rtx_insn* insn, rtx reg, bool analyze, bitmap foldable_insns);
> +
> +/* Helper function for fold_offsets.
> +
> +   If DO_RECURSION is false and ANALYZE is true this function returns true iff
> +   it understands the structure of INSN and knows how to propagate constants
> +   through it.  In this case OFFSET_OUT and FOLDABLE_INSNS are unused.
> +
> +   If DO_RECURSION is true then it also calls fold_offsets for each recognised
> +   part of INSN with the appropriate arguments.
> +
> +   If DO_RECURSION is true and ANALYZE is false then the offset that would
> +   result from folding is computed and is returned through the pointer OFFSET_OUT.
> +   The instructions that can be folded are recorded in FOLDABLE_INSNS.
> +*/
> +static bool fold_offsets_1 (rtx_insn* insn, bool analyze, bool do_recursion,
> +                            HOST_WIDE_INT *offset_out, bitmap foldable_insns)

Nit.  Linkage and return type on separate line.  That makes the
function name start at the beginning of a line.

> +
> +/* Test if INSN is a memory load / store that can have an offset folded to it.
> +   Return true iff INSN is such an instruction and return through MEM_OUT,
> +   REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that is
> +   used as a base address and the offset accordingly.
> +   All of the out pointers may be NULL in which case they will be ignored.  */
> +bool
> +get_fold_mem_root (rtx_insn* insn, rtx *mem_out, rtx *reg_out,
> +                   HOST_WIDE_INT *offset_out)
> +{
> +  rtx set = single_set (insn);
> +  rtx mem = NULL_RTX;
> +
> +  if (set != NULL_RTX)
> +    {
> +      rtx src = SET_SRC (set);
> +      rtx dest = SET_DEST (set);
> +
> +      /* Don't fold when we have unspec / volatile.  */
> +      if (GET_CODE (src) == UNSPEC
> +          || GET_CODE (src) == UNSPEC_VOLATILE
> +          || GET_CODE (dest) == UNSPEC
> +          || GET_CODE (dest) == UNSPEC_VOLATILE)
> +        return false;
> +
> +      if (MEM_P (src))
> +        mem = src;
> +      else if (MEM_P (dest))
> +        mem = dest;
> +      else if ((GET_CODE (src) == SIGN_EXTEND
> +                || GET_CODE (src) == ZERO_EXTEND)
> +               && MEM_P (XEXP (src, 0)))

Note some architectures allow both a source and destination memory.
It looks like your code will prefer the source operand in that case.
That's fine, just pointing it out.

> +
> +static bool
> +compute_validity_closure (fold_info_map *fold_info)
> +{
> +  /* Let's say we have an arbitrary chain of foldable instructions xN = xN + C
> +     and memory operations rN that use xN as shown below.  If folding x1 in r1
> +     turns out to be invalid for whatever reason then it's also invalid to fold
> +     any of the other xN into any rN.  That means that we need the transitive
> +     closure of validity to determine whether we can fold a xN instruction.
> +
> +   +--------------+     +-------------------+     +-------------------+
> +   | r1 = mem[x1] |     | r2 = mem[x1 + x2] |     | r3 = mem[x2 + x3] | ...
> +   +--------------+     +-------------------+     +-------------------+
> +          ^               ^        ^               ^        ^
> +          |              /         |              /         |   ...
> +          |             /          |             /          |
> +   +-------------+     /    +-------------+     /    +-------------+
> +   | x1 = x1 + 1 |-----+    | x2 = x2 + 1 |-----+    | x3 = x3 + 1 |--- ...
> +   +-------------+          +-------------+          +-------------+
> +          ^                        ^                        ^
> +          |                        |                        |
> +         ...                      ...                      ...
> +  */
> +
> +  int max_iters = 5;
> +  for (int i = 0; i < max_iters; i++)
> +    {
> +      bool made_changes = false;
> +      for (fold_info_map::iterator iter = fold_info->begin ();
> +           iter != fold_info->end (); ++iter)
> +        {
> +          fold_mem_info *info = (*iter).second;
> +          if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
> +            made_changes |= bitmap_ior_into (&cannot_fold_insns,
> +                                             info->fold_insns);
> +        }
> +
> +      if (!made_changes)
> +        return true;
> +    }
> +
> +  return false;

So how was the magic value of "5" determined here?  In general we try
not to have magic #s like that and instead find a better way to
control iterations, falling back to a PARAM when all else fails.

> +}

> +
> +  machine_mode mode = GET_MODE (XEXP (mem, 0));
> +  XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
> +  INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
> +  df_insn_rescan (insn);

Don't we need to check if NEW_OFFSET is zero and if so generate simple
register indirect addressing rather than indirect + displacement?

This looks *so* close ;-)  But I think we need a few questions
answered and a few minor adjustments.

On a positive note, m68k passed with this version of the f-m-o patch :-)

Jeff
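
P.S.  On the NEW_OFFSET == 0 question above, here's roughly what I had
in mind -- purely an untested sketch against the hunk quoted above,
reusing the variables from your patch:

  /* Untested sketch: when the folded offset ends up being zero, use
     plain register indirect addressing instead of emitting
     (plus (reg) (const_int 0)).  */
  machine_mode mode = GET_MODE (XEXP (mem, 0));
  if (new_offset == 0)
    XEXP (mem, 0) = reg;
  else
    XEXP (mem, 0) = gen_rtx_PLUS (mode, reg,
                                  gen_int_mode (new_offset, mode));
  INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
  df_insn_rescan (insn);

Either way recog and df_insn_rescan still need to run afterwards so
the insn code and the DF information stay consistent.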