From: Craig Blackmore
Subject: [PATCH] RISC-V: Allow more load/stores to be compressed
To: gcc-patches@gcc.gnu.org
Cc: jimw@sifive.com, Ofer Shinaar, Nidal Faour
Date: Thu, 12 Sep 2019 16:19:00 -0000

This patch aims to allow more load/store instructions to be compressed by
replacing a load/store of 'base register + large offset' with a new
load/store of 'new base + small offset'.  If the new base gets stored in a
compressed register, then the new load/store can be compressed.  Since there
is an overhead in creating the new base, this change is only attempted when
'base register' is referenced in at least 4 load/stores in a basic block.

The optimization is implemented in a new RISC-V specific pass called
shorten_memrefs, which is enabled for RVC targets.  It has been developed
for the 32-bit lw/sw instructions but could also be extended to 64-bit
ld/sd in future.

The patch saves 164 bytes (0.3%) on a proprietary application compiled for
rv32imc bare metal with -Os (59286 bytes with the patch, compared to 59450
bytes without).
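To make the rewrite concrete, here is a minimal standalone C sketch
(illustrative only, not part of the patch; the constant 124 is the largest
offset a compressed lw/sw can encode, since the c.lw/c.sw immediate is
5 bits scaled by 4) of the offset split that riscv_legitimize_address
performs in the hunks below:

  #include <assert.h>
  #include <stdio.h>

  int main (void)
  {
    /* A load at BASE + 2000 cannot be compressed: c.lw only accepts
       word-aligned offsets in [0, 124].  */
    long offset = 2000;

    /* Split the offset as the patch does: fold the excess into a new
       base register and keep a small remainder on the load/store.  */
    long high = offset & ~124;  /* 1920: added to BASE once.  */
    long low = offset & 124;    /* 80: fits a compressed load/store.  */

    assert (high + low == offset);
    printf ("base+%ld -> (base+%ld)+%ld\n", offset, high, low);
    return 0;
  }

Each load/store rewritten this way shrinks from 4 to 2 bytes if register
allocation places the new base in a compressed register (x8-x15), which is
why the one-time cost of materializing the new base pays off once a base
register has four or more such references.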
On the Embench benchmark suite (https://www.embench.org/) we see code size
reductions of up to 18 bytes (0.7%) and only two cases where code size is
increased slightly, by 2 bytes each:

Embench results (.text size in bytes, excluding .rodata)

Benchmark        Without patch  With patch  Diff
aha-mont64                1052        1052     0
crc32                      232         232     0
cubic                     2446        2448     2
edn                       1454        1450    -4
huffbench                 1642        1642     0
matmult-int                420         420     0
minver                    1056        1056     0
nbody                      714         714     0
nettle-aes                2888        2884    -4
nettle-sha256             5566        5564    -2
nsichneu                 15052       15052     0
picojpeg                  8078        8078     0
qrduino                   6140        6140     0
sglib-combined            2444        2444     0
slre                      2438        2420   -18
st                         880         880     0
statemate                 3842        3842     0
ud                         702         702     0
wikisort                  4278        4280     2
-------------------------------------------------
Total                    61324       61300   -24

The patch has been tested on the following bare metal targets using QEMU
and there were no regressions:

  rv32i
  rv32iac
  rv32im
  rv32imac
  rv32imafc
  rv64imac
  rv64imafdc

We noticed that sched2 undoes some of the address changes made by this
optimization and consequently increases code size, so this patch also adds
a check in sched-deps.c to avoid changes that are expected to increase code
size when not optimizing for speed.

Since this change touches target-independent code, the patch has been
bootstrapped and tested on x86 with no regressions.

gcc/ChangeLog

	* config/riscv/riscv.c (tree-pass.h): New include.
	(cfg.h): Likewise.
	(context.h): Likewise.
	(riscv_compressed_reg_p): New function.
	(riscv_compressed_lw_address_p): Likewise.
	(riscv_legitimize_address): Attempt to convert base + large_offset
	to compressible new_base + small_offset.
	(riscv_address_cost): Make anticipated compressed load/stores
	cheaper for code size than uncompressed load/stores.
	(class pass_shorten_memrefs): New pass.
	(pass_shorten_memrefs::execute): Likewise.
	(make_pass_shorten_memrefs): Likewise.
	(riscv_option_override): Register shorten_memrefs pass for
	TARGET_RVC.
	(riscv_register_priority): Move compressed register check to
	riscv_compressed_reg_p.
	* sched-deps.c (attempt_change): When optimizing for code size,
	don't make the change if it increases code size.

---
 gcc/config/riscv/riscv.c | 179 +++++++++++++++++++++++++++++++++++++++++++--
 gcc/sched-deps.c         |  10 +++
 2 files changed, 183 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 39bf87a..e510314 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -55,6 +55,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "builtins.h"
 #include "predict.h"
+#include "tree-pass.h"
+#include "cfg.h"
+#include "context.h"
 
 /* True if X is an UNSPEC wrapper around a SYMBOL_REF or LABEL_REF.  */
 #define UNSPEC_ADDRESS_P(X)	\
@@ -848,6 +851,44 @@ riscv_legitimate_address_p (machine_mode mode, rtx x, bool strict_p)
   return riscv_classify_address (&addr, x, mode, strict_p);
 }
 
+/* Return true if hard reg REGNO can be used in compressed instructions.  */
+
+static bool
+riscv_compressed_reg_p (int regno)
+{
+  /* x8-x15/f8-f15 are compressible registers.  */
+  return (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
+	  || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)));
+}
+
+/* Return true if load/store from/to address X can be compressed.  */
+
+static bool
+riscv_compressed_lw_address_p (rtx x)
+{
+  struct riscv_address_info addr;
+  bool result = riscv_classify_address (&addr, x, GET_MODE (x),
+					reload_completed);
+
+  /* Before reload, assuming all load/stores of valid addresses get
+     compressed gives better code size than checking if the address is
+     reg + small_offset early on.  */
+  if (result && !reload_completed)
+    return true;
+
+  /* Return false if address is not compressed_reg + small_offset.  */
+  if (!result
+      || addr.type != ADDRESS_REG
+      || (!riscv_compressed_reg_p (REGNO (addr.reg))
+	  && addr.reg != stack_pointer_rtx)
+      || !CONST_INT_P (addr.offset)
+      || (INTVAL (addr.offset) & 3) != 0
+      || !IN_RANGE (INTVAL (addr.offset), 0, 124))
+    return false;
+
+  return result;
+}
+
 /* Return the number of instructions needed to load or store a value
    of mode MODE at address X.  Return 0 if X isn't valid for MODE.
    Assume that multiword moves may need to be split into word moves
@@ -1318,7 +1359,9 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
   if (riscv_split_symbol (NULL, x, mode, &addr))
     return riscv_force_address (addr, mode);
 
-  /* Handle BASE + OFFSET using riscv_add_offset.  */
+  /* When optimizing for size, try to convert BASE + LARGE_OFFSET into
+     NEW_BASE + SMALL_OFFSET to allow a possible compressed load/store;
+     otherwise, handle BASE + OFFSET using riscv_add_offset.  */
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))
       && INTVAL (XEXP (x, 1)) != 0)
     {
@@ -1327,7 +1370,24 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
 
       if (!riscv_valid_base_register_p (base, mode, false))
	base = copy_to_mode_reg (Pmode, base);
-      addr = riscv_add_offset (NULL, base, offset);
+      if (optimize_function_for_size_p (cfun)
+	  && (strcmp (current_pass->name, "shorten_memrefs") == 0)
+	  && mode == SImode
+	  && (offset & 3) == 0
+	  && !IN_RANGE (offset, 0, 124))
+	{
+	  rtx high;
+
+	  /* Leave OFFSET as a 7-bit offset and put the excess in HIGH.  */
+	  high = GEN_INT (offset & ~124);
+	  offset &= 124;
+	  if (!SMALL_OPERAND (INTVAL (high)))
+	    high = force_reg (Pmode, high);
+	  base = force_reg (Pmode, gen_rtx_PLUS (Pmode, high, base));
+	  addr = plus_constant (Pmode, base, offset);
+	}
+      else
+	addr = riscv_add_offset (NULL, base, offset);
       return riscv_force_address (addr, mode);
     }
 
@@ -1812,7 +1872,10 @@ riscv_address_cost (rtx addr, machine_mode mode,
		   addr_space_t as ATTRIBUTE_UNUSED,
		   bool speed ATTRIBUTE_UNUSED)
 {
-  return riscv_address_insns (addr, mode, false);
+  if (!speed && mode == SImode
+      && riscv_compressed_lw_address_p (addr))
+    return 1;
+  return !speed + riscv_address_insns (addr, mode, false);
 }
 
 /* Return one word of double-word value OP.  HIGH_P is true to select the
@@ -4541,6 +4604,106 @@ riscv_init_machine_status (void)
   return ggc_cleared_alloc<machine_function> ();
 }
 
+namespace {
+
+const pass_data pass_data_shorten_memrefs =
+{
+  RTL_PASS, /* type */
+  "shorten_memrefs", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_shorten_memrefs : public rtl_opt_pass
+{
+public:
+  pass_shorten_memrefs (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_shorten_memrefs, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return optimize > 0; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_shorten_memrefs
+
+/* Try to make more use of compressed load and store instructions by
+   replacing a load/store at address BASE + LARGE_OFFSET with a new
+   load/store at address NEW_BASE + SMALL_OFFSET.  If NEW_BASE is stored in
+   a compressed register, the load/store can be compressed.  Since creating
+   NEW_BASE incurs an overhead, the change is only attempted when BASE is
+   referenced by at least four load/stores in the same basic block.  */
+
+unsigned int
+pass_shorten_memrefs::execute (function *fn)
+{
+  typedef int_hash <HOST_WIDE_INT, -1> regno_hash;
+  typedef hash_map <regno_hash, int> regno_map;
+
+  basic_block bb;
+  rtx_insn *insn;
+
+  regstat_init_n_sets_and_refs ();
+
+  FOR_ALL_BB_FN (bb, fn)
+    {
+      regno_map *m = hash_map <regno_hash, int>::create_ggc (10);
+      for (int pass = 0; !optimize_bb_for_speed_p (bb) && pass < 2; pass++)
+	FOR_BB_INSNS (bb, insn)
+	  {
+	    if (!NONJUMP_INSN_P (insn))
+	      continue;
+	    rtx pat = PATTERN (insn);
+	    if (GET_CODE (pat) != SET)
+	      continue;
+	    start_sequence ();
+	    for (int i = 0; i < 2; i++)
+	      {
+		rtx mem = XEXP (pat, i);
+		if (MEM_P (mem) && GET_MODE (mem) == SImode)
+		  {
+		    rtx addr = XEXP (mem, 0);
+		    if (GET_CODE (addr) != PLUS)
+		      continue;
+		    if (!REG_P (XEXP (addr, 0)))
+		      continue;
+		    HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+		    if (REG_N_REFS (regno) < 4)
+		      continue;
+		    if (pass == 0)
+		      m->get_or_insert (regno)++;
+		    else if (m->get_or_insert (regno) > 3)
+		      {
+			addr = riscv_legitimize_address (addr, addr,
+							 GET_MODE (mem));
+			XEXP (pat, i) = replace_equiv_address (mem, addr);
+			df_insn_rescan (insn);
+		      }
+		  }
+	      }
+	    rtx_insn *seq = get_insns ();
+	    end_sequence ();
+	    emit_insn_before (seq, insn);
+	  }
+    }
+
+  regstat_free_n_sets_and_refs ();
+
+  return 0;
+}
+
+} // anon namespace
+
+opt_pass *
+make_pass_shorten_memrefs (gcc::context *ctxt)
+{
+  return new pass_shorten_memrefs (ctxt);
+}
+
 /* Implement TARGET_OPTION_OVERRIDE.  */
 
 static void
@@ -4637,6 +4800,10 @@ riscv_option_override (void)
     error ("%<-mriscv-attribute%> RISC-V ELF attribute requires GNU as 2.32"
	   " [%<-mriscv-attribute%>]");
 #endif
+
+  if (TARGET_RVC)
+    register_pass (make_pass_shorten_memrefs (g),
+		   PASS_POS_INSERT_AFTER, "store_motion", 1);
 }
 
 /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
@@ -4676,9 +4843,9 @@ riscv_conditional_register_usage (void)
 static int
 riscv_register_priority (int regno)
 {
-  /* Favor x8-x15/f8-f15 to improve the odds of RVC instruction selection.  */
-  if (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
-		     || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)))
+  /* Favor compressed registers to improve the odds of RVC instruction
+     selection.  */
+  if (riscv_compressed_reg_p (regno))
     return 1;
 
   return 0;
diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index 52db3cc..92a0893 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "sched-int.h"
 #include "params.h"
 #include "cselib.h"
+#include "predict.h"
 
 #ifdef INSN_SCHEDULING
 
@@ -4707,6 +4708,15 @@ attempt_change (struct mem_inc_info *mii, rtx new_addr)
   rtx mem = *mii->mem_loc;
   rtx new_mem;
 
+  /* When not optimizing for speed, avoid changes that are expected to make
+     code size larger.  */
+  addr_space_t as = MEM_ADDR_SPACE (mem);
+  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (mii->mem_insn));
+  int old_cost = address_cost (XEXP (mem, 0), GET_MODE (mem), as, speed);
+  int new_cost = address_cost (new_addr, GET_MODE (mem), as, speed);
+  if (new_cost > old_cost && !speed)
+    return NULL_RTX;
+
   /* Jump through a lot of hoops to keep the attributes up to date.  We
      do not want to call one of the change address variants that take
      an offset even though we know the offset in many cases.  These