From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp1.axis.com (smtp1.axis.com [195.60.68.17]) by sourceware.org (Postfix) with ESMTPS id 0214B3858C5F for ; Thu, 11 May 2023 15:05:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0214B3858C5F Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=axis.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=axis.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=axis.com; q=dns/txt; s=axis-central1; t=1683817542; x=1715353542; h=from:to:cc:in-reply-to:subject:mime-version: content-transfer-encoding:references:message-id:date; bh=2UARz1olxYJlPrUNiULnHbWIpmoF1JhjuqPm6lVtZvw=; b=EVZ7GBOC7VZxc5x/R9b0e2df3kdVh2s237MhDxJz/G/1YLZ8OQXyCAAH Dkhvc76aQX50cQshF/ywU7JcBc7vmtWBz68Msr5SQ4K37eEGGG+SCUcBV DRf8+xJ9LFSNCFE27KwkTphbd5jdbUlx1Zyjo/YDUhgcgOQ3LttcsEWXN np5TgPi4ZMlihYot2q/8aKzIL5YczXtGe5/kOOaQ6mqyB2LBhOiipOPFC J2L2LCYGhP0FOA/8Hc52X0pdzE49+uXOTj7w1bylV0zfq5qVx4B62LfqZ h28VajqdC9X+fBNn7QAYiUwluqF6DpaBql0QusTDpOlzV6tLHmZ8EjYzY g==; From: Hans-Peter Nilsson To: Roger Sayle CC: , , In-Reply-To: <009601d97c85$de708170$9b518450$@nextmovesoftware.com> (roger@nextmovesoftware.com) Subject: Re: [committed] Convert xstormy16 to LRA MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8BIT References: <009601d97c85$de708170$9b518450$@nextmovesoftware.com> Message-ID: <20230511150540.9606F20420@pchp3.se.axis.com> Date: Thu, 11 May 2023 17:05:40 +0200 X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,SPF_HELO_PASS,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: > From: "Roger Sayle" > Date: Tue, 2 May 2023 00:37:14 +0100 > Jeff Law wrote: > > This patch converts the xstormy16 patch to LRA. It introduces a code > > quality regression in the shiftsi testcase, but it also fixes numerous > > aborts/errors. IMHO it's a good tradeoff. > > I've investigated the shiftsi regression on xstormy16 and the underlying > cause > appears to be an interaction between lower-subreg's "subreg3" pass and the > new LRA. Previously, reload was not phased by the "clobbers" that are > introduced by the decompose_multiword_subregs function, but they appear > to interfere with LRA's register assignments. > > combine's make_extra_copies introduces a new pseudo-to-pseudo move, > but when subreg3 inserts a naked clobber between the original and the > new move, LRA is recombine theses pseudos back to the same allocno. > > The shiftsi.cc regression on xstormy16 is fixed by adding > -fno-split-wide-types. > In fact, if all the regression tests pass, I'd suggest that > flag_split_wide-types = false > should be the default on xstormy16 now that we've moved to LRA. And if this > works for xstormy16, it might be useful to other targets for the LRA > transition; > it's a difference in behaviour between reload and LRA that could potentially > affect multiple targets. > > For reference, xstormy16 has a post-reload define_insn_and_split for movsi > (i.e. a multi-word move). If this insn was split during split1 (i.e. before > subreg3) > there wouldn't be a problem (no clobber), but alas the target's > xstormy16_split_move > function has several asserts insisting this only get called when > reload_completed. > > I hope this is useful. > Cheers, > Roger Yes, very interesting. Thank you for sharing this. I've seen regressions with LRA for CRIS too, for "double-register-sized" types, which for CRIS, a 32-bit target, translates to 64-bit types (DFmode and DImode), and where LRA does a much worse job than reload; spills a lot more often to stack, even after trying every register-allocation-related hook I found (and also an LRA patch which helped only by a fraction, but regressed results on x86_64-linux, so let's quickly forget it again). No fix or nicely stated bug entry yet, but at least a different observation: Coremark for cris-elf built with -O2 -march=v10, when going from reload to LRA is slightly faster but a bit bigger (for example before/after Jeffs r14-383-gfaf8bea79b6256, 5090593 to 5090567 cycles and 48887 to 48901 bytes), a relative observation which has not changed much since February when I started working on an LRA transition for CRIS. But, the case for code with heavy use of "double-register- sized" types is much worse; up to several percent slower. My favorite sharable example is gcc/testsuite/gcc.c-torture/execute/arith-rand-ll.c (with a few unimportant local tweaks not suitable for upstreaming but which I'm happy to share with anyone asking) which around that commit goes from 1295021 to 1317531 cycles (101.74%) and one percent larger; 4008 to 4048 bytes. Your suggestion to default to -fno-split-wide-types seemed too good to be true, and though worth a try, unfortunately it was. I'm seeing *horrible* regressions for double-register codes with the patch below on top of LRA. Coremark numbers suffer too (different baseline here than above; closer to today's sources) from 5078989 to 5081968 cycles and from 48537 to 50145 bytes. But, arith-rand-ll suffers much more: from 1317530 to 2182080 cycles (yes, 165.62%) and from 4044 to 4174 bytes. (With reload, it's bad too, but "only" regressing 143.67% by speed.) Next, I'll turn around completely, and try defaulting to -fsplit-wide-types-early, which sounds more promising. :) I don't like throwing defaults around randomly, but trying out a promising idea this way is easy. So because of the numbers above, this will never be committed, just passed for reference. I believe this is the correct way to default to -fno-split-wide-types: -- >8 -- [PATCH] CRIS: Default to -fno-split-wide-types * common/config/cris/cris-common.cc (cris_option_optimization_table): New. Default to -fno-split-wide-types. (TARGET_OPTION_OPTIMIZATION_TABLE): Define. --- gcc/common/config/cris/cris-common.cc | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/gcc/common/config/cris/cris-common.cc b/gcc/common/config/cris/cris-common.cc index b08d6014102d..cf00c1414651 100644 --- a/gcc/common/config/cris/cris-common.cc +++ b/gcc/common/config/cris/cris-common.cc @@ -26,6 +26,14 @@ along with GCC; see the file COPYING3. If not see #include "opts.h" #include "flags.h" +/* Implement TARGET_OPTION_OPTIMIZATION_TABLE. */ + +static const struct default_options cris_option_optimization_table[] = + { + { OPT_LEVELS_1_PLUS, OPT_fsplit_wide_types, NULL, 0 }, + { OPT_LEVELS_NONE, 0, NULL, 0 } + }; + /* TARGET_HANDLE_OPTION worker. We just store the values into local variables here. Checks for correct semantics are in cris_option_override. */ @@ -90,5 +98,7 @@ cris_handle_option (struct gcc_options *opts, #define TARGET_DEFAULT_TARGET_FLAGS (TARGET_DEFAULT | CRIS_SUBTARGET_DEFAULT) #undef TARGET_HANDLE_OPTION #define TARGET_HANDLE_OPTION cris_handle_option +#undef TARGET_OPTION_OPTIMIZATION_TABLE +#define TARGET_OPTION_OPTIMIZATION_TABLE cris_option_optimization_table struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER; -- 2.30.2