Date: Mon, 10 Jun 2024 08:26:07 +0200 (CEST)
From: Richard Biener
To: Jeff Law
cc: Manolis Tsamis, gcc-patches@gcc.gnu.org, Philipp Tomsich, Christoph Müllner, Jiangning Liu, Jakub Jelinek, Andrew Pinski
Subject: Re: [PATCH v2] Target-independent store forwarding avoidance.
Message-ID: <24051r73-r4pr-6018-5045-r53oqp934731@fhfr.qr>
References: <20240606101043.3682477-1-manolis.tsamis@vrull.eu>

On Fri, 7 Jun 2024, Jeff Law wrote:

>
>
> On 6/6/24 4:10 AM, Manolis Tsamis wrote:
> > This pass detects cases of expensive store forwarding and tries to avoid them
> > by reordering the stores and using suitable bit insertion sequences.
> > For example it can transform this:
> >
> >      strb    w2, [x1, 1]
> >      ldr     x0, [x1]      # Expensive store forwarding to larger load.
> >
> > To:
> >
> >      ldr     x0, [x1]
> >      strb    w2, [x1]
> >      bfi     x0, x2, 0, 8
> >
> > Assembly like this can appear with bitfields or type punning / unions.
> > On stress-ng when running the cpu-union microbenchmark the following speedups
> > have been observed.
> >
> >    Neoverse-N1:      +29.4%
> >    Intel Coffeelake: +13.1%
> >    AMD 5950X:        +17.5%
> >
> > gcc/ChangeLog:
> >
> > 	* Makefile.in: Add avoid-store-forwarding.o.
> > 	* common.opt: New option -favoid-store-forwarding.
> > 	* params.opt: New param store-forwarding-max-distance.
> > 	* doc/invoke.texi: Document new pass.
> > 	* doc/passes.texi: Document new pass.
> > 	* passes.def: Schedule a new pass.
> > 	* tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
> > 	* avoid-store-forwarding.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
> > 	* gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
> > 	* gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
> > 	* gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
> So this is getting a lot more interesting. I think the first time I looked at
> this it was more concerned with stores feeding something like a load-pair and
> avoiding the store forwarding penalty for that case. Am I mis-remembering, or
> did it get significantly more general?
>
> >
> >
> > +
> > +static unsigned int stats_sf_detected = 0;
> > +static unsigned int stats_sf_avoided = 0;
> > +
> > +static rtx
> > +get_load_mem (rtx expr)
> Needs a function comment. You should probably mention that EXPR must be a
> single_set in that comment.
> >
> > +
> > +  rtx dest;
> > +  if (eliminate_load)
> > +    dest = gen_reg_rtx (load_inner_mode);
> > +  else
> > +    dest = SET_DEST (load);
> > +
> > +  int move_to_front = -1;
> > +  int total_cost = 0;
> > +
> > +  /* Check if we can emit bit insert instructions for all forwarded stores. */
> > +  FOR_EACH_VEC_ELT (stores, i, it)
> > +    {
> > +      it->mov_reg = gen_reg_rtx (GET_MODE (it->store_mem));
> > +      rtx_insn *insns = NULL;
> > +
> > +      /* If we're eliminating the load then find the store with zero offset
> > +         and use it as the base register to avoid a bit insert. */
> > +      if (eliminate_load && it->offset == 0)
> So how often is this triggering? We have various codes in the gimple optimizers
> to detect store followed by a load from the same address and do the
> forwarding.
> If they're happening with any frequency that would be a good sign
> code in DOM and elsewhere isn't working well.
>
> The way these passes detect this case is to take store, flip the operands
> around (ie, it looks like a load) and enter that into the expression hash
> tables. After that standard redundancy elimination approaches will work.
> >
> > +        {
> > +          start_sequence ();
> > +
> > +          /* We can use a paradoxical subreg to force this to a wider mode, as
> > +             the only use will be inserting the bits (i.e., we don't care about
> > +             the value of the higher bits). */
> Which may be a good hint about the cases you're capturing -- if the
> modes/sizes differ that would make more sense since I don't think we're as
> likely to be capturing those cases.

Yeah, we handle stores from constants quite well and FRE can forward
stores from SSA names to smaller loads by inserting BIT_FIELD_REFs
but it does have some additional restrictions.

>
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 4e8967fd8ab..c769744d178 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -12657,6 +12657,15 @@ loop unrolling.
> >  This option is enabled by default at optimization levels @option{-O1},
> >  @option{-O2}, @option{-O3}, @option{-Os}.
> >
> > +@opindex favoid-store-forwarding
> > +@item -favoid-store-forwarding
> > +@itemx -fno-avoid-store-forwarding
> > +Many CPUs will stall for many cycles when a load partially depends on previous
> > +smaller stores. This pass tries to detect such cases and avoid the penalty by
> > +changing the order of the load and store and then fixing up the loaded value.
> > +
> > +Disabled by default.
> Is there any particular reason why this would be off by default at -O1 or
> higher? It would seem to me that on modern cores that this transformation
> should easily be a win. Even on an old in-order core, avoiding the load with
> the bit insert is likely profitable, just not as much so.

I would think it's up to the targets to decide on a default.

> > diff --git a/gcc/params.opt b/gcc/params.opt
> > index d34ef545bf0..b8115f5c27a 100644
> > --- a/gcc/params.opt
> > +++ b/gcc/params.opt
> > @@ -1032,6 +1032,10 @@ Allow the store merging pass to introduce unaligned stores if it is legal to do
> >  Common Joined UInteger Var(param_store_merging_max_size) Init(65536) IntegerRange(1, 65536) Param Optimization
> >  Maximum size of a single store merging region in bytes.
> >
> > +-param=store-forwarding-max-distance=
> > +Common Joined UInteger Var(param_store_forwarding_max_distance) Init(10) IntegerRange(1, 1000) Param Optimization
> > +Maximum number of instruction distance that a small store forwarded to a larger load may stall.
> I think you may need to run the update-urls script since you've added a new
> option.
>
>
> In general it seems pretty reasonable.
>
> I've actually added it to my tester just to see if there's any fallout. It'll
> take a week to churn through the long running targets that bootstrap in QEMU,
> but the crosses should have data Monday.
>
> jeff
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
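
For readers following the quoted patch description, here is a minimal C sketch of the reordering it describes: a narrow store followed by a wider overlapping load, rewritten as load, store, then bit insert. It assumes a little-endian target and a byte store at offset 0 (matching the "strb w2, [x1]" / "bfi x0, x2, 0, 8" sequence). The function names are hypothetical and do not come from the patch; this only illustrates why both orderings produce the same value, not how the RTL pass is implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Original order: a byte store followed by an overlapping 8-byte load.
   On many cores the load cannot be forwarded from the smaller store.  */
static uint64_t
store_then_load (unsigned char *p, uint8_t v)
{
  uint64_t x;
  p[0] = v;                   /* narrow store */
  memcpy (&x, p, sizeof x);   /* wider load over the same bytes */
  return x;
}

/* Reordered form: load first, then store, then patch the loaded value
   with a bit insert.  Assumption: little-endian, so byte 0 of memory
   corresponds to bits 0..7 of the loaded value.  */
static uint64_t
load_then_insert (unsigned char *p, uint8_t v)
{
  uint64_t x;
  memcpy (&x, p, sizeof x);                    /* wide load of the old bytes */
  p[0] = v;                                    /* store still updates memory */
  x = (x & ~UINT64_C (0xff)) | (uint64_t) v;   /* insert v into bits 0..7 */
  return x;
}

/* Hypothetical helper: check that both orderings agree on the loaded
   value and on the final memory contents.  */
static void
check_orders_agree (uint8_t v)
{
  unsigned char a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  unsigned char b[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  uint64_t xa = store_then_load (a, v);
  uint64_t xb = load_then_insert (b, v);
  assert (xa == xb);
  assert (memcmp (a, b, sizeof a) == 0);
}
```

Calling check_orders_agree with any byte value should pass on a little-endian host. A store at a nonzero byte offset would need the mask and inserted value shifted by 8 times that offset.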