From mboxrd@z Thu Jan 1 00:00:00 1970
Received: by sourceware.org (Postfix, from userid 48) id 68FAF3857C67; Wed, 21 Oct 2020 05:58:40 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 68FAF3857C67
From: "wilson at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
Date: Wed, 21 Oct 2020 05:58:40 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: other
X-Bugzilla-Version: 10.2.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: wilson at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
X-List-Received-Date: Wed, 21 Oct 2020 05:58:40 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #3 from Jim Wilson ---
The basic idea here is that the movqi pattern in riscv.md currently emits RTL
for a load that looks like this
  (set (reg:QI target) (mem:QI (address)))
As an experiment, we want to try changing that to something like this
  (set (reg:DI temp) (zero_extend:DI (mem:QI (address))))
  (set (reg:QI target) (subreg:QI (reg:DI temp) 0))
The hope is that the optimizer will combine the subreg with the following
operations, resulting in smaller, faster code at the end. And this should also
solve the volatile load optimization problem.
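As a rough C-level model of the two RTL forms above (only an illustration, not
GCC code; the function names are made up for this sketch), a plain byte load
and a zero-extending load followed by a low-part subreg should always produce
the same byte:

```c
#include <stdint.h>

/* Models (set (reg:QI target) (mem:QI (address))): a plain byte load. */
uint8_t load_qi_direct(const volatile uint8_t *addr)
{
    return *addr;
}

/* Models the experimental form: a zero-extending byte load into a
   64-bit temporary, followed by taking the low 8 bits (the subreg). */
uint8_t load_qi_via_extend(const volatile uint8_t *addr)
{
    uint64_t temp = (uint64_t)*addr;  /* zero_extend:DI of the byte load */
    return (uint8_t)temp;             /* subreg:QI of the DI temporary */
}

/* Returns 1 if both forms agree for every possible byte value, else 0. */
int forms_agree(void)
{
    volatile uint8_t cell;
    for (unsigned v = 0; v < 256; v++) {
        cell = (uint8_t)v;
        if (load_qi_direct(&cell) != load_qi_via_extend(&cell))
            return 0;
    }
    return 1;
}
```

The point of the second form is only that the upper bits of the temporary are
known to be zero, which is the information the optimizer would need in order to
drop a later masking instruction.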
So we need a patch, and then we need experiments to see if the patch actually
produces better code on real programs. It should be fairly easy to write the
patch even if you don't have any gcc experience. The testing part of this is
probably more work than the patch writing.

The movqi pattern calls riscv_legitimize_move in riscv.c, so that would have to
be modified to look for qimode loads from memory, allocate a temporary
register, do a zero-extending load into the temp reg, and then do a subreg copy
into the target register. You will probably also need to handle cases where
both the target and source are memory locations, in which case this already
gets split into two instructions, a load followed by a store.

You can look at the movqi pattern in the arm.md file to see an example of how
to do this, where it calls gen_zero_extendqisi2. For RISC-V, though, we would
want gen_zero_extendqidi2 for rv64 and gen_zero_extendqisi2 for rv32.

If the movqi change works, then we would want similar changes for movhi and
maybe also movsi for rv64. It might also be worth checking whether zero-extend
or sign-extend is the better choice. We zero extend char by default, so that
should be best. For rv64, the hardware sign extends simode to dimode by
default, so sign-extend is probably best for that. For himode I'm not sure. I
think we prefer sign-extend by default, but that isn't necessarily the best
choice for loads. This would have to be tested.

You can see rtl dumps by using -fdump-rtl-all. The combiner is the pass that
should be optimizing away the unnecessary zero-extend. You can see details of
what the combiner pass is doing by using -fdump-rtl-combine-all.
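For looking at the dumps, a small input along these lines may be enough. This
is a hypothetical test file written for illustration (the names flag and
read_flag are not from the bug report); it just loads a volatile bool and
branches on it:

```c
#include <stdbool.h>

/* Hypothetical test input: a volatile bool that is loaded once. */
volatile bool flag;

/* The load of flag is the QImode load of interest; the question is
   whether an extra masking instruction follows it in the output. */
int read_flag(void)
{
    if (!flag)
        return 42;
    return -42;
}
```

Assuming a RISC-V cross compiler is installed (the driver name
riscv64-unknown-elf-gcc is an assumption about your setup), compiling this with
-O2 -S -fdump-rtl-combine-all should leave a combine dump file next to the
assembly output, which can be compared before and after the patch.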