From: "cvs-commit at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/98737] Atomic operation on x86 no optimized to use flags
Date: Fri, 14 Jan 2022 11:07:15 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98737

--- Comment #12 from CVS Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:9896e96d4cae00d0f4d2b694284cb30bbd9c80fc

commit r12-6577-g9896e96d4cae00d0f4d2b694284cb30bbd9c80fc
Author: Jakub Jelinek
Date: Fri Jan 14 12:04:59 2022 +0100

forwprop: Canonicalize atomic fetch_op op x to op_fetch or vice versa [PR98737]

When writing the PR98737 fix, I only handled the case where people use __atomic_op_fetch (p, x, y) etc.  But some people actually use the other builtins, like __atomic_fetch_op (p, x, y) op x.
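To make the two source styles concrete, here is a minimal C sketch using GCC's __atomic builtins. The function names are hypothetical and chosen only for illustration; unsigned int is used so that the manual "+ x" wraps exactly like the atomic update (signed overflow would be undefined). The sketch only shows that both styles compute the same value; it makes no claim about the code GCC emits for them.

```c
/* Form handled by the earlier PR98737 fix: the op_fetch builtin
   returns the value stored after the addition.  */
unsigned int add_op_fetch (unsigned int *p, unsigned int x)
{
  return __atomic_add_fetch (p, x, __ATOMIC_SEQ_CST);
}

/* Equivalent form: the fetch_op builtin returns the old value and the
   caller applies "+ x" again.  Unsigned arithmetic wraps exactly like
   the atomic update, so this returns the same value as add_op_fetch.  */
unsigned int add_fetch_op (unsigned int *p, unsigned int x)
{
  return __atomic_fetch_add (p, x, __ATOMIC_SEQ_CST) + x;
}

/* The same identity holds for &: (old & x) is the value now stored.  */
unsigned int and_fetch_op (unsigned int *p, unsigned int x)
{
  return __atomic_fetch_and (p, x, __ATOMIC_SEQ_CST) & x;
}
```

Here add_op_fetch uses the op_fetch builtin directly, while add_fetch_op and and_fetch_op use a fetch_op builtin followed by op x. For example, two counters that both start at 5 end up at 12 after add_op_fetch and add_fetch_op with x = 7, and both calls return 12.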
The following patch canonicalizes the latter into the former, and vice versa, when possible: the result of the builtin must have a single use, and if that use is a cast to a type of the same precision, the cast's lhs must also have a single use.

For each of the ops +, -, &, | and ^ we can perform the
  __atomic_fetch_op (p, x, y) op x -> __atomic_op_fetch (p, x, y)
optimization (and likewise for the __sync builtins), but the INTEGER_CST and SSA_NAME cases of x behave differently.  For an INTEGER_CST, - x is typically canonicalized to + (-x), while for an SSA_NAME we need to handle various casts: they sometimes appear on the second argument of the builtin (for char/short there can even be two consecutive casts, due to the promotions we do), and there can be a cast on the operand of op too.  Also, all of the ops except - are commutative.

For the other direction, i.e.
  __atomic_op_fetch (p, x, y) rop x -> __atomic_fetch_op (p, x, y)
we can't handle op & and |, because those aren't reversible.  For op +, rop is -; for op -, rop is +; and for op ^, rop is ^.  Otherwise the same considerations as above apply.

There is one more case.  We canonicalize x - y == 0 (or != 0) and x ^ y == 0 (or != 0) to x == y (or x != y), and, for constant y, x + y == 0 (or != 0) to x == -y (or != -y).  The patch therefore also effectively undoes those canonicalizations, because it is better to compare the result of an atomic op_fetch against 0 than to do an atomic fetch_op and compare its result with some variable or non-zero constant.  This matters for the earlier PR98737 patch in particular, but it helps in general too.

As for debug info, for the non-reversible operations (& and |) the patch resets any debug stmts, and it does the same when -fnon-call-exceptions is in effect (I didn't want to insert debug temps right before all uses).  Otherwise, at richi's request, it emits the reverse operation on the result as a new setter of the old lhs, so that a later DCE pass fixes up the debug info.

On the emitted assembly for the testcases, which are fairly large, I see substantial decreases in the *.s size (116897 -> 93861 bytes, about 19.7% smaller, and 70257 -> 67537 bytes, about 3.9% smaller):

-rw-rw-r--. 1 jakub jakub 116897 Jan 13 09:58 pr98737-1.svanilla
-rw-rw-r--. 1 jakub jakub  93861 Jan 13 09:57 pr98737-1.spatched
-rw-rw-r--. 1 jakub jakub  70257 Jan 13 09:57 pr98737-2.svanilla
-rw-rw-r--. 1 jakub jakub  67537 Jan 13 09:57 pr98737-2.spatched

In some functions, register allocation (RA) gives us one more instruction than before, but most functions are smaller, even those that don't hit the optimizations of the previous PR98737 patch.

2022-01-14  Jakub Jelinek

	PR target/98737
	* tree-ssa-forwprop.c (simplify_builtin_call): Canonicalize
	__atomic_fetch_op (p, x, y) op x into __atomic_op_fetch (p, x, y)
	and __atomic_op_fetch (p, x, y) iop x into
	__atomic_fetch_op (p, x, y).

	* gcc.dg/tree-ssa/pr98737-1.c: New test.
	* gcc.dg/tree-ssa/pr98737-2.c: New test.
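The reverse direction and the comparison-against-zero case described in the commit message can be illustrated the same way. The following is a minimal C sketch with hypothetical function names, again using unsigned int so that wraparound is well defined. Each pair computes the same result; the sketch says nothing about the code GCC actually generates for either form.

```c
#include <stdbool.h>

/* Reverse direction: applying rop (here -) to the op_fetch result
   recovers the old value, i.e. what __atomic_fetch_add returns.  */
unsigned int old_value_via_add_fetch (unsigned int *p, unsigned int x)
{
  return __atomic_add_fetch (p, x, __ATOMIC_SEQ_CST) - x;
}

/* For ^ the reverse op is ^ itself.  No such recovery exists for &
   or |, because those operations discard bits of the old value.  */
unsigned int old_value_via_xor_fetch (unsigned int *p, unsigned int x)
{
  return __atomic_xor_fetch (p, x, __ATOMIC_SEQ_CST) ^ x;
}

/* Comparison case: after atomically subtracting y, "old == y" holds
   exactly when "new == 0" holds, since new = old - y (mod 2^N).  */
bool sub_reaches_zero_cmp_old (unsigned int *p, unsigned int y)
{
  return __atomic_fetch_sub (p, y, __ATOMIC_SEQ_CST) == y;
}

/* The "== 0" form tests the op_fetch result directly, which is the
   shape the commit message prefers.  */
bool sub_reaches_zero_cmp_new (unsigned int *p, unsigned int y)
{
  return __atomic_sub_fetch (p, y, __ATOMIC_SEQ_CST) == 0;
}
```

For instance, with two counters that both hold 3, sub_reaches_zero_cmp_old (&c, 3) and sub_reaches_zero_cmp_new (&d, 3) both return true; with counters holding 4, both return false.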