From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id A861C385802F; Wed, 19 Apr 2023 10:17:29 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A861C385802F DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1681899449; bh=xyh9+ODBSk9DF7/0dfs4rgEC4I0B1o09AhXCenJHsvE=; h=From:To:Subject:Date:In-Reply-To:References:From; b=Otz9HjdJLR07Wltl9B0Pt+CqFYwNnMWKZzwo/xCP6uksi/8gzFCAJLLWvbENxE1AJ u47JJrZQ66udtnwikzbaORRpC8pUQiMynxA+9ojPK0SP4aPoaJum1oq3Xl5lFiefxT yT6Ms6lRWsvA+ZAugednxvqVrZ2PPOYKPcLn7+1c= From: "cvs-commit at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug target/109465] LoongArch: The expansion of memcpy is slow and bloated for some sizes Date: Wed, 19 Apr 2023 10:17:28 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: target X-Bugzilla-Version: 13.0 X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: cvs-commit at gcc dot gnu.org X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 List-Id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D109465 --- Comment #2 from CVS Commits --- The master branch has been updated by Xi Ruoyao : https://gcc.gnu.org/g:6d7e0bcfa49e4ddc84dabe520bba8a023bc52692 commit r14-70-g6d7e0bcfa49e4ddc84dabe520bba8a023bc52692 Author: Xi Ruoyao Date: Wed Apr 12 11:45:48 2023 +0000 LoongArch: Improve cpymemsi expansion [PR109465] We'd been generating really bad block move sequences which is recently complained by kernel developers who tried __builtin_memcpy. To improve it: 1. Take the advantage of -mno-strict-align. When it is set, set mode size to UNITS_PER_WORD regardless of the alignment. 2. Half the mode size when (block size) % (mode size) !=3D 0, instead of falling back to ld.bu/st.b at once. 3. Limit the length of block move sequence considering the number of instructions, not the size of block. When -mstrict-align is set and the block is not aligned, the old size limit for straight-line implementation (64 bytes) was definitely too large (we don't have 64 registers anyway). Change since v1: add a comment about the calculation of num_reg. gcc/ChangeLog: PR target/109465 * config/loongarch/loongarch-protos.h (loongarch_expand_block_move): Add a parameter as alignment RTX. * config/loongarch/loongarch.h: (LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER): Remove. (LARCH_MAX_MOVE_BYTES_STRAIGHT): Remove. (LARCH_MAX_MOVE_OPS_PER_LOOP_ITER): Define. (LARCH_MAX_MOVE_OPS_STRAIGHT): Define. (MOVE_RATIO): Use LARCH_MAX_MOVE_OPS_PER_LOOP_ITER instead of LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER. * config/loongarch/loongarch.cc (loongarch_expand_block_move): Take the alignment from the parameter, but set it to UNITS_PER_WORD if !TARGET_STRICT_ALIGN. Limit the length of straight-line implementation with LARCH_MAX_MOVE_OPS_STRAIGHT instead of LARCH_MAX_MOVE_BYTES_STRAIGHT. (loongarch_block_move_straight): When there are left-over bytes, half the mode size instead of falling back to byte mode at once. (loongarch_block_move_loop): Limit the length of loop body with LARCH_MAX_MOVE_OPS_PER_LOOP_ITER instead of LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER. * config/loongarch/loongarch.md (cpymemsi): Pass the alignment to loongarch_expand_block_move. gcc/testsuite/ChangeLog: PR target/109465 * gcc.target/loongarch/pr109465-1.c: New test. * gcc.target/loongarch/pr109465-2.c: New test. * gcc.target/loongarch/pr109465-3.c: New test.=