From mboxrd@z Thu Jan 1 00:00:00 1970
From: "hjl.tools at gmail dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
Date: Thu, 25 Mar 2021 00:05:31 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: libc
X-Bugzilla-Version: 2.27
X-Bugzilla-Severity: normal
X-Bugzilla-Who: hjl.tools at gmail dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: security-
X-Bugzilla-URL: http://sourceware.org/bugzilla/
List-Id: Glibc-bugs mailing list

https://sourceware.org/bugzilla/show_bug.cgi?id=24872

--- Comment #2 from H.J. Lu ---
On Intel i7-8559U, for glibc master branch, benchtests/bench-memcpy-large
shows:

Function: memcpy
Variant: large
                                     __memcpy_avx_unaligned_erms  __memcpy_ssse3
==============================================================================
length=65543, align1=0, align2=0:                       14301.30        17782.90
length=65551, align1=0, align2=3:                       15940.10        17913.20
length=65567, align1=3, align2=0:                       13492.10        17706.70
length=65599, align1=3, align2=5:                       16658.40        17927.30
length=131079, align1=0, align2=0:                      30461.70        44844.90
length=131087, align1=0, align2=3:                      10647.30        12163.80
length=131103, align1=3, align2=0:                       9425.44        12177.90
length=131135, align1=3, align2=5:                      11126.40        12181.70
length=262151, align1=0, align2=0:                      23470.20        39790.60
length=262159, align1=0, align2=3:                      33363.40        33786.50
length=262175, align1=3, align2=0:                      23122.00        29927.70
length=262207, align1=3, align2=5:                      25862.10        28582.20
length=524295, align1=0, align2=0:                      45083.10        55485.80
length=524303, align1=0, align2=3:                      47938.20        54088.40
length=524319, align1=3, align2=0:                      42350.10        51983.70
length=524351, align1=3, align2=5:                      45029.20        52464.20
length=1048583, align1=0, align2=0:                     88527.90       101156.00
length=1048591, align1=0, align2=3:                     93855.80       100754.00
length=1048607, align1=3, align2=0:                     94034.90       100673.00
length=1048639, align1=3, align2=5:                     90740.50       103256.00
length=2097159, align1=0, align2=0:                    185803.00       193467.00
length=2097167, align1=0, align2=3:                    187839.00       211012.00
length=2097183, align1=3, align2=0:                    186758.00       195055.00
length=2097215, align1=3, align2=5:                    190751.00       195920.00
length=4194311, align1=0, align2=0:                    374530.00       391675.00
length=4194319, align1=0, align2=3:                    378556.00       395988.00
length=4194335, align1=3, align2=0:                    376987.00       396840.00
length=4194367, align1=3, align2=5:                    380713.00       399326.00
length=8388615, align1=0, align2=0:                   1248790.00      1296470.00
length=8388623, align1=0, align2=3:                    924123.00      1011000.00
length=8388639, align1=3, align2=0:                    910170.00       926244.00
length=8388671, align1=3, align2=5:                    915979.00      1011690.00
length=16777223, align1=0, align2=0:                  2119530.00      2228360.00
length=16777231, align1=0, align2=3:                  2123510.00      2321720.00
length=16777247, align1=3, align2=0:                  2092680.00      2231230.00
length=16777279, align1=3, align2=5:                  2121050.00      2280890.00
length=33554439, align1=0, align2=0:                  4881620.00      4770780.00
length=33554447, align1=0, align2=3:                  4634040.00      4795500.00
length=33554463, align1=3, align2=0:                  4599820.00      4676770.00
length=33554495, align1=3, align2=5:                  4638870.00      4841840.00

avx_unaligned_erms is faster than ssse3 in every case shown except
length=33554439, align1=0, align2=0.
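
To make the comparison concrete, here is a minimal sketch that computes the ssse3/avx_unaligned_erms ratio for a few rows copied from the table above. It assumes the reported values are timings where lower is better, which the benchmark output itself does not state. The names ROWS, ssse3_over_avx and summarize are hypothetical helpers for illustration and are not part of glibc or benchtests.

```python
# Rows copied from the table above:
# (length, align1, align2, avx_unaligned_erms, ssse3)
ROWS = [
    (65543, 0, 0, 14301.30, 17782.90),
    (262151, 0, 0, 23470.20, 39790.60),
    (1048583, 0, 0, 88527.90, 101156.00),
    (33554439, 0, 0, 4881620.00, 4770780.00),
]


def ssse3_over_avx(avx_value, ssse3_value):
    """Ratio of the ssse3 value to the avx_unaligned_erms value.

    Assumption: both values are timings (lower is better), so a ratio
    above 1.0 means avx_unaligned_erms was the faster variant.
    """
    return ssse3_value / avx_value


def summarize(rows):
    """Return (length, align1, align2, ratio) for each row, ratio rounded to 3 places."""
    return [
        (length, a1, a2, round(ssse3_over_avx(avx, ssse3), 3))
        for (length, a1, a2, avx, ssse3) in rows
    ]


if __name__ == "__main__":
    for length, a1, a2, ratio in summarize(ROWS):
        faster = "avx_unaligned_erms" if ratio > 1.0 else "ssse3"
        print(f"length={length}, align1={a1}, align2={a2}: "
              f"ssse3/avx = {ratio} ({faster} faster)")
```

A ratio is used rather than an absolute difference because the copy sizes span several orders of magnitude, so the raw differences are not comparable from row to row.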