From: "hubicka at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/110287] New: _M_check_len is expensive
Date: Fri, 16 Jun 2023 14:43:08 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287

            Bug ID: 110287
           Summary: _M_check_len is expensive
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

I am looking into inefficient codegen for loops controlled by a std::vector-based stack (see the testcase in PR109849). The problem is that we fail to inline enough of the implementation of
std::vector::push_back to fully SRA the stack, which makes it very slow. The reason is that _M_realloc_insert currently does not fit within the inlining limits for -O2; clang does seem to inline it.

The first issue is the rather large _M_check_len, which currently looks like this:

size_type std::vector >::_M_check_len (const struct vector * const this, size_type __n, const char * __s)
{
  const size_type __len;
  const long unsigned int D.27747;
  long unsigned int _1;
  long unsigned int __n.3_2;
  long unsigned int _3;
  size_type iftmp.4_4;
  long unsigned int _5;
  long unsigned int _8;
  long int _10;
  long int _13;
  struct pair * _14;
  struct pair * _15;
  const long unsigned int & _18;

  [local count: 1073741824]:
  _15 = this_7(D)->D.26707._M_impl.D.26014._M_finish;
  _14 = this_7(D)->D.26707._M_impl.D.26014._M_start;
  _13 = _15 - _14;
  _10 = _13 /[ex] 8;
  _8 = (long unsigned int) _10;
  _1 = 1152921504606846975 - _8;
  __n.3_2 = __n;
  if (_1 < __n.3_2)
    goto ; [0.04%]
  else
    goto ; [99.96%]

  [local count: 429496]:
  std::__throw_length_error (__s_16(D));

  [local count: 1073312329]:
  D.27747 = _8;
  if (__n.3_2 > _8)
    goto ; [34.00%]
  else
    goto ; [66.00%]

  [local count: 364926196]:

  [local count: 1073312330]:
  # _18 = PHI <&D.27747(4), &__n(5)>
  _3 = *_18;
  __len_11 = _3 + _8;
  D.27747 ={v} {CLOBBER(eol)};
  if (_8 > __len_11)
    goto ; [35.00%]
  else
    goto ; [65.00%]

  [local count: 697653013]:
  _5 = MIN_EXPR <__len_11, 1152921504606846975>;

  [local count: 1073312330]:
  # iftmp.4_4 = PHI <1152921504606846975(6), _5(7)>
  return iftmp.4_4;

}

This is used by _M_realloc_insert:

  _20 = std::vector >::_M_check_len (this_18(D), 1, "vector::_M_realloc_insert");

So __n is 1. The test

  _1 = 1152921504606846975 - _8;
  __n.3_2 = __n;
  if (_1 < __n.3_2)
    goto ; [0.04%]
  else
    goto ; [99.96%]

  [local count: 429496]:
  std::__throw_length_error (__s_16(D));

can, IMO, never be true: it would require an already-allocated vector of 1152921504606846975 elements, which will not fit in memory anyway.
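For readers less used to GIMPLE, the dump above corresponds roughly to the C++ logic below. This is a hypothetical free-function sketch that mirrors the dump, not the libstdc++ source: check_len, kElemSize and kMaxSize are illustrative names, and it assumes 8-byte elements on an LP64 target, as the "/[ex] 8" and the constant 1152921504606846975 in the dump suggest.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>

// Assumption: 8-byte elements (the dump divides the pointer difference by 8).
constexpr std::size_t kElemSize = 8;
// Largest element count; equals 1152921504606846975 when ptrdiff_t is 64-bit.
constexpr std::size_t kMaxSize =
    static_cast<std::size_t>(std::numeric_limits<std::ptrdiff_t>::max()) / kElemSize;

// Hypothetical mirror of the dumped _M_check_len; size plays the role of _8.
std::size_t check_len(std::size_t size, std::size_t n, const char* what) {
  if (kMaxSize - size < n)          // "_1 < __n.3_2" in the dump
    throw std::length_error(what);  // std::__throw_length_error path (EH)
  // __len_11: grow by max(size, n), i.e. roughly double the size.
  const std::size_t len = size + std::max(size, n);
  // Wrap-around or exceeding the maximum clamps to kMaxSize (iftmp.4_4).
  return (len < size || len > kMaxSize) ? kMaxSize : len;
}
```

For example, check_len(5, 1, "x") returns 10. With __n == 1 the length_error branch is reachable only when size == kMaxSize, i.e. when kMaxSize * 8 = 9223372036854775800 bytes (just under 8 EiB) are already allocated.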
This test pulls in the EH handling and the error message.

Perhaps, when _M_check_len is called with a constant __n and the vector element size is constant, we can use __builtin_constant_p to avoid calling _M_check_len from _M_realloc_insert and introduce an _M_safe_check_len that omits this test. (For IPA optimizations to work out that there will be no EH and that the call is cheaper, the test needs to be on the caller side rather than inside _M_check_len.)
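The proposal could look roughly like the sketch below. It is hypothetical and written as free functions rather than vector members: safe_check_len stands in for the suggested _M_safe_check_len, grow_len stands in for the caller side in _M_realloc_insert, and 8-byte elements on an LP64 target are assumed. __builtin_constant_p is a real GCC/clang builtin; everything else is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>

// Assumption: 8-byte elements on an LP64 target, as in the dump.
constexpr std::size_t kMaxSize =
    static_cast<std::size_t>(std::numeric_limits<std::ptrdiff_t>::max()) / 8;

// Hypothetical _M_safe_check_len: growth and clamping only, no throw, no EH.
inline std::size_t safe_check_len(std::size_t size, std::size_t n) {
  const std::size_t len = size + std::max(size, n);
  return (len < size || len > kMaxSize) ? kMaxSize : len;
}

// Checked variant: the length_error test followed by the same growth logic.
inline std::size_t check_len(std::size_t size, std::size_t n, const char* what) {
  if (kMaxSize - size < n)
    throw std::length_error(what);
  return safe_check_len(size, n);
}

// Hypothetical caller-side dispatch. When n is known at compile time to be 1,
// the test could fire only for size == kMaxSize, which needs an allocation
// that cannot exist, so the non-throwing variant is used. If
// __builtin_constant_p yields 0 (e.g. at -O0), the checked path is taken;
// for any size that can actually occur, the result is the same.
inline std::size_t grow_len(std::size_t size, std::size_t n) {
  if (__builtin_constant_p(n) && n == 1)
    return safe_check_len(size, n);
  return check_len(size, n, "grow_len");
}
```

Keeping the dispatch in the caller means that, once grow_len is inlined with a constant argument, the call site no longer contains a throwing call, which is what IPA needs to see.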