From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 652953858D35; Fri, 16 Jun 2023 14:43:09 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 652953858D35
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1686926589;
	bh=MfEzY0Zbe49RSnsEsQf+tO4II8DoPLIwfkBxbP0NiOg=;
	h=From:To:Subject:Date:From;
	b=aFOmqAYun6h/8l1jZRS917YcpXcRoAzDamwGXejH8E3aRL9CRlQwB0686ruHl+D7k
	 a3NzRkMi7u+/Cy1yLCq57rA55vqmaOezsFJcAtTQjZwNW5T/OxEv99hvxogtDtHx1d
	 ogL80PZo2t3+MLuftDFDQk9At7aH5fGdUALsC7pc=
From: "hubicka at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/110287] New: _M_check_len is expensive
Date: Fri, 16 Jun 2023 14:43:08 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: libstdc++
X-Bugzilla-Version: 13.1.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: hubicka at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter target_milestone
Message-ID: <bug-110287-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287

            Bug ID: 110287
           Summary: _M_check_len is expensive
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

I am looking into inefficient code generation for loops controlled by a
std::vector-based stack (see the testcase in PR109849).

The problem is that we fail to inline enough of the implementation of
std::vector::push_back to fully SRA the stack, which makes it very slow. The
reason is that _M_realloc_insert currently does not fit within the inlining
limits at -O2. It seems to be inlined by clang.

The first issue is the rather large _M_check_len, which currently looks like
this:

size_type std::vector<std::pair<unsigned int, unsigned int> >::_M_check_len
(const struct vector * const this, size_type __n, const char * __s)
{
  const size_type __len;
  const long unsigned int D.27747;
  long unsigned int _1;
  long unsigned int __n.3_2;
  long unsigned int _3;
  size_type iftmp.4_4;
  long unsigned int _5;
  long unsigned int _8;
  long int _10;
  long int _13;
  struct pair * _14;
  struct pair * _15;
  const long unsigned int & _18;

  <bb 2> [local count: 1073741824]:
  _15 = this_7(D)->D.26707._M_impl.D.26014._M_finish;
  _14 = this_7(D)->D.26707._M_impl.D.26014._M_start;
  _13 = _15 - _14;
  _10 = _13 /[ex] 8;
  _8 = (long unsigned int) _10;
  _1 = 1152921504606846975 - _8;
  __n.3_2 = __n;
  if (_1 < __n.3_2)
    goto <bb 3>; [0.04%]
  else
    goto <bb 4>; [99.96%]

  <bb 3> [local count: 429496]:
  std::__throw_length_error (__s_16(D));

  <bb 4> [local count: 1073312329]:
  D.27747 = _8;
  if (__n.3_2 > _8)
    goto <bb 5>; [34.00%]
  else
    goto <bb 6>; [66.00%]

  <bb 5> [local count: 364926196]:

  <bb 6> [local count: 1073312330]:
  # _18 = PHI <&D.27747(4), &__n(5)>
  _3 = *_18;
  __len_11 = _3 + _8;
  D.27747 ={v} {CLOBBER(eol)};
  if (_8 > __len_11)
    goto <bb 8>; [35.00%]
  else
    goto <bb 7>; [65.00%]

  <bb 7> [local count: 697653013]:
  _5 = MIN_EXPR <__len_11, 1152921504606846975>;

  <bb 8> [local count: 1073312330]:
  # iftmp.4_4 = PHI <1152921504606846975(6), _5(7)>
  return iftmp.4_4;

}
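
For readers less used to GIMPLE, the dump above corresponds roughly to the
following source-level logic. This is a minimal free-standing paraphrase, not
the libstdc++ source: check_len and its size/max_size parameters are
hypothetical stand-ins for the member function and the vector's size() and
max_size().

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>

// Hypothetical free-function paraphrase of the logic in the dump.
// `size` and `max_size` stand in for the vector's size() and max_size().
inline std::size_t check_len(std::size_t size, std::size_t max_size,
                             std::size_t n, const char* what)
{
  // bb 2 / bb 3: no room left for n more elements, so throw.
  if (max_size - size < n)
    throw std::length_error(what);

  // bb 4 .. bb 6: grow by max(size, n), i.e. double in the common case.
  const std::size_t len = size + std::max(size, n);

  // bb 6 .. bb 8: clamp to max_size on wrap-around or overshoot.
  return (len < size || len > max_size) ? max_size : len;
}
```

With n fixed to 1, as in the _M_realloc_insert call, the only way to reach the
throw is size == max_size.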

_M_check_len is called from _M_realloc_insert:

_20 = std::vector<std::pair<unsigned int, unsigned int> >::_M_check_len
(this_18(D), 1, "vector::_M_realloc_insert");

So __n is 1. The test:

  _1 = 1152921504606846975 - _8;
  __n.3_2 = __n;
  if (_1 < __n.3_2)
    goto <bb 3>; [0.04%]
  else
    goto <bb 4>; [99.96%]

  <bb 3> [local count: 429496]:
  std::__throw_length_error (__s_16(D));

can IMO never be true, since we would need to have already allocated a vector
of 1152921504606846975 elements, which will not fit in memory anyway.
Keeping the test brings in the EH handling and the error message.
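
To put a number on that bound: the constant appears to be PTRDIFF_MAX divided
by the element size, assuming a 64-bit target and an 8-byte
std::pair<unsigned int, unsigned int> (the "/[ex] 8" in the dump suggests the
latter). A small sketch of the arithmetic, with hypothetical helper names:

```cpp
#include <cstdint>

// Hypothetical helper: largest element count whose total byte size still
// fits in a signed pointer difference.
constexpr std::uint64_t max_elems(std::uint64_t ptrdiff_max,
                                  std::uint64_t elem_size)
{
  return ptrdiff_max / elem_size;
}

// Hypothetical helper: bytes occupied by `count` elements.
constexpr std::uint64_t bytes_needed(std::uint64_t count,
                                     std::uint64_t elem_size)
{
  return count * elem_size;
}

// Assumption: 64-bit target, so PTRDIFF_MAX is 2^63 - 1.
constexpr std::uint64_t k_ptrdiff_max_64 = 9223372036854775807ULL;

// The constant from the dump is 2^60 - 1 elements of 8 bytes each.
static_assert(max_elems(k_ptrdiff_max_64, 8) == 1152921504606846975ULL,
              "matches the constant in the GIMPLE dump");
```

A vector that large would occupy just under 8 EiB, so the throwing branch is
unreachable in practice for this element type.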

Perhaps for a constant __n passed to _M_check_len and a constant vector
element size we can use __builtin_constant_p to avoid calling _M_check_len
from _M_realloc_insert, and invent a _M_safe_check_len that avoids this test.

(For IPA optimizations to work out that there will be no EH and that the call
is cheaper, we need to have the test on the caller side instead of in
_M_check_len.)
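
One possible shape of this idea, as a rough sketch only: safe_check_len and
grow_len are hypothetical free functions, not existing libstdc++ members, and
size/max_size again stand in for the vector's own accessors. It relies on the
GCC/Clang builtin __builtin_constant_p and on the assumption made above that a
vector of max_size elements cannot exist.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>

// Hypothetical: growth computation with no length_error path.
// Only meaningful if the caller guarantees size < max_size.
inline std::size_t safe_check_len(std::size_t size, std::size_t max_size,
                                  std::size_t n) noexcept
{
  const std::size_t len = size + std::max(size, n);
  return (len < size || len > max_size) ? max_size : len;
}

// Hypothetical caller-side wrapper: the overflow test lives here, so after
// inlining the compiler can drop it (and the EH edge) when n is known to be 1.
inline std::size_t grow_len(std::size_t size, std::size_t max_size,
                            std::size_t n, const char* what)
{
  if (__builtin_constant_p(n) && n == 1)
    // Assumption from the report: size == max_size cannot happen here,
    // so the throwing test is skipped entirely.
    return safe_check_len(size, max_size, 1);

  if (max_size - size < n)
    throw std::length_error(what);
  return safe_check_len(size, max_size, n);
}
```

Note that this changes behaviour in the (practically unreachable) case
size == max_size with n == 1: the sketch returns max_size instead of throwing,
so whether that is acceptable would need to be decided separately.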