From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from nikam.ms.mff.cuni.cz (nikam.ms.mff.cuni.cz [195.113.20.16]) by sourceware.org (Postfix) with ESMTPS id C427D3858C3A; Thu, 23 Nov 2023 15:33:58 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C427D3858C3A Authentication-Results: sourceware.org; dmarc=fail (p=none dis=none) header.from=ucw.cz Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=kam.mff.cuni.cz ARC-Filter: OpenARC Filter v1.0.0 sourceware.org C427D3858C3A Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=195.113.20.16 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700753640; cv=none; b=n/YM7+gbU9NhME3mocKEHuS1XP6s+f2Rrr7XjLupA0eGbgb2L4ltB7e311qXAZy9lgMqQnJaL4MegfmqFettYhS4oDug51YyUAx3/zDlsse4RaPq3+woRrtvDru1TvAj3GODjyeuWu9Q+RbHkCJTHM7dTC8127Jk5a4rMXjD28U= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700753640; c=relaxed/simple; bh=Rd0nfhwhtqMN9JX1L8mum5nzfqZkLVxx3//rC9MHBEo=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=UDf44wPbqi4V6pwrDbyUMaeRwkioGZGR+o+uNaaWY/G344IdXjFC4B2gQ7KbRjbJSGz4DyJjqDlFFJl+igcXrNpqG45xEFx5LBJwVGjDTXwHdwmPk/rMBT2L0HyowsLduZqnrtXHsKAIOnS7LvC8fxfz6ZIH5rWF80KqvudJXvE= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202) id 5F48A28BA50; Thu, 23 Nov 2023 16:33:57 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ucw.cz; s=gen1; t=1700753637; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=YUJxYLpMbELUlUgwh8ZZpwZdexrN3pPg7aNvYQnS6Fg=; b=Zu9jcCYun+cGR7G4wKr5522G0l5E0/f009bOnzF+j2M1RK7Kso8SPNKuM2JsZSah3PqVFH bZ1gZAmO5AcTHdlf9CFgNMfqs8PhykooOrvencLKNi9DIJjEjAMnNkAmICv2zQzDRuuXbC 5rSb+0c7AMacxihra5BbnJteTt2AIr0= Date: Thu, 23 Nov 2023 16:33:57 +0100 From: Jan Hubicka To: Matthias Kretz , rguenther@suse.de Cc: jwakely@redhat.com, libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org Subject: Re: libstdc++: Speed up push_back Message-ID: References: <11345207.nUPlyArG6x@minbar> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-5.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,JMQ_SPF_NEUTRAL,KAM_SHORT,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: > > On Sunday, 19 November 2023 22:53:37 CET Jan Hubicka wrote: > > > Sadly it is really hard to work out this > > > from IPA passes, since we basically care whether the iterator points to > > > the same place as the end pointer, which are both passed by reference. > > > This is inter-procedural value numbering that is quite out of reach. > > > > I've done a fair share of branching on __builtin_constant_p in > > std::experimental::simd to improve code-gen. It's powerful! But maybe we > > also need the other side of the story to tell the optimizer: "I know you > > can't const-prop everything; but this variable / expression, even if you > > need to put in a lot of effort, the performance difference will be worth > > it." > > > > For std::vector, the remaining capacity could be such a value. The > > functions f() and g() are equivalent (their code-gen isn't https:// > > compiler-explorer.com/z/r44ejK1qz): > > > > #include > > > > auto > > f() > > { > > std::vector x; > > x.reserve(10); > > for (int i = 0; i < 10; ++i) > > x.push_back(0); > > return x; > > } > > auto > > g() > > { return std::vector(10, 0); } > > With my changes at -O3 we now inline push_back, so we could optimize the > first loop to the second. However with > ~/trunk-install/bin/gcc -O3 auto.C -S -fdump-tree-all-details -fno-exceptions -fno-store-merging -fno-tree-slp-vectorize > the fist problem is right at the begining: > > [local count: 97603128]: > MEM[(struct _Vector_impl_data *)x_4(D)]._M_start = 0B; > MEM[(struct _Vector_impl_data *)x_4(D)]._M_finish = 0B; > MEM[(struct _Vector_impl_data *)x_4(D)]._M_end_of_storage = 0B; > _37 = operator new (40); I also wonder, if default operator new and malloc can be handled as not reading/modifying anything visible to the user code. That would help us to propagate here even if we lose track of points-to information. We have: /* If the call is to a replaceable operator delete and results from a delete expression as opposed to a direct call to such operator, then we can treat it as free. */ if (fndecl && DECL_IS_OPERATOR_DELETE_P (fndecl) && DECL_IS_REPLACEABLE_OPERATOR (fndecl) && gimple_call_from_new_or_delete (stmt)) return ". o "; /* Similarly operator new can be treated as malloc. */ if (fndecl && DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl) && gimple_call_from_new_or_delete (stmt)) return "m "; Which informs alias analysis that new returns pointer to memory not aliasing with anything and that free is not reading anything from its parameter (but it is modelled as a write to make it clear that the memory dies). stmt_kills_ref_p special cases BUILT_IN_FREE but not OPERATOR delete to make it clear that everything pointed to by it dies. This is needed because 'o' only means that some data may be overwritten, but it does not make it clear that all data dies. Not handling operator delete seems like an omision, but maybe it is not too critical since we emit clobbers around destructors that are usually right before call to delete. Also ipa-modref kill analysis does not understand BUILT_IN_FREE nor delete and could. I wonder if we can handle both as const except for side-effects described. Honza > _22 = x_4(D)->D.26019._M_impl.D.25320._M_finish; > _23 = x_4(D)->D.26019._M_impl.D.25320._M_start; > _24 = _22 - _23; > if (_24 > 0) > goto ; [41.48%] > else > goto ; [58.52%] > > So the vector is fist initialized with _M_start=_M_finish=0, but after > call to new we already are not able to propagate this. > > This is because x is returned and PTA considers it escaping. This is > problem discussed in > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653 > Which shows that it is likely worthwhile to fix PTA to handle this > correctly.