From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 12897 invoked by alias); 5 Jun 2007 21:17:21 -0000
Received: (qmail 12373 invoked by alias); 5 Jun 2007 21:17:09 -0000
Date: Tue, 05 Jun 2007 21:17:00 -0000
Message-ID: <20070605211709.12372.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References:
Subject: [Bug libstdc++/29286] [4.0/4.1/4.2/4.3 Regression] placement new does not change the dynamic type as it should
In-Reply-To:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "rguenther at suse dot de"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2007-06/txt/msg00317.txt.bz2

------- Comment #168 from rguenther at suse dot de 2007-06-05 21:17 -------
Subject: Re: [4.0/4.1/4.2/4.3 Regression] placement new does not change the dynamic type as it should

On Tue, 5 Jun 2007, ian at airs dot com wrote:

> ------- Comment #167 from ian at airs dot com 2007-06-05 20:48 -------
> Can you give me a .ii file for the performance regression, and point me at the
> relevant function?

http://www.suse.de/~rguenther/tramp3d/tramp3d-v4.cpp.gz

Amongst the interesting functions (yep, there are multiple) are those
called Momentumflux*::operator(). One particular example is
Adv5::Z::MomentumfluxX::operator(), which mangles as

_ZNK4Adv51Z13MomentumfluxXILi3EEclI5FieldI22UniformRectilinearMeshI10MeshTraitsILi3Ed21UniformRectilinearTag12CartesianTagLi3EEEd7CompFwdI6EngineILi3E6VectorILi3Ed4FullE10BrickViewUE3LocILi1EEEES4_ISA_dSG_ESM_SM_EEvRKT_RKT0_RKT1_RKT2_RKSI_ILi3EE

It has this initialization loop (which is fixed by the tramp3d patch)
inside the computational kernel (a triple-nested loop):

:
  D.760598_367 = &D.464122.engine_m.x_m[i_368];
<<>>
  iftmp.913_369 = &D.464122.engine_m.x_m[i_368];
  if (1) goto ; else goto ;

:
  *iftmp.913_369 = 0.0;

:
  i_370 = i_368 + 1;

:
  # i_368 = PHI <0(34), i_370(37)>
  if (i_368 <= 2) goto ; else goto ;

(that's after forwprop1, actually).

It's important that we unroll this loop completely (so that we recognize
that the 0.0 stores are all superseded by later stores) and that we move
all loop-invariant loads out of the triple-nested loops, crossing this
initialization loop.

The optimized dump for all these functions should be 'easy to grasp and
obviously fast' - at least that's what it used to be. Now, with this
patch, we still have

<<>>
  D.767646.engine_m.x_m[0] = 0.0;
<<>>
  D.767646.engine_m.x_m[1] = 0.0;
<<>>
  D.767646.engine_m.x_m[2] = 0.0;

in there, and loads of index/domain variables on the MEM expressions,
like

  MEM[base: &D.767646, index: D.1312916, step: 8] =
    D.767266->origin_m.engine_m.x_m[i]
    + D.767266->spacings_m.engine_m.x_m[i]
      * (double) (MEM[base: &D.767512, index: D.1312916, step: 4]
                  - D.767266->D.225459.physicalCellDomain_m.D.114276.D.113975.domain_m[i].D.110801.D.45225.D.45039.domain_m[0]);

Richard.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29286
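
For readers without the tramp3d source at hand, the pattern described in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: Vec3 and kernel are hypothetical names, not taken from tramp3d or POOMA, and the arithmetic merely resembles the origin/spacing expression in the dump. The point is that the constructor's three-iteration loop stores 0.0 into every element, and each of those stores is overwritten before it is read, so they can only be removed once the loop has been completely unrolled and the individual stores are visible.

```cpp
#include <cassert>

// Hypothetical stand-in for a 3-component vector engine; the member name
// x_m mirrors the dump, everything else is an assumption for illustration.
struct Vec3 {
    double x_m[3];
    Vec3() {
        // Initialization loop analogous to the one in the dump (i = 0..2).
        for (int i = 0; i <= 2; ++i)
            x_m[i] = 0.0;
    }
};

// Hypothetical kernel body: the temporary is zero-initialized and then
// every element is overwritten before being read.
double kernel(const double* origin, const double* spacing, const int* idx) {
    assert(origin && spacing && idx);
    Vec3 tmp;  // emits the three 0.0 stores
    for (int i = 0; i < 3; ++i) {
        // These stores supersede the 0.0 stores from the constructor.
        tmp.x_m[i] = origin[i] + spacing[i] * static_cast<double>(idx[i]);
    }
    return tmp.x_m[0] + tmp.x_m[1] + tmp.x_m[2];
}
```

Whether a given compiler version actually eliminates the zero stores here depends on its unrolling and dead-store passes; the sketch only shows the shape of the code, not the behaviour of any particular GCC release.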