From: Qing Zhao
To: Richard Biener
Cc: Richard Sandiford, Richard Biener via Gcc-patches
Subject: Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init
Date: Fri, 15 Jan 2021 11:57:51 -0600
Message-Id: <4BD13E0F-A561-4429-AEB8-5BDEA6987D21@ORACLE.COM>
List-Id: Gcc-patches mailing list

> On Jan 15, 2021, at 11:22 AM, Richard Biener wrote:
>
> On January 15, 2021 5:16:40 PM GMT+01:00, Qing Zhao wrote:
>>
>>> On Jan 15, 2021, at 2:11 AM, Richard Biener wrote:
>>>
>>> On Thu, 14 Jan 2021, Qing Zhao wrote:
>>>
>>>> Hi,
>>>>
>>>> More data on code size and compilation time with CPU2017:
>>>>
>>>> ******** Compilation time data: the numbers are the slowdown against the
>>>> default "no":
>>>>
>>>> benchmarks         A/no     D/no
>>>>
>>>> 500.perlbench_r    5.19%    1.95%
>>>> 502.gcc_r          0.46%   -0.23%
>>>> 505.mcf_r          0.00%    0.00%
>>>> 520.omnetpp_r      0.85%    0.00%
>>>> 523.xalancbmk_r    0.79%   -0.40%
>>>> 525.x264_r        -4.48%    0.00%
>>>> 531.deepsjeng_r   16.67%   16.67%
>>>> 541.leela_r        0.00%    0.00%
>>>> 557.xz_r           0.00%    0.00%
>>>>
>>>> 507.cactuBSSN_r    1.16%    0.58%
>>>> 508.namd_r         9.62%    8.65%
>>>> 510.parest_r       0.48%    1.19%
>>>> 511.povray_r       3.70%    3.70%
>>>> 519.lbm_r          0.00%    0.00%
>>>> 521.wrf_r          0.05%    0.02%
>>>> 526.blender_r      0.33%    1.32%
>>>> 527.cam4_r        -0.93%   -0.93%
>>>> 538.imagick_r      1.32%    3.95%
>>>> 544.nab_r          0.00%    0.00%
>>>>
>>>> From the above data, it looks like the compilation-time impact of
>>>> implementations A and D is almost the same.
>>>>
>>>> ******** Code size data: the numbers are the code size increase against the
>>>> default "no":
>>>>
>>>> benchmarks         A/no     D/no
>>>>
>>>> 500.perlbench_r    2.84%    0.34%
>>>> 502.gcc_r          2.59%    0.35%
>>>> 505.mcf_r          3.55%    0.39%
>>>> 520.omnetpp_r      0.54%    0.03%
>>>> 523.xalancbmk_r    0.36%    0.39%
>>>> 525.x264_r         1.39%    0.13%
>>>> 531.deepsjeng_r    2.15%   -1.12%
>>>> 541.leela_r        0.50%   -0.20%
>>>> 557.xz_r           0.31%    0.13%
>>>>
>>>> 507.cactuBSSN_r    5.00%   -0.01%
>>>> 508.namd_r         3.64%   -0.07%
>>>> 510.parest_r       1.12%    0.33%
>>>> 511.povray_r       4.18%    1.16%
>>>> 519.lbm_r          8.83%    6.44%
>>>> 521.wrf_r          0.08%    0.02%
>>>> 526.blender_r      1.63%    0.45%
>>>> 527.cam4_r         0.16%    0.06%
>>>> 538.imagick_r      3.18%   -0.80%
>>>> 544.nab_r          5.76%   -1.11%
>>>>
>>>> Avg                2.52%    0.36%
>>>>
>>>> From the above data, implementation D is always better than A on code
>>>> size, which is surprising to me; I'm not sure what the reason for this is.
>>>
>>> D probably inhibits most interesting loop transforms (check SPEC FP
>>> performance).
>>
>> The call to .DEFERRED_INIT is marked as ECF_CONST:
>>
>> /* A function to represent an artificial initialization to an uninitialized
>>    automatic variable. The first argument is the variable itself, the
>>    second argument is the initialization type. */
>> DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>>
>> So, I assume that such a const call should minimize the impact on loop
>> optimizations. But yes, it will still inhibit some of the loop
>> transformations.
>>
>>> It will also most definitely disallow SRA which, when
>>> an aggregate is not completely elided, tends to grow code.
>>
>> Makes sense to me.
>>
>> The run-time performance data for D and A are actually very similar, as I
>> posted in the previous email (listed here again for convenience).
>>
>> Run-time performance overhead with A and D:
>>
>> benchmarks         A/no     D/no
>>
>> 500.perlbench_r    1.25%    1.25%
>> 502.gcc_r          0.68%    1.80%
>> 505.mcf_r          0.68%    0.14%
>> 520.omnetpp_r      4.83%    4.68%
>> 523.xalancbmk_r    0.18%    1.96%
>> 525.x264_r         1.55%    2.07%
>> 531.deepsjeng_r   11.57%   11.85%
>> 541.leela_r        0.64%    0.80%
>> 557.xz_r          -0.41%   -0.41%
>>
>> 507.cactuBSSN_r    0.44%    0.44%
>> 508.namd_r         0.34%    0.34%
>> 510.parest_r       0.17%    0.25%
>> 511.povray_r      56.57%   57.27%
>> 519.lbm_r          0.00%    0.00%
>> 521.wrf_r         -0.28%   -0.37%
>> 526.blender_r     16.96%   17.71%
>> 527.cam4_r         0.70%    0.53%
>> 538.imagick_r      2.40%    2.40%
>> 544.nab_r          0.00%   -0.65%
>>
>> avg                5.17%    5.37%
>>
>> Especially for the SPEC FP benchmarks, I didn't see much performance
>> difference between A and D.
>> I guess that the RTL optimizations might be enough to get rid of most of
>> the overhead introduced by the additional initialization.
>>
>>>
>>>> ******** Stack usage data: I added -fstack-usage to the compilation line
>>>> when compiling the CPU2017 benchmarks, so the *.su files were generated
>>>> for each of the modules.
>>>> Since there are a lot of such files, and the stack size information is
>>>> embedded in each of them, I just picked one benchmark, 511.povray, to
>>>> check, since it is the one that has the most runtime overhead when adding
>>>> initialization (both A and D).
>>>> I identified all the *.su files that differ between A and D and ran a
>>>> diff on those *.su files; it looks like the stack size is much higher
>>>> with D than with A, for example:
>>>>
>>>> $ diff build_base_auto_init.D.0000/bbox.su build_base_auto_init.A.0000/bbox.su
>>>> 5c5
>>>> < bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, pov::BBOX_TREE**&, long int*, long int, long int) 160 static
>>>> ---
>>>> > bbox.cpp:1782:12:int pov::sort_and_split(pov::BBOX_TREE**, pov::BBOX_TREE**&, long int*, long int, long int) 96 static
>>>>
>>>> $ diff build_base_auto_init.D.0000/image.su build_base_auto_init.A.0000/image.su
>>>> 9c9
>>>> < image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 624 static
>>>> ---
>>>> > image.cpp:240:6:void pov::bump_map(double*, pov::TNORMAL*, double*) 272 static
>>>> ….
>>>>
>>>> It looks like implementation D has more stack size impact than A.
>>>> Do you have any insight into the reason for this?
>>>
>>> D will keep all initialized aggregates as aggregates and live, which
>>> means stack will be allocated for them. With A the usual optimizations
>>> to reduce stack usage can be applied.
>>
>> I checked the routine "pov::bump_map" in 511.povray_r, since it has a
>> large stack increase due to implementation D, by examining the IR
>> immediately before the RTL expansion phase
>> (image.cpp.244t.optimized). I found the following additional statements
>> for the array elements:
>>
>> void pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double * normal)
>> {
>>   …
>>   double p3[3];
>>   double p2[3];
>>   double p1[3];
>>   float colour3[5];
>>   float colour2[5];
>>   float colour1[5];
>>   …
>>   # DEBUG BEGIN_STMT
>>   colour1 = .DEFERRED_INIT (colour1, 2);
>>   colour2 = .DEFERRED_INIT (colour2, 2);
>>   colour3 = .DEFERRED_INIT (colour3, 2);
>>   # DEBUG BEGIN_STMT
>>   MEM [(double[3] *)&p1] = p1$0_144(D);
>>   MEM [(double[3] *)&p1 + 8B] = p1$1_135(D);
>>   MEM [(double[3] *)&p1 + 16B] = p1$2_138(D);
>>   p1 = .DEFERRED_INIT (p1, 2);
>>   # DEBUG D#12 => MEM [(double[3] *)&p1]
>>   # DEBUG p1$0 => D#12
>>   # DEBUG D#11 => MEM [(double[3] *)&p1 + 8B]
>>   # DEBUG p1$1 => D#11
>>   # DEBUG D#10 => MEM [(double[3] *)&p1 + 16B]
>>   # DEBUG p1$2 => D#10
>>   MEM [(double[3] *)&p2] = p2$0_109(D);
>>   MEM [(double[3] *)&p2 + 8B] = p2$1_111(D);
>>   MEM [(double[3] *)&p2 + 16B] = p2$2_254(D);
>>   p2 = .DEFERRED_INIT (p2, 2);
>>   # DEBUG D#9 => MEM [(double[3] *)&p2]
>>   # DEBUG p2$0 => D#9
>>   # DEBUG D#8 => MEM [(double[3] *)&p2 + 8B]
>>   # DEBUG p2$1 => D#8
>>   # DEBUG D#7 => MEM [(double[3] *)&p2 + 16B]
>>   # DEBUG p2$2 => D#7
>>   MEM [(double[3] *)&p3] = p3$0_256(D);
>>   MEM [(double[3] *)&p3 + 8B] = p3$1_258(D);
>>   MEM [(double[3] *)&p3 + 16B] = p3$2_260(D);
>>   p3 = .DEFERRED_INIT (p3, 2);
>>   ….
>> }
>>
>> I guess that the above "MEM ….. = …" statements are the ones that make
>> the difference. Which phase introduced them?
>
> Looks like SRA.
> But you can just dump all and grep for the first occurrence.

Yes, it looks like SRA is the one:

image.cpp.035t.esra:   MEM [(double[3] *)&p1] = p1$0_195(D);
image.cpp.035t.esra:   MEM [(double[3] *)&p1 + 8B] = p1$1_182(D);
image.cpp.035t.esra:   MEM [(double[3] *)&p1 + 16B] = p1$2_185(D);

Qing

>
>
>>>
>>>> Let me know if you have any comments and suggestions.
>>>
>>> First of all I would check whether the prototype implementations
>>> work as expected.
>>
>> I have done such checks with small test cases already, checking the IR
>> generated with implementation A or D, mainly focusing on *.c.006t.gimple
>> and *.c.*t.expand; all worked as expected.
>>
>> For CPU2017, for example as above, I also checked the IR for both A and
>> D, and it looks like all worked as expected.
>>
>> Thanks.
>>
>> Qing
>>>
>>> Richard.
>>>
>>>
>>>> thanks.
>>>> Qing
>>>>
>>>> On Jan 13, 2021, at 1:39 AM, Richard Biener wrote:
>>>>
>>>> On Tue, 12 Jan 2021, Qing Zhao wrote:
>>>>
>>>> Hi,
>>>>
>>>> Just checking in to see whether you have any comments and suggestions
>>>> on this:
>>>>
>>>> FYI, I have been continuing with the Approach D implementation since
>>>> last week:
>>>>
>>>> D. Adding calls to .DEFERRED_INIT during gimplification, expanding the
>>>> .DEFERRED_INIT during expand to real initialization. Adjusting the
>>>> uninitialized pass to handle the new refs with ".DEFERRED_INIT".
>>>>
>>>> For the remaining work of Approach D:
>>>>
>>>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>>>> ** complete the uninitialized warnings maintenance work for D.
>>>>
>>>> I have completed the uninitialized warnings maintenance work for D.
>>>> And I have finished part of the -ftrivial-auto-var-init=pattern
>>>> implementation.
>>>>
>>>> The following is the remaining work for Approach D:
>>>>
>>>> ** -ftrivial-auto-var-init=pattern for VLAs;
>>>> ** add a new attribute for variables: __attribute__((uninitialized)),
>>>>    meaning the marked variable is intentionally left uninitialized for
>>>>    performance reasons;
>>>> ** add complete test cases.
>>>>
>>>> Please let me know if you have any objection to my current decision to
>>>> implement Approach D.
>>>>
>>>> Did you do any analysis on how stack usage and code size change
>>>> with approach D? How does compile-time behave (we could gobble up
>>>> lots of .DEFERRED_INIT calls I guess)?
>>>>
>>>> Richard.
>>>>
>>>> Thanks a lot for your help.
>>>>
>>>> Qing
>>>>
>>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches wrote:
>>>>
>>>> Hi,
>>>>
>>>> This is an update for our previous discussion.
>>>>
>>>> 1. I implemented the following two different implementations in the
>>>> latest upstream gcc:
>>>>
>>>> A. Adding real initialization during gimplification, not maintaining
>>>> the uninitialized warnings.
>>>>
>>>> D. Adding calls to .DEFERRED_INIT during gimplification, expanding the
>>>> .DEFERRED_INIT during expand to real initialization. Adjusting the
>>>> uninitialized pass to handle the new refs with ".DEFERRED_INIT".
>>>>
>>>> Note, in this initial implementation,
>>>> ** I ONLY implemented -ftrivial-auto-var-init=zero; the implementation
>>>>    of -ftrivial-auto-var-init=pattern is not done yet. Therefore, the
>>>>    performance data is only about -ftrivial-auto-var-init=zero.
>>>>
>>>> ** I added a temporary option -fauto-var-init-approach=A|B|C|D to
>>>>    choose implementation A or D for the runtime performance study.
>>>> ** I didn't finish the uninitialized warnings maintenance work for D.
>>>>    (That might take more time than I expected.)
>>>>
>>>> 2. I collected runtime data for CPU2017 on an x86 machine with this new
>>>> gcc for the following 3 cases:
>>>>
>>>> no: default (-g -O2 -march=native)
>>>> A:  default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=A
>>>> D:  default + -ftrivial-auto-var-init=zero -fauto-var-init-approach=D
>>>>
>>>> And then computed the slowdown data for both A and D as follows:
>>>>
>>>> benchmarks         A/no     D/no
>>>>
>>>> 500.perlbench_r    1.25%    1.25%
>>>> 502.gcc_r          0.68%    1.80%
>>>> 505.mcf_r          0.68%    0.14%
>>>> 520.omnetpp_r      4.83%    4.68%
>>>> 523.xalancbmk_r    0.18%    1.96%
>>>> 525.x264_r         1.55%    2.07%
>>>> 531.deepsjeng_r   11.57%   11.85%
>>>> 541.leela_r        0.64%    0.80%
>>>> 557.xz_r          -0.41%   -0.41%
>>>>
>>>> 507.cactuBSSN_r    0.44%    0.44%
>>>> 508.namd_r         0.34%    0.34%
>>>> 510.parest_r       0.17%    0.25%
>>>> 511.povray_r      56.57%   57.27%
>>>> 519.lbm_r          0.00%    0.00%
>>>> 521.wrf_r         -0.28%   -0.37%
>>>> 526.blender_r     16.96%   17.71%
>>>> 527.cam4_r         0.70%    0.53%
>>>> 538.imagick_r      2.40%    2.40%
>>>> 544.nab_r          0.00%   -0.65%
>>>>
>>>> avg                5.17%    5.37%
>>>>
>>>> From the above data, we can see that in general the runtime performance
>>>> slowdowns for implementations A and D are similar for individual
>>>> benchmarks.
>>>>
>>>> There are several benchmarks that have significant slowdown with the
>>>> newly added initialization for both A and D, for example 511.povray_r,
>>>> 526.blender_r, and 531.deepsjeng_r. I will try to study a little more
>>>> what kinds of new initializations introduced such slowdown.
>>>>
>>>> From the current study so far, I think that approach D should be good
>>>> enough for our final implementation.
>>>> So, I will try to finish approach D with the following remaining work:
>>>>
>>>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>>>> ** complete the uninitialized warnings maintenance work for D.
>>>>
>>>> Let me know if you have any comments and suggestions on my current and
>>>> future work.
>>>>
>>>> Thanks a lot for your help.
>>>>
>>>> Qing
>>>>
>>>> On Dec 9, 2020, at 10:18 AM, Qing Zhao via Gcc-patches wrote:
>>>>
>>>> The following are the approaches I will implement and compare:
>>>>
>>>> Our final goal is to keep the uninitialized warnings and minimize the
>>>> run-time performance cost.
>>>>
>>>> A. Adding real initialization during gimplification, not maintaining
>>>>    the uninitialized warnings.
>>>> B. Adding real initialization during gimplification, marking it with
>>>>    "artificial_init". Adjusting the uninitialized pass, maintaining the
>>>>    annotation, making sure the real init is not deleted as a fake init.
>>>> C. Marking the DECL for an uninitialized auto variable as
>>>>    "no_explicit_init" during gimplification, maintaining this
>>>>    "no_explicit_init" bit until after pass_late_warn_uninitialized, or
>>>>    until pass_expand, then adding real initialization for all DECLs
>>>>    that are marked with "no_explicit_init".
>>>> D.
Adding .DEFERRED_INIT calls during gimplification,
>>>>    expanding the .DEFERRED_INIT during expand to real initialization.
>>>>    Adjusting the uninitialized pass to handle the new refs with
>>>>    ".DEFERRED_INIT".
>>>>
>>>> In the above, approach A will be the one that has the minimum run-time
>>>> cost; it will be the base for the performance comparison.
>>>>
>>>> I will implement approach D first. This one is expected to have the
>>>> most run-time overhead among the above list, but its implementation
>>>> should be the cleanest among B, C, and D. Let's see how much more
>>>> performance overhead this approach will have. If the data is good,
>>>> maybe we can avoid the effort to implement B and C.
>>>>
>>>> If the performance of D is not good, I will implement B or C at that
>>>> time.
>>>>
>>>> Let me know if you have any comments or suggestions.
>>>>
>>>> Thanks.
>>>>
>>>> Qing
>>>>
>>>> --
>>>> Richard Biener
>>>> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
>>>> Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)