From: Benjamin Minguez
To: Kyrylo Tkachov, "gcc-help@gcc.gnu.org"
Subject: RE: Condition execution optimization with gcc 7.5
Date: Wed, 10 May 2023 06:42:34 +0000

Hi,

Thank you
for the answer. I had looked at the wrong function definition,
gcc-7.5.0/gcc/target.def:

DEFHOOK
(have_conditional_execution,
 "This target hook returns true if the target supports conditional execution.\n\
This target hook is required only when the target has several different\n\
modes and they have different conditional execution capability, such as ARM.",
 bool, (void),
 default_have_conditional_execution)

and found this one, gcc-7.5.0/gcc/targhooks.c:

bool
default_have_conditional_execution (void)
{
  return HAVE_conditional_execution;
}

Finally, the macro HAVE_conditional_execution is defined here:
build-gcc/gcc/insn-config.h.

I will investigate the -march or -mcpu option.

Again, thanks a lot,
Benjamin Minguez

-----Original Message-----
From: Kyrylo Tkachov
Sent: Tuesday, May 9, 2023 11:50 AM
To: Benjamin Minguez; gcc-help@gcc.gnu.org
Subject: RE: Condition execution optimization with gcc 7.5

Hi Benjamin,

> -----Original Message-----
> From: Gcc-help
> On Behalf Of Benjamin Minguez via Gcc-help
> Sent: Tuesday, May 9, 2023 8:54 AM
> To: gcc-help@gcc.gnu.org
> Subject: Condition execution optimization with gcc 7.5
>
> Hello everyone,
>
> I'm trying to optimize an application that contains a lot of branches.
> I'm targeting armv8 processors and I'm using GCC 7.5.0 for compatibility
> reasons.

Of course GCC 7.5 is quite old now but if you're forced to use it...

> As the original application is similar to NGINX, I investigated NGINX.
> I'm focusing on the HTTP header parsing. Basically, the algorithm parses
> byte by byte and, based on the value, stores some variables.
> Here is an example, /src/http/ngx_http_parse.c: ngx_http_parse_header_line
>             if (c) {
>                 hash = ngx_hash(0, c);
>                 r->lowcase_header[0] = c;
>                 i = 1;
>                 break;
>             }
>
>             if (ch == '_') {
>                 if (allow_underscores) {
>                     hash = ngx_hash(0, ch);
>                     r->lowcase_header[0] = ch;
>                     i = 1;
>
>                 } else {
>                     r->invalid_header = 1;
>                 }
>
>                 break;
>             }
> Also, most of the branches are not predictable because they compare against
> data coming from the network.
> From these observations, I looked at the conditional execution
> optimization step in GCC and I found this function that should do the work:
> cond_exec_find_if_block. And how to customize the decision to use
> conditional instructions:

... This relates to the arm port, i.e. the 32-bit target in Armv8-a. Is that what you're targeting?
AArch64 has had more tuning work put into it over the years, so it may do better performance-wise if your processor and environment support it.
If you're indeed looking at arm...

> #define MAX_CONDITIONAL_EXECUTE arm_max_conditional_execute ()
> int
> arm_max_conditional_execute (void)
> {
>   return max_insns_skipped;
> }
> static int max_insns_skipped = 5;
>
> I tried to compile NGINX with -O2 (that should enable if-conversion2)
> but I did not notice any change in the code. I enabled GCC debug dumps (-da)
> and also added some debug output in this function, and I figured out that
> targetm.have_conditional_execution is set to false.
>
> First, do you know how to switch this variable to true? I guess it is an
> option during the configuration step of GCC.

Its definition on that branch is:

/* Only thumb1 can't support conditional execution, so return true if
   the target is not thumb1.  */
static bool
arm_have_conditional_execution (void)
{
  return !TARGET_THUMB1;
}

So it looks like you're maybe not setting the right -march or -mcpu option to enable the full armv8-a features?
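
One way to see which instruction set a given set of flags really selects is to ask the preprocessor. The following is only a minimal sketch: it assumes the predefined macros __aarch64__, __arm__, __thumb__ and __thumb2__ as GCC provides them, and the function name target_isa is made up for illustration.

```c
/* Report which instruction set the compiler is targeting, based only on
   predefined macros.  target_isa is a hypothetical helper name.  */
const char *
target_isa (void)
{
#if defined(__aarch64__)
  /* 64-bit Armv8-A: a separate backend from the arm port.  */
  return "AArch64";
#elif defined(__arm__) && defined(__thumb__) && !defined(__thumb2__)
  /* Thumb state without Thumb-2: the TARGET_THUMB1 case.  */
  return "arm, Thumb-1";
#elif defined(__arm__) && defined(__thumb__)
  return "arm, Thumb-2";
#elif defined(__arm__)
  return "arm, A32";
#else
  return "not an Arm target";
#endif
}
```

Built with the same -march/-mcpu/-mthumb flags as the NGINX build and printed with puts (target_isa ()), only the "arm, Thumb-1" result corresponds to arm_have_conditional_execution returning false. If it reports "AArch64", the arm port's hook quoted above is not involved at all.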
Thanks,
Kyrill

> Then, I know that the decision to use conditional execution is based
> on the extra cost added to compute both branches compared to the cost of a branch.
> In this specific case, branches are mispredicted and the cost is, indeed, high.
> Do you think that increasing max_insns_skipped will be enough to
> help GCC use conditional execution?
>
> Thank you in advance for your answers.
>
> Best,
> Benjamin Minguez
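
Independently of max_insns_skipped, the underscore case from the quoted snippet can also be written so that each field is chosen by a select rather than by a branch. The following is only a sketch under stated assumptions: struct hdr_state and both function names are hypothetical stand-ins (not the real nginx types), and ngx_hash(0, ch) is assumed to reduce to ch.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the parser state touched by the
   quoted snippet; this is not the real ngx_http_request_t.  */
struct hdr_state
{
  uint32_t hash;
  unsigned char lowcase0;
  unsigned i;
  unsigned invalid_header;
};

/* Branchy form, mirroring the quoted underscore case.  */
void
underscore_branchy (struct hdr_state *s, unsigned char ch, int allow)
{
  if (allow)
    {
      s->hash = ch;		/* assumed equal to ngx_hash (0, ch) */
      s->lowcase0 = ch;
      s->i = 1;
    }
  else
    {
      s->invalid_header = 1;
    }
}

/* Select form: every field is stored unconditionally and the new value is
   picked with ?:, so no store depends on a branch.  */
void
underscore_select (struct hdr_state *s, unsigned char ch, int allow)
{
  int ok = (allow != 0);

  s->hash = ok ? ch : s->hash;
  s->lowcase0 = ok ? ch : s->lowcase0;
  s->i = ok ? 1u : s->i;
  s->invalid_header = ok ? s->invalid_header : 1u;
}
```

Both functions leave the state identical for either value of allow. The select form always writes all four fields, which is a change the compiler is generally reluctant to make on its own because it adds stores on paths that had none. Whether the result actually uses conditional instructions depends on the target and flags, so the generated assembly (for example with -S) is the thing to check.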