public inbox for gcc-help@gcc.gnu.org
* Condition execution optimization with gcc 7.5
@ 2023-05-09  7:54 Benjamin Minguez
  2023-05-09  9:49 ` Kyrylo Tkachov
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Minguez @ 2023-05-09  7:54 UTC
  To: gcc-help

Hello everyone,

I'm trying to optimize an application that contains a lot of branches. I'm targeting armv8 processors and I'm using GCC 7.5.0 for compatibility reasons.
As the original application is similar to NGINX, I investigated NGINX itself. I'm focusing on the HTTP header parsing. Basically, the algorithm parses the input byte by byte and, depending on each value, stores some variables.
Here is an example from /src/http/ngx_http_parse.c, in ngx_http_parse_header_line:
                if (c) {
                    hash = ngx_hash(0, c);
                    r->lowcase_header[0] = c;
                    i = 1;
                    break;
                }

                if (ch == '_') {
                    if (allow_underscores) {
                        hash = ngx_hash(0, ch);
                        r->lowcase_header[0] = ch;
                        i = 1;

                    } else {
                        r->invalid_header = 1;
                    }

                    break;
                }
Also, most of the branches are not predictable because they compare against data coming from the network.
From these observations, I looked at the conditional execution optimization step in GCC and found the function that should do the work: cond_exec_find_if_block. I also found how the decision to use conditional instructions can be customized:
                #define MAX_CONDITIONAL_EXECUTE arm_max_conditional_execute ()
                int
                arm_max_conditional_execute (void)
                {
                  return max_insns_skipped;
                }
                static int max_insns_skipped = 5;

I tried to compile NGINX at -O2 (which should enable if-conversion2) but I did not notice any change in the code. I enabled the GCC dumps (-da), added some debug output to this function, and figured out that targetm.have_conditional_execution is set to false.

First, do you know how to switch this variable to true? I guess it is an option during the configuration step of GCC.
Then, I know that the decision to use conditional execution is based on the extra cost of computing both branches compared to the cost of a branch. In this specific case, branches are mispredicted and the cost is, indeed, high. Do you think that increasing max_insns_skipped will be enough to help GCC use conditional execution?

Thank you in advance for your answers.

Best,
Benjamin Minguez


* RE: Condition execution optimization with gcc 7.5
  2023-05-09  7:54 Condition execution optimization with gcc 7.5 Benjamin Minguez
@ 2023-05-09  9:49 ` Kyrylo Tkachov
  2023-05-10  6:42   ` Benjamin Minguez
  0 siblings, 1 reply; 8+ messages in thread
From: Kyrylo Tkachov @ 2023-05-09  9:49 UTC
  To: Benjamin Minguez, gcc-help

Hi Benjamin,

> -----Original Message-----
> From: Gcc-help <gcc-help-bounces+kyrylo.tkachov=arm.com@gcc.gnu.org>
> On Behalf Of Benjamin Minguez via Gcc-help
> Sent: Tuesday, May 9, 2023 8:54 AM
> To: gcc-help@gcc.gnu.org
> Subject: Condition execution optimization with gcc 7.5
> 
> Hello everyone,
> 
> I'm trying to optimize an application that contains a lot of branches. I'm
> targeting armv8 processors and I'm using GCC 7.5.0 for compatibility reason.

Of course GCC 7.5 is quite old now but if you're forced to use it...

> As the original application is similar to NGINX, I investigated on NGINX. I'm
> focusing on the HTTP header parsing. Basically, the algorithm parse byte per
> byte and based on the value stores some variables.
> Here is an example, /src/http/ngx_http_parse.c: ngx_http_parse_header_line
>                 if (c) {
>                     hash = ngx_hash(0, c);
>                     r->lowcase_header[0] = c;
>                     i = 1;
>                     break;
>                 }
> 
>                 if (ch == '_') {
>                     if (allow_underscores) {
>                         hash = ngx_hash(0, ch);
>                         r->lowcase_header[0] = ch;
>                         i = 1;
> 
>                     } else {
>                         r->invalid_header = 1;
>                     }
> 
>                     break;
>                 }
> Also, most of branches are not predictable because it compares against data
> coming from the network.
> From these observations, I looked at the conditional execution optimization
> step in GCC and I found this function that should do the work:
> cond_exec_find_if_block. And how to customize the decision to use
> conditional instructions:

... This relates to the arm port, i.e. the 32-bit target in Armv8-A; is that what you're targeting?
AArch64 has had more tuning work put into it over the years, so it may do better performance-wise if your processor and environment support it.
If you're indeed looking at arm...

>                 #define MAX_CONDITIONAL_EXECUTE
> arm_max_conditional_execute ()
>                 int
>                 arm_max_conditional_execute (void)
>                 {
>                   return max_insns_skipped;
>                 }
>                 static int max_insns_skipped = 5;
> 
> I tried to compile NGNIX in -O2 (that should enable if-conversion2) but I did
> not noticed any change in the code. I enable GCC debug (-da) and also add
> some debug in this function and I figure out that
> targetm.have_conditional_execution is set to false.
> 
> First, do you how to switch this variable to true. I guess it is an option during
> the configuration step of GCC.

Its definition on that branch is:
/* Only thumb1 can't support conditional execution, so return true if
   the target is not thumb1.  */
static bool
arm_have_conditional_execution (void)
{
  return !TARGET_THUMB1;
}

So it looks like you're maybe not setting the right -march or -mcpu option to enable the full armv8-a features?

Thanks,
Kyrill

> Then, I know  that the decision to use conditional execution is based on the
> extra cost added to compute both branches compare to the cost of a branch.
> In this specific case, branches are miss predicted and the cost is, indeed, high.
> Do you think that increasing the max_insns_skipped will be enough to help
> GCC to use conditional execution?
> 
> Thank you in advance for your answers.
> 
> Best,
> Benjamin Minguez
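
A quick way to settle the question raised above (arm or AArch64?) is to ask the compiler itself. Running gcc -dumpmachine prints the target triplet, and the macros that GCC predefines tell the two ports apart. The following is a minimal sketch, assuming a GCC-compatible compiler; detect_target, target_name and print_target are hypothetical helper names, not part of GCC or NGINX.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical classification of the compilation target. */
enum target_kind {
    TARGET_AARCH64,        /* 64-bit port */
    TARGET_ARM_THUMB1,     /* 32-bit port, Thumb-1 only */
    TARGET_ARM_OR_THUMB2,  /* 32-bit port, Arm or Thumb-2 state */
    TARGET_OTHER           /* anything else, e.g. an x86 host */
};

/* Decide at compile time from predefined macros which port built this file. */
static enum target_kind detect_target(void)
{
#if defined(__aarch64__)
    return TARGET_AARCH64;
#elif defined(__arm__) && defined(__thumb__) && !defined(__thumb2__)
    return TARGET_ARM_THUMB1;
#elif defined(__arm__)
    return TARGET_ARM_OR_THUMB2;
#else
    return TARGET_OTHER;
#endif
}

/* Human-readable description of each case. */
static const char *target_name(enum target_kind kind)
{
    switch (kind) {
    case TARGET_AARCH64:       return "aarch64: selects such as CSEL only";
    case TARGET_ARM_THUMB1:    return "arm (Thumb-1): no conditional execution";
    case TARGET_ARM_OR_THUMB2: return "arm (Arm/Thumb-2): conditional execution available";
    case TARGET_OTHER:         return "not an Arm target";
    }
    assert(!"unknown target_kind");
    return "unknown";
}

/* Print the result, e.g. from a small test program built with the same flags. */
static void print_target(void)
{
    printf("%s\n", target_name(detect_target()));
}
```

Building this with exactly the flags used for the real application shows which of the two back ends, and therefore which if-conversion machinery, is actually in play.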


* RE: Condition execution optimization with gcc 7.5
  2023-05-09  9:49 ` Kyrylo Tkachov
@ 2023-05-10  6:42   ` Benjamin Minguez
  2023-05-17  8:17     ` Benjamin Minguez
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Minguez @ 2023-05-10  6:42 UTC
  To: Kyrylo Tkachov, gcc-help

Hi,

Thanks for the answer.

I had looked at the wrong function definition, in gcc-7.5.0/gcc/target.def:
	DEFHOOK
	(have_conditional_execution,
	 "This target hook returns true if the target supports conditional execution.\n\
	This target hook is required only when the target has several different\n\
	modes and they have different conditional execution capability, such as ARM.",
	 bool, (void),
	 default_have_conditional_execution)
and found this one in gcc-7.5.0/gcc/targhooks.c:
	bool
	default_have_conditional_execution (void)
	{
	  return HAVE_conditional_execution;
	}
Finally, the macro HAVE_conditional_execution is defined in build-gcc/gcc/insn-config.h.

I will investigate the -march or -mcpu option.

Again, thanks a lot,

Benjamin Minguez


* RE: Condition execution optimization with gcc 7.5
  2023-05-10  6:42   ` Benjamin Minguez
@ 2023-05-17  8:17     ` Benjamin Minguez
  2023-05-18 11:02       ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Minguez @ 2023-05-17  8:17 UTC
  To: Kyrylo Tkachov, gcc-help

Hello,

I added -march=armv8-a (and the other armv8.*-a variants) to the GCC command line, but it looks like the conditional execution optimization, the cond_exec_find_if_block function, is never called. I enabled all GCC dumps (the -da option) and this function's debug messages are never printed.
In parallel, I also tried different versions of GCC, 9.5.0 and 11.3.0, and again I got the same results.

Do you have any idea why this optimization step is not called?

Thank you in advance for your help.

Best,
Benjamin Minguez


* Re: Condition execution optimization with gcc 7.5
  2023-05-17  8:17     ` Benjamin Minguez
@ 2023-05-18 11:02       ` Richard Earnshaw (lists)
  2023-05-22 15:43         ` Benjamin Minguez
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Earnshaw (lists) @ 2023-05-18 11:02 UTC
  To: Benjamin Minguez, Kyrylo Tkachov, gcc-help

On 17/05/2023 09:17, Benjamin Minguez via Gcc-help wrote:
> Hello,
> 
> I did add -march=armv8-a (and the others armv8.*-a) to GCC command line, but it looks like the conditional execution optimization, cond_exec_find_if_block function, is never called. I enabled all gcc dumps (-da option) and this function debug message are never printed.

Just to be certain, are you compiling for aarch32 (arm/thumb), or 
aarch64?  The latter does not support conditional execution, except via 
instructions such as CSEL.

[more comments lower down]

> In parallel, I also try  with different version of GCC: 9.5.0 and 11.3.0, and again the I had the same results.
> 
>   Do you have any idea why the this optimization step is not called?
> 
>> Here is an example, /src/http/ngx_http_parse.c: ngx_http_parse_header_line
>>                  if (c) {
>>                      hash = ngx_hash(0, c);
>>                      r->lowcase_header[0] = c;
>>                      i = 1;
>>                      break;
>>                  }
>>
>>                  if (ch == '_') {
>>                      if (allow_underscores) {
>>                          hash = ngx_hash(0, ch);
>>                          r->lowcase_header[0] = ch;
>>                          i = 1;
>>
>>                      } else {
>>                          r->invalid_header = 1;
>>                      }
>>
>>                      break;
>>                  }

Your example code isn't complete enough to do a full analysis, but I 
doubt code like this would generate conditional execution anyway.  There 
are several reasons:

1) It's likely too long once machine instructions are generated
2) There are function calls (ngx_hash) in the body of the conditional 
blocks (calls cannot be conditionally executed); if they are inlined 
then see 1) above.
3) you have nested conditions (only the innermost block could be 
conditionally executed).
4) you wouldn't want to conditionally execute 'if (allow_underscores)' 
anyway as it's probably highly predictable as a branch.

R.
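
Reasons 1) to 3) suggest what an if-conversion-friendly version looks like: a short, call-free, non-nested body that only updates values held in registers. A minimal sketch, assuming a hypothetical toy_hash in place of ngx_hash and a hypothetical parse_state struct passed by value; whether the compiler actually emits a conditional select for step_select depends on the GCC version, the options and the cost model, so the generated assembly should be checked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the hash step; not the NGINX definition. */
static inline uint32_t toy_hash(uint32_t key, uint8_t c)
{
    return key * 31u + c;
}

/* Hypothetical parser state kept entirely in scalars (no memory stores). */
struct parse_state {
    uint32_t hash;
    uint32_t i;
};

/* Branchy form: the body runs only when c is non-zero. */
static struct parse_state step_branch(struct parse_state s, uint8_t c)
{
    if (c) {
        s.hash = toy_hash(0, c);
        s.i = 1;
    }
    return s;
}

/* Select form: compute the candidate value unconditionally, then pick.
   Both arms are cheap and free of side effects, so the compiler is
   free (but not obliged) to use conditional selects here. */
static struct parse_state step_select(struct parse_state s, uint8_t c)
{
    uint32_t candidate = toy_hash(0, c);
    s.hash = c ? candidate : s.hash;
    s.i = c ? 1u : s.i;
    return s;
}

/* Confirm that both forms agree for every possible byte value. */
static void check_equivalence(void)
{
    struct parse_state init = { 12345u, 0u };
    for (unsigned c = 0; c < 256; c++) {
        struct parse_state a = step_branch(init, (uint8_t)c);
        struct parse_state b = step_select(init, (uint8_t)c);
        assert(a.hash == b.hash && a.i == b.i);
    }
}
```

The store to r->lowcase_header[0] is deliberately left out of this sketch, since a store to memory cannot be handled the same way.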


* RE: Condition execution optimization with gcc 7.5
  2023-05-18 11:02       ` Richard Earnshaw (lists)
@ 2023-05-22 15:43         ` Benjamin Minguez
  2023-05-22 16:12           ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Minguez @ 2023-05-22 15:43 UTC
  To: Richard Earnshaw (lists), Kyrylo Tkachov, gcc-help

Hello Richard,

I'm compiling for aarch64. Indeed, I was expecting conversion via conditional move or set.
I understand that code such as the NGINX HTTP parser is not suitable for such conversion. But I was expecting that, for example, this code could benefit from it (ngx_hash is an inline function and is a simple xor operation):
>>                  if (c) {
>>                      hash = ngx_hash(0, c);
>>                      r->lowcase_header[0] = c;
>>                      i = 1;
>>                      break;
>>                  }

Thanks for your help and your answers.

Best,
Benjamin Minguez


* Re: Condition execution optimization with gcc 7.5
  2023-05-22 15:43         ` Benjamin Minguez
@ 2023-05-22 16:12           ` Richard Earnshaw (lists)
  2023-05-23  6:36             ` Benjamin Minguez
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Earnshaw (lists) @ 2023-05-22 16:12 UTC
  To: Benjamin Minguez, Kyrylo Tkachov, gcc-help

On aarch64 this code cannot use conditional select.  An operation such as
	if (c) {
	  ...
	  r->lowcase_header[0] = c;
	  ...
	}

would be a conditional store to memory and can only happen if the 
guarding condition is true.  It's not safe to convert this into, say

	cmp c, #0
	...
	ldr w1, [ptr]
	csel w1, w1, c, eq
	str w1, [ptr]

because the store would introduce a possible race with any other thread 
that might be writing to the same location.  The compiler would also 
have to prove that ptr always contained a valid address when 'c' was 
false as well, something that might not be possible given the 
information available.
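
To make the distinction concrete, here is a minimal sketch in C. It is not taken from NGINX or GCC: struct req and the three function names are hypothetical stand-ins. The first function is the conditional-store shape discussed above, the second only selects between values held in locals, and the third shows what the source would have to say before an unconditional store becomes legal.

```c
#include <stdint.h>

/* Hypothetical stand-in for the parser state; not the real NGINX type. */
struct req {
    uint8_t lowcase_header[32];
};

/* Conditional store: memory is written only when c != 0.  Rewriting this
   as load/select/store would write r->lowcase_header[0] even when c == 0,
   which the source never does. */
void store_if_nonzero(struct req *r, uint8_t c)
{
    if (c) {
        r->lowcase_header[0] = c;
    }
}

/* Select between values held in locals: no memory is written, so a
   conditional select is allowed.  Whether the compiler actually emits
   one depends on its version and cost model. */
uint32_t select_hash(uint32_t hash, uint8_t c)
{
    return c ? (uint32_t) c : hash;
}

/* Source-level variant: the programmer makes the load and the store
   unconditional and selects only the value.  This is a different program
   from store_if_nonzero: it always reads and writes the location, so it
   is only appropriate when r is always valid and no other thread can
   write the same byte. */
void store_selected(struct req *r, uint8_t c)
{
    uint8_t old = r->lowcase_header[0];
    r->lowcase_header[0] = c ? c : old;
}
```

Comparing the output of -O2 -S for these three functions on an aarch64 compiler is a quick way to see which shapes the if-conversion passes are willing to touch.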

The function arm_max_conditional_execute is only used for 32-bit arm 
targets.  It's not part of the aarch64 compiler.
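
If it is unclear which of the two back ends a given toolchain is using, the predefined macros settle it. The sketch below assumes only the macros GCC itself defines (__aarch64__, __arm__, __thumb__, __thumb2__); the function name arm_target_name is made up for the example.

```c
/* Report which Arm back end this translation unit is compiled for,
   using GCC's predefined target macros.  arm_target_name is a
   hypothetical helper, not part of GCC or NGINX. */
const char *arm_target_name(void)
{
#if defined(__aarch64__)
    return "aarch64";            /* no cond_exec; CSEL and friends only */
#elif defined(__arm__) && defined(__thumb__) && !defined(__thumb2__)
    return "arm (Thumb-1)";      /* the !TARGET_THUMB1 case */
#elif defined(__arm__) && defined(__thumb__)
    return "arm (Thumb-2)";
#elif defined(__arm__)
    return "arm (A32)";
#else
    return "not an Arm target";
#endif
}
```

The same information is available without writing any code by dumping the predefined macros, e.g. gcc -dM -E - < /dev/null, and looking for __aarch64__ or __arm__.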

R.

On 22/05/2023 16:43, Benjamin Minguez via Gcc-help wrote:
> Hello Richard,
> 
> I'm compiling for aarch64. Indeed, I was expecting conversion via conditional move or set.
> I understand that code such as the NGINX HTTP parser is not suitable for such conversion. But I was expecting that, for example, this code could benefit from it (ngx_hash is an inline function and is a simple xor operation):
>>>                   if (c) {
>>>                       hash = ngx_hash(0, c);
>>>                       r->lowcase_header[0] = c;
>>>                       i = 1;
>>>                       break;
>>>                   }
> 
> Thanks for your help and your answers.
> 
> Best,
> Benjamin Minguez
> 
> -----Original Message-----
> From: Richard Earnshaw (lists) <Richard.Earnshaw@arm.com>
> Sent: Thursday, May 18, 2023 1:02 PM
> To: Benjamin Minguez <benjamin.minguez@huawei.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc-help@gcc.gnu.org
> Subject: Re: Condition execution optimization with gcc 7.5
> 
> On 17/05/2023 09:17, Benjamin Minguez via Gcc-help wrote:
>> Hello,
>>
>> I did add -march=armv8-a (and the other armv8.*-a variants) to the GCC command line, but it looks like the conditional execution optimization, the cond_exec_find_if_block function, is never called. I enabled all GCC dumps (-da option) and this function's debug messages are never printed.
> 
> Just to be certain, are you compiling for aarch32 (arm/thumb), or aarch64?  The latter does not support conditional execution, except via instructions such as CSEL.
> 
> [more comments lower down]
> 
>> In parallel, I also tried different versions of GCC, 9.5.0 and 11.3.0, and again I had the same results.
>>
>> Do you have any idea why this optimization step is not called?
>>
>> Thank you in advance for your help.
>>
>> Best,
>> Benjamin Minguez
>>
>> -----Original Message-----
>> From: Benjamin Minguez
>> Sent: Wednesday, May 10, 2023 8:43 AM
>> To: 'Kyrylo Tkachov' <Kyrylo.Tkachov@arm.com>; gcc-help@gcc.gnu.org
>> Subject: RE: Condition execution optimization with gcc 7.5
>>
>> Hi,
>>
>> Thanks for the answer.
>>
>> I had a look at the wrong function definition, gcc-7.5.0/gcc/target.def:
>> 	DEFHOOK
>> 	(have_conditional_execution,
>> 	 "This target hook returns true if the target supports conditional execution.\n\
>> 	This target hook is required only when the target has several different\n\
>> 	modes and they have different conditional execution capability, such as ARM.",
>> 	 bool, (void),
>> 	 default_have_conditional_execution)
>> and found this one, gcc-7.5.0/gcc/targhooks.c:
>> 	bool
>> 	default_have_conditional_execution (void)
>> 	{
>> 	  return HAVE_conditional_execution;
>> 	}
>> Finally, the macro HAVE_conditional_execution is defined here:
>> build-gcc/gcc/insn-config.h.
>>
>> I will investigate the -march or -mcpu option.
>>
>> Again, thanks a lot,
>>
>> Benjamin Minguez
>>
>> -----Original Message-----
>> From: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> Sent: Tuesday, May 9, 2023 11:50 AM
>> To: Benjamin Minguez <benjamin.minguez@huawei.com>;
>> gcc-help@gcc.gnu.org
>> Subject: RE: Condition execution optimization with gcc 7.5
>>
>> Hi Benjamin,
>>
>>> -----Original Message-----
>>> From: Gcc-help <gcc-help-bounces+kyrylo.tkachov=arm.com@gcc.gnu.org>
>>> On Behalf Of Benjamin Minguez via Gcc-help
>>> Sent: Tuesday, May 9, 2023 8:54 AM
>>> To: gcc-help@gcc.gnu.org
>>> Subject: Condition execution optimization with gcc 7.5
>>>
>>> Hello everyone,
>>>
>>> I'm trying to optimize an application that contains a lot of branches.
>>> I'm targeting armv8 processors and I'm using GCC 7.5.0 for compatibility reasons.
>>
>> Of course GCC 7.5 is quite old now but if you're forced to use it...
>>
>>> As the original application is similar to NGINX, I investigated
>>> NGINX. I'm focusing on the HTTP header parsing. Basically, the
>>> algorithm parses byte by byte and, based on the value, stores some variables.
>>> Here is an example, /src/http/ngx_http_parse.c: ngx_http_parse_header_line
>>>                   if (c) {
>>>                       hash = ngx_hash(0, c);
>>>                       r->lowcase_header[0] = c;
>>>                       i = 1;
>>>                       break;
>>>                   }
>>>
>>>                   if (ch == '_') {
>>>                       if (allow_underscores) {
>>>                           hash = ngx_hash(0, ch);
>>>                           r->lowcase_header[0] = ch;
>>>                           i = 1;
>>>
>>>                       } else {
>>>                           r->invalid_header = 1;
>>>                       }
>>>
>>>                       break;
>>>                   }
> 
> Your example code isn't complete enough to do a full analysis, but I doubt code like this would generate conditional execution anyway.  There are several reasons:
> 
> 1) It's likely too long once machine instructions are generated
> 2) There are function calls (ngx_hash) in the body of the conditional blocks (calls cannot be conditionally executed); if they are inlined then see 1) above.
> 3) you have nested conditions (only the innermost block could be conditionally executed).
> 4) you wouldn't want to conditionally execute 'if (allow_underscores)'
> anyway as it's probably highly predictable as a branch.
> 
> R.
> 
>>> Also, most of the branches are not predictable because they compare against
>>> data coming from the network.
>>> From these observations, I looked at the conditional execution
>>> optimization step in GCC and I found this function that should do the work:
>>> cond_exec_find_if_block. I also found how to customize the decision to use
>>> conditional instructions:
>>
>> ... This relates to the arm port i.e. the 32-bit target in Armv8-a, is that what you're targeting?
>> AArch64 has had more tuning work put into it over the years so may do better performance-wise if your processor and environment support it.
>> If you're indeed looking at arm...
>>
>>>                   #define MAX_CONDITIONAL_EXECUTE arm_max_conditional_execute ()
>>>                   int
>>>                   arm_max_conditional_execute (void)
>>>                   {
>>>                     return max_insns_skipped;
>>>                   }
>>>                   static int max_insns_skipped = 5;
>>>
>>> I tried to compile NGINX at -O2 (which should enable if-conversion2)
>>> but I did not notice any change in the code. I enabled GCC debug dumps (-da)
>>> and also added some debug output to this function, and I figured out that
>>> targetm.have_conditional_execution is set to false.
>>>
>>> First, do you know how to switch this variable to true? I guess it is an
>>> option during the configuration step of GCC.
>>
>> Its definition on that branch is:
>> /* Only thumb1 can't support conditional execution, so return true if
>>      the target is not thumb1.  */
>> static bool
>> arm_have_conditional_execution (void)
>> {
>>     return !TARGET_THUMB1;
>> }
>>
>> So it looks like you're maybe not setting the right -march or -mcpu option to enable the full armv8-a features?
>>
>> Thanks,
>> Kyrill
>>
>>> Then, I know that the decision to use conditional execution is based
>>> on the extra cost of computing both branches compared to the cost of a branch.
>>> In this specific case, branches are mispredicted and the cost is, indeed, high.
>>> Do you think that increasing max_insns_skipped will be enough to
>>> help GCC use conditional execution?
>>>
>>> Thank you in advance for your answers.
>>>
>>> Best,
>>> Benjamin Minguez
> 
> R.
> 
> 



* RE: Condition execution optimization with gcc 7.5
  2023-05-22 16:12           ` Richard Earnshaw (lists)
@ 2023-05-23  6:36             ` Benjamin Minguez
  0 siblings, 0 replies; 8+ messages in thread
From: Benjamin Minguez @ 2023-05-23  6:36 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Kyrylo Tkachov, gcc-help

Hello,

Thanks for the answer; it is very clear to me now.

Again, thanks a lot.
Best,
Benjamin


Thread overview: 8+ messages
2023-05-09  7:54 Condition execution optimization with gcc 7.5 Benjamin Minguez
2023-05-09  9:49 ` Kyrylo Tkachov
2023-05-10  6:42   ` Benjamin Minguez
2023-05-17  8:17     ` Benjamin Minguez
2023-05-18 11:02       ` Richard Earnshaw (lists)
2023-05-22 15:43         ` Benjamin Minguez
2023-05-22 16:12           ` Richard Earnshaw (lists)
2023-05-23  6:36             ` Benjamin Minguez
