From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk1-x72b.google.com (mail-qk1-x72b.google.com [IPv6:2607:f8b0:4864:20::72b]) by sourceware.org (Postfix) with ESMTPS id AAD25385802B for ; Wed, 5 May 2021 17:59:42 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org AAD25385802B Received: by mail-qk1-x72b.google.com with SMTP id i17so2411847qki.3 for ; Wed, 05 May 2021 10:59:42 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Zqj/X6GBdwgG+Q7r45jBVdL2/82V1l+YlWJD9RO9aqM=; b=nMMw7rZ1tvhDcFWaSc4xFdSvOhsCdH+TB8pHdYqWjnBWMnWPBfc7scXbL4GJpZ2JgX 6KOGhsc9IiO+8t/jJpQxQENyJ6KoV/ZvEFKBvwXcKCU02UzCD8oblGUP4Yu8sA1O4eE8 YmJnGHVI6hoNNCbAnHfVnRGTd+I3tzomrZthqL3WHDr4hWdrLd30zvk+gJZni2W6Gp5H 9JHhyogScvn0ePVb1WVWKOwpPWy4EHYTxAir+4xOduIBZYqbKapvXIJhO6ljNJ4bUlqe X3ldDIz6E8BduO3sk+vdEr6n3BqkNBLVUXoKAH+vofLgWLPbK436wUd3dTFX3uzDmib9 4nzQ== X-Gm-Message-State: AOAM533C/6H0kH+bXSba34EeZFeHIQZeDzdOddZXxtHdO+VBYJTRzf1D 9C8LE6pC2WwbVcZy5m8HjRS+8zbdrtNFXExaEQQ= X-Google-Smtp-Source: ABdhPJxqfRuJUYTOmlpLs+pNLKyQQ6ad06RdTltd+BKHM0aNsxyT+8sHEDksYR9BllVdBNBTf1nPLGIgb+aKRlI9+6U= X-Received: by 2002:a37:9e0a:: with SMTP id h10mr26905683qke.61.1620237582015; Wed, 05 May 2021 10:59:42 -0700 (PDT) MIME-Version: 1.0 References: <20210503083948.449-1-vanyacpp@gmail.com> In-Reply-To: From: Uros Bizjak Date: Wed, 5 May 2021 19:59:31 +0200 Message-ID: Subject: Re: [PATCH v2] x86: Build only one __cpu_model/__cpu_features2 variables To: "H.J. Lu" Cc: Richard Biener , Ivan Sorokin , GCC Patches Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 May 2021 17:59:46 -0000 On Wed, May 5, 2021 at 4:50 PM H.J. Lu wrote: > > On Wed, May 05, 2021 at 09:36:16AM +0200, Richard Biener wrote: > > On Mon, May 3, 2021 at 11:31 AM Ivan Sorokin via Gcc-patches > > wrote: > > > > > > Prior to this commit GCC -O2 generated quite bad code for this > > > function: > > > > > > bool f() > > > { > > > return __builtin_cpu_supports("popcnt") > > > && __builtin_cpu_supports("ssse3"); > > > } > > > > > > f: > > > movl __cpu_model+12(%rip), %eax > > > xorl %r8d, %r8d > > > testb $4, %al > > > je .L1 > > > shrl $6, %eax > > > movl %eax, %r8d > > > andl $1, %r8d > > > .L1: > > > movl %r8d, %eax > > > ret > > > > > > The problem was caused by the fact that internally every invocation > > > of __builtin_cpu_supports built a new variable __cpu_model and a new > > > type __processor_model. Because of this GIMPLE level optimizers > > > weren't able to CSE the loads of __cpu_model and optimize > > > bit-operations properly. > > > > > > This commit fixes the problem by caching created __cpu_model > > > variable and __processor_model type. Now the GCC -O2 generates: > > > > > > f: > > > movl __cpu_model+12(%rip), %eax > > > andl $68, %eax > > > cmpl $68, %eax > > > sete %al > > > ret > > > > The patch looks good, the function could need a comment > > and the global variables better names, not starting with __ > > > > Up to the x86 maintainers - HJ, can you pick up this work? > > > > Here is the updated patch to also handle and__cpu_features2. > OK for master? LGTM (Richi effectively approved tree parts). Thanks, Uros. > > Thanks. > > H.J. > --- > GCC -O2 generated quite bad code for this function: > > bool > f (void) > { > return __builtin_cpu_supports("popcnt") > && __builtin_cpu_supports("ssse3"); > } > > f: > movl __cpu_model+12(%rip), %edx > movl %edx, %eax > shrl $6, %eax > andl $1, %eax > andl $4, %edx > movl $0, %edx > cmove %edx, %eax > ret > > The problem was caused by the fact that internally every invocation of > __builtin_cpu_supports built a new variable __cpu_model and a new type > __processor_model. Because of this, GIMPLE level optimizers weren't able > to CSE the loads of __cpu_model and optimize bit-operations properly. > > Improve GCC -O2 code generation by caching __cpu_model and__cpu_features2 > variables as well as their types: > > f: > movl __cpu_model+12(%rip), %eax > andl $68, %eax > cmpl $68, %eax > sete %al > ret > > gcc/ChangeLog: > > 2021-05-05 Ivan Sorokin > H.J. Lu > > PR target/91400 > * config/i386/i386-builtins.c (ix86_cpu_model_type_node): New. > (ix86_cpu_model_var): Likewise. > (ix86_cpu_features2_type_node): Likewise. > (ix86_cpu_features2_var): Likewise. > (fold_builtin_cpu): Cache __cpu_model and __cpu_features2 with > their types. > > gcc/testsuite/Changelog: > > 2021-05-05 Ivan Sorokin > H.J. Lu > > PR target/91400 > * gcc.target/i386/pr91400-1.c: New test. > * gcc.target/i386/pr91400-2.c: Likewise. > --- > gcc/config/i386/i386-builtins.c | 52 +++++++++++++++-------- > gcc/testsuite/gcc.target/i386/pr91400-1.c | 14 ++++++ > gcc/testsuite/gcc.target/i386/pr91400-2.c | 14 ++++++ > 3 files changed, 63 insertions(+), 17 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr91400-1.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr91400-2.c > > diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c > index b66911082ab..8036aedebac 100644 > --- a/gcc/config/i386/i386-builtins.c > +++ b/gcc/config/i386/i386-builtins.c > @@ -2103,6 +2103,11 @@ make_var_decl (tree type, const char *name) > return new_decl; > } > > +static GTY(()) tree ix86_cpu_model_type_node; > +static GTY(()) tree ix86_cpu_model_var; > +static GTY(()) tree ix86_cpu_features2_type_node; > +static GTY(()) tree ix86_cpu_features2_var; > + > /* FNDECL is a __builtin_cpu_is or a __builtin_cpu_supports call that is folded > into an integer defined in libgcc/config/i386/cpuinfo.c */ > > @@ -2114,12 +2119,16 @@ fold_builtin_cpu (tree fndecl, tree *args) > = (enum ix86_builtins) DECL_MD_FUNCTION_CODE (fndecl); > tree param_string_cst = NULL; > > - tree __processor_model_type = build_processor_model_struct (); > - tree __cpu_model_var = make_var_decl (__processor_model_type, > - "__cpu_model"); > - > - > - varpool_node::add (__cpu_model_var); > + if (ix86_cpu_model_var == nullptr) > + { > + /* Build a single __cpu_model variable for all references to > + __cpu_model so that GIMPLE level optimizers can CSE the loads > + of __cpu_model and optimize bit-operations properly. */ > + ix86_cpu_model_type_node = build_processor_model_struct (); > + ix86_cpu_model_var = make_var_decl (ix86_cpu_model_type_node, > + "__cpu_model"); > + varpool_node::add (ix86_cpu_model_var); > + } > > gcc_assert ((args != NULL) && (*args != NULL)); > > @@ -2160,7 +2169,7 @@ fold_builtin_cpu (tree fndecl, tree *args) > return integer_zero_node; > } > > - field = TYPE_FIELDS (__processor_model_type); > + field = TYPE_FIELDS (ix86_cpu_model_type_node); > field_val = processor_alias_table[i].model; > > /* CPU types are stored in the next field. */ > @@ -2179,7 +2188,7 @@ fold_builtin_cpu (tree fndecl, tree *args) > } > > /* Get the appropriate field in __cpu_model. */ > - ref = build3 (COMPONENT_REF, TREE_TYPE (field), __cpu_model_var, > + ref = build3 (COMPONENT_REF, TREE_TYPE (field), ix86_cpu_model_var, > field, NULL_TREE); > > /* Check the value. */ > @@ -2212,13 +2221,22 @@ fold_builtin_cpu (tree fndecl, tree *args) > > if (isa_names_table[i].feature >= 32) > { > - tree index_type > - = build_index_type (size_int (SIZE_OF_CPU_FEATURES)); > - tree type = build_array_type (unsigned_type_node, index_type); > - tree __cpu_features2_var = make_var_decl (type, > - "__cpu_features2"); > + if (ix86_cpu_features2_var == nullptr) > + { > + /* Build a single __cpu_features2 variable for all > + references to __cpu_features2 so that GIMPLE level > + optimizers can CSE the loads of __cpu_features2 and > + optimize bit-operations properly. */ > + tree index_type > + = build_index_type (size_int (SIZE_OF_CPU_FEATURES)); > + ix86_cpu_features2_type_node > + = build_array_type (unsigned_type_node, index_type); > + ix86_cpu_features2_var > + = make_var_decl (ix86_cpu_features2_type_node, > + "__cpu_features2"); > + varpool_node::add (ix86_cpu_features2_var); > + } > > - varpool_node::add (__cpu_features2_var); > for (unsigned int j = 0; j < SIZE_OF_CPU_FEATURES; j++) > if (isa_names_table[i].feature < (32 + 32 + j * 32)) > { > @@ -2226,7 +2244,7 @@ fold_builtin_cpu (tree fndecl, tree *args) > - (32 + j * 32))); > tree index = size_int (j); > array_elt = build4 (ARRAY_REF, unsigned_type_node, > - __cpu_features2_var, > + ix86_cpu_features2_var, > index, NULL_TREE, NULL_TREE); > /* Return __cpu_features2[index] & field_val */ > final = build2 (BIT_AND_EXPR, unsigned_type_node, > @@ -2237,13 +2255,13 @@ fold_builtin_cpu (tree fndecl, tree *args) > } > } > > - field = TYPE_FIELDS (__processor_model_type); > + field = TYPE_FIELDS (ix86_cpu_model_type_node); > /* Get the last field, which is __cpu_features. */ > while (DECL_CHAIN (field)) > field = DECL_CHAIN (field); > > /* Get the appropriate field: __cpu_model.__cpu_features */ > - ref = build3 (COMPONENT_REF, TREE_TYPE (field), __cpu_model_var, > + ref = build3 (COMPONENT_REF, TREE_TYPE (field), ix86_cpu_model_var, > field, NULL_TREE); > > /* Access the 0th element of __cpu_features array. */ > diff --git a/gcc/testsuite/gcc.target/i386/pr91400-1.c b/gcc/testsuite/gcc.target/i386/pr91400-1.c > new file mode 100644 > index 00000000000..6124058dc1e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr91400-1.c > @@ -0,0 +1,14 @@ > +/* PR target/91400 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-final { scan-assembler-times "andl" 1 } } */ > +/* { dg-final { scan-assembler-times "cmpl" 1 } } */ > +/* { dg-final { scan-assembler-times "sete" 1 } } */ > +/* { dg-final { scan-assembler-not "cmove" } } */ > + > +_Bool > +f (void) > +{ > + return __builtin_cpu_supports("popcnt") > + && __builtin_cpu_supports("ssse3"); > +} > diff --git a/gcc/testsuite/gcc.target/i386/pr91400-2.c b/gcc/testsuite/gcc.target/i386/pr91400-2.c > new file mode 100644 > index 00000000000..1af5a2f41cd > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr91400-2.c > @@ -0,0 +1,14 @@ > +/* PR target/91400 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-final { scan-assembler-times "andl" 1 } } */ > +/* { dg-final { scan-assembler-times "cmpl" 1 } } */ > +/* { dg-final { scan-assembler-times "sete" 1 } } */ > +/* { dg-final { scan-assembler-not "cmove" } } */ > + > +_Bool > +f (void) > +{ > + return __builtin_cpu_supports("avx512vnni") > + && __builtin_cpu_supports("3dnow"); > +} > -- > 2.31.1 >