From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl1-x635.google.com (mail-pl1-x635.google.com [IPv6:2607:f8b0:4864:20::635]) by sourceware.org (Postfix) with ESMTPS id 2CD993857404 for ; Wed, 5 May 2021 14:50:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 2CD993857404 Received: by mail-pl1-x635.google.com with SMTP id h7so1232121plt.1 for ; Wed, 05 May 2021 07:50:06 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=R6hCz97JGvNjfI5q8ze4i4OE6xdD4g4S/ktRGt93Mto=; b=nOAN2Yz+d1fJOs8DmFoDLNVSjgv0C9O38apSr1r8FrOhxxGJxZU/R7P6TGAytDV0ef KI6lecuIiozsmI4E3NygEJ5p/PUuv+uVHMreTxuEdGnwOJOD+Ynmva+Vkz+76iO7UYZE 8DS8UbXBw8mUUZ1aVh9DvMUVmXJ6Z3GnbWhGkTF729nnErWR6mEvBQ8uBdZJblJ2o4TZ BiRAo7rIFslEI09vXIhD2CXp9HVEXwrFHoHQimmvTa0LebMNEqsyhOD2tuWljm8w45bQ RNGk4aRhLeKTvzjDlYxa8Lp7Bzdse9LOy++bIFmPgVA34rtnUfIBVr0cV6hskhWZ5hBf CmBw== X-Gm-Message-State: AOAM533F9JUAYynH+lxfIVf7lVn2F1SqCJBVc4fKy0r4t7XyORZJNfuL 0mKX/fUkHmPNVZhH4PDWW8txEBvXQPdDWw== X-Google-Smtp-Source: ABdhPJyM3YiUZdUndETm5SzDwCkMB7UYmbnxXqDOzeOVyNMkDR7MWFbBKwmQAPKP/3zOlMR983VeeQ== X-Received: by 2002:a17:902:6bc7:b029:ee:f84f:1093 with SMTP id m7-20020a1709026bc7b02900eef84f1093mr3266908plt.37.1620226204787; Wed, 05 May 2021 07:50:04 -0700 (PDT) Received: from gnu-cfl-2.localdomain ([172.58.35.177]) by smtp.gmail.com with ESMTPSA id cm22sm17892275pjb.32.2021.05.05.07.50.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 May 2021 07:50:04 -0700 (PDT) Received: by gnu-cfl-2.localdomain (Postfix, from userid 1000) id 921E4C0311; Wed, 5 May 2021 07:50:02 -0700 (PDT) Date: Wed, 5 May 2021 07:50:02 -0700 From: "H.J. Lu" To: Richard Biener Cc: Ivan Sorokin , Uros Bizjak , GCC Patches Subject: [PATCH v2] x86: Build only one __cpu_model/__cpu_features2 variables Message-ID: References: <20210503083948.449-1-vanyacpp@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-3035.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 May 2021 14:50:08 -0000 On Wed, May 05, 2021 at 09:36:16AM +0200, Richard Biener wrote: > On Mon, May 3, 2021 at 11:31 AM Ivan Sorokin via Gcc-patches > wrote: > > > > Prior to this commit GCC -O2 generated quite bad code for this > > function: > > > > bool f() > > { > > return __builtin_cpu_supports("popcnt") > > && __builtin_cpu_supports("ssse3"); > > } > > > > f: > > movl __cpu_model+12(%rip), %eax > > xorl %r8d, %r8d > > testb $4, %al > > je .L1 > > shrl $6, %eax > > movl %eax, %r8d > > andl $1, %r8d > > .L1: > > movl %r8d, %eax > > ret > > > > The problem was caused by the fact that internally every invocation > > of __builtin_cpu_supports built a new variable __cpu_model and a new > > type __processor_model. Because of this GIMPLE level optimizers > > weren't able to CSE the loads of __cpu_model and optimize > > bit-operations properly. > > > > This commit fixes the problem by caching created __cpu_model > > variable and __processor_model type. Now the GCC -O2 generates: > > > > f: > > movl __cpu_model+12(%rip), %eax > > andl $68, %eax > > cmpl $68, %eax > > sete %al > > ret > > The patch looks good, the function could need a comment > and the global variables better names, not starting with __ > > Up to the x86 maintainers - HJ, can you pick up this work? > Here is the updated patch to also handle and__cpu_features2. OK for master? Thanks. H.J. --- GCC -O2 generated quite bad code for this function: bool f (void) { return __builtin_cpu_supports("popcnt") && __builtin_cpu_supports("ssse3"); } f: movl __cpu_model+12(%rip), %edx movl %edx, %eax shrl $6, %eax andl $1, %eax andl $4, %edx movl $0, %edx cmove %edx, %eax ret The problem was caused by the fact that internally every invocation of __builtin_cpu_supports built a new variable __cpu_model and a new type __processor_model. Because of this, GIMPLE level optimizers weren't able to CSE the loads of __cpu_model and optimize bit-operations properly. Improve GCC -O2 code generation by caching __cpu_model and__cpu_features2 variables as well as their types: f: movl __cpu_model+12(%rip), %eax andl $68, %eax cmpl $68, %eax sete %al ret gcc/ChangeLog: 2021-05-05 Ivan Sorokin H.J. Lu PR target/91400 * config/i386/i386-builtins.c (ix86_cpu_model_type_node): New. (ix86_cpu_model_var): Likewise. (ix86_cpu_features2_type_node): Likewise. (ix86_cpu_features2_var): Likewise. (fold_builtin_cpu): Cache __cpu_model and __cpu_features2 with their types. gcc/testsuite/Changelog: 2021-05-05 Ivan Sorokin H.J. Lu PR target/91400 * gcc.target/i386/pr91400-1.c: New test. * gcc.target/i386/pr91400-2.c: Likewise. --- gcc/config/i386/i386-builtins.c | 52 +++++++++++++++-------- gcc/testsuite/gcc.target/i386/pr91400-1.c | 14 ++++++ gcc/testsuite/gcc.target/i386/pr91400-2.c | 14 ++++++ 3 files changed, 63 insertions(+), 17 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr91400-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr91400-2.c diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c index b66911082ab..8036aedebac 100644 --- a/gcc/config/i386/i386-builtins.c +++ b/gcc/config/i386/i386-builtins.c @@ -2103,6 +2103,11 @@ make_var_decl (tree type, const char *name) return new_decl; } +static GTY(()) tree ix86_cpu_model_type_node; +static GTY(()) tree ix86_cpu_model_var; +static GTY(()) tree ix86_cpu_features2_type_node; +static GTY(()) tree ix86_cpu_features2_var; + /* FNDECL is a __builtin_cpu_is or a __builtin_cpu_supports call that is folded into an integer defined in libgcc/config/i386/cpuinfo.c */ @@ -2114,12 +2119,16 @@ fold_builtin_cpu (tree fndecl, tree *args) = (enum ix86_builtins) DECL_MD_FUNCTION_CODE (fndecl); tree param_string_cst = NULL; - tree __processor_model_type = build_processor_model_struct (); - tree __cpu_model_var = make_var_decl (__processor_model_type, - "__cpu_model"); - - - varpool_node::add (__cpu_model_var); + if (ix86_cpu_model_var == nullptr) + { + /* Build a single __cpu_model variable for all references to + __cpu_model so that GIMPLE level optimizers can CSE the loads + of __cpu_model and optimize bit-operations properly. */ + ix86_cpu_model_type_node = build_processor_model_struct (); + ix86_cpu_model_var = make_var_decl (ix86_cpu_model_type_node, + "__cpu_model"); + varpool_node::add (ix86_cpu_model_var); + } gcc_assert ((args != NULL) && (*args != NULL)); @@ -2160,7 +2169,7 @@ fold_builtin_cpu (tree fndecl, tree *args) return integer_zero_node; } - field = TYPE_FIELDS (__processor_model_type); + field = TYPE_FIELDS (ix86_cpu_model_type_node); field_val = processor_alias_table[i].model; /* CPU types are stored in the next field. */ @@ -2179,7 +2188,7 @@ fold_builtin_cpu (tree fndecl, tree *args) } /* Get the appropriate field in __cpu_model. */ - ref = build3 (COMPONENT_REF, TREE_TYPE (field), __cpu_model_var, + ref = build3 (COMPONENT_REF, TREE_TYPE (field), ix86_cpu_model_var, field, NULL_TREE); /* Check the value. */ @@ -2212,13 +2221,22 @@ fold_builtin_cpu (tree fndecl, tree *args) if (isa_names_table[i].feature >= 32) { - tree index_type - = build_index_type (size_int (SIZE_OF_CPU_FEATURES)); - tree type = build_array_type (unsigned_type_node, index_type); - tree __cpu_features2_var = make_var_decl (type, - "__cpu_features2"); + if (ix86_cpu_features2_var == nullptr) + { + /* Build a single __cpu_features2 variable for all + references to __cpu_features2 so that GIMPLE level + optimizers can CSE the loads of __cpu_features2 and + optimize bit-operations properly. */ + tree index_type + = build_index_type (size_int (SIZE_OF_CPU_FEATURES)); + ix86_cpu_features2_type_node + = build_array_type (unsigned_type_node, index_type); + ix86_cpu_features2_var + = make_var_decl (ix86_cpu_features2_type_node, + "__cpu_features2"); + varpool_node::add (ix86_cpu_features2_var); + } - varpool_node::add (__cpu_features2_var); for (unsigned int j = 0; j < SIZE_OF_CPU_FEATURES; j++) if (isa_names_table[i].feature < (32 + 32 + j * 32)) { @@ -2226,7 +2244,7 @@ fold_builtin_cpu (tree fndecl, tree *args) - (32 + j * 32))); tree index = size_int (j); array_elt = build4 (ARRAY_REF, unsigned_type_node, - __cpu_features2_var, + ix86_cpu_features2_var, index, NULL_TREE, NULL_TREE); /* Return __cpu_features2[index] & field_val */ final = build2 (BIT_AND_EXPR, unsigned_type_node, @@ -2237,13 +2255,13 @@ fold_builtin_cpu (tree fndecl, tree *args) } } - field = TYPE_FIELDS (__processor_model_type); + field = TYPE_FIELDS (ix86_cpu_model_type_node); /* Get the last field, which is __cpu_features. */ while (DECL_CHAIN (field)) field = DECL_CHAIN (field); /* Get the appropriate field: __cpu_model.__cpu_features */ - ref = build3 (COMPONENT_REF, TREE_TYPE (field), __cpu_model_var, + ref = build3 (COMPONENT_REF, TREE_TYPE (field), ix86_cpu_model_var, field, NULL_TREE); /* Access the 0th element of __cpu_features array. */ diff --git a/gcc/testsuite/gcc.target/i386/pr91400-1.c b/gcc/testsuite/gcc.target/i386/pr91400-1.c new file mode 100644 index 00000000000..6124058dc1e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr91400-1.c @@ -0,0 +1,14 @@ +/* PR target/91400 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-final { scan-assembler-times "andl" 1 } } */ +/* { dg-final { scan-assembler-times "cmpl" 1 } } */ +/* { dg-final { scan-assembler-times "sete" 1 } } */ +/* { dg-final { scan-assembler-not "cmove" } } */ + +_Bool +f (void) +{ + return __builtin_cpu_supports("popcnt") + && __builtin_cpu_supports("ssse3"); +} diff --git a/gcc/testsuite/gcc.target/i386/pr91400-2.c b/gcc/testsuite/gcc.target/i386/pr91400-2.c new file mode 100644 index 00000000000..1af5a2f41cd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr91400-2.c @@ -0,0 +1,14 @@ +/* PR target/91400 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-final { scan-assembler-times "andl" 1 } } */ +/* { dg-final { scan-assembler-times "cmpl" 1 } } */ +/* { dg-final { scan-assembler-times "sete" 1 } } */ +/* { dg-final { scan-assembler-not "cmove" } } */ + +_Bool +f (void) +{ + return __builtin_cpu_supports("avx512vnni") + && __builtin_cpu_supports("3dnow"); +} -- 2.31.1