From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pj1-x1035.google.com (mail-pj1-x1035.google.com [IPv6:2607:f8b0:4864:20::1035])
 by sourceware.org (Postfix) with ESMTPS id 94B533858C27
 for ; Tue, 3 Aug 2021 21:23:36 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 94B533858C27
Received: by mail-pj1-x1035.google.com with SMTP id ca5so31344206pjb.5
 for ; Tue, 03 Aug 2021 14:23:36 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=RRRjJ5K+oSLYd/mds3J+VHOvfj15r410/odL9o/GSks=;
 b=qkqzbMPn+Yaj6+rjVrMsAE/sV5PUq74aBe32Pevb5PLjOW48cjuGQzV58gDeCW/MVc
 Wn3EncRW/E8ttv5gYxoMhq1Pqe3B0VyOFo0oRNu+znkV1KYD1SdA5Ln77jNlyF5kWOhW
 TM3TMLSnLupdPa96GOqxMgGxtUrHeF/0RC5/iaIxAfjSI9jpIRjsP7nL8j1b5usrlxsh
 iTntWTQCRNI01Dz/GRABqqxEE6Vhr3cGrvGj+b5QzVN14Ab38wj7JtdAEbPttg8z2tbE
 +01mtFbsejMkxYtzboROGLvXtBiKH/hMnHUf/HEDxlTu1ngocsKr4PEZg8P0VkM4ohwL
 aWIg==
X-Gm-Message-State: AOAM533ZxQBi5dJ8cc0pRNpTauge+YB6hYm0/mNcTvkxeFN85GjP/HRf
 RlzUm7v7QeNwMvMF7bynqFQW//1dWtRWo2FKFqGjY9ACJPY=
X-Google-Smtp-Source: ABdhPJxEuFvRRDlqBNGctJTxruFLnKuNjOrE86phwpU/Gu3cMoTLyveP2grzsaG9Hpo3S1kZ4ZXES7ER1LpXyJUTxkc=
X-Received: by 2002:a63:83c1:: with SMTP id h184mr753485pge.37.1628025815356;
 Tue, 03 Aug 2021 14:23:35 -0700 (PDT)
MIME-Version: 1.0
References: <20210803135646.2545430-1-hjl.tools@gmail.com>
In-Reply-To: <20210803135646.2545430-1-hjl.tools@gmail.com>
From: "H.J.
 Lu"
Date: Tue, 3 Aug 2021 14:22:59 -0700
Message-ID:
Subject: Re: [PATCH] by_pieces: Properly set m_max_size in op_by_pieces
To: GCC Patches
Cc: Uros Bizjak , Richard Sandiford
Content-Type: multipart/mixed; boundary="00000000000068f34b05c8ae4c51"
X-Spam-Status: No, score=-3031.1 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0,
 KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP
 autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 03 Aug 2021 21:23:40 -0000

--00000000000068f34b05c8ae4c51
Content-Type: text/plain; charset="UTF-8"

On Tue, Aug 3, 2021 at 6:56 AM H.J. Lu wrote:
>
> 1. Update x86 STORE_MAX_PIECES to use OImode and XImode only if inter-unit
> move is enabled since x86 uses vec_duplicate, which is enabled only when
> inter-unit move is enabled, to implement store_by_pieces.
> 2. Update op_by_pieces_d::op_by_pieces_d to set m_max_size to
> STORE_MAX_PIECES for store_by_pieces and to COMPARE_MAX_PIECES for
> compare_by_pieces.
>
> gcc/
>
>         PR target/101742
>         * expr.c (op_by_pieces_d::op_by_pieces_d): Set m_max_size to
>         STORE_MAX_PIECES for store_by_pieces and to COMPARE_MAX_PIECES
>         for compare_by_pieces.
>         * config/i386/i386.h (STORE_MAX_PIECES): Use OImode and XImode
>         only if TARGET_INTER_UNIT_MOVES_TO_VEC is true.
>
> gcc/testsuite/
>
>         PR target/101742
>         * gcc.target/i386/pr101742a.c: New test.
>         * gcc.target/i386/pr101742b.c: Likewise.
> ---
>  gcc/config/i386/i386.h                    | 20 +++++++++++---------
>  gcc/expr.c                                |  6 +++++-
>  gcc/testsuite/gcc.target/i386/pr101742a.c | 16 ++++++++++++++++
>  gcc/testsuite/gcc.target/i386/pr101742b.c |  4 ++++
>  4 files changed, 36 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101742a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr101742b.c
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index bed9cd9da18..9b416abd5f4 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1783,15 +1783,17 @@ typedef struct ix86_args {
>  /* STORE_MAX_PIECES is the number of bytes at a time that we can
>     store efficiently.  */
>  #define STORE_MAX_PIECES \
> -  ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
> -   ? 64 \
> -   : ((TARGET_AVX \
> -       && !TARGET_PREFER_AVX128 \
> -       && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \
> -      ? 32 \
> -      : ((TARGET_SSE2 \
> -          && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
> -         ? 16 : UNITS_PER_WORD)))
> +  (TARGET_INTER_UNIT_MOVES_TO_VEC \
> +   ? ((TARGET_AVX512F && !TARGET_PREFER_AVX256) \
> +      ? 64 \
> +      : ((TARGET_AVX \
> +          && !TARGET_PREFER_AVX128 \
> +          && !TARGET_AVX256_SPLIT_UNALIGNED_STORE) \
> +         ? 32 \
> +         : ((TARGET_SSE2 \
> +             && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
> +            ? 16 : UNITS_PER_WORD))) \
> +   : UNITS_PER_WORD)
>
>  /* If a memory-to-memory move would take MOVE_RATIO or more simple
>     move-instruction pairs, we will do a cpymem or libcall instead.
> diff --git a/gcc/expr.c b/gcc/expr.c
> index b65cfcfdcd1..2964b38b9a5 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -1131,7 +1131,11 @@ op_by_pieces_d::op_by_pieces_d (rtx to, bool to_load,
>                                 bool qi_vector_mode)
>    : m_to (to, to_load, NULL, NULL),
>      m_from (from, from_load, from_cfn, from_cfn_data),
> -    m_len (len), m_max_size (MOVE_MAX_PIECES + 1),
> +    m_len (len),
> +    m_max_size (((!to_load && from == nullptr)
> +                 ? STORE_MAX_PIECES
> +                 : (from_cfn != nullptr
> +                    ? COMPARE_MAX_PIECES : MOVE_MAX_PIECES)) + 1),
>      m_push (push), m_qi_vector_mode (qi_vector_mode)
>  {
>    int toi = m_to.get_addr_inc ();

This larger expr.c patch passes the proper MAX_PIECES directly.

> diff --git a/gcc/testsuite/gcc.target/i386/pr101742a.c b/gcc/testsuite/gcc.target/i386/pr101742a.c
> new file mode 100644
> index 00000000000..67ea40587dd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr101742a.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3 -mtune=nano-x2" } */
> +
> +int n2;
> +
> +__attribute__ ((simd)) char
> +w7 (void)
> +{
> +  short int xb = n2;
> +  int qp;
> +
> +  for (qp = 0; qp < 2; ++qp)
> +    xb = xb < 1;
> +
> +  return xb;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr101742b.c b/gcc/testsuite/gcc.target/i386/pr101742b.c
> new file mode 100644
> index 00000000000..ba19064077b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr101742b.c
> @@ -0,0 +1,4 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3 -mtune=nano-x2 -mtune-ctrl=sse_unaligned_store_optimal" } */
> +
> +#include "pr101742a.c"
> --
> 2.31.1
>

--
H.J.
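
For readers skimming the thread, here is a rough, standalone sketch of the
shape of the attached change: the base-class constructor takes the piece
limit as an argument, and each derived class passes its own limit, so the
constructor no longer has to infer the operation from to_load, from and
from_cfn. Everything suffixed _demo/_DEMO below is hypothetical and only for
illustration; the numeric limits are made up, and the real constructors in
expr.c take many more arguments.

```cpp
#include <cassert>

// Hypothetical stand-ins for MOVE_MAX_PIECES, STORE_MAX_PIECES and
// COMPARE_MAX_PIECES.  The real macros are target-dependent; these
// values are invented for the example.
constexpr unsigned int MOVE_MAX_PIECES_DEMO = 8;
constexpr unsigned int STORE_MAX_PIECES_DEMO = 32;
constexpr unsigned int COMPARE_MAX_PIECES_DEMO = 16;

// Simplified base class: the limit arrives as a constructor argument
// and m_max_size is set to that limit plus one.
class op_by_pieces_demo
{
public:
  explicit op_by_pieces_demo (unsigned int max_pieces)
    : m_max_size (max_pieces + 1)
  {
  }

  unsigned int max_size () const { return m_max_size; }

private:
  unsigned int m_max_size;
};

// Each derived class names the limit that applies to its operation.
class move_by_pieces_demo : public op_by_pieces_demo
{
public:
  move_by_pieces_demo () : op_by_pieces_demo (MOVE_MAX_PIECES_DEMO) {}
};

class store_by_pieces_demo : public op_by_pieces_demo
{
public:
  store_by_pieces_demo () : op_by_pieces_demo (STORE_MAX_PIECES_DEMO) {}
};

class compare_by_pieces_demo : public op_by_pieces_demo
{
public:
  compare_by_pieces_demo () : op_by_pieces_demo (COMPARE_MAX_PIECES_DEMO) {}
};

// Small self-check: every operation ends up with its own limit plus one.
void
check_demo_limits ()
{
  assert (move_by_pieces_demo ().max_size () == MOVE_MAX_PIECES_DEMO + 1);
  assert (store_by_pieces_demo ().max_size () == STORE_MAX_PIECES_DEMO + 1);
  assert (compare_by_pieces_demo ().max_size () == COMPARE_MAX_PIECES_DEMO + 1);
}
```

Compared with the nested conditional in the quoted hunk, this keeps the choice
of limit next to the class that knows which operation it implements.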
--00000000000068f34b05c8ae4c51 Content-Type: text/x-patch; charset="US-ASCII"; name="p.patch" Content-Disposition: attachment; filename="p.patch" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_krwkgtup0 ZGlmZiAtLWdpdCBhL2djYy9leHByLmMgYi9nY2MvZXhwci5jCmluZGV4IGI2NWNmY2ZkY2QxLi42 NmFjMTk4NmYwMiAxMDA2NDQKLS0tIGEvZ2NjL2V4cHIuYworKysgYi9nY2MvZXhwci5jCkBAIC0x MTEwLDggKzExMTAsOCBAQCBjbGFzcyBvcF9ieV9waWVjZXNfZAogICB9CiAKICBwdWJsaWM6Ci0g IG9wX2J5X3BpZWNlc19kIChydHgsIGJvb2wsIHJ0eCwgYm9vbCwgYnlfcGllY2VzX2NvbnN0Zm4s IHZvaWQgKiwKLQkJICB1bnNpZ25lZCBIT1NUX1dJREVfSU5ULCB1bnNpZ25lZCBpbnQsIGJvb2ws CisgIG9wX2J5X3BpZWNlc19kICh1bnNpZ25lZCBpbnQsIHJ0eCwgYm9vbCwgcnR4LCBib29sLCBi eV9waWVjZXNfY29uc3RmbiwKKwkJICB2b2lkICosIHVuc2lnbmVkIEhPU1RfV0lERV9JTlQsIHVu c2lnbmVkIGludCwgYm9vbCwKIAkJICBib29sID0gZmFsc2UpOwogICB2b2lkIHJ1biAoKTsKIH07 CkBAIC0xMTIyLDggKzExMjIsOCBAQCBjbGFzcyBvcF9ieV9waWVjZXNfZAogICAgYW5kIGl0cyBh c3NvY2lhdGVkIEZST01fQ0ZOX0RBVEEgY2FuIGJlIHVzZWQgdG8gcmVwbGFjZSBsb2FkcyB3aXRo CiAgICBjb25zdGFudCB2YWx1ZXMuICBMRU4gZGVzY3JpYmVzIHRoZSBsZW5ndGggb2YgdGhlIG9w ZXJhdGlvbi4gICovCiAKLW9wX2J5X3BpZWNlc19kOjpvcF9ieV9waWVjZXNfZCAocnR4IHRvLCBi b29sIHRvX2xvYWQsCi0JCQkJcnR4IGZyb20sIGJvb2wgZnJvbV9sb2FkLAorb3BfYnlfcGllY2Vz X2Q6Om9wX2J5X3BpZWNlc19kICh1bnNpZ25lZCBpbnQgbWF4X3BpZWNlcywgcnR4IHRvLAorCQkJ CWJvb2wgdG9fbG9hZCwgcnR4IGZyb20sIGJvb2wgZnJvbV9sb2FkLAogCQkJCWJ5X3BpZWNlc19j b25zdGZuIGZyb21fY2ZuLAogCQkJCXZvaWQgKmZyb21fY2ZuX2RhdGEsCiAJCQkJdW5zaWduZWQg SE9TVF9XSURFX0lOVCBsZW4sCkBAIC0xMTMxLDcgKzExMzEsNyBAQCBvcF9ieV9waWVjZXNfZDo6 b3BfYnlfcGllY2VzX2QgKHJ0eCB0bywgYm9vbCB0b19sb2FkLAogCQkJCWJvb2wgcWlfdmVjdG9y X21vZGUpCiAgIDogbV90byAodG8sIHRvX2xvYWQsIE5VTEwsIE5VTEwpLAogICAgIG1fZnJvbSAo ZnJvbSwgZnJvbV9sb2FkLCBmcm9tX2NmbiwgZnJvbV9jZm5fZGF0YSksCi0gICAgbV9sZW4gKGxl biksIG1fbWF4X3NpemUgKE1PVkVfTUFYX1BJRUNFUyArIDEpLAorICAgIG1fbGVuIChsZW4pLCBt X21heF9zaXplIChtYXhfcGllY2VzICsgMSksCiAgICAgbV9wdXNoIChwdXNoKSwgbV9xaV92ZWN0 b3JfbW9kZSAocWlfdmVjdG9yX21vZGUpCiB7CiAgIGludCB0b2kgPSBtX3RvLmdldF9hZGRyX2lu 
YyAoKTsKQEAgLTEzMjQsOCArMTMyNCw4IEBAIGNsYXNzIG1vdmVfYnlfcGllY2VzX2QgOiBwdWJs aWMgb3BfYnlfcGllY2VzX2QKICBwdWJsaWM6CiAgIG1vdmVfYnlfcGllY2VzX2QgKHJ0eCB0bywg cnR4IGZyb20sIHVuc2lnbmVkIEhPU1RfV0lERV9JTlQgbGVuLAogCQkgICAgdW5zaWduZWQgaW50 IGFsaWduKQotICAgIDogb3BfYnlfcGllY2VzX2QgKHRvLCBmYWxzZSwgZnJvbSwgdHJ1ZSwgTlVM TCwgTlVMTCwgbGVuLCBhbGlnbiwKLQkJICAgICAgUFVTSEdfUCAodG8pKQorICAgIDogb3BfYnlf cGllY2VzX2QgKE1PVkVfTUFYX1BJRUNFUywgdG8sIGZhbHNlLCBmcm9tLCB0cnVlLCBOVUxMLAor CQkgICAgICBOVUxMLCBsZW4sIGFsaWduLCBQVVNIR19QICh0bykpCiAgIHsKICAgfQogICBydHgg ZmluaXNoX3JldG1vZGUgKG1lbW9wX3JldCk7CkBAIC0xNDIxLDggKzE0MjEsOCBAQCBjbGFzcyBz dG9yZV9ieV9waWVjZXNfZCA6IHB1YmxpYyBvcF9ieV9waWVjZXNfZAogICBzdG9yZV9ieV9waWVj ZXNfZCAocnR4IHRvLCBieV9waWVjZXNfY29uc3RmbiBjZm4sIHZvaWQgKmNmbl9kYXRhLAogCQkg ICAgIHVuc2lnbmVkIEhPU1RfV0lERV9JTlQgbGVuLCB1bnNpZ25lZCBpbnQgYWxpZ24sCiAJCSAg ICAgYm9vbCBxaV92ZWN0b3JfbW9kZSkKLSAgICA6IG9wX2J5X3BpZWNlc19kICh0bywgZmFsc2Us IE5VTExfUlRYLCB0cnVlLCBjZm4sIGNmbl9kYXRhLCBsZW4sCi0JCSAgICAgIGFsaWduLCBmYWxz ZSwgcWlfdmVjdG9yX21vZGUpCisgICAgOiBvcF9ieV9waWVjZXNfZCAoU1RPUkVfTUFYX1BJRUNF UywgdG8sIGZhbHNlLCBOVUxMX1JUWCwgdHJ1ZSwgY2ZuLAorCQkgICAgICBjZm5fZGF0YSwgbGVu LCBhbGlnbiwgZmFsc2UsIHFpX3ZlY3Rvcl9tb2RlKQogICB7CiAgIH0KICAgcnR4IGZpbmlzaF9y ZXRtb2RlIChtZW1vcF9yZXQpOwpAQCAtMTYxOCw4ICsxNjE4LDggQEAgY2xhc3MgY29tcGFyZV9i eV9waWVjZXNfZCA6IHB1YmxpYyBvcF9ieV9waWVjZXNfZAogICBjb21wYXJlX2J5X3BpZWNlc19k IChydHggb3AwLCBydHggb3AxLCBieV9waWVjZXNfY29uc3RmbiBvcDFfY2ZuLAogCQkgICAgICAg dm9pZCAqb3AxX2Nmbl9kYXRhLCBIT1NUX1dJREVfSU5UIGxlbiwgaW50IGFsaWduLAogCQkgICAg ICAgcnR4X2NvZGVfbGFiZWwgKmZhaWxfbGFiZWwpCi0gICAgOiBvcF9ieV9waWVjZXNfZCAob3Aw LCB0cnVlLCBvcDEsIHRydWUsIG9wMV9jZm4sIG9wMV9jZm5fZGF0YSwgbGVuLAotCQkgICAgICBh bGlnbiwgZmFsc2UpCisgICAgOiBvcF9ieV9waWVjZXNfZCAoQ09NUEFSRV9NQVhfUElFQ0VTLCBv cDAsIHRydWUsIG9wMSwgdHJ1ZSwgb3AxX2NmbiwKKwkJICAgICAgb3AxX2Nmbl9kYXRhLCBsZW4s IGFsaWduLCBmYWxzZSkKICAgewogICAgIG1fZmFpbF9sYWJlbCA9IGZhaWxfbGFiZWw7CiAgIH0K --00000000000068f34b05c8ae4c51--