From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-x335.google.com (mail-wm1-x335.google.com [IPv6:2a00:1450:4864:20::335]) by sourceware.org (Postfix) with ESMTPS id AAA613858282 for ; Tue, 19 Dec 2023 09:53:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org AAA613858282 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org AAA613858282 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::335 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702979636; cv=none; b=KauaQ4pHh4wpMCQ1KwQcxtqkKM9Ku9ib5tnWPT3QQL3p2Z6L1OxVyfGWW4d/KTdWGP9AaXx0pTqdiZTDBhIb3uVKyWkrs4m5NjF2pwD13zC3nex5VZftbtbod6ZoDk8O3HzYgcqf/NOvJBnZE1eW9iZY7Gc3DdnJ1sCYQDu2XNU= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702979636; c=relaxed/simple; bh=hYzLOiNx2razIdnMnmogHY+w+m/oxGTg3rMCk3o3/74=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=i5nu6+5y06hyK+/Ez/l8gmd8YSPNiz+NEz691lzmQEgn5iwaIkwWW9HS235zZfLmr2knTNJtGWXXad50qpx3CfdZrvwQMpdFWoJpbwrGR1zVu3k6FY6CdAoL9O7EOqxl5C6Hu++rI6J4SLY3ePcK5RUQEDt3JY2CKBK/5LUhrVE= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-x335.google.com with SMTP id 5b1f17b1804b1-40c38e292c8so21537575e9.0 for ; Tue, 19 Dec 2023 01:53:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1702979631; x=1703584431; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=4svtwVnCcahwTuOm53ref/4bH6PwPLfcm7TatQZA5/4=; b=ZNM5dE05oV4SZ6lTcY2uIfPG4cywwbCGqDCK+4615OVX6LF1V93Nfhm10pZxeW0slS Fomq6c2ap3j6Qe8VCegj+su2BKSiNSAQCq6LUygsj9SdsiRQq/7NI61jwzDkXFi/CDDR qkAT2EJhnf1YEpDNuk3SmcrRG3T5QL+LWqzpXLhVxKFa2qMMpyEz+uQqyOGuNNeS6U4P jW3BdHfuvJiUor2SELLX48NawQY7lrDr9kUI5UnkpzvKQCILssl5e55f7xTgnPMaMKuP 5wDi49ecL6mHVNnW0CyFhHOEKoPfg6IPlaVPXbOtTT3ID4pHWht8RDcfamb+gu13YXrd becQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702979631; x=1703584431; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=4svtwVnCcahwTuOm53ref/4bH6PwPLfcm7TatQZA5/4=; b=bca5XNVE91whQjbnK6F3C9ttKzu75xUw1jRXzYXb+cm5cGAQum3hMM/2GlZqBGkU/1 sLGiOmurEzSAZNdOE+Q7ddhPcRTk1/KN0nX3PDUZrO3/FuuGMnefIl4vIssx4KQSpZyi eqh57PY6AjG9bw9eMZKR1Sa4O+Jm3HKLAnv9TvU25mJSkCPTfppjcxruun4m8YpjlLQO beCTSjvWSHK8aefe399B7wT2wzr6tr9Ksg3l3i/GsnzWzo/YmY933DCfVNgsNti2fG8F T0Fg8WdZOmsUgOviBJ0ebU7hXA4hie4NV2TZBlCxrwCP5oH9Z2cQI28JRVGNkn1ywRIj fu1w== X-Gm-Message-State: AOJu0YydhZ+FHC9b2oGRZq3ICP/J1smXSzRGf720kwB7SZcFOpzXH7l+ dFxu/MUKO/Oqo7tOXLlIBLx56KfBclFvDsTeJJMzxw== X-Google-Smtp-Source: AGHT+IFBeMj6J63bSVofJ1EKo4rIQgOjpR+ACxCH+Uhiok5Wkoa8Yx8pkQRrr8zWdvakiSZDm8mOjA== X-Received: by 2002:a05:600c:695:b0:40c:2710:f67 with SMTP id a21-20020a05600c069500b0040c27100f67mr386600wmn.85.1702979630772; Tue, 19 Dec 2023 01:53:50 -0800 (PST) Received: from slewis-laptop.ba.rivosinc.com ([51.52.155.69]) by smtp.gmail.com with ESMTPSA id q19-20020a05600c46d300b0040b632f31d2sm2079985wmo.5.2023.12.19.01.53.50 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Dec 2023 01:53:50 -0800 (PST) From: Sergei Lewis To: gcc-patches@gcc.gnu.org Subject: [PATCH v2 2/3] RISC-V: setmem for RISCV with V extension Date: Tue, 19 Dec 2023 09:53:47 +0000 Message-Id: <20231219095348.356551-3-slewis@rivosinc.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231219095348.356551-1-slewis@rivosinc.com> References: <20231219095348.356551-1-slewis@rivosinc.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: gcc/ChangeLog * config/riscv/riscv-protos.h (riscv_vector::expand_vec_setmem): New function declaration. * config/riscv/riscv-string.cc (riscv_vector::expand_vec_setmem): New function: this generates an inline vectorised memory set, if and only if we know the entire operation can be performed in a single vector store * config/riscv/riscv.md (setmem): Try riscv_vector::expand_vec_setmem for constant lengths gcc/testsuite/ChangeLog * gcc.target/riscv/rvv/base/setmem-1.c: New tests * gcc.target/riscv/rvv/base/setmem-2.c: New tests * gcc.target/riscv/rvv/base/setmem-3.c: New tests --- gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-string.cc | 90 +++++++++++++++ gcc/config/riscv/riscv.md | 14 +++ .../gcc.target/riscv/rvv/base/setmem-1.c | 103 ++++++++++++++++++ .../gcc.target/riscv/rvv/base/setmem-2.c | 51 +++++++++ .../gcc.target/riscv/rvv/base/setmem-3.c | 69 ++++++++++++ 6 files changed, 328 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index eaee53ce94e..c4531589300 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -637,6 +637,7 @@ void expand_popcount (rtx *); void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false); bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned HOST_WIDE_INT, bool); void emit_vec_extract (rtx, rtx, rtx); +bool expand_vec_setmem (rtx, rtx, rtx, rtx); /* Rounding mode bitfield for fixed point VXRM. */ enum fixed_point_rounding_mode diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc index 11c1f74d0b3..e506b92a552 100644 --- a/gcc/config/riscv/riscv-string.cc +++ b/gcc/config/riscv/riscv-string.cc @@ -1247,4 +1247,94 @@ expand_strcmp (rtx result, rtx src1, rtx src2, rtx nbytes, return true; } +/* Check we are permitted to vectorise a memory operation. + If so, return true and populate lmul_out. + Otherwise, return false and leave lmul_out unchanged. */ +static bool +check_vectorise_memory_operation (rtx length_in, HOST_WIDE_INT &lmul_out) +{ + /* If we either can't or have been asked not to vectorise, respect this. */ + if (!TARGET_VECTOR) + return false; + if (!(stringop_strategy & STRATEGY_VECTOR)) + return false; + + /* If we can't reason about the length, don't vectorise. */ + if (!CONST_INT_P (length_in)) + return false; + + HOST_WIDE_INT length = INTVAL (length_in); + + /* If it's tiny, default operation is likely better; maybe worth + considering fractional lmul in the future as well. */ + if (length < (TARGET_MIN_VLEN / 8)) + return false; + + /* If we've been asked to use a specific LMUL, + check the operation fits and do that. */ + if (riscv_autovec_lmul != RVV_DYNAMIC) + { + lmul_out = TARGET_MAX_LMUL; + return (length <= ((TARGET_MAX_LMUL * TARGET_MIN_VLEN) / 8)); + } + + /* Find smallest lmul large enough for entire op. */ + HOST_WIDE_INT lmul = 1; + while ((lmul <= 8) && (length > ((lmul * TARGET_MIN_VLEN) / 8))) + { + lmul <<= 1; + } + + if (lmul > 8) + return false; + + lmul_out = lmul; + return true; +} + +/* Used by setmemdi in riscv.md. */ +bool +expand_vec_setmem (rtx dst_in, rtx length_in, rtx fill_value_in, + rtx alignment_in) +{ + HOST_WIDE_INT lmul; + /* Check we are able and allowed to vectorise this operation; + bail if not. */ + if (!check_vectorise_memory_operation (length_in, lmul)) + return false; + + machine_mode vmode + = riscv_vector::get_vector_mode (QImode, BYTES_PER_RISCV_VECTOR * lmul) + .require (); + rtx dst_addr = copy_addr_to_reg (XEXP (dst_in, 0)); + rtx dst = change_address (dst_in, vmode, dst_addr); + + rtx fill_value = gen_reg_rtx (vmode); + rtx broadcast_ops[] = { fill_value, fill_value_in }; + + /* If the length is exactly vlmax for the selected mode, do that. + Otherwise, use a predicated store. */ + if (known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in))) + { + emit_vlmax_insn (code_for_pred_broadcast (vmode), UNARY_OP, + broadcast_ops); + emit_move_insn (dst, fill_value); + } + else + { + if (!satisfies_constraint_K (length_in)) + length_in = force_reg (Pmode, length_in); + emit_nonvlmax_insn (code_for_pred_broadcast (vmode), UNARY_OP, + broadcast_ops, length_in); + machine_mode mask_mode + = riscv_vector::get_vector_mode (BImode, GET_MODE_NUNITS (vmode)) + .require (); + rtx mask = CONSTM1_RTX (mask_mode); + emit_insn (gen_pred_store (vmode, dst, mask, fill_value, length_in, + get_avl_type_rtx (riscv_vector::NONVLMAX))); + } + + return true; +} + } diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index 1b3f66fd15c..dd34211ca80 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -2387,6 +2387,20 @@ FAIL; }) +(define_expand "setmemsi" + [(set (match_operand:BLK 0 "memory_operand") ;; Dest + (match_operand:QI 2 "nonmemory_operand")) ;; Value + (use (match_operand:SI 1 "const_int_operand")) ;; Length + (match_operand:SI 3 "const_int_operand")] ;; Align + "TARGET_VECTOR" +{ + if (riscv_vector::expand_vec_setmem (operands[0], operands[1], operands[2], + operands[3])) + DONE; + else + FAIL; +}) + ;; Expand in-line code to clear the instruction cache between operand[0] and ;; operand[1]. (define_expand "clear_cache" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c new file mode 100644 index 00000000000..1c08be978a6 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c @@ -0,0 +1,103 @@ +/* { dg-do compile } */ +/* { dg-add-options riscv_v } */ +/* { dg-additional-options "-O3 --param=riscv-autovec-lmul=dynamic" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) + +/* Tiny memsets should use scalar ops. +** f1: +** sb\s+a1,0\(a0\) +** ret +*/ +void * +f1 (void *a, int const b) +{ + return __builtin_memset (a, b, 1); +} + +/* Tiny memsets should use scalar ops. +** f2: +** sb\s+a1,0\(a0\) +** sb\s+a1,1\(a0\) +** ret +*/ +void * +f2 (void *a, int const b) +{ + return __builtin_memset (a, b, 2); +} + +/* Tiny memsets should use scalar ops. +** f3: +** sb\s+a1,0\(a0\) +** sb\s+a1,1\(a0\) +** sb\s+a1,2\(a0\) +** ret +*/ +void * +f3 (void *a, int const b) +{ + return __builtin_memset (a, b, 3); +} + +/* Vectorise+inline minimum vector register width with LMUL=1 +** f4: +** ( +** vsetivli\s+zero,\d+,e8,m1,ta,ma +** | +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m1,ta,ma +** ) +** vmv\.v\.x\s+v\d+,a1 +** vse8\.v\s+v\d+,0\(a0\) +** ret +*/ +void * +f4 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES); +} + +/* Vectorised code should use smallest lmul known to fit length +** f5: +** ( +** vsetivli\s+zero,\d+,e8,m2,ta,ma +** | +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m2,ta,ma +** ) +** vmv\.v\.x\s+v\d+,a1 +** vse8\.v\s+v\d+,0\(a0\) +** ret +*/ +void * +f5 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES + 1); +} + +/* Vectorise+inline up to LMUL=8 +** f6: +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m8,ta,ma +** vmv\.v\.x\s+v\d+,a1 +** vse8\.v\s+v\d+,0\(a0\) +** ret +*/ +void * +f6 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES * 8); +} + +/* Don't vectorise if the move is too large for one operation. +** f7: +** li\s+a2,\d+ +** tail\s+memset +*/ +void * +f7 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES * 8 + 1); +} diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c new file mode 100644 index 00000000000..82d181dff3f --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c @@ -0,0 +1,51 @@ +/* { dg-do compile } */ +/* { dg-add-options riscv_v } */ +/* { dg-additional-options "-O3 --param riscv-autovec-lmul=m1" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) + +/* Small memsets shouldn't be vectorised. +** f1: +** ( +** sb\s+a1,0\(a0\) +** ... +** | +** li\s+a2,\d+ +** tail\s+memset +** ) +*/ +void * +f1 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES - 1); +} + +/* Vectorise+inline minimum vector register width using requested lmul. +** f2: +** ( +** vsetivli\s+zero,\d+,e8,m1,ta,ma +** | +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m1,ta,ma +** ) +** vmv\.v\.x\s+v\d+,a1 +** vse8\.v\s+v\d+,0\(a0\) +** ret +*/ +void * +f2 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES); +} + +/* Don't vectorise if the move is too large for requested lmul. +** f3: +** li\s+a2,\d+ +** tail\s+memset +*/ +void * +f3 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES + 1); +} diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c new file mode 100644 index 00000000000..f043d9e0784 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c @@ -0,0 +1,69 @@ +/* { dg-do compile } */ +/* { dg-add-options riscv_v } */ +/* { dg-additional-options "-O3 --param riscv-autovec-lmul=m8" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8) + +/* Small memsets shouldn't be vectorised. +** f1: +** ( +** sb\s+a1,0\(a0\) +** ... +** | +** li\s+a2,\d+ +** tail\s+memset +** ) +*/ +void * +f1 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES - 1); +} + +/* Vectorise+inline minimum vector register width using requested lmul. +** f2: +** ( +** vsetivli\s+zero,\d+,e8,m8,ta,ma +** | +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m8,ta,ma +** ) +** vmv\.v\.x\s+v\d+,a1 +** vse8\.v\s+v\d+,0\(a0\) +** ret +*/ +void * +f2 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES); +} + +/* Vectorise+inline operations up to requested lmul. +** f3: +** ( +** vsetivli\s+zero,\d+,e8,m8,ta,ma +** | +** li\s+a\d+,\d+ +** vsetvli\s+zero,a\d+,e8,m8,ta,ma +** ) +** vmv\.v\.x\s+v\d+,a1 +** vse8\.v\s+v\d+,0\(a0\) +** ret +*/ +void * +f3 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES * 8); +} + +/* Don't vectorise if the move is too large for requested lmul. +** f4: +** li\s+a2,\d+ +** tail\s+memset +*/ +void * +f4 (void *a, int const b) +{ + return __builtin_memset (a, b, MIN_VECTOR_BYTES * 8 + 1); +} -- 2.34.1