From: Jeff Law
Date: Tue, 25 Apr 2023 00:20:11 -0600
Subject: Re: [PATCH] RISC-V: Enable basic RVV auto-vectorization support
To: juzhe.zhong@rivai.ai, gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, palmer@dabbelt.com
In-Reply-To: <20230407012503.65215-1-juzhe.zhong@rivai.ai>
References: <20230407012503.65215-1-juzhe.zhong@rivai.ai>

On 4/6/23 19:25, juzhe.zhong@rivai.ai wrote:
> From: Juzhe-Zhong
>
> Enable basic auto-vectorization support of WHILE_LEN/LEN_LOAD/LEN_STORE.
> gcc/ChangeLog:
>
>         * config/riscv/riscv-protos.h (preferred_simd_mode): New function.
>         (expand_while_len): Ditto.
>         * config/riscv/riscv-v.cc (autovec_use_vlmax_p): Ditto.
>         (preferred_simd_mode): Ditto.
>         (expand_while_len): Ditto.
>         * config/riscv/riscv.cc (riscv_convert_vector_bits): Add basic auto-vectorization support.
>         (riscv_preferred_simd_mode): New function.
>         (TARGET_VECTORIZE_PREFERRED_SIMD_MODE): New targethook for RVV auto-vectorization support.
>         * config/riscv/vector.md: Add basic autovec.
>         * config/riscv/autovec.md: New file.
>
> +
> +;; len_load/len_store is a sub-optimal pattern for RVV auto-vectorization support.
> +;; We will replace them when len_maskload/len_maskstore is supported in loop vectorizer.

Presumably these are the key primitives you want to build all the basic
vector memory operations on top of? We should keep in mind strided
accesses, which can be important for x264.

> @@ -729,4 +730,81 @@ gen_avl_for_scalar_move (rtx avl)
>      }
>  }
>
> +/* SCALABLE means that the vector-length is agnostic (run-time invariant and
> +   compile-time unknown). FIXED meands that the vector-length is specific
> +   (compile-time known). Both RVV_SCALABLE and RVV_FIXED_VLMAX are doing
> +   auto-vectorization using VLMAX vsetvl configuration. */

Typo. meands -> means.

> +static bool
> +autovec_use_vlmax_p (void)
> +{
> +  return riscv_autovec_preference == RVV_SCALABLE
> +         || riscv_autovec_preference == RVV_FIXED_VLMAX;
> +}

Formatting nit. Add parens when you have to wrap lines.

> +
> +/* Return the vectorization machine mode for RVV according to LMUL. */

mode -> MODE. In general, when referring to a function argument, use all
caps.

> +machine_mode
> +preferred_simd_mode (scalar_mode mode)
> +{
> +  /* We only enable auto-vectorization when TARGET_MIN_VLEN >= 128
> +     which is -march=rv64gcv. Since GCC loop vectorizer report ICE
> +     when we enable -march=rv64gc_zve32* and -march=rv32gc_zve64*.
> +     in the 'can_duplicate_and_interleave_p' of tree-vect-slp.cc. Since we have
> +     VNx1SImode in -march=*zve32* and VNx1DImode in -march=*zve64*, they are
> +     enabled in targetm. vector_mode_supported_p and SLP vectorizer will try to
> +     use them.
> +     Currently, we can support auto-vectorization in
> +     -march=rv32_zve32x_zvl128b. Wheras, -march=rv32_zve32x_zvl32b or
> +     -march=rv32_zve32x_zvl64b are disabled.
> +  */
> +  if (autovec_use_vlmax_p ())
> +    {
> +      /* If TARGET_MIN_VLEN * riscv_autovec_lmul < 128, we don't allow
> +         auto-vectorization since Loop Vectorizer may use VNx1SImode or
> +         VNx1DImode to vectorize which will create ICE in the
> +         'can_duplicate_and_interleave_p' of tree-vect-slp.cc. */
> +      if (TARGET_MIN_VLEN * riscv_autovec_lmul < 128)
> +        return word_mode;
> +      /* We use LMUL = 1 as base bytesize which is BYTES_PER_RISCV_VECTOR and
> +         riscv_autovec_lmul as multiply factor to calculate the the NUNITS to
> +         get the auto-vectorization mode. */
> +      poly_uint64 nunits;
> +      poly_uint64 vector_size
> +        = BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul);
> +      poly_uint64 scalar_size = GET_MODE_SIZE (mode);
> +      if (!multiple_p (vector_size, scalar_size, &nunits))
> +        return word_mode;
> +      machine_mode rvv_mode;
> +      if (get_vector_mode (mode, nunits).exists (&rvv_mode))
> +        return rvv_mode;
> +    }

Is there a reason not to emit a diagnostic when the user asks for a
configuration we cannot currently support? Are the limitations
documented in invoke.texi?

> +
> +/* Expand WHILE_LEN pattern. If we can find a mode for a corresponding
> +   NUNITS, we emit vsetvl instructions directly. Otherwise, we emit
> +   UMIN (operand1, NUNITS). */
> +void
> +expand_while_len (rtx *ops)
> +{
> +  poly_int64 nunits;
> +  gcc_assert (poly_int_rtx_p (ops[2], &nunits));
> +  /* We arbitrary picked QImode as inner scalar mode to get vector mode.
> +     since vsetvl only demand ratio. We let VSETVL PASS to optimize it.
> +  */
> +  scalar_int_mode mode = QImode;
> +  machine_mode rvv_mode;
> +  if (get_vector_mode (mode, nunits).exists (&rvv_mode))
> +    {
> +      rtx vsetvl_rtx
> +        = gen_no_side_effects_vsetvl_rtx (rvv_mode, ops[0], ops[1]);
> +      emit_insn (vsetvl_rtx);
> +    }
> +  else
> +    {
> +      rtx tmp = gen_reg_rtx (Pmode);
> +      emit_move_insn (tmp, gen_int_mode (nunits, Pmode));
> +      expand_binop (Pmode, umin_optab, tmp, ops[1], ops[0], true, OPTAB_LIB);
> +    }
> +}

I thought it had been determined that WHILE_LEN wasn't actually a true
MIN operation and instead was slightly more complex? Or did I
misremember?

No major concerns here. I'll hold off ACKing pending answers on whether
we should emit diagnostics, whether the limitations are documented, and
the WHILE_LEN question.

Jeff
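For the formatting nit on autovec_use_vlmax_p, a minimal sketch of the parenthesized form: the whole wrapped condition goes inside parentheses so the continuation line lines up under the opening paren, as GNU style expects. The enum and the riscv_autovec_preference variable below are hypothetical stand-ins so the snippet compiles on its own; the real definitions live in the GCC RISC-V backend and may differ.

```cpp
#include <cassert>

// Hypothetical stand-ins for the RISC-V autovec option enum and variable;
// the real GCC definitions may use different names and values.
enum riscv_autovec_type
{
  NO_AUTOVEC,
  RVV_SCALABLE,
  RVV_FIXED_VLMAX
};

static riscv_autovec_type riscv_autovec_preference = RVV_SCALABLE;

/* Return true if auto-vectorization uses the VLMAX vsetvl configuration.
   The wrapped condition is parenthesized so the continuation line is
   indented relative to the open paren (GNU coding style).  */
static bool
autovec_use_vlmax_p (void)
{
  return (riscv_autovec_preference == RVV_SCALABLE
          || riscv_autovec_preference == RVV_FIXED_VLMAX);
}
```

Behavior is unchanged; only the layout differs from the quoted version.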
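The NUNITS computation in the quoted preferred_simd_mode is plain arithmetic once the vector length is pinned down. The sketch below is a simplified model, not GCC code: it assumes the vector length is exactly TARGET_MIN_VLEN (so it ignores the runtime-indeterminate part of the poly_uint64 values), and the helper name model_nunits is hypothetical. A result of std::nullopt stands for the two `return word_mode` exits.

```cpp
#include <cassert>
#include <optional>

// Simplified model of the quoted preferred_simd_mode logic, assuming the
// vector length is exactly MIN_VLEN_BITS. Returns the element count of the
// chosen RVV mode, or std::nullopt where the quoted code returns word_mode.
std::optional<unsigned>
model_nunits (unsigned min_vlen_bits, unsigned lmul, unsigned scalar_bytes)
{
  // Gate from the patch: TARGET_MIN_VLEN * riscv_autovec_lmul < 128.
  if (min_vlen_bits * lmul < 128)
    return std::nullopt;
  // Bytes in one LMUL = 1 register, scaled by LMUL.
  unsigned vector_bytes = (min_vlen_bits / 8) * lmul;
  // multiple_p: the scalar size must divide the vector size exactly.
  if (vector_bytes % scalar_bytes != 0)
    return std::nullopt;
  return vector_bytes / scalar_bytes;
}
```

Under this model, with LMUL = 1 a 128-bit minimum VLEN gives 4 SImode elements and a 64-bit minimum VLEN is rejected, but because the gate tests the product TARGET_MIN_VLEN * riscv_autovec_lmul, the same 64-bit configuration passes with LMUL = 2.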
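One property relevant to the WHILE_LEN question: the RVV 1.0 specification does not require vsetvl to return min(AVL, VLMAX). When VLMAX < AVL < 2 * VLMAX, an implementation may write any vl in [ceil(AVL / 2), VLMAX]. The sketch below encodes those constraints. The function names and the even-split policy in strips_even_split are illustrative assumptions, not anything from GCC or a particular core. Assuming VLMAX equals NUNITS for the selected mode, the vsetvl path and the UMIN fallback in the quoted expand_while_len need not produce the same length for the same operands.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// True if VL is a value an RVV 1.0 implementation may write to vl for the
// requested application vector length AVL, given VLMAX (spec section
// "Constraints on Setting vl").
bool
vl_is_legal (uint64_t avl, uint64_t vlmax, uint64_t vl)
{
  if (avl <= vlmax)
    return vl == avl;
  if (avl >= 2 * vlmax)
    return vl == vlmax;
  // VLMAX < AVL < 2 * VLMAX: anything in [ceil(AVL / 2), VLMAX] is allowed.
  return vl >= (avl + 1) / 2 && vl <= vlmax;
}

// Length produced by the UMIN fallback in the quoted expand_while_len.
uint64_t
umin_len (uint64_t avl, uint64_t nunits)
{
  return std::min (avl, nunits);
}

// Strip-mine N elements with a hypothetical implementation that splits the
// (VLMAX, 2 * VLMAX) window evenly instead of taking the minimum.
// Returns the number of loop iterations; every element is still processed.
unsigned
strips_even_split (uint64_t n, uint64_t vlmax)
{
  assert (vlmax > 0);  // otherwise the loop could never make progress
  unsigned iters = 0;
  while (n > 0)
    {
      uint64_t vl = (n > vlmax && n < 2 * vlmax) ? (n + 1) / 2
                                                 : std::min (n, vlmax);
      assert (vl_is_legal (n, vlmax, vl));
      n -= vl;
      ++iters;
    }
  return iters;
}
```

For AVL = 10 and VLMAX = 8, both 8 (the UMIN result) and 5 are legal vl values. A strip-mined loop stays correct either way as long as it advances by the length actually returned rather than recomputing it as a minimum.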