From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 12 Jun 2023 13:42:37 -0600
Subject: Re: [PATCH V2] RISC-V: Enhance RVV VLA SLP auto-vectorization with decompress operation
To: juzhe.zhong@rivai.ai, gcc-patches@gcc.gnu.org
Cc: kito.cheng@sifive.com, palmer@rivosinc.com, rdapp.gcc@gmail.com
References: <20230612151107.13373-1-juzhe.zhong@rivai.ai>
From: Jeff Law
In-Reply-To: <20230612151107.13373-1-juzhe.zhong@rivai.ai>

On 6/12/23 09:11, juzhe.zhong@rivai.ai wrote:
> From: Juzhe-Zhong
>
> According to RVV ISA:
> https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc
>
> We can enhance VLA SLP auto-vectorization with (16.5.1. Synthesizing vdecompress)
> Decompress operation.
>
> Case 1 (nunits = POLY_INT_CST [16, 16]):
> _48 = VEC_PERM_EXPR <_37, _35, { 0, POLY_INT_CST [16, 16], 1, POLY_INT_CST [17, 16], 2, POLY_INT_CST [18, 16], ... }>;
> We can optimize such VLA SLP permutation pattern into:
> _48 = vdecompress (_37, _35, mask = { 0, 1, 0, 1, ... });
>
> Case 2 (nunits = POLY_INT_CST [16, 16]):
> _23 = VEC_PERM_EXPR <_46, _44, { POLY_INT_CST [1, 1], POLY_INT_CST [3, 3], POLY_INT_CST [2, 1], POLY_INT_CST [4, 3], POLY_INT_CST [3, 1], POLY_INT_CST [5, 3], ... }>;
> We can optimize such VLA SLP permutation pattern into:
> _48 = vdecompress (slidedown(_46, 1/2 nunits), slidedown(_44, 1/2 nunits), mask = { 0, 1, 0, 1, ... });
>
> For example:
> void __attribute__ ((noinline, noclone))
> vec_slp (uint64_t *restrict a, uint64_t b, uint64_t c, int n)
> {
>   for (int i = 0; i < n; ++i)
>     {
>       a[i * 2] += b;
>       a[i * 2 + 1] += c;
>     }
> }
>
> ASM:
>   ...
>   vid.v v0
>   vand.vi v0,v0,1
>   vmseq.vi v0,v0,1 ===> mask = { 0, 1, 0, 1, ... }
> vdecompress:
>   viota.m v3,v0
>   vrgather.vv v2,v1,v3,v0.t
> Loop:
>   vsetvli zero,a5,e64,m1,ta,ma
>   vle64.v v1,0(a0)
>   vsetvli a6,zero,e64,m1,ta,ma
>   vadd.vv v1,v2,v1
>   vsetvli zero,a5,e64,m1,ta,ma
>   mv a5,a3
>   vse64.v v1,0(a0)
>   add a3,a3,a1
>   add a0,a0,a2
>   bgtu a5,a4,.L4
>
>
> gcc/ChangeLog:
>
>   * config/riscv/riscv-v.cc (emit_vlmax_decompress_insn): New function.
>   (shuffle_decompress_patterns): New function.
>   (expand_vec_perm_const_1): Add decompress optimization.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/rvv/autovec/partial/slp-8.c: New test.
>   * gcc.target/riscv/rvv/autovec/partial/slp-9.c: New test.
>   * gcc.target/riscv/rvv/autovec/partial/slp_run-8.c: New test.
>   * gcc.target/riscv/rvv/autovec/partial/slp_run-9.c: New test.

I've been wanting to get inside expand_vec_perm_const to see what
opportunities might exist to improve code in there.  We had good success
mining this space at a prior employer.  While we had a lot of weird
idioms and costs to consider, it was well worth the time.

So quite happy to see you diving into this code.

OK for the trunk,
Jeff
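
For anyone following the thread, here is a minimal scalar sketch in C of why the Case 1 selector { 0, nunits, 1, nunits + 1, ... } can be expressed as a decompress with mask = { 0, 1, 0, 1, ... }. It is not taken from the patch: the function names (interleave_selector, vec_perm_model, decompress_model) are hypothetical, and it assumes the unmasked lanes are filled from op0 using the same viota-style indices as the masked lanes, which is only meant to hold for this alternating mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build the Case 1 selector { 0, n, 1, n + 1, 2, n + 2, ... }. */
void interleave_selector(size_t *sel, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        sel[i] = i / 2 + ((i & 1) ? n : 0);
}

/* Scalar model of a two-input permute: indices below n pick from op0,
   indices in [n, 2n) pick from op1. */
void vec_perm_model(const uint64_t *op0, const uint64_t *op1,
                    const size_t *sel, uint64_t *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        assert(sel[i] < 2 * n);
        dst[i] = sel[i] < n ? op0[sel[i]] : op1[sel[i] - n];
    }
}

/* Hypothetical scalar model of "vdecompress (op0, op1, mask)" for the
   alternating mask.  'count' plays the role of viota.m: the number of
   set mask bits strictly below lane i.  Masked lanes gather from op1,
   unmasked lanes gather from op0 (an assumption of this sketch). */
void decompress_model(const uint64_t *op0, const uint64_t *op1,
                      const uint8_t *mask, uint64_t *dst, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        dst[i] = mask[i] ? op1[count] : op0[count];
        count += mask[i] != 0;
    }
}
```

With n = 8 and mask[i] = i & 1, both models produce op0[0], op1[0], op0[1], op1[1], and so on. Case 2 would be the same model applied after sliding both inputs down by half the vector length.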