Date: Fri, 4 Aug 2023 14:52:31 -0600
From: Jeff Law
To: Joern Rennecke, GCC Patches
Subject: Re: cpymem for RISCV with v extension

On 7/17/23 22:47, Joern Rennecke wrote:
> As discussed on last week's patch call, this patch uses either a
> straight copy or an opaque pattern that emits the loop as assembly to
> optimize cpymem for the 'v' extension.
> I used Ju-Zhe Zhong's patch - starting in git with:
>
> Author: zhongjuzhe <66454988+zhongjuzhe@users.noreply.github.com>
> Date:   Mon Mar 21 14:20:42 2022 +0800
>
>     PR for RVV support using splitted small chunks (#334)
>
> as a starting point, even though not all that much of the original code remains.
> Regression tested on x86_64-pc-linux-gnu X
>   riscv-sim
>   riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f
>   riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32
>   riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f
>   riscv-sim/-march=rv32imfdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32
>   riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d
>   riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zba_zbb_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d
>   riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d
>
> cpymem-diff-20230718.txt
>
> 2023-07-12  Ju-Zhe Zhong
>             Joern Rennecke
>
> 	* config/riscv/riscv-protos.h (riscv_vector::expand_block_move):
> 	Declare.
> 	* config/riscv/riscv-v.cc (riscv_vector::expand_block_move):
> 	New function.
> 	* config/riscv/riscv.md (cpymemsi): Use riscv_vector::expand_block_move.
> 	* config/riscv/vector.md (@cpymem_straight):
> 	New define_insn patterns.
> 	(@cpymem_loop): Likewise.
> 	(@cpymem_loop_fast): Likewise.
>
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index b4884a30872..e61110fa3ad 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -49,6 +49,7 @@
>  #include "tm-constrs.h"
>  #include "rtx-vector-builder.h"
>  #include "targhooks.h"
> +#include "predict.h"

Not sure this is needed, but I didn't scan for it explicitly.  If it's not needed, then remove it.

> +  if (CONST_INT_P (length_in))
> +    {
> +      HOST_WIDE_INT length = INTVAL (length_in);
> +
> +      /* By using LMUL=8, we can copy as many bytes in one go as there
> +	 are bits in a vector register.
> +	 If the entire block thus fits,
> +	 we don't need a loop. */
> +      if (length <= TARGET_MIN_VLEN)
> +	{
> +	  need_loop = false;
> +
> +	  /* If a single scalar load / store pair can do the job, leave it
> +	     to the scalar code to do that. */
> +
> +	  if (pow2p_hwi (length) && length <= potential_ew)
> +	    return false;
> +	}

We could probably argue over the threshold for doing the copy on the scalar side, but I don't think it's necessary.  Once we start seeing V hardware we can revisit.

> +
> +  /* Find the vector mode to use.  Using the largest possible element
> +     size is likely to give smaller constants, and thus potentially
> +     reducing code size.  However, if we need a loop, we need to update
> +     the pointers, and that is more complicated with a larger element
> +     size, unless we use an immediate, which prevents us from dynamically
> +     using the largets transfer size that the hart supports.  And then,
> +     unless we know the*exact* vector size of the hart, we'd need
> +     multiple vsetvli / branch statements, so it's not even a size win.
> +     If, in the future, we find an RISCV-V implementation that is slower
> +     for small element widths, we might allow larger element widths for
> +     loops too. */

s/largets/largest/  And a space is missing in "the*exact*".

Note that I think the proposed glibc copier does allow larger element widths for this case.

> +
> +  /* Unless we get an implementation that's slow for small element
> +     size / non-word-aligned accesses, we assume that the hardware
> +     handles this well, and we don't want to complicate the code
> +     with shifting word contents around or handling extra bytes at
> +     the start and/or end.  So we want the total transfer size and
> +     alignemnt to fit with the element size. */

s/alignemnt/alignment/

Yes, let's not try to support every uarch we can envision and instead do a good job on the uarches we know about.  If a uarch with slow element or non-word aligned accesses comes along, they can propose changes at that time.
> +  // The VNx*?I modes have a factor of riscv_vector_chunks for nunits.

Comment might need updating after the recent work to adjust the modes.  I don't recall if we kept the VNx*?I modes or not.

Since the adjustments are all comment related, this is OK after fixing the comments.  Just post the update for archival purposes and consider it pre-approved for the trunk.

Thanks, and sorry for the wait, Joern.

jeff