From: Jeff Law
Date: Wed, 14 Jun 2023 13:10:44 -0600
Subject: Re: [PATCH] RISC-V: Use merge approach to optimize vector permutation
To: Robin Dapp, juzhe.zhong@rivai.ai, gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, palmer@dabbelt.com, palmer@rivosinc.com
Message-ID: <16a20d04-e954-4fa1-f8ed-e743b0faea8a@gmail.com>
References: <20230614042409.266841-1-juzhe.zhong@rivai.ai>

On 6/14/23 09:00, Robin Dapp wrote:
> Hi Juzhe,
>
> the general method seems sane and useful (it's not very complicated).
> I was just distracted by
>
>> Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the common expression:
>> { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ... }
>>
>> For this selector, we can use vmsltu + vmerge to optimize the codegen.
>
> because it's actually { 0, nunits + 1, 2, nunits + 3, ... } or maybe
> { 0, nunits, 0, nunits, ... } + { 0, 1, 2, 3, ..., nunits - 1 }.
>
> Because of the ascending/monotonic? selector structure we can use vmerge
> instead of vrgather.
>
>> +/* Recognize the patterns that we can use merge operation to shuffle the
>> +   vectors. The value of Each element (index i) in selector can only be
>> +   either i or nunits + i.
>> +
>> +   E.g.
>> +   v = VEC_PERM_EXPR (v0, v1, selector),
>> +   selector = { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ... }
>
> Same.
>
>> +
>> +   We can transform such pattern into:
>> +
>> +   v = vcond_mask (v0, v1, mask),
>> +   mask = { 0, 1, 0, 1, 0, 1, ... }. */
>> +
>> +static bool
>> +shuffle_merge_patterns (struct expand_vec_perm_d *d)
>> +{
>> +  machine_mode vmode = d->vmode;
>> +  machine_mode sel_mode = related_int_vector_mode (vmode).require ();
>> +  int n_patterns = d->perm.encoding ().npatterns ();
>> +  poly_int64 vec_len = d->perm.length ();
>> +
>> +  for (int i = 0; i < n_patterns; ++i)
>> +    if (!known_eq (d->perm[i], i) && !known_eq (d->perm[i], vec_len + i))
>> +      return false;
>> +
>> +  for (int i = n_patterns; i < n_patterns * 2; i++)
>> +    if (!d->perm.series_p (i, n_patterns, i, n_patterns)
>> +        && !d->perm.series_p (i, n_patterns, vec_len + i, n_patterns))
>> +      return false;
>
> Maybe add a comment that we check that the pattern is actually monotonic
> or however you prefer to call it?
>
> I didn't go through all tests in detail but skimmed several. All in all
> looks good to me.

So I think that means we want a V2 for the comment updates. But I think
we can go ahead and consider V2 pre-approved.

jeff
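
For readers following the selector discussion, here is a minimal
standalone sketch of the property the quoted comment describes: a
two-input permute can become a lane-wise merge when every selector
element i is either i or nunits + i. This is not the GCC
implementation. The helper names (merge_mask_for_selector, permute,
merge) are hypothetical, it uses fixed-length std::vector in place of
GCC's poly_int-based encoding, and it assumes a mask bit of 1 picks
the lane from v1, as the quoted comment's mask suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not GCC code.  Fills MASK and returns true if every
// selector element i is either i (take v0[i]) or nunits + i (take v1[i]).
// Returns false otherwise, meaning a general gather would be needed.
bool
merge_mask_for_selector (const std::vector<std::size_t> &sel,
                         std::vector<bool> &mask)
{
  const std::size_t nunits = sel.size ();
  mask.assign (nunits, false);
  for (std::size_t i = 0; i < nunits; ++i)
    {
      if (sel[i] == nunits + i)
        mask[i] = true;          // Lane i comes from v1.
      else if (sel[i] != i)
        return false;            // Lane would have to move: not a merge.
    }
  return true;
}

// Reference semantics of a two-input permute: SEL indexes into the
// concatenation of V0 and V1.
std::vector<int>
permute (const std::vector<int> &v0, const std::vector<int> &v1,
         const std::vector<std::size_t> &sel)
{
  const std::size_t nunits = v0.size ();
  assert (v1.size () == nunits && sel.size () == nunits);
  std::vector<int> out (nunits);
  for (std::size_t i = 0; i < nunits; ++i)
    {
      assert (sel[i] < 2 * nunits);
      out[i] = sel[i] < nunits ? v0[sel[i]] : v1[sel[i] - nunits];
    }
  return out;
}

// Lane-wise merge: mask bit 1 selects V1, 0 selects V0 (assumed convention).
std::vector<int>
merge (const std::vector<int> &v0, const std::vector<int> &v1,
       const std::vector<bool> &mask)
{
  const std::size_t nunits = v0.size ();
  assert (v1.size () == nunits && mask.size () == nunits);
  std::vector<int> out (nunits);
  for (std::size_t i = 0; i < nunits; ++i)
    out[i] = mask[i] ? v1[i] : v0[i];
  return out;
}
```

With nunits = 4, the selector { 0, 5, 2, 7 } (that is, { 0, nunits + 1,
2, nunits + 3 }) is accepted and permute and merge agree, whereas
{ 0, 5, 1, 6 } (the form written in the patch comment) is rejected
because lane 2 would have to read element 1.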