Re: Re: [PATCH V2] RISC-V: Support POPCOUNT auto-vectorization


From: 钟居哲 <juzhe.zhong@rivai.ai>
To: rdapp.gcc <rdapp.gcc@gmail.com>,  gcc-patches <gcc-patches@gcc.gnu.org>
Cc: rdapp.gcc <rdapp.gcc@gmail.com>,
	 kito.cheng <kito.cheng@sifive.com>,
	 kito.cheng <kito.cheng@gmail.com>
Subject: Re: Re: [PATCH V2] RISC-V: Support POPCOUNT auto-vectorization
Date: Tue, 1 Aug 2023 06:03:06 +0800
Message-ID: <3A2B597C07204E6C+2023080106030634603987@rivai.ai>
In-Reply-To: <80c23bdc-6a4c-55e5-d930-962f3b24d8b9@gmail.com>


>> From my recollection this is usually 30-40% faster than the naive tree
>> adder and also amenable to vectorization.  As long as the multiplication
>> is not terribly slow, that is.  Mula's algorithm should be significantly
>> faster even, another 30% IIRC.
 
>> I'm not against continuing with the more well-known approach for now
>> but we should keep in mind that there might still be potential for
>> improvement.

No. I don't think it's faster.

>> Wait, why do we need vec_pack_trunc for popcountll?  For me vectorizing
>> it "just works" when the output is a uint64_t just like the standard
>> name demands.

>> If you're referring to something else, please detail in the comment.

I have no idea. I saw ARM SVE generate:
POP_COUNT
POP_COUNT
VEC_PACK_TRUNC.
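
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: __builtin_popcountll returns int, so a loop that stores its result into a 32-bit array has to narrow each 64-bit lane, while a loop that stores into uint64_t does not. The two loops below are a minimal sketch of that difference. The function names are hypothetical and not taken from the patch or its tests.

```c
#include <assert.h>
#include <stdint.h>

/* Result type has the same width as the 64-bit input.  A vectorizer can
   keep one result lane per input lane, so no narrowing step is expected.  */
void
popcount_u64_to_u64 (uint64_t *restrict dst, const uint64_t *restrict src,
		     int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = __builtin_popcountll (src[i]);
}

/* Result is a 32-bit int.  Each 64-bit lane has to be narrowed, which is
   where a pack/truncate step could plausibly come from (assumption).  */
void
popcount_u64_to_i32 (int *restrict dst, const uint64_t *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = __builtin_popcountll (src[i]);
}
```

Comparing the vectorizer dump for both loops (for example with -fdump-tree-vect-details) should show whether the narrowing only appears in the second one.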

I am gonna drop this patch since it's meaningless.

Thanks.


juzhe.zhong@rivai.ai
 
From: Robin Dapp
Date: 2023-08-01 03:38
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng
Subject: Re: [PATCH V2] RISC-V: Support POPCOUNT auto-vectorization
Hi Juzhe,
 
> +/* Expand Vector POPCOUNT by parallel popcnt:
> +
> +   int parallel_popcnt(uint32_t n) {
> +   #define POW2(c)      (1U << (c))
> +   #define MASK(c)      (static_cast<uint32_t>(-1) / (POW2(POW2(c)) + 1U))
> +   #define COUNT(x, c)  ((x) & MASK(c)) + (((x)>>(POW2(c))) & MASK(c))
> + n = COUNT(n, 0);
> + n = COUNT(n, 1);
> + n = COUNT(n, 2);
> + n = COUNT(n, 3);
> + n = COUNT(n, 4);
> +   // n = COUNT(n, 5);  // uncomment this line for 64-bit integers
> + return n;
> +   #undef COUNT
> +   #undef MASK
> +   #undef POW2
> +   }
 
That's quite a heavy implementation but I suppose with the proper cost
function it can still be worth it.  Did you also try some alternatives?
WWG comes to mind:
 
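#include <stdint.h>

uint64_t c1 = 0x5555555555555555;  /* every other bit */
uint64_t c2 = 0x3333333333333333;  /* low two bits of each nibble */
uint64_t c4 = 0x0F0F0F0F0F0F0F0F;  /* low nibble of each byte */
 
uint64_t wwg (uint64_t x) {
    x -= (x >> 1) & c1;              /* 2-bit counts */
    x = ((x >> 2) & c2) + (x & c2);  /* 4-bit counts */
    x = (x + (x >> 4) ) & c4;        /* 8-bit counts */
    x *= 0x0101010101010101;         /* sum all bytes into the top byte */
    return x >> 56;
}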
 
From my recollection this is usually 30-40% faster than the naive tree
adder and also amenable to vectorization.  As long as the multiplication
is not terribly slow, that is.  Mula's algorithm should be significantly
faster even, another 30% IIRC.
 
I'm not against continuing with the more well-known approach for now
but we should keep in mind that there might still be potential for
improvement.
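
To make the comparison concrete, here is a minimal sketch that puts the 64-bit tree adder (the COUNT steps with the masks written out by hand) next to the WWG variant and cross-checks them. It only checks that the two agree; it says nothing about speed. The names popcount_tree, popcount_wwg and check_popcount_agreement are made up for this example, and the xorshift generator is just a convenient source of test values.

```c
#include <assert.h>
#include <stdint.h>

/* Tree adder: six mask-shift-add steps, one per doubling of field width.  */
static uint64_t
popcount_tree (uint64_t x)
{
  x = (x & 0x5555555555555555ULL) + ((x >> 1) & 0x5555555555555555ULL);
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  x = (x & 0x0F0F0F0F0F0F0F0FULL) + ((x >> 4) & 0x0F0F0F0F0F0F0F0FULL);
  x = (x & 0x00FF00FF00FF00FFULL) + ((x >> 8) & 0x00FF00FF00FF00FFULL);
  x = (x & 0x0000FFFF0000FFFFULL) + ((x >> 16) & 0x0000FFFF0000FFFFULL);
  x = (x & 0x00000000FFFFFFFFULL) + ((x >> 32) & 0x00000000FFFFFFFFULL);
  return x;
}

/* WWG variant: three reduction steps, then one multiply that sums the
   eight per-byte counts into the top byte.  */
static uint64_t
popcount_wwg (uint64_t x)
{
  x -= (x >> 1) & 0x5555555555555555ULL;
  x = ((x >> 2) & 0x3333333333333333ULL) + (x & 0x3333333333333333ULL);
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  x *= 0x0101010101010101ULL;
  return x >> 56;
}

/* Hypothetical helper: compare both versions on pseudo-random inputs
   produced by a 64-bit xorshift step, plus the two extreme values.  */
static void
check_popcount_agreement (void)
{
  uint64_t v = 0x9E3779B97F4A7C15ULL;
  for (int i = 0; i < 100000; i++)
    {
      v ^= v << 13;
      v ^= v >> 7;
      v ^= v << 17;
      assert (popcount_tree (v) == popcount_wwg (v));
    }
  assert (popcount_wwg (0) == 0);
  assert (popcount_wwg (UINT64_MAX) == 64);
}
```

Calling check_popcount_agreement () from a small main is enough to run the check. Any timing comparison would have to be measured on the target separately.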
 
>  } // namespace riscv_vector
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c
 
Any particular reason why the tests are in widen?
 
> +extern void abort (void) __attribute__ ((noreturn));
 
Why no __builtin_unreachable as in the other tests? 
 
> +      asm volatile ("" ::: "memory");
 
Is this necessary?  It doesn't hurt of course, just wondering.
 
All in all LGTM in case you'd rather get this upstream now.  We can
always improve later.
 
Regards
Robin
