From: David Edelsohn <dje.gcc@gmail.com>
To: "Paul A. Clarke" <pc@us.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] rs6000: Add optimizations for _mm_sad_epu8
Date: Fri, 7 Jan 2022 16:03:35 -0500 [thread overview]
Message-ID: <CAGWvnynyo++Vj6B7UeqoiVs5+bTvJUK2UtbVbWMSp=2ttnnjmQ@mail.gmail.com> (raw)
In-Reply-To: <YdipRMjEDBZu569K@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com>
On Fri, Jan 7, 2022 at 3:57 PM Paul A. Clarke <pc@us.ibm.com> wrote:
>
> On Fri, Jan 07, 2022 at 02:40:51PM -0500, David Edelsohn via Gcc-patches wrote:
> > +#ifdef __LITTLE_ENDIAN__
> > + /* Sum across four integers with two integer results. */
> > + asm ("vsum2sws %0,%1,%2" : "=v" (result) : "v" (vsum), "v" (zero));
> > + /* Note: vec_sum2s could be used here, but on little-endian, vector
> > + shifts are added that are not needed for this use-case.
> > + A vector shift to correctly position the 32-bit integer results
> > + (currently at [0] and [2]) to [1] and [3] would then need to be
> > + swapped back again since the desired results are two 64-bit
> > + integers ([1]|[0] and [3]|[2]). Thus, no shift is performed. */
> > +#else
> > /* Sum across four integers with two integer results. */
> > result = vec_sum2s (vsum, (__vector signed int) zero);
> >
> > If little-endian adds shifts to correct for the position and
> > big-endian does not, why not use the inline asm without the shifts for
> > both? It seems confusing to add the inline asm only for LE instead of
> > always using it with the appropriate comment.
> >
> > It's a good and valuable optimization for LE. Fewer variants are less
> > fragile, easier to test and easier to maintain. If you're going to go
> > to the trouble of using inline asm for LE, use it for both.
>
> BE (only) _does_ need a shift as seen on the next two lines after the
> code snippet above:
> /* Sum across four integers with two integer results. */
> result = vec_sum2s (vsum, (__vector signed int) zero);
> /* Rotate the sums into the correct position. */
> result = vec_sld (result, result, 6);
>
> So, when using {vec_sum2s;vec_sld}:
> - LE gets an implicit shift in vec_sum2s which just needs to be undone
> by the vec_sld, and those shifts don't "cancel out" and get removed
> by GCC.
> - BE does not get any implicit shifts, but needs one that comes from
> vec_sld.
>
> Are you saying use the asm(vsum2sws) and then conditionally call
> vec_sld on BE only?
>
> I viewed this change as a temporary bandage unless and until GCC can
> remove the unnecessary swaps. It seems like the preferred code is
> vec_sum2s/vec_sld, not the asm, but that currently is suboptimal for LE.
Nevermind. I thought that these patches had not been reviewed.
Thanks, David
next prev parent reply other threads:[~2022-01-07 21:03 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-07 19:40 David Edelsohn
2022-01-07 20:57 ` Paul A. Clarke
2022-01-07 21:03 ` David Edelsohn [this message]
-- strict thread matches above, loose matches on Subject: below --
2021-10-22 17:28 Paul A. Clarke
2021-11-19 18:09 ` Segher Boessenkool
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAGWvnynyo++Vj6B7UeqoiVs5+bTvJUK2UtbVbWMSp=2ttnnjmQ@mail.gmail.com' \
--to=dje.gcc@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=pc@us.ibm.com \
--cc=segher@kernel.crashing.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).