public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/104600] New: VCE<integer_type>(vector){} should be converted (or expanded) into BIT_INSERT_EXPR
@ 2022-02-18 21:15 pinskia at gcc dot gnu.org
  2022-02-21  8:00 ` [Bug tree-optimization/104600] " rguenth at gcc dot gnu.org
  2022-02-21  8:45 ` pinskia at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-02-18 21:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104600

            Bug ID: 104600
           Summary: VCE<integer_type>(vector){} should be converted (or
                    expanded) into BIT_INSERT_EXPR
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

When I looked at PR 104582, I noticed that we had:

  _14 = {_1, _5};
  _8 = VIEW_CONVERT_EXPR<__int128>(_14);

This can be converted into (with the ordering corrected for endianness):
  t1 = (__int128) _1;
  _8 = BIT_INSERT_EXPR <t1, _5, 64>;
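
For illustration only, assuming a little-endian layout where element 0 ends up
in the low 64 bits, the proposed form corresponds roughly to the following
scalar C (the function name g is made up for this sketch):

/* Sketch: scalar equivalent of t1 = (__int128) _1 followed by a
   BIT_INSERT_EXPR of _5 at bit position 64, little-endian assumed.  */
unsigned __int128 g(unsigned long a, unsigned long b)
{
  unsigned __int128 t = a;            /* low 64 bits <- a */
  t |= (unsigned __int128) b << 64;   /* insert b at bit position 64 */
  return t;
}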

You can see this by taking the following testcases:

#define vector __attribute__((vector_size(16)))

__int128 f(long a, long b)
{
  vector long t = {a, b};
  return (__int128)t;
}

void f1(__int128 *t1, long a, long b)
{
  vector long t = {a, b};
  *t1 =  (__int128)t;
}

void f2(__int128 *t1, long a, long b)
{
  vector long t = {a, b};
  *t1 =  ((__int128)t) + 1;
}


f2 is really bad for x86_64 as GCC does a store to the stack and then loads it
back.

Note that if you use | instead of +, GCC already does the right thing.
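
For reference, one way to see the GIMPLE above is to compile the testcases
with the tree dumps enabled, e.g. (t.c being whatever file holds them):

  gcc -O2 -S -fdump-tree-optimized t.c

and look at the resulting .optimized dump.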


* [Bug tree-optimization/104600] VCE<integer_type>(vector){} should be converted (or expanded) into BIT_INSERT_EXPR
@ 2022-02-21  8:00 ` rguenth at gcc dot gnu.org
From: rguenth at gcc dot gnu.org @ 2022-02-21  8:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104600

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
BIT_INSERT_EXPR is relatively "new", so code generation needs to be
investigated on multiple targets.  Also look at targets with native 64-bit
vectors and uint64_t / uint32_t, i.e. where the vector size matches WORD_SIZE.

This might also be something for specialized expansion with TER, or rather
in ISEL nowadays, depending on how TImode is laid out.


* [Bug tree-optimization/104600] VCE<integer_type>(vector){} should be converted (or expanded) into BIT_INSERT_EXPR
@ 2022-02-21  8:45 ` pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2022-02-21  8:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104600

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Here is another example, this time with 32-bit int and 64-bit long:
#define vector __attribute__((vector_size(8)))

long f(int a, int b)
{
  vector int t = {a, b};
  return (long)t;
}

void f1(long *t1, int a, int b)
{
  vector int t = {a, b};
  *t1 =  (long)t;
}

void f2(long *t1, int a, int b)
{
  vector int t = {a, b};
  *t1 =  ((long)t) + 1;
}

long f_1(unsigned a, unsigned b)
{
  long t = (((unsigned long)a) << 32) | (unsigned long)b;
  return (long)t;
}

void f1_1(long *t1, unsigned a, unsigned b)
{
  long t = (((unsigned long)a) << 32) | (unsigned long)b;
  *t1 =  (long)t;
}

void f2_1(long *t1, unsigned a, unsigned b)
{
  long t = (((unsigned long)a) << 32) | (unsigned long)b;
  *t1 =  ((long)t) + 1;
}

----- CUT ----
For f2 and f2_1 we have:

        movd    %esi, %xmm0
        movd    %edx, %xmm1
        punpckldq       %xmm1, %xmm0
        movq    %xmm0, %rsi
        addq    $1, %rsi
        movq    %rsi, (%rdi)

vs
        salq    $32, %rsi
        movl    %edx, %edx
        orq     %rdx, %rsi
        addq    $1, %rsi
        movq    %rsi, (%rdi)

It all depends on how fast moves between the GPR and SSE register sets are vs.
doing it all in integer registers.

I think this should only be done for the 2x size case and nothing more.
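
For contrast, a sketch of a hypothetical 4x-size case (not one of the
testcases above), where a single insert is no longer enough and three
BIT_INSERT_EXPRs would be needed on top of the first element:

#define vector __attribute__((vector_size(16)))

__int128 g4(int a, int b, int c, int d)
{
  vector int t = {a, b, c, d};
  return (__int128)t;  /* would take 3 inserts after placing the first element */
}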

