public inbox for gcc-bugs@sourceware.org
* [Bug target/114943] New: X86 AVX2: inefficient code generated to convert SIMD Vectors
@ 2024-05-04 13:10 vincenzo.innocente at cern dot ch
From: vincenzo.innocente at cern dot ch @ 2024-05-04 13:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114943

            Bug ID: 114943
           Summary: X86 AVX2: inefficient code generated to convert SIMD
                    Vectors
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vincenzo.innocente at cern dot ch
  Target Milestone: ---

In the example below (see https://godbolt.org/z/qnfT4fE5G),
covert and covert3 produce code that looks inefficient to me compared with
covert2 (and with clang) for target x86-64-v3.

#define VECTOR_EXT(N) __attribute__((vector_size(N)))
typedef float VECTOR_EXT(16) float32x4_t;
typedef double VECTOR_EXT(32) float64x4_t;

float32x4_t f1,f2,f3,f4,f;
float64x4_t d1,d2,d3,d4,d;


void covert() {
  for (int i=0;i<4;++i) {
    d1[i] = f1[i];
    d2[i] = f2[i];
    d3[i] = f3[i];
    d4[i] = f4[i];
  }
}

void covert2() {
  for (int i=0;i<4;++i)
    d1[i] = f1[i];
  for (int i=0;i<4;++i)
    d2[i] = f2[i];
  for (int i=0;i<4;++i)
    d3[i] = f3[i];
  for (int i=0;i<4;++i)
    d4[i] = f4[i];
}

void covert3() {
  d1 = __builtin_convertvector(f1,float64x4_t);
}
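
As a sanity check that the variants above differ only in generated code and not in results, the element-wise loop and the __builtin_convertvector form can be compared lane by lane. This is a minimal sketch, not part of the original report: the helper names widen_loop, widen_builtin and count_mismatches are hypothetical, and the vectors are passed by pointer only to keep the example portable.

```c
/* Sketch: compares element-wise widening with __builtin_convertvector.
   All function names here are hypothetical and not from the bug report. */
#define VECTOR_EXT(N) __attribute__((vector_size(N)))
typedef float VECTOR_EXT(16) float32x4_t;
typedef double VECTOR_EXT(32) float64x4_t;

/* Element-wise float -> double widening, as in the loops of the report. */
static void widen_loop(const float32x4_t *src, float64x4_t *dst) {
  for (int i = 0; i < 4; ++i)
    (*dst)[i] = (*src)[i];
}

/* Whole-vector widening, as in covert3. */
static void widen_builtin(const float32x4_t *src, float64x4_t *dst) {
  *dst = __builtin_convertvector(*src, float64x4_t);
}

/* Returns the number of lanes in which the two variants disagree. */
int count_mismatches(void) {
  float32x4_t f = {1.5f, -2.25f, 0.0f, 1024.0f};
  float64x4_t a, b;
  int bad = 0;

  widen_loop(&f, &a);
  widen_builtin(&f, &b);
  for (int i = 0; i < 4; ++i)
    if (a[i] != b[i])
      ++bad;
  return bad;
}
```

Calling count_mismatches() from a small main and checking for 0 is enough to confirm that both forms compute the same values; any difference between them is then purely a matter of instruction selection.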


* [Bug target/114943] X86 AVX2: inefficient code generated to convert SIMD Vectors
@ 2024-05-06  3:57 ` liuhongt at gcc dot gnu.org
From: liuhongt at gcc dot gnu.org @ 2024-05-06  3:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114943

Hongtao Liu <liuhongt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |liuhongt at gcc dot gnu.org

--- Comment #1 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
For covert3, we already have a patch and will post it soon.
For covert, the loop vectorizer currently has a limitation: it keeps the same
vector length while vectorizing, so it generates extra packing/unpacking
instructions compared to covert2. The BB (basic-block) vectorizer has no such
limitation, so with -O3 -march=x86-64-v3 -fno-tree-loop-vectorize, covert is as
good as covert2.
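
The comparison described in this comment can be reproduced with a small standalone file. The following is a sketch under stated assumptions: the file name pr114943.c and the function names fused, split and fused_matches_split are hypothetical, while the compiler flags in the header comment are the ones quoted above. Whether the two assembly outputs actually differ depends on the GCC version used.

```c
/* Reproduction sketch (file name pr114943.c is hypothetical).
   Compile twice and compare the assembly emitted for fused():
     gcc -O3 -march=x86-64-v3 -S pr114943.c -o loop.s
     gcc -O3 -march=x86-64-v3 -fno-tree-loop-vectorize -S pr114943.c -o bb.s */
#define VECTOR_EXT(N) __attribute__((vector_size(N)))
typedef float VECTOR_EXT(16) float32x4_t;
typedef double VECTOR_EXT(32) float64x4_t;

float32x4_t f1, f2, f3, f4;
float64x4_t d1, d2, d3, d4;

/* One loop that widens all four vectors per iteration. */
void fused(void) {
  for (int i = 0; i < 4; ++i) {
    d1[i] = f1[i];
    d2[i] = f2[i];
    d3[i] = f3[i];
    d4[i] = f4[i];
  }
}

/* One loop per vector. */
void split(void) {
  for (int i = 0; i < 4; ++i) d1[i] = f1[i];
  for (int i = 0; i < 4; ++i) d2[i] = f2[i];
  for (int i = 0; i < 4; ++i) d3[i] = f3[i];
  for (int i = 0; i < 4; ++i) d4[i] = f4[i];
}

/* Returns 1 if fused() and split() store identical results, else 0. */
int fused_matches_split(void) {
  const float64x4_t zero = {0.0, 0.0, 0.0, 0.0};

  f1 = (float32x4_t){1.0f, 2.0f, 3.0f, 4.0f};
  f2 = (float32x4_t){0.5f, 0.25f, 0.125f, 0.0625f};
  f3 = (float32x4_t){-1.0f, -2.0f, -3.0f, -4.0f};
  f4 = (float32x4_t){10.0f, 20.0f, 30.0f, 40.0f};

  fused();
  float64x4_t r1 = d1, r2 = d2, r3 = d3, r4 = d4;

  /* Clear the outputs so that split() has to recompute every lane. */
  d1 = zero; d2 = zero; d3 = zero; d4 = zero;
  split();

  for (int i = 0; i < 4; ++i)
    if (r1[i] != d1[i] || r2[i] != d2[i] || r3[i] != d3[i] || r4[i] != d4[i])
      return 0;
  return 1;
}
```

The runtime check only confirms that both loop shapes are functionally equivalent; the point of interest is the difference between loop.s and bb.s for fused().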

