public inbox for gcc-help@gcc.gnu.org
* How to use the KNC Vectorregisters with GCC?  Race condition with ICC & KNC?
From: Stephan Walter @ 2014-01-21 10:23 UTC
  To: gcc-help

Hi,

I am new to the gcc mailing list, so I hope this is the right place.

As the subject says, I am working with KNC. I have developed a kernel
module for a NIC and now want to use the 512-bit registers of KNC for
some memory-copy jobs.

I have experience using GCC to compile the KNC Linux kernel and kernel
modules, and that works fine so far.

Before starting to write inline assembler with the 512-bit registers, I
wrote some minimal examples.

On a normal i5-3470 everything works fine with gcc, and the examples
also work on KNC. The problem is that when I try to use the 512-bit
registers, GCC does not seem to know the register names or the
instructions.

The instructions themselves should not be a problem, because I have
the instruction manual, but I have no idea how to solve the register
problem.

So I tried ICC with -mmic. The source compiles, but when I measure the
clock cycles with rdtsc, the first two readings work and the third and
fourth do not.
I tried to track down the problem with gdb, but when I compile with -g
the error no longer occurs. The problem also disappears when I add a
printf, sleep(1) or usleep(1). Since 1 or even 100 nops have no effect,
I think there is a race condition in writing the value to memory.

My inline assembler knowledge is rudimentary, so I do not know whether
I have made a mistake with the clobber registers and so on, or whether
there is a bug in gcc or icc.
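
For reference, a minimal sketch of how the registers written by rdtsc
can be described to the compiler, assuming an x86-64 target and a
GCC-compatible compiler. The helper names read_tsc and combine_tsc are
made up for illustration. rdtsc writes both EAX and EDX, so the sketch
declares both as outputs instead of naming only EAX.

```c
#include <stdint.h>

/* Hypothetical helper: join the two 32-bit halves returned by rdtsc. */
static inline uint64_t combine_tsc(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: read the time-stamp counter (x86-64 assumed).
 * rdtsc writes EAX (low half) and EDX (high half), so both are declared
 * as outputs. "volatile" tells the compiler not to delete or merge
 * the reads. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return combine_tsc(lo, hi);
}
```

Returning the full 64-bit value also avoids the wrap-around that a
32-bit int counter can show when two readings are subtracted.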

Because -g makes the problem go away with icc, I cannot debug it. So I
hope somebody is able to help me.

I would prefer to use gcc with the 512-bit registers; if there is a bug
in my inline assembler, a solution or hint would also be welcome.

Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <string.h>


int rdtsc_count(void){
int count;
__asm__ __volatile__(   "rdtsc;                 \n\t"
                         "movl   %%eax, %0;      \n\t"
                          :"=m"(count)//, "=r"(brd), "=r"(crd), "=r"(drd)
                          :
                          :"%eax", "memory"//, "cc"//, "%ebx", "%ecx"
                         );

return count;
}


int main(int argc, char *argv[]){

int starta=0, startb=0, stopa=0, stopb=0;
int buffer_size=32;
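uint64_t *buffer, *packet_buffer, *packet_buffer_ref;
int i, waddr; /* not declared in the posted excerpt; types assumed */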
uint32_t buflen=atoi(argv[1]);


/////////////setup
buffer = (uint64_t*) malloc (buffer_size*sizeof(uint64_t));
packet_buffer = (uint64_t*) malloc (buffer_size*sizeof(uint64_t));
packet_buffer_ref= (uint64_t*) malloc (buffer_size*sizeof(uint64_t));//REF

waddr=0;

//printf("Address of packet_buffer %x", waddr);
printf("Original data\n");
for(i=0; i<buffer_size; i++){
         buffer[i]=i+i*i;
         packet_buffer[i]=0;
         packet_buffer_ref[i]=0;
         printf("%" PRIx64 "\t", buffer[i]);
};
printf("\n");

printf("packet_buffer start\n");
for(i=0; i<buffer_size; i++){
         printf("%" PRIx64 "\t", packet_buffer[i]);
};
printf("\n");

////////////end_setup

/* copies start at index 1, so at most buffer_size-1 words fit */
if(buflen==0 || buflen>=(uint32_t)buffer_size){
         printf("buflen too big or too small\n");
         return 0;
}


////////////////////////////////////////
starta=rdtsc_count();
memcpy(&(packet_buffer_ref[waddr+1]), buffer, sizeof(uint64_t)*(buflen));//REF
stopa=rdtsc_count();
printf("memcpy took\t%d\tclocks\n", stopa-starta);
////////////////////////////////////////
// Here everything is fine
////////////////////////////////////////

////////////////////////////////////////
startb=rdtsc_count();
__asm__ (             "movq   %1,             %%rsi;          \n\t"
                         "movq   %0,             %%rdi;          \n\t"
                         "movl   %2,             %%ecx;          \n\t"
                         "addq   $8,             %%rdi;          \n\t"
//                      "shl    $3,             %%ecx;          \n\t"
         "Schleife:       movsq;                                 \n\t"
                         "loop Schleife;                         \n\t"
                         :"=m"(packet_buffer)
                         :"r"(buffer), "r"(buflen)
                         :"%rsi", "%rdi", "%rcx", "memory"
                         );

stopb=rdtsc_count();

//////// If I use one of these functions, everything is fine.
//usleep(1);
//printf("stopa %d\n", stopa);
//printf("fdsagfa\n");
////////////////////////////////////////
printf("asm movsq took\t%d\tclocks\n", stopb-startb);

////////////////////////////////////////
// Here I have the problem. It looks like stopb or startb is still 0
// when I use no function between the output and the rdtsc_count()
////////////////////////////////////////

return 0;
}



* Re: How to use the KNC Vectorregisters with GCC? Race condition with ICC & KNC?
From: Brian Budge @ 2014-01-21 19:53 UTC
  To: Stephan Walter; +Cc: GCC-help

I'm unsure if this is what is causing your problem, but rdtsc can be
executed out of order with respect to other instructions, so
instructions issued prior to rdtsc need not be complete before the
measurement is made.  I've seen that executing the cpuid instruction
first forces them to complete.  I believe also that using rdtscp will
prevent the reordering on its own.
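
A minimal sketch of the cpuid-then-rdtsc pattern described above,
assuming x86-64 and GCC-style inline asm; read_tsc_serialized is a
hypothetical name. cpuid is a serializing instruction and overwrites
EAX, EBX, ECX and EDX, hence the clobber list. Whether rdtscp is
available on KNC is not established in this thread, so the sketch does
not use it.

```c
#include <stdint.h>

/* Hypothetical helper: serialize with cpuid, then read the TSC.
 * x86-64 assumed. cpuid (leaf 0) overwrites EAX, EBX, ECX and EDX;
 * rdtsc then overwrites EAX and EDX again with the counter value. */
static inline uint64_t read_tsc_serialized(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__(
        "xorl %%eax, %%eax\n\t"   /* select cpuid leaf 0 */
        "cpuid\n\t"               /* wait for earlier instructions */
        "rdtsc"                   /* EDX:EAX = time-stamp counter */
        : "=a"(lo), "=d"(hi)
        :
        : "rbx", "rcx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

cpuid itself costs a noticeable number of cycles, so very short code
sequences measured this way include that overhead.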

  Brian


* Re: How to use the KNC Vectorregisters with GCC? Race condition with ICC & KNC?
From: Stephan Walter @ 2014-01-21 20:03 UTC
  To: Brian Budge; +Cc: GCC-help

On 21.01.2014 20:53, Brian Budge wrote:
> I'm unsure if this is what is causing your problem, but rdtsc can be
> executed out of order to other instructions, and so instructions
> issued prior to rdtsc need not be complete before the measurement is
> made.  I've seen that the cpuid instruction  forces this to be the
> case.  I believe also that using rdtscp will prevent the reordering on
> its own.
>
>    Brian
>
KNC is an in-order CPU design, so there should be no instruction
reordering. The only possibility is superscalar execution, but then I
don't know how to make sure that an instruction has already completed.
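
Independent of hardware ordering, the compiler itself may move or
delete an asm statement that is not marked volatile and whose outputs
look unused. Whether that is what happens here is not confirmed in this
thread; the following is only a sketch, assuming x86-64 and a
GCC-compatible compiler, with hypothetical helper names (read_tsc,
copy_qwords, time_copy). It marks the copy as volatile and passes the
pointers and the count as read-write operands instead of loading the
registers by hand.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read the 64-bit time-stamp counter. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: copy n 64-bit words with "rep movsq".
 * "+D", "+S" and "+c" tell the compiler that RDI, RSI and RCX are both
 * read and modified; "memory" says that memory is read and written. */
static inline void copy_qwords(uint64_t *dst, const uint64_t *src, size_t n)
{
    __asm__ __volatile__("rep movsq"
                         : "+D"(dst), "+S"(src), "+c"(n)
                         :
                         : "memory");
}

/* Hypothetical helper: cycles spent in one copy. All three asm
 * statements are volatile, so the compiler keeps them and does not
 * reorder them relative to each other. */
static inline uint64_t time_copy(uint64_t *dst, const uint64_t *src, size_t n)
{
    uint64_t start = read_tsc();
    copy_qwords(dst, src, n);
    uint64_t stop = read_tsc();
    return stop - start;
}
```

With the operands described this way, no explicit register clobbers
are needed, and the compiler chooses how to load RDI, RSI and RCX.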
