public inbox for gcc-help@gcc.gnu.org
* Fwd: [MPICH] non-blocking sending/receiving an array
       [not found]   ` <131b56be0706121819j29e9f8dagf9fb792421602e64@mail.gmail.com>
@ 2007-06-13  4:35     ` Manal Helal
  0 siblings, 0 replies; only message in thread
From: Manal Helal @ 2007-06-13  4:35 UTC (permalink / raw)
  To: gcc-help

Hi

I am actually still having this packed send/receive problem: it happens
sometimes and then works fine at other times. Lately it only works
reliably when I use the following command:

mpirun -np 4 valgrind --leak-check=full -v --log-file=val3.out myprog
myprogarguments

When I run under valgrind it is alright, and I think it is all about
pointers being shifted while receiving the packed array, whether
blocking or non-blocking (MPI_Recv or MPI_Irecv). I will need to run on
a high-performance machine where I won't be able to use valgrind, and I
need to make sure the program is stable and can run on large data sizes
without problems.

Each process in my program is multi-threaded, but I also tried running
everything sequentially within each process (no threads) and the
problem is still the same, so it is not about thread-safety or
synchronization.

I am copying the gcc list; maybe I can get some insight into the
problem, and also some safer alternatives to ANSI C atoi and sprintf,
because some of the valgrind problems are caused by sprintf and so far
I couldn't find a safe alternative. The way I use sprintf now is, for
example:

#define SHORT_MESSAGE_SIZE 200
char msg[SHORT_MESSAGE_SIZE];
sprintf(msg, "%ld: add OC w %ld, pi %ld, ci %ld, cs %ld, dp %d af %d\n",
        OCout_ub, waveNo, partIndex, cellIndex, cellScore, depProc,
        addflag);

and then I print msg to a debugging file corresponding to the process
and the thread it came from.
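(For reference, the bounded counterparts in ANSI C are snprintf, which
never writes past the buffer, and strtol, which unlike atoi reports
conversion errors. A minimal sketch, with illustrative names and a cut-down
format string rather than the exact one from the program above:)

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SHORT_MESSAGE_SIZE 200

/* Bounded replacement for the sprintf call: snprintf returns the length
   the full string would have had, so truncation can be detected.
   Returns 0 on success, -1 if the message did not fit. */
int format_msg(char msg[SHORT_MESSAGE_SIZE], long waveNo, long partIndex)
{
    int n = snprintf(msg, SHORT_MESSAGE_SIZE, "add OC w %ld, pi %ld\n",
                     waveNo, partIndex);
    return (n >= 0 && n < SHORT_MESSAGE_SIZE) ? 0 : -1;
}

/* Checked replacement for atoi: returns 0 and stores the value in *out
   on success, -1 on malformed input, trailing garbage, or overflow. */
int parse_long(const char *s, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return -1;
    *out = v;
    return 0;
}
```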

The valgrind output is shown below if you are interested in having a
look. Most of the reports are about the MPI library implementation
rather than my code; however, neither kind of problem seems enough to
cause all this memory-shifting.
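(For the "message truncated" failure quoted at the end of this message,
the usual remedy is to probe first and size the receive from the matched
message, instead of posting a fixed-size receive. A minimal C sketch,
assuming a standard MPI installation; the function name and buffer
handling are illustrative only:)

#include <stdlib.h>
#include <mpi.h>

/* MPI_Probe blocks until a matching message arrives; the receive is
   then posted with the exact byte count of that same message, so it
   can never be truncated. */
void recv_any_packed(int tag, MPI_Comm comm)
{
    MPI_Status st;
    int nbytes;

    MPI_Probe(MPI_ANY_SOURCE, tag, comm, &st);
    MPI_Get_count(&st, MPI_PACKED, &nbytes);

    char *buf = malloc(nbytes);
    /* Receive from the probed source and tag only, so a message from
       a different iteration cannot be matched by mistake. */
    MPI_Recv(buf, nbytes, MPI_PACKED, st.MPI_SOURCE, st.MPI_TAG,
             comm, MPI_STATUS_IGNORE);
    /* ... MPI_Unpack from buf here ... */
    free(buf);
}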



==4138== Memcheck, a memory error detector.
==4138== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==4138== Using LibVEX rev 1732, a library for dynamic binary translation.
==4138== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==4138== Using valgrind-3.2.3, a dynamic binary instrumentation framework.
==4138== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==4138==
--4138-- Startup, with flags:
--4138--    --leak-check=full
--4138--    -v
--4138--    --log-file=val3.out
--4138-- Contents of /proc/version:
--4138--   Linux version 2.6.21-1.3194.fc7
(kojibuilder@xenbuilder4.fedora.phx.redhat.com) (gcc version 4.1.2
20070502 (Red Hat 4.1.2-12)) #1 SMP Wed May 23 22:35:01 EDT 2007
--4138-- Arch and hwcaps: X86, x86-sse1-sse2
--4138-- Page sizes: currently 4096, max supported 4096
--4138-- Valgrind library directory: /usr/lib/valgrind
--4138-- Reading syms from /home/mhelal/thesis/exp/ver2.1/mmDst (0x8048000)
--4138-- Reading syms from /usr/lib/valgrind/x86-linux/memcheck (0x38000000)
--4138--    object doesn't have a dynamic symbol table
--4138-- Reading syms from /lib/ld-2.6.so (0x46C44000)
--4138-- Reading suppressions file: /usr/lib/valgrind/default.supp
--4138-- REDIR: 0x46C596F0 (index) redirected to 0x38027EDF
(vgPlain_x86_linux_REDIR_FOR_index)
--4138-- Reading syms from
/usr/lib/valgrind/x86-linux/vgpreload_core.so (0x4001000)
--4138-- Reading syms from
/usr/lib/valgrind/x86-linux/vgpreload_memcheck.so (0x4003000)
==4138== WARNING: new redirection conflicts with existing -- ignoring it
--4138--     new: 0x46C596F0 (index     ) R-> 0x040061F0 index
--4138-- REDIR: 0x46C59890 (strlen) redirected to 0x40062A0 (strlen)
--4138-- Reading syms from /lib/libm-2.6.so (0x4776B000)
--4138-- Reading syms from /lib/libpthread-2.6.so (0x479B7000)
--4138-- Reading syms from /home/mhelal/Install/mpi/lib/libmpich.so (0x4017000)
--4138-- Reading syms from /lib/librt-2.6.so (0x46CC5000)
--4138-- Reading syms from /lib/libc-2.6.so (0x47615000)
==4138== Conditional jump or move depends on uninitialised value(s)
==4138==    at 0x46C4EBDB: _dl_relocate_object (in /lib/ld-2.6.so)
==4138==    by 0x46C478D8: dl_main (in /lib/ld-2.6.so)
==4138==    by 0x46C57F6A: _dl_sysdep_start (in /lib/ld-2.6.so)
==4138==    by 0x46C452B7: _dl_start (in /lib/ld-2.6.so)
==4138==    by 0x46C44816: (within /lib/ld-2.6.so)
==4138==
==4138== Conditional jump or move depends on uninitialised value(s)
==4138==    at 0x46C4EBE3: _dl_relocate_object (in /lib/ld-2.6.so)
==4138==    by 0x46C478D8: dl_main (in /lib/ld-2.6.so)
==4138==    by 0x46C57F6A: _dl_sysdep_start (in /lib/ld-2.6.so)
==4138==    by 0x46C452B7: _dl_start (in /lib/ld-2.6.so)
==4138==    by 0x46C44816: (within /lib/ld-2.6.so)
==4138==
==4138== Conditional jump or move depends on uninitialised value(s)
==4138==    at 0x46C4ED25: _dl_relocate_object (in /lib/ld-2.6.so)
==4138==    by 0x46C478D8: dl_main (in /lib/ld-2.6.so)
==4138==    by 0x46C57F6A: _dl_sysdep_start (in /lib/ld-2.6.so)
==4138==    by 0x46C452B7: _dl_start (in /lib/ld-2.6.so)
==4138==    by 0x46C44816: (within /lib/ld-2.6.so)
==4138==
==4138== Conditional jump or move depends on uninitialised value(s)
==4138==    at 0x46C4F01B: _dl_relocate_object (in /lib/ld-2.6.so)
==4138==    by 0x46C478D8: dl_main (in /lib/ld-2.6.so)
==4138==    by 0x46C57F6A: _dl_sysdep_start (in /lib/ld-2.6.so)
==4138==    by 0x46C452B7: _dl_start (in /lib/ld-2.6.so)
==4138==    by 0x46C44816: (within /lib/ld-2.6.so)
==4138==
==4138== Conditional jump or move depends on uninitialised value(s)
==4138==    at 0x46C4F4F0: _dl_relocate_object (in /lib/ld-2.6.so)
==4138==    by 0x46C478D8: dl_main (in /lib/ld-2.6.so)
==4138==    by 0x46C57F6A: _dl_sysdep_start (in /lib/ld-2.6.so)
==4138==    by 0x46C452B7: _dl_start (in /lib/ld-2.6.so)
==4138==    by 0x46C44816: (within /lib/ld-2.6.so)
==4138==
==4138== Conditional jump or move depends on uninitialised value(s)
==4138==    at 0x46C4EBDB: _dl_relocate_object (in /lib/ld-2.6.so)
==4138==    by 0x46C47A84: dl_main (in /lib/ld-2.6.so)
==4138==    by 0x46C57F6A: _dl_sysdep_start (in /lib/ld-2.6.so)
==4138==    by 0x46C452B7: _dl_start (in /lib/ld-2.6.so)
==4138==    by 0x46C44816: (within /lib/ld-2.6.so)
==4138==
==4138== Conditional jump or move depends on uninitialised value(s)
==4138==    at 0x46C4EBE3: _dl_relocate_object (in /lib/ld-2.6.so)
==4138==    by 0x46C47A84: dl_main (in /lib/ld-2.6.so)
==4138==    by 0x46C57F6A: _dl_sysdep_start (in /lib/ld-2.6.so)
==4138==    by 0x46C452B7: _dl_start (in /lib/ld-2.6.so)
==4138==    by 0x46C44816: (within /lib/ld-2.6.so)
==4138==
==4138== Conditional jump or move depends on uninitialised value(s)
==4138==    at 0x46C4ED25: _dl_relocate_object (in /lib/ld-2.6.so)
==4138==    by 0x46C47A84: dl_main (in /lib/ld-2.6.so)
==4138==    by 0x46C57F6A: _dl_sysdep_start (in /lib/ld-2.6.so)
==4138==    by 0x46C452B7: _dl_start (in /lib/ld-2.6.so)
==4138==    by 0x46C44816: (within /lib/ld-2.6.so)
--4138-- REDIR: 0x47684810 (memset) redirected to 0x4006600 (memset)
--4138-- REDIR: 0x47684D00 (memcpy) redirected to 0x4007030 (memcpy)
--4138-- REDIR: 0x47683930 (rindex) redirected to 0x40060D0 (rindex)
--4138-- REDIR: 0x4767EC90 (calloc) redirected to 0x400478D (calloc)
--4138-- REDIR: 0x47683590 (strlen) redirected to 0x4006280 (strlen)
--4138-- REDIR: 0x47683780 (strncmp) redirected to 0x40062E0 (strncmp)
--4138-- REDIR: 0x4767EF90 (malloc) redirected to 0x4005460 (malloc)
--4138-- REDIR: 0x476804F0 (free) redirected to 0x400507A (free)
--4138-- REDIR: 0x47684310 (memchr) redirected to 0x4006470 (memchr)
--4138-- REDIR: 0x47683880 (strncpy) redirected to 0x40068D0 (strncpy)
--4138-- REDIR: 0x47682EC0 (index) redirected to 0x40061C0 (index)
--4138-- REDIR: 0x476830A0 (strcpy) redirected to 0x4007290 (strcpy)
--4138-- REDIR: 0x47684870 (mempcpy) redirected to 0x4006B10 (mempcpy)
--4138-- REDIR: 0x47683030 (strcmp) redirected to 0x4006350 (strcmp)
==4138==
==4138== Syscall param writev(vector[...]) points to uninitialised byte(s)
==4138==    at 0x476DE118: writev (in /lib/libc-2.6.so)
==4138==    by 0x41056E8: MPIDU_Socki_handle_write (sock_wait.i:689)
==4138==    by 0x41044E3: MPIDU_Sock_wait (sock_wait.i:329)
==4138==    by 0x406E66E: MPIDI_CH3_Progress_wait (ch3_progress.c:189)
==4138==    by 0x40B52FF: MPIC_Wait (helper_fns.c:275)
==4138==    by 0x40B4C0B: MPIC_Sendrecv (helper_fns.c:121)
==4138==    by 0x405904A: MPIR_Allreduce (allreduce.c:284)
==4138==    by 0x405AA0D: PMPI_Allreduce (allreduce.c:684)
==4138==    by 0x4091B30: MPIR_Get_contextid (commutil.c:384)
==4138==    by 0x4089EB4: PMPI_Comm_create (comm_create.c:121)
==4138==    by 0x804B817: main (main.c:513)
==4138==  Address 0x41922E0 is 32 bytes inside a block of size 72 alloc'd
==4138==    at 0x40054E5: malloc (vg_replace_malloc.c:149)
==4138==    by 0x4071262: MPIDI_CH3I_Connection_alloc (ch3u_connect_sock.c:125)
==4138==    by 0x4073080: MPIDI_CH3I_VC_post_sockconnect
(ch3u_connect_sock.c:1023)
==4138==    by 0x406F8C4: MPIDI_CH3I_VC_post_connect (ch3_progress.c:857)
==4138==    by 0x406D5E2: MPIDI_CH3_iSendv (ch3_isendv.c:194)
==4138==    by 0x4073A1C: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:460)
==4138==    by 0x40C66F4: MPID_Isend (mpid_isend.c:117)
==4138==    by 0x40B4BB0: MPIC_Sendrecv (helper_fns.c:117)
==4138==    by 0x405904A: MPIR_Allreduce (allreduce.c:284)
==4138==    by 0x405AA0D: PMPI_Allreduce (allreduce.c:684)
==4138==    by 0x4091B30: MPIR_Get_contextid (commutil.c:384)
==4138==    by 0x4089EB4: PMPI_Comm_create (comm_create.c:121)
==4138==
==4138== Syscall param writev(vector[...]) points to uninitialised byte(s)
==4138==    at 0x476DE118: writev (in /lib/libc-2.6.so)
==4138==    by 0x41033C2: MPIDU_Sock_writev (sock_immed.i:604)
==4138==    by 0x406D08A: MPIDI_CH3_iSendv (ch3_isendv.c:83)
==4138==    by 0x4073A1C: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:460)
==4138==    by 0x40C66F4: MPID_Isend (mpid_isend.c:117)
==4138==    by 0x40B4BB0: MPIC_Sendrecv (helper_fns.c:117)
==4138==    by 0x405904A: MPIR_Allreduce (allreduce.c:284)
==4138==    by 0x405AA0D: PMPI_Allreduce (allreduce.c:684)
==4138==    by 0x4091B30: MPIR_Get_contextid (commutil.c:384)
==4138==    by 0x4089EB4: PMPI_Comm_create (comm_create.c:121)
==4138==    by 0x804B817: main (main.c:513)
==4138==  Address 0xBEF02118 is on thread 1's stack
--4138-- REDIR: 0x476806E0 (realloc) redirected to 0x400550F (realloc)
==4138==
==4138== Thread 2:
==4138== Source and destination overlap in mempcpy(0x4C8BAA8, 0x4C8BAA8, 24)
==4138==    at 0x4006B94: mempcpy (mc_replace_strmem.c:116)
==4138==    by 0x47679314: _IO_default_xsputn (in /lib/libc-2.6.so)
==4138==    by 0x476544ED: vfprintf (in /lib/libc-2.6.so)
==4138==    by 0x4766E4CB: vsprintf (in /lib/libc-2.6.so)
==4138==    by 0x4765A0BD: sprintf (in /lib/libc-2.6.so)
==4138==    by 0x80589D5: getPrevCells (scoring.c:230)
==4138==    by 0x8058EF4: getScore (scoring.c:305)
==4138==    by 0x80599F3: ComputePartitionScores (scoring.c:470)
==4138==    by 0x804B215: ScoreCompThread (main.c:392)
==4138==    by 0x479BC2FA: start_thread (in /lib/libpthread-2.6.so)
==4138==    by 0x476E593D: clone (in /lib/libc-2.6.so)


On 17/05/07, Blankenship, David  <David.Blankenship@kla-tencor.com> wrote:
>   I am doing the same type of thing with the blocking calls. Here is how
> I am doing it. This code uses the C++ MPI interface.
>
> // Probe for a message from any source
> MPI::COMM_WORLD.Probe( MPI_ANY_SOURCE, MPI_ANY_TAG, cMPIStatus );
> int iMessageLength = cMPIStatus.Get_count( MPI_CHAR );
> // Here I resize my receive buffer if necessary
>
> // Receive the message that was just probed
> int iSource = cMPIStatus.Get_source();
> MPI::COMM_WORLD.Recv( &cBuffer[0], cBuffer.size(), MPI_CHAR, iSource,
> MPI_ANY_TAG, cMPIStatus );
>
>
> You could also use the tag to differentiate messages from a single
> source. This eliminates the need to send two messages, one with the
> size and then one with the array. That is what I liked most about this
> solution.
>
> I hope this helps.
>
> David Blankenship
>
>
>
> -----Original Message-----
> From: owner-mpich-discuss@mcs.anl.gov
> [mailto:owner-mpich-discuss@mcs.anl.gov] On Behalf Of Manal Helal
> Sent: Wednesday, May 16, 2007 2:44 AM
> To: mpich-discuss-digest@mcs.anl.gov
> Subject: [MPICH] non-blocking sending/receiving an array
>
> Hi
>
> I am trying to send an array: I send its size first and then send the
> array itself. However, I am sending in a loop and receiving in a loop,
> so I can end up receiving out of order; for example, I receive the
> array size and then receive from the same sender an array of a
> different size sent at another iteration. I am using non-blocking
> communication, testing with 3 processes for now, but there could be
> more later. So I can only specify the sender in the receive of the
> array (the one I received the array size from), but I cannot specify
> the size. It is giving me:
>
> rank 2 in job 4  localhost.localdomain_54476   caused collective abort
> of all ranks
>   exit status of rank 2: killed by signal 9
> 2:  MPI_Wait(140)..........................:
> MPI_Wait(request=0xb6b55198, status=0xb6b5519c) failed
> 2:  MPIDI_CH3U_Post_data_receive_found(163): Message from rank 0 and
> tag 92 truncated; 224 bytes received but buffer size is 56
>
>
> Is there a way to probe for a specific size, and receive only if this
> is the size? In MPI_Iprobe there is no way to specify the count.
>
> any ideas will greatly help,
>
> Thank you very much, Kind Regards,
>
> Manal
>
>

