public inbox for glibc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug libc/11087] New: mallinfo miscounting hblks because of missing mutex
@ 2009-12-14  8:56 stef dot van-vlierberghe at telenet dot be
  2010-04-05  4:38 ` [Bug libc/11087] " drepper at redhat dot com
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: stef dot van-vlierberghe at telenet dot be @ 2009-12-14  8:56 UTC (permalink / raw)
  To: glibc-bugs

I have an issue with non-atomic increment/
decrement
of hblks, (i.e. n_mmaps) in a multi-threaded mission critical process
that monitors
heap usage as part of supervision. This value can drift astray of the
true count and
rarerly triggers an assertion in the total heap calculatio. There is a
similar disabled
assert in glibc:

#ifdef NO_THREADS
  assert(total <= (unsigned long)(mp_.max_total_mem));
  assert(mp_.n_mmaps >= 0);
#endif

Seem to be a problem of wanting to pay the cost of mutex contention
instead of
the (application specific) cost of providing wrong values. In the
context of a malloc call
that asks for 2 Mb, I have not figured out that part yet, but I can't
imagine that keeping
a correct total count in a multi-threaded is beyond state of the art
(but am sure it may
be beyond desire of hackers or motivation of maintainers). There is a
disabled
THREAD_STATS switch but that doesn't seem to handle this case. The
problem might be
resolved, but at home I can still reproduce it with latest glibc:

with Text_Io;
with Unchecked_Deallocation;
procedure T is

   type Mallinfo_Field is (Arena, Ordblks, Smblks, Hblks, Hblkhd,
Usmblks, Fsmblks, Uordblks, Fordblks, Keepcost);

   type Mallinfo_Stats is array (Mallinfo_Field) of Integer;
   -- on 32 bit.

   function mallinfo return Mallinfo_Stats;
   pragma Interface (C, mallinfo);

   type Large is array (1..2**22) of Character; -- 4M
   type Large_Ptr is access Large;
   procedure Free is new Unchecked_Deallocation (Large, Large_Ptr);

   task type Alloc is
      pragma Priority (0);
   end Alloc;

   task body Alloc is
      L : array (1..10) of Large_Ptr;
   begin
      for Count in Natural loop
         for I in L'Range loop
            L(I) := new Large;
         end loop;
         for I in L'Range loop
            Free (L(I));
         end loop;
         -- To check for sufficient parallelism
         -- Test on multi-core/multi-processor
         if Count mod 10000 = 0 then
           Text_Io.Put_Line (Count'Img);
         end if;
      end loop;
   end;

   Loads : array (1..4) of Alloc;

   procedure Show_Stats is
      Result : constant Mallinfo_Stats := mallinfo;
   begin
      for Field in Result'Range loop
         Text_IO.Put (Field'Img & "=" & Result(Field)'Img & " ");
      end loop;
      Text_Io.New_Line;
      if Result(Hblks) not in 0..40 then
         for T in Loads'Range loop
            abort Loads(T);
         end loop;
         raise Program_Error;
      end if;
   end Show_Stats;

begin
   loop
      delay 1.0;
      Show_Stats;
   end loop;
end;

output :

ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 0 HBLKHD=-20992000 USMBLKS=
0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
 22700000
 22620000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 17 HBLKHD= 50380800 USMBLKS=
0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
 22630000
 22570000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 7 HBLKHD= 8396800 USMBLKS= 0
FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
 22710000
 22630000
 22640000
 22580000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 2 HBLKHD=-12595200 USMBLKS=
0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
 22720000
 22640000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 0 HBLKHD=-20992000 USMBLKS=
0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
 22650000
 22590000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 26 HBLKHD= 88166400 USMBLKS=
0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560
 22730000
 22650000
 22660000
 22600000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS=-1 HBLKHD=-25190400 USMBLKS=
0 FSMBLKS= 0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560

raised PROGRAM_ERROR : t.adb:52 explicit raise 

Posted on :
http://groups.google.com/group/comp.os.linux.development.apps/browse_thread/thread/80b0c703fe7004c2/f512a2856b0eb3e3?lnk=raot&pli=1

Found a comment stating mutexing was expensive :
http://groups.google.com/group/comp.os.linux.development.system/browse_thread/thread/c25c962a87d81895/674575250fd87044?q=mallinfo+hblk#674575250fd87044

I've tried adding a mutex, and I don't see what is expensive. When I compare the
latest version of glibc malloc after adding the mutex locking against the system
default glibc behaviour I don't measure a difference, results are within 3%
accurate (some fluctuations on the measure, just used time command), I guess
mutex cost is much smaller than mmap cost. Anyway, that is only performance of
iterative malloc/free of a 4 MB block, adding nothing more but a zero-fill of
this memory makes the operation about 1000 times slower, so I don't see a
significant issue with this mutex overhead.

In case overhead would be significant, at least a #define THREAD_SAFE 1 should
activate a fix, or a dynamic option, but surely correct total heap count should
be achievable.

My experiment :

$ diff -U3 arena.c.orig arena.c
--- arena.c.orig	2009-12-13 18:00:55.000000000 +0100
+++ arena.c	2009-12-13 18:03:12.000000000 +0100
@@ -93,6 +93,13 @@
 /* Mapped memory in non-main arenas (reliable only for NO_THREADS). */
 static unsigned long arena_mem;
 
+static mutex_t mp_mmap_mutex;
+/*Serialize access to (malloc.c) mp_.
+  int              n_mmaps;
+  int              n_mmaps_max;
+  int              max_n_mmaps;
+*/
+
 /* Already initialized? */
 int __malloc_initialized = -1;
 
@@ -354,6 +361,8 @@
     if(ar_ptr == &main_arena) break;
   }
   mutex_init(&list_lock);
+  mutex_init(&mp_mmap_mutex);
+
   atfork_recursive_cntr = 0;
 }
 
$ diff -U3 malloc.c.org malloc.c
--- malloc.c.org	2009-12-13 13:26:22.000000000 +0100
+++ malloc.c	2009-12-13 18:13:13.000000000 +0100
@@ -2359,6 +2359,8 @@
    ----------- Internal state representation and initialization -----------
 */
 
+#define THREAD_STATS 1
+
 struct malloc_state {
   /* Serialize access.  */
   mutex_t mutex;
@@ -2443,6 +2445,12 @@
 /* There is only one instance of the malloc parameters.  */
 
 static struct malloc_par mp_;
+static mutex_t mp_mmap_mutex;
+/*Serialize access to (malloc.c) mp_.
+  int              n_mmaps;
+  int              n_mmaps_max;
+  int              max_n_mmaps;
+*/
 
 
 #ifdef PER_THREAD
@@ -3055,7 +3063,8 @@
 	    set_head(p, size|IS_MMAPPED);
 	  }
 
-	/* update statistics */
+	/* begin update statistics */
+	(void)mutex_lock(&mp_mmap_mutex);
 
 	if (++mp_.n_mmaps > mp_.max_n_mmaps)
 	  mp_.max_n_mmaps = mp_.n_mmaps;
@@ -3071,6 +3080,10 @@
 
 	check_chunk(av, p);
 
+	(void)mutex_unlock(&mp_mmap_mutex);
+	/* end update statistics */
+
+
 	return chunk2mem(p);
       }
     }
@@ -3521,6 +3534,7 @@
 #endif
 {
   INTERNAL_SIZE_T size = chunksize(p);
+  (void)mutex_lock(&mp_mmap_mutex);
 
   assert (chunk_is_mmapped(p));
 #if 0
@@ -3544,6 +3558,7 @@
 
   mp_.n_mmaps--;
   mp_.mmapped_mem -= total_size;
+  (void)mutex_unlock(&mp_mmap_mutex);
 
   int ret __attribute__ ((unused)) = munmap((char *)block, total_size);
 
@@ -3726,6 +3741,7 @@
   _int_free(ar_ptr, p, 0);
 #else
 # if THREAD_STATS
+
   if(!mutex_trylock(&ar_ptr->mutex))
     ++(ar_ptr->stat_lock_direct);
   else {

P.S. This is actually a problem of a mission-critical system planned to become
operational on Redhat el5 next year. Assertion on positive results is now turned
into recoverable error, so non-critical but would like to get heap supervision
correct in the future for monitoring of unexpected memory leaks. We pay support
to HP (forwarding such problems to Redhat), so request for fixing might turn up
via official channels.

Test still running, but at least getting further than before :
 64660000
 64800000
 64240000
 64810000
ARENA= 135984 ORDBLKS= 1 SMBLKS= 0 HBLKS= 7 HBLKHD= 29388800 USMBLKS= 0 FSMBLKS=
0 UORDBLKS= 62424 FORDBLKS= 73560 KEEPCOST= 73560 
 64670000
 64810000

-- 
           Summary: mallinfo miscounting hblks because of missing mutex
           Product: glibc
           Version: 2.9
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
        AssignedTo: drepper at redhat dot com
        ReportedBy: stef dot van-vlierberghe at telenet dot be
                CC: glibc-bugs at sources dot redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=11087

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug libc/11087] mallinfo miscounting hblks because of missing mutex
  2009-12-14  8:56 [Bug libc/11087] New: mallinfo miscounting hblks because of missing mutex stef dot van-vlierberghe at telenet dot be
@ 2010-04-05  4:38 ` drepper at redhat dot com
  2010-04-05 19:38 ` drepper at redhat dot com
  2010-04-10 12:33 ` stef dot van-vlierberghe at telenet dot be
  2 siblings, 0 replies; 5+ messages in thread
From: drepper at redhat dot com @ 2010-04-05  4:38 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From drepper at redhat dot com  2010-04-05 04:37 -------
mallinfo does not work at all.  It's an obsolete interface.  Use malloc_info. 
Adding a lock for statistics is also far too expensive.

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=11087

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug libc/11087] mallinfo miscounting hblks because of missing mutex
  2009-12-14  8:56 [Bug libc/11087] New: mallinfo miscounting hblks because of missing mutex stef dot van-vlierberghe at telenet dot be
  2010-04-05  4:38 ` [Bug libc/11087] " drepper at redhat dot com
@ 2010-04-05 19:38 ` drepper at redhat dot com
  2010-04-10 12:33 ` stef dot van-vlierberghe at telenet dot be
  2 siblings, 0 replies; 5+ messages in thread
From: drepper at redhat dot com @ 2010-04-05 19:38 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From drepper at redhat dot com  2010-04-05 19:37 -------
Should have been closed.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX


http://sourceware.org/bugzilla/show_bug.cgi?id=11087

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug libc/11087] mallinfo miscounting hblks because of missing mutex
  2009-12-14  8:56 [Bug libc/11087] New: mallinfo miscounting hblks because of missing mutex stef dot van-vlierberghe at telenet dot be
  2010-04-05  4:38 ` [Bug libc/11087] " drepper at redhat dot com
  2010-04-05 19:38 ` drepper at redhat dot com
@ 2010-04-10 12:33 ` stef dot van-vlierberghe at telenet dot be
  2 siblings, 0 replies; 5+ messages in thread
From: stef dot van-vlierberghe at telenet dot be @ 2010-04-10 12:33 UTC (permalink / raw)
  To: glibc-bugs


------- Additional Comments From stef dot van-vlierberghe at telenet dot be  2010-04-10 12:33 -------
I'm running 32 bit so I have at present no problem with mallinfo API, so not
understood why it would be classified as "broken". Anyway changing to new API is
only minor "damage" in the sense that we have no need for fork/exec activity or
creating files for heap statistic collection (context of use is the European
flow management and flight planning systems that run on Linux since a month, see
www.eurocontrol.int). In this context (and I can imagine many others) correct
stats are worth one pthread lock operation for each large block malloc (which is
a rare event in these processes, doesn't occur at all in several critical
processes), so the "too expensive" is also not clear, unless if it means too
expensive in terms of validating/testing. In that case I would appreciate if you
could clarify what precisely is required to get precise statistics using glibc
malloc ? (I validated the fix using the test as described before, if you require
additional measurements or tests please clarify).

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WONTFIX                     |


http://sourceware.org/bugzilla/show_bug.cgi?id=11087

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug libc/11087] mallinfo miscounting hblks because of missing mutex
       [not found] <bug-11087-131@http.sourceware.org/bugzilla/>
@ 2011-02-07 10:22 ` jsarenik at redhat dot com
  0 siblings, 0 replies; 5+ messages in thread
From: jsarenik at redhat dot com @ 2011-02-07 10:22 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=11087

Jan Sarenik <jsarenik at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jsarenik at redhat dot com

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-02-07 10:22 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-12-14  8:56 [Bug libc/11087] New: mallinfo miscounting hblks because of missing mutex stef dot van-vlierberghe at telenet dot be
2010-04-05  4:38 ` [Bug libc/11087] " drepper at redhat dot com
2010-04-05 19:38 ` drepper at redhat dot com
2010-04-10 12:33 ` stef dot van-vlierberghe at telenet dot be
     [not found] <bug-11087-131@http.sourceware.org/bugzilla/>
2011-02-07 10:22 ` jsarenik at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).