public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] Fix nscd assertion failure in gc (bug 19755)
@ 2016-03-02 17:13 Andreas Schwab
  0 siblings, 0 replies; 3+ messages in thread
From: Andreas Schwab @ 2016-03-02 17:13 UTC (permalink / raw)
  To: libc-alpha

If a GETxxBYyy request (for passwd or group) is running in parallel to
an INVALIDATE request (for the same database) then in a particular order
of events the garbage collector is not properly marking all used memory
and fails an assertion:

   GETGRBYNAME (root)
Haven't found "root" in group cache!
add new entry "root" of type GETGRBYNAME for group to cache (first)
handle_request: request received (Version = 2) from PID 7413
   INVALIDATE (group)
pruning group cache; time 9223372036854775807
considering GETGRBYNAME entry "root", timeout 1456763027
add new entry "0" of type GETGRBYGID for group to cache
remove GETGRBYNAME entry "root"
nscd: mem.c:403: gc: Assertion `next_data == &he_data[db->head->nentries]' failed.

Here the first call to cache_add added the GETGRBYNAME entry, which is
immediately marked for collection by prune_cache.  Then the GETGRBYGID
entry is added which shares the data packet with the first entry and
therefore is marked as !first, while the marking look in prune_cache has
already finished.  When the garbage collector runs, it only considers
references by entries marked as first, missing the reference by the
secondary entry.

	[BZ #19755]
	* nscd/mem.c (gc): Consider all references to the same data packet.
---
 nscd/mem.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/nscd/mem.c b/nscd/mem.c
index 1aafd37..49a3b60 100644
--- a/nscd/mem.c
+++ b/nscd/mem.c
@@ -175,14 +175,15 @@ gc (struct database_dyn *db)
 	  /* This is the hash entry itself.  */
 	  markrange (mark, run, sizeof (struct hashentry));
 
-	  /* Add the information for the data itself.  We do this
-	     only for the one special entry marked with FIRST.  */
-	  if (he[cnt]->first)
-	    {
-	      struct datahead *dh
-		= (struct datahead *) (db->data + he[cnt]->packet);
-	      markrange (mark, he[cnt]->packet, dh->allocsize);
-	    }
+	  /* Add the information for the data itself.  We need to do this
+	     for all entries that point to the same data packet, not only
+	     the one marked as first.  Otherwise if the entry marked as
+	     first has already been collected while the secondary entry
+	     was added we may miss the reference from the latter one.
+	     This can happen when processing an INVALIDATE request.  */
+	  struct datahead *dh
+	    = (struct datahead *) (db->data + he[cnt]->packet);
+	  markrange (mark, he[cnt]->packet, dh->allocsize);
 
 	  run = he[cnt]->next;
 
-- 
2.7.2

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Fix nscd assertion failure in gc (bug 19755)
  2016-06-08 13:35 Andreas Schwab
@ 2016-06-08 19:58 ` Carlos O'Donell
  0 siblings, 0 replies; 3+ messages in thread
From: Carlos O'Donell @ 2016-06-08 19:58 UTC (permalink / raw)
  To: Andreas Schwab, libc-alpha

On 06/08/2016 09:35 AM, Andreas Schwab wrote:
> If a GETxxBYyy request (for passwd or group) is running in parallel to
> an INVALIDATE request (for the same database) then in a particular order
> of events the garbage collector is not properly marking all used memory
> and fails an assertion:
> 
>    GETGRBYNAME (root)
> Haven't found "root" in group cache!
> add new entry "root" of type GETGRBYNAME for group to cache (first)
> handle_request: request received (Version = 2) from PID 7413
>    INVALIDATE (group)
> pruning group cache; time 9223372036854775807
> considering GETGRBYNAME entry "root", timeout 1456763027
> add new entry "0" of type GETGRBYGID for group to cache
> remove GETGRBYNAME entry "root"
> nscd: mem.c:403: gc: Assertion `next_data == &he_data[db->head->nentries]' failed.
> 
> Here the first call to cache_add added the GETGRBYNAME entry, which is
> immediately marked for collection by prune_cache.  Then the GETGRBYGID
> entry is added which shares the data packet with the first entry and
> therefore is marked as !first, while the marking look in prune_cache has
> already finished.  When the garbage collector runs, it only considers
> references by entries marked as first, missing the reference by the
> secondary entry.
> 
> The only way to fix that is to prevent prune_cache from running while the
> two related entries are added.
> 
> 	[BZ #19755]
> 	* nscd/pwdcache.c (cache_addpw): Lock prune_run_lock while adding
> 	new entries in auto-propagate mode.
> 	* nscd/grpcache.c (cache_addgr): Likewise.

This looks good to me.

> ---
>  nscd/grpcache.c | 13 ++++++++++++-
>  nscd/pwdcache.c | 13 ++++++++++++-
>  2 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/nscd/grpcache.c b/nscd/grpcache.c
> index 3831170..8b9b13d 100644
> --- a/nscd/grpcache.c
> +++ b/nscd/grpcache.c
> @@ -205,10 +205,19 @@ cache_addgr (struct database_dyn *db, int fd, request_header *req,
>        dataset = NULL;
>  
>        if (he == NULL)
> -	dataset = (struct dataset *) mempool_alloc (db, total + n, 1);
> +	{
> +	  /* Prevent an INVALIDATE request from pruning the data between
> +	     the two calls to cache_add.  */
> +	  if (db->propagate)
> +	    pthread_mutex_lock (&db->prune_run_lock);

OK, using prune_run_lock to prevent concurrent changes.

> +	  dataset = (struct dataset *) mempool_alloc (db, total + n, 1);
> +	}
>  
>        if (dataset == NULL)
>  	{
> +	  if (he == NULL && db->propagate)
> +	    pthread_mutex_unlock (&db->prune_run_lock);

OK. Unlock if we're using alloca since we won't add these entries to the
database.

> +
>  	  /* We cannot permanently add the result in the moment.  But
>  	     we can provide the result as is.  Store the data in some
>  	     temporary memory.  */
> @@ -396,6 +405,8 @@ cache_addgr (struct database_dyn *db, int fd, request_header *req,
>  
>  	out:
>  	  pthread_rwlock_unlock (&db->lock);
> +	  if (he == NULL && db->propagate)
> +	    pthread_mutex_unlock (&db->prune_run_lock);

OK. Unlock in the !alloca_used path.

>  	}
>      }
>  
> diff --git a/nscd/pwdcache.c b/nscd/pwdcache.c
> index 6dd6746..5ef8485 100644
> --- a/nscd/pwdcache.c
> +++ b/nscd/pwdcache.c
> @@ -198,10 +198,19 @@ cache_addpw (struct database_dyn *db, int fd, request_header *req,
>        dataset = NULL;
>  
>        if (he == NULL)
> -	dataset = (struct dataset *) mempool_alloc (db, total + n, 1);
> +	{
> +	  /* Prevent an INVALIDATE request from pruning the data between
> +	     the two calls to cache_add.  */
> +	  if (db->propagate)
> +	    pthread_mutex_lock (&db->prune_run_lock);
> +	  dataset = (struct dataset *) mempool_alloc (db, total + n, 1);

OK.

> +	}
>  
>        if (dataset == NULL)
>  	{
> +	  if (he == NULL && db->propagate)
> +	    pthread_mutex_unlock (&db->prune_run_lock);

OK.

> +
>  	  /* We cannot permanently add the result in the moment.  But
>  	     we can provide the result as is.  Store the data in some
>  	     temporary memory.  */
> @@ -374,6 +383,8 @@ cache_addpw (struct database_dyn *db, int fd, request_header *req,
>  
>  	out:
>  	  pthread_rwlock_unlock (&db->lock);
> +	  if (he == NULL && db->propagate)
> +	    pthread_mutex_unlock (&db->prune_run_lock);

OK.

>  	}
>      }
>  
> 


-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH] Fix nscd assertion failure in gc (bug 19755)
@ 2016-06-08 13:35 Andreas Schwab
  2016-06-08 19:58 ` Carlos O'Donell
  0 siblings, 1 reply; 3+ messages in thread
From: Andreas Schwab @ 2016-06-08 13:35 UTC (permalink / raw)
  To: libc-alpha

If a GETxxBYyy request (for passwd or group) is running in parallel to
an INVALIDATE request (for the same database) then in a particular order
of events the garbage collector is not properly marking all used memory
and fails an assertion:

   GETGRBYNAME (root)
Haven't found "root" in group cache!
add new entry "root" of type GETGRBYNAME for group to cache (first)
handle_request: request received (Version = 2) from PID 7413
   INVALIDATE (group)
pruning group cache; time 9223372036854775807
considering GETGRBYNAME entry "root", timeout 1456763027
add new entry "0" of type GETGRBYGID for group to cache
remove GETGRBYNAME entry "root"
nscd: mem.c:403: gc: Assertion `next_data == &he_data[db->head->nentries]' failed.

Here the first call to cache_add added the GETGRBYNAME entry, which is
immediately marked for collection by prune_cache.  Then the GETGRBYGID
entry is added which shares the data packet with the first entry and
therefore is marked as !first, while the marking look in prune_cache has
already finished.  When the garbage collector runs, it only considers
references by entries marked as first, missing the reference by the
secondary entry.

The only way to fix that is to prevent prune_cache from running while the
two related entries are added.

	[BZ #19755]
	* nscd/pwdcache.c (cache_addpw): Lock prune_run_lock while adding
	new entries in auto-propagate mode.
	* nscd/grpcache.c (cache_addgr): Likewise.
---
 nscd/grpcache.c | 13 ++++++++++++-
 nscd/pwdcache.c | 13 ++++++++++++-
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/nscd/grpcache.c b/nscd/grpcache.c
index 3831170..8b9b13d 100644
--- a/nscd/grpcache.c
+++ b/nscd/grpcache.c
@@ -205,10 +205,19 @@ cache_addgr (struct database_dyn *db, int fd, request_header *req,
       dataset = NULL;
 
       if (he == NULL)
-	dataset = (struct dataset *) mempool_alloc (db, total + n, 1);
+	{
+	  /* Prevent an INVALIDATE request from pruning the data between
+	     the two calls to cache_add.  */
+	  if (db->propagate)
+	    pthread_mutex_lock (&db->prune_run_lock);
+	  dataset = (struct dataset *) mempool_alloc (db, total + n, 1);
+	}
 
       if (dataset == NULL)
 	{
+	  if (he == NULL && db->propagate)
+	    pthread_mutex_unlock (&db->prune_run_lock);
+
 	  /* We cannot permanently add the result in the moment.  But
 	     we can provide the result as is.  Store the data in some
 	     temporary memory.  */
@@ -396,6 +405,8 @@ cache_addgr (struct database_dyn *db, int fd, request_header *req,
 
 	out:
 	  pthread_rwlock_unlock (&db->lock);
+	  if (he == NULL && db->propagate)
+	    pthread_mutex_unlock (&db->prune_run_lock);
 	}
     }
 
diff --git a/nscd/pwdcache.c b/nscd/pwdcache.c
index 6dd6746..5ef8485 100644
--- a/nscd/pwdcache.c
+++ b/nscd/pwdcache.c
@@ -198,10 +198,19 @@ cache_addpw (struct database_dyn *db, int fd, request_header *req,
       dataset = NULL;
 
       if (he == NULL)
-	dataset = (struct dataset *) mempool_alloc (db, total + n, 1);
+	{
+	  /* Prevent an INVALIDATE request from pruning the data between
+	     the two calls to cache_add.  */
+	  if (db->propagate)
+	    pthread_mutex_lock (&db->prune_run_lock);
+	  dataset = (struct dataset *) mempool_alloc (db, total + n, 1);
+	}
 
       if (dataset == NULL)
 	{
+	  if (he == NULL && db->propagate)
+	    pthread_mutex_unlock (&db->prune_run_lock);
+
 	  /* We cannot permanently add the result in the moment.  But
 	     we can provide the result as is.  Store the data in some
 	     temporary memory.  */
@@ -374,6 +383,8 @@ cache_addpw (struct database_dyn *db, int fd, request_header *req,
 
 	out:
 	  pthread_rwlock_unlock (&db->lock);
+	  if (he == NULL && db->propagate)
+	    pthread_mutex_unlock (&db->prune_run_lock);
 	}
     }
 
-- 
2.8.3

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-06-08 19:58 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-02 17:13 [PATCH] Fix nscd assertion failure in gc (bug 19755) Andreas Schwab
2016-06-08 13:35 Andreas Schwab
2016-06-08 19:58 ` Carlos O'Donell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).