From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 9E6BC3858CDB for ; Mon, 20 Feb 2023 18:40:05 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9E6BC3858CDB Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676918405; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hWTmNiSORi7GAeB7wPDoX76ERdAFNbQySMhsYq6rjQU=; b=WtyJt5z3AbZ52Nx5FH+3jVh2JLqBKHkkICDnQaYU3Nnk7CGdESakRF1x24te4vfw5T7308 qFUSAQC0RbJH6F1NjZDdGdu+LawGkwr4kRJ24IWGmeyTebDvQaiQY/kWy25Ke17DOXMORx zttcte+kio9PQj09SryQy/zR6wzrODI= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-36-zBdS9rBAME6PbXOYC7mWYw-1; Mon, 20 Feb 2023 13:40:01 -0500 X-MC-Unique: zBdS9rBAME6PbXOYC7mWYw-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 704C2803D48; Mon, 20 Feb 2023 18:40:01 +0000 (UTC) Received: from oldenburg.str.redhat.com (unknown [10.2.16.38]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 214DA492B00; Mon, 20 Feb 2023 18:40:00 +0000 (UTC) From: Florian Weimer To: Adhemerval Zanella Cc: libc-alpha@sourceware.org, "Andreas K . Huettel" , Paul Eggert Subject: Re: [PATCH v5 2/5] linux: Set internal DIR filepos as off64_t (BZ #23960, BZ #24050) References: <20230127172834.391311-1-adhemerval.zanella@linaro.org> <20230127172834.391311-3-adhemerval.zanella@linaro.org> Date: Mon, 20 Feb 2023 19:39:58 +0100 In-Reply-To: <20230127172834.391311-3-adhemerval.zanella@linaro.org> (Adhemerval Zanella's message of "Fri, 27 Jan 2023 14:28:31 -0300") Message-ID: <87ilfwdvup.fsf@oldenburg.str.redhat.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-10.5 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: * Adhemerval Zanella: > diff --git a/dirent/tst-seekdir.c b/dirent/tst-seekdir.c > index dcdd699b09..187eda7584 100644 > --- a/dirent/tst-seekdir.c > +++ b/dirent/tst-seekdir.c > @@ -41,6 +41,14 @@ do_test (void) > if (i =3D=3D 400) > =09break; > } > + if (i < 3) > + { > + /* Non-lfs opendir skips entries that can not be represented (for > +=09 instance if d_off is not an offset but rather an internal filesystem > +=09 representation. For this case there is no point in continue the > +=09 testcase. */ > + return 77; > + } > =20 > printf ("going back past 4-th entry...\n"); I think this needs to check errno properly (set to zero before readdir, and if readdir fails, report a test error if errno is not zero). > index adcf8234f1..8f58a1c3a6 100644 > --- a/sysdeps/unix/sysv/linux/dirstream.h > +++ b/sysdeps/unix/sysv/linux/dirstream.h > @@ -22,6 +22,7 @@ > #include > =20 > #include > +#include > =20 > /* Directory stream type. > =20 > @@ -38,13 +39,16 @@ struct __dirstream > size_t size;=09=09/* Total valid data in the block. */ > size_t offset;=09=09/* Current offset into the block. */ > =20 > - off_t filepos;=09=09/* Position of next entry to read. */ > + off64_t filepos;=09=09/* Position of next entry to read. */ > =20 > int errcode;=09=09/* Delayed error code. */ > =20 > #if !defined __OFF_T_MATCHES_OFF64_T || !defined __INO_T_MATCHES_INO64_T > struct dirent tdp; > #endif > +#if _DIRENT_OFFSET_TRANSLATION > + struct dirstream_loc_t locs; /* off64_t to long int map for telldir.= */ > +#endif Please add a comment explaining the purpose of the new field. > diff --git a/sysdeps/unix/sysv/linux/readdir.c b/sysdeps/unix/sysv/linux/= readdir.c > index cd0ccaf33a..7a7f484c36 100644 > --- a/sysdeps/unix/sysv/linux/readdir.c > +++ b/sysdeps/unix/sysv/linux/readdir.c > @@ -36,6 +36,15 @@ dirstream_entry (struct __dirstream *ds, const struct = dirent64 *dp64) > if (dp64->d_reclen - offsetof (struct dirent64, d_name) > NAME_MAX) > return false; > =20 > + /* telldir can not return an error, so preallocate the map if the entr= y can > + not be packed directly. */ > + if (telldir_need_dirstream (dp64->d_off)) > + { > + dirstream_loc_add (&ds->locs, dp64->d_off); > + if (dirstream_loc_has_failed (&ds->locs)) > +=09return false; > + } I recommend to reserve the space here (but only for d_off values that need it), and perform the assignment only if there is an actual telldir call. This avoids all the additional processing and allocations for the common case that telldir is never called. This will matter quite a bit for large directories. Reservation can be implemented by adding a 0 dummy value (offset 0 will never be recorded), unless the last element is already 0. I don't think we need to change dynarray for this. > diff --git a/sysdeps/unix/sysv/linux/seekdir.c b/sysdeps/unix/sysv/linux/= seekdir.c > index 939ccc4447..30cce691a4 100644 > --- a/sysdeps/unix/sysv/linux/seekdir.c > +++ b/sysdeps/unix/sysv/linux/seekdir.c > + > + if (dirp->filepos !=3D filepos) > + { > + __lseek64 (dirp->fd, filepos, SEEK_SET); > + dirp->filepos =3D filepos; > + dirp->offset =3D 0; > + dirp->size =3D 0; > + } I think we should seek unconditionally, just in case someone messed with dirfd directly. > diff --git a/sysdeps/unix/sysv/linux/telldir.c b/sysdeps/unix/sysv/linux/= telldir.c > index 1e5c129e9f..c3ef14f3da 100644 > --- a/sysdeps/unix/sysv/linux/telldir.c > +++ b/sysdeps/unix/sysv/linux/telldir.c > @@ -15,9 +15,11 @@ > License along with the GNU C Library; if not, see > . */ > + else > + { > + dsp.l =3D -1; > + > + size_t i; > + for (i =3D 0; i < dirstream_loc_size (&dirp->locs); i++) > +=09if (*dirstream_loc_at (&dirp->locs, i) =3D=3D dirp->filepos) > +=09 break; > + /* It should be pre-allocated on readdir. */ > + assert (i !=3D dirstream_loc_size (&dirp->locs)); This should overwrite the final 0 element if the offset could not be found. I think the quadratic behavior is acceptable in this case, given that we do not really expect telldir to be called. > diff --git a/sysdeps/unix/sysv/linux/telldir.h b/sysdeps/unix/sysv/linux/= telldir.h > new file mode 100644 > index 0000000000..758bcb0eb3 > --- /dev/null > +++ b/sysdeps/unix/sysv/linux/telldir.h > @@ -0,0 +1,65 @@ > +/* Linux internal telldir definitions. > + Copyright (C) 2023 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#ifndef _TELLDIR_H > +#define _TELLDIR_H 1 > + > +#if _DIRENT_OFFSET_TRANSLATION > +/* On platforms where 'long int' is smaller than 'off64_t' this is how t= he > + returned value is encoded and returned by 'telldir'. If the director= y > + offset can be enconded in 31 bits it is returned in the 'info' member > + with 'is_packed' set to 1. > + > + Otherwise, the 'info' member describes an index in a dynamic array at > + 'DIR' structure. */ > + > +union dirstream_packed > +{ > + long int l; > + struct > + { > + unsigned long int is_packed:1; > + unsigned long int info:31; > + } p; > +}; Good explanation in the comment, but =E2=80=A6 can't you use negative value= s for indices, and ~ for encoding/decoding? I think that's a bit clearer. > +static __always_inline bool > +telldir_need_dirstream (__off64_t d_off) > +{ > + return d_off >=3D 1UL << 31; > +} Hmm, can d_off be negative? > diff --git a/sysdeps/unix/sysv/linux/tst-opendir-nolfs.c b/sysdeps/unix/s= ysv/linux/tst-opendir-nolfs.c > new file mode 100644 > index 0000000000..52e18171a7 > +static int > +do_test (void) > +{ > + DIR *dirp =3D opendir (dirname); > + TEST_VERIFY_EXIT (dirp !=3D NULL); > + > + long int *tdirp =3D xmalloc (nfiles * sizeof (long int)); > + struct dirent **ddirp =3D xmalloc (nfiles * sizeof (struct dirent *)); xreallocarray? > + /* For non-LFS, the entry is skipped if it can not be converted. */ > + int count =3D 0; > + for (; count < nfiles; count++) > + { > + tdirp[count] =3D telldir (dirp); > + struct dirent *dp =3D readdir (dirp); > + if (dp =3D=3D NULL) > +=09break; Should probably verify the count variable here (and check errno too?). > + ddirp[count] =3D xmalloc (dp->d_reclen); > + memcpy (ddirp[count], dp, dp->d_reclen); > + } > + > + closedir (dirp); > + > + /* Check against the getdents64 syscall. */ > + int fd =3D xopen (dirname, O_RDONLY | O_DIRECTORY, 0); > + int i =3D 0; > + while (true) If we create predictable file names (maybe just numbers?), this kind of checking becomes way easier. Thanks, Florian