From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oa1-x32.google.com (mail-oa1-x32.google.com [IPv6:2001:4860:4864:20::32]) by sourceware.org (Postfix) with ESMTPS id 07BBA3858D33 for ; Wed, 1 Mar 2023 17:40:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 07BBA3858D33 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-oa1-x32.google.com with SMTP id 586e51a60fabf-17264e9b575so15190488fac.9 for ; Wed, 01 Mar 2023 09:40:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1677692456; h=content-transfer-encoding:in-reply-to:organization:from:references :cc:to:content-language:subject:user-agent:mime-version:date :message-id:from:to:cc:subject:date:message-id:reply-to; bh=Oz12z6jDqEPGlSBrjOkxQKGiwEsGCwWP87UHbk7ul14=; b=k/zNFLqbKf0Elc/36IKBOiSwlJZ67CIuOaqOvVySIbJJseWEWcFvi2KvqYu+yyTw/z l1eB7JWKQ7F1rg2inLs8Dg78qYYoMjKIwFH8rnl997jzr593OlF/MPh07C1JjUFjC10R nhYQoidq6PMeO9G2kxSdOt+8CdI4E9heGNfUBWyniIHK/p8FfovTeKYeEG/U6NlMY+fK 2M+gkBX7f/A9I8D4tAtf39yPQCn9+3RqkM+6hMfYuRBNK/1bHmNrgJ/uUnW43uClh7wc ZMnOYAufMs8drrM9jHSz7N8eDBIbVbNXz/Brcia4Wzn7JMTqTLQjWWW2kd63IyJJe5ad IjIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1677692456; h=content-transfer-encoding:in-reply-to:organization:from:references :cc:to:content-language:subject:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Oz12z6jDqEPGlSBrjOkxQKGiwEsGCwWP87UHbk7ul14=; b=Crc67koDC24siDiWcG++rprwG1KBbmK6SstJ55W57y0YLOtgJcIXoY9yaEEMMz5huJ hRBBICoR2UJhwvej81F9MFNilFbsMX+Figlpb39ZlDGJWR1jODNusB7sn8SKLJo48Udk xa4bVGhjQWXzZjnkZJag/iSCcsNVqfYW378exvPMiy69BwLe59KJmlOeCbXZIMWG8xHh odtdCeBrEAlw0/Jg6Vxs3lKut2hzjTOI1Kj4Jo/TNbt8hFewqoroO/sgsmIzwjUKnxKu hP9EOJGcVvVM7z+lEQF6wMO2mRqgH5GowP7HNIuAecO+8dEh1r3DN746uBdVcK4lCdi+ 27/w== X-Gm-Message-State: AO0yUKXmmdyBhMjwgnqcXdRcCI8BWGwaz3HcRYZ5asPLiXtG00tI0UaO iPNduogz3rMcTkXSGau0FwtCHja3rG0fEUWEP/Q= X-Google-Smtp-Source: AK7set8wsxPWBH0ASVUv7b/VaePHHMBK7xSxvJW+Sok5yV3fy1pSz5Z0/mdMf6mNF1Z90J+QgRuRmg== X-Received: by 2002:a05:6871:297:b0:16a:ca72:ae85 with SMTP id i23-20020a056871029700b0016aca72ae85mr4401899oae.16.1677692456173; Wed, 01 Mar 2023 09:40:56 -0800 (PST) Received: from ?IPV6:2804:1b3:a7c3:d849:ed82:aa7c:6d0c:c8ba? ([2804:1b3:a7c3:d849:ed82:aa7c:6d0c:c8ba]) by smtp.gmail.com with ESMTPSA id b19-20020a056820135300b004f9cd1e42d3sm5106837oow.26.2023.03.01.09.40.54 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 01 Mar 2023 09:40:55 -0800 (PST) Message-ID: <4710d4c5-effe-e6a8-270e-21da137935eb@linaro.org> Date: Wed, 1 Mar 2023 14:40:52 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.8.0 Subject: Re: [PATCH v5 2/5] linux: Set internal DIR filepos as off64_t (BZ #23960, BZ #24050) Content-Language: en-US To: Florian Weimer Cc: libc-alpha@sourceware.org, "Andreas K . Huettel" , Paul Eggert References: <20230127172834.391311-1-adhemerval.zanella@linaro.org> <20230127172834.391311-3-adhemerval.zanella@linaro.org> <87ilfwdvup.fsf@oldenburg.str.redhat.com> From: Adhemerval Zanella Netto Organization: Linaro In-Reply-To: <87ilfwdvup.fsf@oldenburg.str.redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 20/02/23 15:39, Florian Weimer wrote: > * Adhemerval Zanella: > >> diff --git a/dirent/tst-seekdir.c b/dirent/tst-seekdir.c >> index dcdd699b09..187eda7584 100644 >> --- a/dirent/tst-seekdir.c >> +++ b/dirent/tst-seekdir.c >> @@ -41,6 +41,14 @@ do_test (void) >> if (i == 400) >> break; >> } >> + if (i < 3) >> + { >> + /* Non-lfs opendir skips entries that can not be represented (for >> + instance if d_off is not an offset but rather an internal filesystem >> + representation. For this case there is no point in continue the >> + testcase. */ >> + return 77; >> + } >> >> printf ("going back past 4-th entry...\n"); > > I think this needs to check errno properly (set to zero before readdir, > and if readdir fails, report a test error if errno is not zero). It can't in fact, since the entries will be ignored nad not reported as errors. Yes, it is subpar; but the whole non-LFS interface has similar corner cases and I think we can't paper over to much of its limitations. > >> index adcf8234f1..8f58a1c3a6 100644 >> --- a/sysdeps/unix/sysv/linux/dirstream.h >> +++ b/sysdeps/unix/sysv/linux/dirstream.h >> @@ -22,6 +22,7 @@ >> #include >> >> #include >> +#include >> >> /* Directory stream type. >> >> @@ -38,13 +39,16 @@ struct __dirstream >> size_t size; /* Total valid data in the block. */ >> size_t offset; /* Current offset into the block. */ >> >> - off_t filepos; /* Position of next entry to read. */ >> + off64_t filepos; /* Position of next entry to read. */ >> >> int errcode; /* Delayed error code. */ >> >> #if !defined __OFF_T_MATCHES_OFF64_T || !defined __INO_T_MATCHES_INO64_T >> struct dirent tdp; >> #endif >> +#if _DIRENT_OFFSET_TRANSLATION >> + struct dirstream_loc_t locs; /* off64_t to long int map for telldir. */ >> +#endif > > Please add a comment explaining the purpose of the new field. Ack. > >> diff --git a/sysdeps/unix/sysv/linux/readdir.c b/sysdeps/unix/sysv/linux/readdir.c >> index cd0ccaf33a..7a7f484c36 100644 >> --- a/sysdeps/unix/sysv/linux/readdir.c >> +++ b/sysdeps/unix/sysv/linux/readdir.c >> @@ -36,6 +36,15 @@ dirstream_entry (struct __dirstream *ds, const struct dirent64 *dp64) >> if (dp64->d_reclen - offsetof (struct dirent64, d_name) > NAME_MAX) >> return false; >> >> + /* telldir can not return an error, so preallocate the map if the entry can >> + not be packed directly. */ >> + if (telldir_need_dirstream (dp64->d_off)) >> + { >> + dirstream_loc_add (&ds->locs, dp64->d_off); >> + if (dirstream_loc_has_failed (&ds->locs)) >> + return false; >> + } > > I recommend to reserve the space here (but only for d_off values that > need it), and perform the assignment only if there is an actual telldir > call. This avoids all the additional processing and allocations for the > common case that telldir is never called. This will matter quite a bit > for large directories. > > Reservation can be implemented by adding a 0 dummy value (offset 0 will > never be recorded), unless the last element is already 0. I don't think > we need to change dynarray for this. We need the d_off value, since it will be used by telldir to actually return the correct dynarray position to return as dirstream_packed union. > > >> diff --git a/sysdeps/unix/sysv/linux/seekdir.c b/sysdeps/unix/sysv/linux/seekdir.c >> index 939ccc4447..30cce691a4 100644 >> --- a/sysdeps/unix/sysv/linux/seekdir.c >> +++ b/sysdeps/unix/sysv/linux/seekdir.c > >> + >> + if (dirp->filepos != filepos) >> + { >> + __lseek64 (dirp->fd, filepos, SEEK_SET); >> + dirp->filepos = filepos; >> + dirp->offset = 0; >> + dirp->size = 0; >> + } > > I think we should seek unconditionally, just in case someone messed with > dirfd directly. Alright. > >> diff --git a/sysdeps/unix/sysv/linux/telldir.c b/sysdeps/unix/sysv/linux/telldir.c >> index 1e5c129e9f..c3ef14f3da 100644 >> --- a/sysdeps/unix/sysv/linux/telldir.c >> +++ b/sysdeps/unix/sysv/linux/telldir.c >> @@ -15,9 +15,11 @@ >> License along with the GNU C Library; if not, see >> . */ > >> + else >> + { >> + dsp.l = -1; >> + >> + size_t i; >> + for (i = 0; i < dirstream_loc_size (&dirp->locs); i++) >> + if (*dirstream_loc_at (&dirp->locs, i) == dirp->filepos) >> + break; >> + /* It should be pre-allocated on readdir. */ >> + assert (i != dirstream_loc_size (&dirp->locs)); > > This should overwrite the final 0 element if the offset could not be > found. > > I think the quadratic behavior is acceptable in this case, given that we > do not really expect telldir to be called. Ack. > >> diff --git a/sysdeps/unix/sysv/linux/telldir.h b/sysdeps/unix/sysv/linux/telldir.h >> new file mode 100644 >> index 0000000000..758bcb0eb3 >> --- /dev/null >> +++ b/sysdeps/unix/sysv/linux/telldir.h >> @@ -0,0 +1,65 @@ >> +/* Linux internal telldir definitions. >> + Copyright (C) 2023 Free Software Foundation, Inc. >> + This file is part of the GNU C Library. >> + >> + The GNU C Library is free software; you can redistribute it and/or >> + modify it under the terms of the GNU Lesser General Public >> + License as published by the Free Software Foundation; either >> + version 2.1 of the License, or (at your option) any later version. >> + >> + The GNU C Library is distributed in the hope that it will be useful, >> + but WITHOUT ANY WARRANTY; without even the implied warranty of >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> + Lesser General Public License for more details. >> + >> + You should have received a copy of the GNU Lesser General Public >> + License along with the GNU C Library; if not, see >> + . */ >> + >> +#ifndef _TELLDIR_H >> +#define _TELLDIR_H 1 >> + >> +#if _DIRENT_OFFSET_TRANSLATION >> +/* On platforms where 'long int' is smaller than 'off64_t' this is how the >> + returned value is encoded and returned by 'telldir'. If the directory >> + offset can be enconded in 31 bits it is returned in the 'info' member >> + with 'is_packed' set to 1. >> + >> + Otherwise, the 'info' member describes an index in a dynamic array at >> + 'DIR' structure. */ >> + >> +union dirstream_packed >> +{ >> + long int l; >> + struct >> + { >> + unsigned long int is_packed:1; >> + unsigned long int info:31; >> + } p; >> +}; > > Good explanation in the comment, but … can't you use negative values for > indices, and ~ for encoding/decoding? I think that's a bit clearer. Do you mean remove the bitfield? > >> +static __always_inline bool >> +telldir_need_dirstream (__off64_t d_off) >> +{ >> + return d_off >= 1UL << 31; >> +} > > Hmm, can d_off be negative? But it is an unsigned comparion, so it should not matter afaiu. > >> diff --git a/sysdeps/unix/sysv/linux/tst-opendir-nolfs.c b/sysdeps/unix/sysv/linux/tst-opendir-nolfs.c >> new file mode 100644 >> index 0000000000..52e18171a7 > >> +static int >> +do_test (void) >> +{ >> + DIR *dirp = opendir (dirname); >> + TEST_VERIFY_EXIT (dirp != NULL); >> + >> + long int *tdirp = xmalloc (nfiles * sizeof (long int)); >> + struct dirent **ddirp = xmalloc (nfiles * sizeof (struct dirent *)); > > xreallocarray? Ack. > >> + /* For non-LFS, the entry is skipped if it can not be converted. */ >> + int count = 0; >> + for (; count < nfiles; count++) >> + { >> + tdirp[count] = telldir (dirp); >> + struct dirent *dp = readdir (dirp); >> + if (dp == NULL) >> + break; > > Should probably verify the count variable here (and check errno too?). The issue is we don't know if the entry will be skipped due overflow (the test check for non-LFS interface). And this lead me to add a test for LFS interface and realize that we need to use dirstream_packed for LFS as well if the ABI sets _DIRENT_OFFSET_TRANSLATION. > >> + ddirp[count] = xmalloc (dp->d_reclen); >> + memcpy (ddirp[count], dp, dp->d_reclen); >> + } >> + >> + closedir (dirp); >> + >> + /* Check against the getdents64 syscall. */ >> + int fd = xopen (dirname, O_RDONLY | O_DIRECTORY, 0); >> + int i = 0; >> + while (true) > > If we create predictable file names (maybe just numbers?), this kind of > checking becomes way easier. I am not sure because the test can have a 'count' less than 'nfiles'.