* timeout in LDAP access @ 2014-06-16 20:39 Denis Excoffier 2014-06-17 10:00 ` Corinna Vinschen 0 siblings, 1 reply; 33+ messages in thread From: Denis Excoffier @ 2014-06-16 20:39 UTC (permalink / raw) To: Cygwin Mailing List Hello, I’ve exercised ‘getent' a little bit those days (with 'db_enum: all’ in /etc/nsswitch.conf), and it seems to me that the timeout ‘tv' (3 seconds, in ldap.cc) is probably too small for servers not so quickly responsive or with many (500000, fake or real) users around (see the call to ldap_get_next_page_s()). 300 seconds should be enough i suppose. Also it is a pity that LDAP_TIMEOUT is not announced to the user (except under strace: 0x55). I don’t know the general policy for timeouts, but i consider that the user would like to be informed when the passwd/group list was truncated. Another (unrelated and less important) problem is that 'getent' happily produces lines with some extra ‘:’, in particular when the gecos field itself contains ‘:’. Regards, Denis Excoffier. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: timeout in LDAP access
From: Corinna Vinschen @ 2014-06-17 10:00 UTC
To: cygwin

Hi Denis,

On Jun 16 22:39, Denis Excoffier wrote:
> Hello,
>
> I’ve exercised ‘getent' a little bit those days (with 'db_enum: all’
> in /etc/nsswitch.conf), and it seems to me that the timeout ‘tv' (3
> seconds, in ldap.cc) is probably too small for servers not so quickly
> responsive or with many (500000, fake or real) users around (see the
> call to ldap_get_next_page_s()). 300 seconds should be enough i
> suppose.

300 seconds is a lot. I'm not quite sure I'm following you here.

Let me start by explaining how the timeout is applied so we're all on the same page.

When opening the connection to the DC, the bind operation will wait up to 3 seconds to complete.

In calls to getpwnam, getpwuid, getgrnam, getgrgid, the 3 second timeout is the timeout for fetching a single user or group entry. And it's not the timeout for fetching the basic info (name<->SID mapping), but only the timeout for the LDAP call returning the extended user info (pgid, gecos, home, shell). So, typically the user<->uid mapping is correct, only the secondary info might be wrong, if it's set in AD at all.

When enumerating accounts (getpwent, getgrent) the timeout is applied to every call fetching the next 100 accounts. The *only* information which is actually enumerated is the list of existing SIDs, with a timeout of 3 seconds per 100 SIDs.

So it's taking more than 3 seconds to fetch 100 account SIDs? And then...

> Also it is a pity that LDAP_TIMEOUT is not announced to the user
> (except under strace: 0x55). I don’t know the general policy for
> timeouts, but i consider that the user would like to be informed when
> the passwd/group list was truncated.

...you really get an LDAP_TIMEOUT from ldap_get_next_page_s? This puzzles me a bit since the documentation implies that this won't happen. Here's the snippet from MSDN:

  When parsing the results set, it is possible for the server to return
  an empty page of results and yet still respond with an LDAP_SUCCESS
  return code. This indicates that the server was unable to retrieve a
  page of results, due to a timeout or other reason, but has not
  completed the search request. The proper behavior in this instance is
  to continue to call ldap_get_next_page_s until either another page of
  results are successfully retrieved, an error code is returned, or
  LDAP_NO_RESULTS_RETURNED is returned to indicate the search is
  complete.

So I expect an LDAP_SUCCESS with ldap_count_entries() == 0 and then repeat the request. But the code doesn't expect LDAP_TIMEOUT in this case. Do I have to handle LDAP_TIMEOUT here as well?

As far as propagating the timeout to the user, that's kind of tricky. I'm not looking forward to doing that, but if so, it could only be an EIO error returned from get{pw,gr}ent.

The general problem with timeouts is that they are always wrong.

I'm wondering if the timeout, at least for enumerating accounts, should go away entirely. In case of a connection problem this could result in a hang for about 2 minutes by default I think (LDAP_OPT_PING_LIMIT).

I could also raise the timeout, but the value doesn't really matter, it will be just as wrong as 3 seconds, just differently.

Thoughts?

> Another (unrelated and less important) problem is that 'getent'
> happily produces lines with some extra ‘:’, in particular when the
> gecos field itself contains ‘:’.

Wow, that *is* important. All fields returned from the server have to get their colons converted to commas. I'll fix that.
Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
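The paging rule quoted from MSDN can be sketched as a small simulation. This is Python, not the code in ldap.cc, and it does not call the real LDAP API: the `Rc` enum, `enumerate_sids`, and the list of `(return code, entries)` tuples are hypothetical stand-ins for successive ldap_get_next_page_s calls. Treating a timeout as "list truncated" is an assumption made for illustration.

```python
from enum import Enum, auto


class Rc(Enum):
    """Hypothetical stand-ins for the return codes named in the thread."""
    SUCCESS = auto()
    TIMEOUT = auto()
    NO_RESULTS_RETURNED = auto()


def enumerate_sids(pages):
    """Collect SIDs from simulated page results.

    `pages` is an iterable of (return_code, entries) tuples, one per
    simulated page request. Returns (sids, truncated).
    """
    sids = []
    for rc, entries in pages:
        if rc is Rc.SUCCESS:
            # An empty page with SUCCESS means the search is not finished:
            # keep the (possibly empty) entries and ask for the next page.
            sids.extend(entries)
            continue
        if rc is Rc.NO_RESULTS_RETURNED:
            # Regular end of the search.
            return sids, False
        # Any other code, including TIMEOUT, ends the enumeration early.
        return sids, True
    return sids, False


demo = [
    (Rc.SUCCESS, ["S-1-5-21-1-1001"]),
    (Rc.SUCCESS, []),                   # empty page, search still running
    (Rc.SUCCESS, ["S-1-5-21-1-1002"]),
    (Rc.NO_RESULTS_RETURNED, []),
]
print(enumerate_sids(demo))
```

The `truncated` flag is the piece of information a caller would need in order to tell the user that the passwd/group list is incomplete.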
* gecos from AD? (was Re: timeout in LDAP access) 2014-06-17 10:00 ` Corinna Vinschen @ 2014-06-17 10:30 ` Corinna Vinschen 2014-06-17 12:51 ` Corinna Vinschen 2014-06-17 22:59 ` Denis Excoffier 2014-06-17 22:41 ` timeout in LDAP access Denis Excoffier 1 sibling, 2 replies; 33+ messages in thread From: Corinna Vinschen @ 2014-06-17 10:30 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 906 bytes --] On Jun 17 12:00, Corinna Vinschen wrote: > On Jun 16 22:39, Denis Excoffier wrote: > > Another (unrelated and less important) problem is that 'getent' > > happily produces lines with some extra ‘:’, in particular when the > > gecos field itself contains ‘:’. > > Wow, that *is* important. All fields returned from the server have to > get their colons converted to commas. I'll fix that. While we're at it... do we really need the gecos info? Cygwin fills out this field with the Windows username and SID info for internal purposes, and then adds the gecos info from AD. However, it's just informational and usually only used by the finger(1) tool. Shall I just remove fetching the gecos fields from AD entirely? Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: gecos from AD? (was Re: timeout in LDAP access) 2014-06-17 10:30 ` gecos from AD? (was Re: timeout in LDAP access) Corinna Vinschen @ 2014-06-17 12:51 ` Corinna Vinschen 2014-06-17 23:07 ` Denis Excoffier 2014-06-18 2:18 ` AW: " Christoph H. Hochstaetter 2014-06-17 22:59 ` Denis Excoffier 1 sibling, 2 replies; 33+ messages in thread From: Corinna Vinschen @ 2014-06-17 12:51 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 1239 bytes --] On Jun 17 12:30, Corinna Vinschen wrote: > On Jun 17 12:00, Corinna Vinschen wrote: > > On Jun 16 22:39, Denis Excoffier wrote: > > > Another (unrelated and less important) problem is that 'getent' > > > happily produces lines with some extra ‘:’, in particular when the > > > gecos field itself contains ‘:’. > > > > Wow, that *is* important. All fields returned from the server have to > > get their colons converted to commas. I'll fix that. On second thought, removing colons should only occur for gecos. The other fields shouldn't contain colons anyway since their content has to be POSIX-compatible anyway. So, either I add code to remove the colons from the gecos field ... > While we're at it... do we really need the gecos info? Cygwin fills > out this field with the Windows username and SID info for internal > purposes, and then adds the gecos info from AD. However, it's just > informational and usually only used by the finger(1) tool. > > Shall I just remove fetching the gecos fields from AD entirely? ... or that. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: gecos from AD? (was Re: timeout in LDAP access) 2014-06-17 12:51 ` Corinna Vinschen @ 2014-06-17 23:07 ` Denis Excoffier 2014-06-18 2:18 ` AW: " Christoph H. Hochstaetter 1 sibling, 0 replies; 33+ messages in thread From: Denis Excoffier @ 2014-06-17 23:07 UTC (permalink / raw) To: cygwin On 2014-06-17 14:51, Corinna Vinschen wrote: > On Jun 17 12:30, Corinna Vinschen wrote: >> On Jun 17 12:00, Corinna Vinschen wrote: >>> On Jun 16 22:39, Denis Excoffier wrote: >>>> Another (unrelated and less important) problem is that 'getent' >>>> happily produces lines with some extra ‘:’, in particular when the >>>> gecos field itself contains ‘:’. >>> >>> Wow, that *is* important. All fields returned from the server have to >>> get their colons converted to commas. I'll fix that. > > On second thought, removing colons should only occur for gecos. The > other fields shouldn't contain colons anyway since their content has to > be POSIX-compatible anyway. > > So, either I add code to remove the colons from the gecos field … This. > > >> While we're at it... do we really need the gecos info? Cygwin fills >> out this field with the Windows username and SID info for internal >> purposes, and then adds the gecos info from AD. However, it's just >> informational and usually only used by the finger(1) tool. >> >> Shall I just remove fetching the gecos fields from AD entirely? > > ... or that. Not that. Denis Excoffier. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple ^ permalink raw reply [flat|nested] 33+ messages in thread
* AW: gecos from AD? (was Re: timeout in LDAP access) 2014-06-17 12:51 ` Corinna Vinschen 2014-06-17 23:07 ` Denis Excoffier @ 2014-06-18 2:18 ` Christoph H. Hochstaetter 1 sibling, 0 replies; 33+ messages in thread From: Christoph H. Hochstaetter @ 2014-06-18 2:18 UTC (permalink / raw) To: cygwin On Jun 17 14:52, Corinna Vinschen wrote: >On Jun 17 12:30, Corinna Vinschen wrote: >> On Jun 17 12:00, Corinna Vinschen wrote: >> > On Jun 16 22:39, Denis Excoffier wrote: >> > > Another (unrelated and less important) problem is that 'getent' >> > > happily produces lines with some extra ‘:’, in particular when the >> > > gecos field itself contains ‘:’. >> > >> > Wow, that *is* important. All fields returned from the server have >> > to get their colons converted to commas. I'll fix that. > >On second thought, removing colons should only occur for gecos. >The other fields shouldn't contain colons anyway since their >content has to be POSIX-compatible anyway. > >So, either I add code to remove the colons from the gecos field ... > > >> While we're at it... do we really need the gecos info? Cygwin fills >> out this field with the Windows username and SID info for internal >> purposes, and then adds the gecos info from AD. However, it's just >> informational and usually only used by the finger(1) tool. >> >> Shall I just remove fetching the gecos fields from AD entirely? > >... or that. See http://www.manpages.info/freebsd/passwd.5.html "This information is used by the finger(1) program, and the first field used by the system mailer " Some mail program might want to use the gecos field as a friendly name in the from: field. Thus the gecos field should start with the user's full name and then a comma (at least when retrieved with getpwnam(3) but getpwent(3) would be great too). Whether the name comes from the gecos field in AD or any other source (e.g. some other AD field) doesn't seem to make a difference for me. 
-Christoph
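A mail program following the convention described above would take the text before the first comma of the gecos field as the friendly name. A minimal sketch in Python, assuming the AD gecos text comes first in the field; the helper name `full_name` and the sample gecos string are made up for illustration:

```python
def full_name(gecos: str) -> str:
    """Return the first comma-separated subfield of a gecos string."""
    return gecos.split(",", 1)[0]


# Hypothetical gecos value: full name first, then further subfields.
sample = "Albert Einstein,U-EXAMPLE\\albert,S-1-5-21-1-2-3-1001"
print(full_name(sample))
```

This only works as intended if the full name itself contains no comma, which is one reason the choice of replacement character for ':' matters.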
* Re: gecos from AD? (was Re: timeout in LDAP access) 2014-06-17 10:30 ` gecos from AD? (was Re: timeout in LDAP access) Corinna Vinschen 2014-06-17 12:51 ` Corinna Vinschen @ 2014-06-17 22:59 ` Denis Excoffier 2014-06-18 8:38 ` Corinna Vinschen 1 sibling, 1 reply; 33+ messages in thread From: Denis Excoffier @ 2014-06-17 22:59 UTC (permalink / raw) To: cygwin On 2014-06-17 12:30, Corinna Vinschen wrote: > On Jun 17 12:00, Corinna Vinschen wrote: >> On Jun 16 22:39, Denis Excoffier wrote: >>> Another (unrelated and less important) problem is that 'getent' >>> happily produces lines with some extra ‘:’, in particular when the >>> gecos field itself contains ‘:’. >> >> Wow, that *is* important. All fields returned from the server have to >> get their colons converted to commas. I'll fix that. > > While we're at it... do we really need the gecos info? Cygwin fills > out this field with the Windows username and SID info for internal > purposes, and then adds the gecos info from AD. However, it's just > informational and usually only used by the finger(1) tool. The gecos field from AD seems to be _prepended_ (not appended) to the username + SID. In any case, it may represent some information with high added value (like user real name or e-mail address, depending on local rules of course). I would not vote for removing it. Why is it so clear that the ‘:’ should be replaced by a comma? Here, we have situations where it contains something like « Owner: Albert Einstein ». An underscore could be more appropriate. There is something more important: i’ve written in one of my previous messages that when ‘:’ occurs in gecos, the resulting ‘passwd’ file under ‘getent’ will contain more ‘:’ than expected, but this is incorrect. In fact (and i would like someone to try it), when ‘:’ is found within the gecos field, ‘getent’ does not show the last (homedir) field, and the count of ‘:’ is still correct. The problem might not be in getent after all. Regards, Denis Excoffier. 
* Re: gecos from AD? (was Re: timeout in LDAP access) 2014-06-17 22:59 ` Denis Excoffier @ 2014-06-18 8:38 ` Corinna Vinschen 0 siblings, 0 replies; 33+ messages in thread From: Corinna Vinschen @ 2014-06-18 8:38 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 2386 bytes --] On Jun 18 00:59, Denis Excoffier wrote: > On 2014-06-17 12:30, Corinna Vinschen wrote: > > On Jun 17 12:00, Corinna Vinschen wrote: > >> On Jun 16 22:39, Denis Excoffier wrote: > >>> Another (unrelated and less important) problem is that 'getent' > >>> happily produces lines with some extra ‘:’, in particular when the > >>> gecos field itself contains ‘:’. > >> > >> Wow, that *is* important. All fields returned from the server have to > >> get their colons converted to commas. I'll fix that. > > > > While we're at it... do we really need the gecos info? Cygwin fills > > out this field with the Windows username and SID info for internal > > purposes, and then adds the gecos info from AD. However, it's just > > informational and usually only used by the finger(1) tool. > The gecos field from AD seems to be _prepended_ (not appended) to the > username + SID. Right, I just wasn't going for details. The content of gecos is added to the pw_gecos field, one way or another. > In any case, it may represent some information with > high added value (like user real name or e-mail address, depending on > local rules of course). I would not vote for removing it. > > Why is it so clear that the ‘:’ should be replaced by a comma? Here, we > have situations where it contains something > like « Owner: Albert Einstein ». An underscore could be more appropriate. The point is, the colon must be replaced with some other ASCII char. I'm pretty sure this doesn't deserve another nsswitch.conf setting. So we just choose *some* ASCII char and be done with it. I don't like the underscore but maybe space is ok. Or semicolon. 
> There is something more important: i’ve written in one of my previous
> messages that when ‘:’ occurs in gecos, the resulting ‘passwd’ file under
> ‘getent’ will contain more ‘:’ than expected, but this is incorrect. In fact
> (and i would like someone to try it), when ‘:’ is found within the
> gecos field, ‘getent’ does not show the last (homedir) field, and
> the count of ‘:’ is still correct. The problem might not be in getent after
> all.

Sure. It's all occurring inside the Cygwin DLL.

Corinna
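Why a colon inside gecos has to be replaced can be shown with a generic passwd-style line. The sketch below is Python, not the Cygwin DLL code; `passwd_line` and the sample account are hypothetical, and the semicolon is just one of the candidate replacement characters mentioned above.

```python
def passwd_line(name, uid, gid, gecos, home, shell, replacement=";"):
    """Build a 7-field passwd line, replacing ':' inside gecos."""
    safe_gecos = gecos.replace(":", replacement)
    return ":".join([name, "*", str(uid), str(gid), safe_gecos, home, shell])


fields = ["albert", "*", "1001", "513",
          "Owner: Albert Einstein", "/home/albert", "/bin/bash"]

# Joining the raw fields yields 8 colon-separated pieces instead of 7,
# so every field after gecos is shifted for any parser of the line.
raw = ":".join(fields)
print(len(raw.split(":")))

fixed = passwd_line("albert", 1001, 513,
                    "Owner: Albert Einstein", "/home/albert", "/bin/bash")
print(fixed)
```

With the replacement applied, the home directory is again the sixth field and the shell the seventh.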
* Re: timeout in LDAP access 2014-06-17 10:00 ` Corinna Vinschen 2014-06-17 10:30 ` gecos from AD? (was Re: timeout in LDAP access) Corinna Vinschen @ 2014-06-17 22:41 ` Denis Excoffier 2014-06-18 8:33 ` Corinna Vinschen 1 sibling, 1 reply; 33+ messages in thread From: Denis Excoffier @ 2014-06-17 22:41 UTC (permalink / raw) To: cygwin Hi Corinna, On 2014-06-17 12:00, Corinna Vinschen wrote: > > So I expect an LDAP_SUCCESS with ldap_count_entries() == 0 and then > repeat the request. But the code doesn't expect LDAP_TIMEOUT in this > case. Do I have to handle LDAP_TIMEOUT here as well? LDAP_TIMEOUT can occur there. I can even suppose it occurs more frequently for the _last_ 100-sid chunk (eg there are 5868 users in a domain, and timeout occurs after 5800 and the last 68 get lost). But it can also occur after 27 chunks while about 350000 users are still to be read in a given domain (yes, that makes about 352700 users in a single domain). I’m pretty convinced today that 300 is more than enough, and that with 3, only one or two timeouts are to be expected for an AD with 500000 users and not so many domains (50 or 100). The flaw is that as soon as the first timeout occurs, the whole rest of the current domain is skipped, which can be much in some cases. ldap_get_next_page_s() should perhaps deserve a second chance (with timeout 30s). After all, this function is called 3527 times (for the same domain). Also a simple observation: if LDAP_TIMEOUT is not to be expected, what is the use of this timeval* parameter in ldap_get_next_page_s()? > I'm wondering if the timeout, at least for enumerating accounts, should > go away entirely. In case of a connection problem this could result in > a hang for about 2 minutes by default I think (LDAP_OPT_PING_LIMIT). I think i like this (it it works). But in this case, it will not resume to the next domain, and the whole operation (eg getent) is interrupted? Regards, Denis Excoffier. 
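The figures in this message follow from the page size of 100 SIDs per request. A short Python check of that arithmetic; the function names are made up for illustration:

```python
PAGE_SIZE = 100  # SIDs fetched per page request, as stated in the thread


def pages_needed(users, page_size=PAGE_SIZE):
    """Number of page requests needed to enumerate `users` accounts."""
    return -(-users // page_size)  # integer ceiling division


def last_page_size(users, page_size=PAGE_SIZE):
    """Number of accounts carried by the final page."""
    remainder = users % page_size
    return remainder if remainder else min(users, page_size)


def users_lost(users, pages_done, page_size=PAGE_SIZE):
    """Accounts never returned if enumeration stops after `pages_done` pages."""
    return max(users - pages_done * page_size, 0)


print(pages_needed(352700))      # page requests for the large domain
print(last_page_size(5868))      # accounts in the final page of the small one
print(users_lost(352700, 27))    # accounts skipped after a timeout at page 28
```

The last figure is why a single timeout in the middle of a large domain matters far more than one on the final page.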
* Re: timeout in LDAP access 2014-06-17 22:41 ` timeout in LDAP access Denis Excoffier @ 2014-06-18 8:33 ` Corinna Vinschen 2014-06-18 18:01 ` Corinna Vinschen 0 siblings, 1 reply; 33+ messages in thread From: Corinna Vinschen @ 2014-06-18 8:33 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 3221 bytes --] On Jun 18 00:41, Denis Excoffier wrote: > Hi Corinna, > > On 2014-06-17 12:00, Corinna Vinschen wrote: > > > > So I expect an LDAP_SUCCESS with ldap_count_entries() == 0 and then > > repeat the request. But the code doesn't expect LDAP_TIMEOUT in this > > case. Do I have to handle LDAP_TIMEOUT here as well? > LDAP_TIMEOUT can occur there. I can even suppose it occurs more > frequently for the _last_ 100-sid chunk (eg there are 5868 users in > a domain, and timeout occurs after 5800 and the last 68 get lost). But > it can also occur after 27 chunks while about 350000 users are still to be > read in a given domain (yes, that makes about 352700 users in a single domain). > > I’m pretty convinced today that 300 is more than enough, Much more than enough. 300 seconds? 5 minutes? For 100 SIDs? > and that with 3, only > one or two timeouts are to be expected for an AD with 500000 users and not so > many domains (50 or 100). The flaw is that as soon as the first timeout occurs, > the whole rest of the current domain is skipped, which can be much in some cases. > ldap_get_next_page_s() should perhaps deserve a second chance (with timeout 30s). > After all, this function is called 3527 times (for the same domain). > > Also a simple observation: if LDAP_TIMEOUT is not to be expected, what is the > use of this timeval* parameter in ldap_get_next_page_s()? > > > I'm wondering if the timeout, at least for enumerating accounts, should > > go away entirely. In case of a connection problem this could result in > > a hang for about 2 minutes by default I think (LDAP_OPT_PING_LIMIT). > I think i like this (it it works). 
But in this case, it will not resume > to the next domain, and the whole operation (eg getent) is interrupted?

I don't quite understand the question. All LDAP operations have a default timeout of 2 minutes if LDAP_OPT_TIMEOUT is not set. The operations we're doing here are pretty simple ones, the bunch of 100 SIDs per getpwent LDAP call is a really small dataset (about 4K bytes) of indexed data, which should be readily available. And there's a certain (not Cygwin-specific) expectation that a simple LDAP operation is fast.

Assume the server takes more than just 3 seconds to reply to a single request for some reason, let's say 30 seconds. The call will result in a lagging output of getent, of course, but it would have no other consequences. If the server actually needs more than two minutes to reply, and doesn't return a ping either, the timeout is a very likely indication that we have network problems, or the server is down. In that case, the normal code path applies. The connection with the server will be closed and we try the next domain.

The idea I was proposing was just to drop all attempts to second-guess how fast a DC replies. We're going to use LDAP with default settings and that's it. Default settings mean that every operation times out after the default timeout period of 120 seconds, which should really be sufficient.

Corinna
* Re: timeout in LDAP access 2014-06-18 8:33 ` Corinna Vinschen @ 2014-06-18 18:01 ` Corinna Vinschen 2014-06-19 17:53 ` Denis Excoffier 0 siblings, 1 reply; 33+ messages in thread From: Corinna Vinschen @ 2014-06-18 18:01 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 2383 bytes --] On Jun 18 10:33, Corinna Vinschen wrote: > On Jun 18 00:41, Denis Excoffier wrote: > > On 2014-06-17 12:00, Corinna Vinschen wrote: > > > I'm wondering if the timeout, at least for enumerating accounts, should > > > go away entirely. In case of a connection problem this could result in > > > a hang for about 2 minutes by default I think (LDAP_OPT_PING_LIMIT). > > I think i like this (it it works). But in this case, it will not resume > > to the next domain, and the whole operation (eg getent) is interrupted? > > I don't quite understand the question. All LDAP operations have a > default timeout of 2 minutes if LDAP_OPT_TIMEOUT is not set. The > operations we're doing here are pretty simple ones, the bunch of 100 > SIDs per getpwent LDAP call is a really small dataset (about 4K bytes) > of indexed data, which should be readily available. And there's a > certain (not Cygwin-specific) expectation that a simple LDAP operation > is fast. > > Assuming the server takes more than just 3 seconds to reply to > a single request for some reason, let's say 30 seconds. The call will > result in a laming output of getent, of course, but it would have no > other consequences. If the server needs actually more than two minutes > to reply, and doesn't return a ping either, the timeout is a very likely > indication that we have network problems, or the server is down. > In that case, the normal code path applies. The connection with the > server will be closed and we try the next domain. > > The idea I was proposing was just to drop all attempts to seconds guess > how fast a DC replies. We're going to use LDAP with default settings > and that's it. 
Default settings means, every operation times out after > the default timeout period of 120 seconds, which should really be > sufficient.

I'm not quite sure I understand the effect of all the timeout values in LDAP entirely correctly and the API documentation leaves quite a bit to be desired.

For the time being I raised the timeout to 30 seconds, and colons in the gecos field are converted to semicolons. I uploaded a new developer snapshot to http://cygwin.com/snapshots/

Please give it a try.

Thanks,
Corinna
* Re: timeout in LDAP access 2014-06-18 18:01 ` Corinna Vinschen @ 2014-06-19 17:53 ` Denis Excoffier 2014-06-23 9:10 ` Corinna Vinschen 0 siblings, 1 reply; 33+ messages in thread From: Denis Excoffier @ 2014-06-19 17:53 UTC (permalink / raw) To: cygwin On 2014-06-18 20:01, Corinna Vinschen wrote: > On Jun 18 10:33, Corinna Vinschen wrote: >> >> >> The idea I was proposing was just to drop all attempts to seconds guess >> how fast a DC replies. We're going to use LDAP with default settings >> and that's it. Default settings means, every operation times out after >> the default timeout period of 120 seconds, which should really be >> sufficient. > > I'm not quite sure I understand the effect of all the timeout values in > LDAP entirely correctly and the API documentation leaves quite a bit to > be desired. > > For the time being I raised the timeout to 30 seconds, and colons in > the gecos field are converted to semicolons. I uploaded a new developer > snapshot to http://cygwin.com/snapshots/ Please give it a try. I tried the last snapshot. First the ${tr ‘:’ ‘;’} operation works perfectly, and the last field (of 'getent passwd' is now always the homedir. You may like to correct a typo in the ChangeLog, should be ‘semicolon’ instead of ‘comma’. Also, i tried with several different values for CYG_LDAP_TIMEOUT. With 45s, 60s, 115s and 125s, i obtained no timeout (outputing 500000 users takes 1h). I tried 3 times with 30s and got once with no timeout, once with one timeout and once with 3 timeouts (ie one timeout for 3 domains). In any case, if you wish to switch to timeout=120s is ok for me. The PageSize (100) could also be changed? Here two remarks about timeouts: 1) for most of the 100-sid chunks, the high timeout is not used, therefore the global penalty in delay is not so high. And perhaps a 120s timeout is high enough so that when it is met, we could abandon not only the current domain, but also the whole search? 
2) if value of timeout is not high enough (i have no figures…), timeout may occur when the PC is in fact occupied with other tasks (eg antivirus scanning or something else), unrelated to network delays or server latencies.

Regards,
Denis.
* Re: timeout in LDAP access 2014-06-19 17:53 ` Denis Excoffier @ 2014-06-23 9:10 ` Corinna Vinschen 2014-06-23 20:38 ` Denis Excoffier 0 siblings, 1 reply; 33+ messages in thread From: Corinna Vinschen @ 2014-06-23 9:10 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 4215 bytes --] On Jun 19 19:53, Denis Excoffier wrote: > On 2014-06-18 20:01, Corinna Vinschen wrote: > > On Jun 18 10:33, Corinna Vinschen wrote: > >> > >> > >> The idea I was proposing was just to drop all attempts to seconds guess > >> how fast a DC replies. We're going to use LDAP with default settings > >> and that's it. Default settings means, every operation times out after > >> the default timeout period of 120 seconds, which should really be > >> sufficient. > > > > I'm not quite sure I understand the effect of all the timeout values in > > LDAP entirely correctly and the API documentation leaves quite a bit to > > be desired. > > > > For the time being I raised the timeout to 30 seconds, and colons in > > the gecos field are converted to semicolons. I uploaded a new developer > > snapshot to http://cygwin.com/snapshots/ Please give it a try. > > I tried the last snapshot. First the ${tr ‘:’ ‘;’} operation works perfectly, > and the last field (of 'getent passwd' is now always the homedir. You may > like to correct a typo in the ChangeLog, should be ‘semicolon’ instead of > ‘comma’. > > Also, i tried with several different values for CYG_LDAP_TIMEOUT. With 45s, 60s, > 115s and 125s, i obtained no timeout (outputing 500000 users takes 1h). > I tried 3 times with 30s and got once with no timeout, once with one timeout > and once with 3 timeouts (ie one timeout for 3 domains). > > In any case, if you wish to switch to timeout=120s is ok for me. Here's another question, which occured to me over the weekend: Do you really *want* to enumerate 500K users when accessing the DCs remote over a slow DSL line? 
Isn't this a situation in which you'd rather like to avoid enumerating accounts or restrict it to an essential subset? That's what db_enum would be good for. What I'm really concerned about is not the enumeration functionality getpwent/getgrent, but the "normal" functions accessing a single account. Accessing a single account, even over a slow line, shouldn't be that slow. And even if it is, it's only the supplementary information (gecos, home, shell, *iff* any of them is set in AD at all) which might be wrong. Do we really want to introduce a long timeout for this? I don't think so. I'm rather inclined to revert the timeout for single account access to a smaller value again (5 or 10 secs) and introduce a second, longer timeout value for enumeration (60 secs, for instance). I've applied a patch to ldap.cc to this effect. Would you mind to give it a try? > The PageSize > (100) could also be changed? Yes, the pagesize can be changed, too. I'm just not sure about the consequences. In my pretty small AD environment 100 seemed to be a good compromise in terms of performance and size (as I mentioned, just 4 KB per page). Less than 100 slowed down getent noticably, more than 100 didn't provide a visible speedup. Can you test in your big environment in how far raising this value changes the performance and the chance for timeout? Since the load is on the server, it should be pretty fast in collecting the next X SIDs. I'm just a bit concerned about the (unnecessary?) network traffic this might generate. > Here two remarks about timeouts: > 1) for most of the 100-sid chunks, the high timeout is not used, therefore > the global penalty in delay is not so high. And perhaps a 120s timeout is high > enough so that when it is met, we could abandon not only the current domain, > but also the whole search? Would that be really a bright idea? 
Assuming your ADs (and their DCs) are in different remote locations, one of those connections being down would disable enumerating other domains. > 2) if value of timeout is not high enough (i have no figures…), timeout may > occur when the PC is in fact occupied with other tasks (eg antivirus scanning > or something else), unrelated to network delays or server latencies. No timeout is prepared for a CPU being 100% in use :| Thanks, Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: timeout in LDAP access 2014-06-23 9:10 ` Corinna Vinschen @ 2014-06-23 20:38 ` Denis Excoffier 2014-06-24 15:59 ` Corinna Vinschen 0 siblings, 1 reply; 33+ messages in thread From: Denis Excoffier @ 2014-06-23 20:38 UTC (permalink / raw) To: cygwin On 2014-06-23 11:09, Corinna Vinschen wrote: > On Jun 19 19:53, Denis Excoffier wrote: > > Do you really *want* to enumerate 500K users when accessing the DCs > remote over a slow DSL line? Isn't this a situation in which you'd > rather like to avoid enumerating accounts or restrict it to an > essential subset? That's what db_enum would be good for. IMHO the line is not especially slow. Instead, the server (and occasionally the client) is clobbered sometimes. For example it seems more difficult (ie timeout occurs more frequently) for a server to output the last sid’s in a domain than to output a full PageSize of results. Personally i don’t *want* to use /etc/nsswitch.conf at all. What bothers me is that the user does not get any indication of a timeout (and several successive and unrelated timeouts may be met in a single invocation of getent). Therefore even if all servers are up, the user has no means to know that the list is exhaustive. If the timeout occurs for the last chunk this is not so important, but if the timeout occurs in the middle it may be. That is the difference between a large timeout and a timeout, say, too accurate. > I'm rather inclined to revert the timeout for single account access to a > smaller value again (5 or 10 secs) and introduce a second, longer > timeout value for enumeration (60 secs, for instance). This is fine. I suppose timeout will rarely occur when a single result is expected (and the server is up). I tried ‘getent passwd sid’ a couple of times and the result has always been instantaneous. > > I've applied a patch to ldap.cc to this effect. Would you mind to give > it a try? 60s is okay. Today i got several timeouts while enumerating passwd with a timeout of 60s. 
Last Friday, all my tests with timeout >= 45s produced no timeout. Perhaps the servers are less used when the week-end is not too far... > >> The PageSize >> (100) could also be changed? > > Yes, the pagesize can be changed, too. I'm just not sure about the > consequences. In my pretty small AD environment 100 seemed to be a > good compromise in terms of performance and size (as I mentioned, just > 4 KB per page). Less than 100 slowed down getent noticably, more than > 100 didn't provide a visible speedup. > > Can you test in your big environment in how far raising this value > changes the performance and the chance for timeout? Since the load is > on the server, it should be pretty fast in collecting the next X SIDs. > I'm just a bit concerned about the (unnecessary?) network traffic this > might generate. I tried pagesize=50,200,400, with, as you said, no notable difference. With 400, i can suppose it is a little faster (10% less than usually) and a little longer with 50. 1 or 2 timeouts always (i also tried with timeout=120s). No big difference really. > >> Here two remarks about timeouts: >> 1) for most of the 100-sid chunks, the high timeout is not used, therefore >> the global penalty in delay is not so high. And perhaps a 120s timeout is high >> enough so that when it is met, we could abandon not only the current domain, >> but also the whole search? > > Would that be really a bright idea? Assuming your ADs (and their DCs) > are in different remote locations, One of those connections being down > would disable enumerating other domains. It would be a means to have getent 'depend' on a unique timeout. > >> 2) if value of timeout is not high enough (i have no figures…), timeout may >> occur when the PC is in fact occupied with other tasks (eg antivirus scanning >> or something else), unrelated to network delays or server latencies. 
> > No timeout is prepared for a CPU being 100% in use :| My experience is that if antivirus considers that some job has to be done urgently, everything else freezes. I have to cope with that. Well. My (current) opinion is: * def_tv=5, enum_tv=60 or 120 * pagesize=100 is fine * perhaps getent could be augmented to enumerate domains (getent domain) and also to enumerate sids in a given domain? That way, the timeout, when it occurs, is for a single domain. And this would perhaps be more useful than the full ‘getent passwd’ for a large database. Thank you Corinna for your time with this. Denis Excoffier. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple ^ permalink raw reply [flat|nested] 33+ messages in thread
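The two-timeout proposal summarised in the message above (def_tv for single lookups, enum_tv for each enumerated page) could be sketched roughly as follows. This is only an illustration under stated assumptions: Op, timeout_for and pages_needed are hypothetical names that do not come from ldap.cc, and the constants are simply the values suggested in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Values proposed in the thread; not necessarily what ldap.cc ended up using.
const unsigned def_tv = 5;       // seconds: bind and single-account lookups
const unsigned enum_tv = 60;     // seconds: one page of an enumeration
const unsigned page_size = 100;  // SIDs requested per page

// Hypothetical classification of the LDAP operations discussed here.
enum class Op { Bind, SingleAccount, EnumeratePage };

// Pick the timeout (in seconds) that applies to one LDAP call.
unsigned timeout_for(Op op) {
  return op == Op::EnumeratePage ? enum_tv : def_tv;
}

// Number of page requests needed to enumerate `accounts` SIDs (rounded up).
uint64_t pages_needed(uint64_t accounts, unsigned per_page = page_size) {
  assert(per_page > 0);
  return (accounts + per_page - 1) / per_page;
}
```

With 500000 accounts and pages of 100 SIDs, pages_needed yields 5000 requests, and the enumeration timeout applies to each of them separately rather than to the run as a whole.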
* Re: timeout in LDAP access 2014-06-23 20:38 ` Denis Excoffier @ 2014-06-24 15:59 ` Corinna Vinschen 2014-06-25 10:15 ` Corinna Vinschen 0 siblings, 1 reply; 33+ messages in thread From: Corinna Vinschen @ 2014-06-24 15:59 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 2726 bytes --] On Jun 23 22:38, Denis Excoffier wrote: > On 2014-06-23 11:09, Corinna Vinschen wrote: > > On Jun 19 19:53, Denis Excoffier wrote: > > > > Do you really *want* to enumerate 500K users when accessing the DCs > > remote over a slow DSL line? Isn't this a situation in which you'd > > rather like to avoid enumerating accounts or restrict it to an > > essential subset? That's what db_enum would be good for. > IMHO the line is not especially slow. Instead, the > server (and occasionally the client) is clobbered sometimes. For example it > seems more difficult (ie timeout occurs more frequently) for a server > to output the last sid’s in a domain than to output a full PageSize of > results. > > Personally i don’t *want* to use /etc/nsswitch.conf at all. What bothers me > is that the user does not get any indication of a timeout (and several successive > and unrelated timeouts may be met in a single invocation of getent). Therefore > even if all servers are up, the user has no means to know that the list is exhaustive. > If the timeout occurs for the last chunk this is not so important, but if > the timeout occurs in the middle it may be. That is the difference between > a large timeout and a timeout, say, too accurate. > [...] > >> 1) for most of the 100-sid chunks, the high timeout is not used, therefore > >> the global penalty in delay is not so high. And perhaps a 120s timeout is high > >> enough so that when it is met, we could abandon not only the current domain, > >> but also the whole search? > > > > Would that be really a bright idea? 
Assuming your ADs (and their DCs) > > are in different remote locations, One of those connections being down > > would disable enumerating other domains. > It would be a means to have getent 'depend' on a unique timeout. > > > >> 2) if value of timeout is not high enough (i have no figures…), timeout may > >> occur when the PC is in fact occupied with other tasks (eg antivirus scanning > >> or something else), unrelated to network delays or server latencies. > > Stay tuned. I'm rewriting the LDAP access code to perform all critical LDAP calls in interruptible threads. The Windows LDAP calls don't provide any kind of synchronization, only timeouts. I hoped to get away with short timeouts but it seems I hoped in vain. So the next iteration of this code will not use any timeout other than the default LDAP network timeout of 2 minutes, but the calls will be interruptible by signals. I hope that fixes this the right way :} Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: timeout in LDAP access 2014-06-24 15:59 ` Corinna Vinschen @ 2014-06-25 10:15 ` Corinna Vinschen 2014-06-25 20:44 ` Denis Excoffier 0 siblings, 1 reply; 33+ messages in thread From: Corinna Vinschen @ 2014-06-25 10:15 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 3475 bytes --] On Jun 24 17:58, Corinna Vinschen wrote: > On Jun 23 22:38, Denis Excoffier wrote: > > On 2014-06-23 11:09, Corinna Vinschen wrote: > > > On Jun 19 19:53, Denis Excoffier wrote: > > > > > > Do you really *want* to enumerate 500K users when accessing the DCs > > > remote over a slow DSL line? Isn't this a situation in which you'd > > > rather like to avoid enumerating accounts or restrict it to an > > > essential subset? That's what db_enum would be good for. > > IMHO the line is not especially slow. Instead, the > > server (and occasionally the client) is clobbered sometimes. For example it > > seems more difficult (ie timeout occurs more frequently) for a server > > to output the last sid’s in a domain than to output a full PageSize of > > results. > > > > Personally i don’t *want* to use /etc/nsswitch.conf at all. What bothers me > > is that the user does not get any indication of a timeout (and several successive > > and unrelated timeouts may be met in a single invocation of getent). Therefore > > even if all servers are up, the user has no means to know that the list is exhaustive. > > If the timeout occurs for the last chunk this is not so important, but if > > the timeout occurs in the middle it may be. That is the difference between > > a large timeout and a timeout, say, too accurate. > > [...] > > >> 1) for most of the 100-sid chunks, the high timeout is not used, therefore > > >> the global penalty in delay is not so high. And perhaps a 120s timeout is high > > >> enough so that when it is met, we could abandon not only the current domain, > > >> but also the whole search? > > > > > > Would that be really a bright idea? 
Assuming your ADs (and their DCs) > > > are in different remote locations, One of those connections being down > > > would disable enumerating other domains. > > It would be a means to have getent 'depend' on a unique timeout. > > > > > >> 2) if value of timeout is not high enough (i have no figures…), timeout may > > >> occur when the PC is in fact occupied with other tasks (eg antivirus scanning > > >> or something else), unrelated to network delays or server latencies. > > > > > Stay tuned. I'm rewriting the LDAP access code to perform all critical > LDAP calls in interruptible threads. The Windows LDAP calls don't > provide any kind of synchronization, only timeouts. I hoped to get away > with short timeouts but it seems I hoped in vain. > > So the next iteration of this code will not use any timeout other than > the default LDAP network timeout of 2 minutes, but the calls will be > interruptible by signals. > > I hope that fixes this the right way :} I applied a matching patch and created new developer snapshots on http://cygwin.com/snapshots/ No more artificial timeouts, but the LDAP calls will be interruptible by a signal now. Also, if an error occurs during ad enumeration, getpwent/getgrent will return NULL with errno set accordingly. But that won't help you much when running getent. getent simply stops the enumeration when getpwent/getgrent return NULL. It does not check the error code and therefore it won't indicate if the enumeration has been stopped for a reason other than the end of the list has been reached. Please test, Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
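A caller that wants to know whether the list is complete can check errno itself once getpwent returns NULL, which is the step getent is said to skip. The following is a minimal sketch of that pattern, not a patch to getent: EnumResult, enumerate_passwd and describe are hypothetical names, and it assumes the C library leaves errno untouched at the normal end of the list (some libraries may not, so a real tool would have to check its platform).

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>
#include <pwd.h>

// Outcome of one full pass over the passwd database.
struct EnumResult {
  long entries;  // entries returned before getpwent() gave NULL
  int error;     // 0 if the list simply ended, otherwise the errno value
};

// Walk the passwd database like getent does, but keep the error code.
EnumResult enumerate_passwd() {
  EnumResult res = {0, 0};
  setpwent();
  for (;;) {
    errno = 0;  // clear before every call so a stale value is not misread
    const struct passwd *pw = getpwent();
    if (pw == nullptr) {
      res.error = errno;  // nonzero: enumeration stopped early
      break;
    }
    ++res.entries;
  }
  endpwent();
  return res;
}

// Turn the result into a message the user can act on.
std::string describe(const EnumResult &res) {
  assert(res.entries >= 0);
  if (res.error == 0)
    return "complete: " + std::to_string(res.entries) + " entries";
  return "truncated after " + std::to_string(res.entries) +
         " entries: " + std::strerror(res.error);
}
```

Printing describe(enumerate_passwd()) to stderr after the listing would be enough to tell an exhaustive list from a truncated one.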
* Re: timeout in LDAP access 2014-06-25 10:15 ` Corinna Vinschen @ 2014-06-25 20:44 ` Denis Excoffier 2014-06-25 21:14 ` Corinna Vinschen 0 siblings, 1 reply; 33+ messages in thread From: Denis Excoffier @ 2014-06-25 20:44 UTC (permalink / raw) To: cygwin On 2014-06-25 12:15, Corinna Vinschen wrote: >> Stay tuned. I'm rewriting the LDAP access code to perform all critical >> LDAP calls in interruptible threads. The Windows LDAP calls don't >> provide any kind of synchronization, only timeouts. I hoped to get away >> with short timeouts but it seems I hoped in vain. >> >> So the next iteration of this code will not use any timeout other than >> the default LDAP network timeout of 2 minutes, but the calls will be >> interruptible by signals. >> > > No more artificial timeouts, but the LDAP calls will be interruptible by > a signal now. > > Also, if an error occurs during ad enumeration, getpwent/getgrent will > return NULL with errno set accordingly. > > Please test, I did. Again, i instrumented ldap.cc by replacing all debug_printf() calls with system_printf() because my /usr/bin/strace does not work. Again, i tested with ‘getent passwd > result’ and 'db_enum: all’. I got the following message: [ldap_init] getent 6024 cyg_ldap::connect_non_ssl: ldap_bind(xxxxxx.zzz) 0x51 and getent stops after the 376000 users in my own domain. No timeout occurred but the enumeration was stopped by LDAP_SERVER_DOWN (0x51) [the xxxxxx.zzz domain name has been edited here but it was completely new to me, never seen before]. Also, there was a large delay (more than 2 min, say at least 8 minutes) between the end of output and the end of getent. I got one single system_printf message (see above). More than that, i added system_printf("starting open in domain %W", domain) immediately at the beginning of cyg_ldap::open, and run ‘getent passwd’ now during one minute (wait 60s, then Control-C). 
I got 1080 ‘starting open in domain (null)’ messages on stderr and 1016 normal passwd entries on stdout. The discrepancy 1016 vs 1080 is ok because stdout was not properly flushed out. It seems that - domain is printed as ‘(null)’? Strange - there are as many open() calls as passwd entries in the output? Also strange - EIO (or equivalent) is produced for LDAP_SERVER_DOWN, it probably should be better if this were not the case? I suppose it will need more testing, but i’m currently unavailable for tests, by the way until Friday 08:00 UTC. Regards, Denis Excoffier. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: timeout in LDAP access 2014-06-25 20:44 ` Denis Excoffier @ 2014-06-25 21:14 ` Corinna Vinschen 2014-07-03 20:57 ` Denis Excoffier 0 siblings, 1 reply; 33+ messages in thread From: Corinna Vinschen @ 2014-06-25 21:14 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 3867 bytes --] On Jun 25 22:44, Denis Excoffier wrote: > On 2014-06-25 12:15, Corinna Vinschen wrote: > >> Stay tuned. I'm rewriting the LDAP access code to perform all critical > >> LDAP calls in interruptible threads. The Windows LDAP calls don't > >> provide any kind of synchronization, only timeouts. I hoped to get away > >> with short timeouts but it seems I hoped in vain. > >> > >> So the next iteration of this code will not use any timeout other than > >> the default LDAP network timeout of 2 minutes, but the calls will be > >> interruptible by signals. > >> > > > > No more artificial timeouts, but the LDAP calls will be interruptible by > > a signal now. > > > > Also, if an error occurs during ad enumeration, getpwent/getgrent will > > return NULL with errno set accordingly. > > > > Please test, > I did. Again, i instrumented ldap.cc by replacing all debug_printf() calls > with system_printf() because my /usr/bin/strace does not work. Again, i > tested with ‘getent passwd > result’ and 'db_enum: all’. > > I got the following message: > [ldap_init] getent 6024 cyg_ldap::connect_non_ssl: ldap_bind(xxxxxx.zzz) 0x51 > and getent stops after the 376000 users in my own domain. No timeout occurred > but the enumeration was stopped by LDAP_SERVER_DOWN (0x51) [the xxxxxx.zzz > domain name has been edited here but it was completely new to me, never seen > before]. You asked for errors being propagated up the chain to the getpwent/getgrent calls and that's exactly what happens now. There are a lot of LDAP error codes. How is Cygwin supposed to handle every one of them? Do we need a list of ignorable and non-ignorable error codes? 
Alternatively this gets reverted and Cygwin does *not* break the search if an error occurs, but instead skips this domain and starts enumerating the next domain, just as before? > Also, there was a large delay (more than 2 min, say at least 8 minutes) between > the end of output and the end of getent. I got one single system_printf > message (see above). I can't observe this. It needs debugging in your environment so I know which part of the source is responsible for this delay under what circumstances. (and I still think it's a crazy idea to enumerate 500K users) > More than that, i added system_printf("starting open in domain %W", domain) > immediately at the beginning of cyg_ldap::open, and run ‘getent passwd’ now during > one minute (wait 60s, then Control-C). I got 1080 ‘starting open in domain (null)’ > messages on stderr and 1016 normal passwd entries on stdout. The discrepancy > 1016 vs 1080 is ok because stdout was not properly flushed out. 60 seconds for 1016 user entries? That sounds incredibly slow. > It seems that > - domain is printed as ‘(null)’? Strange Not at all. This indicates the primary domain. > - there are as many open() calls as passwd entries in the output? The open function is called for every account, but that doesn't mean it really needs opening. That's what the early return is for. The code starts like this:

  int
  cyg_ldap::open (PCWSTR domain)
  {
    int ret = 0;

    /* Already open? */
    if (lh)
      return 0;

    if ((ret = connect (domain)) != NO_ERROR)
      goto err;
    [...]

Did you add the system_printf before the "/* Already open? */" comment, by any chance? > Also strange > - EIO (or equivalent) is produced for LDAP_SERVER_DOWN, it probably should be > better if this were not the case? See above. > I suppose it will need more testing, but i’m currently unavailable for tests, > by the way until Friday 08:00 UTC. No worries. Thanks for pulling this through.
Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: timeout in LDAP access 2014-06-25 21:14 ` Corinna Vinschen @ 2014-07-03 20:57 ` Denis Excoffier 2014-07-07 11:07 ` Corinna Vinschen 0 siblings, 1 reply; 33+ messages in thread From: Denis Excoffier @ 2014-07-03 20:57 UTC (permalink / raw) To: cygwin On 2014-06-25 23:13 Corinna Vinschen wrote: > > You asked for errors being propagated up the chain to the > getpwent/getgrent calls and that's exactly what happens now. There are > a lot of LDAP error codes. How is Cygwin supposed to handle every one > of them? Do we need a list of ignorable and non-ignorable error codes? I don’t know. IMHO: - a server which is down can be ignored (unless explicitly requested) - a timeout, when some output has already been received, must be reported - all servers should be treated independently since they are independent For the time being, i have added LDAP_SERVER_DOWN in map_ldaperr_to_errno at the same place as LDAP_SUCCESS. > >> Also, there was a large delay (more than 2 min, say at least 8 minutes) between >> the end of output and the end of getent. I got one single system_printf >> message (see above). > > I can't observe this. It needs debugging in your environment so I know > which part of the source is responsible for this delay under what > circumstances. I forgot to test it again. I’ll do it soon. > >> More than that, i added system_printf("starting open in domain %W", domain) >> immediately at the beginning of cyg_ldap::open, and run ‘getent passwd’ now during >> one minute (wait 60s, then Control-C). I got 1080 ‘starting open in domain (null)’ >> messages on stderr and 1016 normal passwd entries on stdout. The discrepancy >> 1016 vs 1080 is ok because stdout was not properly flushed out. > > 60 seconds for 1016 user entries? That sounds incredibly slow. I’m pretty sure that this is due to the non-buffering of stderr. In fact, system_printf() is incredibly slow ;-) >> - there are as many open() calls as passwd entries in the output? 
> > The open function is called for every account, but that doesn't mean it > really needs opening. That's what the early return is for. The code > starts like this: > > [...] > > Did you add the system_printf before the "/* Already open? */" comment, > by any chance? You’re right. It was before. Now i have it after and there is only one such message for the primary domain. However, for the non-primary domains the result is the same: i get as many cyg_ldap::open()s as accounts. Even more strange, for all these open’s (except the first one) the domain variable is printed as (null). Perhaps something uncontrolled within pg_ent::enumerate_ad()? Simple suggestion, i was not able to understand the logic there. > > Corinna Denis. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: timeout in LDAP access 2014-07-03 20:57 ` Denis Excoffier @ 2014-07-07 11:07 ` Corinna Vinschen 2014-07-08 19:34 ` Denis Excoffier 0 siblings, 1 reply; 33+ messages in thread From: Corinna Vinschen @ 2014-07-07 11:07 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 3286 bytes --] On Jul 3 22:56, Denis Excoffier wrote: > On 2014-06-25 23:13 Corinna Vinschen wrote: > > > > You asked for errors being propagated up the chain to the > > getpwent/getgrent calls and that's exactly what happens now. There are > > a lot of LDAP error codes. How is Cygwin supposed to handle every one > > of them? Do we need a list of ignorable and non-ignorable error codes? > I don’t know. IMHO: > - a server which is down can be ignored (unless explicitly requested) > - a timeout, when some output has already been received, must be reported > - all servers should be treated independently since they are independent > For the time being, i have added LDAP_SERVER_DOWN in map_ldaperr_to_errno > at the same place as LDAP_SUCCESS. I'm wondering if that's the right thing to do. It feels wrong to convert a valid error to LDAP_SUCCESS. Taking a step back, the only reason to ignore such an error would be, if trying to connect to a domain fails. If this error occurs somewhere in the middle, during enumerating a domain, it's a legit error. I changed pg_ent::enumerate_ad accordingly. > >> More than that, i added system_printf("starting open in domain %W", domain) > >> immediately at the beginning of cyg_ldap::open, and run ‘getent passwd’ now during > >> one minute (wait 60s, then Control-C). I got 1080 ‘starting open in domain (null)’ > >> messages on stderr and 1016 normal passwd entries on stdout. The discrepancy > >> 1016 vs 1080 is ok because stdout was not properly flushed out. > > > > 60 seconds for 1016 user entries? That sounds incredibly slow. > I’m pretty sure that this is due to the non-buffering > of stderr. 
In fact, system_printf() is incredibly slow ;-) Oh, right. I didn't realize the 60 secs are the time it takes while stracing. No worries here. > > The open function is called for every account, but that doesn't mean it > > really needs opening. That's what the early return is for. The code > > starts like this: > > [...] > > Did you add the system_printf before the "/* Already open? */" comment, > > by any chance? > You’re right. It was before. Now i have it after and there is only one > such message for the primary domain. > > However, for the non-primary domains the result is the same: i get as > many cyg_ldap::open()s as accounts. Even more strange, for all these open’s > (except the first one) the domain variable is printed as (null). Perhaps > something uncontrolled within pg_ent::enumerate_ad()? Simple suggestion, i > was not able to understand the logic there. I can't reproduce this. For enumerating a non-primary domain, I get exactly two calls to cyg_ldap::open which actually do a connect. The first call opens the domain for enumeration. The second call opens the primary domain (NULL) to fetch the POSIX offset value for the foreign domain (see my document explaining the POSIX offset stuff), unless the application or one of its parent processes already fetched the POSIX offset for this domain. I don't observe any further calls to connect in this scenario. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
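The rule described in the message above (an unreachable server is tolerated only while connecting to a domain, and is a real error once enumeration is under way) might look like this when written as a small decision function. It is a sketch, not the actual change to pg_ent::enumerate_ad: Phase, Action, decide and to_errno are hypothetical names, and only the numeric codes 0x51 and 0x55 are taken from the thread.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// LDAP result codes as quoted in the thread.
const uint32_t kLdapSuccess = 0x00;
const uint32_t kLdapServerDown = 0x51;
const uint32_t kLdapTimeout = 0x55;

enum class Phase { Connect, Enumerate };        // where the error occurred
enum class Action { Continue, SkipDomain, Fail };

// Decide what the enumeration should do with one LDAP result code.
Action decide(uint32_t ldap_err, Phase phase) {
  if (ldap_err == kLdapSuccess)
    return Action::Continue;
  // A domain whose server cannot be reached at all is skipped quietly.
  if (phase == Phase::Connect && ldap_err == kLdapServerDown)
    return Action::SkipDomain;
  // Anything else, including a timeout in mid-enumeration, is reported.
  return Action::Fail;
}

// Map the decision to the errno value getpwent/getgrent would expose.
int to_errno(Action a) {
  switch (a) {
    case Action::Continue:
    case Action::SkipDomain:
      return 0;
    case Action::Fail:
      return EIO;
  }
  assert(false && "unreachable: all Action values handled above");
  return EIO;
}
```

Keeping the phase as an explicit argument avoids treating LDAP_SERVER_DOWN as success everywhere, which was the objection to mapping it next to LDAP_SUCCESS.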
* Re: timeout in LDAP access 2014-07-07 11:07 ` Corinna Vinschen @ 2014-07-08 19:34 ` Denis Excoffier 2014-07-09 10:13 ` Corinna Vinschen 0 siblings, 1 reply; 33+ messages in thread From: Denis Excoffier @ 2014-07-08 19:34 UTC (permalink / raw) To: cygwin On 2014-07-07 13:07, Corinna Vinschen wrote: > > For enumerating a non-primary domain, I get exactly two calls to > cyg_ldap::open which actually do a connect. The first call opens the > domain for enumeration. The second call opens the primary domain (NULL) > to fetch the POSIX offset value for the foreign domain (see my document > explaining the POSIX offset stuff), unless the application or one of > its parent processes already fetched the POSIX offset for this domain. > > I don't observer any further calls to connect in this scenario. > > In your preliminary documentation (your message dated 2014-06-25, please correct "seet" in it), trustPosixOffset is "some arbitrary 32 bit value", ie including 0. In your code (fetch_posix_offset), td->PosixOffset is used to record the value and also (when 0) to record that the value has still not been fetched. I have encountered this case in real life. The domain admins have set the trustPosixOffset of the secondary domain to zero. This value is therefore never recorded and the cldap->open occurs again and again. Hope this helps. Regards, Denis Excoffier. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple ^ permalink raw reply [flat|nested] 33+ messages in thread
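The double use of one field described above (0 meaning both "offset is zero" and "not fetched yet") can be reproduced in a few lines. The sketch below is not the Cygwin code: ZeroSentinelEntry, FlaggedEntry and the fetch callback are hypothetical stand-ins for td->PosixOffset and the LDAP query. It only illustrates the repeated fetch and says nothing about whether an offset of 0 is a sensible configuration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Variant 1: the value 0 doubles as "not fetched yet".
struct ZeroSentinelEntry {
  uint32_t posix_offset = 0;
};

uint32_t offset_zero_sentinel(ZeroSentinelEntry &d,
                              const std::function<uint32_t()> &fetch) {
  assert(fetch && "a fetch callback is required");
  if (d.posix_offset == 0) {   // looks unfetched whenever the real value is 0
    uint32_t v = fetch();      // stands in for the expensive LDAP round trip
    if (v != 0)
      d.posix_offset = v;
  }
  return d.posix_offset;
}

// Variant 2: a separate flag records that the fetch already happened.
struct FlaggedEntry {
  bool fetched = false;
  uint32_t posix_offset = 0;
};

uint32_t offset_flagged(FlaggedEntry &d,
                        const std::function<uint32_t()> &fetch) {
  assert(fetch && "a fetch callback is required");
  if (!d.fetched) {
    d.posix_offset = fetch();
    d.fetched = true;          // remembered even when the value is 0
  }
  return d.posix_offset;
}
```

When the callback returns 0, the first variant calls it on every lookup while the second calls it once, which matches the one-open-per-account behaviour reported for the secondary domain.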
* Re: timeout in LDAP access 2014-07-08 19:34 ` Denis Excoffier @ 2014-07-09 10:13 ` Corinna Vinschen 2014-07-12 13:39 ` Denis Excoffier 0 siblings, 1 reply; 33+ messages in thread From: Corinna Vinschen @ 2014-07-09 10:13 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 1450 bytes --] On Jul 8 21:22, Denis Excoffier wrote: > > On 2014-07-07 13:07, Corinna Vinschen wrote: > > > > > For enumerating a non-primary domain, I get exactly two calls to > > cyg_ldap::open which actually do a connect. The first call opens the > > domain for enumeration. The second call opens the primary domain (NULL) > > to fetch the POSIX offset value for the foreign domain (see my document > > explaining the POSIX offset stuff), unless the application or one of > > its parent processes already fetched the POSIX offset for this domain. > > > > I don't observer any further calls to connect in this scenario. > > > > > In your preliminary documentation (your message dated 2014-06-25, please > correct "seet" in it), trustPosixOffset is "some arbitrary 32 bit value", > ie including 0. > > In your code (fetch_posix_offset), td->PosixOffset is used to record the > value and also (when 0) to record that the value has still not been > fetched. > > I have encountered this case in real life. The domain admins have set > the trustPosixOffset of the secondary domain to zero. This value is therefore > never recorded and the cldap->open occurs again and again. Ouch. Why on earth are admins doing this? There's no way to workaround this reliably. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: timeout in LDAP access 2014-07-09 10:13 ` Corinna Vinschen @ 2014-07-12 13:39 ` Denis Excoffier 2014-07-14 9:51 ` Corinna Vinschen 0 siblings, 1 reply; 33+ messages in thread From: Denis Excoffier @ 2014-07-12 13:39 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 778 bytes --] On 2014-07-09 12:12 Corinna Vinschen wrote: >> >> I have encountered this case in real life. The domain admins have set >> the trustPosixOffset of the secondary domain to zero. This value is therefore >> never recorded and the cldap->open occurs again and again. > > Ouch. Why on earth are admins doing this? There's no way to > workaround this reliably. > Reliably i don’t know. I’ve modified uinfo.cc in order that the special value for td->PosixOffset is no longer 0. Taking into account that LDAP_SERVER_DOWN is now recognized, my ‘getent passwd’ executes gracefully in 40 minutes (instead of 60) and ‘getent group’ in 25 minutes (instead of 90). Also quicker is ‘mkpasswd -d secondary_domain’ of course. Patch attached. Regards, Denis Excoffier.

[-- Attachment #2: posixoffset.patch --]
[-- Type: application/octet-stream, Size: 1877 bytes --]

diff -uNr cygwin-snapshot-20140709-1-o/winsup/cygwin/uinfo.cc cygwin-snapshot-20140709-1-p/winsup/cygwin/uinfo.cc
--- cygwin-snapshot-20140709-1-o/winsup/cygwin/uinfo.cc 2014-07-09 14:10:50.000000000 +0200
+++ cygwin-snapshot-20140709-1-p/winsup/cygwin/uinfo.cc 2014-07-11 13:16:07.671916100 +0200
@@ -35,6 +35,8 @@
 #include "ldap.h"
 #include "cygserver_pwdgrp.h"
 
+#define CYG_LDAP_IMPROBABLE_POSIXOFFSET 1111111111 /* 0 would be too probable */
+
 /* Initialize the part of cygheap_user that does not depend on files.
    The information is used in shared.cc for the user shared.
    Final initialization occurs in uinfo_init */
@@ -853,8 +855,9 @@
           tdom[idx].DomainSid = cmalloc_abort(HEAP_BUF, len);
           RtlCopySid (len, tdom[idx].DomainSid, td[idx].DomainSid);
         }
-      /* ...and set PosixOffset to 0. This */
-      tdom[idx].PosixOffset = 0;
+      /* ...and set PosixOffset to CYG_LDAP_IMPROBABLE_POSIXOFFSET to mean
+         that the offset is still to be fetched */
+      tdom[idx].PosixOffset = CYG_LDAP_IMPROBABLE_POSIXOFFSET;
     }
   NetApiBufferFree (td);
   tdom_count = tdom_cnt;
@@ -1139,7 +1142,7 @@
 {
   uint32_t id_val;
 
-  if (!td->PosixOffset && !(td->Flags & DS_DOMAIN_PRIMARY) && td->DomainSid)
+  if (td->PosixOffset == CYG_LDAP_IMPROBABLE_POSIXOFFSET && !(td->Flags & DS_DOMAIN_PRIMARY) && td->DomainSid)
     {
       if (cldap->open (NULL) != NO_ERROR)
         {
@@ -1151,13 +1154,14 @@
         }
       else
         id_val = cldap->fetch_posix_offset_for_domain (td->DnsDomainName);
-      if (id_val)
+      if (id_val != CYG_LDAP_IMPROBABLE_POSIXOFFSET)
         {
           td->PosixOffset = id_val;
           if (id_val < cygheap->dom.lowest_tdo_posix_offset)
             cygheap->dom.lowest_tdo_posix_offset = id_val;
         }
+      debug_printf ("computing PosixOffset for domain %W, found %u", td->DnsDomainName, td->PosixOffset);
     }
   return td->PosixOffset;
 }

[-- Attachment #3: Type: text/plain, Size: 218 bytes --] -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: timeout in LDAP access 2014-07-12 13:39 ` Denis Excoffier @ 2014-07-14 9:51 ` Corinna Vinschen 2014-07-14 13:48 ` Corinna Vinschen 0 siblings, 1 reply; 33+ messages in thread From: Corinna Vinschen @ 2014-07-14 9:51 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 1904 bytes --] On Jul 12 15:39, Denis Excoffier wrote: > On 2014-07-09 12:12 Corinna Vinschen wrote: > >> > >> I have encountered this case in real life. The domain admins have set > >> the trustPosixOffset of the secondary domain to zero. This value is therefore > >> never recorded and the cldap->open occurs again and again. > > > > Ouch. Why on earth are admins doing this? There's no way to > > workaround this reliably. > > > Reliably i don’t know. I’ve modified uinfo.cc in order that the special value > for td->PosixOffset is no longer 0. Taking into account that LDAP_SERVER_DOWN > is now recognized, my ‘getent passwd’ executes gracefully in 40 minutes > (instead of 60) and ‘getent group’ in 25 minutes (instead of 90). Also quicker > is ‘mkpasswd -d secondary_domain’ of course. Patch attached. That won't work. It works around your immediate problem by defining a non-0 start value, no doubt about that, but it doesn't fix the underlying problem. A POSIX offset of 0 is bad. If other trusted domains have no functional POSIX offset value, but are set to 0 instead, they won't have different UID values for accounts of different domains. Two users from different domains, both with RID 1000 will both have UID 1000 in Cygwin. Also, the lower UID numbers are reserved for special accounts. There is no guarantee that there won't be a collision at some point of the 32 bit UID spectrum, but a POSIX offset of 0 will almost guarantee the collision. There are two ways to work around that. - The better solution is to inform your IT of the problem. - The not so good one is to enhance /etc/nsswitch.conf to allow defining POSIX offsets for domains independent of the AD setting. 
Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
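The collision argument in the message above can be made concrete with a minimal sketch. It assumes only that the UID of a trusted-domain account is the domain's POSIX offset plus the account's RID; `uid_for` and `collides` are hypothetical helper names for illustration, not Cygwin functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not a Cygwin function: UID of an account from a
// trusted domain, taken as the domain's POSIX offset plus the account's RID.
inline std::uint32_t uid_for(std::uint32_t posix_offset, std::uint32_t rid)
{
    // The sum has to stay inside the 32 bit UID range.
    assert(rid <= UINT32_MAX - posix_offset);
    return posix_offset + rid;
}

// True if the same RID in two different domains ends up with the same UID.
inline bool collides(std::uint32_t offset_a, std::uint32_t offset_b,
                     std::uint32_t rid)
{
    return uid_for(offset_a, rid) == uid_for(offset_b, rid);
}
```

With both domains left at offset 0, `collides(0, 0, 1000)` is true and both users get UID 1000, which also sits in the low range reserved for special accounts. With distinct offsets (for example 0x200000 and 0x300000, arbitrary sample values) the same RID maps to different UIDs.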
* Re: timeout in LDAP access 2014-07-14 9:51 ` Corinna Vinschen @ 2014-07-14 13:48 ` Corinna Vinschen 2014-07-15 16:29 ` Denis Excoffier 0 siblings, 1 reply; 33+ messages in thread From: Corinna Vinschen @ 2014-07-14 13:48 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 2413 bytes --] On Jul 14 11:51, Corinna Vinschen wrote: > On Jul 12 15:39, Denis Excoffier wrote: > > On 2014-07-09 12:12 Corinna Vinschen wrote: > > >> > > >> I have encountered this case in real life. The domain admins have set > > >> the trustPosixOffset of the secondary domain to zero. This value is therefore > > >> never recorded and the cldap->open occurs again and again. > > > > > > Ouch. Why on earth are admins doing this? There's no way to > > > workaround this reliably. > > > > > Reliably i don’t know. I’ve modified uinfo.cc in order that the special value > > for td->PosixOffset is no longer 0. Taking into account that LDAP_SERVER_DOWN > > is now recognized, my ‘getent passwd’ executes gracefully in 40 minutes > > (instead of 60) and ‘getent group’ in 25 minutes (instead of 90). Also quicker > > is ‘mkpasswd -d secondary_domain’ of course. Patch attached. > > That won't work. It works around your immediate problem by defining > a non-0 start value, no doubt about that, but it doesn't fix the > underlying problem. > > A POSIX offset of 0 is bad. If other trusted domains have no functional > POSIX offset value, but are set to 0 instead, they won't have different > UID values for accounts of different domains. Two users from different > domains, both with RID 1000 will both have UID 1000 in Cygwin. Also, > the lower UID numbers are reserved for special accounts. > > There is no guarantee that there won't be a collision at some point of > the 32 bit UID spectrum, but a POSIX offset of 0 will almost guarantee > the collision. > > There are two ways to workaround that. > > - The better solution is to inform your IT of the problem. 
> > - The not so well one is to enhance /etc/nsswitch.conf to allow to > define POSIX offsets for domains indepedent of the AD setting. I tried a third solution for the time being, which is, generating the fake POSIX offset a bit differently. Fake offsets are a bit dangerous in that there's no guarantee that you get a stable mapping between SID and UID/GID, but it's *hopefully* a border situation we're trying to work around. Please give the latest developer snapshot from http://cygwin.com/snapshots/ a try. Thanks, Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: timeout in LDAP access 2014-07-14 13:48 ` Corinna Vinschen @ 2014-07-15 16:29 ` Denis Excoffier 2014-07-15 18:20 ` Andrey Repin 2014-07-16 13:52 ` Corinna Vinschen 0 siblings, 2 replies; 33+ messages in thread From: Denis Excoffier @ 2014-07-15 16:29 UTC (permalink / raw) To: cygwin On 2014-07-14 15:48 Corinna Vinschen wrote: > On Jul 14 11:51, Corinna Vinschen wrote: >> On Jul 12 15:39, Denis Excoffier wrote: >>> On 2014-07-09 12:12 Corinna Vinschen wrote: >>>>> >>>>> I have encountered this case in real life. The domain admins have set >>>>> the trustPosixOffset of the secondary domain to zero. This value is therefore >>>>> never recorded and the cldap->open occurs again and again. >>>> >>>> Ouch. Why on earth are admins doing this? There's no way to >>>> workaround this reliably. >>>> >>> Reliably i don’t know. I’ve modified uinfo.cc in order that the special value >>> for td->PosixOffset is no longer 0. Taking into account that LDAP_SERVER_DOWN >>> is now recognized, my ‘getent passwd’ executes gracefully in 40 minutes >>> (instead of 60) and ‘getent group’ in 25 minutes (instead of 90). Also quicker >>> is ‘mkpasswd -d secondary_domain’ of course. Patch attached. >> >> That won't work. It works around your immediate problem by defining >> a non-0 start value, no doubt about that, but it doesn't fix the >> underlying problem. >> >> A POSIX offset of 0 is bad. If other trusted domains have no functional >> POSIX offset value, but are set to 0 instead, they won't have different >> UID values for accounts of different domains. Two users from different >> domains, both with RID 1000 will both have UID 1000 in Cygwin. Also, >> the lower UID numbers are reserved for special accounts. >> >> There is no guarantee that there won't be a collision at some point of >> the 32 bit UID spectrum, but a POSIX offset of 0 will almost guarantee >> the collision. >> >> There are two ways to workaround that. >> >> - The better solution is to inform your IT of the problem. 
>> >> - The not so well one is to enhance /etc/nsswitch.conf to allow to >> define POSIX offsets for domains indepedent of the AD setting. > > I tried the third solution for the time being, which is, generating the > fake POSIX offset a bit differently. Fake offsets are a bit dangerous > in that there's no guarantee that you get a stable mapping between SID > and UID/GID, but it's *hopefully* a border situation we're trying to > workaround. Please give the latest developer snashot from > http://cygwin.com/snapshots/ a try. Tried and it works as expected. However there is a design bug. Suppose you have a SID from a non-primary domain (with PosixOffset=0). When you enumerate, you get a PosixOffset that takes into account the previously encountered secondary domains with PosixOffset=0, say you get UNIX_POSIX_OFFSET-3*0x00800000 But you can also jump directly to the non-primary domain of this SID, eg by ‘getent passwd SID’. In this case you get UNIX_POSIX_OFFSET-0x00800000. In fact, real code is a little bit more complex, but you get the point: ‘getent passwd’ and ‘getent passwd SID’ will not give the same UID for a given SID, the AD remaining unmodified. Independently, i’m still not sure we have to workaround IT "madness" at all. First, IT people might set PosixOffset to 1 for each domain and you cannot catch this kind of alternate madness. Also, be sure that if some user someday suffers from a duplicate UID situation, this will be reported to them and hopefully addressed (or not because this might be expected), but most probably for a single domain. We have to live with PosixOffset=0. Yet, under the assumption that PosixOffsets are not modified by Cygwin, previous uinfo.cc (snapshot dated 20140709) is not so efficient when PosixOffset=0 (eg too many connect’s), and i think my patch makes a better Cygwin than with no patch. Probably it can also be improved to remove the special value. Regards, Denis Excoffier. 
-- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple ^ permalink raw reply [flat|nested] 33+ messages in thread
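The order dependence described in the message above can be illustrated with a small sketch. It is not the snapshot's code (which, as noted, is more complex). It assumes that each newly encountered zero-offset domain simply receives the next fake offset, counted down from `UNIX_POSIX_OFFSET` in steps of 0x00800000. The value 0xff000000 for `UNIX_POSIX_OFFSET`, the class name `FakeOffsetAllocator` and the domain names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Assumed value, used only for illustration.
constexpr std::uint32_t UNIX_POSIX_OFFSET = 0xff000000u;
// Step between consecutive fake offsets, as quoted in the message.
constexpr std::uint32_t FAKE_STEP = 0x00800000u;

// Hypothetical per-process-tree allocator: the n-th zero-offset domain that
// is encountered gets UNIX_POSIX_OFFSET - n * FAKE_STEP.
class FakeOffsetAllocator
{
public:
    std::uint32_t offset_for(const std::string &domain)
    {
        auto it = assigned_.find(domain);
        if (it != assigned_.end())
            return it->second;          // already assigned earlier
        const std::uint32_t n =
            static_cast<std::uint32_t>(assigned_.size()) + 1u;
        assert(n <= UNIX_POSIX_OFFSET / FAKE_STEP);  // no underflow
        const std::uint32_t offset = UNIX_POSIX_OFFSET - n * FAKE_STEP;
        assigned_.emplace(domain, offset);
        return offset;
    }

private:
    std::map<std::string, std::uint32_t> assigned_;
};
```

A process tree that enumerates domains A, B and C in that order assigns C the third fake offset, while a process tree that looks up an account in C directly assigns it the first one. The same SID therefore maps to two different UIDs although nothing changed in AD.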
* Re: timeout in LDAP access 2014-07-15 16:29 ` Denis Excoffier @ 2014-07-15 18:20 ` Andrey Repin 2014-07-16 13:52 ` Corinna Vinschen 1 sibling, 0 replies; 33+ messages in thread From: Andrey Repin @ 2014-07-15 18:20 UTC (permalink / raw) To: Denis Excoffier, cygwin Greetings, Denis Excoffier! >>> A POSIX offset of 0 is bad. If other trusted domains have no functional >>> POSIX offset value, but are set to 0 instead, they won't have different >>> UID values for accounts of different domains. Two users from different >>> domains, both with RID 1000 will both have UID 1000 in Cygwin. Also, >>> the lower UID numbers are reserved for special accounts. >>> >>> There is no guarantee that there won't be a collision at some point of >>> the 32 bit UID spectrum, but a POSIX offset of 0 will almost guarantee >>> the collision. > Independently, i’m still not sure we have to workaround IT "madness" at all. First, IT > people might set PosixOffset to 1 for each domain and you cannot catch this kind > of alternate madness. Also, be sure that if some user someday suffers from a duplicate > UID situation, this will be reported to them and hopefully addressed (or not because > this might be expected), but most probably for a single domain. We have to live with > PosixOffset=0. I'd say, setting up your AD with zero offset is as bad, as using 192.168.0.1/24 network (or any other well known range) for VPN connections. I don't think this is a situation that should be attempted to fix from client side. What we really need here is a comprehensive explanation of the issue and a suggested way to remedy it at the root. -- WBR, Andrey Repin (anrdaemon@yandex.ru) 15.07.2014, <22:08> Sorry for my terrible english... ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: timeout in LDAP access 2014-07-15 16:29 ` Denis Excoffier 2014-07-15 18:20 ` Andrey Repin @ 2014-07-16 13:52 ` Corinna Vinschen 2014-07-17 6:33 ` Denis Excoffier 1 sibling, 1 reply; 33+ messages in thread From: Corinna Vinschen @ 2014-07-16 13:52 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 5053 bytes --] On Jul 15 18:29, Denis Excoffier wrote: > On 2014-07-14 15:48 Corinna Vinschen wrote: > > On Jul 14 11:51, Corinna Vinschen wrote: > >> On Jul 12 15:39, Denis Excoffier wrote: > >>> On 2014-07-09 12:12 Corinna Vinschen wrote: > >>>>> > >>>>> I have encountered this case in real life. The domain admins have set > >>>>> the trustPosixOffset of the secondary domain to zero. This value is therefore > >>>>> never recorded and the cldap->open occurs again and again. > >>>> > >>>> Ouch. Why on earth are admins doing this? There's no way to > >>>> workaround this reliably. > >>>> > >>> Reliably i don’t know. I’ve modified uinfo.cc in order that the special value > >>> for td->PosixOffset is no longer 0. Taking into account that LDAP_SERVER_DOWN > >>> is now recognized, my ‘getent passwd’ executes gracefully in 40 minutes > >>> (instead of 60) and ‘getent group’ in 25 minutes (instead of 90). Also quicker > >>> is ‘mkpasswd -d secondary_domain’ of course. Patch attached. > >> > >> That won't work. It works around your immediate problem by defining > >> a non-0 start value, no doubt about that, but it doesn't fix the > >> underlying problem. > >> > >> A POSIX offset of 0 is bad. If other trusted domains have no functional > >> POSIX offset value, but are set to 0 instead, they won't have different > >> UID values for accounts of different domains. Two users from different > >> domains, both with RID 1000 will both have UID 1000 in Cygwin. Also, > >> the lower UID numbers are reserved for special accounts. 
> >> > >> There is no guarantee that there won't be a collision at some point of > >> the 32 bit UID spectrum, but a POSIX offset of 0 will almost guarantee > >> the collision. > >> > >> There are two ways to workaround that. > >> > >> - The better solution is to inform your IT of the problem. > >> > >> - The not so well one is to enhance /etc/nsswitch.conf to allow to > >> define POSIX offsets for domains indepedent of the AD setting. > > > > I tried the third solution for the time being, which is, generating the > > fake POSIX offset a bit differently. Fake offsets are a bit dangerous > > in that there's no guarantee that you get a stable mapping between SID > > and UID/GID, but it's *hopefully* a border situation we're trying to > > workaround. Please give the latest developer snashot from > > http://cygwin.com/snapshots/ a try. > Tried and it works as expected. However there is a design bug. Suppose you > have a SID from a non-primary domain (with PosixOffset=0). When you enumerate, > you get a PosixOffset that takes into account the previously encountered > secondary domains with PosixOffset=0, say you get UNIX_POSIX_OFFSET-3*0x00800000 That was, actually, not a design bug but a deliberate decision. In some way we have to work with accounts from a badly defined domain, but for those getent isn't the problem. You don't need to enumerate all domains except in very rare cases. What should work, though, is to ls -l files and see the correct owner of a file and to chmod the files. For everything else I would opt for kicking your IT. Keep in mind that AD chooses more or less sane POSIX offsets for trusted domains by default. Setting it to 0 is an entirely gratuitous act by the admin. A service desk ticket might be helpful. > Independently, i’m still not sure we have to workaround IT "madness" at all. First, IT > people might set PosixOffset to 1 for each domain and you cannot catch this kind > of alternate madness. 
Also, be sure that if some user someday suffers from a duplicate Yes, you can. IT has to know there's software running which needs sane POSIX offset settings. Alternatively we can still implement some other workaround at some point. It occurred to me that there's another way to do that. The problem you're mentioning above could be alleviated if the first Cygwin process in a process tree fetches all POSIX offsets of all trusted domains right at the start, rather than fetching the POSIX offsets only on demand by whatever process needs it. This would slow down the startup of the first process slightly (one LDAP request per trusted domain, but only asking your primary DC), but this would have two advantages: - After fetching all POSIX offsets, we could filter out all POSIX offsets which don't make sense. These would be set using the fake offset setting mechanism. "No sense" would include offsets < 0x110000 or offsets > 0xff000000. If the first process in the tree - The UID/GID values would be stable throughout the process tree. - The UID/GID values would be stable systemwide when utilizing cygserver. That's a bit of work, but Cygwin 1.7.31 will still come without this AD integration code anyway, so we still have time to turn everything upside down. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: timeout in LDAP access 2014-07-16 13:52 ` Corinna Vinschen @ 2014-07-17 6:33 ` Denis Excoffier 2014-07-18 19:18 ` Corinna Vinschen 0 siblings, 1 reply; 33+ messages in thread From: Denis Excoffier @ 2014-07-17 6:33 UTC (permalink / raw) To: cygwin On 2014-07-16 15:51, Corinna Vinschen wrote: > It occured to me that there's another way to do that. The problem > you're mentioning above could be alleviated if the first Cygwin process > in a process tree fetches all POSIX offsets of all trusted domains right > at the start, rather than fetching the POSIX offsets only on demand by > whatever process needs it. This would slow down the startup of the > first process slightly (one LDAP request per trusted domain, but only > asking your primary DC), but this would have two advantages: > > - After fetching all POSIX offsets, we could filter out all POSIX > offsets which don't make sense. These would be set using the fake > offset setting mechanism. "No sense" would include offsets < 0x110000 > or offsets > 0xff000000. If the first process in the tree > > - The UID/GID values would be stable throughout the process tree. > > - The UID/GID values would be stable systemwide when utilizing cygserver. > > That's a bit of work, but Cygwin 1.7.31 will still come without this > AD integration code anyway, so we still have time to turn everything > upside down. I buy this of course, but i’m still not convinced that we have to workaround. After all, since i don’t care the other domains in my daily work, i’m not affected at all. Most of the users will never be affected i suppose. And if Cygwin happens to circumvent a null posixOffset by providing its own, there will be even less chances for collisions and for collisions being reported. 
But we can consider the other way and for that i will use a comparison: using special characters (like ‘\n’) gratuitously in the middle of filenames is usually considered as a bad practice, but always possible by doing ‘char *filename = "a\nb"; fopen(filename, "w")’. Now, once this file is created, you can use ‘ls’ in the folder. Do you think ‘ls' should respect user decision and display the raw \n in its output or try to workaround by using some substitution character (like ‘?’) in order not to wrap at unexpected locations? The answer is that ‘ls’ substitutes by default, but also provides a full group of related options to change this behavior (--quoting-style=WORD, --hide-control-chars). Of course, adding options (eg in nsswitch.conf) to orientate the assignment of posixOffsets to various substitutes would be useless. Even assigning the null posixOffsets to non-null values, i’m not convinced of. Denis Excoffier. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: timeout in LDAP access 2014-07-17 6:33 ` Denis Excoffier @ 2014-07-18 19:18 ` Corinna Vinschen 2014-07-28 9:21 ` Corinna Vinschen 0 siblings, 1 reply; 33+ messages in thread From: Corinna Vinschen @ 2014-07-18 19:18 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 3463 bytes --] On Jul 17 08:33, Denis Excoffier wrote: > On 2014-07-16 15:51, Corinna Vinschen wrote: > > It occured to me that there's another way to do that. The problem > > you're mentioning above could be alleviated if the first Cygwin process > > in a process tree fetches all POSIX offsets of all trusted domains right > > at the start, rather than fetching the POSIX offsets only on demand by > > whatever process needs it. This would slow down the startup of the > > first process slightly (one LDAP request per trusted domain, but only > > asking your primary DC), but this would have two advantages: > > > > - After fetching all POSIX offsets, we could filter out all POSIX > > offsets which don't make sense. These would be set using the fake > > offset setting mechanism. "No sense" would include offsets < 0x110000 > > or offsets > 0xff000000. If the first process in the tree > > > > - The UID/GID values would be stable throughout the process tree. > > > > - The UID/GID values would be stable systemwide when utilizing cygserver. > > > > That's a bit of work, but Cygwin 1.7.31 will still come without this > > AD integration code anyway, so we still have time to turn everything > > upside down. > I buy this of course, but i’m still not convinced that we have to > workaround. After all, since i don’t care the other domains in my daily > work, i’m not affected at all. Most of the users will never be affected > i suppose. And if Cygwin happens to circumvent a null posixOffset by > providing its own, there will be even less chances for collisions and > for collisions being reported. 
> > But we can consider the other way and for that i will use a comparison: > using special characters (like ‘\n’) gratuitously in the middle of filenames > is usually considered as a bad practice, but always possible by > doing ‘char *filename = "a\nb"; fopen(filename, "w")’. Now, once this > file is created, you can use ‘ls’ in the folder. Do you think ‘ls' > should respect user decision and display the raw \n in its output or > try to workaround by using some substitution character (like ‘?’) in order > not to wrap at unexpected locations? The answer is that ‘ls’ substitutes > by default, but also provides a full group of related options to change this > behavior (--quoting-style=WORD, --hide-control-chars). > > Of course, adding options (eg in nsswitch.conf) to orientate the assignment > of posixOffsets to various substitutes would be useless. Even assigning > the null posixOffsets to non-null values, i’m not convinced of. We really should do that to avoid collisions with system accounts, IMHO. But maybe we should handle it as a border case of a border case, and reliably. Rather than using the default fake mechanism, what if we use default offsets for the two cases:
Case 1: posix offset is < 0x100000 ==> Enforce posix offset 0xfe80000
Case 2: posix offset can't be fetched (this points to a local user having no access to this kind of domain information) ==> Enforce posix offset 0xfe000000.
This would result in potential collisions in very rare border cases, but it would result in reliable mappings throughout all processes. And, the complexity would be quite small. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
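The two-case rule proposed in the message above can be sketched as a small pure function. This is only an illustration of the decision logic, not the eventual patch. `FallbackOffsets`, `effective_offset` and the way a failed fetch is signalled (a null pointer) are hypothetical, and the replacement offsets are passed in as parameters rather than hard-coded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the two enforced replacement offsets.
struct FallbackOffsets
{
    std::uint32_t for_too_small;    // case 1: AD reports an offset < 0x100000
    std::uint32_t for_unfetchable;  // case 2: the offset cannot be fetched
};

// Threshold from the proposal: anything below is treated as unusable.
constexpr std::uint32_t MIN_SANE_OFFSET = 0x100000u;

// 'fetched' points to the trustPosixOffset read from AD, or is nullptr if
// the lookup failed (an assumed convention for this sketch).
inline std::uint32_t effective_offset(const std::uint32_t *fetched,
                                      const FallbackOffsets &fallback)
{
    if (fetched == nullptr)
        return fallback.for_unfetchable;   // case 2
    if (*fetched < MIN_SANE_OFFSET)
        return fallback.for_too_small;     // case 1
    return *fetched;                       // sane value, keep it
}
```

Because the result depends only on the fetched value and two constants, every process computes the same offset for a given domain regardless of lookup order. The price, as stated in the message, is that several badly configured domains share one replacement offset and can collide with each other.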
* Re: timeout in LDAP access 2014-07-18 19:18 ` Corinna Vinschen @ 2014-07-28 9:21 ` Corinna Vinschen 2014-07-28 18:51 ` Denis Excoffier 0 siblings, 1 reply; 33+ messages in thread From: Corinna Vinschen @ 2014-07-28 9:21 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 3710 bytes --] Ping? On Jul 18 21:18, Corinna Vinschen wrote: > On Jul 17 08:33, Denis Excoffier wrote: > > On 2014-07-16 15:51, Corinna Vinschen wrote: > > > It occured to me that there's another way to do that. The problem > > > you're mentioning above could be alleviated if the first Cygwin process > > > in a process tree fetches all POSIX offsets of all trusted domains right > > > at the start, rather than fetching the POSIX offsets only on demand by > > > whatever process needs it. This would slow down the startup of the > > > first process slightly (one LDAP request per trusted domain, but only > > > asking your primary DC), but this would have two advantages: > > > > > > - After fetching all POSIX offsets, we could filter out all POSIX > > > offsets which don't make sense. These would be set using the fake > > > offset setting mechanism. "No sense" would include offsets < 0x110000 > > > or offsets > 0xff000000. If the first process in the tree > > > > > > - The UID/GID values would be stable throughout the process tree. > > > > > > - The UID/GID values would be stable systemwide when utilizing cygserver. > > > > > > That's a bit of work, but Cygwin 1.7.31 will still come without this > > > AD integration code anyway, so we still have time to turn everything > > > upside down. > > I buy this of course, but i’m still not convinced that we have to > > workaround. After all, since i don’t care the other domains in my daily > > work, i’m not affected at all. Most of the users will never be affected > > i suppose. 
And if Cygwin happens to circumvent a null posixOffset by > > providing its own, there will be even less chances for collisions and > > for collisions being reported. > > > > But we can consider the other way and for that i will use a comparison: > > using special characters (like ‘\n’) gratuitously in the middle of filenames > > is usually considered as a bad practice, but always possible by > > doing ‘char *filename = "a\nb"; fopen(filename, "w")’. Now, once this > > file is created, you can use ‘ls’ in the folder. Do you think ‘ls' > > should respect user decision and display the raw \n in its output or > > try to workaround by using some substitution character (like ‘?’) in order > > not to wrap at unexpected locations? The answer is that ‘ls’ substitutes > > by default, but also provides a full group of related options to change this > > behavior (--quoting-style=WORD, --hide-control-chars). > > > > Of course, adding options (eg in nsswitch.conf) to orientate the assignment > > of posixOffsets to various substitutes would be useless. Even assigning > > the null posixOffsets to non-null values, i’m not convinced of. > > We really should do that to avoid collisions with system accounts, IMHO. > > But maybe we should handle it as a border case of a border case, and > reliably. Rather than using the default fake mechanism, what if > we use default offsets for the two cases: > > Case 1: posix offset is < 0x100000 ==> Enforce posix 0ffset 0xfe80000 > Case 2: posix offset can't be fetched (this points to a local user > having no access to this kind of domain information) > ==> Enforce posix offset 0xfe000000. > > This would result in potential collisions in very rare border cases, > but it would result in reliable mappings throught all processes. > And, the complexity would be quite small. any feedback on this one? Shall I create a snapshot with a matching patch? 
Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: timeout in LDAP access 2014-07-28 9:21 ` Corinna Vinschen @ 2014-07-28 18:51 ` Denis Excoffier 2014-07-29 9:07 ` Please test AD integration changes, documentation attached (was Re: timeout in LDAP access) Corinna Vinschen 0 siblings, 1 reply; 33+ messages in thread From: Denis Excoffier @ 2014-07-28 18:51 UTC (permalink / raw) To: cygwin On 2014-07-28 11:21, Corinna Vinschen wrote: > Ping? > > On Jul 18 21:18, Corinna Vinschen wrote: >> >> We really should do that to avoid collisions with system accounts, IMHO. >> >> But maybe we should handle it as a border case of a border case, and >> reliably. Rather than using the default fake mechanism, what if >> we use default offsets for the two cases: >> >> Case 1: posix offset is < 0x100000 ==> Enforce posix 0ffset 0xfe80000 >> Case 2: posix offset can't be fetched (this points to a local user >> having no access to this kind of domain information) >> ==> Enforce posix offset 0xfe000000. >> >> This would result in potential collisions in very rare border cases, >> but it would result in reliable mappings throught all processes. >> And, the complexity would be quite small. > > any feedback on this one? Shall I create a snapshot with a matching > patch? I have nothing to add except that i am a great fan of cygwin snapshots in general, and i suppose that if several posix offsets are set to 0, it is a minor problem if all of them get replaced by the same 0xfe80000. Regards, Denis Excoffier. -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple ^ permalink raw reply [flat|nested] 33+ messages in thread
* Please test AD integration changes, documentation attached (was Re: timeout in LDAP access) 2014-07-28 18:51 ` Denis Excoffier @ 2014-07-29 9:07 ` Corinna Vinschen 0 siblings, 0 replies; 33+ messages in thread From: Corinna Vinschen @ 2014-07-29 9:07 UTC (permalink / raw) To: cygwin [-- Attachment #1.1: Type: text/plain, Size: 1880 bytes --] On Jul 28 20:51, Denis Excoffier wrote: > On 2014-07-28 11:21, Corinna Vinschen wrote: > > Ping? > > > > On Jul 18 21:18, Corinna Vinschen wrote: > >> > >> We really should do that to avoid collisions with system accounts, IMHO. > >> > >> But maybe we should handle it as a border case of a border case, and > >> reliably. Rather than using the default fake mechanism, what if > >> we use default offsets for the two cases: > >> > >> Case 1: posix offset is < 0x100000 ==> Enforce posix 0ffset 0xfe80000 > >> Case 2: posix offset can't be fetched (this points to a local user > >> having no access to this kind of domain information) > >> ==> Enforce posix offset 0xfe000000. > >> > >> This would result in potential collisions in very rare border cases, > >> but it would result in reliable mappings throught all processes. > >> And, the complexity would be quite small. > > > > any feedback on this one? Shall I create a snapshot with a matching > > patch? > I have nothing to add except that i am a great fan of cygwin snapshots in > general, and i suppose that if several posix offsets are set to 0, it is > a minor problem if all of them get replaced by the same 0xfe80000. I chose even more unlikely offsets 0xfea00000 and 0xfe500000, but otherwise, the 2014-07-29 snapshot I just uploaded implements the above method, see http://cygwin.com/snapshots/ I also described this briefly in my preliminary documentation (attached), which, I fear, it's time to merge into the official docs. I'm inclined to create a new official Cygwin version with the AD integration changes pretty soon. 
There seem to be no other way to get more feedback on these changes :| Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat [-- Attachment #1.2: pwdgrp-doc --] [-- Type: text/plain, Size: 32391 bytes --] ======= History ======= For as long as Cygwin has existed, it has stored user and group information in /etc/passwd and /etc/group files. Under the assumption that these files would never be too large, the first process in a process tree, as well as every execing process within the tree would parse them into structures in memory. Thus every Cygwin process would contain an expanded copy of the full information from /etc/passwd and /etc/group. This approach has a few downsides. One of them is that the idea to have always small files is flawed. Another one is that reading the entire file is most of the time entirely useless, since most processes only need information on their own user and the primary group. Last but not least, the passwd and group files have to be maintained separately from the already existing Windows user databases, the local SAM and Active Directory. On the other hand, we have to have a mapping between Windows SIDs and POSIX uid/gid values (see [1]), so we rely on some mechanism to convert SIDs to uid/gid values and vice versa. Microsoft "Services for UNIX" (SFU) (which are unfortunately deprecated since Windows 8/Server 2012) never used passwd/group files. Rather, SFU used a fixed, computational mapping between SIDs and POSIX uid/gid. It allows to generate uid/gid values from SIDs and vice versa. The mechanism is documented, albeit in a confusing way and spread over multiple MSDN articles. The Cygwin approach clones the mapping, with just tiny differences for backward compatibility. ================= How does it work? ================= The following description assumes you're comfortable with the concept of Windows SIDs and RIDs. For a brief introduction, please read [1]. 
Cygwin's new mapping between SIDs and uid/gid values works in two ways.

- Read /etc/passwd and /etc/group files, like before, mainly for backward
  compatibility.

- If no files are present, or if an entry is missing in the files, ask
  Windows.

At least, that's the default behaviour now. It will be configurable
using a file /etc/nsswitch.conf, which is discussed in a later section.
Let's explore the default for now.

If files are present, they will be scanned on demand as soon as a mapping
from SIDs to uid/gid or account names is required. The new mechanism
will never read the entire file into memory, but only scan for the
requested entry and cache this one in memory[2].

If no entry is found, or no passwd or group file was present, Cygwin
will ask the OS.

Note: If the first process in a Cygwin process tree determines that no
      /etc/passwd or /etc/group file is present, no other process in the
      entire process tree will try to read the files later on.

      This is done for self-preservation. It's rather bad if the uid
      or gid of a user changes during the lifetime of a process tree.

      For the same reason, if you delete the /etc/passwd or /etc/group
      file, this will be ignored. The passwd and group records read
      from the files will persist in memory until either a new
      /etc/passwd or /etc/group file is created, or you exit all
      processes in the current process tree.

      See the note in the section on /etc/nsswitch.conf for some
      comprehensive examples.

So if we've drawn a blank reading the files, we're going to ask the OS.
First thing, we ask the local machine for the SID or the username. The
OS functions LookupAccountSid and LookupAccountName[3] are pretty
intelligent. They have all the stuff built in to ask for any account of
the local machine, the Active Directory domain of the machine, the Global
Catalog of the forest of the domain, as well as any trusted domain of our
forest for the information. One OS call and we're practically done...
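
The default lookup order described so far (scan the file on demand, fall
back to the OS, cache whatever was found) can be sketched roughly as
follows. This is only an illustration in Python, not Cygwin's actual
code: the names scan_passwd, AccountLookup and ask_os are made up for
this sketch, ask_os is a hypothetical stand-in for the
LookupAccountName call, and the simple split on ':' assumes well-formed
passwd lines.

```python
import os


def scan_passwd(path, name):
    """Scan a passwd-style file for one entry instead of loading it all."""
    try:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                fields = line.rstrip("\n").split(":")
                if fields[0] == name:
                    return fields
    except FileNotFoundError:
        # A file deleted later on is simply ignored.
        pass
    return None


class AccountLookup:
    """Files first, then the OS; every hit is cached in memory."""

    def __init__(self, passwd_path, ask_os):
        self.passwd_path = passwd_path
        self.ask_os = ask_os  # hypothetical stand-in for the OS lookup
        self.cache = {}
        # Decided once at startup: if the file is absent now, this
        # lookup object never tries to read it later.
        self.use_file = os.path.exists(passwd_path)

    def getpwnam(self, name):
        if name in self.cache:
            return self.cache[name]
        entry = None
        if self.use_file:
            entry = scan_passwd(self.passwd_path, name)
        if entry is None:
            entry = self.ask_os(name)
        if entry is not None:
            self.cache[name] = entry
        return entry
```

A second getpwnam call for the same name is answered from the cache and
touches neither the file nor the OS, which is the point of caching the
single entry rather than the whole file.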
Except, the calls only return the mapping between SID, account name and
the account's domain. We don't have a mapping to POSIX uid/gid and we're
missing information on the user's home dir and login shell.

Let's discuss the SID<=>uid/gid mapping first. Here's how it works.

- Well-known SIDs in the NT_AUTHORITY domain of the S-1-5-RID type,
  or aliases of the S-1-5-32-RID type are mapped to the uid/gid value
  RID[4]. For an overview of well-known SIDs, see [5]. Examples:

    "SYSTEM"  S-1-5-18      <=>  uid/gid: 18
    "Users"   S-1-5-32-545  <=>  uid/gid: 545

- Other well-known SIDs in the NT_AUTHORITY domain (S-1-5-X-RID):

    S-1-5-X-RID  <=>  uid/gid: 0x1000 * X + RID

  Example:

    "NTLM Authentication"  S-1-5-64-10  <=>  uid/gid: 0x4000A == 262154

- Other well-known SIDs:

    S-1-X-Y  <=>  uid/gid: 0x10000 + 0x100 * X + Y

  Example:

    "LOCAL"          S-1-2-0  <=>  uid/gid: 0x10200 == 66048
    "Creator Group"  S-1-3-1  <=>  uid/gid: 0x10301 == 66305

- Logon SIDs: The own LogonSid is converted to the fixed uid 0xfff == 4095
  and named "CurrentSession". Any other LogonSid is converted to the
  fixed uid 0xffe == 4094 and named "OtherSession".

- Mandatory Labels:

    S-1-16-RID  <=>  uid/gid: 0x60000 + RID

  Example:

    "Medium Mandatory Level"  S-1-16-8192  <=>  uid/gid: 0x62000 == 401408

- Accounts from the local machine's user DB (SAM):

    S-1-5-21-X-Y-Z-RID  <=>  0x30000 + RID

  Example:

    "Administrator"  S-1-5-21-X-Y-Z-500  <=>  uid/gid: 0x301f4 == 197108

- Accounts from the machine's primary domain:

    S-1-5-21-X-Y-Z-RID  <=>  0x100000 + RID

  Example:

    "Domain Users"  S-1-5-21-X-Y-Z-513  <=>  0x100201 == 1049089

- Accounts from a trusted domain of the machine's primary domain:

    S-1-5-21-X-Y-Z-RID  <=>  trustPosixOffset(domain) + RID

  "trustPosixOffset"? This needs a bit of explanation. This value
  exists in Windows domains already since before Active Directory days.
  What happens is this. If you create a domain trust between two
  domains, a trustedDomain entry will be added to both databases. It
  describes how *this* domain trusts the *other* domain.
  One attribute of a trust is a 32 bit value called "trustPosixOffset".
  For each new trust, trustPosixOffset will get some automatic value. In
  recent AD domain implementations, the first trusted domain will get
  trustPosixOffset set to 0x80000000. Following domains will get lower
  values. Unfortunately the domain admins are allowed to set the
  trustPosixOffset value for each trusted domain to some arbitrary 32 bit
  value, no matter what the other trustPosixOffsets are set to, thus
  allowing any kind of collisions between the trustPosixOffsets of
  domains. That's not exactly helpful, but as the user of this value,
  we have to *trust* the domain admins to set trustPosixOffset to
  sensible values, or to keep it at the system chosen values.

  So, for the first (or only) trusted domain of your domain, the
  automatic offset is 0x80000000. An example for a user of that trusted
  domain is:

    S-1-5-21-X-Y-Z-1234  <=>  uid/gid 0x800004d2 == 2147484882

  There's one problem with this approach. Assume you're running in the
  context of a local SAM user on a domain member machine. Local users
  don't have the right to fetch this kind of domain information from the
  DC, they'll get permission denied. In this case Cygwin will fake a
  sensible trustPosixOffset value.

  Another problem arises if the AD administrators chose an unreasonably
  small POSIX offset value. Anything below the hexadecimal value
  0x100000 (the POSIX offset of the primary domain) is bound to produce
  collisions with system accounts as well as local accounts. The right
  thing to do in this case is to notify your administrator of the problem
  and to ask for moving the offset to a more reasonable value. However,
  to reduce the probability for collisions, Cygwin overrides this offset
  with a sensible fixed replacement offset.

- Local accounts from another machine in the network:

  There's no SID<=>uid/gid mapping implemented for this case. The
  problem is, there's no way to generate a bijective mapping.
  There's no central place which keeps an analogue value of the
  trustPosixOffset, and there's the additional problem that the
  LookupAccountSid and LookupAccountName functions cannot resolve the
  SIDs, unless they know the name of the machine this SID comes from.
  And even then the call will probably fail with "Permission denied"
  when trying to ask the machine for its local account.

  SFU just prints the account RID in this case, Cygwin maps the account
  to the fake accounts "Unknown+User"/"Unknown+Group" with uid/gid -1.

Now we have a semi-bijective mapping between SIDs and POSIX uid/gid
values, but, given that we have potentially users and groups in different
domains having the same name, how do we uniquely distinguish them by
name? Well, we can do that by making their names unique in a per-machine
way. Depending on the domain membership of the account, and on whether
the machine is a domain member or not, the user and group names will be
generated using a domain prefix and a separator character between domain
and account name. The default separator character is the plus sign, '+',
as in SFU.

- Well-known SIDs will have the separator character prepended:

    "+SYSTEM", "+LOCAL", "+Medium Mandatory Level", ...

- If the machine is no domain member machine, only local accounts can
  be resolved into names, so for ease of use, just the account names
  are used as Cygwin user/group names:

    "corinna", "bigfoot", "None", ...

- If the machine is a domain member machine, all accounts from the
  primary domain of the machine are mapped to Cygwin names without
  domain prefix:

    "corinna", "bigfoot", "Domain Users", ...

  while accounts from other domains are prepended by their domain:

    "DOMAIN1+corinna", "DOMAIN2+bigfoot", "DOMAIN3+Domain Users", ...

- Local machine accounts of a domain member machine get a Cygwin user
  name the same way as accounts from another domain: The local machine
  name gets prepended:

    "MYMACHINE+corinna", "MYMACHINE+bigfoot", "MYMACHINE+None", ...
- If LookupAccountSid fails, Cygwin checks the accounts against the known
  trusted domains. If the account is from one of the trusted domains, an
  artificial account name is created. It consists of the domain name,
  and a special name created from the account RID:

    "MY_DOM+User(1234)", "MY_DOM+Group(5678)"

  Otherwise we know nothing about this SID, so it will be mapped to the
  fake accounts "Unknown+User"/"Unknown+Group" with uid/gid -1.

=======
Caching
=======

The information fetched from file or the Windows account database is
cached by the process. The cached information is inherited by child
processes. While usually working fine, this has some drawbacks.
Consider a shell calling `id'. `id' fetches all group information from
the current token and caches them. Unfortunately `id' doesn't start any
child processes, so the information is lost as soon as `id' exits.

But there's another caching mechanism available. If cygserver is running
it will provide passwd and group entry caching for all processes in a
Cygwin process tree whose first process was started after cygserver. So,
if you start a Cygwin Terminal and cygserver is running at the time,
mintty, the shell, and all child processes will use cygserver caching.
If you start a Cygwin Terminal and cygserver is not running at the time,
none of the processes started inside this terminal window will use
cygserver caching.

The advantage of cygserver caching is that it's system-wide and, as long
as cygserver is running, unforgetful. Every Cygwin process on the system
will have the cygserver cache at its service. Additionally, all
information requested from cygserver once will be cached inside the
process itself and, again, propagated to child processes.

==========================================
Cygwin user names, home dirs, login shells
==========================================

Obviously, if you don't maintain passwd and group files, you need to have
a way to maintain the other fields of a passwd entry as well.
Three things come to mind:

- You want to use a Cygwin username different from your Windows username.

  Note: This is only supported via /etc/passwd and /etc/group files.
        A Cygwin username maintained in the Windows user databases would
        require very costly (read: slow) search operations.

- You want a home dir different from the default /home/$USER.

- You want to specify a different login shell than /bin/bash.

How this is done depends on your account being a domain account or a
local account. Let's start with the default.

Assume your Windows account name is "bigfoot" and your domain is
"MY_DOM". Your default passwd entry in absence of anything I'll describe
below looks like this:

  bigfoot:*:<uid>:<gid>:U-MY_DOM\bigfoot,S-1-5-....:/home/bigfoot:/bin/bash

or, if your account is from a different domain than the primary domain of
the machine:

  MY_DOM+bigfoot:*:<uid>:<gid>:U-MY_DOM\bigfoot,S-1-5-....:/home/bigfoot:/bin/bash

Yes, the default homedir is still /home/bigfoot.

If your account is a domain account:

  Either create an /etc/passwd and/or /etc/group file with entries for
  your account and use that, just as before.

  Or, Cygwin will utilize the posixAccount/posixGroup attributes per
  RFC 2307[6]. These attributes are by default available in Active
  Directory since Windows Server 2003 R2. They are "not set", unless
  utilized by the (deprecated since Server 2012 R2) Active Directory
  "Server for NIS" feature. The user attributes utilized by Cygwin are:

    unixHomeDirectory   If set, will be used as Cygwin home directory.
    loginShell          If set, will be used as Cygwin login shell.
    gecos               Content will be added to the pw_gecos field.
    uidNumber           See next section.

  The group attributes utilized by Cygwin are:

    gidNumber           See next section.

  Apart from PowerShell scripting or inventing new CLI tools, these
  attributes can be changed using the "Attribute Editor" tab in the user
  properties dialog of the "Active Directory Users and Computers" MMC
  snap-in.
  Alternatively, if the "Server for NIS" administration feature has been
  installed, there will be a "UNIX Attributes" tab which contains the
  required fields, except for the gecos field, which isn't really
  important anyway. Last resort is "ADSI Edit".

  The primary group of a user is always the Windows primary group set in
  Active Directory and can't be changed.

If your machine is not a domain member machine or your account is a local
account for some reason:

  Either create an /etc/passwd and/or /etc/group file with entries for
  your account and use that, just as before.

  Or enter the information into the "Comment" field of your local user
  entry. In the "Local Users and Groups" MMC snap-in it's called
  "Description". You can utilize this field even if you're running a
  "home edition" of Windows, using the command line. The "net user"
  command allows setting all values in the SAM, even if the GUI is
  crippled.

  A Cygwin SAM comment entry looks like this:

    <cygwin key="value" key="value" [...] />

  The supported keys are

    home="value"    Sets the Cygwin home dir to value.

    shell="value"   Sets the Cygwin login shell to value.

    group="value"   Sets the Cygwin primary group of the account to value,
                    provided that the user *is* already a member of that
                    group. This allows overriding the default "None"
                    primary group for local accounts. One nice idea here
                    is, for instance, group="Users".

    unix="value"    Sets the NFS/Samba uid of the user to the decimal
                    value. See the next chapter.

  The <cygwin .../> string can start at any point in the comment, but
  you have to follow the rules:

  - It starts with "<cygwin " and ends with "/>".
  - The "cygwin" string and the key names have to be lowercase.
  - No spaces between key and "value", just the equal sign.
  - The value must be placed within double quotes and it must not contain
    a double quote itself. The double quotes are required for the
    decimal values as well!
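
  To make the rules above concrete, here is a rough Python sketch of a
  parser for such a comment string. It is not Cygwin's parser: the
  function name parse_cygwin_comment is made up for this illustration,
  and silently dropping unknown keys or a non-decimal unix value is an
  assumption of the sketch, not documented behaviour.

```python
import re

# Keys named in the text above; anything else is ignored by this sketch.
SUPPORTED_KEYS = {"home", "shell", "group", "unix"}

# The tag may start anywhere in the comment: "<cygwin " ... "/>".
_TAG = re.compile(r'<cygwin (.*?)/>')
# key="value": lowercase key, no spaces around '=', no '"' inside value.
_PAIR = re.compile(r'(?<!\S)([a-z]+)="([^"]*)"')


def parse_cygwin_comment(comment):
    """Return the recognized key/value pairs of a SAM comment as a dict."""
    match = _TAG.search(comment)
    if match is None:
        return {}
    settings = {}
    for key, value in _PAIR.findall(match.group(1)):
        if key in SUPPORTED_KEYS:
            settings[key] = value
    if "unix" in settings:
        # The uid/gid must be decimal; assumption: otherwise drop it.
        if re.fullmatch(r"[0-9]+", settings["unix"]):
            settings["unix"] = int(settings["unix"])
        else:
            del settings["unix"]
    return settings
```

  For instance, a comment of `Test box <cygwin home="/home/foo"/>` yields
  a dict with the single key home, while `<cygwin home = "/home/foo"/>`
  yields nothing, because of the spaces around the equal sign.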
  CMD example:

    net user corinna /comment:"<cygwin home=\"/home/foo\"/>"

  Bash example (use single quotes):

    net user corinna /comment:'<cygwin home="/home/foo"/>'

  For changing group comments, use the `net localgroup' command. The
  supported key/value pair for groups is

    unix="value"    Sets the NFS/Samba gid of the group to the decimal
                    value. See the next chapter.

===================
NFS account mapping
===================

Microsoft's NFS client does not map the uid/gid values on the NFS shares
to SIDs. There's no such thing as a (fake) security descriptor returned
to the application. Rather, via an undocumented API an application can
fetch RFC 1813 compatible NFSv3 stat information from the share[7]. This
is what Cygwin is using to show stat information for files on NFS shares.

The problem is, while all other information in this stat record, like
timestamps, file size etc., can be used by Cygwin, Cygwin had no way to
map the values of the st_uid and st_gid members to a Windows SID for a
long time. So it just faked the file owner info and claimed that it's
you.

However, SFU has, over time, developed multiple methods to map UNIX
uid/gid values on NFS shares to Windows SIDs. You'll find the full
documentation of the mapping methods in [8].

Cygwin now utilizes the RFC 2307 mapping for this purpose. This is most
of the time provided by an AD domain, but it could also be a standalone
LDAP mapping server. Per RFC 2307, the uid is in the attribute
uidNumber. For groups, the gid is in the gidNumber attribute.

When Cygwin stat's files on an NFS share, it asks the mapping server via
LDAP in two different ways, depending on the role of the mapping server.

- If the server is an AD domain controller, it asks for an account with
  uidNumber attribute == st_uid field of the stat record returned by NFS.
  If an account matches, AD returns the Windows SID, so we have an
  immediate mapping from UNIX uid to a Windows SID, if the user account
  has a valid uidNumber attribute.
  For groups, the method is the same, just that Cygwin asks for a group
  with gidNumber attribute == st_gid field of the stat record.

- If the server is a standalone LDAP mapping server Cygwin asks for the
  same uidNumber/gidNumber attributes, but it can't expect that the LDAP
  server knows anything about Windows SIDs. Rather, the mapping server
  returns the account name. Cygwin then asks the DC for an account with
  this name, and if that succeeds, we have a mapping between UNIX uid/gid
  and Windows SIDs.

The mapping will be cached for the lifetime of the process, and inherited
by child processes.

=====================
Samba account mapping
=====================

A fully set up Samba with domain integration is running winbindd to map
Windows SIDs to artificially created UNIX uids and gids, and this mapping
is transparent within the domain, so Cygwin doesn't have to do anything
special.

However, setting up winbindd isn't for everybody, and it fails to map
Windows accounts to already existing UNIX users or groups. In contrast
to NFS, Samba returns security descriptors, but unmapped UNIX accounts
get special SIDs:

- A UNIX user account with uid X is mapped to the Windows SID S-1-22-1-X.

- A UNIX group account with gid X is mapped to SID S-1-22-2-X.

As you can see, even though we have SIDs, they just reflect the actual
uid/gid values on the UNIX box in the RID value. It's only marginally
different from the NFS method, so why not just use the same method as
for NFS?

That's what Cygwin will do. If it encounters a S-1-22-x-y SID, it will
perform the same RFC 2307 mapping as for NFS shares.

For home users without any Windows domain or LDAP server per RFC 2307,
but with a Linux machine running Samba, just add this information to your
SAM account. Assuming the uid of your Linux user account is 505 and the
gid of your primary group is, say, 100, just add the values to your SAM
user and group accounts.
The following example assumes you didn't already add something else to
the comment field.

To your user's SAM comment (remember: called "Description" in the GUI),
add:

  <cygwin group="Users" unix="505"/>

To the user's group SAM comment add:

  <cygwin unix="100"/>

This should be sufficient to work on your Samba share and to see all
files owned by your Linux user account as your files.

===========================
The /etc/nsswitch.conf file
===========================

Last, but not least, let's talk about the way to configure how the
mapping works on your machine. On Linux and some other UNIXy OSes, we
have a file called /etc/nsswitch.conf[9]. One part of it is to specify
how the passwd and group entries are generated. That's what Cygwin now
provides as well.

The /etc/nsswitch.conf file is optional. If you don't have one, Cygwin
uses sensible defaults.

Note: The /etc/nsswitch.conf file is read exactly once by the first
      process of a Cygwin process tree. If there was no
      /etc/nsswitch.conf file when this first process started, then no
      other process in the running Cygwin process tree will try to read
      the file.

      If you create or change /etc/nsswitch.conf, you need to restart
      all Cygwin processes that need to see the change. If the process
      you want to see the change is a child of another process, you need
      to restart all of that process's parents, too.

      For example, if you run Vim inside the default Cygwin Terminal,
      Vim is a child of your shell, which is a child of mintty.exe. If
      you edit /etc/nsswitch.conf in that Vim instance, your shell won't
      immediately see the change, nor will Vim if you restart it from
      that same shell instance. This is because both are getting their
      nsswitch information from their ancestor, mintty.exe. You need to
      start a fresh terminal window for the change to take effect.
      By contrast, if you leave that Cygwin Terminal window open after
      making the change to /etc/nsswitch.conf, then restart a Cygwin
      service like cron, cron will see the change, because it is not a
      child of mintty.exe or any other Cygwin process. (Technically, it
      is a child of cygrunsrv, but that instance also restarts when you
      restart the service.)

      The reason we point all this out is that the requirements for
      restarting things are not quite as stringent as when you replace
      cygwin1.dll. If you have three process trees, you have three
      independent copies of the nsswitch information. If you start a
      fresh process tree, it will see the changes. As long as any
      process in an existing process tree remains running, all processes
      in that tree will continue to use the old information.

So, what mischief can we perform with /etc/nsswitch.conf? To explain,
let's have a look into an /etc/nsswitch.conf file set up to all default
values:

  # /etc/nsswitch.conf
  passwd:       files db
  group:        files db
  db_prefix:    auto
  db_separator: +
  db_enum:      cache builtin

The first line, starting with a hash '#', is a comment. The hash
character starts a comment, just as in shell scripts. Everything up to
the end of the line is ignored. So this:

  foo: bar # baz

means, for the entry "foo", do "bar", ignore everything after the hash
sign. "baz" is only a comment.

The other lines define the available settings. The first word up to a
colon is a keyword. Note that the colon *must* follow immediately after
the keyword. This is a valid line:

  foo: bar

This is not valid:

  foo : bar

Apart from this restriction, the remainder of the line can have as many
spaces and TABs as you like. This is a valid line:

  foo:      bar      baz

Now let's have a look at the available keywords and settings.

The two lines starting with the keywords "passwd" and "group" define
where Cygwin gets its passwd and group information from. "files" means,
fetch the information from the corresponding file in the /etc directory.
"db" means, fetch the information from the Windows account databases,
the SAM for local accounts, Active Directory for domain accounts.
Examples:

  passwd: files

    Read passwd entries only from /etc/passwd.

  group: db

    Read group entries only from SAM/AD.

  group: files # db

    Read group entries only from /etc/group ("db" is ignored due to the
    preceding hash sign).

  passwd: files db

    Read passwd entries from /etc/passwd. If a user account isn't found,
    try to find it in SAM or AD. This is the default for both, passwd
    and group information.

  group: db files

    This is a valid entry, but the order will be ignored by Cygwin. If
    both, files and db are specified, Cygwin will always try the files
    first, then the db.

The remaining entries define certain aspects of the Windows account
database search. "db_prefix" determines how the Cygwin user or group
name is created:

  db_prefix: auto

    This is the default. If your account is from the primary domain of
    your machine, or if your machine is a standalone machine (not a
    domain member), your Cygwin account name is just the Windows account
    name. If your account is from another domain, or if you're logged
    in as local user on a domain machine, the Cygwin username will be
    the combination of Windows domainname and username, with the
    separator char in between:

      MY_DOM+username    (foreign domain)
      MACHINE+username   (local account)

    Builtin accounts have just the separator char prepended:

      +LOCAL
      +Users

    Unknown accounts on NFS or Samba shares (that is, accounts which
    cannot be mapped to Windows user accounts via RFC 2307) get a Cygwin
    account name consisting of the artificial domains "Unix_User" or
    "Unix_Group" and the uid/gid value, for instance:

      Unix_User+0     (root)
      Unix_Group+10   (wheel)

  db_prefix: primary

    Like "auto", but primary domain accounts will be prepended by the
    domainname as well.
  db_prefix: always

    All accounts, even the builtin accounts, will have the domain name
    prepended:

      BUILTIN+Users

"db_separator" defines the separator char used to prepend the domain
name to the user or group name. The default is '+':

  MY_DOM+username

With "db_separator", you can define any ASCII char except space, tab,
carriage return, line feed, and the colon, as separator char. Example:

  db_separator: \

  MY_DOM\username

"db_enum" defines the depth of a database search, if an application
calls one of the enumeration functions getpwent[10] or getgrent[11]. The
problem with these functions is, they neither allow defining how many
entries will be enumerated when calling them in a loop, nor do they allow
adding some filter criteria. They were designed back in the days when
only /etc/passwd and /etc/group files existed and the number of user
accounts on a typical UNIX system was seldom a three-digit number.

These days, with user and group databases sometimes going in the
six-digit range, they are a potential burden. For that reason, Cygwin
does not enumerate all user or group accounts by default, but rather just
a very small list, consisting only of the accounts cached in memory by
the current process, as well as a handful of predefined builtin accounts.

"db_enum" allows specifying the accounts to enumerate in a fine-grained
way. It takes a list of sources as argument:

  db_enum: source1 source2 ...

The recognized sources are the following:

  none          No output from getpwent/getgrent at all.
  all           The opposite. Enumerates accounts from all known
                sources, including all trusted domains.
  cache         Enumerate all accounts currently cached in memory.
  builtin       Enumerate the predefined builtin accounts for backward
                compatibility. These are five passwd accounts (SYSTEM,
                LocalService, NetworkService, Administrators,
                TrustedInstaller) and two group accounts (SYSTEM and
                TrustedInstaller).
  files         Enumerate the accounts from /etc/passwd or /etc/group.
  local         Enumerate all accounts from the local SAM.
  primary       Enumerate all accounts from the primary domain.
  alltrusted    Enumerate all accounts from all trusted domains.
  some.domain   Enumerate all accounts from the trusted domain
                some.domain. The trusted domain can be given as Netbios
                flat name (MY_DOMAIN) or as dns domain name
                (my_domain.corp). In contrast to the aforementioned
                fixed source keywords, distinct domain names are
                case-insensitive. Only domains which are actually
                trusted domains are enumerated. Unknown domains are
                simply ignored.

Please note that getpwent/getgrent do *not* test if an account was
already listed from another source, so an account can easily show up
twice or three times. Such a test would be rather tricky, and the Linux
implementation doesn't perform one either.

Here are a few examples for /etc/nsswitch.conf:

  db_enum: none

    No output from getpwent/getgrent at all. The first call to the
    function immediately returns a NULL pointer.

  db_enum: cache files

    Enumerate all accounts cached by the current process, plus all
    entries from either the /etc/passwd or /etc/group file.

  db_enum: cache local primary

    Enumerate all accounts cached by the current process, all accounts
    from the SAM of the local machine, and all accounts from the primary
    domain of the machine.

  db_enum: local primary alltrusted

    Enumerate the accounts from the machine's SAM, from the primary
    domain of the machine, and from all trusted domains.

  db_enum: primary domain1.corp sub.domain.corp domain2.net

    Enumerate the accounts from the primary domain and from the domains
    domain1.corp, sub.domain.corp and domain2.net.

  db_enum: all

    Enumerate everything and the kitchen sink.

==========
Footnotes:
==========

[1] http://cygwin.com/cygwin-ug-net/ntsec.html

[2] This may change. Right now the file is read in 32K chunks, but we
    could easily read the file in 64K chunks and, if we find the file is
    < 64K anyway, just cache the entire bunch, like before. Not
    implemented yet, but something to keep in mind.

[3] http://msdn.microsoft.com/en-us/library/windows/desktop/aa379166%28v=vs.85%29.aspx
    http://msdn.microsoft.com/en-us/library/windows/desktop/aa379159%28v=vs.85%29.aspx

[4] This is where Cygwin differs from SFU. The reason is that we need
    the old uid/gid values for backward compatibility. There are Cygwin
    packages (cron, for instance) which rely on the fact that the uid of
    SYSTEM is 18. In SFU, these accounts get mapped like the other
    built-in SIDs.

[5] http://support.microsoft.com/kb/243330

[6] https://tools.ietf.org/html/rfc2307

[7] https://tools.ietf.org/html/rfc1813

[8] http://msdn.microsoft.com/en-us/library/cc980032.aspx

[9] http://linux.die.net/man/5/nsswitch.conf

[10] http://linux.die.net/man/3/getpwent

[11] http://linux.die.net/man/3/getgrent

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]