public inbox for overseers@sourceware.org
* Re: Mail is way behind
  2004-05-20 21:12 Mail is way behind Ian Lance Taylor
@ 2004-05-20 18:31 ` Christopher Faylor
  2004-05-20 21:14   ` Christopher Faylor
  2004-05-20 22:05 ` Matthew Galgoci
  1 sibling, 1 reply; 29+ messages in thread
From: Christopher Faylor @ 2004-05-20 18:31 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: overseers

On Thu, May 20, 2004 at 12:52:50PM -0400, Ian Lance Taylor wrote:
>Mail on sourceware is getting pretty far behind.

FWIW, I've been noticing that my connection to sourceware goes dead for
a few seconds at a time and then restarts.

cgf

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Mail is way behind
@ 2004-05-20 21:12 Ian Lance Taylor
  2004-05-20 18:31 ` Christopher Faylor
  2004-05-20 22:05 ` Matthew Galgoci
  0 siblings, 2 replies; 29+ messages in thread
From: Ian Lance Taylor @ 2004-05-20 21:12 UTC (permalink / raw)
  To: overseers

Mail on sourceware is getting pretty far behind.

==> 14:00 <==
Number of unsent notes:  41 , number of deliveries:  5278
 
==> 14:15 <==
Number of unsent notes:  43 , number of deliveries:  5204
 
==> 14:30 <==
Number of unsent notes:  43 , number of deliveries:  4902
 
==> 14:45 <==
Number of unsent notes:  33 , number of deliveries:  5834
 
==> 15:00 <==
Number of unsent notes:  44 , number of deliveries:  3311
 
==> 15:15 <==
Number of unsent notes:  47 , number of deliveries:  4972
 
==> 15:30 <==
Number of unsent notes:  51 , number of deliveries:  7421
 
==> 15:45 <==
Number of unsent notes:  49 , number of deliveries:  9926
 
==> 16:00 <==
Number of unsent notes:  49 , number of deliveries:  12641
 
==> 16:15 <==
Number of unsent notes:  75 , number of deliveries:  13490
 
==> 16:30 <==
Number of unsent notes:  93 , number of deliveries:  16805
 
==> 16:45 <==
Number of unsent notes:  97 , number of deliveries:  16302

Looking at the log files, qmail seems to be going through waves of
failures to make an SMTP connection.  At this point I don't know why.
Are there any transient networking problems happening?

Ian
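
The snapshots above can be read as a small time series. What follows is a minimal Python sketch, assuming the quoted lines are saved as plain text in exactly the `==> HH:MM <==` / `Number of unsent notes: ...` layout shown in the message. The name `parse_snapshots` is hypothetical; it is not part of qmail or of any script on sourceware.

```python
import re

# One snapshot is a "==> HH:MM <==" header followed by the counter line.
SNAPSHOT_RE = re.compile(
    r"==> (\d{2}:\d{2}) <==\s*\n"
    r"Number of unsent notes:\s*(\d+)\s*, number of deliveries:\s*(\d+)"
)

def parse_snapshots(text):
    """Return a list of (time, unsent_notes, deliveries) tuples."""
    return [(t, int(n), int(d)) for t, n, d in SNAPSHOT_RE.findall(text)]

# First and last snapshots copied from the message above.
SAMPLE = """\
==> 14:00 <==
Number of unsent notes:  41 , number of deliveries:  5278

==> 16:45 <==
Number of unsent notes:  97 , number of deliveries:  16302
"""

if __name__ == "__main__":
    rows = parse_snapshots(SAMPLE)
    first, last = rows[0], rows[-1]
    print(f"{first[0]} -> {last[0]}: pending deliveries grew by {last[2] - first[2]}")
```

Comparing the first and last rows gives the growth of the backlog over the window without reading the counters by eye.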


* Re: Mail is way behind
  2004-05-20 18:31 ` Christopher Faylor
@ 2004-05-20 21:14   ` Christopher Faylor
  0 siblings, 0 replies; 29+ messages in thread
From: Christopher Faylor @ 2004-05-20 21:14 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: overseers

On Thu, May 20, 2004 at 12:52:50PM -0400, Ian Lance Taylor wrote:
>Mail on sourceware is getting pretty far behind.

FWIW, I've been noticing that my connection to sourceware goes dead for
a few seconds at a time and then restarts.

cgf


* Re: Mail is way behind
  2004-05-20 21:12 Mail is way behind Ian Lance Taylor
  2004-05-20 18:31 ` Christopher Faylor
@ 2004-05-20 22:05 ` Matthew Galgoci
  2004-05-21  0:57   ` Frank Ch. Eigler
  1 sibling, 1 reply; 29+ messages in thread
From: Matthew Galgoci @ 2004-05-20 22:05 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: overseers

> Looking at the log files, qmail seems to be going through waves of
> failures to make an SMTP connection.  At this point I don't know why.
> Are there any transient networking problems happening?

There seems to be transient networking weirdness going on. We are investigating
and will let you all know when the situation changes.

-- 
Matthew Galgoci
System Administrator and Sr. Manager of Ruminants
Red Hat, Inc
919.754.3700 x44155


* Re: Mail is way behind
  2004-05-20 22:05 ` Matthew Galgoci
@ 2004-05-21  0:57   ` Frank Ch. Eigler
  2004-05-21  2:07     ` Ian Lance Taylor
  0 siblings, 1 reply; 29+ messages in thread
From: Frank Ch. Eigler @ 2004-05-21  0:57 UTC (permalink / raw)
  To: Sourceware Overseers

Hi -

mgalgoci wrote:

> There seems to be transient networking weirdness going on. [...]

In the mean time, I made some system tweaks, which appear to be
helping with the email backlog:

- setting /etc/resolv.conf back to using 127.0.0.1 as a preferred server
- configuring qmail for 250 concurrent remote deliveries and a shorter timeout

- FChE
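
The first tweak depends on the resolver trying `nameserver` entries in the order they are listed in /etc/resolv.conf. A minimal Python sketch for checking that the local cache is listed first follows. The function names are hypothetical and the sample file contents are invented for illustration; they are not sourceware's actual configuration.

```python
def nameservers(resolv_conf_text):
    """Return nameserver addresses in the order they are listed."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # resolv.conf comments start with '#' or ';'.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers

def prefers_local_cache(resolv_conf_text):
    """True if 127.0.0.1 is the first nameserver listed."""
    servers = nameservers(resolv_conf_text)
    return bool(servers) and servers[0] == "127.0.0.1"

# Invented example contents (192.0.2.53 is a documentation-only address).
EXAMPLE = """\
# local caching resolver first
nameserver 127.0.0.1
nameserver 192.0.2.53
"""

if __name__ == "__main__":
    print(nameservers(EXAMPLE), prefers_local_cache(EXAMPLE))
```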


* Re: Mail is way behind
  2004-05-21  0:57   ` Frank Ch. Eigler
@ 2004-05-21  2:07     ` Ian Lance Taylor
  2004-05-21  3:29       ` Christopher Faylor
  0 siblings, 1 reply; 29+ messages in thread
From: Ian Lance Taylor @ 2004-05-21  2:07 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Sourceware Overseers

"Frank Ch. Eigler" <fche@redhat.com> writes:

> - setting /etc/resolv.conf back to using 127.0.0.1 as a preferred server

Yikes, that should never have changed.  Glad you noticed.

> - configuring qmail for 250 concurrent remote deliveries and a shorter timeout

Sounds good as long as the load stays under control, which it
certainly is so far.

Ian


* Re: Mail is way behind
  2004-05-21  2:07     ` Ian Lance Taylor
@ 2004-05-21  3:29       ` Christopher Faylor
  2004-05-21  3:46         ` Matthew Galgoci
  0 siblings, 1 reply; 29+ messages in thread
From: Christopher Faylor @ 2004-05-21  3:29 UTC (permalink / raw)
  To: overseers, fche

On Thu, May 20, 2004 at 08:57:23PM -0400, Ian Lance Taylor wrote:
>"Frank Ch. Eigler" <fche@redhat.com> writes:
>>- setting /etc/resolv.conf back to using 127.0.0.1 as a preferred
>>server
>
>Yikes, that should never have changed.  Glad you noticed.

I assume it was set when the system moved, for some reason.

cgf


* Re: Mail is way behind
  2004-05-21  3:29       ` Christopher Faylor
@ 2004-05-21  3:46         ` Matthew Galgoci
  2004-05-21  4:08           ` Ian Lance Taylor
  0 siblings, 1 reply; 29+ messages in thread
From: Matthew Galgoci @ 2004-05-21  3:46 UTC (permalink / raw)
  To: Christopher Faylor; +Cc: overseers, fche

On Thu, 20 May 2004, Christopher Faylor wrote:

> On Thu, May 20, 2004 at 08:57:23PM -0400, Ian Lance Taylor wrote:
> >"Frank Ch. Eigler" <fche@redhat.com> writes:
> >>- setting /etc/resolv.conf back to using 127.0.0.1 as a preferred
> >>server
> >
> >Yikes, that should never have changed.  Glad you noticed.
> 
> I assume it was set when the system moved, for some reason.

I probably did it at one point to get minimal networking online in runlevel
1. :\

-- 
Matthew Galgoci
System Administrator and Sr. Manager of Ruminants
Red Hat, Inc
919.754.3700 x44155


* Re: Mail is way behind
  2004-05-21  3:46         ` Matthew Galgoci
@ 2004-05-21  4:08           ` Ian Lance Taylor
  2004-05-21  4:19             ` Jonathan Larmour
                               ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Ian Lance Taylor @ 2004-05-21  4:08 UTC (permalink / raw)
  To: Matthew Galgoci; +Cc: overseers

We are still having networking problems, by the way.  Lots of e-mail
connections fail to open.  By looking at qmail-remote, I observed,
among other things, several attempts to deliver to dberlin.org.  When
I do
    telnet dberlin.org smtp
it opens right away and displays
    220 dberlin.org ESMTP CommuniGate Pro 4.1.4

I then type
    helo sourceware.org
and it hangs.

This does not happen from my home system.  It opens right away, and I
get a response:
    250 dberlin.org your name is not sourceware.org
or if I use my real host name, I get:
    250 dberlin.org is pleased to meet you
Both responses come right away.

So why is the connection from sourceware.org hanging after the helo
command?

One thought was that perhaps the new IP address is in some spam
blacklist and dberlin.org is doing tarpitting, but I can't find any
evidence of that.

Another thought was that perhaps the IP address does not reliably
reverse resolve, but I ran a dnstrace and everything looked fine.

Another thought was that there is some very odd firewall between
sourceware.org and dberlin.org.  I don't know how to check for that.

Other than that, I have no idea.  Doing a ping to dberlin.org works
just fine:
    51 packets transmitted, 51 received, 0% packet loss, time 50475ms
    rtt min/avg/max/mdev = 19.925/21.720/42.577/3.016 ms, pipe 2

There are 425 undelivered e-mail messages queued up to dberlin.org.

Does anybody have a clue here?

Ian
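
The manual telnet test above can be scripted so that the stage at which the dialogue stalls is recorded rather than observed by hand. This is a minimal Python sketch, assuming single-line replies to the greeting and to HELO. The function name `smtp_probe` and the stage labels are hypothetical, chosen only for this illustration.

```python
import socket

def smtp_probe(sock, helo_name, timeout=10.0):
    """Read the greeting and send HELO on an already-connected socket.

    Returns (stage, text), where stage is one of "ok", "closed",
    "banner-timeout" or "helo-timeout", and text is the last line received.
    """
    sock.settimeout(timeout)
    reader = sock.makefile("rb")
    try:
        banner = reader.readline()          # expect "220 ..."
    except socket.timeout:
        return ("banner-timeout", "")
    if not banner:
        return ("closed", "")
    banner_text = banner.decode("ascii", "replace").strip()

    sock.sendall(f"HELO {helo_name}\r\n".encode("ascii"))
    try:
        reply = reader.readline()           # expect "250 ..."
    except socket.timeout:
        return ("helo-timeout", banner_text)
    if not reply:
        return ("closed", banner_text)
    return ("ok", reply.decode("ascii", "replace").strip())
```

To use it against a real server, open the connection with `socket.create_connection((host, 25), timeout=10)` and pass the resulting socket in. A "helo-timeout" result would correspond to the dberlin.org symptom described above: the greeting arrives, but nothing comes back after HELO.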


* Re: Mail is way behind
  2004-05-21  4:08           ` Ian Lance Taylor
@ 2004-05-21  4:19             ` Jonathan Larmour
  2004-05-21  4:28               ` Ian Lance Taylor
  2004-05-21  4:30               ` Andrew Pinski
  2004-05-21  4:20             ` Christopher Faylor
  2004-05-21  4:52             ` Frank Ch. Eigler
  2 siblings, 2 replies; 29+ messages in thread
From: Jonathan Larmour @ 2004-05-21  4:19 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Matthew Galgoci, overseers

Ian Lance Taylor wrote:
> We are still having networking problems, by the way.  Lots of e-mail
> connections fail to open.  By looking at qmail-remote, I observed,
> among other things, several attempts to deliver to dberlin.org.  When
> I do
>     telnet dberlin.org smtp
> it opens right away and displays
>     220 dberlin.org ESMTP CommuniGate Pro 4.1.4
> 
> I then type
>     helo sourceware.org
> and it hangs.
[snip]
> Another thought was that perhaps the IP address does not reliably
> reverse resolve, but I ran a dnstrace and everything looked fine.

If I was to hazard a guess, it's Daniel's machine that's not doing the 
reverse lookup. Perhaps he hard-coded the IP address in some way and since 
the IP addr change it's broken things?

I would have thought the easiest way is to ask him, but not via sourceware 
:-). dberlin (AT) dberlin.org

Jifl
-- 
eCosCentric    http://www.eCosCentric.com/    The eCos and RedBoot experts
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine


* Re: Mail is way behind
  2004-05-21  4:08           ` Ian Lance Taylor
  2004-05-21  4:19             ` Jonathan Larmour
@ 2004-05-21  4:20             ` Christopher Faylor
  2004-05-21  4:52             ` Frank Ch. Eigler
  2 siblings, 0 replies; 29+ messages in thread
From: Christopher Faylor @ 2004-05-21  4:20 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Matthew Galgoci, overseers

On Thu, May 20, 2004 at 11:43:23PM -0400, Ian Lance Taylor wrote:
>We are still having networking problems, by the way.  Lots of e-mail
>connections fail to open.  By looking at qmail-remote, I observed,
>among other things, several attempts to deliver to dberlin.org.  When
>I do
>    telnet dberlin.org smtp
>it opens right away and displays
>    220 dberlin.org ESMTP CommuniGate Pro 4.1.4
>
>I then type
>    helo sourceware.org
>and it hangs.

But ehlo sourceware.org works fine.  That's strange.

I tried turning off iptables to see if that was the culprit but it still
hangs.

cgf


* Re: Mail is way behind
  2004-05-21  4:19             ` Jonathan Larmour
@ 2004-05-21  4:28               ` Ian Lance Taylor
  2004-05-21  4:48                 ` Ian Lance Taylor
  2004-05-21 13:38                 ` Hans-Peter Nilsson
  2004-05-21  4:30               ` Andrew Pinski
  1 sibling, 2 replies; 29+ messages in thread
From: Ian Lance Taylor @ 2004-05-21  4:28 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Matthew Galgoci, overseers

Jonathan Larmour <jifl@eCosCentric.com> writes:

> Ian Lance Taylor wrote:
> > We are still having networking problems, by the way.  Lots of e-mail
> > connections fail to open.  By looking at qmail-remote, I observed,
> > among other things, several attempts to deliver to dberlin.org.  When
> > I do
> >     telnet dberlin.org smtp
> > it opens right away and displays
> >     220 dberlin.org ESMTP CommuniGate Pro 4.1.4
> > I then type
> >     helo sourceware.org
> > and it hangs.
> [snip]
> > Another thought was that perhaps the IP address does not reliably
> > reverse resolve, but I ran a dnstrace and everything looked fine.
> 
> If I was to hazard a guess, it's Daniel's machine that's not doing the
> reverse lookup. Perhaps he hard-coded the IP address in some way and
> since the IP addr change it's broken things?

Hard to see why it would work from my home system, in that case.

> I would have thought the easiest way is to ask him, but not via
> sourceware :-). dberlin (AT) dberlin.org

dberlin.org just happened to be the first failing address which I
tried.  There are other addresses which appear to be failing, such as
axis.se and ecoscentric.com.  Perhaps there is something special about
dberlin.org, but my initial assumption is that it might serve as an
indicator of the larger problem.

Ian


* Re: Mail is way behind
  2004-05-21  4:19             ` Jonathan Larmour
  2004-05-21  4:28               ` Ian Lance Taylor
@ 2004-05-21  4:30               ` Andrew Pinski
  2004-05-21  5:08                 ` Zack Weinberg
  1 sibling, 1 reply; 29+ messages in thread
From: Andrew Pinski @ 2004-05-21  4:30 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Ian Lance Taylor, Matthew Galgoci, overseers

> 
> Ian Lance Taylor wrote:
> > We are still having networking problems, by the way.  Lots of e-mail
> > connections fail to open.  By looking at qmail-remote, I observed,
> > among other things, several attempts to deliver to dberlin.org.  When
> > I do
> >     telnet dberlin.org smtp
> > it opens right away and displays
> >     220 dberlin.org ESMTP CommuniGate Pro 4.1.4
> > 
> > I then type
> >     helo sourceware.org
> > and it hangs.
> [snip]
> > Another thought was that perhaps the IP address does not reliably
> > reverse resolve, but I ran a dnstrace and everything looked fine.
> 
> If I was to hazard a guess, it's Daniel's machine that's not doing the 
> reverse lookup. Perhaps he hard-coded the IP address in some way and since 
> the IP addr change it's broken things?
> 
> I would have thought the easiest way is to ask him, but not via sourceware 
> :-). dberlin (AT) dberlin.org

Note that IIRC Daniel is on vacation and is on a ship so he will not be able
to reply for a while.


Andrew


* Re: Mail is way behind
  2004-05-21  4:28               ` Ian Lance Taylor
@ 2004-05-21  4:48                 ` Ian Lance Taylor
  2004-05-21  6:07                   ` Jonathan Larmour
  2004-05-21 13:38                 ` Hans-Peter Nilsson
  1 sibling, 1 reply; 29+ messages in thread
From: Ian Lance Taylor @ 2004-05-21  4:48 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Matthew Galgoci, overseers

Ian Lance Taylor <ian@airs.com> writes:

> dberlin.org just happened to be the first failing address which I
> tried.  There are other addresses which appear to be failing, such as
> axis.se and ecoscentric.com.  Perhaps there is something special about
> dberlin.org, but my initial assumption is that it might serve as an
> indicator of the larger problem.

Of course then I remember that you are writing from ecoscentric.com.

On sourceware,
    telnet mail.ecoscentric.com smtp
connects, but hangs waiting for a 220 welcome message.  From my home
machine, I quickly get
    220 smtp.ecoscentric.com ESMTP Postfix

So this is a different problem.

There are 60 messages queued for ecoscentric.com.

Ian


* Re: Mail is way behind
  2004-05-21  4:08           ` Ian Lance Taylor
  2004-05-21  4:19             ` Jonathan Larmour
  2004-05-21  4:20             ` Christopher Faylor
@ 2004-05-21  4:52             ` Frank Ch. Eigler
  2 siblings, 0 replies; 29+ messages in thread
From: Frank Ch. Eigler @ 2004-05-21  4:52 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Matthew Galgoci, overseers

Hi -

> [...]
> Another thought was that there is some very odd firewall between
> sourceware.org and dberlin.org.  I don't know how to check for that.
> [...]

I recall that recently some of the RH firewall boxes were erroneously
munging packet checksums.  I don't remember whether it was IP or UDP/TCP.
One might try injecting some bad packets inbound or outbound via some
perl script, and watch with tcpdump how far they make it.

- FChE
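
For reference when comparing captured packets, the IP, TCP and UDP checksums all use the 16-bit one's-complement sum described in RFC 1071. A minimal Python sketch of that sum follows; the function names are hypothetical. Note that the IP header checksum covers only the IP header, while the TCP and UDP checksums also cover a pseudo-header, which this sketch does not build.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum of data, per RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def checksum_ok(block_with_checksum: bytes) -> bool:
    """A block that carries a valid checksum field checksums to zero."""
    return internet_checksum(block_with_checksum) == 0
```

A block whose checksum field was altered in transit, or whose payload was altered without updating the field, fails `checksum_ok`, and the receiving stack drops the packet silently.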


* Re: Mail is way behind
  2004-05-21  4:30               ` Andrew Pinski
@ 2004-05-21  5:08                 ` Zack Weinberg
  0 siblings, 0 replies; 29+ messages in thread
From: Zack Weinberg @ 2004-05-21  5:08 UTC (permalink / raw)
  To: Andrew Pinski
  Cc: Jonathan Larmour, Ian Lance Taylor, Matthew Galgoci, overseers


FYI, I forwarded Ian's original message to Daniel.

zw


* Re: Mail is way behind
  2004-05-21  4:48                 ` Ian Lance Taylor
@ 2004-05-21  6:07                   ` Jonathan Larmour
  2004-05-21  8:28                     ` Jonathan Larmour
                                       ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Jonathan Larmour @ 2004-05-21  6:07 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Matthew Galgoci, overseers

Ian Lance Taylor wrote:
> Ian Lance Taylor <ian@airs.com> writes:
> 
> 
>>dberlin.org just happened to be the first failing address which I
>>tried.  There are other addresses which appear to be failing, such as
>>axis.se and ecoscentric.com.  Perhaps there is something special about
>>dberlin.org, but my initial assumption is that it might serve as an
>>indicator of the larger problem.
> 
> 
> Of course then I remember that you are writing from ecoscentric.com.
> 
> On sourceware,
>     telnet mail.ecoscentric.com smtp
> connects, but hangs waiting for a 220 welcome message.  From my home
> machine, I quickly get
>     220 smtp.ecoscentric.com ESMTP Postfix
> 
> So this is a different problem.

A different symptom, but I would have a suspicion it could be the same problem.

A tcpdump shows that mail.ecoscentric.com keeps trying to send to 
sourceware.org but gets no TCP response. So here is what happened when I 
connected from sourceware via smtp by hand (norbert is 
mail.ecoscentric.com's real name):

05:31:09.788302 sourceware.org.44386 > norbert.ecoscentric.com.smtp: SWE 
2180261257:2180261257(0) win 5840 <mss 1460,sackOK,timestamp 83025491 
0,nop,wscale 0> (DF) [tos 0x10]
05:31:09.788342 norbert.ecoscentric.com.smtp > sourceware.org.44386: S 
2950878486:2950878486(0) ack 2180261258 win 5792 <mss 1460,sackOK,timestamp 
551524080 83025491,nop,wscale 0> (DF)
05:31:09.895787 sourceware.org.44386 > norbert.ecoscentric.com.smtp: . ack 
1 win 5840 <nop,nop,timestamp 83025502 551524080> (DF) [tos 0x10]
05:31:09.896874 norbert.ecoscentric.com.smtp > sourceware.org.44386: P 
1:41(40) ack 1 win 5792 <nop,nop,timestamp 551524091 83025502> (DF)
05:31:10.196796 norbert.ecoscentric.com.smtp > sourceware.org.44386: P 
1:41(40) ack 1 win 5792 <nop,nop,timestamp 551524121 83025502> (DF)
05:31:10.796800 norbert.ecoscentric.com.smtp > sourceware.org.44386: P 
1:41(40) ack 1 win 5792 <nop,nop,timestamp 551524181 83025502> (DF)
05:31:11.996833 norbert.ecoscentric.com.smtp > sourceware.org.44386: P 
1:41(40) ack 1 win 5792 <nop,nop,timestamp 551524301 83025502> (DF)
05:31:14.396874 norbert.ecoscentric.com.smtp > sourceware.org.44386: P 
1:41(40) ack 1 win 5792 <nop,nop,timestamp 551524541 83025502> (DF)
05:31:19.196963 norbert.ecoscentric.com.smtp > sourceware.org.44386: P 
1:41(40) ack 1 win 5792 <nop,nop,timestamp 551525021 83025502> (DF)
[and so on]

netstat shows 40 bytes stuck in the send queue for that socket, which 
matches the above.

Clearly something works sort of because sourceware sends a response to the 
TCP SYN+ACK. netstat -n on sourceware shows empty send and receive queues 
for that socket (but it does say it's ESTABLISHED at least).

It may be informative to see what sourceware thinks it receives by doing a 
"telnet mail.ecoscentric.com smtp" there with a "tcpdump host 
mail.ecoscentric.com". But I suspect it will see nothing. It does sort of 
seem to suggest a firewalling issue, but on the sourceware side (or its 
router). I wonder if the router does stateful filtering but can't handle 
the number of simultaneous connections sourceware has...

Jifl
-- 
eCosCentric    http://www.eCosCentric.com/    The eCos and RedBoot experts
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine
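
The repeated `P 1:41(40)` lines in the capture above are the same 40-byte segment being sent again and again. A small Python sketch, using only the timestamps copied from that capture, shows that the spacing doubles each time, which is what TCP's exponential retransmission backoff looks like when no ACK for the data ever arrives. The helper name `gaps` is hypothetical.

```python
from datetime import datetime

# Timestamps of the first send and the five retransmissions of P 1:41(40).
STAMPS = [
    "05:31:09.896874",
    "05:31:10.196796",
    "05:31:10.796800",
    "05:31:11.996833",
    "05:31:14.396874",
    "05:31:19.196963",
]

def gaps(stamps):
    """Seconds between consecutive timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

if __name__ == "__main__":
    print([round(g, 1) for g in gaps(STAMPS)])
```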


* Re: Mail is way behind
  2004-05-21  6:07                   ` Jonathan Larmour
@ 2004-05-21  8:28                     ` Jonathan Larmour
  2004-05-21 12:30                     ` Ian Lance Taylor
  2004-05-21 14:28                     ` Frank Ch. Eigler
  2 siblings, 0 replies; 29+ messages in thread
From: Jonathan Larmour @ 2004-05-21  8:28 UTC (permalink / raw)
  To: overseers

Jonathan Larmour wrote:
>>
>> Of course then I remember that you are writing from ecoscentric.com.

And in case you're wondering, although my return address is 
@ecoscentric.com, I'm subscribed to the overseers list from a different 
domain, which is why I can see this thread at all :-).

Jifl
-- 
eCosCentric    http://www.eCosCentric.com/    The eCos and RedBoot experts
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine


* Re: Mail is way behind
  2004-05-21  6:07                   ` Jonathan Larmour
  2004-05-21  8:28                     ` Jonathan Larmour
@ 2004-05-21 12:30                     ` Ian Lance Taylor
  2004-05-21 12:58                       ` Angela Marie Thomas
  2004-05-21 16:06                       ` Matthew Galgoci
  2004-05-21 14:28                     ` Frank Ch. Eigler
  2 siblings, 2 replies; 29+ messages in thread
From: Ian Lance Taylor @ 2004-05-21 12:30 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Matthew Galgoci, overseers

Jonathan Larmour <jifl@eCosCentric.com> writes:

> It may be informative to see what sourceware thinks it receives by
> doing a "telnet mail.ecoscentric.com smtp" there with a "tcpdump host
> mail.ecoscentric.com". But I suspect it will see nothing.

Easily done, and your suspicion is correct:

04:57:08.630077 sources.redhat.com.54827 > norbert.ecoscentric.com.smtp: SWE 3844614268:3844614268(0) win 5840 <mss 1460,sackOK,timestamp 83181375 0,nop,wscale 0> (DF) [tos 0x10]
04:57:08.737521 norbert.ecoscentric.com.smtp > sources.redhat.com.54827: S 303508727:303508727(0) ack 3844614269 win 5792 <mss 1460,sackOK,timestamp 551679967 83181375,nop,wscale 0> (DF)
04:57:08.737615 sources.redhat.com.54827 > norbert.ecoscentric.com.smtp: . ack 1 win 5840 <nop,nop,timestamp 83181386 551679967> (DF) [tos 0x10]

Then I exited out of telnet without typing anything, and we see that
the FIN gets through:

04:58:10.521111 sources.redhat.com.54827 > norbert.ecoscentric.com.smtp: F 1:1(0) ack 1 win 5840 <nop,nop,timestamp 83187564 551679967> (DF) [tos 0x10]
04:58:10.628862 norbert.ecoscentric.com.smtp > sources.redhat.com.54827: F 41:41(0) ack 2 win 5792 <nop,nop,timestamp 551686156 83187564> (DF)
04:58:10.628936 sources.redhat.com.54827 > norbert.ecoscentric.com.smtp: R 3844614270:3844614270(0) win 0 (DF)

> It does sort
> of seem to suggest a firewalling issue, but on the sourceware side (or
> its router). I wonder if the router does stateful filtering but can't
> handle the number of simultaneous connections sourceware has...

I don't think it could be something as simple as that, or we wouldn't
see consistent failures for particular sites, while most sites appear
to consistently succeed.

But it does pretty clearly seem to be some sort of misbehaving
firewall.  Matthew, can you tell us anything about the networking
environment around sourceware?

Ian
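
The two captures above can be summarised by listing the TCP flags seen from each endpoint. This is a minimal Python sketch, assuming the one-line-per-packet text format shown in the message. The lines in `CAPTURE` are abbreviated copies of that capture, and the function name `flags_by_sender` is hypothetical.

```python
import re

# "<time> <src> > <dst>: <flags> ..." as printed in the capture above.
LINE_RE = re.compile(r"^(?P<time>\S+) (?P<src>\S+) > (?P<dst>\S+): (?P<flags>\S+)")

def flags_by_sender(capture_text):
    """Map each source endpoint to the list of flag fields it sent, in order."""
    seen = {}
    for line in capture_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            seen.setdefault(m["src"], []).append(m["flags"])
    return seen

CAPTURE = """\
04:57:08.630077 sources.redhat.com.54827 > norbert.ecoscentric.com.smtp: SWE 3844614268:3844614268(0) win 5840
04:57:08.737521 norbert.ecoscentric.com.smtp > sources.redhat.com.54827: S 303508727:303508727(0) ack 3844614269 win 5792
04:57:08.737615 sources.redhat.com.54827 > norbert.ecoscentric.com.smtp: . ack 1 win 5840
04:58:10.521111 sources.redhat.com.54827 > norbert.ecoscentric.com.smtp: F 1:1(0) ack 1 win 5840
04:58:10.628862 norbert.ecoscentric.com.smtp > sources.redhat.com.54827: F 41:41(0) ack 2 win 5792
04:58:10.628936 sources.redhat.com.54827 > norbert.ecoscentric.com.smtp: R 3844614270:3844614270(0) win 0
"""

if __name__ == "__main__":
    for sender, flags in flags_by_sender(CAPTURE).items():
        print(sender, flags)
```

As seen from sourceware, the remote end contributes only `S` and `F`. No `P` (data) segment shows up, even though the remote FIN is numbered 41, meaning the remote side had already sent 40 bytes.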


* Re: Mail is way behind
  2004-05-21 12:30                     ` Ian Lance Taylor
@ 2004-05-21 12:58                       ` Angela Marie Thomas
  2004-05-21 14:33                         ` Ian Lance Taylor
  2004-05-21 16:06                       ` Matthew Galgoci
  1 sibling, 1 reply; 29+ messages in thread
From: Angela Marie Thomas @ 2004-05-21 12:58 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Jonathan Larmour, Matthew Galgoci, overseers


Have we ruled out identd?  I remember a problem in the past with
identd being blocked at the router.

In general it would be nice to know what level of filtering the
RHAT router is doing vs. our iptables.

--Angela


* Re: Mail is way behind
  2004-05-21  4:28               ` Ian Lance Taylor
  2004-05-21  4:48                 ` Ian Lance Taylor
@ 2004-05-21 13:38                 ` Hans-Peter Nilsson
  2004-05-21 15:02                   ` Ian Lance Taylor
  1 sibling, 1 reply; 29+ messages in thread
From: Hans-Peter Nilsson @ 2004-05-21 13:38 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: overseers

On Fri, 21 May 2004, Ian Lance Taylor wrote:
> dberlin.org just happened to be the first failing address which I
> tried.  There are other addresses which appear to be failing, such as
> axis.se and ecoscentric.com.

Could you please give some pointers on the axis.se failures?

(The only issue I know of is to bounce viruses at the SMTP layer
[a 5xx reply to the connecting MTA], which seems to work
properly these days; I thought I had reported the spam level on
gcc-cvs here, but must have pressed "cancel" instead of "send"
or something, cause I couldn't find my message when cgf
mentioned he'd fixed it.)

brgds, H-P


* Re: Mail is way behind
  2004-05-21  6:07                   ` Jonathan Larmour
  2004-05-21  8:28                     ` Jonathan Larmour
  2004-05-21 12:30                     ` Ian Lance Taylor
@ 2004-05-21 14:28                     ` Frank Ch. Eigler
  2 siblings, 0 replies; 29+ messages in thread
From: Frank Ch. Eigler @ 2004-05-21 14:28 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Matthew Galgoci, overseers

Hi -

jifl wrote:

> [...] It does sort of 
> seem to suggest a firewalling issue, but on the sourceware side (or its 
> router). I wonder if the router does stateful filtering but can't handle 
> the number of simultaneous connections sourceware has...

As an experiment, we might turn down ftp / httpd service to many fewer
concurrent clients than now.  (As a coarse figure, sourceware has been
transmitting about 1 MB/s for a while now.)  As another experiment, we
could reboot sourceware, in case it is some kernel problem.

- FChE


* Re: Mail is way behind
  2004-05-21 12:58                       ` Angela Marie Thomas
@ 2004-05-21 14:33                         ` Ian Lance Taylor
  0 siblings, 0 replies; 29+ messages in thread
From: Ian Lance Taylor @ 2004-05-21 14:33 UTC (permalink / raw)
  To: angela; +Cc: Jonathan Larmour, Matthew Galgoci, overseers

Angela Marie Thomas <angela@foam.wonderslug.com> writes:

> Have we ruled out identd?  I remember a problem in the past with
> identd being blocked at the router.

An identd connection appears to be refused immediately, which is what
we want.  Some time ago the router was dropping those packets, which
led to timeouts, but now the connection is just refused, which is
fine.

> In general it would be nice to know what level of filtering the
> RHAT router is doing vs. our iptables.

Yes.

Ian


* Re: Mail is way behind
  2004-05-21 13:38                 ` Hans-Peter Nilsson
@ 2004-05-21 15:02                   ` Ian Lance Taylor
  0 siblings, 0 replies; 29+ messages in thread
From: Ian Lance Taylor @ 2004-05-21 15:02 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: overseers

Hans-Peter Nilsson <hp@bitrange.com> writes:

> On Fri, 21 May 2004, Ian Lance Taylor wrote:
> > dberlin.org just happened to be the first failing address which I
> > tried.  There are other addresses which appear to be failing, such as
> > axis.se and ecoscentric.com.
> 
> Could you please give some pointers on the axis.se failures?

The axis.se case appears to fail in yet another way.  Actually, I
guess none of the cases fail in precisely the same way, although there
is clearly a relationship.  Very odd indeed.

With axis.se, the sequence goes like this:

sourceware> telnet miranda-uunet.se.axis.com smtp
Trying 212.209.10.220...
Connected to miranda-uunet.se.axis.com.
Escape character is '^]'.
220 miranda.se.axis.com ESMTP Sendmail 8.12.9/8.12.9/Debian-5local0.1; Fri, 21 May 2004 15:35:32 +0200; (No UCE/UBE) logging access from: sourceware.org(OK)-sourceware.org [12.107.209.250]
helo sources.redhat.com
250 miranda.se.axis.com Hello sourceware.org [12.107.209.250], pleased to meet you
mail from: <ian@airs.com>
250 2.1.0 <ian@airs.com>... Sender ok
rcpt to: <hp@axis.se>

At this point the connection hangs.

When I try this from my home system, the rcpt to line gets an
immediate response:

250 2.1.5 <hp@axis.se>... Recipient ok

There are 381 e-mail messages queued up for axis.se.

Ian


* Re: Mail is way behind
  2004-05-21 12:30                     ` Ian Lance Taylor
  2004-05-21 12:58                       ` Angela Marie Thomas
@ 2004-05-21 16:06                       ` Matthew Galgoci
  2004-05-22  6:32                         ` Matthew Galgoci
  1 sibling, 1 reply; 29+ messages in thread
From: Matthew Galgoci @ 2004-05-21 16:06 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Jonathan Larmour, overseers

> I don't think it could be something as simple as that, or we wouldn't
> see consistent failures for particular sites, while most sites appear
> to consistently succeed.
> 
> But it does pretty clearly seem to be some sort of misbehaving
> firewall.  Matthew, can you tell us anything about the networking
> environment around sourceware?

As I said before, we are currently experiencing 'weird' network problems that as
of yet, we have no resolution to. Our network engineers are working on the problem,
but be aware that sourceware is not the only thing affected. We are seeing this
weirdness internally. We do currently have a p1 support case open with the network
vendor.

I will update you all when I know more.

-- 
Matthew Galgoci
System Administrator and Sr. Manager of Ruminants
Red Hat, Inc
919.754.3700 x44155


* Re: Mail is way behind
  2004-05-21 16:06                       ` Matthew Galgoci
@ 2004-05-22  6:32                         ` Matthew Galgoci
  2004-05-23 23:37                           ` Jonathan Larmour
  0 siblings, 1 reply; 29+ messages in thread
From: Matthew Galgoci @ 2004-05-22  6:32 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Jonathan Larmour, overseers

> As I said before, we are currently experiencing 'weird' network problems that as
> of yet, we have no resolution to. Our network engineers are working on the problem,
> but be aware that sourceware is not the only thing affected. We are seeing this
> weirdness internally. We do currently have a p1 support case open with the network
> vendor.
> 
> I will update you all when I know more.

Ok, I now know more.

We've tracked the problem that sourceware (and a number of other sites) are seeing
down to what we believe is a failing DS3 network transceiver that is randomly corrupting
packets.

Despite our best efforts, it will probably be Monday, EDT, before the problem is resolved,
though I remain hopeful that we might get a replacement before then. In the mean time, 
networking in and around sourceware.org will be sketchy. It sucks and it is a highly
suboptimal situation, I know.

In the mean time, please be patient and bear with us as we deal with the current situation.

Thanks for understanding,

Matthew Galgoci

-- 
Matthew Galgoci
System Administrator and Sr. Manager of Ruminants
Red Hat, Inc
919.754.3700 x44155

* Re: Mail is way behind
  2004-05-22  6:32                         ` Matthew Galgoci
@ 2004-05-23 23:37                           ` Jonathan Larmour
  2004-05-24  6:05                             ` Angela Marie Thomas
  2004-05-24  6:08                             ` Alexandre Oliva
  0 siblings, 2 replies; 29+ messages in thread
From: Jonathan Larmour @ 2004-05-23 23:37 UTC (permalink / raw)
  To: Matthew Galgoci; +Cc: Ian Lance Taylor, overseers

Matthew Galgoci wrote:
>>As I said before, we are currently experiencing 'weird' network problems that as
>>of yet, we have no resolution to. Our network engineers are working on the problem,
>>but be aware that sourceware is not the only thing affected. We are seeing this
>>weirdness internally. We do currently have a p1 support case open with the network
>>vendor.
>>
>>I will update you all when I know more.
> 
> 
> Ok, I now know more.
> 
> We've tracked the problem that sourceware (and a number of other sites) are seeing
> down to what we believe is a failing ds3 network transceiver that is randomly corrupting
> packets.

Hmm... this doesn't entirely ring true with the symptoms: connections to
smtp.ecoscentric.com had been consistently failing with the exact same
failure mode, but only for certain machines and not others. It looks to be
deterministic corruption at best. But earlier, at around 20:40 EDT, it did
start working for smtp.ecoscentric.com, and still is, and mail has been
getting through. I don't know what to make of that.

Jifl
-- 
eCosCentric    http://www.eCosCentric.com/    The eCos and RedBoot experts
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine

* Re: Mail is way behind
  2004-05-23 23:37                           ` Jonathan Larmour
@ 2004-05-24  6:05                             ` Angela Marie Thomas
  2004-05-24  6:08                             ` Alexandre Oliva
  1 sibling, 0 replies; 29+ messages in thread
From: Angela Marie Thomas @ 2004-05-24  6:05 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Matthew Galgoci, Ian Lance Taylor, overseers


> Hmm... this doesn't entirely ring true with the symptoms: connections to 
> smtp.ecoscentric.com had been consistently failing with the exact same 
> failure mode, but only for certain machines and not others. It looks to be 
> deterministic corruption at best. But earlier at around 20:40 EDT it did 
> start working for smtp.ecoscentric.com, and still is, and mail has been 
> getting through. I don't know what to make of that.

FWIW, mail seems to be backing up again.

--Angela

* Re: Mail is way behind
  2004-05-23 23:37                           ` Jonathan Larmour
  2004-05-24  6:05                             ` Angela Marie Thomas
@ 2004-05-24  6:08                             ` Alexandre Oliva
  1 sibling, 0 replies; 29+ messages in thread
From: Alexandre Oliva @ 2004-05-24  6:08 UTC (permalink / raw)
  To: Jonathan Larmour; +Cc: Matthew Galgoci, Ian Lance Taylor, overseers

On May 22, 2004, Jonathan Larmour <jifl@eCosCentric.com> wrote:

> 20:40 EDT it did start working for smtp.ecoscentric.com, and still is,
> and mail has been getting through. I don't know what to make of that.

Networking issues appear to be fixed for me as well: I had several
different, very reliable ways to trigger the problem, and none of them
fails any more.

-- 
Alexandre Oliva             http://www.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

Thread overview: 29+ messages
2004-05-20 21:12 Mail is way behind Ian Lance Taylor
2004-05-20 18:31 ` Christopher Faylor
2004-05-20 21:14   ` Christopher Faylor
2004-05-20 22:05 ` Matthew Galgoci
2004-05-21  0:57   ` Frank Ch. Eigler
2004-05-21  2:07     ` Ian Lance Taylor
2004-05-21  3:29       ` Christopher Faylor
2004-05-21  3:46         ` Matthew Galgoci
2004-05-21  4:08           ` Ian Lance Taylor
2004-05-21  4:19             ` Jonathan Larmour
2004-05-21  4:28               ` Ian Lance Taylor
2004-05-21  4:48                 ` Ian Lance Taylor
2004-05-21  6:07                   ` Jonathan Larmour
2004-05-21  8:28                     ` Jonathan Larmour
2004-05-21 12:30                     ` Ian Lance Taylor
2004-05-21 12:58                       ` Angela Marie Thomas
2004-05-21 14:33                         ` Ian Lance Taylor
2004-05-21 16:06                       ` Matthew Galgoci
2004-05-22  6:32                         ` Matthew Galgoci
2004-05-23 23:37                           ` Jonathan Larmour
2004-05-24  6:05                             ` Angela Marie Thomas
2004-05-24  6:08                             ` Alexandre Oliva
2004-05-21 14:28                     ` Frank Ch. Eigler
2004-05-21 13:38                 ` Hans-Peter Nilsson
2004-05-21 15:02                   ` Ian Lance Taylor
2004-05-21  4:30               ` Andrew Pinski
2004-05-21  5:08                 ` Zack Weinberg
2004-05-21  4:20             ` Christopher Faylor
2004-05-21  4:52             ` Frank Ch. Eigler