From: Michael Sinatra (michael@rancid.berkeley.edu)
Date: Tue Dec 18 2001 - 18:22:11 PST
Hi,
You have probably been seeing messages from Jerry Berkman regarding
connections from ATTBI.COM (AT&T Broadband cable internet service) to
uclink. You probably also recall that last week, AT&T completely messed
up its DNS service to the point that uclink, socrates, and other campus
departmental mail servers were refusing connections from AT&T internet
customers.
AT&T sort of fixed the problem last week, but there were still many clients
with problems connecting. This week, it appears that they moved DNS for
the client.attbi.com subdomain to new DNS servers (perhaps the main
ATTBI.COM servers were overloaded). Ordinarily, when one does this, one
keeps the old nameservers in production for the client.attbi.com zone for
the period of time it will take for the cache to expire on nameservers
throughout the internet (48 hours is the standard TTL in the .com
top-level domain). Unfortunately, AT&T didn't do this. Instead the
attbi.com nameservers began stubbornly refusing to answer queries in the
client.attbi.com zone, even though many nameservers on the internet still
believed them to be authoritative for that zone. This caused many AT&T
customers to become unable to connect to many campus servers (and indeed
*any* host on the internet that is configured to prevent DNS spoofing).
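One common form of the anti-spoofing check mentioned above is a forward-confirmed ("double") reverse lookup. The server maps the client's IP address to a name with a PTR query, looks that name up again, and accepts the connection only if the original address appears in the answer. The sketch below assumes this is the kind of check the affected servers performed. The helper name `forward_confirmed`, the injectable resolver parameters, and the sample hostname are hypothetical and exist only so the logic can be exercised without live DNS. If the servers that resolvers still believe are authoritative for client.attbi.com refuse to answer, the forward step fails and the connection is refused.

```python
import socket
from typing import Callable, List, Tuple

# Same shape as socket.gethostbyaddr / socket.gethostbyname_ex:
# each returns (hostname, aliaslist, ipaddrlist).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def forward_confirmed(ip: str,
                      reverse: Resolver = socket.gethostbyaddr,
                      forward: Resolver = socket.gethostbyname_ex) -> bool:
    """Hypothetical helper: True only if ip -> name -> addresses contains ip.

    Any lookup failure (socket.herror and socket.gaierror are OSError
    subclasses) counts as a failed check, so the client would be refused.
    """
    try:
        name, _aliases, _addrs = reverse(ip)     # PTR lookup
    except OSError:
        return False
    try:
        _name, _aliases, addrs = forward(name)   # forward lookup of the PTR name
    except OSError:
        return False
    return ip in addrs


# Usage with fake resolvers (illustrative names and documentation addresses).
def fake_ptr(ip: str) -> Tuple[str, List[str], List[str]]:
    return ("c1.client.attbi.com", [], [ip])


def fake_a_ok(name: str) -> Tuple[str, List[str], List[str]]:
    return (name, [], ["192.0.2.10"])


def fake_a_refused(name: str) -> Tuple[str, List[str], List[str]]:
    # Stand-in for an authoritative server refusing the query.
    raise socket.gaierror("lookup failed")


print(forward_confirmed("192.0.2.10", fake_ptr, fake_a_ok))       # True
print(forward_confirmed("192.0.2.10", fake_ptr, fake_a_refused))  # False
```

Treating every lookup error as a failure is the conservative choice such checks usually make, which is why a broken delegation locks legitimate clients out rather than letting them through.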
After working with the uclink admins to trace the source of the increasing
connection failures, I examined the ns1/ns2.berkeley.edu box (to get an
idea of how our nameservers are arranged, please see my award-winning
article at <http://istpub.berkeley.edu:4201/bcc/Winter2000/net.dns.html>)
that was closest to uclink and determined that it was still caching the
old nameserver records and would continue to do so for another 22 hours.
I made the determination at this point to restart the nameserver, which is
a very disruptive process. This shouldn't have been necessary had AT&T
done the right thing with respect to gracefully transitioning their name
service to the new servers. Were it not for the large number of affected
clients, I probably wouldn't have taken this action.
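The stale-cache behaviour described above can be modelled with a toy resolver cache. Each record is stored with an absolute expiry time, lookups keep returning the cached answer until that time passes, and a restart simply empties the cache. This is a simplified sketch under stated assumptions, not how BIND is implemented. The class name `ToyCache`, the nameserver hostname, and the timings are illustrative. If the old records carried the 48-hour TTL mentioned earlier and had been cached 26 hours before, 48 - 26 = 22 hours would remain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

HOUR = 3600  # seconds


@dataclass
class ToyCache:
    """Hypothetical resolver cache: (name, rtype) -> (records, expires_at)."""
    entries: Dict[Tuple[str, str], Tuple[List[str], float]] = field(default_factory=dict)

    def store(self, name: str, rtype: str, records: List[str], ttl: float, now: float) -> None:
        # The TTL is turned into an absolute expiry time when the record is cached.
        self.entries[(name, rtype)] = (records, now + ttl)

    def lookup(self, name: str, rtype: str, now: float) -> Optional[List[str]]:
        hit = self.entries.get((name, rtype))
        if hit is None:
            return None
        records, expires_at = hit
        if now >= expires_at:
            del self.entries[(name, rtype)]  # expired: the next query goes upstream
            return None
        return records  # possibly stale, but served until the TTL runs out

    def remaining(self, name: str, rtype: str, now: float) -> float:
        # Seconds until the cached entry expires (0 if absent or already expired).
        hit = self.entries.get((name, rtype))
        return max(0.0, float(hit[1] - now)) if hit else 0.0

    def restart(self) -> None:
        # Restarting the nameserver throws away everything it has cached.
        self.entries.clear()


cache = ToyCache()
# Old delegation cached at t=0 with a 48-hour TTL (illustrative values).
cache.store("client.attbi.com", "NS", ["old-ns.attbi.com"], ttl=48 * HOUR, now=0)

t = 26 * HOUR
print(cache.lookup("client.attbi.com", "NS", t))            # ['old-ns.attbi.com']
print(cache.remaining("client.attbi.com", "NS", t) / HOUR)  # 22.0

cache.restart()
print(cache.lookup("client.attbi.com", "NS", t))            # None
```

Nothing in the cache itself can tell that the upstream delegation changed, which is why the only alternatives to waiting out the TTL are flushing or restarting the server.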
I just checked with both uclink and socrates admins and found that there
have been no connection refusals due to DNS issues since 16:42 today, so
this particular problem appears to be fixed until AT&T decides to screw it
up again. Note that many other campus nameservers still have the old
information; on some of them, the cached AT&T DNS records will expire in
the next few hours, and they will then pick up the new information. For the
others, I will probably restart them as the night wears on, after taking
steps to minimize disruption to campus (like going home first and having a
few martinis).
Please note that, as a pathological curmudgeon, I take no end of
pleasure in being able to clean up someone else's mess so I can complain
about it ad nauseam.
Thanks for all of the help that the uclink folks (esp. Bernie Tower and
Michele Tomkin) provided in troubleshooting this problem.
Thanks and have a nice day,
Michael Sinatra
Mr. Network Outrage
------------------------------------------------------------------------
The following was automatically added to this message by the list server:
For information about Micronet, its meetings and events, and its
mailing list, including information on subscribing and unsubscribing,
see the Micronet Web site at <http://wss.berkeley.edu/micronet/>.