
Weird DNS issues with 10.6.7 Server

Hi, I've been having a problem with my DNS server: clients are able to resolve some hosts, but not others.


For example, if I run dig against one of the affected hosts, I get:


dig www.yahoo.com

; <<>> DiG 9.6.0-APPLE-P2 <<>> www.yahoo.com

;; global options: +cmd

;; connection timed out; no servers could be reached


Running tcpdump at the same time, however, I can see that the server is responding; the client is somehow not receiving the reply.


tcpdump -tttt -n -s 1500 -i en0 udp port 53


2011-08-02 12:48:35.749381 IP 192.168.2.108.58221 > 192.168.2.1.53: 46224+ A? www.yahoo.com. (31)

2011-08-02 12:48:35.873716 IP 192.168.2.1.53 > 192.168.2.108.58221: 46224 4/2/0 CNAME fp3.wg1.b.yahoo.com., CNAME any-fp3-lfb.wa1.b.yahoo.com., CNAME any-fp3-real.wa1.b.yahoo.com., A 209.191.122.70 (164)

2011-08-02 12:48:40.749482 IP 192.168.2.108.58221 > 192.168.2.1.53: 46224+ A? www.yahoo.com. (31)

2011-08-02 12:48:40.750136 IP 192.168.2.1.53 > 192.168.2.108.58221: 46224 4/2/0 CNAME fp3.wg1.b.yahoo.com., CNAME any-fp3-lfb.wa1.b.yahoo.com., CNAME any-fp3-real.wa1.b.yahoo.com., A 209.191.122.70 (164)

2011-08-02 12:48:45.749678 IP 192.168.2.108.58221 > 192.168.2.1.53: 46224+ A? www.yahoo.com. (31)

2011-08-02 12:48:45.750116 IP 192.168.2.1.53 > 192.168.2.108.58221: 46224 4/2/0 CNAME fp3.wg1.b.yahoo.com., CNAME any-fp3-lfb.wa1.b.yahoo.com., CNAME any-fp3-real.wa1.b.yahoo.com., A 209.191.122.70 (164)


And if I query another DNS server instead, I do get an answer:


dig www.yahoo.com @8.8.8.8


; <<>> DiG 9.6.0-APPLE-P2 <<>> www.yahoo.com @8.8.8.8

;; global options: +cmd

;; Got answer:

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49432

;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 0



;; QUESTION SECTION:

;www.yahoo.com. IN A



;; ANSWER SECTION:

www.yahoo.com. 280 IN CNAME fp3.wg1.b.yahoo.com.

fp3.wg1.b.yahoo.com. 41 IN CNAME any-fp3-lfb.wa1.b.yahoo.com.

any-fp3-lfb.wa1.b.yahoo.com. 281 IN CNAME any-fp3-real.wa1.b.yahoo.com.

any-fp3-real.wa1.b.yahoo.com. 41 IN A 69.147.125.65

any-fp3-real.wa1.b.yahoo.com. 41 IN A 209.191.122.70

any-fp3-real.wa1.b.yahoo.com. 41 IN A 67.195.160.76



;; Query time: 46 msec

;; SERVER: 8.8.8.8#53(8.8.8.8)

;; WHEN: Tue Aug 2 12:51:40 2011

;; MSG SIZE rcvd: 160


tcpdump -tttt -n -s 1500 -i en0 udp port 53


2011-08-02 12:51:40.377033 IP 192.168.2.108.60998 > 8.8.8.8.53: 49432+ A? www.yahoo.com. (31)

2011-08-02 12:51:40.423394 IP 8.8.8.8.53 > 192.168.2.108.60998: 49432 6/0/0 CNAME fp3.wg1.b.yahoo.com., CNAME any-fp3-lfb.wa1.b.yahoo.com., CNAME any-fp3-real.wa1.b.yahoo.com., A 69.147.125.65, A 209.191.122.70, A 67.195.160.76 (160)


When I run dig with a host that resolves I get the following:


dig www.google.com



; <<>> DiG 9.6.0-APPLE-P2 <<>> www.google.com

;; global options: +cmd

;; Got answer:

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28956

;; flags: qr rd ra; QUERY: 1, ANSWER: 7, AUTHORITY: 4, ADDITIONAL: 0



;; QUESTION SECTION:

;www.google.com. IN A



;; ANSWER SECTION:

www.google.com. 535755 IN CNAME www.l.google.com.

www.l.google.com. 65 IN A 74.125.47.106

www.l.google.com. 65 IN A 74.125.47.147

www.l.google.com. 65 IN A 74.125.47.99

www.l.google.com. 65 IN A 74.125.47.103

www.l.google.com. 65 IN A 74.125.47.104

www.l.google.com. 65 IN A 74.125.47.105



;; AUTHORITY SECTION:

google.com. 103690 IN NS ns4.google.com.

google.com. 103690 IN NS ns1.google.com.

google.com. 103690 IN NS ns2.google.com.

google.com. 103690 IN NS ns3.google.com.



;; Query time: 1 msec

;; SERVER: 192.168.2.1#53(192.168.2.1)

;; WHEN: Tue Aug 2 12:53:40 2011

;; MSG SIZE rcvd: 220


And while running tcpdump -tttt -n -s 1500 -i en0 udp port 53, I get:


2011-08-02 12:53:40.434780 IP 192.168.2.108.58569 > 192.168.2.1.53: 28956+ A? www.google.com. (32)

2011-08-02 12:53:40.435239 IP 192.168.2.1.53 > 192.168.2.108.58569: 28956 7/4/0 CNAME www.l.google.com., A 74.125.47.106, A 74.125.47.147, A 74.125.47.99, A 74.125.47.103, A 74.125.47.104, A 74.125.47.105 (220)




Any help would be appreciated, this is driving me crazy.

Intel Xserve, Mac OS X (10.6.7)

Posted on Aug 2, 2011 10:59 AM

13 replies

Aug 3, 2011 8:25 AM in response to Zunito

Both. You'll want to look for any clues in the client logs, and also in the server logs.


In addition to dig, also test with ping. ping resolves names through the system resolver and so consults the local DNS caches; dig does not. (Like ping, the dscacheutil command can also query the local DNS caches on the clients, where dig won't.)
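To make the difference concrete, here is a minimal Python sketch of the two lookup paths. The function names (build_query, direct_query, system_lookup) are made up for illustration. build_query assembles the kind of packet dig sends, direct_query sends it straight to a given server over UDP, and system_lookup goes through the operating system's resolver instead. It assumes a plain A query with only the recursion-desired flag set, which is what the "+" in the tcpdump lines indicates.

```python
import socket
import struct


def build_query(hostname, query_id=0x1234, qtype=1):
    """Build a minimal DNS query (QTYPE A by default, class IN)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    return header + question


def direct_query(hostname, server, timeout=5.0):
    """Send the query straight to `server` over UDP, bypassing local caches.

    Raises socket.timeout if no reply is delivered to the socket in time.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (server, 53))
        data, _ = sock.recvfrom(4096)
        return data


def system_lookup(hostname):
    """Resolve through the operating system's resolver."""
    return sorted({info[4][0] for info in socket.getaddrinfo(hostname, None)})
```

The query built for www.yahoo.com is 31 bytes long and the one for www.google.com is 32 bytes, which matches the sizes tcpdump prints in parentheses for the outgoing queries in the original post.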


And a more general question, has this configuration ever worked? Or is this a new configuration?


And a question on the clients: do the clients have any DNS servers listed in addition to 192.168.2.1?


Any firewalls blocking traffic?


Are you running the Mac OS X Server box as a gateway, or do you have an external network gateway box?

Aug 3, 2011 10:35 AM in response to Zunito

What's the common link between hostnames that don't resolve vs. those that do?


The yahoo lookup that fails includes a CNAME chain (www.yahoo.com -> CNAME -> CNAME -> IP address) whereas google just uses a single CNAME.


Are other sites that fail using CNAME chains?
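One quick way to compare sites is to count the CNAME hops in the answer section that dig prints. The following is only a sketch: cname_chain is a hypothetical helper, and it assumes the five-field "owner TTL class type rdata" layout shown in the dig output earlier in this thread.

```python
def cname_chain(answer_lines, name):
    """Follow CNAME records starting at `name`; return the names visited."""
    cnames = {}
    for line in answer_lines:
        fields = line.split()
        # dig answer format: owner TTL class type rdata
        if len(fields) == 5 and fields[3] == "CNAME":
            cnames[fields[0].lower()] = fields[4].lower()
    chain = [name.lower()]
    # Stop at the first name without a CNAME, or if the chain loops back
    while chain[-1] in cnames and cnames[chain[-1]] not in chain:
        chain.append(cnames[chain[-1]])
    return chain


# Answer sections copied from the dig output in the original post
YAHOO = [
    "www.yahoo.com. 280 IN CNAME fp3.wg1.b.yahoo.com.",
    "fp3.wg1.b.yahoo.com. 41 IN CNAME any-fp3-lfb.wa1.b.yahoo.com.",
    "any-fp3-lfb.wa1.b.yahoo.com. 281 IN CNAME any-fp3-real.wa1.b.yahoo.com.",
    "any-fp3-real.wa1.b.yahoo.com. 41 IN A 69.147.125.65",
]
GOOGLE = [
    "www.google.com. 535755 IN CNAME www.l.google.com.",
    "www.l.google.com. 65 IN A 74.125.47.106",
]

yahoo_hops = len(cname_chain(YAHOO, "www.yahoo.com.")) - 1
google_hops = len(cname_chain(GOOGLE, "www.google.com.")) - 1
print("yahoo CNAME hops:", yahoo_hops)
print("google CNAME hops:", google_hops)
```

Running the same helper over the answer sections of other failing and working names would show whether chain length really correlates with the failures.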


My other guess (slightly related) would be packet size - the original DNS spec allowed for replies up to 512 bytes and larger responses (e.g. those CNAME chains) could exceed this and get dropped. That doesn't appear to be the case here since the failing Yahoo reply is only 164 bytes (160 bytes from 8.8.8.8), and the Google response that works is 220 bytes, but it's worth considering.
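The size theory can be checked directly from the tcpdump output, since tcpdump prints the DNS message size in parentheses at the end of each line. A small sketch follows; dns_payload_size is a hypothetical helper, and the 512-byte figure is the classic limit for DNS over UDP without EDNS0.

```python
import re

DNS_UDP_LIMIT = 512  # classic DNS-over-UDP message limit, without EDNS0


def dns_payload_size(tcpdump_line):
    """Return the DNS message size tcpdump prints in trailing parentheses."""
    match = re.search(r"\((\d+)\)\s*$", tcpdump_line)
    return int(match.group(1)) if match else None


# Response lines copied from the tcpdump captures in the original post
REPLIES = {
    "yahoo via 192.168.2.1": "2011-08-02 12:48:35.873716 IP 192.168.2.1.53 > 192.168.2.108.58221: 46224 4/2/0 CNAME fp3.wg1.b.yahoo.com., CNAME any-fp3-lfb.wa1.b.yahoo.com., CNAME any-fp3-real.wa1.b.yahoo.com., A 209.191.122.70 (164)",
    "yahoo via 8.8.8.8": "2011-08-02 12:51:40.423394 IP 8.8.8.8.53 > 192.168.2.108.60998: 49432 6/0/0 CNAME fp3.wg1.b.yahoo.com., CNAME any-fp3-lfb.wa1.b.yahoo.com., CNAME any-fp3-real.wa1.b.yahoo.com., A 69.147.125.65, A 209.191.122.70, A 67.195.160.76 (160)",
    "google via 192.168.2.1": "2011-08-02 12:53:40.435239 IP 192.168.2.1.53 > 192.168.2.108.58569: 28956 7/4/0 CNAME www.l.google.com., A 74.125.47.106, A 74.125.47.147, A 74.125.47.99, A 74.125.47.103, A 74.125.47.104, A 74.125.47.105 (220)",
}

sizes = {label: dns_payload_size(line) for label, line in REPLIES.items()}
for label, size in sizes.items():
    verdict = "over limit" if size > DNS_UDP_LIMIT else "within limit"
    print(label, size, verdict)
```

All three replies are well under 512 bytes, and the one that resolves is the largest, so size alone does not explain the dropped responses.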

Aug 5, 2011 8:35 AM in response to Zunito

I am faced with exactly the same problem: for some hostnames, dig and nslookup return a "connection timed out; no servers could be reached" message while dscacheutil works. I have also confirmed through Wireshark that the DNS server returns a complete answer to the query, which dig and nslookup seem to ignore.


Web browsing still works, but with issues such as long delays.


Running dig or dscacheutil on the server itself always works: the problem appears only for clients. Client-wise, I got the same behavior from a Mac OS X 10.6.8 client and from an Ubuntu 10.04 client.


A few more details:

-- I am running two Mac OS X Servers (on Mac mini Server hardware).

-- One server is the DNS master while the other serves as a slave and gets its zones from the master.

-- Both servers are running 10.6.8.

-- Those servers only serve the internal network.

-- However, the DNS slave works just fine.

-- While reviewing the servers' configurations, I found some issues, which I fixed thanks to MrHoffman's excellent guide.

-- Port 53 is open for both TCP and UDP on the firewall, but in any case the problem is still present when the firewall is completely off.

-- The problem seems to go away when I restart the service (through Server Admin) or when I reload the BIND configuration by changing a setting in Server Admin, such as adding or removing a forwarder.


I think that it may be related to the size of some of the components in the reply, as some of the problematic addresses I found (s1.lemde.fr and fr.archive.ubuntu.com) do expand quite a bit.


I am still looking into the issue. Any help would be appreciated.

Aug 5, 2011 12:22 PM in response to Kooorrg

I installed BIND 9.8.0-P4 using the graphical installer from Men & Mice. Pretty much a "drop-in" replacement for the Apple version (9.6.0)!


However, it does not solve the issue. The same symptoms reappear after about 40 minutes. Still, rebooting the server through Server Admin seems to fix the issue.


In named.log, I did notice these two lines:

05-Aug-2011 21:01:01.837 received control channel command 'null'

05-Aug-2011 21:01:01.839 received control channel command 'status'


They seem to match the moment when things start to go wrong, but I haven't paid attention to them before so I'm not sure whether they are significant.


Will look into this issue again next week...

Aug 8, 2011 3:29 AM in response to Kooorrg

Weirdness continues...


If I do a direct A lookup, I receive the "no servers could be reached" error:

$ dig fr.archive.ubuntu.com A


; <<>> DiG 9.6.0-APPLE-P2 <<>> fr.archive.ubuntu.com A

;; global options: +cmd

;; connection timed out; no servers could be reached


But if I do an ANY lookup, it works:

$ dig fr.archive.ubuntu.com ANY



; <<>> DiG 9.6.0-APPLE-P2 <<>> fr.archive.ubuntu.com ANY

;; global options: +cmd

;; Got answer:

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54806

;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 13, ADDITIONAL: 11



;; QUESTION SECTION:

;fr.archive.ubuntu.com. IN ANY



;; ANSWER SECTION:

fr.archive.ubuntu.com. 68 IN CNAME ubuntu-archive.mirrors.proxad.net.



;; AUTHORITY SECTION:

. 488153 IN NS a.root-servers.net.

. 488153 IN NS i.root-servers.net.

. 488153 IN NS c.root-servers.net.

. 488153 IN NS k.root-servers.net.

. 488153 IN NS f.root-servers.net.

. 488153 IN NS b.root-servers.net.

. 488153 IN NS m.root-servers.net.

. 488153 IN NS e.root-servers.net.

. 488153 IN NS j.root-servers.net.

. 488153 IN NS l.root-servers.net.

. 488153 IN NS d.root-servers.net.

. 488153 IN NS g.root-servers.net.

. 488153 IN NS h.root-servers.net.



;; ADDITIONAL SECTION:

a.root-servers.net. 603914 IN A 198.41.0.4

a.root-servers.net. 603914 IN AAAA 2001:503:ba3e::2:30

b.root-servers.net. 603914 IN A 192.228.79.201

c.root-servers.net. 603914 IN A 192.33.4.12

d.root-servers.net. 603914 IN A 128.8.10.90

d.root-servers.net. 603914 IN AAAA 2001:500:2d::d

e.root-servers.net. 603914 IN A 192.203.230.10

f.root-servers.net. 603914 IN A 192.5.5.241

f.root-servers.net. 603914 IN AAAA 2001:500:2f::f

g.root-servers.net. 603914 IN A 192.112.36.4

h.root-servers.net. 603914 IN A 128.63.2.53



;; Query time: 14 msec

;; SERVER: 192.168.10.70#53(192.168.10.70)

;; WHEN: Mon Aug 8 12:25:16 2011

;; MSG SIZE rcvd: 506


Following up with an A lookup on ubuntu-archive.mirrors.proxad.net succeeds as well:

$ dig ubuntu-archive.mirrors.proxad.net A



; <<>> DiG 9.6.0-APPLE-P2 <<>> ubuntu-archive.mirrors.proxad.net A

;; global options: +cmd

;; Got answer:

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56485

;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2



;; QUESTION SECTION:

;ubuntu-archive.mirrors.proxad.net. IN A



;; ANSWER SECTION:

ubuntu-archive.mirrors.proxad.net. 1759 IN A 88.191.250.131



;; AUTHORITY SECTION:

mirrors.proxad.net. 1759 IN NS ns1.proxad.net.

mirrors.proxad.net. 1759 IN NS ns0.proxad.net.



;; ADDITIONAL SECTION:

ns0.proxad.net. 61206 IN A 212.27.32.2

ns1.proxad.net. 76113 IN A 212.27.32.130



;; Query time: 12 msec

;; SERVER: 192.168.10.70#53(192.168.10.70)

;; WHEN: Mon Aug 8 12:27:17 2011

;; MSG SIZE rcvd: 135


Very weird. BTW, the only network component between me and the server is a Netgear switch (GS724T).

Aug 8, 2011 10:07 AM in response to Kooorrg

OK, I made a major step forward: the checksum of the UDP packet sent back by my local server is incorrect. I ran Wireshark on the server (as well as on the client) and confirmed that the checksum is already incorrect when it leaves the server, so it is not being modified by another network element along the way. On my server, this value is actually fixed at 0x9650.


My other Mac OS X Server (same hardware, same OS version) generates correct checksums, so I don't think that the checksums are actually modified by the hardware (i.e. checksum offloading).


This problem explains perfectly why I've seen the same issue with Linux clients. Simply put, in both cases dig sends a query but never gets an answer back because the client's kernel verifies the checksum and silently drops the packet. It also explains why "dig +tcp ..." works just fine.
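For reference, here is a minimal sketch of the check a receiving IPv4 stack performs on a UDP datagram: a 16-bit one's-complement sum over a pseudo-header (source and destination address, protocol number 17, UDP length), the UDP header, and the payload. The function names are made up for illustration, and the two-byte payload is a stand-in rather than a real DNS reply; only the addresses and ports are taken from the capture in the original post.

```python
import socket
import struct


def ones_complement_sum(data):
    """16-bit one's-complement sum, padding odd-length input with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload):
    """Compute the UDP checksum for an IPv4 datagram."""
    length = 8 + len(payload)  # UDP header is 8 bytes
    pseudo = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, 17, length)  # zero, protocol 17 (UDP), length
    )
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
    return checksum or 0xFFFF  # a computed 0 is transmitted as 0xFFFF


def udp_checksum_ok(src_ip, dst_ip, src_port, dst_port, payload, received):
    """Return True if `received` would be accepted for this datagram."""
    if received == 0:
        return True  # over IPv4, 0 means the sender did not compute a checksum
    return received == udp_checksum(src_ip, dst_ip, src_port, dst_port, payload)


# Stand-in payload; addresses and ports are from the capture in the first post
args = ("192.168.2.1", "192.168.2.108", 53, 58221, b"\x00\x01")
good = udp_checksum(*args)
print(hex(good), udp_checksum_ok(*args, good), udp_checksum_ok(*args, 0x9650))
```

A constant value such as 0x9650 can only match by coincidence, since the correct checksum changes with the ports and the payload. Every reply for which it does not match is discarded before dig ever sees it, and dig reports a timeout.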


Now, this problem occurs only for some addresses (ironically, this includes www.apple.com right now).


Does anyone know what is in charge of generating the UDP checksums? Is it done by BIND or by the kernel?

Aug 8, 2011 11:09 AM in response to Kooorrg

Well, I think I've found the culprit: VirtualBox. I found this bug report, which describes exactly the problem we're having: http://www.virtualbox.org/ticket/8395


On my server, www.apple.com started to resolve as soon as I turned off VirtualBox, and the problem came back when I restarted VirtualBox. The other addresses worked too. VirtualBox uses some kernel extensions, and I guess that one of them is corrupting some UDP packets. Since I have two DNS servers, I did not notice that the one where I installed VirtualBox was having trouble.


I'm thinking that this may be related to the virtual network card I'm using. But getting another server just for VirtualBoxes is starting to look like a better option.


@Zunito: do you have it installed on your server as well?

Aug 9, 2011 1:32 AM in response to Zunito

@Zunito: Thank you for checking VirtualBox 4.1. I guess I'll stick with the current version.


Just to be sure, I told my VirtualBox machine to use the virtio network card instead of the "virtual" Intel Server card I used before, but it doesn't change anything. I guess the issue is in one of the VirtualBox kernel modules.
