
Weird DNS issues with 10.6.7 Server

1543 Views 13 Replies Latest reply: Aug 9, 2011 1:32 AM by Kooorrg
Zunito
Aug 2, 2011 10:59 AM

Hi, I've been having a problem with my DNS server: clients are able to resolve some hosts, but not others.

For example, if I run dig against one of the affected hosts I get:

dig www.yahoo.com

; <<>> DiG 9.6.0-APPLE-P2 <<>> www.yahoo.com
;; global options: +cmd
;; connection timed out; no servers could be reached

But when I run tcpdump at the same time, I can see that the server is responding; the client just doesn't seem to receive the reply.

tcpdump -tttt -n -s 1500 -i en0 udp port 53

2011-08-02 12:48:35.749381 IP 192.168.2.108.58221 > 192.168.2.1.53: 46224+ A? www.yahoo.com. (31)
2011-08-02 12:48:35.873716 IP 192.168.2.1.53 > 192.168.2.108.58221: 46224 4/2/0 CNAME fp3.wg1.b.yahoo.com., CNAME any-fp3-lfb.wa1.b.yahoo.com., CNAME any-fp3-real.wa1.b.yahoo.com., A 209.191.122.70 (164)
2011-08-02 12:48:40.749482 IP 192.168.2.108.58221 > 192.168.2.1.53: 46224+ A? www.yahoo.com. (31)
2011-08-02 12:48:40.750136 IP 192.168.2.1.53 > 192.168.2.108.58221: 46224 4/2/0 CNAME fp3.wg1.b.yahoo.com., CNAME any-fp3-lfb.wa1.b.yahoo.com., CNAME any-fp3-real.wa1.b.yahoo.com., A 209.191.122.70 (164)
2011-08-02 12:48:45.749678 IP 192.168.2.108.58221 > 192.168.2.1.53: 46224+ A? www.yahoo.com. (31)
2011-08-02 12:48:45.750116 IP 192.168.2.1.53 > 192.168.2.108.58221: 46224 4/2/0 CNAME fp3.wg1.b.yahoo.com., CNAME any-fp3-lfb.wa1.b.yahoo.com., CNAME any-fp3-real.wa1.b.yahoo.com., A 209.191.122.70 (164)
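In the capture above, the same query (ID 46224) goes out three times, five seconds apart, and a reply comes back each time. That pattern is what dig's defaults (a 5-second timeout and 3 tries) look like when it never accepts any of the replies. A minimal Python sketch, assuming the exact `tcpdump -tttt -n` line format shown above (retransmit_gaps is a hypothetical helper, not part of any tool), that groups outgoing queries by DNS ID and reports the gaps between resends:

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed query-line format (tcpdump -tttt -n), e.g.:
# 2011-08-02 12:48:35.749381 IP 192.168.2.108.58221 > 192.168.2.1.53: 46224+ A? www.yahoo.com. (31)
QUERY_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) IP \S+ > \S+\.53: "
    r"(?P<id>\d+)\+? (?P<qtype>\S+)\? (?P<name>\S+)"
)

def retransmit_gaps(lines):
    """Map (dns_id, qname) to the seconds between repeated sends of that query.

    Reply lines are skipped because their destination is the client port, not 53.
    """
    sends = defaultdict(list)
    for line in lines:
        m = QUERY_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            sends[(int(m.group("id")), m.group("name"))].append(ts)
    return {
        key: [round((later - earlier).total_seconds(), 3)
              for earlier, later in zip(times, times[1:])]
        for key, times in sends.items()
    }
```

Fed the six lines above, it should report gaps of 5.0 seconds for (46224, "www.yahoo.com."); a query answered on the first try, like the www.google.com one further down, yields an empty list.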

 

If I query a different DNS server instead, I do get an answer:

dig www.yahoo.com @8.8.8.8

; <<>> DiG 9.6.0-APPLE-P2 <<>> www.yahoo.com @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49432
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.yahoo.com.                 IN      A

;; ANSWER SECTION:
www.yahoo.com.          280     IN      CNAME   fp3.wg1.b.yahoo.com.
fp3.wg1.b.yahoo.com.    41      IN      CNAME   any-fp3-lfb.wa1.b.yahoo.com.
any-fp3-lfb.wa1.b.yahoo.com. 281 IN     CNAME   any-fp3-real.wa1.b.yahoo.com.
any-fp3-real.wa1.b.yahoo.com. 41 IN     A       69.147.125.65
any-fp3-real.wa1.b.yahoo.com. 41 IN     A       209.191.122.70
any-fp3-real.wa1.b.yahoo.com. 41 IN     A       67.195.160.76

;; Query time: 46 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Tue Aug  2 12:51:40 2011
;; MSG SIZE  rcvd: 160

tcpdump -tttt -n -s 1500 -i en0 udp port 53

2011-08-02 12:51:40.377033 IP 192.168.2.108.60998 > 8.8.8.8.53: 49432+ A? www.yahoo.com. (31)
2011-08-02 12:51:40.423394 IP 8.8.8.8.53 > 192.168.2.108.60998: 49432 6/0/0 CNAME fp3.wg1.b.yahoo.com., CNAME any-fp3-lfb.wa1.b.yahoo.com., CNAME any-fp3-real.wa1.b.yahoo.com., A 69.147.125.65, A 209.191.122.70, A 67.195.160.76 (160)
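On tcpdump's DNS reply lines, the field after the query ID (4/2/0 from 192.168.2.1, 6/0/0 from 8.8.8.8) is the answer/authority/additional record count, and the number in parentheses at the end is the DNS message length; the (160) here matches dig's "MSG SIZE rcvd: 160". A small sketch, assuming exactly the line format shown in this thread (parse_reply and DnsReply are illustrative names), for pulling those fields out so replies from the two servers can be compared side by side:

```python
import re
from typing import NamedTuple, Optional

class DnsReply(NamedTuple):
    server: str
    dns_id: int
    answers: int
    authority: int
    additional: int
    length: int  # DNS message length, the "(160)" at the end of the line

# Assumed reply format, e.g.:
# <ts> IP 8.8.8.8.53 > 192.168.2.108.60998: 49432 6/0/0 CNAME ..., A 67.195.160.76 (160)
REPLY_RE = re.compile(
    r"IP (?P<server>[\d.]+)\.53 > \S+: (?P<id>\d+)\S* "
    r"(?P<an>\d+)/(?P<ns>\d+)/(?P<ar>\d+) .*\((?P<len>\d+)\)$"
)

def parse_reply(line: str) -> Optional[DnsReply]:
    """Return the parsed reply, or None for queries and unrecognized lines."""
    m = REPLY_RE.search(line.strip())
    if m is None:
        return None
    return DnsReply(m.group("server"), int(m.group("id")),
                    int(m.group("an")), int(m.group("ns")),
                    int(m.group("ar")), int(m.group("len")))
```

Applied to the failing local reply it gives DnsReply("192.168.2.1", 46224, 4, 2, 0, 164), and to the 8.8.8.8 reply DnsReply("8.8.8.8", 49432, 6, 0, 0, 160).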

 

When I run dig against a host that does resolve, I get the following:

dig www.google.com

; <<>> DiG 9.6.0-APPLE-P2 <<>> www.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28956
;; flags: qr rd ra; QUERY: 1, ANSWER: 7, AUTHORITY: 4, ADDITIONAL: 0

;; QUESTION SECTION:
;www.google.com.                IN      A

;; ANSWER SECTION:
www.google.com.         535755  IN      CNAME   www.l.google.com.
www.l.google.com.       65      IN      A       74.125.47.106
www.l.google.com.       65      IN      A       74.125.47.147
www.l.google.com.       65      IN      A       74.125.47.99
www.l.google.com.       65      IN      A       74.125.47.103
www.l.google.com.       65      IN      A       74.125.47.104
www.l.google.com.       65      IN      A       74.125.47.105

;; AUTHORITY SECTION:
google.com.             103690  IN      NS      ns4.google.com.
google.com.             103690  IN      NS      ns1.google.com.
google.com.             103690  IN      NS      ns2.google.com.
google.com.             103690  IN      NS      ns3.google.com.

;; Query time: 1 msec
;; SERVER: 192.168.2.1#53(192.168.2.1)
;; WHEN: Tue Aug  2 12:53:40 2011
;; MSG SIZE  rcvd: 220

And while running tcpdump -tttt -n -s 1500 -i en0 udp port 53, I get:

2011-08-02 12:53:40.434780 IP 192.168.2.108.58569 > 192.168.2.1.53: 28956+ A? www.google.com. (32)
2011-08-02 12:53:40.435239 IP 192.168.2.1.53 > 192.168.2.108.58569: 28956 7/4/0 CNAME www.l.google.com., A 74.125.47.106, A 74.125.47.147, A 74.125.47.99, A 74.125.47.103, A 74.125.47.104, A 74.125.47.105 (220)
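The query lengths tcpdump reports, (31) for www.yahoo.com and (32) for www.google.com, are simply the 12-byte DNS header plus the encoded question, so the failing and the working queries have the same structure (both are plain A queries with recursion desired, the "+" after the ID). A standard-library sketch of RFC 1035 query encoding (build_query and encode_qname are illustrative names, not from any DNS library) that reproduces those two sizes:

```python
import struct

def encode_qname(name: str) -> bytes:
    """Encode a domain name as RFC 1035 length-prefixed labels ending in a zero byte."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"bad label {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)
    return bytes(out)

def build_query(dns_id: int, name: str, qtype: int = 1, qclass: int = 1) -> bytes:
    """Build a standard query (QTYPE 1 = A, QCLASS 1 = IN) with the RD flag set."""
    flags = 0x0100  # RD: recursion desired, shown by tcpdump as "+"
    header = struct.pack("!HHHHHH", dns_id, flags, 1, 0, 0, 0)  # QDCOUNT = 1
    return header + encode_qname(name) + struct.pack("!HH", qtype, qclass)

for dns_id, name in [(46224, "www.yahoo.com"), (28956, "www.google.com")]:
    print(name, len(build_query(dns_id, name)))
# www.yahoo.com 31
# www.google.com 32
```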

 

 

 

Any help would be appreciated; this is driving me crazy.

Intel Xserve, Mac OS X (10.6.7)
  • MrHoffman Level 6 (11,710 points)
    Aug 2, 2011 11:16 AM (in response to Zunito)

    What's in the system logs?  Anything relevant?

  • MrHoffman Level 6 (11,710 points)
    Aug 3, 2011 8:25 AM (in response to Zunito)

    Both. You'll want to look for any clues in the client logs, and also in the server logs.

    In addition to dig, also test with ping, since ping looks in the local DNS caches and dig doesn't. Like ping, the dscacheutil command can also query translations from the local DNS caches on the clients, where dig won't.

    And a more general question: has this configuration ever worked, or is this a new configuration?

    And a question about the clients: do they have any DNS servers listed in addition to 192.168.2.1?

    Are any firewalls blocking traffic?

    Are you running the Mac OS X Server box as a gateway, or do you have an external network gateway box?

  • Camelot Level 8 (45,670 points)
    Aug 3, 2011 10:35 AM (in response to Zunito)

    What's the common link between the hostnames that don't resolve and those that do?

    The Yahoo lookup that fails includes a CNAME chain (www.yahoo.com -> CNAME -> CNAME -> CNAME -> IP address), whereas Google uses just a single CNAME.

    Are other sites that fail using CNAME chains?
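    The chain lengths can be checked mechanically. A minimal sketch, assuming the records are already available as plain mappings (the dictionaries below are hand-copied from the dig outputs in this thread, and follow_cname is a hypothetical helper, not a dig or BIND feature), that counts the CNAME hops a name takes before reaching its A records:

```python
def follow_cname(name, cnames, a_records, max_hops=16):
    """Return (chain, addresses): every name visited and the final A records."""
    chain = [name]
    while name in cnames:
        if len(chain) > max_hops:
            raise RuntimeError("CNAME loop or chain too long")
        name = cnames[name]
        chain.append(name)
    return chain, a_records.get(name, [])

# Records copied from the dig output quoted in this thread.
CNAMES = {
    "www.yahoo.com.": "fp3.wg1.b.yahoo.com.",
    "fp3.wg1.b.yahoo.com.": "any-fp3-lfb.wa1.b.yahoo.com.",
    "any-fp3-lfb.wa1.b.yahoo.com.": "any-fp3-real.wa1.b.yahoo.com.",
    "www.google.com.": "www.l.google.com.",
    "fr.archive.ubuntu.com.": "ubuntu-archive.mirrors.proxad.net.",
}
A_RECORDS = {
    "any-fp3-real.wa1.b.yahoo.com.": ["69.147.125.65", "209.191.122.70", "67.195.160.76"],
    "www.l.google.com.": ["74.125.47.106", "74.125.47.147", "74.125.47.99",
                          "74.125.47.103", "74.125.47.104", "74.125.47.105"],
    "ubuntu-archive.mirrors.proxad.net.": ["88.191.250.131"],
}

for host in ["www.yahoo.com.", "www.google.com.", "fr.archive.ubuntu.com."]:
    chain, addrs = follow_cname(host, CNAMES, A_RECORDS)
    print(host, "CNAME hops:", len(chain) - 1, "A records:", len(addrs))
```

    With these records it reports 3 hops for www.yahoo.com and 1 for www.google.com. fr.archive.ubuntu.com, which fails later in this thread, also comes out as a single hop, so chain length alone may not explain every failure.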

     

    My other guess (slightly related) would be packet size: the original DNS spec limited UDP replies to 512 bytes, and larger responses (e.g. those with long CNAME chains) could exceed this and be truncated or dropped. That doesn't appear to be the case here, since the failing reply is only 164 bytes and the Google response that works is 220 bytes, but it's worth considering.
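    To make the size argument concrete: RFC 1035 restricts DNS messages carried over plain UDP (without EDNS0) to 512 bytes, and a server that can't fit its answer sets the TC (truncated) bit in the header so the client retries over TCP. A small standard-library sketch (the helper names are hypothetical) that applies that rule to the reply sizes seen in this thread:

```python
import struct

CLASSIC_UDP_LIMIT = 512  # RFC 1035 section 4.2.1 limit for UDP messages without EDNS0
TC_BIT = 0x0200          # TC (truncated) bit in the 16-bit header flags word

def header_flags(message: bytes) -> int:
    """Return the flags word (bytes 2-3) of a DNS message header."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    (flags,) = struct.unpack("!H", message[2:4])
    return flags

def is_truncated(message: bytes) -> bool:
    """True if the server set TC, meaning the client should retry over TCP."""
    return bool(header_flags(message) & TC_BIT)

def fits_classic_udp(size: int) -> bool:
    return size <= CLASSIC_UDP_LIMIT

# Reply sizes reported by tcpdump/dig in this thread.
for label, size in [("www.yahoo.com via 192.168.2.1", 164),
                    ("www.google.com via 192.168.2.1", 220),
                    ("fr.archive.ubuntu.com ANY", 506)]:
    print(f"{label}: {size} bytes, fits in 512: {fits_classic_udp(size)}")
```

    All three fit, which supports the conclusion that size alone isn't the problem here.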

  • Kooorrg
    Aug 5, 2011 8:35 AM (in response to Zunito)

    I am faced with exactly the same problem: for some hostnames, dig and nslookup return a "connection timed out; no servers could be reached" message while dscacheutil works. I have also confirmed through Wireshark that the DNS server returns a complete answer to the query, which seems to be ignored by dig or nslookup.

    Web browsing still works, but with issues such as long delays.

    Running dig or dscacheutil on the server itself always works: the problem appears only for clients. On the client side, I got the same behavior from a Mac OS X 10.6.8 client and from an Ubuntu 10.04 client.

    A few more details:

    -- I am running two Mac OS X Servers (on Mac mini servers).
    -- One server is the DNS master, while the other serves as a slave and gets its zones from the master.
    -- Both servers are running 10.6.8.
    -- Those servers only serve the internal network.
    -- However, the DNS slave works just fine.
    -- While reviewing the server configurations, I found some issues, which I fixed thanks to MrHoffman's excellent guide.
    -- Port 53 (TCP and UDP) is open on the firewall, but in any case the problem is still present when the firewall is completely off.
    -- The problem seems to go away when I restart the service (through Server Admin) or when I reload the BIND configuration by changing a setting in Server Admin, such as adding or removing a forwarder.

    I think it may be related to the size of some of the components in the reply, as some of the problematic addresses I found (s1.lemde.fr and fr.archive.ubuntu.com) expand quite a bit.

    I am still looking into the issue. Any help would be appreciated.

  • Kooorrg Level 1 (10 points)
    Aug 5, 2011 12:22 PM (in response to Kooorrg)

    I installed BIND 9.8.0-P4 using the graphical installer from Mice&Men. It's pretty much a drop-in replacement for the Apple version (9.6.0)!

    However, it does not solve the issue: the same symptoms reappear after about 40 minutes. Still, rebooting the server through Server Admin seems to fix the issue.

    In named.log, I noticed these two lines:

    05-Aug-2011 21:01:01.837 received control channel command 'null'
    05-Aug-2011 21:01:01.839 received control channel command 'status'

    They seem to match the moment when things start to go wrong, but I hadn't paid attention to them before, so I'm not sure whether they are significant.

    Will look into this issue again next week...
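    Lining those entries up with the moment lookups start failing is easier with parsed timestamps. A hedged sketch, assuming named.log lines look exactly like the two quoted above (control_commands is a hypothetical helper), that extracts each control channel command (the channel rndc talks to) together with its time:

```python
import re
from datetime import datetime

# Format assumed from the named.log excerpt quoted above.
CONTROL_RE = re.compile(
    r"^(?P<ts>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"received control channel command '(?P<cmd>[^']*)'"
)

def control_commands(lines):
    """Yield (timestamp, command) for each control channel entry; other lines are skipped."""
    for line in lines:
        m = CONTROL_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%d-%b-%Y %H:%M:%S.%f")
            yield ts, m.group("cmd")
```

    Run over the two quoted lines, it yields the commands 'null' and 'status' two milliseconds apart; whether they are related to the failures is still an open question at this point in the thread.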

  • Kooorrg Level 1 (10 points)
    Aug 8, 2011 3:29 AM (in response to Kooorrg)

    Weirdness continues...

    If I query the A record directly, I get the "unreachable" error:

    $ dig fr.archive.ubuntu.com A

    ; <<>> DiG 9.6.0-APPLE-P2 <<>> fr.archive.ubuntu.com A
    ;; global options: +cmd
    ;; connection timed out; no servers could be reached

    But if I do an ANY query, it works:

    $ dig fr.archive.ubuntu.com ANY

    ; <<>> DiG 9.6.0-APPLE-P2 <<>> fr.archive.ubuntu.com ANY
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54806
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 13, ADDITIONAL: 11

    ;; QUESTION SECTION:
    ;fr.archive.ubuntu.com.         IN      ANY

    ;; ANSWER SECTION:
    fr.archive.ubuntu.com.  68      IN      CNAME   ubuntu-archive.mirrors.proxad.net.

    ;; AUTHORITY SECTION:
    .                       488153  IN      NS      a.root-servers.net.
    .                       488153  IN      NS      i.root-servers.net.
    .                       488153  IN      NS      c.root-servers.net.
    .                       488153  IN      NS      k.root-servers.net.
    .                       488153  IN      NS      f.root-servers.net.
    .                       488153  IN      NS      b.root-servers.net.
    .                       488153  IN      NS      m.root-servers.net.
    .                       488153  IN      NS      e.root-servers.net.
    .                       488153  IN      NS      j.root-servers.net.
    .                       488153  IN      NS      l.root-servers.net.
    .                       488153  IN      NS      d.root-servers.net.
    .                       488153  IN      NS      g.root-servers.net.
    .                       488153  IN      NS      h.root-servers.net.

    ;; ADDITIONAL SECTION:
    a.root-servers.net.     603914  IN      A       198.41.0.4
    a.root-servers.net.     603914  IN      AAAA    2001:503:ba3e::2:30
    b.root-servers.net.     603914  IN      A       192.228.79.201
    c.root-servers.net.     603914  IN      A       192.33.4.12
    d.root-servers.net.     603914  IN      A       128.8.10.90
    d.root-servers.net.     603914  IN      AAAA    2001:500:2d::d
    e.root-servers.net.     603914  IN      A       192.203.230.10
    f.root-servers.net.     603914  IN      A       192.5.5.241
    f.root-servers.net.     603914  IN      AAAA    2001:500:2f::f
    g.root-servers.net.     603914  IN      A       192.112.36.4
    h.root-servers.net.     603914  IN      A       128.63.2.53

    ;; Query time: 14 msec
    ;; SERVER: 192.168.10.70#53(192.168.10.70)
    ;; WHEN: Mon Aug  8 12:25:16 2011
    ;; MSG SIZE  rcvd: 506

    Following up with an A query on ubuntu-archive.mirrors.proxad.net (the CNAME target) succeeds as well:

    $ dig ubuntu-archive.mirrors.proxad.net A

    ; <<>> DiG 9.6.0-APPLE-P2 <<>> ubuntu-archive.mirrors.proxad.net A
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56485
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2

    ;; QUESTION SECTION:
    ;ubuntu-archive.mirrors.proxad.net. IN  A

    ;; ANSWER SECTION:
    ubuntu-archive.mirrors.proxad.net. 1759 IN A    88.191.250.131

    ;; AUTHORITY SECTION:
    mirrors.proxad.net.     1759    IN      NS      ns1.proxad.net.
    mirrors.proxad.net.     1759    IN      NS      ns0.proxad.net.

    ;; ADDITIONAL SECTION:
    ns0.proxad.net.         61206   IN      A       212.27.32.2
    ns1.proxad.net.         76113   IN      A       212.27.32.130

    ;; Query time: 12 msec
    ;; SERVER: 192.168.10.70#53(192.168.10.70)
    ;; WHEN: Mon Aug  8 12:27:17 2011
    ;; MSG SIZE  rcvd: 135

    Very weird. BTW, the only network component between me and the server is a Netgear switch (GS724T).

  • Kooorrg Level 1 (10 points)
    Aug 8, 2011 10:07 AM (in response to Kooorrg)

    OK, I've made a major step forward: the checksum of the UDP packets sent back by my local server is incorrect. I ran Wireshark on the server (as well as on the client) and confirmed that the incorrect checksum is not modified by another network element along the way. On my server, this value is always the same: 0x9650.
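    For anyone wanting to check this by hand rather than relying on Wireshark's verdict: the UDP checksum (RFC 768) is the 16-bit ones'-complement of the ones'-complement sum over an IPv4 pseudo-header (source and destination address, protocol 17, UDP length) plus the UDP header and payload, with the checksum field treated as zero. A minimal Python sketch of that calculation; the function names are illustrative, and the addresses, ports and empty payload in the usage line are made up, not taken from the capture:

```python
import ipaddress
import struct

def ones_complement_sum16(data: bytes) -> int:
    """16-bit ones'-complement sum, padding odd-length data with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

def udp_checksum(src: str, dst: str, src_port: int, dst_port: int, payload: bytes) -> int:
    """Compute the IPv4 UDP checksum (RFC 768) for the given datagram."""
    length = 8 + len(payload)
    pseudo = (ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
              + struct.pack("!BBH", 0, 17, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum field = 0
    csum = ~ones_complement_sum16(pseudo + header + payload) & 0xFFFF
    return csum or 0xFFFF  # a computed 0 is transmitted as 0xFFFF

def udp_checksum_ok(src, dst, src_port, dst_port, payload, received: int) -> bool:
    """Over IPv4 a received checksum of 0 means the sender didn't compute one."""
    return received == 0 or received == udp_checksum(src, dst, src_port, dst_port, payload)

# Toy datagram: the correct checksum is 0xEBD8, so a fixed 0x9650 would be rejected.
print(hex(udp_checksum("10.0.0.1", "10.0.0.2", 1, 2, b"")))  # 0xebd8
```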

     

    My other Mac OS X Server (same hardware, same OS version) generates correct checksums, so I don't think the checksums are being altered by the hardware (i.e. checksum offloading).

    This explains perfectly why I've seen the same issue with Linux clients. Simply put, in both cases dig sends a query but never gets an answer back, because the kernel checks the checksum and silently drops the packet. It also explains why "dig +tcp ..." works just fine.
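    For comparison, dig +tcp sends the same DNS message over TCP, where RFC 1035 (section 4.2.2) prefixes each message with a two-byte length field, so the broken UDP checksum path is never used. A minimal framing sketch; frame_tcp and read_tcp_message are illustrative names, and the stream is simulated with an in-memory buffer (with a real connection, sock.makefile("rb") would offer the same read() interface):

```python
import io
import struct

def frame_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte big-endian length for DNS over TCP."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too long for TCP framing")
    return struct.pack("!H", len(message)) + message

def _read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, since a stream may return fewer per call."""
    data = b""
    while len(data) < n:
        chunk = stream.read(n - len(data))
        if not chunk:
            raise EOFError("stream ended mid-message")
        data += chunk
    return data

def read_tcp_message(stream) -> bytes:
    """Read one length-prefixed DNS message from a binary file-like stream."""
    (length,) = struct.unpack("!H", _read_exact(stream, 2))
    return _read_exact(stream, length)

# Simulated stream holding two framed messages back to back.
stream = io.BytesIO(frame_tcp(b"first") + frame_tcp(b"second"))
print(read_tcp_message(stream), read_tcp_message(stream))  # b'first' b'second'
```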

     

    Now, this problem occurs only for some addresses (ironically, this includes www.apple.com right now).

    Does anyone know what is responsible for generating the UDP checksums? Is it done by BIND or by the kernel?

  • Kooorrg Level 1 (10 points)
    Aug 8, 2011 11:09 AM (in response to Kooorrg)

    Well, I think I've found the culprit: VirtualBox. This bug report describes exactly the problem we're having: http://www.virtualbox.org/ticket/8395

    On my server, www.apple.com started to resolve as soon as I turned off VirtualBox, and the problem came back when I restarted VirtualBox. The other addresses worked too. VirtualBox uses some kernel extensions; I guess one of them is messing with some UDP packets. Since I have two DNS servers, I think I didn't notice that the one where I installed VirtualBox was having trouble.

    I'm thinking that this may be related to the virtual network card I'm using. But getting another server just for the VirtualBox VMs is starting to look like a better option.

    @Zunito: do you have it installed on your server as well?

  • Kooorrg Level 1 (10 points)
    Aug 9, 2011 1:32 AM (in response to Zunito)

    @Zunito: Thank you for checking VirtualBox 4.1. I guess I'll stick with the current version.

    Just to be sure, I told my VirtualBox machine to use the virtio network card instead of the "virtual" Intel Server card I used before, but it doesn't change anything. I guess the issue is in one of the VirtualBox kernel modules.
