420 Too Many Open Files

I'm running Apache 2.2 under OS X Server 10.5.2. In the wake of Security Update 2008-002, Apache refuses to run, with the following log entries:

[Wed Mar 19 17:01:16 2008] [notice] Graceful restart requested, doing restart
[Wed Mar 19 17:01:16 2008] [error] (9)Bad file descriptor: apr pollsetpoll: (listen)
piped logspawn: unable to setup child process '/usr/sbin/rotatelogs "/Services/Laughing Boot/Sites/Logs/defaultsite/accesslog" 604800 -420': Too many open files
Unable to open logs
[Wed Mar 19 17:01:20 2008] [warn] pid file /var/run/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
[Wed Mar 19 17:01:20 2008] [error] (24)Too many open files: could not create /var/run/httpd.pid
[Wed Mar 19 17:01:20 2008] [error] httpd: could not log pid to file /var/run/httpd.pid

The last three lines repeat, ad infinitum.

I have forty-some sites, and if I disable all but 33 of them, the problem vanishes. Enable 34, and the above fit ensues.

As you can no doubt tell, this all a bit over my head. I'm pounding my head against the espresso machine and shortly will begin moaning about how God hates me.

Anyone care to at least point me in the right direction?

Thanks,
Bryan

G5 DP 2.5 ghz, etc., Mac OS X (10.5.2), OS X Server 10.5.2

Posted on Mar 19, 2008 5:11 PM

Reply
21 replies

Mar 19, 2008 11:33 PM in response to R Bryan Harrison

How many files does it need to open?

On my 10.5.2 user

sysctl -a | grep files


kern.maxfiles = 12288
kern.maxfilesperproc = 10240
kern.maxfiles: 12288
kern.maxfilesperproc: 10240


I don't know for sure, but maybe try increase those figures?

sudo sysctl -w kern.maxfilesperproc=65536 (or some other lower figure)

This change is temporary. You could set it in a file /etc/sysctl.conf that is read at boot.

Just add the line:

kern.maxfilesperproc=65536

Mar 27, 2008 2:16 PM in response to R Bryan Harrison

We're seeing the same problem.

Here's a sample from /var/log/httpd/error_log:

piped logspawn: unable to setup child process '/usr/sbin/rotatelogs "/private/var/log/httpd/www janscarton_com/accesslog" 86400 -300': Too many open files
Unable to open logs

Here's a sample from the console:

3/27/08 2:17:09 PM com.apple.launchd[1] (org.apache.httpd[6748]) Stray process with PGID equal to this dead job: PID 6844 PPID 1 rotatelogs

The two errors seem to be related.

We're generating separate access and error log files for about 75 sites.

Upping maxfilesperproc from 10240 to 65536 in sysctl didn't affect the problem.

This command:

sudo lsof | wc -l

revealed that only 1236 files were open anyway. Far from the per process limit.

Disabling log rotation in the site settings via Server Admin caused the problem to abate, but that's not a workable solution.

May 5, 2008 4:27 AM in response to Jans Carton

Yeah I got this problem too, I have to turn off my "busy" sites just to run the server. In my Activity Monitor program, while browsing the current processes in order of CUP % the Directory Process takes up all my remaining CPU power. Stopping the web service seems to clear the problem, and on restarting the web service allows for good running for about 15 mins then the web services die again. Stopping and starting the web server also takes about 10 mins. I find it fast to reboot "shutdown -r now" using terminal.

I am wasting so muck time on this problem. Any feed back would be appreciated.

System Log file:
Apr 30 12:50:38 ns1 DirectoryService25: CFPropertyListCreateFromXMLData(): Old-style plist parser: missing semicolon in dictionary.
Apr 30 12:50:42: --- last message repeated 3 times ---
Apr 30 12:50:42 ns1 servermgrd50: * -WPSession createRecordWithType:attributes:inNode:error:: * Warning: the record of type dsRecTypeStandard:Config was created successfully, but OpenDirectoryFramework cannot find it. More info: recordName 'ServicesInformation11111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 1111111' in /Local/Default

May 5, 2008 6:17 AM in response to Jans Carton

Ok, I may have the answer (for my case anyway). After loads of searching I couldnt find the file used by the DirectoryService program that was causing my problems. I did finally find the error log used by Directory Service. located: /Library/Logs/DirectoryService/DirectoryService.error.log if you issue the following command:
tail -50 /Library/Logs/DirectoryService/DirectoryService.error.log

You should get an out put with some minor errors or nothing at all. In my case I got the following:

2008-05-05 22:49:03 EST - T[0xF0103000] - CDSLocalPluginNode::FlushRecord(): fopen() returned NULL while attempting to open 'w' "/var/db/dslocal/nodes/Default/config/ServicesInformation1111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111.plist.temp"
2008-05-05 22:49:03 EST - T[0xF0103000] - CDSLocalPluginNode::FlushRecord(): fopen() returned NULL while attempting to open 'w' "/var/db/dslocal/nodes/Default/config/ServicesInformation1111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111.plist.temp"
2008-05-05 22:49:05 EST - T[0xF0103000] - CDSLocalPluginNode::FlushRecord(): fopen() returned NULL while attempting to open 'w' "/var/db/dslocal/nodes/Default/config/ServicesInformation1111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111.plist.temp"
2008-05-05 22:49:05 EST - T[0xF0103000] - CDSLocalPluginNode::FlushRecord(): fopen() returned NULL while attempting to open 'w' "/var/db/dslocal/nodes/Default/config/ServicesInformation1111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111.plist.temp"
2008-05-05 22:50:02 EST - T[0xF0103000] - CDSLocalPluginNode::FlushRecord(): fopen() returned NULL while attempting to open 'w' "/var/db/dslocal/nodes/Default/config/ServicesInformation1111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111.plist.temp"
2008-05-05 22:50:02 EST - T[0xF0103000] - CDSLocalPluginNode::FlushRecord(): fopen() returned NULL while attempting to open 'w' "/var/db/dslocal/nodes/Default/config/ServicesInformation1111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111.plist.temp"
2008-05-05 22:50:07 EST - T[0xF0103000] - CDSLocalPluginNode::FlushRecord(): fopen() returned NULL while attempting to open 'w' "/var/db/dslocal/nodes/Default/config/ServicesInformation1111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111.plist.temp"
2008-05-05 22:50:07 EST - T[0xF0103000] - CDSLocalPluginNode::FlushRecord(): fopen() returned NULL while attempting to open 'w' "/var/db/dslocal/nodes/Default/config/ServicesInformation1111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111.plist.temp"
2008-05-05 22:50:11 EST - T[0xF0103000] - CDSLocalPluginNode::FlushRecord(): fopen() returned NULL while attempting to open 'w' "/var/db/dslocal/nodes/Default/config/ServicesInformation1111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111.plist.temp"
2008-05-05 22:50:11 EST - T[0xF0103000] - CDSLocalPluginNode::FlushRecord(): fopen() returned NULL while attempting to open 'w' "/var/db/dslocal/nodes/Default/config/ServicesInformation1111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111111111111111111111111111111111111111 11111111111111111111111111111111111111111111.plist.temp"

This error was causing Directory Service to use all my CPU power up and finally kill other services like web and mail. I moved all the files in folder /var/db/dslocal/nodes/Default/config/ using this comand:

rm /var/db/dslocal/nodes/Default/config/ServicesInformation* /Desktop/Backup

I did a restart and now my CPU activity is back around 15-20%. I assumed that Directory Service would rebuild the files if it needed them. I am wondering if I have killed my server? But its now working as required and much faster.

Any feedback would be appreciated.

Cheers,
Hamish.

May 23, 2008 6:43 PM in response to R Bryan Harrison

Does anyone have any solution to this problem yet? If you do, please post here as the 'too many open files' problem is crippling our web services!

I just can't believe how many problems we are having since upgrading to Leopard! Both Client and Server software would have to be the most unstable and buggy software from Apple we I have ever used - and I've been using Apple since the Apple II days! Very disappointing!

Just to confirm what we have tried as well (none have been successful):
- Upping ulimit
- Upping maxfiles/maxfilesperproc
- Switching off 'Persistent Connections' seems to slow the problem for a few minutes
- Increasing Max Persistent Connections
- Decreasing Persistent Connection timeout seems to slow the problem a few minutes
- Switching off all 'cache' modules in Apache2 slows the problem for a few minutes
- Increasing the Maximum Simultaneous Connections
- Increasing (or decreasing) the Maximum Spare Servers

The '_www' process has ~1600 files open when the problem occurs

Anyone with any help, please let us know what you did? Due to the time and changes we have implemented, it is almost impossible to go back to 10.3 Server (which was 100% more stable!)

Jun 18, 2008 2:07 PM in response to ILBT

I had a related problem.

Background: 10.4 server running fine for a few years. Migrated to 10.5.0 in November. At no point in time would it allow the Apache migration from 1.3 to 2.x. The 10.4 server had been used strictly via the GUI -- never a CLI alteration to the httpd or any other config. Yet Apple's scripting could not perform the upgrade for Apache versioning. 10.5.1 and 10.5.2 releases behaved the same way. Upon 10.5.3 update, the migration tool magically worked (this past weekend) and the server has been functional. But there are similar errors in my logs. I didn't find them until stumbling across a particularly nasty one today.

[Fri Jun 13 07:45:30 2008] [warn] No JkShmFile defined in httpd.conf. Using default /usr/logs/jk-runtime-status
[Fri Jun 13 07:45:31 2008] [warn] The Alias directive in /etc/apache2/sites/0020 any_80forums.domain1.com.conf at line 56 will probably never match because it overlaps an earlier Alias.
[Fri Jun 13 07:45:31 2008] [warn] The Alias directive in /etc/apache2/sites/0052 any_80domain2.com.conf at line 58 will probably never match because it overlaps an earlier Alias.
[Fri Jun 13 07:45:31 2008] [warn] worker http://127.0.0.1:8087/groups already used by another worker
[Fri Jun 13 07:45:31 2008] [warn] The Alias directive in /etc/apache2/httpd teamsrequired.conf at line 41 will probably never match because it overlaps an earlier Alias.
[Fri Jun 13 07:45:31 2008] [warn] worker http://127.0.0.1:8086 already used by another worker
[Fri Jun 13 07:45:31 2008] [warn] worker http://127.0.0.1:8087/collaboration-config already used by another worker
[Fri Jun 13 07:45:31 2008] [warn] worker http://127.0.0.1:8087/wikihelp already used by another worker
[Fri Jun 13 07:45:31 2008] [warn] worker http://127.0.0.1:8087/groups already used by another worker
[Fri Jun 13 07:45:31 2008] [warn] The Alias directive in /etc/apache2/httpd teamsrequired.conf at line 41 will probably never match because it overlaps an earlier Alias.
[Fri Jun 13 07:45:31 2008] [warn] worker http://127.0.0.1:8086 already used by another worker
[Fri Jun 13 07:45:31 2008] [warn] worker http://127.0.0.1:8087/collaboration-config already used by another worker
[Fri Jun 13 07:45:31 2008] [warn] worker http://127.0.0.1:8087/wikihelp already used by another worker
[Fri Jun 13 07:45:31 2008] [warn] The Alias directive in /etc/apache2/sites/0070 any_80x.domain3.com.conf at line 56 will probably never match because it overlaps an earlier Alias.
[Fri Jun 13 07:45:31 2008] [warn] The Alias directive in /etc/apache2/sites/0072 any_80y.domain4.comt.conf at line 56 will probably never match because it overlaps an earlier Alias.
[Fri Jun 13 07:45:32 2008] [warn] No JkShmFile defined in httpd.conf. Using default /usr/logs/jk-runtime-status
[Fri Jun 13 07:45:32 2008] [notice] mod scgipubsub: generating secret for digest authentication ...
[Fri Jun 13 07:45:32 2008] [notice] mod scgipubsub: done
[Fri Jun 13 07:45:32 2008] [notice] Apache/2.2.8 (Unix) PHP/5.2.5 mod_ssl/2.2.8 OpenSSL/0.9.7l DAV/2 mod_jk/1.2.23 mod scgipubsub/1.11-pubsub configured -- resuming normal operations
[Fri Jun 13 07:45:32 2008] [crit] (24)Too many open files: mod_rewrite: could not init rewrite log lock in child
(and this last line is repeated five more times)
[Fri Jun 13 07:51:54 2008] [warn] (13)Permission denied: connect to listener on [::]:80
[Fri Jun 13 07:52:19 2008] [notice] caught SIGTERM, shutting down

Today, I had Apache totally quit on me -- completely go deaf, 'though the rest of the server (listserve, ftp) would function. Mine might be a special case. Here's some detail I've gleaned from the logs.

What is of interest here is that only a few of the domains (from the 80+) that migrated from 1.3 to 2.2.8 are listed above in the errors. In reviewing the logs and comparing them to the GUI settings, the differences between settings for domains that are not reported in error and those that are fall within these tab options:

Web:Sites:Logging
Web:Sites:Aliases section for URL Aliases and Redirects

Those also happen to be the four most recently generated Site entries. If the new site was enabled, the server would crash and fail. If the site was then disabled (enable checkbox cleared), the server would start and run, serving sites properly.

So something within those few default GUI settings (not matching the bulk of the sites that did work) may have been the cause -- or there are elements in the .conf files for each site that are not reflected in the GUI that differ. After some troubleshooting, I narrowed it down to the logging.

If one of the 'bad' sites was changed to disable the 'Enable Access Log' by unchecking it, saving this change, then enabling the site, the server would restart gracefully.

Next, if the 'bad' site adopted the same custom logging format string, was set to the same log roll (archiving) time frame as the others, and enabled, Apache2 would gracefully restart. Apache2 would only fail IF the logging for one site did not match the format for all of the others.

I'm sure there are other causes for concern, but this is how I solved my issue with Apache crashing when a new site is added to the server.

Jun 19, 2008 9:57 AM in response to Mabel O'Farrell

Perhaps he is using the GUI admin, and that it is poorly crafted to support what you dictate. There is nothing wrong with using the VH conf file structures -- Apache should be able to concatenate that without issue. What is the problem is how Apple has permitted the config of the logging. In a robust hosting environment, individual logging (and storing to a client-accessible directory for individual analysis) is not unusual at all. It should be expected -- and it should not die with only 30-some hosted sites. 800 or 1000, maybe, but 30? Unless he has other issues with his installation.

What Apple should do is rewrite their admin (and their documentation) to either disable this or make it impossible to split out the logs when this 'file threshold' is approached, forcing the server to use one log format for all domains (or one common log and a common error log).

Apple used to preach that it's the GUI that should do the work, and the underlying code should not get in the way of the user experience or the user's productivity. They have lost their way badly in the OSXS product, broken it in Leopard more than fixed it (relative to Tiger) with regard to Internet services, and they really need to recraft 10.5 to work properly. Not wait for 10.6 -- fix 10.5. Include the obvious management requirements:

If you are going to have email services, they should support VH accounts and domains properly. Tiger worked, why doesn't Leopard?

If you are going to support listserves, have them support VH. CompanyB doesn't want to have to ride on CompanyA's name in a listserve environment. The current Mailman can do it, but Apple has 1 yr old technology under the hood that hasn't kept up.

If you are going to use PHP5 (or PHP4), then simple admin stuff like max file size management should be a given in the GUI, not a .ini file to be hacked out in an invisible directory via upload maxfilesize = . This has gone begging for nearly 4 years now.

If you are going to support multiple accounts via the WGM, there should be a GUI method to point additional account to an existing home, or subdirectories within a home/Sites/ folder to allow fine-tuned FTP access. Many companies have different divisions work different subdirectories in a website, and one master account doesn't cut it.

There's a whole lot more that is a mess in this product. It is NOT acceptable to say one needs to CLI to make it work. Why bother to sell a server product if the minute it is deployed, the admin has to be abandoned to get any functionality? I invite any viewer here to read the Apple Human Interface Guidelines from the 80s or early 90s. That philosophy is what should drive modern computing -- not a cobbled together half-functional admin GUI.

Sorry to seem *****, but there needs to be a wakeup call to Apple to get their server dev group in order (or let them do their work and stop pulling those resources over to the i-product of the moment). Jobs: It's the user experience that matters. Even here.

Jun 19, 2008 7:47 PM in response to R Bryan Harrison

Just following up my previous posts and the solution we found to work with the 'Too Many Open Files' error, maybe it will help others as well.

The problem seems to be in the way Apple has changed the files/proc settings for Leopard. Using the 'ulimit' and 'sysctl' options to adjust the settings no longer works.

The option you have to use is the 'launchctl limit' command, which also adjusts both the 'ulimit' and the 'sysctl' settings automatically.

To check your existing settings, you can use the command 'launchctl limit', this shows that the server is ridiculously set at a very low maxfiles of 256 files per user/process.

Three columns will be returned, the first column displayed will be the name of the limit setting, you are looking for the 'maxproc' and/or 'maxfiles' settings towards the bottom of the list.

To increase the maxfiles you can use the command 'launchctl limit maxfiles [user max] [system max]', where [user max] is the maximum files you want to allow for each user/process and likewise for [system max].

An example would be (running as super user): launchctl limit maxfiles 2048 unlimited

We first increased this to 2048 unlimited as Apple states in the manual that the Server Admin app is supposed to change the soft process limit from 100 (the default) to 2048 when starting Apache, obviously this is not happening and the default is set to 256, so the manual is out of date or wrong. We did however still find infrequent 'too many open files' errors in apache logs so again increased the limit, this time to 4096 unlimited and have not had a problem since.

The same applies to the maxproc setting if you need additional processes.

You can verify the changes by using the 'ulimit -a' and/or 'sysctl -a | grep files' commands.

To make the changes stick through a restart, you need to add a file (as it doesn't exist) by using something like 'sudo pico /etc/launchd.conf' and add the changes one per line, so in our case, our file looks like this:

limit maxfiles 4096 unlimited

Remember you do not need to add 'launchctl' in front of the commands in this file.

Hope this helps others with this problem, we have not upgraded to 10.5.3 yet, now that things are working in 10.5.2 we will have to setup a test environment before applying anymore potential fuel to the fire!

Jun 24, 2008 12:35 AM in response to Paul Vail

@Paul Vail: To make the changes stick through a restart, you need to add a file (as it doesn't exist) by using something like 'sudo pico /etc/launchd.conf' and add the changes one per line...

It resides in /etc directory and Apache doesn't load it or need any adjustments as it is a system limitation causing the problem and once you've made the change/s Apache can run free like the wind 😉

Jun 24, 2008 6:57 PM in response to ILBT

Using /etc/launchd.conf to change launchd limits will change things for all processes, which some might consider overkill. For those who just want to raise the limit for a single service started by launchd, try modifying the launchd plist file (eg. /System/Library/LaunchDaemon/org.apache.httpd.plist) to set the soft and hard limits. The com.apple.metadata.mds.plist gives an example of how to increase the NumberOfFiles limit to 1024.

Jun 26, 2008 4:55 AM in response to Dean Huxley

Thanks for the tip Dean Huxley 🙂 There is a typo in the location of your post, it should be /System/Library/LaunchDaemonS/org.apache.httpd.plist (there is an 's' on the end of LaunchDaemons).

The only concern we have with this solution is that files in /System/Library/LaunchDaemons are installed by Apple and can/will be overwritten on updates/upgrades?

Is there another solution to avoid this happening and only raise the limits for a single service?

Jun 26, 2008 6:31 PM in response to ILBT

Well, in theory, you might be able to copy the plist file to /Library/LaunchDaemons and disable the System one, but I suspect you wouldn't be able to use serveradmin to start and stop apache, etc. (I haven't tried it).

I really don't find it that hard to overwrite/patch the plist file after an upgrade. It probably wouldn't be too hard to write a script to do it. I normally just do a 'diff -r' with the LaunchDaemons directory compared against a time machine backup to see if anything changed after an update, then I put in my own changes if needed.

This thread has been closed by the system or the community team. You may vote for any posts you find helpful, or search the Community for additional answers.

420 Too Many Open Files

Welcome to Apple Support Community
A forum where Apple customers help each other with their products. Get started with your Apple Account.