It sounds like you recorded a live event. In these instances there are a few things to remember.
1. Always record directly from the main mic if possible
A. When not possible, be prepared to do some audio cleaning
Ultimately, you won't completely remove free radicals (stray sounds) like babies crying. They sit in the vocal range, which is where your desired audio lives as well. It's not like separating strands of hair or untangling a rope: the same electrical signal that carries the crying also carries the audio you want. The recording is one complex waveform, the sum of many component waves at varying amplitudes, all happening at the same instant.
Audition has a spectral editor that can help you find the region where the crying lives and blend it out, but any selection will also grab other sounds that share those frequencies. Simply erasing the selection can damage your audio. The better approach is to lower the volume in those areas gradually until the crying is at its least intrusive; the change in your overall mix will be less noticeable that way.
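If it helps to see the "turn it down gradually instead of erasing it" idea in code, here is a minimal sketch. It is a simplification: it works on a plain list of samples in the time domain only (Audition's spectral editor also lets you limit the change to a frequency band), and the function name, the sample positions and the 12 dB figure are made up for illustration.

```python
def duck_region(samples, start, end, reduction_db, fade):
    """Return a copy of samples with [start, end) turned down by reduction_db.

    The gain ramps linearly (in linear gain, not dB) over `fade` samples on
    each side of the region, so the change is gradual rather than a hard cut.
    `fade` must be greater than zero.
    """
    floor = 10 ** (-abs(reduction_db) / 20)  # linear gain inside the region
    out = []
    for i, s in enumerate(samples):
        if i < start:
            d = start - i          # samples before the region
        elif i >= end:
            d = i - (end - 1)      # samples after the region
        else:
            d = 0                  # inside the region
        t = max(0.0, 1.0 - d / fade)  # 1 inside, fading to 0 outside
        out.append(s * (1.0 + (floor - 1.0) * t))
    return out


# Hypothetical example: a constant signal with a "cry" between samples 40 and 60.
signal = [1.0] * 100
ducked = duck_region(signal, start=40, end=60, reduction_db=12, fade=10)
print(ducked[0], ducked[35], ducked[50])
```

Samples far from the region are untouched, samples inside it sit 12 dB lower, and the ones in between ride the ramp.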
Here are methods for other apps.
Simple--
Using a parametric EQ, you can find the partials where the sounds live in your mix and cut them. You'll need several EQ plugins, and you should target the low and high frequencies separately at first. Then I'd write the settings down and use a single EQ for each separate incident of baby cries, plus one main EQ for the AC unit. Using automation, you can switch them on and off as needed, or apply them over a wider area so the change is less noticeable. It is similar to the Audition method, but slightly less targeted.
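For anyone curious what one band of a parametric EQ is doing underneath, here is a rough sketch of a peaking filter using the widely published "Audio EQ Cookbook" biquad formulas. This is not what any particular plugin runs; the 3 kHz centre, the 9 dB cut, the Q of 4 and the function names are all assumptions chosen for illustration.

```python
import cmath
import math


def peaking_eq(f0, gain_db, q, fs):
    """Peaking-EQ biquad coefficients (b0, b1, b2, a1, a2), normalised so a0 = 1."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha / amp
    return (
        (1 + alpha * amp) / a0,
        -2 * math.cos(w0) / a0,
        (1 - alpha * amp) / a0,
        -2 * math.cos(w0) / a0,
        (1 - alpha / amp) / a0,
    )


def response_db(coeffs, f, fs):
    """Gain of the filter, in dB, at frequency f."""
    b0, b1, b2, a1, a2 = coeffs
    z = cmath.exp(-1j * 2 * math.pi * f / fs)  # one sample of delay at f
    h = (b0 + b1 * z + b2 * z * z) / (1 + a1 * z + a2 * z * z)
    return 20 * math.log10(abs(h))


# Hypothetical settings: a narrow 9 dB cut at 3 kHz, 48 kHz sample rate.
cry_cut = peaking_eq(f0=3000, gain_db=-9, q=4, fs=48000)
for freq in (100, 1500, 3000, 6000):
    print(freq, "Hz:", round(response_db(cry_cut, freq, 48000), 2), "dB")
```

The cut is deepest at the centre frequency and fades toward 0 dB on either side, which is why a narrow band (high Q) takes out less of the voice than a wide one.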
Exhaustive methods--
Method 1 (experimental): When you have to use another method to mic, you have to use another method to record. In digital, they say you have virtually infinite headroom. That is true in the sense that your audio can be as electrically powerful as the hardware can handle, but you won't necessarily want to hear it. Digital also has another drawback: extreme sensitivity. You will capture everything. Mic placement is key, and so is mic choice. Capture at as low a volume as possible. A high whisper should barely register electrically; that way, anything quieter may not show up in your signal at all, because it isn't powerful enough to cause a meaningful change in the signal. It's an old analogue technique, mostly for killing transients, but it is very useful in the digital world.
Got a clue yet? One method I've used for killing this kind of noise is repeated signal drop and boost. Drop the volume fader to the point where those free radicals barely register at all (one tiny bar if possible), mark them, and set your cycle range to their spot. Now send the signal out through an output interface and back in through an input interface, set it to loop playback, set the fader in Logic to 0, and turn on input monitoring (the "i" icon). You can then use your interface as a new "Digimic" style cleaner by carefully reducing the signal until the free radical disappears. When the free radical no longer registers, stop playback, go back to your start, and record straight through.
Now apply gain until you get your loudness back. Then apply a healthy dose of expansive compression: using a compressor plugin or hardware, set the input boost to a negative value, set the threshold down at your lowest volume area (like the silence between words), and set the ratio to a fractional value to stretch the dynamic range. Finally, use your fader to bring the overall volume into a better range. Expansive compression works opposite to compression.
The fractional value means: "For every 0.x dB of input, output 1 dB." It is a multiplier. At 0.75 you get 133% of the signal range, at 0.5 you get 200%, and at 0.25 you get 400%. Because you apply this to the top end of the volume range, the parts of the signal below that range end up farther from audible when you recompress using a threshold that includes them: you drop them by the same amount as everything else, so they end up less audible. Remember, around -40 to -50 dB you really shouldn't hear anything, so any sounds that sit there will be almost unnoticeable. If you can push your free radicals down to a level between -50 and -60 dB, nobody will hear them even when they turn up their speakers. Bottom line: you won't remove them entirely, but they won't be intrusive.
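The arithmetic is easier to follow with numbers, so here is a small sketch of it. It assumes the ratio is applied to whatever sits above the threshold, measured in dB; how a real compressor treats a ratio below 1 varies from unit to unit. The levels, threshold and function names are invented for the example.

```python
def stretch_factor(ratio):
    """How much a fractional ratio stretches the range: 0.5 gives 2.0 (200%)."""
    return 1.0 / ratio


def expand_db(level_db, threshold_db, ratio):
    """Stretch the part of a level that sits above the threshold (all in dB)."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


# Hypothetical levels: voice at -30 dB, a baby cry at -50 dB,
# threshold down in the silence at -60 dB, ratio 0.5.
threshold, ratio = -60.0, 0.5
voice, cry = -30.0, -50.0

voice_out = expand_db(voice, threshold, ratio)
cry_out = expand_db(cry, threshold, ratio)

# Pull the fader down so the voice lands back where it started.
trim = voice - voice_out
print("gap before:", voice - cry, "dB")
print("gap after: ", voice_out - cry_out, "dB")
print("cry after trim:", cry_out + trim, "dB")
```

The voice ends up at the same level it started at, but the gap between voice and cry has doubled, so the cry sits well below where it was.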
Method 2 (when all else fails): Alternatively, you can play the audio through a very clean-sounding set of speakers and re-record it with clean microphones, leaving the other sounds behind. It isn't perfect, but you can usually apply some EQ to get warmer-sounding audio, and it saves you much of the expansive compression. I've used this method once or twice with decent results. It requires that you know the SPL at which your mics start to react, and the electrical level at which your speakers start to produce any SPL. If you can keep your desired audio above the SPL where the mics react, and your free radicals below it, you'll get better results. But the bottom line here is that you probably won't remove them entirely.
Before the problem starts:
I record live events often enough that I have to contend with the same kind of free radical audio from the room. One method I've found to help with this is phase correction. I use a set of "ROOM MICS" to capture the audience. These mics face away from the stage and are out of phase with the stage mics. With a phase shift plugin, I can slowly alter the phase until much of the audience sound drops out or thins out.
In Audition, I could take every stage track and run noise cancellation for the audience, but that affects the entire volume level and squashes frequencies I don't want squashed. In Logic, a simple additive phase adjustment affects the same frequencies, but with better results. Rather than squashing particular frequencies, it acts on all of them at once, and by adjusting the phase I can selectively add them back or remove them. By keeping my room mics at a low level, I avoid affecting the sound-stage as much.
If you want to get exhaustive, I've even used expansion and compression on the tracks to keep the volume adjustment on the far end consistent and less noticeable. Think in terms of coincident, complex waveforms happening in the same instant. When the higher amplitudes are brought down toward the lower amplitudes and then added to the opposite phase, the sum changes by less than it would have before you compressed the amplitudes. Thinking this way, you can figure out how to make your free radicals less intrusive, and possibly even useful in your audio to give it a more live feel. Thought I'd throw this in there; it might help, it might not.
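Here is a back-of-the-envelope sketch of why sweeping the phase thins the audience out. It is heavily idealised: it treats the audience bleed as a single steady tone, whereas a real crowd is broadband and reaches each mic at a different time, so in practice you only get partial cancellation. The amplitudes and the function name are assumptions for illustration.

```python
import cmath
import math


def residual(bleed, room_gain, phase_deg):
    """Amplitude left over when a room-mic tone is summed with the same tone
    bleeding into a stage mic, with the room mic shifted by phase_deg."""
    shifted = room_gain * cmath.exp(1j * math.radians(phase_deg))
    return abs(bleed + shifted)


# Hypothetical levels: audience bleed of 0.3 in the stage mic.
bleed = 0.3
for phase in (0, 90, 150, 180):
    full = residual(bleed, room_gain=0.3, phase_deg=phase)  # room mic matched to the bleed
    low = residual(bleed, room_gain=0.1, phase_deg=phase)   # room mic kept low
    print(phase, "deg:", round(full, 3), round(low, 3))
```

With the room mic matched in level, the tone drops out as the shift approaches 180 degrees. With the room mic kept low, the tone only thins out, but every setting along the way is also a smaller change to the overall sound.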