You seem to be new to FCP X. There are no tracks in the application, so there is no dedicated video or audio track. Again: there are no tracks!
What you describe as the "video" track is what is called the "primary storyline". It is the backbone of your story. Everything else will be connected to it.
You need to familiarize yourself with how the application works. FCP X is very different from other NLEs, and that can be confusing at first; but if you learn how it works (and don't try to force it to work as other apps do), you are likely to find that it uses a superior paradigm.
My suggestion is to take some time and follow the excellent introductory course by Izzy Video, which is free but professionally done. Don't get lost in the myriad YouTube videos, most of which are of low technical or pedagogical value and often describe obsolete versions.
http://training.izzyvideo.com/courses/final-cut-pro-x-tutorial
Going back to your original question: depending on what you are doing, it can make a lot of sense to put the audio in the primary storyline (for a music video, for example) or to put the video there, which would likely make more sense when you are doing voice-over. In that case, you would have the voice-over clips connected to the primary storyline: