It may be possible, but I don't think I would try to do this in the Custom template. As you've noted, there is really very limited control over the timing.
If you have Motion, you could create a stack of text boxes with the text sequence behavior (create one and duplicate the rest in new layers). You would need to note the timecode (TC) of each word in the talent's dialogue, then set in and out points for each layer so that the words in a sentence (or line of text) appear sequentially and persist.
A slightly easier way might be simply to stack basic title clips right within FCP, one title clip per word, staggering each clip's position according to its word's place in the line of text. Alignment would be adjusted by sight.
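Either way, the staggering is just arithmetic on your timecode notes: each word's clip starts at the frame where the word is spoken, and every clip in the line runs to the same out point so the words persist. Here is a rough sketch of that calculation in Python, not anything FCP or Motion provides. The function names, the sample words and timecodes, and the 30 fps non-drop-frame rate are all my own assumptions, so change them to match your project.

```python
FPS = 30  # assumed project frame rate (non-drop-frame); adjust as needed


def tc_to_frames(tc, fps=FPS):
    """Convert an HH:MM:SS:FF timecode string to an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def stagger(words, line_out, fps=FPS):
    """Return (word, in_frame, duration_frames) for each word in a line.

    words    -- list of (word, timecode where the word is spoken)
    line_out -- timecode where the whole line should disappear
    """
    out_frame = tc_to_frames(line_out, fps)
    plan = []
    for word, tc in words:
        in_frame = tc_to_frames(tc, fps)
        # Every clip ends at the same frame, so earlier words last longer.
        plan.append((word, in_frame, out_frame - in_frame))
    return plan


# Hypothetical notes taken while listening to the dialogue.
notes = [
    ("Good", "00:00:10:00"),
    ("luck", "00:00:10:12"),
    ("everyone", "00:00:11:03"),
]

for word, in_frame, duration in stagger(notes, "00:00:13:00"):
    print(f"{word}: start at frame {in_frame}, duration {duration} frames")
```

The start frame is where you would place each title clip (or layer in point), and the duration is how long to make it so that all the words drop out together.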
Obviously, it's a fiddly process and there will probably be a post by someone who has superior graphics skills and who will give you some better ideas.
If you're going to do a lot of this, consider third-party open caption software. Options range from very expensive packages to donationware.
Good luck.
Russ