Okay, this question covers a range of topics.
First, when posting questions that involve code, it helps greatly to post the code, ideally as a concise reproducer. That is what started all this off, and you didn’t post the hunk of code that stalled things. Yes, finding that hunk can involve a binary search within the code, with print-based or other debugging, too. But code and errors matter.
Here is how Python deals with different file encodings.
https://docs.python.org/3/howto/unicode.html
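A minimal Python sketch of the decoding side, assuming the bytes are already in hand (open(path, encoding="utf-8") performs the same decode for a file): the right codec gives the text back, the wrong codec can silently produce mojibake, and invalid UTF-8 either raises or becomes U+FFFD depending on the errors= argument.

```python
# "café" encoded as UTF-8, as it might come from a file.
raw = "café".encode("utf-8")  # b'caf\xc3\xa9'

# Decoding with the right codec recovers the original text.
good = raw.decode("utf-8")

# Decoding with the wrong codec doesn't fail here; it yields mojibake ("cafÃ©").
wrong = raw.decode("iso-8859-1")

# The same word encoded as Latin-1 is not valid UTF-8.
latin1_bytes = "café".encode("iso-8859-1")  # b'caf\xe9'

# Strict decoding (the default) raises UnicodeDecodeError...
try:
    latin1_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# ...while errors="replace" substitutes U+FFFD for the bad byte.
replaced = latin1_bytes.decode("utf-8", errors="replace")
```

Which errors= mode you want depends on whether you'd rather stop at bad input or keep going and look for the replacement characters afterward.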
This is an overview of Unicode, which most of us will inevitably need to know more about than we would prefer:
https://tonsky.me/blog/unicode/
There can be some great “fun” awaiting in a UTF-8 or other Unicode file. An OCR tool usually won’t give you a gremlin character (an invisible character of some sort), for instance, but other tools will:
https://www.thelinuxrain.org/articles/hunting-gremlin-characters
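If you’d rather hunt gremlins from Python, here is a rough sketch using only the standard unicodedata module. The find_gremlins name and the choice of which Unicode categories count as “invisible” are my own assumptions, not anything from the article above; adjust to taste.

```python
import unicodedata

def find_gremlins(text):
    """Return (index, codepoint, name) for likely-invisible characters.

    Flags format characters (category Cf, e.g. zero-width space, BOM),
    control characters (Cc) other than tab/newline/carriage return,
    and space separators (Zs) other than the ordinary ASCII space.
    """
    hits = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        if (cat == "Cf"
                or (cat == "Cc" and ch not in "\t\n\r")
                or (cat == "Zs" and ch != " ")):
            # Control characters have no Unicode name, hence the default.
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits

# A zero-width space after "total" and a no-break space before "kg".
sample = "total\u200b = 42\u00a0kg\n"
for hit in find_gremlins(sample):
    print(hit)
```

Both characters look like nothing (or like an ordinary space) in most editors, which is exactly why they’re worth listing by index and code point.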
I’m presuming you used Apple Live Text for the OCR here, so Cyrillic seems an odd result if English was otherwise being detected, but this also wouldn’t be the first time that some wad of ML hallucinated. Apple doesn’t offer controls over this detection, either. OCR ~always leaves some rubbish.
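As a sketch for spotting OCR lookalikes: Python’s standard library has no Unicode script property, so this hypothetical helper uses the prefix of each letter’s Unicode name as a stand-in. That works for Cyrillic (“CYRILLIC SMALL LETTER O” and so on), but it is a heuristic, not a real script lookup.

```python
import unicodedata

def find_script_intruders(text, script="CYRILLIC"):
    """Return (index, char, name) for letters whose Unicode name starts
    with the given script prefix, e.g. Cyrillic lookalikes in Latin text."""
    return [
        (i, ch, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch.isalpha() and unicodedata.name(ch, "").startswith(script)
    ]

# Reads as "Hello", but the "e" (U+0435) and "o" (U+043E) are Cyrillic.
ocr_output = "H\u0435ll\u043e"
print(find_script_intruders(ocr_output))
```

Such strings compare unequal to the “same” Latin text, which is a classic way for code to stall without any visible cause.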
BBEdit can show the file’s character encoding in the status bar, when that is enabled. I’d be shocked if it couldn’t also search for non-ISO Latin-1 characters.
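Whatever BBEdit offers, the same search is easy in Python: any character above U+00FF has no ISO Latin-1 representation. A sketch, with a made-up helper name:

```python
import re

# Anything outside U+0000..U+00FF cannot be represented in ISO Latin-1.
NON_LATIN1 = re.compile(r"[^\x00-\xFF]")

def non_latin1_positions(text):
    """Return (line_number, column, char) for each non-Latin-1 character.
    Line numbers and columns are 1-based, as editors usually show them."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in NON_LATIN1.finditer(line):
            hits.append((lineno, m.start() + 1, m.group()))
    return hits

print(non_latin1_positions("naïve café\nprice: 5 €\n"))
```

Here “ï” and “é” pass because they exist in Latin-1, while the euro sign does not.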
I would be exceedingly cautious about getting Office, LibreOffice, or similar in the mix when programming, as you’ll need them to output plain text files only, and not any of the Office formats. BBEdit, vim, emacs, pico, etc., all work with plain text files.
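A crude check for whether a supposed text file is actually an Office container, assuming only the well-known magic numbers: .docx, .xlsx, .odt and friends are ZIP archives, and legacy .doc/.xls files are OLE2 compound documents. This flags any ZIP file, so treat it as a hint rather than a verdict.

```python
ZIP_MAGIC = b"PK\x03\x04"                          # .docx/.xlsx/.odt are ZIP containers
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # legacy .doc/.xls

def looks_like_office_container(head: bytes) -> bool:
    """True if the first bytes of a file match a ZIP or OLE2 signature."""
    return head.startswith(ZIP_MAGIC) or head.startswith(OLE2_MAGIC)
```

In use, read only the first eight bytes in binary mode, e.g. `with open(path, "rb") as f: looks_like_office_container(f.read(8))`.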
I usually use xxd when looking for “fun” characters in files, and xxd is reversible: you can hex dump a data file (or a text file), edit the hex dump, and then run xxd -r to convert the patched dump back into the original file format. hexdump or other tools can also be used.
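If xxd isn’t handy, the same dump, edit, and reverse loop can be sketched in Python. The layout is xxd-like with one byte per group, not a byte-for-byte match of xxd’s own output, and the function names are hypothetical. (bytes.hex with a separator needs Python 3.8 or later.)

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Return an xxd-like dump: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = chunk.hex(" ")
        # Non-printable bytes (including all UTF-8 multibyte pieces) show as ".".
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)

def unhexdump(dump: str) -> bytes:
    """Reverse hexdump() by re-reading only the hex column of each line."""
    out = bytearray()
    for line in dump.splitlines():
        # Skip "00000000: " (10 chars); the hex column ends at the first double space.
        hex_part = line[10:].split("  ", 1)[0]
        out += bytes.fromhex(hex_part)
    return bytes(out)

data = "Hi\u200b!\n".encode("utf-8")
print(hexdump(data))
```

The zero-width space shows up as `e2 80 8b` in the hex column and as three dots in the text column; editing those hex bytes and running unhexdump gives back the patched bytes.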
BBEdit can do various conversions and searches as well:
https://apple.stackexchange.com/questions/408181/changing-character-encoding-from-unicode-to-ascii
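For the Unicode-to-ASCII direction, a common Python approach (not necessarily what BBEdit does) is compatibility decomposition (NFKD) followed by dropping whatever still isn’t ASCII. The to_ascii name is just for illustration.

```python
import unicodedata

def to_ascii(text: str) -> str:
    """Approximate ASCII: decompose with NFKD, then drop non-ASCII leftovers.

    Accented Latin letters keep their base letter ("é" -> "e"), ligatures
    like "ﬁ" become "fi", and characters with no ASCII decomposition
    (e.g. "€", Cyrillic letters) are dropped entirely.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")

print(to_ascii("café naïve ﬁle"))
```

Like the iconv route, this is lossy; it just loses less for accented Latin text.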
I’d be surprised if BBEdit couldn’t somehow highlight ranges too, but I don’t use that editor heavily.
Using file (as mentioned above) and grep for file format spelunking:
https://unix.stackexchange.com/a/474812
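Roughly what file does for text, as a Python sketch under simple assumptions: check for a byte-order mark, then see whether the bytes are valid ASCII or UTF-8. The real file command uses far more heuristics than this, and the return labels here are my own.

```python
import codecs

def sniff_encoding(data: bytes) -> str:
    """Rough guess: BOM first, then ASCII validity, then UTF-8 validity."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown (not UTF-8; maybe a legacy 8-bit encoding)"
```

Note that valid UTF-8 is only evidence, not proof, and that a Latin-1 file will land in the “unknown” bucket here.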
The following converts a UTF-8 file into an ISO Latin-1 (~ASCII) file, though lots of UTF-8 isn’t present in ISO Latin-1; with -c those characters get vaporized, and without -c iconv reports an error instead. Do not specify the same file name for input and for output. (Long switches shown; -f and -t will work, too.)
iconv -c --from-code=utf-8 --to-code=iso-8859-1 < utf8.txt > latin1.txt
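The same lossy conversion in Python, as a sketch for valid UTF-8 input: errors="ignore" plays the role of iconv’s -c, and errors="strict" raises instead. The function and parameter names are hypothetical.

```python
def utf8_to_latin1(data: bytes, drop_unmappable: bool = True) -> bytes:
    """Convert UTF-8 bytes to ISO Latin-1 bytes.

    With drop_unmappable=True, characters that have no Latin-1 equivalent
    are silently dropped (like iconv -c); otherwise UnicodeEncodeError.
    """
    text = data.decode("utf-8")
    errors = "ignore" if drop_unmappable else "strict"
    return text.encode("iso-8859-1", errors=errors)

print(utf8_to_latin1("café €5".encode("utf-8")))
```

For files, read with Path("utf8.txt").read_bytes() and write the result to a different path, for the same reason the iconv example uses separate input and output names.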
macOS and common Apple tools deal reasonably well with UTF-8 in most spots, including in the command shell, but other UNIX platforms can have issues, particularly when the locale isn’t set to a UTF-8 one.
TL;DR: run xxd and look.