remove non-readable characters from a text file

I've created a multi-row, comma-separated text file in which each row contains a number of numerical values that, when imported, inform the creation of a family in Autodesk Revit. When I attempted to import the values from the text file, Revit produced an error message for a value that was not in the text file. Someone on the Revit support forum, using Notepad, found a "non-readable" character in one of the rows that was the culprit. Is there a way to find and eliminate these "non-readable" characters using TextEdit or some other Apple program?

Earlier Mac models

Posted on Apr 12, 2023 1:18 PM

Question marked as Top-ranking reply

Posted on Apr 13, 2023 8:28 AM

ctc123 wrote:

"man strings" and "xxd {filename} | less" are for use in Terminal?


Short answer:


Yes; Terminal app, and the command line.


Long answer:


The macOS GUI doesn’t have built-in tooling for this case, though there are apps that can be downloaded or purchased. For tasks such as this particular investigation, Hex Fiend works well, and is free. The Hex Fiend app, like xxd at the command line, lets you see exactly what characters are in the file.
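
As a rough Terminal sketch of that kind of inspection: the commands below list the rows that contain any byte outside printable ASCII, then dump the file byte by byte. The file name sample.csv and its contents are made up for illustration, and od is used as a stand-in for xxd only because it is present on every POSIX system.

```shell
# Hypothetical sample: the second row hides a UTF-8 non-breaking space (bytes C2 A0)
printf '1,2,3\n4,\302\2405,6\n7,8,9\n' > sample.csv

# Collect the line numbers of rows containing any byte outside space..tilde.
# LC_ALL=C makes grep compare raw bytes; -a keeps it from treating the file as binary.
suspect_lines=$(LC_ALL=C grep -a -n '[^ -~]' sample.csv | cut -d: -f1)
echo "suspect rows: $suspect_lines"

# Dump every byte; anything shown as an octal number is not printable ASCII
od -c sample.csv
```

On a real file, "xxd {filename} | less" gives a similar byte-by-byte view with hex on the left and printable characters on the right.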


Akin to Notepad++ at the Windows GUI, BBEdit is an excellent text editor. The slightly de-tuned version of BBEdit is free.


What’s a text editor? A text editor is an app that can read and write a plain text file. Tools such as Apple Pages or Microsoft Word use their own formats, and are either incapable of, or clunky at, reading and writing plain text files.


CSV is a plain text file. What characters can be included in a plain text file—and for this case, which of those characters will be tolerated by apps reading the CSV file—varies. Which is why we’re here.


And again, CSV deflagrates into flaming hot garbage of corner cases, just as soon as you try to get serious with it.


PS: I’d likely use WSL on Windows too, and not Notepad++. WSL gets you a very capable command line and associated tooling on Windows, all of which will be familiar to Unix, Linux, and macOS command line users.

11 replies

Apr 12, 2023 1:52 PM in response to ctc123

One would need to know what that "unreadable" character is before filtering it. It might be two contiguous commas, or it might be something else. Did the Revit support team mention what "non-readable" character they found?


Notepad would suggest that the Revit support person was on Windows, not macOS, and that might imply that the text file has the wrong line endings: CRLF rather than the LF expected on UNIX.
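
A minimal sketch of how one might check for CRLF line endings and strip them in Terminal. The file names crlf_sample.csv and lf_sample.csv are hypothetical, and the sample content is invented for the demonstration.

```shell
# Hypothetical sample with Windows (CRLF) line endings
printf 'a,1\r\nb,2\r\n' > crlf_sample.csv

# Count the lines that contain a carriage return
cr=$(printf '\r')
crlf_count=$(grep -c "$cr" crlf_sample.csv)
echo "lines containing a carriage return: $crlf_count"

# Write a copy with every carriage return removed, leaving plain LF endings
tr -d '\r' < crlf_sample.csv > lf_sample.csv
```

Writing to a second file rather than editing in place keeps the original available for comparison.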


One may be able to read that text file and allow only numbers, commas, and line feeds on any given row. Any double quotes used in the file?

Apr 12, 2023 2:44 PM in response to ctc123

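Let's say that text file is named foo.txt and is on your Desktop. Launch the Terminal application. The following sed syntax removes anything that is not a digit or a comma from each line of the file; the newlines are kept. It works in place, so the filtered file keeps the same name and the original file is backed up as foo.txt.bak. The iconv step then writes a second copy, foo_utf8.txt, whose character set is forced to UTF-8. You end up with the filtered foo.txt and foo_utf8.txt. These may be identical, but they make a good test for the Autodesk Revit import process. Note that this filter also strips decimal points, letters, and spaces, so add any characters the file legitimately needs to the bracket expression.


cd ~/Desktop
# Filter foo.txt in place (original saved as foo.txt.bak), then write a UTF-8 copy
/usr/bin/sed -i .bak 's/[^[:digit:],]//g' foo.txt && /usr/bin/iconv -t UTF-8 ./foo.txt > foo_utf8.txt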




Apr 12, 2023 6:37 PM in response to VikingOSX

Thanks, VikingOSX, that is very helpful of you. The txt file has a top line that defines the parameters below. Here it is.


,Bar Diameter##length##inches,Length##length##inches,Keynote##other##,Cover Beams & Columns##length##inches,Cover Other##length##inches,Cover in Contact with Ground##length##inches

#_3,0.3750,15.0000,03 21 00.A1,2,1.5,3

#_4,0.5000,16.0000,03 21 00.B2,2,1.5,3

#_5,0.6250,25.0000,03 21 00.C3,2,1.5,3

#_6,0.7500,30.0000,03 21 00.D3,2,1.5,3

#_7,0.8750,35.0000,03 21 00.E1,2,1.5,3

#_8,1.0000,40.0000,03 21 00.F1,2,1.5,3

#_9,1.1250,45.0000,03 21 00.G1,2,1.5,3

#_10,1.2500,50.0000,03 21 00.H1,2,1.5,3

#_11,1.3750,55.0000,03 21 00.I1,2,1.5,3

#_14,1.7500,60.0000,03 21 00.J1,2,1.5,3

#_18,2.2500,65.0000,03 21 00.K1,2,1.5,3
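
Given rows like these, which contain letters, #, _, spaces, and decimal points, a digits-and-commas-only filter would remove data the import needs. One possible alternative, sketched here under the assumption that the file should hold nothing but printable ASCII and line feeds, deletes only the bytes outside that set. The file names rebar.txt and rebar_clean.txt are hypothetical, and the hidden zero-width space is invented for the demonstration since the thread never identifies the actual character.

```shell
# Hypothetical row modeled on the sample above, with a stray UTF-8
# zero-width space (bytes E2 80 8B) hidden before the second comma
printf '#_3,0.3750\342\200\213,15.0000,03 21 00.A1,2,1.5,3\n' > rebar.txt

# Delete every byte except newline (octal 12) and printable ASCII
# (octal 40 through 176), writing to a new file so the original is untouched
LC_ALL=C tr -cd '\12\40-\176' < rebar.txt > rebar_clean.txt

cat rebar_clean.txt
```

This also drops tabs and carriage returns, and it would strip any legitimate accented or other non-ASCII characters, so it only suits files that are meant to be plain ASCII.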

