MattJayC

Q: Convert csv file from us-ascii to UTF-8

I'm just trying to convert a file. I run this in Terminal:

 

StudioA:~ StudioA$ file -I /Users/StudioA/Desktop/Mikey_WK37.csv 

and the result is

/Users/StudioA/Desktop/Mikey_WK37.csv: text/plain; charset=us-ascii


So I run this

iconv -f US-ASCII -t UTF-8 /Users/StudioA/Desktop/Mikey_WK37.csv > /Users/StudioA/Desktop/new_file.csv

Then when I check the new file, it is still reported as us-ascii.


Where am I going wrong?

Mac Pro, OS X El Capitan (10.11.1)

Posted on May 5, 2016 6:29 AM

  • by Tom Gewecke,

    Tom Gewecke May 5, 2016 6:53 AM in response to MattJayC
    Level 9 (79,035 points)

    There is no difference between us-ascii and UTF-8 for a file that contains only ASCII characters. The former is just a subset of the latter.
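A minimal Python 3 sketch of this point, assuming a made-up sample row (the helper name `same_bytes_in_both` is hypothetical): text made only of ASCII characters encodes to exactly the same bytes in US-ASCII and in UTF-8, so a conversion between the two has nothing to change.

```python
# Sketch: pure-ASCII text produces identical bytes in US-ASCII and UTF-8.
# The sample row below is made up for illustration.

def same_bytes_in_both(text):
    """Return True if ASCII-only text encodes to the same bytes both ways.

    Raises UnicodeEncodeError if text contains a non-ASCII character.
    """
    return text.encode("ascii") == text.encode("utf-8")


sample = "name,week,done\nMikey,37,yes\n"
print(same_bytes_in_both(sample))  # ASCII-only text, so the bytes match
```

This is why a tool that inspects only the bytes cannot tell the two apart after the conversion.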

  • by MattJayC,

    MattJayC May 5, 2016 7:50 AM in response to Tom Gewecke
    Level 1 (5 points)
    Mac OS X

    Can I still convert it, or at least change the character set? I have a script that uses the UTF-8 set.

  • by MattJayC,

    MattJayC May 5, 2016 8:40 AM in response to Tom Gewecke
    Level 1 (5 points)
    Mac OS X

    The line in the script is

     

    if existsCSV then set o's csvText to paragraphs of (read checkListFile as «class utf8») -- get the contents of the CSV file ***

    It won't let me change it to us-ascii

  • by Tom Gewecke, Solved answer

    Tom Gewecke May 5, 2016 9:09 AM in response to MattJayC
    Level 9 (79,035 points)

    Does your csv file actually include a statement saying it is us-ascii?  If so you could just change that to utf-8.

  • by MattJayC,

    MattJayC May 5, 2016 1:53 PM in response to Tom Gewecke
    Level 1 (5 points)
    Mac OS X

    If I create a plain-text file in TextEdit, type 1 or whatever, save it as 'Western (Mac OS Roman)', and then run file -I theFile, it is reported as us-ascii.

     

    This is how most of the files I get look. I need to use an extra character, "✔", and when you add that to the file, it then asks for it to be saved in another encoding (such as UTF-8).
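A small Python 3 sketch of why the check mark forces a different encoding (the helper name `encodable` is hypothetical): the character U+2714 has no representation in US-ASCII or Mac OS Roman, but it does in UTF-8.

```python
# Sketch: the check mark U+2714 has no US-ASCII or Mac OS Roman encoding,
# which is why an editor has to offer another encoding such as UTF-8.

def encodable(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


check = "\u2714"  # the check mark character from the post
for name in ("ascii", "mac_roman", "utf-8"):
    print(name, encodable(check, name))
print(check.encode("utf-8"))  # three bytes in UTF-8
```

Once a file contains this character and is saved as UTF-8, the bytes are no longer pure ASCII, so a byte-level check has something to detect.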

     

    Tom Gewecke wrote:

     

    Does your csv file actually include a statement saying it is us-ascii?  If so you could just change that to utf-8.

    If I can just change it, how can I do that in Terminal?

     

    Thanks

     

    Matt

  • by Tom Gewecke,

    Tom Gewecke May 5, 2016 1:58 PM in response to MattJayC
    Level 9 (79,035 points)

    How about just replacing the text "us-ascii" with "utf-8"?

  • by MattJayC,

    MattJayC May 5, 2016 2:02 PM in response to Tom Gewecke
    Level 1 (5 points)
    Mac OS X

    Excuse the "correct answer" mark; I accidentally clicked it.

     

    I still don't know how to just change the text.

  • by Tom Gewecke,

    Tom Gewecke May 5, 2016 2:24 PM in response to MattJayC
    Level 9 (79,035 points)

    Sorry, I don't know much about doing such things in terminal.  Isn't there some kind of find/replace operation you can do on a file's contents?

  • by VikingOSX, Helpful

    VikingOSX May 6, 2016 1:01 AM in response to MattJayC
    Level 7 (20,591 points)
    Mac OS X

    Here is a Python program that reads in a US-ASCII CSV and writes it out as a UTF-8, quoted-field, Excel-compatible CSV. The output opens and formats nicely in Numbers v3.6.1 and LibreOffice Calc v5.1.2.2.

     

    Usage: ucsv.py input.csv output.csv

     

    Copy and paste the following Python code into a programmer's editor. If it is Sublime Text 3, then use Paste and Indent. Otherwise, paste into a TextEdit plain text file, and save as ucsv.py. Make the Python script executable in the Terminal (chmod +x ucsv.py).

     

    Test this on a small CSV and open it in a spreadsheet application to verify it works ok for you.

     

    Code:

    #!/usr/bin/env python
    # coding: utf-8
    '''
    ucsv.py

    Read in a US-ASCII CSV document and write out a quoted field Excel CSV.
    Output CSV read correctly by Numbers v3.6.1, LibreOffice Calc 5.1.2.2.
    Requires Python 2.7 (uses cStringIO, unicode, and next()).

    Usage: ucsv.py us-ascii-input.csv utf8_output.csv
    Derived from : https://docs.python.org/2.7/library/csv.html#examples
    http://stackoverflow.com/questions/17245415/read-and-write-csv-files-including-unicode-with-python-2-7
    '''

    import csv
    import codecs
    import cStringIO
    import os
    import sys


    class UTF8Recoder:
        '''Iterator that decodes an input stream and re-encodes each line as UTF-8.'''
        def __init__(self, f, encoding):
            self.reader = codecs.getreader(encoding)(f)

        def __iter__(self):
            return self

        def next(self):
            return self.reader.next().encode("utf-8")


    class UnicodeReader:
        '''CSV reader that yields each row as a list of Unicode strings.'''
        def __init__(self, f, dialect=csv.excel, encoding="utf-8-sig", **kwds):
            f = UTF8Recoder(f, encoding)
            self.reader = csv.reader(f, dialect=dialect, **kwds)

        def next(self):
            '''next() -> list of unicode
            Read the next row and return its fields as Unicode strings.
            '''
            row = self.reader.next()
            return [unicode(s, "utf-8") for s in row]

        def __iter__(self):
            return self


    class UnicodeWriter:
        '''CSV writer that encodes rows of Unicode strings to the output stream.'''
        def __init__(self, f, dialect=csv.excel, encoding="utf-8-sig", **kwds):
            self.queue = cStringIO.StringIO()
            self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
            self.stream = f
            self.encoder = codecs.getincrementalencoder(encoding)()

        def writerow(self, row):
            '''writerow(list of unicode) -> None
            Format one row as CSV, encode it, and write it to the output.
            '''
            self.writer.writerow([s.encode("utf-8") for s in row])
            # Fetch the UTF-8 CSV text from the queue and re-encode it
            # into the target encoding.
            data = self.queue.getvalue()
            data = data.decode("utf-8")
            data = self.encoder.encode(data)
            self.stream.write(data)
            # Empty the queue for the next row.
            self.queue.truncate(0)

        def writerows(self, rows):
            for row in rows:
                self.writerow(row)


    if len(sys.argv) < 3:
        sys.exit("Usage: {} <ascii-csv> <utf-csv>".format(sys.argv[0]))

    # Expand ~ before testing the path, so that ~/file.csv is found.
    ascii_csv = os.path.expanduser(sys.argv[1])
    utf8_csv = os.path.expanduser(sys.argv[2])
    if not (os.path.exists(ascii_csv) and ascii_csv.endswith('.csv')):
        sys.exit("The input file does not exist or is not a .csv file.")

    with open(ascii_csv, 'rb') as fin, open(utf8_csv, 'wb') as fout:
        reader = UnicodeReader(fin)
        writer = UnicodeWriter(fout, quoting=csv.QUOTE_ALL)
        for line in reader:
            writer.writerow(line)
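The script above targets Python 2.7. As a hedged alternative, here is a minimal Python 3 sketch of the same idea; it is not VikingOSX's script, and the function name `convert_csv` is hypothetical. It relies on the standard `csv` module and on the `utf-8-sig` codec, which skips a BOM on input if one is present and writes one at the start of the output.

```python
# Sketch only: a Python 3 take on the same idea, not the script above.
import csv


def convert_csv(src_path, dst_path):
    """Copy a CSV, quoting every field and writing UTF-8 with a leading BOM."""
    # newline="" leaves line endings to the csv module, as its docs recommend.
    with open(src_path, newline="", encoding="utf-8-sig") as fin, \
         open(dst_path, "w", newline="", encoding="utf-8-sig") as fout:
        writer = csv.writer(fout, dialect=csv.excel, quoting=csv.QUOTE_ALL)
        for row in csv.reader(fin, dialect=csv.excel):
            writer.writerow(row)
```

Calling `convert_csv("input.csv", "output.csv")` with your own paths would write the quoted copy. As with the original script, test it on a small CSV first.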
    
  • by Tom Gewecke,

    Tom Gewecke May 5, 2016 4:45 PM in response to VikingOSX
    Level 9 (79,035 points)

    VikingOSX -- How would the output of this program differ from the input, since nothing changes when us-ascii is relabelled as utf-8?

  • by MattJayC,

    MattJayC May 6, 2016 1:02 AM in response to VikingOSX
    Level 1 (5 points)
    Mac OS X

    This does it now, many thanks. Hopefully it will run with no problem in the other script.

     

    Many Thanks

  • by VikingOSX,

    VikingOSX May 6, 2016 7:07 AM in response to Tom Gewecke
    Level 7 (20,591 points)
    Mac OS X

    Tom,

     

    Numbers v3.6.1 (and LibreOffice Calc) always reports that the default Export to CSV encoding is UTF-8. If only US-ASCII characters are used in the spreadsheet, then the UNIX file utility will report the CSV as ASCII text with CRLF line terminators. If the spreadsheet contains both US-ASCII and non-US-ASCII characters, then the UNIX file utility will identify the CSV as UTF-8 Unicode text with CRLF line terminators.
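The behavior described here can be approximated with a short Python 3 sketch. This is a rough stand-in for a byte-level charset guess, not how the file utility is actually implemented, and the function name `guess_charset` is hypothetical.

```python
# Sketch: a rough stand-in for a charset guess, NOT how file(1) is implemented.

def guess_charset(data):
    """Return 'us-ascii', 'utf-8', or 'unknown' for a bytes object."""
    try:
        data.decode("ascii")
        return "us-ascii"  # every byte is below 0x80
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"  # contains valid multi-byte sequences
    except UnicodeDecodeError:
        return "unknown"


print(guess_charset(b"1,2,3\r\n"))
print(guess_charset("1,\u2714,3\r\n".encode("utf-8")))
```

The point is that the label follows from the bytes in the file: an export declared as UTF-8 that happens to contain only ASCII characters still looks like ASCII.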

     

    The Python script does some more things, but in essence, it takes a pure US-ASCII CSV and re-encodes it so that the UNIX file utility now reports the UTF-8 message above. Rather pointless.

     

    In retrospect, and after some discovery on my part once I had posted the code, the script is unnecessary, as the current Numbers and LibreOffice Calc adapt the exported CSV encoding as the content demands.

  • by Tom Gewecke,

    Tom Gewecke May 6, 2016 8:45 AM in response to VikingOSX
    Level 9 (79,035 points)

    Thanks for the explanation.  I wonder if the script adds a BOM to the beginning of the file.  I remember now that some apps require that to recognize "utf-8", even if the content is really only ascii.

  • by VikingOSX,

    VikingOSX May 6, 2016 9:53 AM in response to Tom Gewecke
    Level 7 (20,591 points)
    Mac OS X

    The script does insert a BOM at the beginning of the output csv file. If I would just make good on my intent to acquire Office 2016 for Mac, I could test this output file's compatibility with Excel.
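For anyone who wants to check a file for the BOM mentioned here, a minimal Python 3 sketch follows (the helper name `has_utf8_bom` is hypothetical). It uses the standard `codecs.BOM_UTF8` constant and the `utf-8-sig` codec, which prepends the BOM when encoding.

```python
# Sketch: detect the UTF-8 byte order mark at the start of some bytes.
import codecs


def has_utf8_bom(data):
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


plain = "a,b\r\n".encode("utf-8")       # ASCII-only text, no BOM
signed = "a,b\r\n".encode("utf-8-sig")  # same text with a BOM prepended
print(has_utf8_bom(plain), has_utf8_bom(signed))
print(signed == codecs.BOM_UTF8 + plain)
```

The three BOM bytes are the only difference between the two encodings of this ASCII-only text, which fits the observation that the script's output is reported as UTF-8 while the input is not.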
