CSV file + dates

Question

Level 1

8 points

CSV file + dates

I have a CSV file that i am trying to import into a program. the problem is the csv file has one row that is dates. I go in to inspector and change the date to the format of mmddyyyy. after I export it as a csv file, and try to open it up the date has changed back to a "Dec 19, 2004" format" I need it to stay in the mmddyyyy format for the program.

thanks

Imac 2.8 24", Mac OS X (10.5.4)

Posted on Jul 6, 2008 8:48 PM

Reply

Answer 1

KOENIG Yvan

Level 8

41,819 points

Jul 6, 2008 11:49 PM in response to capt996

This feature was described MANY times.
• When you apply a date format, it changes the way the date is displayed on the screen, it doesn't change the way it is stored.
• I apologize but I never met the format mmddyyyy in the lists of available formats. The one I know is mm/dd/yyyy.
• If you really need that such format remains in the CSV file, you must convert your dates as strings.

I know that it is annoying because formulas using these dates will behave oddly so the exported file will be wrong. In fact, the only clean soluce which I know is to duplicate the column containing the dates. One column will remain true dates (exported with the unwanted format), the second will be converted as strings (exported as you want).

Yvan KOENIG (from FRANCE lundi 7 juillet 2008 08:46:48)

Reply

Answer 2

R C-R

Level 6

17,794 points

Jul 7, 2008 6:18 AM in response to KOENIG Yvan

KOENIG Yvan wrote:
• When you apply a date format, it changes the way the date is displayed on the screen, it doesn't change the way it is stored.

But it does change how Numbers stores the cell value in an exported CSV file, at least on my system.

For example, let's say a row displays a value of "1/1/2009" in column A of a Numbers table & "my date example" in column B. If I export the document in CSV format, for this row in the CSV file, I get:

1/1/2009,my date example

If I change the date format of column A to one that displays "January 1, 2009" I get:

January 1, 2009,my date example

Basically, what you see displayed in the Numbers cells is what you get in the CSV file.

• If you really need that such format remains in the CSV file, you must convert your dates as strings.

This statement may be misleading. CSV files themselves do not support data type declarations; they are just a series of strings separated by delimiters like commas & newline characters. (In fact, CSV isn't actually an official standard but a set of related but not identical file formats with minor variations in how they handle such things as spaces & delimiters embedded inside strings.) Because of this, there is no way to (and no need to) format your data as strings in the CSV file -- it is already in that format in the file.

So, if I understand what capt996 is doing correctly, the problem is in how the program that imports the CSV file is interpreting or displaying the exported CSV file data. As with Numbers, many other programs will attempt to interpret CSV data as various data types & automatically put them into some default formats appropriate for that data type. Avoiding or working around that will depend on the features of the importing program.

Reply

Answer 3

KOENIG Yvan

Level 8

41,819 points

Jul 7, 2008 8:36 AM in response to R C-R

It appears that I wrongly read the original message.

I go in to inspector and change the date to the format of mmddyyyy. after I export it as a csv file, and try to open it up the date has changed back to a "Dec 19, 2004" format" I need it to stay in the mmddyyyy format for the program.

I understood that the date was exported in the CSV file as "Dec 19, 2004".

Now I understand better.

The CSV contains 12/19/2004 but, when the program decipher it, it apply its default format to display the date.
As your default format is mmm dd, yyyy
you get "Dec 19, 2004"
To get the mm/dd/yyyy format, you must change the used format.

The ability to decide by ourself which must be the default format is in first place in my wish list;-)

Yvan KOENIG (from FRANCE lundi 7 juillet 2008 17:36:20)

Reply

Answer 4

R C-R

Level 6

17,794 points

Jul 7, 2008 11:51 AM in response to KOENIG Yvan

KOENIG Yvan wrote:
It appears that I wrongly read the original message.

You may still have misunderstood what it meant, depending on what the O.P. means by "the program." I at first assumed that meant Numbers, but on more careful reading, I realized it may mean that the file is being opened in some other application. IOW, Numbers could be the source of the CSV file but not the intended destination.

If that is true, it won't matter what the default format is in Numbers: as long as the displayed format is the one desired for the export to the CSV file, Numbers will export the cell value in that format (as a string) & any observed reformatting of that string would be caused by the application opening the file. Thus, it would be that application's import or automatic data typing options that need to be investigated, not Numbers' defaults, to get the results desired.

A side note:

AFAIK, this direct relationship between the currently displayed & exported string value during the creation of a CSV export file is not mentioned anywhere in the Numbers documentation, nor can I find any prior discussion of it in this forum. Only after contemplating the possible meaning of the original post did I get curious enough to check the exported file with a text editor & discover this feature.

Giving users the choice of how the string is exported really does seem to be a hidden "bonus" feature (as opposed to a bug) that could be very useful to know about in some circumstances. I read recently in some other thread an opinion that a description of all options of implemented features could be found in the product documentation; this is (apparently) an example demonstrating that isn't always true.

It is a little like the documentation for importing CSV files. It tells you Numbers "can also import files in comma-separated value (CSV) format" but leaves out the detail that the filename extension plays a part in how this is done. (If the extension is "csv" then Numbers imports each comma separated value into a separate a column; if it is say "txt" then everything up to the newline character goes into one cell. If the extension is missing completely, Numbers refuses to import the file at all.)

Things like this prove the value of experimentation: depending on what you want to do, there may be a simple but obscure way to do it.

Reply

Answer 5

KOENIG Yvan

Level 8

41,819 points

Jul 7, 2008 12:50 PM in response to R C-R

R C-R wrote:
It is a little like the documentation for importing CSV files. It tells you Numbers "can also import files in comma-separated value (CSV) format" but leaves out the detail that the filename extension plays a part in how this is done. (If the extension is "csv" then Numbers imports each comma separated value into a separate a column; if it is say "txt" then everything up to the newline character goes into one cell. If the extension is missing completely, Numbers refuses to import the file at all.)

From my point of view, what you point to is not important. I never had the idea to see a .txt file as a .csv one but it's perhaps because I hate the csv format sot I always take care before using it.

The important point is that in fact, Numbers import only some CSV files, those really using the comma as separators.
They forget that most of the CSV files generated in Europe use the semi-colon as separator to eliminate conflicts with our decimal comma.

The first funny drawback is that these european user can't import in Numbers a CSV generated by Bento and of course can't import in Bento CSV files generated by Numbers.

Of course you know that FileMaker which edit Bento is a 100% Apple's subsidiary , but as often, someone's left hand ignores what the right hand does 😉

Yvan KOENIG (from FRANCE lundi 7 juillet 2008 21:50:14)

Reply

Answer 6

R C-R

Level 6

17,794 points

Jul 7, 2008 4:21 PM in response to KOENIG Yvan

KOENIG Yvan wrote:
They forget that most of the CSV files generated in Europe use the semi-colon as separator to eliminate conflicts with our decimal comma.

That actually shouldn't be necessary: as long as the data strings are enclosed in quotes, there should be no conflicts. But the real problem is (as I said) CSV isn't an official format, but a number of related ones, many of which predate any modern operating system or application. A few have become so entrenched over the years that they are seen as "the real standard," even though none can legitimately claim that distinction.

So, we have one delimiter common in one part of the world, another one in another part of the world, & something less than the universal format we would like.

For those interested, http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm offers a good overview of the variations & some considerations on their use.

Reply

Answer 7

KOENIG Yvan

Level 8

41,819 points

Jul 8, 2008 3:12 AM in response to R C-R

R C-R wrote:
That actually shouldn't be necessary: as long as the data strings are enclosed in quotes, there should be no conflicts. But the real problem is (as I said) CSV isn't an official format, but a number of related ones, many of which predate any modern operating system or application. A few have become so entrenched over the years that they are seen as "the real standard," even though none can legitimately claim that distinction.

Most often, CSV generated in Europe, are built exactly as Bento do:

semi-colon to separate values
double quotes enclose strings containing semi-colons

in the rest of the world, most often they are:
comma separating values
double quotes enclosing strings containing commas.

These choices where made to reduce the size of the files.

I hate these formats separating the world into two parts exactly as the "steel curtain" did for years. All of us know a format which may be used worldwide: the TAB separated values one.

At least I think that asking to Apple and FileMaker (100% Apple subsidiary) to use the same definition is not completely absurd 😉

In my script translating TSV into CSV usable by Bento in Europe, I uses the brute force scheme: I replace every tab by double quote semi-colon double quote and of course I add a double quote at the very beginning and at the end

Yvan KOENIG (from FRANCE mardi 8 juillet 2008 12:03:46)

Reply

Answer 8

R C-R

Level 6

17,794 points

Jul 8, 2008 6:10 PM in response to KOENIG Yvan

These choices where made to reduce the size of the files.

Reducing file size almost always compromises data portability. When storage space & network bandwidth were limited & expensive, it was a reasonable compromise. Today, that is much less true. In the future, data portability almost certainly will eventually become such a dominate concern that the value of CSV (& all other simple delimiter-separated formats) as a "universal" data exchange format will approach zero.

I would not speculate about how quickly that will occur, but I will speculate about what will replace it, at least for the foreseeable future: I think there is no question that XML will become the standard. True, it is horribly inefficient from a file size standpoint, & it increases data processing overhead immensely, but then so was another compromise once cursed for its inefficiency: the GUI interface.

This puts the iWork applications on the cutting edge of data exchange technology, since their "native" file format is XML. Just what that will eventually mean for users is something I will not speculate about here (because it flirts with violating the Discussions ToU), but you might find this XML.com article provides some clues about Apple's plans ... or not, since it seems obvious that Apple will not publish the details of its schemas so quickly that it compromise its competitive advantages in this area.

Of course, today being on the cutting edge may not be what users want, since it is sharp & we might be the ones getting cut by it, but eventually I think it will prove its value. Mac users should be used to this: we often are the early adopters of an Apple innovation that reaches its potential only after the rest of the world reluctantly (& sometimes unknowingly) adopts it too.

Reply

Answer 9

KOENIG Yvan

Level 8

41,819 points

Jul 9, 2008 2:23 AM in response to R C-R

R C-R wrote:
Reducing file size almost always compromises data portability.

In this case, using TAB delimiters reduced the size but don't reduced portability.

I don't know the origin of CSV but it's a wonderful example of English imperialism : "it works for me English writer so it must be used by others too".

At this time, I'm just wondering why Numbers is exporting in CSV, not in TSV and why Bento is importing CSV, not TSV and of course why the two don't use the same definition of CSV.

Comparing to XML is an other problem.

Yvan KOENIG (from FRANCE mercredi 9 juillet 2008 11:23:22)

Reply

Answer 10

R C-R

Level 6

17,794 points

Jul 9, 2008 5:29 AM in response to KOENIG Yvan

KOENIG Yvan wrote:
In this case, using TAB delimiters reduced the size but don't reduced portability.

It doesn't only because Numbers doesn't allow tabs in its data values, so there is no problem telling data from the delimiter when exporting from Numbers into certain other applications. If you were importing data into Numbers, or exporting to an application that allowed tabs in data values, portability would be compromised.

I don't know the origin of CSV but it's a wonderful example of English imperialism : "it works for me English writer so it must be used by others too".

1. CSV originated in a time when most advances in applied computer science were made by U.S. companies for an emerging domestic market. Like with ASCII, little thought was given to international considerations. In the era that followed, when markets were expanded abroad, it was with customized software written specifically for that market. The high cost of doing this is what eventually led to the software abstractions that enable localizations. Apple was the first major company to apply this abstraction systematically (through the resource forks of the old Mac OS), but this was many years later.

2. Nobody forced anybody to use CSV. Had other countries developed a competitive domestic software industry in that era, it probably would have led to a proliferation of incompatible local data exchange standards & the realization that something else was needed for an international standard would have come much sooner. To this day, many people think the Microsoft version of CSV is the standard, even though it has several forms, depending on the type & vintage of the MS product that generates it.

3. Most fundamentally of all: the name of the format is comma separated value! Why something that doesn't use commas as the delimiter came to be referred to as "CSV" is something I can't quite understand. Like tab & comma separated, the European version is a member of the family of delimiter separated data formats, but that does not justify calling it CSV any more than calling tab separated values by that name.

At this time, I'm just wondering why Numbers is exporting in CSV, not in TSV and why Bento is importing CSV, not TSV and of course why the two don't use the same definition of CSV.

I discarded the trial version of Bento as too weak for my database needs so I can't refresh my memory about whether or not it accepts tabs in field values, but if it does (or that is planned for a future version), it might explain why it doesn't use TSV.

As noted earlier, this is a problem shared by all delimiter separated formats: unless there is a single, unambiguous standard for handling delimiters within data values, there is no perfect solution.

Reply

Answer 11

KOENIG Yvan

Level 8

41,819 points

Jul 9, 2008 6:16 AM in response to R C-R

It doesn't only because Numbers doesn't allow tabs in its data values, so there is no problem telling data from the delimiter when exporting from Numbers into certain other applications. If you were importing data into Numbers, or exporting to an application that allowed tabs in data values, portability would be compromised.

Let me smile, Numbers accepts TABs in cells.

If I remember well, the 1st program using TAB delimited values was Visicalc.
All spreadsheets which I used since Visicalc are able to import/export TSV.

The first one which I met proving unable to export TSV is Numbers.
Happily, there is a workaround:

If cells don't contain TAB
select all
copy
paste in a TextEdit new doc
set format to text
save

If cells contain TAB
export as CSV
open it in TextEdit
replace commas by double quote TAB double quote
and add double quote at the beginning and at the end.

I don't understand why a program allowing TABs in cells would be unable to export/import TSV.
Is it so difficult to use the scheme used with CSV? If a field contains a TAB, it would be enclosed between double quotes.
With such a format we would be able to exchange datas worldwide.

I know that neoOffice is able to do that (the cells containing TAB characters are enclosed by double quotes).
I don't know for the true Excel (which is not allowe…) or for openOffice in which I was unable to paste a string containing a TAB.

Back to the CSVs formats. The first time I met a CSV file, it was one generated by a French version of Excel. If I remember well, the menu entry giving access to this format was named "Save as Text CSV.csv" as it is in openOffice and neoOffice.

So, now I know that a format usable worldwide already exists.
It's only ignorance or weight of "traditions" or a huge contempt of "the rest of the world" which leave CSV in use.

Yvan KOENIG (from FRANCE mercredi 9 juillet 2008 15:04:41)

Reply

Answer 12

R C-R

Level 6

17,794 points

Jul 9, 2008 7:36 AM in response to KOENIG Yvan

Let me smile, Numbers accepts TABs in cells.

What I meant was there is no way to enter a tab into a cell from the keyboard. You can't paste text containing a tab from some other application into a selected Numbers cell either (the text up to the tab goes into the selected cell & the text after the tab goes into the adjacent one, even if the table must be expanded by one column to do so).

But you can paste text with an embedded tab into a single cell if the insertion point is visible in the cell or formula bar before the paste. Whether this was an intended feature or just the result of inadequate error trapping I cannot say, but it is something that has caused problems with other applications that don't expect certain characters to appear in data fields or cells. (I used to have this problem with old versions of Filemaker, for example.)

I don't understand why a program allowing TABs in cells would be unable to export/import TSV.
Is it so difficult to use the scheme used with CSV? If a field contains a TAB, it would be enclosed between double quotes.

It isn't that it would be difficult to implement in the exporting application. It is that the importing application must understand the convention or it would not be universal. That is the problem with all such proposals. There are several ways of handling delimiters embedded in data values. Each depends on an assumed frequency of the embedded delimiter in the data being low enough to offer optimal file size. Your scheme would produce smaller files for those who rarely use tabs in data but some other scheme would be better for data in which tabs were frequent.

Perhaps the best of these schemes uses the escapement mechanism common to many international standards: precede the embedded delimiter with an escape character, typically one that is rarely used in data (like the backslash), but it too suffers the same limitation: if the escape character is common in the data of interest, it is not the best choice.

The addition of Unicode characters to the mix also must be considered: a two byte character representation may have a much higher frequency of the delimiter in the raw data than otherwise expected.

The first time I met a CSV file, it was one generated by a French version of Excel.

Excel, at least in later versions, is noted for its substitution of a local delimiter for the comma. If you want to blame someone for misuse of the "CSV" format name & the confusion that resulted, blame Microsoft.

So, now I know that a format usable worldwide already exists.

The problem is that too many of them exist. It is not ignorance or arrogance that caused this but the needs of the times & of the various parts of the world that led to this. As mentioned earlier, maximum data portability & compactness are impossible to achieve at the same time. There is only so much information that can be packed into a string. If you want more, you must increase the string size.

Reply

Answer 13

KOENIG Yvan

Level 8

41,819 points

Jul 9, 2008 7:58 AM in response to R C-R

I blame Apple because it is not consistent in it's own behavior.

AppleWorks exported as TSV.
Numbers import/export a kind of CSV.
Bento which is edited by a 100% Apple's subsidiary export/import an other kind of CSV.

For a company whose Guidelines insists upon internationalization:

+The Mac OS X application bundling scheme is designed to support localized strings, images, nib files,+
+and other resources. However, there is more to designing an application for use in different markets+
+than just including the right translated strings. “Worldwide Compatibility” (page 50) provides some+
…
+For further guidelines and information about how to internationalize your applications, see+
+Internationalization Programming Topics.+

it's at least funny.

It becomes really boring when we are asked by Apple vendors that if we want a DB tool to complete iWork we may "buy Bento".

(I used to have this problem with old versions of Filemaker, for example.)

This prove only that the import TSV code was badly designed. We will not discard beds because, in "rich countries" most of human beings die in a bed 😉

Yvan KOENIG (from FRANCE mercredi 9 juillet 2008 16:56:58)

Reply

Answer 14

R C-R

Level 6

17,794 points

Jul 9, 2008 11:51 AM in response to KOENIG Yvan

I blame Apple because it is not consistent in it's own behavior.

The point you seem unwilling to accept is that there is no one consistent behavior using simple delimiter separated schemes, now or in the future, that anyone can depend on for unambiguous data portability.

This is not something that Apple or Microsoft or anybody else can change. No matter what variant any application adopts, there will be files that don't follow it & a need to work with the data they contain.

Thus, it reduces to a popularity issue: what variant is most likely to be encountered most often? There is no one good answer for this: every one of them will favor some local standard or data content more than others. That is why the variants came to exist in the first place. Each meets some needs; none meet them all.

Reply

Answer 15

KOENIG Yvan

Level 8

41,819 points

Jul 9, 2008 12:26 PM in response to R C-R

R C-R wrote:
The point you seem unwilling to accept is that there is no one consistent behavior using simple delimiter separated schemes, now or in the future, that anyone can depend on for unambiguous data portability.

I perfectly understood what you wrote but it seems that you refuse to understand what I wrote.

I may "accept" that a µSoft program doesn't treat CSV the same way than an Apple product do. It's what I write very often when users complain that "XL offer this function and Numbers doesn't".

What I feel boring is that in a single entity (Apple + FileMaker) the way to treat the same problem at the same date is different.
I'm not wondering upon exchanges between (Apple+FileMaker) products and µsoft ones. I completely ignores them. Every µSoft file which I receive is automatically trashed.
I wonder upon exchanges between Apple and FileMaker products. I feel that asking them to treat CSVs the same way is not a foolish request. If FileMaker is reluctant to apply the Apple's choice, at last resort, the boss is the Apple's boss. Isn't it ?

Yvan KOENIG (from FRANCE mercredi 9 juillet 2008 21:23:46)

Reply