Database

Question

I'm thinking of using plain text as the data file format for my app/platform. It seems to me that I can use string programming to read and write data from a plain text file, and since almost all of the data needs to be read and written in sequence rather than searched like a dictionary-type database, a plain text file seems like a good solution.

The problem is converting the numeric values to objects, performing a calculation, and placing the results into an array.

So my questions are:

1) Am I missing something before I go ahead and make plain text files my format for application data?

2) What are some efficient ways to take comma-separated float values in a string from a location or file (after locating the symbols the reader uses to mark the beginning and end of the data string), convert each one into an object, perform a calculation, and add the result to an array? I need several such sub-arrays, which then have to be recalculated together. Will a new stack be opened for each new array, or should I direct the computer to open a new stack for each array that is needed simultaneously?

Let's say I have these values in a string read from a text file: "12.3,12.5,12.6,13.2,13.4,13.8"

So let's say I take the string, read up to a comma, convert that value into an NSNumber object, and assign it to an instance variable. Then I do the same for the second value, and so on. Is it better to take a couple of values, perform the calculation, move the result to another array, and then reuse the instance variables? Or is it better to store an array of NSNumber objects before doing any calculations? Should I read each value, assign it to a variable, and then put all the variables and calculation directives onto a queue?
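
For illustration only, here is a minimal Python sketch of the "parse everything first, then calculate" option (the thread is about Cocoa, so treat this as the logic, not the Objective-C API). The `scaled` calculation is a hypothetical placeholder for whatever the real computation is.

```python
def parse_values(text):
    """Split a comma-separated string and convert each field to a float."""
    return [float(field) for field in text.split(",") if field.strip()]

def scaled(values, factor):
    """Hypothetical per-value calculation: multiply every value by a factor."""
    return [v * factor for v in values]

raw = "12.3,12.5,12.6,13.2,13.4,13.8"
values = parse_values(raw)        # all six values are parsed before any calculation
results = scaled(values, 2.0)     # then the calculation runs over the whole list
print(values)
print(results)
```

With only a handful (or a few thousand) values, keeping the parsed list and the results list side by side is simpler than recycling variables or queueing work.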

iMac

Posted on Oct 20, 2012 5:41 AM

Answer 1

Best reply

etresoft

Oct 20, 2012 8:01 AM in response to mark133

mark133 wrote:

I'm thinking of using a plain text format for a data file format for my app/platform. It seems to me that I can use string programming to read and write data from a plain text file, and since almost all of the data needs to be read and written in series rather than searched through like a dictionary-type database, a plain text file seems like a good solution.

Don't do that.

The problem is converting the numeric values to objects, performing a calculation, and placing them into an array.

Among other problems.

1) Am I missing something before I go ahead and make plain text files my format for application data?

Yes. That is a royal nightmare.

2) What are some of the efficient ways to take comma separated float values in a string from a location or file (after locating the symbol used by the reader to identify the beginning and end of the data string), convert each into an object, perform a calculation, and add that value to an array? I need several such sub-arrays, which then have to be re-calculated together. Will a new stack be opened for a new array, or should I direct the computer to open a new stack for each new array that is needed simultaneously?

Hmmmm... Perhaps the only thing worse than plain text is comma-delimited text. Really, you don't want to go there.

Why not use Core Data? I know it can be difficult to get your head around, but it will be worth it in the end. Start slow. Do some practice apps. Just never roll your own data format.

Answer 2

mark133 Author

Oct 20, 2012 8:52 AM in response to etresoft

Thank you for replying, etresoft. I've been held up on this for a little while. One of the problems is that I don't have a very good idea of the ratios between switching times, calculation times, etc., and Core Data processing times.

It looked like Core Data is designed for dictionary-type access, with management for making changes on multiple clients simultaneously and/or saving changes, etc. This is the scenario with most business-type apps, where information is changed or entered at various places and the information is dictionary-type information.

The data format I need to generate, store, and use is very strictly only needed as a linear enumeration, practically never changes once entered, and needs to travel as a document to be shared.

It seems like Core Data is not the tool for this scenario?

Answer 3

mark133 Author

Oct 20, 2012 8:58 AM in response to mark133

Another important detail is that the app requires a large number of calculations on the data, done as quickly as possible. I understand this can be done with Core Graphics, but I'm not sure of the ratio between calculation time and transfer time: whether that approach is appropriate for, say, 5000 points undergoing a relatively simple calculation, or whether it's faster to calculate each point in a loop.

Answer 4

mark133 Author

Oct 20, 2012 9:01 AM in response to mark133

Another possibility is using a familiar format, like XML, and storing a string of values as a keyed value? Or is reading that string the main problem you are talking about?

Answer 5

etresoft

Oct 20, 2012 9:39 AM in response to mark133

mark133 wrote:

One of the problems is that I don't have a very good idea of the ratios between switching times, calculation times, etc, and the core data processing times.

What do you mean by "switching times"? Do you mean context switching à la threads? You don't need to worry about those details. That is premature optimization and that is a bad idea.

It looked like core data is designed for dictionary type access with management for making changes on multiple clients simultaneously and/or saving changes, etc. This is the scenario with most business type apps, where information is changed or entered at various places, and the information is dictionary type information.

Core Data is just an indexed data store tied into key-value coding. It is a general technology, not intended for any particular type of application.

The data format I need to generate, store, and use is very strictly only needed as a linear enumeration, practically never changes once entered, and needs to travel as a document to be shared.

You can have document-based Core Data.

It seems like core data is not the tool for this scenario?

You've got an uphill fight to make a claim that CSV is the right tool for any scenario.

Another important detail is that the app requires a numerous quantity of calculations on data as quickly as possible, which I understand can be done with Core Graphics, only I'm not sure of the ratios between calculation time and transfer time, if that's appropriate for, say, 5000 points to undergo a relatively simple calculation, or if it's faster to calculate each point in a loop.

Are you talking about OpenCL? You definitely don't need anything like OpenCL for a paltry 5000 data points. OpenCL is for when you need millions of calculations each millisecond.

In any event, how you store the data is irrelevant to performing calculations on it. Extracting data from storage is a slow operation no matter how you do it. Performing 5000 calculations is instantaneous no matter where the data came from.

Another possibility is using a familiar format, like XML, and entering a string of values as a keyed value? Or is reading that string the problem you are mainly talking about?

Core Data can store data in XML. You can store your data in your own XML schema and that would be better than CSV. Raw XML doesn't handle versioning like Core Data does. XML is better as a published schema for cross-platform or cross-technology applications.
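
To make "your own XML schema" concrete, here is a Python sketch using the standard `xml.etree.ElementTree` module. The element names `series` and `value` and the `version` attribute are invented for this example, not any published schema. Because each number is its own element, there is no custom splitting or escaping to get wrong.

```python
import xml.etree.ElementTree as ET

def to_xml(values, version=1):
    """Serialize floats into a hypothetical <series> document."""
    root = ET.Element("series", version=str(version))
    for v in values:
        ET.SubElement(root, "value").text = repr(v)
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    """Parse the hypothetical <series> document back into floats."""
    root = ET.fromstring(text)
    return [float(el.text) for el in root.findall("value")]

doc = to_xml([12.3, 12.5, 12.6])
print(doc)
print(from_xml(doc))
```

The `version` attribute is a cheap hook for the versioning problem mentioned above, but any migration logic would still be yours to write.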

Answer 6

mark133 Author

Oct 20, 2012 9:53 AM in response to etresoft

That's a lot of meat and potatoes. Thanks etresoft!

I don't know exactly what CSV stands for, and I'm surprised I can't guess.

I really needed that ratio you gave: the number of calculations is not an issue at all, only getting the data into direct-access memory (if you'll pardon the uneducated stabs at the correct terms). That helps a lot, as I have been stuck not knowing whether to load the data in parts, do the calculations in parts, or move the data in parts. Now I see it's best to load as much data as needed into local/direct-access memory and keep it there as instance variables or properties. I still don't have too good of an idea about how much data can be held in the direct access locations, which varies by machine? But I guess that is what Core Data is mostly for? So it looks like the problems I will run into are the problems that Core Data solves.

Then what about the file format for Core Data storage? I'm also nervous about where those files should be stored to be easily accessible for sharing over the internet.

Answer 7

mark133 Author

Oct 20, 2012 10:00 AM in response to mark133

I mean where they should be stored for both compliance with data file location standards and also to be shared over the internet. Is simply using MyDocuments as the default location acceptable?

Answer 8

mark133 Author

Oct 20, 2012 10:02 AM in response to mark133

What are the main considerations when choosing a Core Data file format?

Answer 9

mark133 Author

Oct 20, 2012 10:27 AM in response to mark133

I guess I'm still having trouble understanding how each float value being a managed object can be efficient. I don't for a millisecond doubt that it is efficient; I just find it difficult to understand how tens of thousands of numeric values, each assigned an identity as a managed object with a key and an index, could be efficiently loaded from a file. But I'm not alone in being baffled by what is here at my fingertips. We're all still a bit stunned, I think, and will be for another decade or so at least.

So when I load data from a Core Data source file, if the data goes through several stages of calculations and comparisons, it sounds like I should take the data point by point through all the code and then loop back for another point, rather than trying to take all the data through step by step as one unit and moving that unit into another location at each step of the way?

Answer 10

mark133 Author

Oct 20, 2012 12:24 PM in response to mark133

I guess Core Data is pretty self-explanatory. It seems that my main difficulty now is getting over my nerves about reaching another exciting level of success with Cocoa. Core Data is a powerful set of tools, now that I see it works as well with simple data sets as it does with more complicated ones.

When I look at the potential of using queues, and the ease of using all these tools, I am stuck between looking to the exciting future when everyone knows how to use these tools with ease and the reality that I am stuck in the middle of development, facing the inertia of having to learn and apply something new.

Answer 11

mark133 Author

Oct 20, 2012 12:26 PM in response to mark133

With such difficulty we trudge through the paths to freedom, as if we must necessarily bear some price for the reward.

Answer 12

etresoft

Oct 21, 2012 7:55 AM in response to mark133

mark133 wrote:

I don't know exactly what CSV stands for, and I'm surprised I can't guess.

Comma-Separated Values. It sounds great, quick, and easy during testing. In real life it is a nightmare. How do you handle commas in the data? With quotes? Then how do you handle quotes? What about Unicode? What about newlines? It's a mess.
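
The quoting problem is easy to demonstrate. In this Python sketch (the sample record is made up), a naive split on commas breaks a field that contains a comma, while the standard `csv` module, which implements the common doubled-quote escaping convention, reads it correctly. Every program that reads your files would have to follow exactly the same rules.

```python
import csv
import io

# A made-up record: the second field contains a comma and escaped quotes.
line = 'sample 1,"Smith, J. ""Jr.""",12.3\n'

naive = line.rstrip("\n").split(",")          # ignores quoting entirely
parsed = next(csv.reader(io.StringIO(line)))  # honours quotes and "" escapes

print(naive)    # the comma inside the quoted field produces an extra column
print(parsed)
```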

I still don't have too good of an idea about how much data can be held in the direct access locations, which varies by machine?

All of it. Seriously, modern machines have gigabytes of RAM. If you are managing 5000 double-precision values, that is only about 40 KB (5000 × 8 bytes).
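
The arithmetic: a double-precision value occupies 8 bytes, so 5000 of them take 5000 × 8 = 40,000 bytes. A quick check with Python's `array` module, which stores raw C doubles. Boxed objects such as NSNumber (or Python floats) add per-object overhead on top of this, but the total is still tiny.

```python
from array import array

values = array("d", [0.0] * 5000)          # 5000 packed C doubles
total_bytes = len(values) * values.itemsize
print(values.itemsize, total_bytes)         # 8 40000
```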

But I guess that is what Core Data is mostly for? So it looks like the problems I will run into are the problems that Core Data solves.

Core Data is just an abstraction layer that scales well and gives you versioning. Without versioning, adding a field later on becomes extremely difficult. There is also the NSCoding protocol, which allows you to read and write entire containers of NSObject types. That has versioning too, but it doesn't scale as well as Core Data does. Core Data is the future.
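
Core Data's model versioning and migration do this for you. To show what you would otherwise have to write by hand, here is a hypothetical Python sketch: version 1 documents store only a `value` per point, version 2 added a `unit` field, and the loader upgrades old documents with a default. The JSON layout, field names, and the `"mm"` default are all invented for the example.

```python
import json

CURRENT_VERSION = 2

def load_document(text):
    """Load a hypothetical JSON document, upgrading older versions in place."""
    doc = json.loads(text)
    version = doc.get("version", 1)
    if version == 1:
        # Version 1 had no "unit" field; assume a default for old files.
        for point in doc["points"]:
            point.setdefault("unit", "mm")
        doc["version"] = 2
    if doc["version"] != CURRENT_VERSION:
        raise ValueError(f"unsupported version {doc['version']}")
    return doc

old_file = '{"version": 1, "points": [{"value": 12.3}, {"value": 12.5}]}'
print(load_document(old_file))
```

Every new field means another upgrade step like this one, and every reader of the format has to carry all of them.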

Then what about the file format for Core Data storage?

On OS X you can use an XML, binary, or SQLite store (or an in-memory one). iOS has no XML store, so SQLite is the usual choice there. XML is easier to debug.

I'm also nervous about where those files should be stored to be easily accessible for sharing over the internet.

I mean where they should be stored for both compliance with data file location standards and also to be shared over the internet. Is simply using MyDocuments as the default location acceptable?

Sharing over the internet is a whole different question.

I guess I'm still having trouble understanding how each float value being a managed object can be efficient. I don't for a millisecond doubt that it is efficient; I just find it difficult to understand how tens of thousands of numeric values, each assigned an identity as a managed object with a key and an index, could be efficiently loaded from a file.

You don't have to store each value as a row. You could store all the data in one field if you really wanted to. However, that would be premature optimization. It is always easier to optimize a working system than to build an optimally efficient one from the start.
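
Storing the whole series "in one field" usually means packing the numbers into a single binary blob (Core Data has a Binary Data attribute type that could hold it). A Python sketch of the packing and unpacking with the `array` module follows. It uses the machine's native byte order, which is an assumption: a file meant to be shared across machines would also need a fixed byte order.

```python
from array import array

def pack(values):
    """Pack a sequence of floats into one bytes blob of C doubles."""
    return array("d", values).tobytes()

def unpack(blob):
    """Rebuild the list of floats from a blob produced by pack()."""
    values = array("d")
    values.frombytes(blob)
    return values.tolist()

blob = pack([12.3, 12.5, 12.6, 13.2, 13.4, 13.8])
print(len(blob))      # 48: six values at 8 bytes each
print(unpack(blob))
```

As the reply says, this trades away per-value querying for compactness, so it is worth doing only after measuring a working version.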

So when I load data from a Core Data source file, if the data goes through several stages of calculations and comparisons, it sounds like I should take the data point by point through all the code and then loop back for another point, rather than trying to take all the data through step by step as one unit and moving that unit into another location at each step of the way?

I don't know what you are really doing. Conceptually, you would load all the data from storage and then operate on each element in a loop. Core Data handles the "load all the data from storage" part for you. You just start your loop, and it will efficiently pull in chunks of data as it needs them.
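
Conceptually, that pattern looks like the following Python sketch: read everything from storage once, then run each value through the calculation stages in an ordinary loop. The JSON file and the two stages are hypothetical placeholders; in the app, Core Data would take the place of the file-reading step.

```python
import json
import os
import tempfile

def load_all(path):
    """Read every value from a (hypothetical) JSON array file in one go."""
    with open(path, encoding="utf-8") as f:
        return [float(v) for v in json.load(f)]

def stage_one(x):
    return x * 2.0        # hypothetical first calculation

def stage_two(x):
    return x - 1.0        # hypothetical second calculation

def process(values):
    """Run each value through every stage, one value at a time."""
    return [stage_two(stage_one(v)) for v in values]

# Demo: a temporary file stands in for the app's data store.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as f:
    json.dump(list(range(5000)), f)
    path = f.name

results = process(load_all(path))
os.remove(path)
print(len(results), results[:3])   # 5000 [-1.0, 1.0, 3.0]
```

Loading is the slow part; the 5000-element loop itself finishes almost immediately.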

I guess Core Data is pretty self-explanatory

I wouldn't say that. It has taken me years to figure it out and I still don't have a firm grasp. Of course, I have only been working part time on it and that really holds me back.

Answer 13

mark133 Author

Oct 21, 2012 1:11 PM in response to etresoft

Now you've got me thinking again. Much of the functionality of the entire platform I'm designing rests on users being able to share data across platforms quickly and easily. Relative to the entire project, I have little trouble with the string programming objects when it comes to setting up parsing, or even testing for text encodings. It sounds like text files might still be a reasonable possibility, even if the contents are loaded into Core Data for better handling.

Answer 14

mark133 Author

Oct 21, 2012 1:41 PM in response to mark133

But then again, what's the difficulty with adding a plug-in (if that's the right name for it) that translates iOS data into XML or SQLite, etc.? I know you've kept telling me to just get 'er done and fix it later, but is that really appropriate these days? I mean, standards are falling into place. Taking the correct route today really does mean more than it used to.

So this question of designing a platform where one of the core capabilities depends on transferring document-form data between various platforms quickly and easily is still worth the time to consider. Text itself, even if there are two or three acceptable standards, is still the most universal form of readable data? In this case it might be worth considering? And it is certainly worth discussing as long as there is anyone with experience who is willing and able to give me new information, new insights, and good perspectives on the topic. Currently, you are the only developer I know who fits the bill, and I am fortunate that you are willing to discuss this with me. Of course, any other perspectives or views, even ones echoing yours, would also be appreciated.

Thank you again for your time and experience with this topic of common data sharing formats as related to the use of Core Data in application development.

Answer 15

etresoft

Oct 21, 2012 2:08 PM in response to mark133

mark133 wrote:

Much of the functionality of the entire platform I'm designing rests on users being able to share data across platforms quickly and easily.

In that case, it looks like you need to concentrate first on building a web-based platform to serve that data. Clients can use a REST architecture to view and update the data using either XML or JSON. I recommend JSON. You wouldn't need Core Data at all unless you were planning on supporting offline access.

Text itself, even if there are two or three acceptable standards, is still the most universal form of readable data?

You are on the right track. Both XML and JSON are text based. The important part is that they both provide a semantic structure to your data that you can very easily translate into some sort of NSObject.
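
As a small illustration of that point, here is a Python sketch that decodes a JSON payload into a typed object and back. The payload shape and the `Series` class are hypothetical; in Cocoa the equivalent would be parsing with NSJSONSerialization and filling in your own NSObject subclass.

```python
import json
from dataclasses import dataclass

@dataclass
class Series:
    name: str
    values: list

def series_from_json(text):
    """Decode a hypothetical {"name": ..., "values": [...]} payload."""
    obj = json.loads(text)
    return Series(name=obj["name"], values=[float(v) for v in obj["values"]])

def series_to_json(series):
    """Encode a Series back into the same hypothetical payload shape."""
    return json.dumps({"name": series.name, "values": series.values})

payload = '{"name": "run 1", "values": [12.3, 12.5, 12.6]}'
s = series_from_json(payload)
print(s)   # Series(name='run 1', values=[12.3, 12.5, 12.6])
```

The structure (a named object holding an array of numbers) is carried by the format itself, so no reader has to agree on delimiters or escaping rules.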
