methods to save data

A forum for general discussion of the Python programming language.


Postby metulburr » Tue May 21, 2013 6:25 pm

I was looking into the numerous database methods and when to use one or the other.
When I say databases, I mean any method of saving state information.

These are really the only ways I know:
1) just a simple text file (which I assume is for small programs, or even single-line state info)
2) shelve and pickle (which also seem to be for small amounts of data) (and pickle is NOT to be used with data from the net?)
3) JSON (which, to be honest, I don't really know the purpose of; I have always used it to store data in a file that could be manually changed and still be returned to the program as an object)
4) sqlite3 (which I believe is just a lightweight MySQL?)
5) MySQLdb (which, if I remember correctly, is outdated or wasn't planned to be ported to Python 3, I don't remember)

Are there any other methods? I know the sqlite3 and MySQL ones would be used for large programs with a lot of data, but are there any more for small programs that also don't require inner knowledge of MySQL syntax?
New Users, Read This
OS Ubuntu 14.04, Arch Linux, Gentoo, Windows 7/8
https://github.com/metulburr
steam
User avatar
metulburr
 
Posts: 1560
Joined: Thu Feb 07, 2013 4:47 pm
Location: Elmira, NY

Re: methods to save data

Postby setrofim » Tue May 21, 2013 6:56 pm

Some comments on your list:
  1. You can use text files for programs and data of any size. E.g. large applications often use text files to store log output, and those files can reach dozens, or even hundreds of megabytes.
  2. Neither should really be used in a production environment (whether on the net or not). shelve and pickle use the same underlying mechanism and just expose different APIs.
  3. JSON was originally designed for Javascript serialisation (it stands for JavaScript Object Notation), but has since become very widely used. It is (or, at least, can be) reasonably human-readable and is also very easily parsed by a machine. It is pretty simple and compact (unlike XML, which it has largely replaced). JSON is typically used to serialise data for transmission across the network or IPC, but it is also used for persisting data to disk. Basically, if you're not sure how to structure your data for serialisation, JSON is often a safe bet.
  4. Basically, yes. Unlike most other SQL databases, sqlite doesn't use a DBMS server to manage the database (a separate server process that your application needs to connect to in order to access the data). Instead, it's just a single binary file that is parsed by a library. This has both advantages and limitations.
  5. MySQLdb is only one module for connecting to a MySQL database. On Python 3, you can use PyMySQL for the same purpose. So the fact that MySQLdb is old and not supported on Python 3 should not stop you from using MySQL in your projects. MySQL (and other server-based databases) is pretty heavy-duty compared to the other approaches listed. You will typically use those when several applications (or several instances of an application) may be accessing the data at the same time, or when you have a lot of data and you need to run non-trivial queries on it.
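To make the differences concrete, here is a minimal sketch (file names and the record itself are made up for illustration) that saves the same small record with pickle, JSON, and sqlite3:

```python
import json
import os
import pickle
import sqlite3
import tempfile

record = {"name": "metulburr", "posts": 1560}

with tempfile.TemporaryDirectory() as tmp:
    # pickle: binary, Python-only, round-trips most Python objects
    with open(os.path.join(tmp, "record.pkl"), "wb") as f:
        pickle.dump(record, f)
    with open(os.path.join(tmp, "record.pkl"), "rb") as f:
        assert pickle.load(f) == record

    # JSON: human-readable text, easy to edit by hand, language-neutral
    with open(os.path.join(tmp, "record.json"), "w") as f:
        json.dump(record, f)
    with open(os.path.join(tmp, "record.json")) as f:
        assert json.load(f) == record

    # sqlite3: a real SQL database living in a single file, no server needed
    conn = sqlite3.connect(os.path.join(tmp, "record.db"))
    conn.execute("CREATE TABLE users (name TEXT, posts INTEGER)")
    conn.execute("INSERT INTO users VALUES (?, ?)",
                 (record["name"], record["posts"]))
    conn.commit()
    row = conn.execute("SELECT name, posts FROM users").fetchone()
    conn.close()

print(row)  # ('metulburr', 1560)
```

Note that pickle and JSON just dump the whole object, while sqlite3 lets you query individual rows without loading everything.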

These are not on your list: XML and bespoke binary formats (basically, using the struct module to pack the data into a file in whatever format makes sense for your application), which is just as well. The former has been largely superseded by JSON, and the latter only really makes sense in very specialist situations.
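For what it's worth, the bespoke-binary approach looks like this with struct (the record layout here is invented purely for illustration):

```python
import struct

# "<10sI" = little-endian, 10-byte string, unsigned 32-bit int (made-up layout)
FMT = "<10sI"

packed = struct.pack(FMT, b"metulburr", 1560)
name, posts = struct.unpack(FMT, packed)

# 10s pads short strings with null bytes, so strip them on the way out
print(name.rstrip(b"\x00"), posts)  # b'metulburr' 1560
```

The fixed record size (struct.calcsize(FMT)) is what makes this compact, but it is also why the format breaks as soon as your data outgrows the layout.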

There is also csv/tsv (which are basically tables in a text file). Their main advantage is that they can be easily opened in data analysis applications (Excel, R, Matlab, etc), so they are often used as "export" formats for data from other sources.
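A quick csv sketch (column names invented for the example); the same output written to a real file opens directly in Excel or R:

```python
import csv
import io

rows = [{"name": "spam", "qty": 3}, {"name": "eggs", "qty": 12}]

buf = io.StringIO()  # stands in for a real file here
writer = csv.DictWriter(buf, fieldnames=["name", "qty"])
writer.writeheader()
writer.writerows(rows)

buf.seek(0)
back = list(csv.DictReader(buf))
print(back[0])  # {'name': 'spam', 'qty': '3'}
```

One gotcha: csv has no notion of types, so everything comes back as a string ('3', not 3) and you have to convert it yourself.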

Finally, there are so-called NoSQL databases (MongoDB, redis, CouchDB, Cassandra, etc). These are used in similar ways to SQL databases, but in situations where a rigid SQL schema is too constricting, or where they offer performance benefits for the scenarios you're interested in.
setrofim
 
Posts: 288
Joined: Mon Mar 04, 2013 7:52 pm

Re: methods to save data

Postby metulburr » Tue May 21, 2013 8:05 pm

Thanks setrofim, good explanation.

Re: methods to save data

Postby Crimson King » Wed May 22, 2013 6:49 am

Nice answer setrofim.

Is there a situation where you'd pick XML over JSON to serialize your data? I haven't found any useful answers to that question, only one that says XML beats JSON when the number of elements to encode surpasses 5000 (with no benchmarks to be found whatsoever).

And I also wanted to add one more to the list: YAML, which is a superset of JSON. I found it a few days ago and have just tried a couple of basic things, nothing big, but it seems pretty nice.

I'll leave the links to Python's implementation of YAML and a basic tutorial:

PyYAML
Yaml Introduction
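To give a feel for the syntax, here is a tiny made-up config in YAML; PyYAML's yaml.safe_load would parse it into the same nested dict that json.loads gives you for the equivalent JSON, just without the braces and quotes:

```
# settings.yaml (invented example)
server:
  host: localhost
  port: 8080
debug: true
```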
User avatar
Crimson King
 
Posts: 132
Joined: Fri Mar 08, 2013 2:42 pm
Location: Buenos Aires, Argentina

Re: methods to save data

Postby setrofim » Wed May 22, 2013 9:24 am

Crimson King wrote:Is there a situation where you'd pick XML over JSON to serialize your data?

Good question. In general, no (unless you're saving data to an existing format that uses XML, e.g. bookmarks, or ASX playlists). There are some edge cases. XML has a bunch of features that are rarely used (or useful), which typically just add to the reasons for not using it; but sometimes such features are useful for what you're doing, and then you might consider using XML. For example, XML supports schema validation -- a document can contain a link to a DTD or XSD schema that can be used to make sure that the document has a valid structure; the ability to query large documents using XPath can also come in handy; and namespaces, while generally a nuisance, may be useful in some environments (where element name clashes are likely). None of these are that common, and if you find you really need them, then you're probably doing something wrong (e.g. if you find you rely on XPath a lot to navigate your data, you should probably be storing it in a proper database).
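The XPath point can be tried out with the standard library's xml.etree.ElementTree, which supports a limited XPath subset (the document contents here are made up):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<library>
  <book year="1999"><title>Foo</title></book>
  <book year="2010"><title>Bar</title></book>
</library>
""")

# Limited XPath: every <title> under a <book> whose year attribute is "2010"
titles = [t.text for t in doc.findall(".//book[@year='2010']/title")]
print(titles)  # ['Bar']
```

For the full XPath language (axes, functions, etc.) you would need a third-party library such as lxml; the standard library only covers a subset.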

Crimson King wrote:only one that says XML beats JSON when the number of elements to encode surpasses 5000 (no benchmarks found whatsoever)

Hm, well, that would largely depend on the specific parsers used. XML does have a streaming SAX parser as well as DOM, which makes it more suitable for dealing with large documents (as you don't have to build the whole structure in memory). I am not aware of streaming JSON parsers, though in principle there is no reason why there couldn't be one. But then, if you have 5000 elements in a single text file, something is probably wrong (maybe use a DB, or, if it contains settings or configuration, break it up into several files). Large text files are usually hard to work with regardless of format. And XML is a verbose format, so if you do have a lot of data, XML is not going to be an efficient way of storing it.

Crimson King wrote:And also wanted to add one more to the list: YAML

Ah yes, forgot about that one. Thanks for pointing it out. Haven't played much with it either, but it seems to be a sort of half-way house between XML and JSON in terms of complexity and seems to mostly be used for configuration files and such.

