Storing Data

Flat files contain tabular data in plain text, with one data record per line and each record having one or more fields. The fields are separated by delimiters such as commas, tabs, or colons.
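To make the record/field/delimiter structure concrete, here is a small sketch using Python's built-in csv module (an illustrative choice; the lesson itself later uses pandas). The file contents are hypothetical and inlined as strings so the example runs without any files on disk. The same records are stored with three different delimiters, and each parses into the same list of records, where each record is a list of field strings.

```python
import csv
import io

# Hypothetical flat-file contents: the same three records (a header plus two
# data rows), stored with three different delimiters.
comma_text = "title,year,rating\nMovie A,1994,7.9\nMovie B,2001,6.4\n"
tab_text = "title\tyear\trating\nMovie A\t1994\t7.9\nMovie B\t2001\t6.4\n"
colon_text = "title:year:rating\nMovie A:1994:7.9\nMovie B:2001:6.4\n"


def read_flat_file(text, delimiter):
    """Return a list of records (one per line), each a list of field strings."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


comma_rows = read_flat_file(comma_text, ",")
tab_rows = read_flat_file(tab_text, "\t")
colon_rows = read_flat_file(colon_text, ":")

print(comma_rows[1])                         # ['Movie A', '1994', '7.9']
print(comma_rows == tab_rows == colon_rows)  # True
```

Note that every field comes back as a string: a plain-text flat file stores no type information, so converting "1994" to a number is left to the reader.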

Advantages of flat files include:

  • They're text files and therefore human-readable.

  • Lightweight.

  • Simple to understand.

  • Software that can read and write text files, such as text editors, is ubiquitous.

  • Great for small datasets.

Disadvantages of flat files in comparison to, for example, relational databases include:

  • Lack of standards.

  • Data redundancy.

  • Sharing data can be cumbersome.

  • Not great for large datasets (see "When does small become large?" in the Cornell link in More Information).

The advantages and disadvantages of flat files were discussed earlier in the lesson in the Flat File Structure concept. One of the advantages:

Great for small datasets.

And one of the disadvantages:

Sharing data can be cumbersome.

Given the size of the dataset in our example with IMDb movies, and given that it likely won't be shared often, saving to a flat file like a CSV is probably the best solution. With pandas, saving your gathered data to a CSV file is easy. The to_csv DataFrame method is all you need, and the only parameter required to save a file on your computer is the file path to which you want to save it. Specifying index=False is often necessary too if you don't want the DataFrame index showing up as a column in your stored dataset. If you had a DataFrame, df, and wanted to save it to a file named dataset.csv with no index column:

df.to_csv('dataset.csv', index=False)
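The effect of index=False can be seen by writing the same DataFrame both ways and reading each file back with pd.read_csv. This is a minimal sketch: the DataFrame contents are hypothetical placeholders rather than the lesson's IMDb data, and the files are written to a temporary directory so nothing is left behind on disk.

```python
import os
import tempfile

import pandas as pd

# Hypothetical placeholder data standing in for the gathered IMDb dataset.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "year": [1994, 2001, 2010],
})

with tempfile.TemporaryDirectory() as tmp:
    with_index_path = os.path.join(tmp, "with_index.csv")
    no_index_path = os.path.join(tmp, "dataset.csv")

    # Default behaviour: the index (0, 1, 2) is written as an unnamed first column.
    df.to_csv(with_index_path)
    # index=False: only the DataFrame's own columns are written.
    df.to_csv(no_index_path, index=False)

    # Read both files back while the temporary directory still exists.
    reloaded_with_index = pd.read_csv(with_index_path)
    reloaded_no_index = pd.read_csv(no_index_path)

print(list(reloaded_with_index.columns))  # ['Unnamed: 0', 'title', 'year']
print(list(reloaded_no_index.columns))    # ['title', 'year']
```

Without index=False, pandas writes the index as a first column with an empty header, which read_csv then loads as an extra column named "Unnamed: 0".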
