Python - Cheat Sheet

df.to_csv(filename) # Writes to a CSV file
df.to_excel(filename) # Writes to an Excel file
df.to_sql(table_name, connection_object) # Writes to a SQL table
df.to_json(filename) # Writes to a file in JSON format
df.to_html(filename) # Saves as an HTML table
df.to_clipboard() # Writes to the clipboard

Defining, then Coding and Testing Immediately

In practice, it is often easier to define a cleaning operation, then immediately code and test it. The data wrangling template still applies here, except that you'll have multiple Define, Code, and Test subheadings, with third-level headers (###) denoting each issue.
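A minimal sketch of one Define, Code, Test cycle for a single issue. The dataframe, the zip_code column, and the issue itself are hypothetical; in a notebook the comment lines would be Markdown headers rather than Python comments.

```python
import pandas as pd

# Hypothetical dirty data: zip codes stored as integers lose leading zeros
df = pd.DataFrame({"zip_code": [92101, 2134]})
df_clean = df.copy()

# ### Issue: zip_code is an integer, so leading zeros are dropped
# Define: convert zip_code to a string and left-pad it to 5 digits
# Code
df_clean["zip_code"] = df_clean["zip_code"].astype(str).str.zfill(5)

# Test
assert df_clean["zip_code"].tolist() == ["92101", "02134"]
```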

Saving and writing

Remember, set index=False to avoid saving with an unnamed column!
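The effect of index=False can be seen with a small round trip. This is a sketch on a made-up dataframe; it writes to an in-memory string instead of a file so it runs anywhere.

```python
import io
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})

# Default: the index is written as an extra, unnamed first column
with_index = df.to_csv()
# index=False: only the real columns are written
without_index = df.to_csv(index=False)

reloaded_with = pd.read_csv(io.StringIO(with_index))
reloaded_without = pd.read_csv(io.StringIO(without_index))

print(list(reloaded_with.columns))     # ['Unnamed: 0', 'name', 'score']
print(list(reloaded_without.columns))  # ['name', 'score']
```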

Selecting

We can select data using loc and iloc. loc selects data by row and column labels, while iloc selects by integer position. We'll use both to index a dataframe.
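A short sketch of both selectors on a hypothetical dataframe (the column names and index labels are made up). Note that a label slice with loc includes its end label, while a position slice with iloc excludes its end position.

```python
import pandas as pd

# Hypothetical example data with string row labels
df = pd.DataFrame(
    {
        "diagnosis": ["M", "B", "M"],
        "radius": [17.9, 12.1, 20.5],
        "area": [1001.0, 450.2, 1326.0],
    },
    index=["a", "b", "c"],
)

# loc: select by labels; the slice "a":"b" includes row "b"
by_label = df.loc["a":"b", ["diagnosis", "radius"]]

# iloc: select by integer position; the slice 0:2 excludes position 2
by_position = df.iloc[0:2, 0:2]

print(by_label.equals(by_position))  # True
```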

Another useful function that we’re going to use is pandas' query function.

In the previous lesson, we selected rows in a dataframe by indexing with a mask. Here are those same examples, along with equivalent statements that use query().
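As an illustration, here is one mask-based selection next to its query() equivalent. The dataframe and its diagnosis column are assumptions for the sake of the example, not the data from the lesson.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame(
    {"diagnosis": ["M", "B", "M"], "radius": [17.9, 12.1, 20.5]}
)

# Selecting rows with a boolean mask
mask_result = df[df["diagnosis"] == "M"]

# Equivalent selection with query(); string values are quoted inside the expression
query_result = df.query('diagnosis == "M"')

print(mask_result.equals(query_result))  # True
```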

The examples above filtered rows based on columns containing strings. You can also use query to filter on columns containing numeric data.
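A sketch of a numeric filter with query(), again on made-up data. The @ prefix lets the expression refer to a local Python variable (here the hypothetical name threshold).

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame(
    {"diagnosis": ["M", "B", "M"], "radius": [17.9, 12.1, 20.5]}
)

# Numeric comparison written directly in the expression
literal_result = df.query("radius > 15")

# The same filter using a local variable via @
threshold = 15
variable_result = df.query("radius > @threshold")

print(literal_result.equals(variable_result))  # True
```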

Groupby
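A minimal groupby sketch, assuming a hypothetical dataframe with a categorical diagnosis column and a numeric radius column.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame(
    {"diagnosis": ["M", "B", "M", "B"], "radius": [18.0, 12.0, 20.0, 10.0]}
)

# Mean of one column per group; the result is a Series indexed by group
mean_radius = df.groupby("diagnosis")["radius"].mean()

# Several aggregations at once, keeping the group key as a regular column
summary = df.groupby("diagnosis", as_index=False).agg(
    mean_radius=("radius", "mean"),
    n=("radius", "count"),
)

print(mean_radius)
print(summary)
```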

Describing

Useful exploratory methods:
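The original list is not preserved here, so the following is a sketch of commonly used pandas exploratory calls on a hypothetical dataframe, not necessarily the set the author had in mind.

```python
import pandas as pd

# Hypothetical example data with some missing values
df = pd.DataFrame(
    {"diagnosis": ["M", "B", "M", None], "radius": [18.0, 12.0, None, 10.0]}
)

print(df.head())                        # first rows
print(df.shape)                         # (number of rows, number of columns)
print(df.dtypes)                        # data type of each column
df.info()                               # dtypes and non-null counts (prints directly)
print(df.describe())                    # summary statistics for numeric columns
print(df["diagnosis"].value_counts())   # frequency of each value
print(df.nunique())                     # number of distinct values per column
print(df.isnull().sum())                # number of missing values per column
print(df.duplicated().sum())            # number of fully duplicated rows
```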

Cleaning

Fixing data types
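A sketch of typical data type fixes, assuming hypothetical columns id, timestamp, and rating that were read in with the wrong types.

```python
import pandas as pd

# Hypothetical example data with unsuitable types
df = pd.DataFrame(
    {
        "id": [1, 2],
        "timestamp": ["2018-01-05", "2018-02-10"],
        "rating": ["4", "5"],
    }
)

# Identifiers are labels, not quantities, so store them as strings
df["id"] = df["id"].astype(str)

# Parse date strings into datetime values
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Convert numeric strings to integers
df["rating"] = df["rating"].astype(int)

print(df.dtypes)
```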

Regular expressions
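A sketch of regular expressions with the pandas string methods, assuming a hypothetical contact column that mixes a name and a phone number in one entry.

```python
import pandas as pd

# Hypothetical example data: name and phone number in a single string
df = pd.DataFrame({"contact": ["Ann 555-1234", "Bob 555-9876"]})

# Extract the part matching the capture group (three digits, dash, four digits)
df["phone"] = df["contact"].str.extract(r"(\d{3}-\d{4})", expand=False)

# Remove the phone number and any whitespace before it, leaving the name
df["name"] = df["contact"].str.replace(r"\s*\d{3}-\d{4}", "", regex=True)

print(df)
```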

Splitting entries into 2 columns
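One way to do this is str.split with expand=True, sketched below on a hypothetical full_name column. Passing n=1 limits the split to the first separator so that exactly two columns come back.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"full_name": ["Ann Lee", "Bob Ray"]})

# Split on the first space only and spread the two parts into new columns
df[["first_name", "last_name"]] = df["full_name"].str.split(" ", n=1, expand=True)

print(df)
```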

Renaming Columns
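A sketch of two common approaches, using made-up column names: rename() for specific columns, and a string operation on df.columns to normalize all of them at once.

```python
import pandas as pd

# Hypothetical example data with inconsistent column names
df = pd.DataFrame({"First Name": ["Ann"], "Last Name": ["Lee"]})

# Rename selected columns with a mapping of old name to new name
df = df.rename(columns={"First Name": "first_name"})

# Normalize every column name: lowercase, spaces replaced by underscores
df.columns = df.columns.str.lower().str.replace(" ", "_", regex=False)

print(list(df.columns))  # ['first_name', 'last_name']
```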

The following helps identify column names that are duplicated across different tables:
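A minimal sketch of one such check, assuming two hypothetical tables named patients and treatments. It collects every column name into a single Series and keeps the names that occur more than once.

```python
import pandas as pd

# Hypothetical tables that share some column names
patients = pd.DataFrame(
    {"patient_id": [1], "given_name": ["Ann"], "surname": ["Lee"]}
)
treatments = pd.DataFrame(
    {"given_name": ["ann"], "surname": ["lee"], "dose": [10]}
)

# list(df) returns the column names of a dataframe
all_columns = pd.Series(list(patients) + list(treatments))

# Names that appear a second time are present in more than one table
duplicated_columns = all_columns[all_columns.duplicated()]

print(duplicated_columns.tolist())  # ['given_name', 'surname']
```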

Helpful script

Tidiness
