Cheatsheet of Jupyter Notebook and Pandas


Overview

This article is a personal cheatsheet for frequent usecase of Jupyter Notebook (Colaboratory) and pandas.

Spreadsheet Manipulation

Load s/s values to Dataframe

Connect to Google Spreadsheet from Colaboratory.

# Authorize.
from google.colab import auth
auth.authenticate_user()

import gspread
from google.auth import default
creds, _ = default()

sskey='YOUR_SS_KEY' # Find value from URL.

gc = gspread.authorize(creds)

workbook = gc.open_by_key(offer_sskey)
sheet = workbook.worksheet('SHEET_NAME')

# Load all values as DataFrame.
import pandas as pd
df = pd.DataFrame(sheet.get_all_values())

Pandas

Manipulation

Put first row to header of Dataframe

df = df.rename(columns=df.iloc[0])
# Drop the header row from DataFrame.
df = df.drop([0])

Join two DataFrames

df3 = df.merge(df2, on='id')

Cast type

Value types can be marked as object. In such case describe after grouping won't work well.
Casting the value as desirable type would be good.

df['key'] = df['key'].astype("int")

Rename column

df.rename(columns={'key_original': 'key_renamed', 'before':'after'})

Set Index to column

df.reset_index(level=0)

Grouping

group = df.groupby(['key'])
group.describe()

Sort

df.sort_values('key', ascending=False)