Cheatsheet of Jupyter Notebook and Pandas
Overview
This article is a personal cheatsheet for frequent usecase of Jupyter Notebook (Colaboratory) and pandas.
Spreadsheet Manipulation
Load s/s values to Dataframe
Connect to Google Spreadsheet from Colaboratory.
# Authorize.
from google.colab import auth
auth.authenticate_user()
import gspread
from google.auth import default
creds, _ = default()
sskey='YOUR_SS_KEY' # Find value from URL.
gc = gspread.authorize(creds)
workbook = gc.open_by_key(offer_sskey)
sheet = workbook.worksheet('SHEET_NAME')
# Load all values as DataFrame.
import pandas as pd
df = pd.DataFrame(sheet.get_all_values())
Pandas
Manipulation
Put first row to header of Dataframe
df = df.rename(columns=df.iloc[0])
# Drop the header row from DataFrame.
df = df.drop([0])
Join two DataFrames
df3 = df.merge(df2, on='id')
Cast type
Value types can be marked as object
. In such case describe after grouping won't work well.
Casting the value as desirable type would be good.
df['key'] = df['key'].astype("int")
Rename column
df.rename(columns={'key_original': 'key_renamed', 'before':'after'})
Set Index to column
df.reset_index(level=0)
Grouping
group = df.groupby(['key'])
group.describe()
Sort
df.sort_values('key', ascending=False)