
@ChengLuFred
Last active October 4, 2020 01:12
[Data Process] #Python

0 Package

import pandas as pd

1 Index and locating

create a multi-level index and locate data by one or more of its levels

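# set cik and year as index (cik is level 0, year is level 1)
data_target = data_target.set_index(['cik', 'year'])

# get the values of one level of the multi-index (one entry per row, repeats kept)
cik_list = data_target.index.get_level_values('cik').values

# get subset of data
data_cik_3766 = data_target.loc[3766]               # outer level: cik == 3766
data_1995 = data_target.xs(1995, level='year')      # inner level: year == 1995
data_1995_cik_3766 = data_target.loc[(3766, 1995)]  # both levels, in index order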
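A small self-contained sketch of the same multi-index selections, run on a hypothetical toy frame (the `cik`, `year`, and `value` numbers are invented for illustration). It shows that plain `.loc` with a single key addresses the outer level, while the inner level needs `.xs` with the level name:

```python
import pandas as pd

# hypothetical toy data: two firms (cik) observed in two years
data_target = pd.DataFrame({
    'cik': [3766, 3766, 4000, 4000],
    'year': [1995, 1996, 1995, 1996],
    'value': [10, 11, 20, 21],
})
data_target = data_target.set_index(['cik', 'year'])

# level values as a NumPy array (one entry per row, so repeats are kept)
cik_list = data_target.index.get_level_values('cik').values
# distinct values of a single level
unique_ciks = data_target.index.unique(level='cik')

# outer level: a single key in .loc matches level 0 (cik)
data_cik_3766 = data_target.loc[3766]
# inner level: .xs with the level name
data_1995 = data_target.xs(1995, level='year')
# both levels: a tuple in the same order as the index levels
value_3766_1995 = data_target.loc[(3766, 1995), 'value']

print(cik_list)                          # [3766 3766 4000 4000]
print(list(unique_ciks))                 # [3766, 4000]
print(data_cik_3766['value'].tolist())   # [10, 11]
print(data_1995['value'].tolist())       # [10, 20]
print(value_3766_1995)                   # 10
```

Passing a single year to `.loc` on this index would be looked up as a `cik`, which is why the inner level goes through `.xs`.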

2 Remove duplicated index

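In a pandas data frame, `index.duplicated` flags repeated index labels; negating that mask with `~` keeps one row per label.

# keep='first' keeps the first occurrence, 'last' keeps the last, False drops every repeated label
data = data[~data.index.duplicated(keep='first')]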
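A minimal sketch comparing the three `keep` options on a hypothetical three-row frame whose label `'a'` appears twice. Note that `duplicated` only returns a boolean mask; rows are removed only when the negated mask is used to index the frame:

```python
import pandas as pd

# hypothetical frame whose index label 'a' appears twice
data = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'a', 'b'])

# duplicated() only flags rows; it does not drop anything by itself
mask = data.index.duplicated(keep='first')   # [False, True, False]

# keep='first': keep the first 'a', drop later repeats
first = data[~mask]
# keep='last': keep the last 'a'
last = data[~data.index.duplicated(keep='last')]
# keep=False: every repeated label is flagged, so all 'a' rows are dropped
no_dups = data[~data.index.duplicated(keep=False)]

print(first['x'].tolist())    # [1, 3]
print(last['x'].tolist())     # [2, 3]
print(no_dups['x'].tolist())  # [3]
```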

0 Package

from datetime import datetime

1 Create date object in data

create new date object column in data frame

# specify date format
date_format = '%m/%d/%Y' # %m: month, %d: day, %Y: four-digit year, e.g. '03/15/1995'

# convert each string in date column to date object, x is a string
data['date'] = [datetime.strptime(x, date_format) for x in data['date']]

# extract year and month from each date object
data['year'] = [x.year for x in data['date']]
data['month'] = [x.month for x in data['date']]
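The list comprehensions above work, but pandas can also parse and split dates column-wise. A minimal sketch of that alternative, assuming the strings follow the same `'%m/%d/%Y'` format (the sample dates are invented):

```python
import pandas as pd

# hypothetical date strings in month/day/year form
data = pd.DataFrame({'date': ['01/15/1995', '12/31/1996', '07/04/1997']})

# parse the whole column at once instead of looping with datetime.strptime
data['date'] = pd.to_datetime(data['date'], format='%m/%d/%Y')

# the .dt accessor exposes date components as vectorized columns
data['year'] = data['date'].dt.year
data['month'] = data['date'].dt.month

print(data['year'].tolist())   # [1995, 1996, 1997]
print(data['month'].tolist())  # [1, 12, 7]
```

Both versions give the same year and month values; the vectorized form avoids a Python-level loop over every row.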

select subset of data according to date

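# year-end dates 1995-12-31, ..., 2018-12-31 (pandas >= 2.2 spells this freq 'YE')
year_range = pd.date_range(start='1995', end='2019', freq='Y')
# rows dated on or before 1995-12-31 (this includes earlier years, not only 1995)
data_up_to_1995 = data.loc[data['date'] <= year_range[0]]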
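A `<=` cut-off keeps everything up to the given date, so it is not the same as selecting one year. A small sketch on hypothetical data contrasting a cut-off, a calendar-year match, and an explicit window (`Series.between` includes both endpoints by default):

```python
import pandas as pd

# hypothetical observations spanning 1994-1996
data = pd.DataFrame({
    'date': pd.to_datetime(['1994-06-30', '1995-03-31', '1995-12-31', '1996-01-02']),
    'value': [1, 2, 3, 4],
})

# everything on or before the end of 1995
up_to_1995 = data.loc[data['date'] <= pd.Timestamp('1995-12-31')]

# only calendar year 1995: match on the extracted year
only_1995 = data.loc[data['date'].dt.year == 1995]

# an explicit date window, both endpoints included
window = data.loc[data['date'].between(pd.Timestamp('1995-01-01'),
                                       pd.Timestamp('1995-12-31'))]

print(up_to_1995['value'].tolist())  # [1, 2, 3]
print(only_1995['value'].tolist())   # [2, 3]
print(window['value'].tolist())      # [2, 3]
```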

1 Delete specific rows in data frame

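We can delete selected rows by building a boolean mask of the rows to drop and negating it with the ~ operator.

row_list = [1, 2, 3]                          # row positions to delete
row_index = data.iloc[row_list].index         # index labels at those positions
data = data.loc[~data.index.isin(row_index)]  # ~ negates the boolean mask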
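A runnable sketch on a hypothetical frame with a non-default index, comparing a negated `isin` mask with `DataFrame.drop`. The `~` has to be applied to a boolean mask: on an integer index it performs bitwise NOT (`~1 == -2`) instead of excluding rows.

```python
import pandas as pd

# hypothetical frame with a non-default index, to separate labels from positions
data = pd.DataFrame({'x': [10, 20, 30, 40, 50]}, index=[100, 101, 102, 103, 104])

row_list = [1, 2, 3]                      # positions, not labels
row_index = data.iloc[row_list].index     # labels at those positions: 101, 102, 103

# option 1: boolean mask of rows to delete, negated with ~
kept_mask = data.loc[~data.index.isin(row_index)]

# option 2: drop the labels directly
kept_drop = data.drop(index=row_index)

print(kept_mask['x'].tolist())  # [10, 50]
print(kept_drop['x'].tolist())  # [10, 50]
```

If the index contains duplicate labels, both options remove every row carrying one of those labels, not just the rows at the chosen positions.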