Created September 15, 2015 08:31
Impute missing values
import numpy as np
from sklearn.preprocessing import Imputer

# generate some data
df1 = np.array(np.random.randn(1000)).reshape(100, 10)

# make some values 'NaN'
df1[(df1 > -.05) & (df1 < .05)] = np.nan

X = df1
print(X)

imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
X = imp.fit_transform(X)
print(X)
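To see what the 'mean' strategy does to each column, here is a rough NumPy-only sketch of the same idea. It is not how scikit-learn implements it internally; the names rng, col_means and X_filled are made up for this illustration, and the seed is only there to make the run repeatable.

```python
import numpy as np

# Same kind of data as the gist: 100 rows, 10 columns (features).
rng = np.random.default_rng(0)  # fixed seed is an assumption, for repeatability
X = rng.standard_normal((100, 10))

# Same masking rule as the gist: values close to zero become NaN.
X[(X > -0.05) & (X < 0.05)] = np.nan

# Mean of each column, ignoring the NaN entries.
col_means = np.nanmean(X, axis=0)

# Wherever X is NaN, take that column's mean instead
# (col_means broadcasts across the rows).
X_filled = np.where(np.isnan(X), col_means, X)

print(np.isnan(X).sum(), "missing values before")
print(np.isnan(X_filled).sum(), "missing values after")
```

Every missing entry ends up equal to the mean of the observed values in its own column, which is what axis=0 refers to.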
The working bits of code are:
imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
where you create an instance of the Imputer. missing_values='NaN', strategy='mean' and axis=0 are all defaults, so they don't actually need to be stated. By default (axis=0) it looks at the other values in a column (feature) to generate the new values.
and
X = imp.fit_transform(X)
which actually fits the Imputer to the data and creates a new array with the missing values filled.
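Note for newer scikit-learn releases: Imputer was deprecated in 0.20 and removed in 0.22. Its replacement is SimpleImputer in sklearn.impute. A minimal sketch of the same two steps, assuming scikit-learn 0.20 or later (the X_filled name is just for this example):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Generate some data and blank out the values close to zero, as in the gist.
X = np.random.randn(100, 10)
X[(X > -0.05) & (X < 0.05)] = np.nan

# SimpleImputer has no axis argument; it always imputes column by column.
# missing_values=np.nan and strategy="mean" are its defaults, spelled out here.
imp = SimpleImputer(missing_values=np.nan, strategy="mean")

# Fit to the data and return a new array with the missing values filled.
X_filled = imp.fit_transform(X)

print(imp.statistics_)  # the per-column means used as fill values
print(X_filled)
```

The missing-value marker is now np.nan itself instead of the string 'NaN'.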