Become a Python Data Science Hacker
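import numpy as np
from sklearn.decomposition import PCA

# X is the feature matrix, shape (n_samples, n_features)
pca = PCA()  # no n_components given, so all components are kept
pca.fit(X)
cumsum = np.cumsum(pca.explained_variance_ratio_) #(2)
d = np.argmax(cumsum >= 0.95) + 1 #(1)
print(d)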
- (1) cumsum >= 0.95 gives the following output. Calling argmax on it returns the index of the first True value, that is, the first component at which the cumulative explained variance reaches at least 95%. Because indices start at 0, we add 1 to turn that index into a number of components.
array([False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True])
- (2) np.cumsum computes the cumulative sum of the given values. PCA always returns the explained variance ratios in descending order, so the running total tells us how much variance the first k components explain together. If we wish to find how many components are needed to explain at least 95% of the variance, this is the way to go.
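To see what the two NumPy calls are doing, here is a minimal sketch of the same logic in plain Python on a tiny example. The ratios are made-up numbers chosen for illustration (they are not taken from the dataset above), and itertools.accumulate and list.index stand in for np.cumsum and np.argmax.

```python
from itertools import accumulate

# Hypothetical explained-variance ratios, already sorted in descending order
# (made-up numbers for illustration only).
ratios = [0.50, 0.30, 0.10, 0.06, 0.04]

# Plain-Python counterpart of np.cumsum: running total of the ratios.
cumsum = list(accumulate(ratios))

# Counterpart of `cumsum >= 0.95`: one boolean per component.
mask = [c >= 0.95 for c in cumsum]

# Counterpart of np.argmax(mask): index of the first True.
# Note: list.index raises ValueError if there is no True at all,
# whereas np.argmax would silently return 0 in that case.
first_true = mask.index(True)

# Indices start at 0, so add 1 to turn the index into a component count.
d = first_true + 1

print(mask)  # [False, False, False, True, True]
print(d)     # 4
```

Here the first three components explain only about 90% of the variance, so the fourth is the first one that pushes the running total past 95%, giving d = 4.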