Become a Python Data Science Hacker

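import numpy as np
from sklearn.decomposition import PCA

# X: feature matrix of shape (n_samples, n_features), assumed already loaded
pca = PCA()  # keep all components so the full variance spectrum is available
pca.fit(X)
cumsum = np.cumsum(pca.explained_variance_ratio_) #(2)
d = np.argmax(cumsum >= 0.95) + 1 #(1)
print(d)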
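  1. cumsum >= 0.95 returns a boolean array that is True wherever the cumulative explained variance has reached 95%. np.argmax treats True as 1 and False as 0 and returns the index of the first occurrence of the maximum, so it gives the index of the first True value, i.e. the first component at which at least 95% of the variance is explained. Adding 1 converts this 0-based index into a number of components. For the data used here, the boolean array looks like this: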

    array([False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True])
    

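  2. np.cumsum computes the cumulative (running) sum of the given values. In scikit-learn's PCA, explained_variance_ratio_ is sorted in descending order, so cumsum[i] is the fraction of variance explained by the first i + 1 components, which are also the i + 1 strongest ones. That makes this the natural way to find how many components are needed to explain at least 95% of the variance.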
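Written as a formula, the two steps above pick the smallest k whose partial sum reaches the threshold. Here r_i stands for the i-th entry of pca.explained_variance_ratio_ and p for the number of components; this notation is introduced only for the restatement.

```latex
r_1 \ge r_2 \ge \dots \ge r_p, \qquad
d = \min\Big\{\, k \in \{1, \dots, p\} \;:\; \sum_{i=1}^{k} r_i \ge 0.95 \,\Big\}
```

np.cumsum produces the partial sums, the comparison marks every k that satisfies the condition, and argmax + 1 picks the smallest such k.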
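To check the cumsum/argmax logic without fitting a model, here is a minimal NumPy-only sketch. The ratios are made-up values standing in for pca.explained_variance_ratio_, and the helper name n_components_for_variance is hypothetical. It also guards against one edge case: if the threshold is never reached (for example when PCA was fit with too few components), the mask is all False and np.argmax returns 0, which would silently report 1 component.

```python
import numpy as np


def n_components_for_variance(explained_variance_ratio, threshold=0.95):
    """Hypothetical helper: smallest number of leading components whose
    cumulative explained-variance ratio is at least `threshold`."""
    ratios = np.asarray(explained_variance_ratio, dtype=float)
    cumsum = np.cumsum(ratios)         # cumsum[i] = ratios[0] + ... + ratios[i]
    reached = cumsum >= threshold      # False ... False True ... True
    if not reached.any():
        # np.argmax on an all-False mask returns 0, i.e. a misleading "1 component"
        raise ValueError("threshold is never reached by the given ratios")
    return int(np.argmax(reached)) + 1  # first True index (0-based) -> count


# Made-up explained-variance ratios, already sorted in descending order
ratios = [0.50, 0.25, 0.15, 0.06, 0.03, 0.01]
print(n_components_for_variance(ratios))  # 4
```

Raising an error instead of returning a default keeps a too-small component set from being mistaken for a one-component answer.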
