So if you are trying to build a classifier using sklearn.linear_model and came across the error:
ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0
This comes from the way Sparkit uses the sklearn.linear_model module. Sparkit trains sklearn’s linear models in parallel, one per block of data, then averages them in a reduce step. If any block contains only one of the labels, the solver for that block fails with the error above. This is very likely to happen if you have few classes, a large dataset, and the dataset is ordered by label.
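To see why a label-ordered dataset triggers this, here is a minimal sketch that mimics the splitting with plain NumPy. It does not call Sparkit; the block count of 4, the toy labels, and the helper name `classes_per_block` are assumptions made purely for illustration.

```python
import numpy as np

def classes_per_block(labels, n_blocks):
    """Return the set of distinct labels found in each contiguous block."""
    return [set(block.tolist()) for block in np.array_split(labels, n_blocks)]

# Labels sorted by class: 50 zeros followed by 50 ones.
ordered = np.array([0] * 50 + [1] * 50)
ordered_blocks = classes_per_block(ordered, 4)

# The same labels after shuffling (fixed seed so the run is repeatable).
rng = np.random.default_rng(0)
shuffled_blocks = classes_per_block(rng.permutation(ordered), 4)

print(ordered_blocks)   # every block holds a single class
print(shuffled_blocks)
```

With ordered labels, each block sees only one class, which is exactly the situation the solver rejects. After shuffling, a block of this size is overwhelmingly likely to contain both classes.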
The solution is to randomise the order of the dataset so that each block receives a mix of labels. Say X is your dataset and Y is your label array:
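import numpy as np
# Stack flattened samples and labels side by side so each row stays paired
Z = np.c_[X.reshape(len(X), -1), Y.reshape(len(Y), -1)]
# Shuffle the rows in place; this step does the actual randomisation
np.random.shuffle(Z)
# Split back into samples and labels with their original shapes
X2 = Z[:, :X.size // len(X)].reshape(X.shape)
Y2 = Z[:, X.size // len(X):].reshape(Y.shape)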
Note: do not use names like X2, Y2 in a production environment.
Alternatively, index both arrays with one shared permutation of the sample indices:
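def randomize(X, Y):
    # permutation() needs the number of samples, not the shape tuple
    permutation = np.random.permutation(len(Y))
    # Indexing the first axis works for X of any dimensionality
    X2 = X[permutation]
    Y2 = Y[permutation]
    return X2, Y2

X2, Y2 = randomize(X, Y)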
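Whichever way you shuffle, it is worth checking that every sample still sits next to its own label. The sketch below is one way to do that on toy data. The helper name `shuffle_together`, the seed, and the array shapes are hypothetical choices, and each toy sample is filled with its own label so that any misalignment is easy to spot.

```python
import numpy as np

def shuffle_together(X, Y, seed=None):
    """Shuffle X and Y along the first axis using one shared permutation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(Y))
    return X[order], Y[order]

# Toy data ordered by label: 5 samples of class 0, then 5 of class 1.
Y = np.array([0] * 5 + [1] * 5)
# Each 2x2 sample is filled with its own label value.
X = np.stack([np.full((2, 2), label) for label in Y])  # shape (10, 2, 2)

X2, Y2 = shuffle_together(X, Y, seed=0)

# Shapes are unchanged and every sample still matches its label.
assert X2.shape == X.shape
assert np.array_equal(X2[:, 0, 0], Y2)
```

The same kind of check can be run on a small slice of real data by comparing a few shuffled rows against the originals they came from.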