Machine Learning A-Z, Section 7. Support Vector Regression (SVR), Lecture 67, ValueError: Expected 2D array, got 1D array instead:

While working through the wonderful course Machine Learning A-Z™: Hands-On Python & R In Data Science, I hit an error in Lecture 67 with this traceback:

/home/rob/.local/lib/python3.6/site-packages/sklearn/utils/validation.py:475: DataConversionWarning: Data with input dtype int64 was converted to float64 by StandardScaler.
  warnings.warn(msg, DataConversionWarning)
Traceback (most recent call last):
  File "/home/rob/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-4c6eabe008f6>", line 21, in <module>
    y = sc_y.fit_transform(y)
  File "/home/rob/.local/lib/python3.6/site-packages/sklearn/base.py", line 517, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/home/rob/.local/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 590, in fit
    return self.partial_fit(X, y)
  File "/home/rob/.local/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 612, in partial_fit
    warn_on_dtype=True, estimator=self, dtype=FLOAT_DTYPES)
  File "/home/rob/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 441, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[  45000.   50000.   60000.   80000.  110000.  150000.  200000.  300000.
  500000. 1000000.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I did some digging into the error and found this Stack Overflow post.

Essentially, the issue is that y is a 1-D array, while StandardScaler expects a 2-D array. There are two solutions to this issue.
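
The shape difference is easy to check with NumPy alone. The following is a minimal sketch that reuses the salary values from the traceback above; the names y_1d, y_column and y_row are only illustrative and do not come from the course code.

```python
import numpy as np

# Salary values copied from the traceback above.
y_1d = np.array([45000., 50000., 60000., 80000., 110000.,
                 150000., 200000., 300000., 500000., 1000000.])

print(y_1d.shape)      # 1-D: this is what StandardScaler rejects

# One feature, many samples: a single column.
y_column = y_1d.reshape(-1, 1)
print(y_column.shape)

# One sample, many features: a single row (not what is wanted here).
y_row = y_1d.reshape(1, -1)
print(y_row.shape)
```

For the salary column, reshape(-1, 1) is the variant that matches the error message's "single feature" case.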

Solution 1

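Use the scale function. It applies the same standardization as StandardScaler and accepts a 1-D array directly, but it does not keep the fitted mean and standard deviation, so the scaling cannot be undone with inverse_transform later.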

from sklearn.preprocessing import scale
y = scale(y)

with the rest of the tutorial code from this point being:

from sklearn.svm import SVR
regressor = SVR(kernel="rbf")
regressor.fit(X,y)

# Predicting a new result
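# predict expects a 2-D array, scaled with the same sc_X that was fitted on X
y_pred = regressor.predict(sc_X.transform(np.array([[6.5]])))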

# Visualising the Regression results
plt.scatter(X, y, color = 'red')
plt.plot(X, regressor.predict(X), color = 'blue')
plt.title('Truth or Bluff (Regression Model)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

# Visualising the Regression results (for higher resolution and smoother curve)
X_grid = np.arange(X.min(), X.max(), 0.1)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
plt.title('Truth or Bluff (Regression Model)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
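
To see what scale does to the 1-D target in Solution 1, here is a small sketch that compares it with the standardization formula computed by hand. It assumes scikit-learn and NumPy are installed, and it reuses the salary values from the traceback; y_manual is an illustrative name.

```python
import numpy as np
from sklearn.preprocessing import scale

y = np.array([45000., 50000., 60000., 80000., 110000.,
              150000., 200000., 300000., 500000., 1000000.])

y_scaled = scale(y)  # accepts the 1-D array directly

# Same result computed by hand: subtract the mean, then divide by the
# population standard deviation (NumPy's default, ddof=0).
y_manual = (y - y.mean()) / y.std()

print(y_scaled.shape)                   # still 1-D
print(np.allclose(y_scaled, y_manual))  # compares the two results
```

Because scale returns only the transformed values, the mean and standard deviation would have to be kept separately (for example y.mean() and y.std()) if predictions need to be converted back to salaries.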

Solution 2

Reshape y to a 2-D array before passing it to StandardScaler.

sc_y = StandardScaler()
y = np.array(y).reshape(-1,1)
y = sc_y.fit_transform(y)

with the full code up to this point in the tutorial being:

# SVR
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

# Splitting the dataset into the Training set and Test set
"""from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)"""

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X = sc_X.fit_transform(X)

sc_y = StandardScaler()
y = np.array(y).reshape(-1,1)
y = sc_y.fit_transform(y)

# Fitting the SVR to the dataset
from sklearn.svm import SVR
regressor = SVR(kernel="rbf")
regressor.fit(X, y.ravel())  # SVR expects a 1-D target, so flatten the column

# Predicting a new result
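# predict expects a 2-D array scaled like X; inverse_transform maps the result back to a salary
y_pred = regressor.predict(sc_X.transform(np.array([[6.5]])))
y_pred = sc_y.inverse_transform(y_pred.reshape(-1, 1))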

# Visualising the Regression results
plt.scatter(X, y, color = 'red')
plt.plot(X, regressor.predict(X), color = 'blue')
plt.title('Truth or Bluff (Regression Model)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

# Visualising the Regression results (for higher resolution and smoother curve)
X_grid = np.arange(X.min(), X.max(), 0.1)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
plt.title('Truth or Bluff (Regression Model)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
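
One practical difference between the two solutions is that the fitted StandardScaler from Solution 2 can undo the scaling. The sketch below illustrates this round trip on the salary values from the traceback; scaled_pred is a made-up value standing in for a model prediction, not an output of the tutorial's regressor.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

y = np.array([45000., 50000., 60000., 80000., 110000.,
              150000., 200000., 300000., 500000., 1000000.])

sc_y = StandardScaler()
y_scaled = sc_y.fit_transform(y.reshape(-1, 1))  # column of 10 scaled values

# Hypothetical prediction in scaled units. predict() returns a 1-D array,
# so it is reshaped to a column before calling inverse_transform.
scaled_pred = np.array([0.0])
salary_pred = sc_y.inverse_transform(scaled_pred.reshape(-1, 1))

print(salary_pred)  # 0.0 in scaled units corresponds to the mean salary
print(np.allclose(sc_y.inverse_transform(y_scaled).ravel(), y))
```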

This will fix the error you are seeing with sklearn. And remember, “enjoy machine learning” 😉