How to predict Ghosts?

Yesterday Kaggle finished a competition on classifying creatures as Ghost, Goblin or Ghoul. Kaggle is a platform where data scientists can challenge themselves in competitions, which are either set up to solve a real problem someone is facing or run by Kaggle themselves just for fun. This competition was just for fun.
It was easy to win - you just had to overfit the model to the leaderboard. But in this post I'll try to show you how to start working with the data and make a prediction.

Let's start!

What we will use: Python with pandas and scikit-learn (sklearn).
First read the data:

import pandas as pd
df = pd.read_csv('train.csv')

We can print the first few rows to see what kind of data the dataset contains:

df.head()

	id	bone_length	rotting_flesh	hair_length	has_soul	color	type
0	0	0.354512	0.350839	0.465761	0.781142	clear	Ghoul
1	1	0.575560	0.425868	0.531401	0.439899	green	Goblin
2	2	0.467875	0.354330	0.811616	0.791225	black	Ghoul
3	4	0.776652	0.508723	0.636766	0.884464	black	Ghoul
4	5	0.566117	0.875862	0.418594	0.636438	green	Ghost
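
Before modelling, it can help to check which columns are numeric and how the classes are distributed. A minimal sketch, using only the five rows printed above (rebuilt by hand so it runs without train.csv; the names sample, numeric_cols and type_counts are just for illustration, and on the real data you would call the same methods on df):

```python
import pandas as pd

# The five rows printed by df.head(), typed in by hand so that the
# snippet does not need train.csv.
sample = pd.DataFrame({
    "id": [0, 1, 2, 4, 5],
    "bone_length": [0.354512, 0.575560, 0.467875, 0.776652, 0.566117],
    "rotting_flesh": [0.350839, 0.425868, 0.354330, 0.508723, 0.875862],
    "hair_length": [0.465761, 0.531401, 0.811616, 0.636766, 0.418594],
    "has_soul": [0.781142, 0.439899, 0.791225, 0.884464, 0.636438],
    "color": ["clear", "green", "black", "black", "green"],
    "type": ["Ghoul", "Goblin", "Ghoul", "Ghoul", "Ghost"],
})

# Columns with a numeric dtype (note that id is numeric but is not a feature)
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

# How many rows belong to each class
type_counts = sample["type"].value_counts()

print(numeric_cols)
print(type_counts)
```

Note that id shows up as numeric too, which is why the feature list is better written out by hand than taken from the dtypes.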

Our job is to predict the type.
The first step is to split the data into a training set and a test set.

from sklearn.model_selection import train_test_split # sklearn.cross_validation was removed in scikit-learn 0.20
col_name = ['bone_length', 'rotting_flesh', 'hair_length', 'has_soul'] # we use only numeric data
X_train, X_test, y_train, y_test = train_test_split(df[col_name], df['type']) # try to predict type
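
The split above is random, so every run gives slightly different scores. A sketch of a more explicit variant follows, assuming a recent scikit-learn where train_test_split lives in sklearn.model_selection. The synthetic demo frame stands in for train.csv, and random_state=42 and stratify are my additions, not something the original run used:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for train.csv: 120 rows with the same feature names.
rng = np.random.RandomState(0)
col_name = ['bone_length', 'rotting_flesh', 'hair_length', 'has_soul']
demo = pd.DataFrame(rng.rand(120, 4), columns=col_name)
demo['type'] = np.repeat(['Ghost', 'Ghoul', 'Goblin'], 40)

X_train, X_test, y_train, y_test = train_test_split(
    demo[col_name], demo['type'],
    test_size=0.25,         # same as the library default
    random_state=42,        # fixed seed, so the split is repeatable
    stratify=demo['type'],  # keep the class proportions equal in both parts
)

print(X_train.shape, X_test.shape)
print(y_test.value_counts())
```

With a fixed seed, a change in the score comes from the model and not from a lucky or unlucky split.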

Now X_train holds 75% of our set; train_test_split keeps the remaining 25% for testing by default. I only used numeric data.
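
The one non-numeric feature, color, is left out here. If you wanted to include it, one common option is one-hot encoding with pd.get_dummies. A minimal sketch on three of the rows shown earlier (demo and encoded are illustrative names, and whether color actually helps the score is not something this post tested):

```python
import pandas as pd

# Three rows taken from the df.head() output, one numeric column kept for brevity
demo = pd.DataFrame({
    'hair_length': [0.465761, 0.531401, 0.811616],
    'color': ['clear', 'green', 'black'],
})

# Replace the text column with one 0/1 indicator column per color value
encoded = pd.get_dummies(demo, columns=['color'])

print(encoded.columns.tolist())
```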
Now we can build the first model, a baseline. Its score is the one to beat, and after building it I will try to improve on it.

from sklearn.metrics import accuracy_score, classification_report
from sklearn.linear_model import SGDClassifier # import the SGD classifier
base_line = SGDClassifier() # create an instance of the classifier
base_line.fit(X_train, y_train) # train the classifier
predict = base_line.predict(X_test) # make predictions on the held-out data
print(accuracy_score(y_pred=predict, y_true=y_test)) # check the accuracy
print(classification_report(y_pred=predict, y_true=y_test)) # print the report

The most interesting thing is the classification report, which looks like this:

	precision	recall	f1-score	support
Ghost	0.96	0.76	0.85	29
Ghoul	0.91	0.31	0.47	32
Goblin	0.51	0.94	0.66	32
avg / total	0.79	0.67	0.65	93

The accuracy score, which is the score we want to beat, is:
0.6666
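
The accuracy and the report are two views of the same predictions: recall times support gives the number of correctly predicted rows per class, and the sum of those divided by the total support is the accuracy. A small sketch of that arithmetic, using only the numbers from the report above (the recalls are rounded to two decimals, so the per-class counts are rounded back to whole rows):

```python
# (recall, support) per class, copied from the classification report
report = {
    'Ghost': (0.76, 29),
    'Ghoul': (0.31, 32),
    'Goblin': (0.94, 32),
}

# recall * support is approximately the number of correct rows in that class
correct = {name: round(recall * support)
           for name, (recall, support) in report.items()}

total = sum(support for _, support in report.values())
accuracy = sum(correct.values()) / total

print(correct)
print(sum(correct.values()), total, round(accuracy, 4))
```

This gives 62 correct rows out of 93, which is where the score above comes from. It also shows the weak spot of the baseline: only about 10 of the 32 Ghouls are recognised.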

To improve this result I decided to use an SVM.

from sklearn.svm import SVC # import the support vector classifier
svm = SVC() # use the default settings
svm.fit(X_train, y_train)
predict = svm.predict(X_test)
print(accuracy_score(y_pred=predict, y_true=y_test))
print(classification_report(y_pred=predict, y_true=y_test))

The result:

	precision	recall	f1-score	support
Ghost	0.75	0.93	0.83	29
Ghoul	0.79	0.84	0.82	32
Goblin	0.74	0.53	0.62	32
avg / total	0.76	0.76	0.75	93

The accuracy:
0.76
which is better than the previous one.
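
A single test set of 93 rows is small, so part of that gap could be down to the particular split. One way to compare the two models more reliably is k-fold cross-validation. The sketch below uses synthetic data from make_classification instead of train.csv, so its numbers say nothing about the competition; cv=5 and the random seeds are my assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic 3-class problem with 4 features, standing in for the real data
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)

models = {'SGD': SGDClassifier(random_state=0), 'SVC': SVC()}

# Five accuracy scores per model, one for each held-out fold
cv_scores = {name: cross_val_score(model, X, y, cv=5, scoring='accuracy')
             for name, model in models.items()}

for name, scores in cv_scores.items():
    print(name, scores.mean(), scores.std())
```

If the mean scores differ by less than the spread between folds, the difference between the models is probably not meaningful.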

To beat this result we can try a different classifier, e.g. XGBoost, a random forest or a neural network, or we can build one classifier out of a set of classifiers.
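
The last idea, combining several classifiers, can be sketched with scikit-learn's VotingClassifier. This is only one possible setup, not something that was run on the competition data: the data is synthetic, and the choice of members (an SVM, a random forest and a logistic regression) and their settings are my assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic 3-class problem with 4 features, standing in for train.csv
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y)

# Each member votes for a class; the majority wins ('hard' voting)
ensemble = VotingClassifier(
    estimators=[
        ('svm', SVC()),
        ('rf', RandomForestClassifier(n_estimators=100, random_state=0)),
        ('lr', LogisticRegression(max_iter=1000)),
    ],
    voting='hard',
)
ensemble.fit(X_train, y_train)
predict = ensemble.predict(X_test)
ensemble_accuracy = accuracy_score(y_true=y_test, y_pred=predict)
print(ensemble_accuracy)
```

An ensemble tends to help most when its members make different kinds of mistakes, so mixing model families is usually more useful than voting over three similar models.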