by: J.Roberge
How do descriptions affect a video game's performance? Do high-ranking video games have more detailed descriptions, and most importantly, can a description be used to predict a video game's performance? This analysis uses a previously curated dataset containing the tokenized descriptions of 10,000 video games found on iTunes. The main issues going forward are how to most effectively deal with an extremely sparse matrix and which type of model performs best on this kind of dataset. The analysis is broken down into three sections: section one deals with outlier detection; section two reduces the dimensions; and section three fits the models.
Please note that the way I fit these models is not 'correct' (you shouldn't touch your y_test until the very end); the method used was only employed due to the constraints of the assignment (which can be found here).
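For contrast, here is a minimal sketch of the conventional workflow on synthetic stand-in data (the variables below are placeholders, not objects from this notebook): tune hyperparameters by cross-validation on the training split only, and score the held-out test set exactly once.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
### synthetic stand-in data; in this notebook X would be the reduced token features and y the rating category
X, y = make_classification(n_samples=500, n_features=20, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
### hyperparameters are tuned by cross-validation on the training split only...
search = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': [5, 10, 20, 50]}, cv=5)
search.fit(X_train, y_train)
### ...and the held-out test set is scored exactly once, at the very end
print("held-out accuracy:", search.score(X_test, y_test))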
Table of Contents
### importing dependencies ####
import pandas as pd
import numpy as np
### dimension reduction techniques
from sklearn.decomposition import PCA
from sklearn.decomposition import SparsePCA
from sklearn.manifold import TSNE
### Validation techniques/ search techniques
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.model_selection import ParameterGrid
## models
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier
### outlier
from sklearn.preprocessing import StandardScaler
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
### plotting
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import seaborn as sns
### feature selection
from xgboost import plot_importance
from sklearn.feature_selection import SelectKBest, chi2, f_regression
### reading in Data Frame
%cd "C:\Users\jwr17\OneDrive - University of New Hampshire\machine learning\quiz_2\Quiz 2 ML"
### importing data sets ###
df_token=pd.read_csv("Apple1000games.csv", index_col=0)
df_des=pd.read_csv("apple1000new.csv")
### mapping to categorical
cuts=pd.cut(df_des['Average User Rating'], [-1, 2.99, 3.99, 5.5], labels=['Poor', 'Good', "Great"])
# adding cuts to the description csv
df_des['User_cat']=cuts
### checking for missing values
df_des.User_cat.isna().sum()
### I'm going to impute missing values to 'Poor' due to the size of the
### data set (losing 100 rows is significant, and I believe it is safe to assume that games without a rating are more than likely poor)
df_des['User_cat']=df_des.User_cat.fillna('Poor')
print("Shape fo description file", df_des.shape)
display(df_des.head(3))
print("\nShape fo token file", df_token.shape)
display(df_token.head())
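Before worrying about modeling, it is worth quantifying how sparse the token matrix actually is. A quick check, assuming a zero cell means the token does not appear in that game's description:
### sparsity check (assumes 0 == token absent from the description)
sparsity = (df_token == 0).values.mean()
print(f"{sparsity:.1%} of the cells in df_token are zero")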
I'm currently at a crossroads with outlier detection. This is a sparse matrix, and I worry that a cell with any sort of value in it may be considered an outlier. To make sure I'm not just getting rid of good data, I will sort through what is considered an outlier.
#### outlier Detection ####
# I am currently at a crossroads with outlier detection. I'm a little worried about the data being sparse and trying to run outlier
# analysis on it.
### standardization ####
standard=StandardScaler().fit_transform(df_token)
df_token=pd.DataFrame(standard, columns=df_token.columns)
### mahalanobis ###
clf = EllipticEnvelope(contamination=.05,random_state=0)
clf.fit(df_token)
Envelope_prediction= clf.predict(df_token)
envelope_scores = pd.Series(clf.decision_function(df_token))
### isolation forest ###
clf = IsolationForest( n_estimators=400, random_state=4, n_jobs=-1, contamination=.05)
clf.fit(df_token)
outliers=clf.predict(df_token)
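Before dropping anything, here is a quick sketch for reviewing what was flagged, using the predictions and scores computed above: it cross-tabulates the two detectors' flags and pulls up the games the elliptic envelope is most confident are outliers.
### reviewing the flags (-1 = outlier, 1 = inlier)
print(pd.crosstab(pd.Series(Envelope_prediction, name='mahalanobis'),
                  pd.Series(outliers, name='isolation')))
### the lowest envelope scores are the rows it is most confident are outliers
most_extreme = envelope_scores.nsmallest(5).index
display(df_des.iloc[most_extreme])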
After reviewing the outliers, it seems that both the Mahalanobis (elliptic envelope) and isolation forest approaches performed well. From here I will drop every row that both methods flag as an outlier.
#### dropping outliers ###
outliers_master=pd.concat([pd.Series(Envelope_prediction), pd.Series(outliers)], axis=1)
outliers_master.columns=['mahalanobis', 'isolation']
display(outliers_master[outliers_master.isolation < 1].head())  # rows the isolation forest flags as outliers
print("The shape of the outlier dataframe: ",outliers_master.shape)
display(outliers_master.head(3))
### Dropping outliers
df_token=df_token[(outliers_master.isolation==1) | (outliers_master.mahalanobis ==1)]
df_des=df_des[(outliers_master.isolation==1) | (outliers_master.mahalanobis ==1)]
print("The shape of the token dataframe after outlier drop:", df_token.shape)
display(df_token.head(3))
print("The shape of the description dataframe after outlier drop:", df_des.shape)
display(df_des.head(3))
Currently there are around 10,000 features in this dataset, and these features are extremely sparse. In order to get a working model, dimension reduction must take place. I will employ three dimension reduction techniques: first, I will use Principal Component Analysis and keep 90% of the variation; second, I will employ Sparse PCA and keep the top 20 components; and third, I will use t-SNE and keep three components.
## pca's
pca=PCA(0.9).fit_transform(df_token)
master = pd.DataFrame(pca, columns=['PC_'+str(i) for i in range(1,pca.shape[1]+1)])
## sparse pca
sparse=SparsePCA(n_components=20, n_jobs=-1).fit_transform(df_token)
sparse= pd.DataFrame(sparse, columns=['Sparse_'+str(i) for i in range(1,21)])
master=pd.concat([sparse,master], axis=1)
del(sparse)
del(pca)
print("Shape of the reduced dimension df with sparse and pca", master.shape)
display(master.head(3))
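To see how many components PCA(0.9) actually kept, and how much variance they capture, a quick check; this refits a separate PCA estimator, since the cell above only stored the transformed values.
### inspecting the PCA fit
pca_model = PCA(0.9).fit(df_token)
print("Components kept:", pca_model.n_components_)
print("Variance explained:", pca_model.explained_variance_ratio_.sum().round(3))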
After running through several iterations, and after doing some research, I have decided to change the distance metric for t-SNE from Euclidean to cosine. According to that research, cosine distance empirically performs better on sparse matrices. Additionally, the plot does seem to be slightly better. (The plot will print outside of the notebook.)
### t-SNE
X_embedding = TSNE(metric='cosine', n_components=3, perplexity=1000, n_iter=500, learning_rate=500).fit_transform(df_token)
### 3d Graph Prints outside notebook ###
get_ipython().run_line_magic('matplotlib', 'inline')
df_sne_3d=pd.DataFrame(X_embedding, columns=['tsne_1',"tsne_2","tsne_3"])
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
color=[]
for a in df_des['User_cat']:
    if a == 'Great':
        color.append('green')
    elif a == 'Good':
        color.append('yellow')
    elif a == 'Poor':
        color.append('red')
    else:
        color.append('gray')  # fallback; 'NaN' is not a valid matplotlib color
ax.scatter3D(xs=df_sne_3d.tsne_1, ys=df_sne_3d.tsne_2, zs=df_sne_3d.tsne_3,c=color, marker='o')
### appending t-sne to master
master=pd.concat([master, df_sne_3d],axis=1)
print("Shape of the reduced dimension df with tsne, sparse and pca", master.shape)
display(master.head(3))
### Cleaning up my memory
del(df_sne_3d, cuts, color,ax,fig,standard,outliers,outliers_master,Envelope_prediction,X_embedding,a)
The following graph shows the total count of each target value. I made this graph to see the balance of the classes. Overall, I would say the class balance is pretty good, and therefore I will not pursue any type of re-sampling technique.
### seeing how the classes balance out
get_ipython().run_line_magic('matplotlib', 'inline')
sns.countplot(x=df_des['User_cat'])
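To put numbers on that balance, the class proportions can be printed directly:
### class proportions of the target
print(df_des['User_cat'].value_counts(normalize=True).round(3))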
Due to the constraints of the assignment, I will be limited in my use of GridSearchCV(), because I can't score my y_test data on individual models when I run a grid search. To alleviate this issue I will use ParameterGrid(), which is the function that generates the grid behind GridSearchCV(). This parameter grid will be used to fit individual models throughout the analysis. The fitting process will be broken down into four parts. Part one will be the dictionary layout for ParameterGrid. Part two will be the functions I plan to use for model fitting. Part three will fit all possible models using four types of features: all features, t-SNE features, Sparse PCA features, and PCA features. The last part will fit all possible models based on feature importance.
#### dictionaries for paramgrid ###
knn_params={
'p':[2],
'n_neighbors':[5,10,20,50],
'n_jobs':[-1]
}
random_params={
'n_estimators':[100,500],
'min_samples_split':[5],
'min_samples_leaf':[5],
'n_jobs':[-1]
}
gradient_params={
'learning_rate':[.1,.01,.001],
'n_estimators':[100,500],
'subsample':[.6,.8,1],
'min_samples_split':[5],
'min_samples_leaf':[5],
'random_state':[4],
}
xg_params={
'learning_rate':[.1,.01,.001],
'n_estimators':[100,500],
'subsample':[.6,.8,1],
'max_depth':[5,7,9],
'n_jobs':[-1]
}
master_params=[knn_params, random_params, gradient_params, xg_params]
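As a quick illustration of what ParameterGrid() produces, expanding the KNN dictionary above yields one dict per parameter combination:
### ParameterGrid expands a dict of lists into every parameter combination
for params in ParameterGrid(knn_params):
    print(params)
### e.g. {'n_jobs': -1, 'n_neighbors': 5, 'p': 2}, {'n_jobs': -1, 'n_neighbors': 10, 'p': 2}, ...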
The following function takes a model and a parameter grid, fits every parameter combination on the training set, and then scores the predictions on the testing set.
from sklearn.model_selection import ParameterGrid
def get_model(model, x_train, y_train, x_test, y_test, param, scoring1, scoring2):
    """
    Fits every parameter combination in the grid and returns a dataframe
    of all models scored against the test data.
    """
    ### creates the parameter grid
    param_grid = ParameterGrid(param)
    ### puts the param grid into a dataframe
    df_param_grid = pd.DataFrame(param_grid)
    master_score1 = []
    master_score2 = []
    ### assigning the classifier name to the data frame
    df_param_grid['Classifier'] = model.__name__
    for params in param_grid:
        clf = model()
        ## setting the individual parameter combination on the estimator
        clf.set_params(**params)
        ### fitting the model
        clf.fit(x_train.values, y_train.astype(str).values)
        pred = clf.predict(x_test.values)
        ### calculating the score of y_test vs y_pred
        score = scoring1(y_test.astype(str).values, pred)
        print(score)
        master_score1.append(score)
        score = scoring2(y_test.astype(str).values, pred, average='macro')
        print(score)
        master_score2.append(score)
    ### assigning the scores to the param grid df
    df_param_grid[scoring1.__name__] = master_score1
    df_param_grid[scoring2.__name__] = master_score2
    return df_param_grid
The following for loop fits all possible models (see the master model list) across all possible parameters (see the parameter dictionaries), repeated for each of the different dimension reduction techniques.
master_params=[knn_params, random_params, gradient_params, xg_params]
master_models=[KNeighborsClassifier, RandomForestClassifier, GradientBoostingClassifier, XGBClassifier]
target=df_des['User_cat'].astype(str)
### column lists for fitting the different models
master_cols=master.columns
sparse_cols=[col for col in master.columns if 'Sparse' in col]
PC_cols=[col for col in master.columns if 'PC' in col]
tsne_cols=[col for col in master.columns if 'tsne' in col]
col_list=[master_cols, sparse_cols, PC_cols, tsne_cols]
col_names=['all', 'sparse', 'PCA', 'TSNE']
master_df=pd.DataFrame()
for i, col in enumerate(col_list):
    ## splits train and test
    x_train, x_test, y_train, y_test = train_test_split(master[col], target, test_size=.2)
    ### fits every model on this set of dimension-reduction features
    for j, model in enumerate(master_models):
        df = get_model(model, x_train, y_train, x_test, y_test, master_params[j], accuracy_score, f1_score)
        df['features'] = col_names[i]
        master_df = pd.concat([master_df, df], axis=0)
master_df.to_csv("master_quiz_3.csv")
### Observing The output ###
print("Shape of Paramgrid: ", master_df.shape)
display(master_df.head(3))
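The 'best fit model' referenced in the next section was chosen by F1 score; one quick way to surface it from the results frame built above:
### top models from the first round of fitting, ranked by macro F1
display(master_df.sort_values('f1_score', ascending=False).head())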
The following models are based on feature importance. Feature importance was determined through XGBoost's feature importance module, using the best-fitting model from the previous round of model fitting. The features for this run were selected using three criteria: information gain, weight, and total information gain. These features are broken down into four categories, and each category will be run through all possible models.
### looking through the results, I believe a combination of features may produce the best results; for this exercise I will
### pick what seem to be the best features.
## step one: using the best model thus far (by f1-score) and extracting the best features
x_train, x_test, y_train, y_test= train_test_split(master, target,test_size=.2)
### fitting the top model from previous model fittings
clf=XGBClassifier(learning_rate=.1, max_depth=7, n_estimators=100, n_jobs=-1, subsample=1.)
clf.fit(x_train,y_train)
### plotting best features
### To get gain or weight, change importance_type from 'total_gain'
plot_importance(clf,max_num_features=20, importance_type='total_gain')
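Rather than reading the feature names off the plot, the same scores can be pulled programmatically from the fitted booster; a small sketch using the clf fitted above:
### total-gain importance per feature, highest first (swap importance_type for 'gain' or 'weight')
scores = pd.Series(clf.get_booster().get_score(importance_type='total_gain'))
print(scores.sort_values(ascending=False).head(10))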
#### fitting models after selecting features based on gain, weight and total gain
total_gain=['PC_1','PC_12','PC_3','PC_10','PC_220', 'PC_16','PC_45','PC_54','PC_57', 'tsne_3']
weight=['tsne_1','tsne_2','tsne_3','PC_12', 'PC_3', 'PC_16', 'Sparse_1', 'Sparse_11','PC_9', 'PC_4']
gain=['PC_1','PC_57','PC_177','PC_408', 'PC_10', 'PC_328','PC_126','PC_195','PC_278','PC_307']
all_select=list(set(total_gain+weight+gain))
col_list=[total_gain, weight, gain, all_select]
col_names=['Total_Gain', 'Weight', 'Gain', 'All_select']
select_master=pd.DataFrame()
for i, col in enumerate(col_list):
    ## splits train and test
    x_train, x_test, y_train, y_test = train_test_split(master[col], target, test_size=.2)
    ### fits every model on this feature-importance subset
    for j, model in enumerate(master_models):
        df = get_model(model, x_train, y_train, x_test, y_test, master_params[j], accuracy_score, f1_score)
        df['features'] = col_names[i]
        select_master = pd.concat([select_master, df], axis=0)
select_master.sort_values('accuracy_score')
master_model=pd.concat([select_master,master_df])
master_model['Your Name']='Joshua Roberge'
master_model['Random State']=4
master_model_1=master_model.drop(['p','random_state', 'min_samples_split', 'n_jobs','min_samples_leaf'], axis=1)
master_model_1=master_model_1.rename(columns={'Classifier': 'Algorithm'})
master_model_1.to_csv('Master_model_1.csv')
At this point in time, I cannot conclusively say that a video game's description affects its performance, nor can I say that a video game's description has any sort of predictive capability for a game's performance.
Not all is lost! Perhaps a deeper dive into NLP could solve the issue. Going forward, the use of a lexicon could prove beneficial and produce better predictive features. Additionally, I believe experimenting with transformations of the token matrix could improve the models' overall performance.
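As one concrete version of that transformation idea, here is a minimal sketch applying a TF-IDF re-weighting to the token counts; note that df_token_raw is a hypothetical name for the un-standardized count matrix, not a variable defined in this notebook.
from sklearn.feature_extraction.text import TfidfTransformer
### hypothetical: df_token_raw is the raw token-count matrix, before StandardScaler
tfidf = TfidfTransformer()
df_token_tfidf = pd.DataFrame(tfidf.fit_transform(df_token_raw).toarray(),
                              columns=df_token_raw.columns)
### df_token_tfidf could then be fed through the same PCA / t-SNE pipeline as above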