Sharing a link to the KDNuggets article in which a few data scientists give their thoughts on the main developments of 2016 and the likely trends for 2017. Reinforcement Learning is mentioned more than once!
A brand new idea for my blog in 2017 is a monthly "Did You Know That?" digest, in which I will share m things (where m <= 3) that I recently learnt and found useful. I will keep these digests short and simple, so as not to overwhelm you with verbiage and unnecessary details. This month's top 3 "Did You Know That?" items are:
- scikit-learn SGDClassifier – one learner, many tricks up its sleeve;
- GraphViz is integrated in scikit-learn – no need to import it separately!
- Zeppelin notebook from Apache – worth a look if you are into Python notebooks.
scikit-learn SGDClassifier
SGDClassifier is a single estimator that trains several kinds of linear classifier with stochastic gradient descent. The loss parameter controls which model is trained and used for classification. For example, loss='hinge' gives a linear SVM, and loss='log' gives a logistic regression (recent scikit-learn releases spell this value 'log_loss'). When should you use it? When your training data set does not fit into memory. Note that SGD also allows mini-batch learning: the partial_fit method updates the model one chunk of data at a time.
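Here is a minimal sketch of both ways of using SGDClassifier: fitting on the whole data set at once, and feeding it mini-batches with partial_fit. The toy data, the batch size of 100 and the variable names are my own assumptions for illustration; in a real out-of-core setting each batch would be read from disk rather than sliced from an array.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic, linearly separable toy data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# loss="hinge" trains a linear SVM on the full data set in one call.
svm = SGDClassifier(loss="hinge", random_state=0)
svm.fit(X, y)

# Mini-batch learning: update the model one chunk at a time.
# partial_fit needs the full list of class labels on the first call.
clf = SGDClassifier(loss="hinge", random_state=0)
classes = np.unique(y)
batch_size = 100  # hypothetical batch size
for start in range(0, len(X), batch_size):
    rows = slice(start, start + batch_size)
    clf.partial_fit(X[rows], y[rows], classes=classes)

print("full fit accuracy:   ", svm.score(X, y))
print("mini-batch accuracy: ", clf.score(X, y))
```

Changing the loss argument is all it takes to switch to another linear model, which is why one estimator covers so many use cases.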
GraphViz is Integrated in scikit-learn Decision Trees
If you have read my blog posts, you may have come across this one where I put together some code to train a binary decision tree to recognize a hand in poker. The code is available on my GitHub space. If you read the code, you will see that I defined a graph_decision_tree method that jumps through all sorts of hoops to graph the tree and save the images. But did you know that you don't need to do all this work, since sklearn.tree has an export_graphviz function? It writes the tree out in GraphViz's .dot format. If dsTree is a fitted instance of DecisionTreeClassifier, then one can simply do:
from sklearn.tree import export_graphviz
export_graphviz(dsTree, out_file='dsTree.dot', feature_names=['feature1', 'feature2'])
The .dot file can be converted to a .png file (if you have installed GraphViz) like this:
dot -Tpng dsTree.dot -o dsTree.png
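For completeness, here is a small end-to-end sketch of the export step. It assumes the iris data set bundled with scikit-learn instead of my poker data, and a tree depth of 2 chosen only to keep the picture small. Passing out_file=None makes export_graphviz return the .dot source as a string, which the sketch then writes to dsTree.dot.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Stand-in data set (an assumption; any labelled data would do).
iris = load_iris()

# Fit a small tree; max_depth=2 just keeps the resulting graph readable.
dsTree = DecisionTreeClassifier(max_depth=2, random_state=0)
dsTree.fit(iris.data, iris.target)

# With out_file=None the DOT description is returned as a string.
dot_source = export_graphviz(
    dsTree,
    out_file=None,
    feature_names=iris.feature_names,
)

# Save it so the GraphViz command-line tool can render it.
with open("dsTree.dot", "w") as f:
    f.write(dot_source)

print(dot_source[:80])
```

Rendering the saved file to an image still requires the GraphViz dot program to be installed on your machine.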
Zeppelin Notebook from Apache
If you are using Apache Spark, you may be glad to learn that Apache has a notebook to go along with it. Zeppelin notebook offers similar functionality to Jupyter in terms of data visualization, paragraph writing and notebook sharing. I recommend that you check it out.