Tuesday, July 28, 2015

Some exciting developments at Dato

In case you missed our latest Dato blog post, I wanted to highlight two of the coolest newly released features:

It's particularly exciting that GraphLab Create's integration with NumPy effectively scales scikit-learn. With GraphLab Create and Dato Predictive Services, you can now deploy existing scikit-learn models at scale as a RESTful predictive service by changing only a few lines of code. Very cool.

[Image: graphlab-create-numpy-scale]


Dato Distributed now with distributed machine learning

import graphlab as gl

# job distribution environments
# s = gl.deploy.spark_cluster.load('hdfs://…')
# h = gl.deploy.hadoop_cluster.load('hdfs://…')
e = gl.deploy.ec2_cluster.load('s3://…')

# set distribution environment to my AWS cluster
gl.set_distributed_execution_environment(e)

Dato Distributed enables GraphLab Create users to run Python code tasks in parallel on EC2, Spark or Hadoop clusters. The snippet above shows how GraphLab Create can switch between these environments by changing one line of code. In GraphLab Create 1.5.1, Dato Distributed on Hadoop now seamlessly supports distributed execution of machine learning models including logistic regression, linear regression, SVM classifier, label propagation and PageRank. Distributed machine learning on EC2 and Spark is in the works.
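
For readers who haven't met PageRank before, here is a minimal single-machine sketch of the iteration that a distributed run spreads across a cluster. It is plain Python written for illustration only, not Dato's implementation: the function name `pagerank`, its parameters and the toy edge list are all assumptions of mine.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (src, dst) edges.

    Illustrative sketch only; not the GraphLab Create implementation.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with its "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            # A node without out-links spreads its rank over all nodes.
            targets = out_links[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: a -> b -> c -> a, plus d -> a.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
ranks = pagerank(edges)
```

Each pass only needs the previous round's ranks and the graph's edges, which is what makes the per-node work easy to split across machines.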

[Image: dato-distributed-pagerank-iteration]
