Yahoo Open-Sources CaffeOnSpark

Another tech giant is now sharing its artificial intelligence know-how with the world. Yahoo has published the source code of its CaffeOnSpark AI engine, allowing anyone from academic researchers to big corporations to use or modify it.
 
These days, Yahoo might not be known for its technological prowess; however, it did incubate Hadoop, the widely popular open source data crunching platform used by Facebook, Twitter, and numerous other companies. When it comes to AI, it has unique assets. When training artificial intelligence systems, the data matters as much as the algorithms, and Yahoo has one of the most interesting data sets around in the form of its photo site Flickr.
 
Like numerous other new open source AI projects, CaffeOnSpark is based on deep learning, a branch of artificial intelligence that is particularly useful for helping machines recognize human speech or the content of videos and photos. For example, Yahoo uses it to improve search results on Flickr by determining the content of photographs. Instead of relying on the descriptions and keywords entered by people uploading pictures to the site, Yahoo teaches its computers to recognize certain characteristics of photos, like specific colors or objects.
 
In recent months, Google has open sourced its deep learning framework TensorFlow, Microsoft has opened CNTK, Facebook has shared its AI hardware designs, and Baidu, the Chinese search giant, has unveiled its deep learning training software. Each of these open source technologies serves a different purpose. For Yahoo, the goal is to run deep learning processes on existing systems, without the need to move data from one place to another.
 
CaffeOnSpark combines two existing technologies: the deep learning framework Caffe and the up-and-coming data crunching system Spark, which runs on top of the big data platform Hadoop. Yahoo simply created a way to run Caffe atop Spark clusters, which can run either on their own or on Hadoop.