Friday, October 27, 2017

Keras

I've been working hard to keep up to date with recent progress in machine learning.  The progress in Deep Learning is exciting to me, especially because it builds on all of the things I learned about neural networks in 1995-99 in college and grad school (particularly some work with Dr. William Levy at UVA) but never really had the opportunity to put into practice directly, except in some simple classifier-type situations.

In any case, last year I took a short course from Miner and Kasch taught by Florian Muellerklein that was quite good. You can check out the slides and code on GitHub.  I'd previously been messing around with Torch, Caffe, TensorFlow, cuDNN, and a variety of other libraries; while easier than the old days of finding eigenvalues in C++ or running things out of MATLAB, they required a lot of configuration and such.  In the course, we jumped right into using Keras. Wow, so much easier. It's a bit like a Ruby on Rails for neural networks: it gives you sensible defaults so you can get going right away, and it minimizes common errors, but it differs in that it is just a simpler interface overlaying other libraries.  If anyone is diving into this stuff, Keras is the best path that I've tried.
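To give a sense of what "so much easier" means, here's a minimal sketch of a small classifier in the Keras 2 API of that era. The layer sizes and input shape are illustrative choices of mine, not anything from the course:

    # A minimal Keras Sequential model: a two-layer classifier for
    # flattened 28x28 images (e.g. MNIST) into 10 classes.
    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential()
    model.add(Dense(128, activation='relu', input_shape=(784,)))
    model.add(Dense(10, activation='softmax'))

    # compile() wires the model up to the backend library and fills in
    # sensible defaults for anything you don't specify.
    model.compile(optimizer='rmsprop',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Training is a single call; x_train and y_train are placeholders here.
    # model.fit(x_train, y_train, epochs=5, batch_size=32)

Compare that to wiring up the same network directly in one of the lower-level libraries and you can see why the defaults matter.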

To help you out on this path, there are now a couple of books on Keras. Deep Learning with Python is the one that I recommend. It is written by Francois Chollet, the creator and maintainer of Keras, who now works at Google. Reflecting that, I think it does the best job of communicating how the library is intended to be used, and it puts things in the right context of experimentation with hyperparameters and other topics that can take up all of your time.

In any case, I was recently asked what I thought about Gluon, the joint project from Amazon and Microsoft. I think it is trying to do the same thing as Keras, and I am hoping that it doesn't mean more divergence in the stack.  Already, the deep learning stack is pushing in the hardware direction. A level of matrix math operations was already implemented in hardware, which is why Nvidia GPUs were such an early accelerator for progress, based on similar needs in graphics processing, and AMD and Intel are not far behind, with Intel's acquisition of Nervana Systems a key recent purchase aimed at designing purpose-built chips. Google's Tensor Processing Units (TPUs) take this a step further and are designed to push more of the TensorFlow code into hardware.  Obviously Microsoft and Amazon don't want to be left behind, as they won't be buying TPUs for AWS or Azure. Even Tesla is looking at building its own chips for image analysis in cars.

That leaves those of us at the software level trying to find the right API to code against. Right now, for me, the answer is Keras, but keeping an eye on the whole stack is necessary to see which choice is the right one to make at the top of it.