Friday, October 27, 2017

Keras

I've been working hard to keep up to date with recent progress in machine learning. The advances in Deep Learning are especially exciting to me because they build on everything I learned about neural networks in college and grad school in 1995-99 (particularly some work with Dr. William Levy at UVA) but never really had the opportunity to put into practice directly, except in some simple classifier-type situations.

In any case, last year I took a short course from Miner and Kasch taught by Florian Muellerklein that was quite good; you can check out the slides and code on GitHub. I'd previously been messing around with Torch, Caffe, TensorFlow, cuDNN, and a variety of other libraries. While easier than the old days of finding eigenvalues in C++ or running out of MATLAB, they still required a lot of configuration. In the course, we jumped right into using Keras. Wow, so much easier. It's a bit like Ruby on Rails for neural networks: it gives you sensible defaults so you can get going right away while minimizing common errors, but differs in that it is just a simpler interface layered over other libraries. If anyone is diving into this stuff, Keras is the best path I've tried.
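To give a flavor of what "so much easier" means in practice, here is a minimal sketch of the Keras Sequential API as it looked around that time; the layer sizes and the MNIST-style input shape are my own illustrative choices, not something from the course.

```python
# A minimal sketch of the 2017-era Keras Sequential API; the layer sizes
# and the MNIST-style 784-dimensional input are illustrative choices,
# not taken from the course.
from keras.models import Sequential
from keras.layers import Dense

# Keras supplies sensible defaults (weight initialization, etc.), so a
# working classifier is just a few lines.
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))

# compile() wires the model to the backend (TensorFlow, Theano, or CNTK);
# swapping optimizers or losses is a one-string change.
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Training would be a single call; x_train and y_train are placeholders.
# model.fit(x_train, y_train, epochs=5, batch_size=32)
```

Compare that to wiring up the same network by hand in one of the lower-level libraries, and the Rails analogy starts to make sense.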

To help one out on this path, there are now a couple of books on Keras. Deep Learning with Python is the one that I recommend. It is written by Francois Chollet, the creator and maintainer of Keras, who now works at Google. Perhaps because of that, I think it does the best job of communicating how the library is intended to be used, and it puts things in the right context of experimentation with hyper-parameters and the other topics that can eat up all of your time.
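As a rough sketch of the kind of hyper-parameter experimentation the book puts in context, here is how one might rebuild the same small model under different settings; build_model and the specific values swept here are hypothetical.

```python
# A sketch of a hyper-parameter experimentation loop; build_model and the
# specific values swept here are hypothetical, not from the book.
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import RMSprop

def build_model(hidden_units, learning_rate):
    """Rebuild the same small architecture with different hyper-parameters."""
    model = Sequential()
    model.add(Dense(hidden_units, activation='relu', input_shape=(784,)))
    model.add(Dense(10, activation='softmax'))
    model.compile(optimizer=RMSprop(lr=learning_rate),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Sweep a few settings and compare validation accuracy (data omitted here).
for units in (64, 128, 256):
    for lr in (1e-3, 1e-4):
        model = build_model(units, lr)
        # history = model.fit(x_train, y_train, validation_split=0.2, epochs=5)
```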

In any case, I was recently asked what I thought about Gluon, the joint project from Amazon and Microsoft. I think it is trying to do the same thing as Keras, and I am hoping that it doesn't mean more divergence in the stack. Already, the deep learning stack is pushing in the hardware direction. A level of matrix math operations was already implemented in hardware, which is why Nvidia GPUs were such an early accelerator for progress, thanks to similar needs in graphics processing. AMD and Intel are not far behind, with Intel's acquisition of Nervana Systems a key recent purchase for designing purpose-built chips. Google's Tensor Processing Units (TPUs) take this a step further and are designed to push more of the TensorFlow code into hardware. Obviously Microsoft and Amazon don't want to be left behind, since they won't be buying TPUs for AWS or Azure. Even Tesla is looking at building its own chips for image analysis in cars.

That leaves those of us at the software level trying to find the right API to code against. Right now, for me, the answer is Keras, but keeping an eye on the whole stack is necessary to see which choice is the right one to make at the top of it.

Sunday, July 23, 2017

Programming Languages

Why are some programming languages more popular than others? The number one reason is libraries. If you are building something for the iPhone, you would have used Objective-C and now Swift, because they are supported by the tools on that platform. If you are doing Microsoft development, you will probably have a similar preference for C#, because that's where the libraries needed to connect to various applications live. Python is successful in data science because it has a great set of libraries for numerical analysis, which have formed the basis for more complex ones. Ruby became quite popular because of the Rails framework. Java has a number of open source projects that have kept it alive, things like Lucene and Hadoop, so we can blame Doug Cutting for its continued existence (in addition to a virtual machine that supports a number of platforms). And, of course, we are all stuck with JavaScript because it is the only thing that runs in web browsers, despite the numerous and ever-changing set of libraries that JS developers have been rapidly moving through over the past five years.

You'll notice that what is largely missing here is any feature of the languages themselves: whether a language is functional or object-oriented, dynamic or compiled. In the post-modern programming world, where things are built from other blocks, these features tend to be dwarfed in importance, because the average enterprise project includes more code written elsewhere than code written specifically for the project.