Self-Organizing Maps and their Benefit: Part One

At Filestack, we are committed to helping you use, engage with, and analyze your data in intelligent ways through the best in machine learning. The algorithms behind self-organizing maps, optical character recognition (OCR), video and audio analysis, and so forth, are complex. Building them out on your own often means a major hardware and software investment. This is exactly why we offer many of them seamlessly integrated into the core of our service.

That being said, as software developers, sometimes it’s fun to just play around with cool algorithms. This is especially true when they take minimal math knowledge and help demystify the often black-box, closed-off nature of machine learning. Previously, we’ve shown you how to build your own Prisma-like filters, how to recreate Silicon Valley’s NotHotDog app, and how to use autoencoders in PyTorch to pre-train your deep learning models. In this series of articles, I’m going to take you through an unsupervised machine learning algorithm (meaning it learns from unlabeled data) called self-organizing maps (SOMs).

Self-Organizing Maps Make Sense of Unlabeled Data

SOMs don’t provide robust classification into distinct classes the way many deep-learning algorithms do, but instead present a two-dimensional, geometric visualization of similarities in your data. Take a look at an example from Wikipedia:

Self-organizing map (SOM) of word frequency across Wikipedia articles

As you can see, the data is organized in a way that preserves semantic similarity between different words and concepts. SOMs infer this mapping automatically from the data itself, without the need for labeled data or predefined clusters. This is useful when you are less interested in building classifiers than in understanding the relationships within your data. For instance, if you want to know which countries have similar quality-of-life indices without making too many assumptions, you can pull a few indicators for each country (like literacy rates, life expectancy, and such) to see how the SOM clusters them. Here’s an example of using SOMs to build a world-poverty map:

That’s what self-organizing maps do, and why you should care. Next time, I’ll take you through the algorithm and a very basic implementation in Python.
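
As a rough preview of the country comparison described above, here is a minimal sketch of a SOM that uses only the Python standard library. Treat it as an illustration rather than a reference implementation: the country names and indicator values are made up, and the function names (`train_som`, `best_matching_unit`), the 4x4 grid, the learning rate, and the decay schedule are all assumptions chosen to keep the example short.

```python
import math
import random

# Hypothetical, made-up indicators scaled to [0, 1]:
# (literacy rate, life expectancy, income index). Not real data.
countries = {
    "Country A": (0.98, 0.95, 0.90),
    "Country B": (0.97, 0.93, 0.88),
    "Country C": (0.40, 0.35, 0.20),
    "Country D": (0.42, 0.38, 0.22),
}


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, sample):
    """Return the grid coordinate whose weight vector is closest to sample."""
    return min(weights, key=lambda pos: sq_dist(weights[pos], sample))


def train_som(samples, rows=4, cols=4, epochs=200, lr0=0.5, radius0=2.0, seed=0):
    """Train a small rectangular SOM and return {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Every grid node starts with a random weight vector in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        # Learning rate and neighborhood radius both shrink over time.
        decay = math.exp(-epoch / epochs)
        lr, radius = lr0 * decay, radius0 * decay
        # Present the samples in a fresh random order each epoch.
        for sample in rng.sample(samples, len(samples)):
            bmu = best_matching_unit(weights, sample)
            for pos, w in weights.items():
                # Gaussian neighborhood: nodes near the BMU on the grid move more.
                influence = math.exp(-sq_dist(pos, bmu) / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * influence * (sample[i] - w[i])
    return weights


weights = train_som(list(countries.values()))
positions = {
    name: best_matching_unit(weights, vec) for name, vec in countries.items()
}
for name, pos in positions.items():
    print(name, "->", pos)
```

Each country is printed with the grid cell it maps to. Countries with similar indicators should tend to land on the same or neighboring cells, while dissimilar ones end up farther apart, though the exact coordinates depend on the random seed and the settings assumed above.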
