At Filestack, we are committed to helping you use, engage with, and analyze your data in intelligent ways through the best in machine learning. The algorithms behind self-organizing maps, optical character recognition (OCR), video and audio analysis, and so forth are complex. Building them out on your own often requires a major hardware and software investment. This is exactly why we offer many of them seamlessly integrated into the core of our service.
That being said, as software developers, sometimes it’s fun to just play around with cool algorithms. This is especially true when they take minimal math knowledge and help to demystify the often black-box, closed-off nature of machine learning. Previously, we’ve shown you how to build your own Prisma-like filters, how to recreate Silicon Valley’s NotHotDog app, and how to use autoencoders in PyTorch to pre-train your deep learning models. In this series of articles, I’m going to take you through an unsupervised machine learning algorithm (meaning it learns from unlabeled data) called the self-organizing map (SOM).
Self-Organizing Maps Make Sense of Unlabeled Data
SOMs don’t provide robust classification into distinct classes the way many deep-learning algorithms do. Instead, they present a two-dimensional, geometric visualization of similarities in your data. Take a look at an example from Wikipedia:
Self-organizing map (SOM) of word frequency across Wikipedia articles
As you can see, the data is organized in a way that preserves semantic similarity between different words and concepts. SOMs infer and map these relationships automatically, due to the nature of their algorithm, without the need for labeled data or predefined clusters. This is useful when you are less interested in building out classifiers than in understanding the relationships in your data. For instance, if you want to know which countries have similar quality-of-life indices without making too many assumptions, you can pull a few indicators for each country (like literacy rates, life expectancy, and such) and see how the SOM clusters them. Here’s an example of using SOMs to build a world-poverty map:
That’s what self-organizing maps do, and why you should care. Next time, I’ll take you through the algorithm and a very basic implementation in Python.
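Ahead of that fuller walkthrough, here is a rough preview of the country-comparison idea described earlier. Treat it as a minimal sketch under stated assumptions, not the implementation this series will build: the function names (`train_som`, `best_matching_unit`), the 3x3 grid, the linear decay schedules, and the four `toy_countries` rows are all made up for illustration, and the indicator values are invented numbers already scaled to the range 0 to 1.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)),
    )


def train_som(samples, rows=3, cols=3, epochs=200, lr0=0.5, seed=0):
    """Train a tiny SOM and return {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(samples[0])
    sigma0 = max(rows, cols) / 2.0  # assumed starting neighborhood radius
    # Every grid node starts with a random weight vector in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        # Assumed schedule: learning rate and radius shrink linearly.
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.5)
        for x in rng.sample(samples, len(samples)):  # shuffled pass
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Nodes near the winner on the grid move more toward x.
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                influence = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * influence * (x[i] - w[i])
    return weights


# Invented, pre-scaled indicators: [literacy rate, life expectancy].
toy_countries = {
    "country_a": [0.95, 0.90],
    "country_b": [0.93, 0.88],
    "country_c": [0.40, 0.45],
    "country_d": [0.42, 0.50],
}

som = train_som(list(toy_countries.values()))
for name, features in toy_countries.items():
    # Each country is assigned to the grid cell that best matches it.
    print(name, "->", best_matching_unit(som, features))
```

Rows with similar indicators should land on the same or nearby grid cells, while dissimilar rows end up in different parts of the grid. No labels or predefined clusters are passed in at any point, which is the property that makes SOMs handy for exploring data like this.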