Visibility and monitoring in deployed machine learning systems
Machine learning allows us to build systems of unprecedented capability, enabling everything from self-driving cars to the synthesis of speech indistinguishable from a human voice. This sophistication comes at a cost, however: it is harder to understand and monitor the behaviour of live ML systems.
Viewing Jupyter notebooks at the command line
The Jupyter notebook is a literate programming environment that has become ubiquitous in machine learning. While the standard tools for interacting with notebooks are web applications, it’s often useful to be able to view notebooks at the command line. This is convenient when logged into a training workstation via SSH, where configuring SSH to forward a port, starting a Jupyter server, and navigating to it in a web browser is a chore just to view a notebook for a few seconds.
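As a rough illustration of the idea, a notebook file is JSON, so its cells can be printed in a terminal with the standard library alone. The sketch below assumes the nbformat 4 layout (a top-level "cells" list whose entries carry "cell_type" and "source"); the function names render_notebook and render_notebook_file are hypothetical and are not taken from any existing tool.

```python
import json


def render_notebook(nb: dict) -> str:
    """Return a plain-text rendering of a notebook's cells.

    Assumes the nbformat 4 layout: a top-level "cells" list.
    """
    lines = []
    for index, cell in enumerate(nb.get("cells", [])):
        source = cell.get("source", "")
        # "source" may be stored as one string or as a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        lines.append(f"--- [{index}] {cell.get('cell_type', 'unknown')} ---")
        lines.append(source)
    return "\n".join(lines)


def render_notebook_file(path: str) -> str:
    """Load a .ipynb file from disk and render it as plain text."""
    with open(path, encoding="utf-8") as f:
        return render_notebook(json.load(f))
```

Calling print(render_notebook_file("analysis.ipynb")) over SSH would then show each cell's type and source without starting a server; the file name here is only a placeholder.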
Representation learning for audio data
Classical machine learning often cannot be applied to modern, complex datasets, such as audio datasets of human speech, without extensive feature engineering. Traditionally, feature engineering requires deep domain knowledge in order to extract the key components of the data.
Jupyter notebooks and collaboration
Git has become the de facto standard for sharing and collaborating on code, and the same is true of Jupyter notebooks as the environment for interactive data exploration and modelling. However, herein lies a problem: Git was designed to version plain text files containing source code, not to store structured data such as the JSON source of Jupyter notebooks or binary data such as embedded images.
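One common way to soften this mismatch is to remove cell outputs before committing, so that Git only tracks the source of each cell. The sketch below is a minimal version of that idea, assuming the nbformat 4 layout in which code cells carry "outputs" and "execution_count" fields; strip_outputs is a hypothetical helper name, not part of Git or Jupyter.

```python
import copy


def strip_outputs(nb: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs removed.

    Assumes the nbformat 4 layout. The input dict is left unmodified.
    """
    clean = copy.deepcopy(nb)
    for cell in clean.get("cells", []):
        if cell.get("cell_type") == "code":
            # Outputs hold rendered results, including embedded image data.
            cell["outputs"] = []
            cell["execution_count"] = None
    return clean
```

The stripped dict can be written back with json.dump before committing. The trade-off is that results are no longer stored alongside the code, so the notebook has to be re-run to see them.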