For the past year, I’ve been working on a research project to analyze American TV news using modern computer vision techniques at scale (on the order of 100,000 hours of video, see: Processing Terabytes of Video on Hundreds of Machines). The work isn’t published yet (more on that later!), but I now have a wealth of experience in building infrastructure to enable our tiny rag-tag team of academics to deal with all the issues of data science, machine learning, computer vision, distributed systems, and human-computer interfaces that accompany these sorts of endeavors. In this note, I want to discuss one particular design process that’s emerged for us which I haven’t seen elsewhere: prototyping UIs in Jupyter, specifically in the data science context.
Let’s jump right into a motivating example. As a part of our video study, we needed to find individuals (hosts, politicians, celebrities) in our dataset. For this, we used neural networks to locate and identify the faces of people in our videos, producing a database of 60 million detected faces (thus far). Running deep learning tools on video at scale was a logistical nightmare in and of itself; see our upcoming SIGGRAPH paper on the system we built for large-scale video analysis (another multi-year effort).
Once we had the predicted bounding boxes/face embeddings, the next step was to just look at the data and see if it made sense. Normally this is done with some combination of OpenCV and Jupyter, but we did this kind of visualization so often that we built a web UI instead.
exec’d on the server to run the query to visualize (don’t do this at home, kids!).
In the course of visual inspection, we inevitably noticed some mistakes by the neural nets, so the next step for the UI was to make it interactive: allowing users to fix mistakes or create gold-standard data by manually drawing/correcting visual metadata.
While the system above worked well for simple workflows like selecting random images, labeling them, and saving the labels to the database, over time our workflows became more complex. For example, let’s say we want to make a classifier for Rachel Maddow, a TV news host on MSNBC. Our current algorithm is roughly:
face_list = [...]  # Get a list of faces to display
widget = esper_widget(face_list)  # Create a Python widget to display faces
widget  # Ask Jupyter to display the widget

### code cell boundary ###

print(widget.selected)  # Print the faces selected in the widget
With this API, we’ve been able to start rapidly prototyping the sort of workflows described above. The cool thing is that we’re not just prototyping purely data processing Python pipelines, but we can also capture human-in-the-loop pipelines which involve manual input.
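To give a feel for how a manual step can slot into an otherwise automated pipeline, here’s a minimal sketch. It does not use the real widget: `StubWidget`, its `click` method, and `split_by_selection` are all hypothetical stand-ins I made up for illustration, and I’m assuming `selected` holds the indices of the chosen items, which may not match the actual widget.

```python
from dataclasses import dataclass, field


@dataclass
class StubWidget:
    """Hypothetical stand-in for a selection widget (not the real esper_widget)."""
    items: list
    selected: list = field(default_factory=list)  # assumed: indices of chosen items

    def click(self, index):
        # Simulate a user clicking a displayed item, toggling its selection.
        if index in self.selected:
            self.selected.remove(index)
        else:
            self.selected.append(index)


def split_by_selection(widget):
    """Split the displayed items into (selected, unselected) lists."""
    chosen = set(widget.selected)
    positives = [x for i, x in enumerate(widget.items) if i in chosen]
    negatives = [x for i, x in enumerate(widget.items) if i not in chosen]
    return positives, negatives


# Automated step: produce candidates (placeholder IDs, not real data).
candidates = ["face_a", "face_b", "face_c", "face_d"]
widget = StubWidget(candidates)

# Manual step: in a notebook a person would click; here we simulate the clicks.
widget.click(0)
widget.click(2)
widget.click(2)  # clicked again, so face_c is deselected
widget.click(3)

# Automated step again: downstream code consumes the human's selection.
positives, negatives = split_by_selection(widget)
print(positives)
print(negatives)
```

The point of the shape is that the human’s input is just another value (`widget.selected`) that later cells can consume like any other Python data.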
More broadly, I think this is representative of a design process that many data scientists have encountered. In the exploration phase of data analysis, while we want to automate as much of our pipelines as possible, there’s an inevitable human component that’s hard to remove, so we should have the means to properly encapsulate that in the tools we use to create pipelines.
However, for a long time, our automation tools and our GUI tools have existed in separate worlds. While the world’s richest ecosystem for creating GUIs has flourished for web applications (HTML/CSS/JS), languages used for data processing like Python, R, etc. are left with… Tkinter? Qt? Outdated, difficult, imperative GUI toolkits from a bygone era. Or even just the command line, using the same text-based GUI systems we’ve had for over half a century.
By contrast, the promise of Jupyter here is that it offers a language-agnostic program runtime environment that can interleave languages like Python with web GUIs. Rather than needing servers, endpoints, and a complex ecosystem of tools, the barrier to co-existence is lower than ever. Web GUI programming becomes just another tool in a programmer’s toolkit rather than a walled garden. That said, I think a lot more work needs to be done here in fleshing out this space of prototyping interactive workflows. Specifically:
It’s hard to move past the prototype stage. For example, the Jupyter labeling pipeline we created above works great for a single user, but if we then want many users labeling concurrently, it’s impossible to have multiple individuals working in the same Jupyter notebook. You have to duplicate the notebook separately for each user. Ideally, there would be some way to “freeze” a Jupyter pipeline into a standalone web page which would easily support concurrent access in the same way most modern web frameworks do.
The tooling is not up to par. Today, there’s a significant imbalance between widget users and widget creators—very little documentation/support exists for creating bespoke Jupyter widgets, and most searchable issues are from people trying to use the standard set of Jupyter widgets, not creating their own. Moreover, creating a widget is fairly complex and involves at least three different package managers (pip, jupyter, and npm). Simplifying this process is crucial to reducing the overhead of creating a new custom widget.
A face embedding is an array of floats that abstractly describes a face. The goal of a face embedding space is that two embeddings of different images of the same person should be closer to each other than to embeddings of other people, even if the algorithm producing the embedding has never seen the input faces before. ↩
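To make the “closer to each other” idea concrete, here’s a toy sketch. The three-dimensional vectors and their names are invented for illustration (real embeddings are much longer and come from a neural network), and I’m assuming plain Euclidean distance as the notion of closeness, which may differ from what a given embedding model expects.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Invented toy embeddings: two images of one person, one image of another.
embeddings = {
    "person_a_img1": [0.9, 0.1, 0.0],
    "person_a_img2": [0.8, 0.2, 0.1],
    "person_b_img1": [0.1, 0.9, 0.3],
}


def nearest(query_name, embeddings):
    """Return the name of the embedding closest to the query (excluding itself)."""
    query = embeddings[query_name]
    others = {k: v for k, v in embeddings.items() if k != query_name}
    return min(others, key=lambda k: l2_distance(query, others[k]))


same = l2_distance(embeddings["person_a_img1"], embeddings["person_a_img2"])
different = l2_distance(embeddings["person_a_img1"], embeddings["person_b_img1"])

print(same < different)  # the two images of person A are closer together
print(nearest("person_a_img1", embeddings))
```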