The Role of Massively Multi-Task and Weak Supervision in Software 2.0

Authors:
Alexander Ratner, Braden Hancock, Christopher Ré

Abstract

Over the last several years, machine learning models have reached new levels of empirical performance across a broad range of domains. Driven both by accuracy improvements and deployment advantages, many organizations have begun to shift to learning-centered software stacks, a new mode that has been called Software 2.0. This approach holds the promise of radically accelerating the construction, maintenance, and deployment of software systems, and opens up a broad research agenda around changes to hardware, systems, and interaction models. However, these approaches require one critical and often prohibitively expensive ingredient: labeled training data. We outline a vision for a Software 2.0 lifecycle centered around the idea that labeling training data can be the primary interface to Software 2.0 systems. In our envisioned approach, Software 2.0 stacks are programmed using weak supervision (i.e., noisier, programmatically generated training data), which is specified at various levels of declarative abstraction and precision, and then combined using unsupervised statistical techniques. The codebase for Software 2.0 is also radically different: we envision labels for tens or hundreds of different tasks across an organization combined in a massively multitask central model, leading to amortization of labeling costs and new models of software reuse and development. Finally, we envision Software 2.0 stacks deployed by using collected training labels to supervise commodity model architectures over different servable feature sets. We outline the components of this lifecycle, and provide an interim report on Snorkel, our prototype Software 2.0 system, based on our experiences working on problems ranging from ad fraud to medical diagnostics with some of the world’s largest organizations.