# Jeffrey David Ullman

Jeffrey Ullman spent his childhood in the New York borough of Queens. His earliest years were in Astoria, after which his family moved to a newly constructed development on the edge of the city with a suburban character. He was drawn to mathematics, captaining the Van Burren High School math team. Ullman’s first exposure to computing came during his time as an engineering mathematics student at Columbia University. As part of an actuarial summer internship, he configured a plug board to control an automatic calculator. The next summer Ullman had a chance to program the much larger and faster IBM 709 computer at Brookhaven National Laboratory.

After graduating in 1963 Ullman entered the Ph.D. program in electrical engineering at Princeton. His graduate research focused on coding theory, an area of mathematics fundamental to the representation of information in computers and digital media. This interest in error correcting codes was inspired by a summer working in IBM’s Yorktown Heights lab with Bob Chien.

Another summer placement brought Ullman to the System Development Corporation. As he told an interviewer “I learned mathematical rigor from Seymour Ginsburg” who was researching automata theory there in a group focused on the development of mathematical ways to understand programming language grammars. The experience “gave me enough of the mathematical viewpoint that it enabled me to find my way in the future, to be a rather theoretical computer scientist and yet still worry about the engineering consequences of what I was doing.”

On graduation Ullman went to work at Bell Labs, then a leading center for computing research, where he continued to develop the interest in formal language theory he had acquired during his summer at SDC. His first book Formal Languages and Their Relation to Automata appeared three years later, in 1969. It was co-authored with John Hopcroft, who was to receive his own Turing Award for his work on algorithms and data structures. As a new faculty member at Princeton, Hopcroft had been enlisted to teach a small group of graduate students about the emerging field of computer science. Among them was Ullman, who went on to work with Hopcroft to expand and extend the course materials into a book that became a staple of computer science education. As a discipline, computer science was built upon a body of theory and techniques drawn from existing subfields within mathematics, engineering, and logic. Books like this were vital in integrating these formerly scattered areas of knowledge, extending them in ways relevant to the study of computation, and presenting them in a coherent way to the first generation of students educated as computer scientists.

That same year Ullman returned to Princeton, now as a faculty member. Thanks to his impressive research profile he was hired directly into an Associate Professor position. With Kenneth Steiglitz and others Ullman began to build up computer science as a major area of concentration within the electrical engineering program.

Ullman retained a consulting arrangement with Bell Labs and continued to collaborate with Al Aho, one of his former colleagues, on the development of parsing techniques for programming languages. Their work helped to define classes of language in terms of the capabilities automata would need to recognize them. In an early paper, Aho and Ullman showed how it was possible to make Donald Knuth's LR(k) parsing algorithm work with simple grammars that technically did not meet the requirements of an LR(k) grammar. Another Bell Labs staff member, Steve Johnson, put their ideas into practice with YACC, a widely used Unix tool for the automatic generation of language parsers based on grammar specifications. As a result, according to Ullman, “what used to be a three-month project for five programmers became something you did in a day.”

That was just one of many contributions Aho and Ullman made to formal language theory. The pair invented efficient algorithms for lexical analysis, syntax analysis, code generation, and code optimization. They also developed efficient algorithms for data-flow analysis that exploited the structure of "gotoless" programs, which were at the time just becoming the norm.

Their early joint work in algorithm design and analysis techniques contributed crucial approaches to the theoretical core of computer science that emerged in this period. In its focus on basic methods for processing graphs, strings, and sequences, this algorithmic work was tightly integrated with their research on programming languages. Their foundational book with John Hopcroft, The Design and Analysis of Computer Algorithms, published in 1974, created the conceptual framework not only for teaching algorithms in the standard computer science curriculum for decades but also for presenting and analyzing new algorithms developed by the research community. In addition to incorporating their own results, the book codified a range of disparate algorithms into a set of general design methods, including divide-and-conquer, recursion, dynamic programming, and others that have long since entered the standard toolbox of computer scientists.

Aho and Ullman’s other crucial contribution of the 1970s came with their books on computer language theory. They pulled their research on computer languages into a two-volume series The Theory of Parsing, Translation, and Compiling, published in 1972-3. Aho says that in this book they “attempted to distill the essence of automata and language theory with applications to the process of compilation. And, I think this was one of the interesting demonstrations of putting a theoretical foundation onto an important practical area of computer science.” This provided “a theoretical foundation that could be used to design algorithms for the translation process. And, more importantly, people could then construct tools to help build components of compilers.”

This laid the groundwork for their definitive textbook on compiler technology, Principles of Compiler Design, published in 1977. This "Dragon Book" (nicknamed after its colorful cover) and its subsequent editions with Ravi Sethi, and later with Sethi and Monica Lam, became the bibles of compiler design. These books lucidly lay out the phases in translating a high-level programming language to machine code, modularizing the entire enterprise of compiler construction.

The pair also helped to formalize the structuring of relational databases. Relational database management systems, based on a model for which Edgar Codd won a Turing award, are based around the idea that each piece of information should be stored only once. Database queries specify how records stored in different tables (or “relations” in Codd’s formal model) should be recombined. Ullman’s paper with Aho and Catriel Beeri on "lossless joins" brought rigor to the study of “normal forms” for the storage of data without redundancy and gave a methodology for determining when it was possible to break a relation into smaller components without losing information

Ullman had long been frustrated by university politics at Princeton, which marginalized computing research and continued to prevent the establishment of a separate computer science department. In 1979 he accepted an offer to move to Stanford, one of the strongest and best-established departments of computer science. He remained at Stanford for the rest of his career, serving as department chair from 2000 to 2004.

At Stanford Ullman continued to write computer science textbooks based on his areas of teaching and research. Foundations of Computer Science, another collaboration with Aho, appeared in 1992. The book gave an introduction to computer science grounded in theory and mathematics. Other Ullman textbooks of the 1980s covered ML programming and VLSI.

By the time he arrived at Stanford, Ullman had already shifted the focus of his research to database systems. This led to a new stream of influential textbooks. His book Principles of Database Systems, published in 1980 based on a course Ullman developed while still at Princeton, revolutionized the content of database courses at all levels, moving it from a purely engineering discipline to one with a firm theoretical foundation. The book, which has been revised several times, trained generations of computer scientists in data models, database design, and the use of database management systems.

At Stanford, Ullman established a database research group, later called the InfoLab, with Gio Wiederhold, Héctor García-Molina, and Jennifer Widom. Their work focused on issues around the merging of database records from different sources and on the development of techniques for data mining and the structuring of data for efficient analysis. During the 1980s Ullman was particularly interested in Datalog, a logical programming language designed for database querying. This provided capabilities then lacking in IBM’s widely used SQL language, including support for recursive rules, but was more practical than using generalized logic to query databases.

Ullman retired from Stanford in 2003 but retained emeritus faculty status there and has continued to teach on a more occasional basis. Ullman recently introduced a new undergraduate course “Bridging Policy and Tech Through Design.” His research and publication output has also continued, including several new books and new editions. These focused on database and data mining. With his colleagues from the InfoLab, Ullman wrote A First Course in Database Systems and Database Systems Implementation (which were merged into Database Systems: The Complete Book). Another book, Mining of Massive Datasets, was developed around courses taught at Stanford by Ullman, Jure Leskoved and Anand Rajarman. This focused on manipulation of web data and included discussion of cloud-based techniques for parallel programming, a focus area for Ullman in recent years.