The American National Corpus Project
The American National Corpus (ANC) project, funded by the National Science Foundation, is building a massive corpus of written texts and transcribed speech in contemporary American English. All of the data are annotated with linguistic analyses of various kinds so that computational linguists can build language models that assist in machine understanding of human language.
The project is based in the Department of Computer Science at Vassar; its partners are Princeton University, Columbia University, and the International Computer Science Institute at UC Berkeley. Eight to ten Vassar students take part in ANC research projects, ranging from computer and web programming to linguistic analysis, during both the academic year and the summer.