From Wikipedia, the free encyclopedia
The American National Corpus (ANC) is a paid, membership-based collaboratory that aims to create an electronic collection of American English. The collection will include texts and transcripts of spoken data produced from 1990 onward, with the goal of reaching 100 million words.
Members of the ANC Consortium include publishers, software companies, and academic institutions. Consortium members have exclusive access to the corpus throughout the development period and for five years after the first installment is released. The First Release of the ANC was made available in mid-fall 2003.
It contains approximately 11 million words of American English, covering both written and spoken data across a variety of text types, annotated for part of speech and lemma. The corpus is provided in XML format conforming to the XML Corpus Encoding Standard (XCES).
External links
- The American National Corpus First Release
- ANC Website