{{Infobox company
| name = Linguistic Data Consortium 
| logo = [[File:Linguistic_Data_Consortium_Logo.png|200px]]
| founded = {{start date and age|1992}}
| website = {{URL|https://www.ldc.upenn.edu/}}
| location_city = [[Philadelphia]], [[Pennsylvania]]
| location_country = [[United States]]
}}
The '''Linguistic Data Consortium''' is an open [[consortium]] of universities, companies and government research laboratories. It creates, collects and distributes speech and text [[database]]s, [[lexicon]]s, and other resources for [[linguistics]] research and development purposes. The [[University of Pennsylvania]] is the LDC's host institution. The LDC was founded in 1992 with a grant from the US [[DARPA|Defense Advanced Research Projects Agency]] (DARPA), and is partly supported by grant IRI-9528587 from the Information and Intelligent Systems division of the [[National Science Foundation]].<ref>{{Cite web|access-date=June 18, 2024|title=About LDC |url=https://www.ldc.upenn.edu/about |website=Linguistic Data Consortium}}</ref><ref>{{Cite web |title=NSF Award Search: Award # 9528587 - HLR: Improved Speech and Text Data Resources |url=https://www.nsf.gov/awardsearch/showAward?AWD_ID=9528587&HistoricalAwards=false |access-date=2025-03-27 |website=www.nsf.gov}}</ref> The director of the LDC is [[Mark Liberman]].<ref>{{Cite web|access-date=June 18, 2024|title=Staff |url=https://www.ldc.upenn.edu/about/staff |website=Linguistic Data Consortium}}</ref> The LDC subsumed the earlier [[ACL Data Collection Initiative]].

Part of the motivation for founding the LDC was to support the benchmark-oriented methodology of DARPA's [[Human Language Technology]] program. Earlier, [[John R. Pierce]] had chaired the committee that produced the [[ALPAC report]] (1966), which led to a severe decrease in funding for linguistic AI for about 10 years. [[Charles Lynn Wayne|Charles Wayne]] restarted funding in speech and language in the mid-1980s. To avoid the criticisms raised in the ALPAC report, DARPA needed a way to demonstrate objective progress, which led to the benchmark-oriented methodology: DARPA would propose specific, quantifiable and testable score targets on benchmarks, and funded teams would attempt to reach them.<ref name=":0">{{Cite journal |last=Cieri |first=Christopher |last2=Liberman |first2=Mark |last3=Cho |first3=Sunghye |last4=Strassel |first4=Stephanie |last5=Fiumara |first5=James |last6=Wright |first6=Jonathan |date=June 2022 |editor-last=Calzolari |editor-first=Nicoletta |editor2-last=Béchet |editor2-first=Frédéric |editor3-last=Blache |editor3-first=Philippe |editor4-last=Choukri |editor4-first=Khalid |editor5-last=Cieri |editor5-first=Christopher |editor6-last=Declerck |editor6-first=Thierry |editor7-last=Goggi |editor7-first=Sara |editor8-last=Isahara |editor8-first=Hitoshi |editor9-last=Maegaard |editor9-first=Bente |title=Reflections on 30 Years of Language Resource Development and Sharing |url=https://aclanthology.org/2022.lrec-1.57/ |journal=Proceedings of the Thirteenth Language Resources and Evaluation Conference |location=Marseille, France |publisher=European Language Resources Association |pages=543–550}}</ref><ref>{{Cite journal |last=Liberman |first=Mark |last2=Wayne |first2=Charles |date=June 2020 |title=Human Language Technology |url=https://onlinelibrary.wiley.com/doi/10.1609/aimag.v41i2.5297 |journal=AI Magazine |language=en |volume=41 |issue=2 |pages=22–35 |doi=10.1609/aimag.v41i2.5297 |issn=0738-4602}}</ref>

In 1993, Liberman and Godfrey noted that the data needed for training and benchmarking models had grown so large that "Not even the largest companies can easily afford enough of [the needed] data... Researchers at smaller companies and in universities risk being frozen out of the process almost entirely."<ref>Liberman, M. and Godfrey, J. (1993). The Linguistic Data Consortium. In Chen, Keh-Jiann, Chu-Ren Huang, Proc. ROCLing Computational Linguistics Conference VI, Nantou, Taiwan, September. Association for Computational Linguistics and Chinese Language Processing (ACLCLP).</ref> The LDC provided a central location for creating and distributing such data. The LDC charges a membership fee, which has been increased once since its founding.<ref name=":0" />