Linguistic Data Consortium
{{Infobox company
| name = Linguistic Data Consortium
| logo = [[File:Linguistic_Data_Consortium_Logo.png|200px]]
| founded = {{start date and age|1992}}
| website = {{URL|https://www.ldc.upenn.edu/}}
| location_city = [[Philadelphia]], [[Pennsylvania]]
| location_country = [[United States]]
}}
The '''Linguistic Data Consortium''' (LDC) is an open [[consortium]] of universities, companies and government research laboratories. It creates, collects and distributes speech and text [[database]]s, [[lexicon]]s, and other resources for [[linguistics]] research and development. The [[University of Pennsylvania]] is the LDC's host institution. The LDC was founded in 1992 with a grant from the US [[DARPA|Defense Advanced Research Projects Agency]] (DARPA), and is partly supported by grant IRI-9528587 from the Information and Intelligent Systems division of the [[National Science Foundation]].<ref>{{Cite web|access-date=June 18, 2024|title=About LDC |url=https://www.ldc.upenn.edu/about |website=Linguistic Data Consortium}}</ref><ref>{{Cite web |title=NSF Award Search: Award # 9528587 - HLR: Improved Speech and Text Data Resources |url=https://www.nsf.gov/awardsearch/showAward?AWD_ID=9528587&HistoricalAwards=false |access-date=2025-03-27 |website=www.nsf.gov}}</ref> The director of the LDC is [[Mark Liberman]].<ref>{{Cite web|access-date=June 18, 2024|title=Staff |url=https://www.ldc.upenn.edu/about/staff |website=Linguistic Data Consortium}}</ref>

The LDC subsumed the earlier [[ACL Data Collection Initiative]]. Part of the motivation for its founding was to support the benchmark-oriented methodology of DARPA's [[Human Language Technology]] program. Earlier, [[John R. Pierce]] had chaired the committee that produced the [[ALPAC report]] (1966), which caused a severe decrease in funding for linguistic AI for about 10 years. [[Charles Lynn Wayne|Charles Wayne]] restarted funding for speech and language research in the mid-1980s.
To avoid the criticisms raised in the ALPAC report, the restarted program needed a way to demonstrate objective progress, which led to the benchmark-oriented methodology. DARPA would propose specific, quantifiable and testable score targets on benchmarks, and the funded teams would attempt to reach them.<ref name=":0">{{Cite journal |last=Cieri |first=Christopher |last2=Liberman |first2=Mark |last3=Cho |first3=Sunghye |last4=Strassel |first4=Stephanie |last5=Fiumara |first5=James |last6=Wright |first6=Jonathan |date=June 2022 |editor-last=Calzolari |editor-first=Nicoletta |editor2-last=Béchet |editor2-first=Frédéric |editor3-last=Blache |editor3-first=Philippe |editor4-last=Choukri |editor4-first=Khalid |editor5-last=Cieri |editor5-first=Christopher |editor6-last=Declerck |editor6-first=Thierry |editor7-last=Goggi |editor7-first=Sara |editor8-last=Isahara |editor8-first=Hitoshi |editor9-last=Maegaard |editor9-first=Bente |title=Reflections on 30 Years of Language Resource Development and Sharing |url=https://aclanthology.org/2022.lrec-1.57/ |journal=Proceedings of the Thirteenth Language Resources and Evaluation Conference |location=Marseille, France |publisher=European Language Resources Association |pages=543–550}}</ref><ref>{{Cite journal |last=Liberman |first=Mark |last2=Wayne |first2=Charles |date=June 2020 |title=Human Language Technology |url=https://onlinelibrary.wiley.com/doi/10.1609/aimag.v41i2.5297 |journal=AI Magazine |language=en |volume=41 |issue=2 |pages=22–35 |doi=10.1609/aimag.v41i2.5297 |issn=0738-4602}}</ref> By 1993, the data needed to train and benchmark models had grown so large that, as Liberman and Godfrey wrote, "Not even the largest companies can easily afford enough of [the needed] data... Researchers at smaller companies and in universities risk being frozen out of the process almost entirely."<ref>Liberman, M. and Godfrey, J. (1993). The Linguistic Data Consortium. In Chen, Keh-Jiann, Chu-Ren Huang, Proc. ROCLing Computational Linguistics Conference VI, Nantou, Taiwan, September. Association for Computational Linguistics and Chinese Language Processing (ACLCLP).</ref> The LDC provided a central location for creating and distributing such data. It charges a membership fee, which has been increased once since its founding.<ref name=":0" />