Editing Internet Archive (section)
===Text collection===
[[File:Scribe Machine Acquisition 3.jpg|thumb|right|Internet Archive "Scribe" [[book scanning]] workstation]]
[[File:A Real Page-Turner.jpg|thumb|An Internet Archive in-house scan in progress]]

The scanning performed by the Internet Archive is financially supported by libraries and foundations.<ref>{{Cite web |last=Kahle |first=Brewster |date=May 23, 2008 |title=Books Scanning to be Publicly Funded |url=https://archive.org/iathreads/post-view.php?id=194217 |url-status=live |archive-url=https://web.archive.org/web/20090924105740/http://www.archive.org/iathreads/post-view.php?id=194217 |archive-date=September 24, 2009 |website=Internet Archive Forums}}</ref> {{As of|2008|11}}, when there were approximately 1 million texts, the entire collection exceeded 500 terabytes, including raw camera images, cropped and skewed images, [[PDF]]s, and raw [[Optical character recognition|OCR]] data.<ref>{{Cite web |date=November 24, 2008 |title=Bulk Access to OCR for 1 Million Books |url=https://blog.openlibrary.org/2008/11/24/bulk-access-to-ocr-for-1-million-books/ |url-status=live |archive-url=https://web.archive.org/web/20081206124013/http://blog.openlibrary.org/2008/11/24/bulk-access-to-ocr-for-1-million-books/ |archive-date=December 6, 2008 |website=Open Library Blog}}</ref> {{As of|2013|July}}, the Internet Archive was operating 33 [[book scanning|scanning centers]] in five countries and digitizing about 1,000 books a day, for a cumulative total of more than 2 million books within an overall collection of 4.4 million books{{snd}}a figure that includes material digitized by others and fed into the Internet Archive; at that time, users were performing more than 15 million downloads per month.<ref name=Hoffelder2013/>

The material digitized by others includes more than 300,000 books that [[Microsoft]] contributed to the collection between about 2006 and 2008 through its [[Live Search Books]] project, which also donated financial support and scanning equipment directly to the Internet Archive.<ref name=msdown>{{cite web|url=http://blogs.msdn.com/livesearch/archive/2008/05/23/book-search-winding-down.aspx |title=Book search winding down |date=May 23, 2008 |work= MSDN Live Search Blog<!-- Official announcement from Microsoft-->|archive-url=https://web.archive.org/web/20080820220749/http://blogs.msdn.com/livesearch/archive/2008/05/23/book-search-winding-down.aspx |archive-date=August 20, 2008}}</ref> On May 23, 2008, Microsoft announced that it would end the Live Search Books project, stop scanning books, and donate its remaining scanning equipment to its former partners.<ref name=msdown/>

Around October 2007, Archive users began uploading [[public domain]] books from [[Google Books|Google Book Search]].<ref>{{Cite web |title=Google Books at Internet Archive |url=https://archive.org/details/googlebooks |url-status=live |archive-url=https://web.archive.org/web/20081206201549/https://archive.org/details/googlebooks |archive-date=December 6, 2008 |access-date=November 9, 2008 |publisher=Internet Archive}}</ref> {{As of|2013|November}}, there were more than 900,000 Google-digitized books in the Archive's collection;<ref>{{Cite web |url=https://archive.org/search.php?query=sponsor%3A%28Google%29 |title=List of Google scans |archive-url=https://web.archive.org/web/20140126055407/https://archive.org/search.php?query=sponsor%3A%28Google%29 |archive-date=January 26, 2014 |publisher=Internet Archive}}</ref> the books are identical to the copies found on Google, except without the Google watermarks, and are available for unrestricted use and download.{{efn|Books imported from Google have a metadata tag of scanner:google for searching purposes. The Archive provides a link to Google for PDF copies, but also maintains a local PDF copy, which is viewable under the "All Files: HTTPS" link. Like all other books in the collection, they also come with OCR text and images in open formats, particularly [[DjVu]], which Google Books does not offer.}}

Brewster Kahle revealed in 2013 that this archival effort was coordinated by [[Aaron Swartz]], who, with a "bunch of friends", downloaded the public domain books from Google slowly enough and from enough computers to stay within Google's restrictions. They did this to ensure public access to the public domain. The Archive ensured the items were attributed and linked back to Google, which never complained, while libraries "grumbled". According to Kahle, this is an example of Swartz's "genius" for working on what could do the most public good for millions of people.<ref name=kahle-aswmem>Brewster Kahle, "[https://archive.org/details/AaronSwartzMemorialAtTheInternetArchive?start=4680 Aaron Swartz memorial at the Internet Archive] {{webarchive|url=https://web.archive.org/web/20150629062022/https://archive.org/details/AaronSwartzMemorialAtTheInternetArchive?start=4680 |date=June 29, 2015 }}", 2013-01-24, via [https://wellpreparedmind.wordpress.com/2013/02/07/aaron-swartz-freed-over-900000-public-domain-books-from-googles-restrictions/ The well-prepared mind] {{Webarchive|url=https://web.archive.org/web/20140814162152/http://wellpreparedmind.wordpress.com/2013/02/07/aaron-swartz-freed-over-900000-public-domain-books-from-googles-restrictions/ |date=August 14, 2014 }}, via [http://scinfolex.com/2013/02/06/cest-aaron-swartz-qui-liberait-les-livres-de-google-books-sur-internet-archive/ S.I.Lex] {{Webarchive|url=https://web.archive.org/web/20140808094118/http://scinfolex.com/2013/02/06/cest-aaron-swartz-qui-liberait-les-livres-de-google-books-sur-internet-archive/ |date=August 8, 2014 }}.</ref>

{{anchor|RECAP US Federal Court Documents}}In addition to books, the Archive offers free and anonymous public access to more than four million court opinions, legal briefs, and exhibits uploaded from the [[Federal judiciary of the United States|United States Federal Courts]]' [[PACER (law)|PACER]] electronic document system via the [[RECAP]] web browser plugin. These documents had been kept behind a federal court paywall. By 2013, more than six million people had accessed them on the Archive.<ref name=kahle-aswmem/>

The Archive's BookReader [[Web application|web app]],<ref name="BookReader">{{cite web |title=Internet Archive BookReader |url=https://archive.org/details/BookReader |website=archive.org |access-date=June 21, 2019 |archive-url=https://web.archive.org/web/20190621131721/https://archive.org/details/BookReader |archive-date=June 21, 2019 |url-status=live }}</ref> built into its website, has features such as single-page, two-page, and [[thumbnail]] modes; fullscreen mode; [[page zooming]] of [[Image resolution|high-resolution]] images; and [[flip page]] animation.<ref name="BookReader"/><ref>{{cite web |last=Kaplan |first=Jeff |date=December 10, 2010 |title=New BookReader! |url=https://blog.archive.org/2010/12/10/2685/ |website=blog.archive.org |access-date=June 21, 2019 |archive-url=https://web.archive.org/web/20190621200255/https://blog.archive.org/2010/12/10/2685/ |archive-date=June 21, 2019 |url-status=live }}</ref>

In October 2024, the Internet Archive agreed to accept the paper copies of 400,000 uncatalogued dissertations, dating from 1851 to 2004, that the [[Leiden University Library]] wanted to dispose of. The university had received them from foreign universities as part of a dissertation exchange program that began with its foundation in 1575 and continued for nearly 430 years. The Archive plans to digitize them and make them accessible online. The original full collection included theses by [[Niels Bohr]], [[Marie Curie]], [[Émile Durkheim]], [[Albert Einstein]], [[Otto Hahn]], [[Carl Jung]], [[J. Robert Oppenheimer]], [[Max Planck]], [[Luigi Pirandello]], [[Gustav Stresemann]] and [[Max Weber]].<ref>{{citation |last=Funnekotter |first=Bart |title=Leidse proefschriften worden tóch niet vernietigd, 400.000 dissertaties gaan naar de VS |trans-title=Leiden dissertations not destroyed after all, 400,000 dissertations go to the US |newspaper=[[NRC (newspaper)|NRC]] |date=October 9, 2024 |url=https://www.nrc.nl/nieuws/2024/10/09/leidse-proefschriften-worden-toch-niet-vernietigd-400000-dissertaties-gaan-naar-de-vs-a4868698 |archive-url=https://archive.today/20241009111945/https://www.nrc.nl/nieuws/2024/10/09/leidse-proefschriften-worden-toch-niet-vernietigd-400000-dissertaties-gaan-naar-de-vs-a4868698 |archive-date=October 9, 2024}}</ref>