Internet Archive
==Book collections== ===Text collection=== [[File:Scribe Machine Acquisition 3.jpg|thumb|right|Internet Archive "Scribe" [[book scanning]] workstation]] [[File:A Real Page-Turner.jpg|thumb|An Internet Archive in-house scan ongoing]] The scanning performed by the Internet Archive is financially supported by libraries and foundations.<ref>{{Cite web |last=Kahle |first=Brewster |date=May 23, 2008 |title=Books Scanning to be Publicly Funded |url=https://archive.org/iathreads/post-view.php?id=194217 |url-status=live |archive-url=https://web.archive.org/web/20090924105740/http://www.archive.org/iathreads/post-view.php?id=194217 |archive-date=September 24, 2009 |website=Internet Archive Forums}}</ref> {{As of|2008|11}}, when there were approximately 1 million texts, the entire collection was greater than 500 terabytes, which included raw camera images, cropped and skewed images, [[PDF]]s, and raw [[Optical character recognition|OCR]] data.<ref>{{Cite web |date=November 24, 2008 |title=Bulk Access to OCR for 1 Million Books |url=https://blog.openlibrary.org/2008/11/24/bulk-access-to-ocr-for-1-million-books/ |url-status=live |archive-url=https://web.archive.org/web/20081206124013/http://blog.openlibrary.org/2008/11/24/bulk-access-to-ocr-for-1-million-books/ |archive-date=December 6, 2008 |website=Open Library Blog}}</ref> {{As of|2013|July}}, the Internet Archive was operating 33 [[book scanning|scanning centers]] in five countries, digitizing about 1,000 books a day for a total of more than 2 million books, in a total collection of 4.4 million books{{snd}}including material digitized by others and fed into the Internet Archive; at that time, users were performing more than 15 million downloads per month.<ref name=Hoffelder2013/> The material digitized by others includes more than 300,000 books that were contributed to the collection, between about 2006 and 2008, by [[Microsoft]] through its [[Live Search Books]] project, which also included financial support and scanning 
equipment directly donated to the Internet Archive.<ref name=msdown>{{cite web|url=http://blogs.msdn.com/livesearch/archive/2008/05/23/book-search-winding-down.aspx |title=Book search winding down |date=May 23, 2008 |work= MSDN Live Search Blog<!-- Official announcement from Microsoft-->|archive-url=https://web.archive.org/web/20080820220749/http://blogs.msdn.com/livesearch/archive/2008/05/23/book-search-winding-down.aspx |archive-date=August 20, 2008}}</ref> On May 23, 2008, Microsoft announced it would be ending its Live Search Books project and would no longer be scanning books, donating its remaining scanning equipment to its former partners.<ref name=msdown/> Around October 2007, Archive users began uploading [[public domain]] books from [[Google Books|Google Book Search]].<ref>{{Cite web |title=Google Books at Internet Archive |url=https://archive.org/details/googlebooks |url-status=live |archive-url=https://web.archive.org/web/20081206201549/https://archive.org/details/googlebooks |archive-date=December 6, 2008 |access-date=November 9, 2008 |publisher=Internet Archive}}</ref> {{As of|2013|November}}, there were more than 900,000 Google-digitized books in the Archive's collection;<ref>{{Cite web |url=https://archive.org/search.php?query=sponsor%3A%28Google%29 |title=List of Google scans |archive-url=https://web.archive.org/web/20140126055407/https://archive.org/search.php?query=sponsor%3A%28Google%29 |archive-date=January 26, 2014 |publisher=Internet Archive}}</ref> the books are identical to the copies found on Google, except without the Google watermarks, and are available for unrestricted use and download.{{efn|Books imported from Google have a metadata tag of scanner:google for searching purposes. The Archive provides a link to Google for PDF copies, but also maintains a local PDF copy, which is viewable under the "All Files: HTTPS" link. 
Like all the other books in the collection, they also provide OCR text and images in open formats, particularly [[DjVu]], which Google Books does not offer.}} Brewster Kahle revealed in 2013 that this archival effort was coordinated by [[Aaron Swartz]], who, with a "bunch of friends", downloaded the public domain books from Google slowly enough and from enough computers to stay within Google's restrictions. They did this to ensure public access to the [[public domain]]. The Archive ensured the items were attributed and linked back to Google, which never complained, while libraries "grumbled". According to Kahle, this is an example of Swartz's "genius" for working on what could give the most to the public good for millions of people.<ref name=kahle-aswmem>Brewster Kahle, "[https://archive.org/details/AaronSwartzMemorialAtTheInternetArchive?start=4680 Aaron Swartz memorial at the Internet Archive] {{webarchive|url=https://web.archive.org/web/20150629062022/https://archive.org/details/AaronSwartzMemorialAtTheInternetArchive?start=4680 |date=June 29, 2015 }}", 2013-01-24, via [https://wellpreparedmind.wordpress.com/2013/02/07/aaron-swartz-freed-over-900000-public-domain-books-from-googles-restrictions/ The well-prepared mind] {{Webarchive|url=https://web.archive.org/web/20140814162152/http://wellpreparedmind.wordpress.com/2013/02/07/aaron-swartz-freed-over-900000-public-domain-books-from-googles-restrictions/ |date=August 14, 2014 }}, via [http://scinfolex.com/2013/02/06/cest-aaron-swartz-qui-liberait-les-livres-de-google-books-sur-internet-archive/ S.I.Lex] {{Webarchive|url=https://web.archive.org/web/20140808094118/http://scinfolex.com/2013/02/06/cest-aaron-swartz-qui-liberait-les-livres-de-google-books-sur-internet-archive/ |date=August 8, 2014 }}.</ref> {{anchor|RECAP US Federal Court Documents}}In addition to books, the Archive offers free and anonymous public access to more than four million court opinions, legal briefs, and exhibits uploaded from the [[Federal 
judiciary of the United States|United States Federal Courts]]' [[PACER (law)|PACER]] electronic document system via the [[RECAP]] web browser plugin. These documents had been kept behind a federal court paywall. By 2013, more than six million people had accessed them on the Archive.<ref name=kahle-aswmem/> The Archive's BookReader [[Web application|web app]],<ref name="BookReader">{{cite web |title=Internet Archive BookReader |url=https://archive.org/details/BookReader |website=archive.org |access-date=June 21, 2019 |archive-url=https://web.archive.org/web/20190621131721/https://archive.org/details/BookReader |archive-date=June 21, 2019 |url-status=live }}</ref> built into its website, has features such as single-page, two-page, and [[thumbnail]] modes; fullscreen mode; [[page zooming]] of [[Image resolution|high-resolution]] images; and [[flip page]] animation.<ref name="BookReader"/><ref>{{cite web |last=Kaplan |first=Jeff |date=December 10, 2010 |title=New BookReader! |url=https://blog.archive.org/2010/12/10/2685/ |website=blog.archive.org |access-date=June 21, 2019 |archive-url=https://web.archive.org/web/20190621200255/https://blog.archive.org/2010/12/10/2685/ |archive-date=June 21, 2019 |url-status=live }}</ref> In October 2024, the Internet Archive agreed to accept the paper copies of 400,000 uncatalogued dissertations, dating from 1851 to 2004, that the [[Leiden University Library]] wanted to dispose of. The university had received them from foreign universities as part of a dissertation exchange program that had begun with its foundation in 1575, continuing for nearly 430 years. The Archive plans to digitise them and make them accessible online. The original full collection included theses by [[Niels Bohr]], [[Marie Curie]], [[Émile Durkheim]], [[Albert Einstein]], [[Otto Hahn]], [[Carl Jung]], [[J. 
Robert Oppenheimer]], [[Max Planck]], [[Luigi Pirandello]], [[Gustav Stresemann]] and [[Max Weber]].<ref>{{citation |last=Funnekotter |first=Bart |title=Leidse proefschriften worden tóch niet vernietigd, 400.000 dissertaties gaan naar de VS |trans-title=Leiden dissertations not destroyed after all, 400,000 dissertations go to the US |newspaper=[[NRC (newspaper)|NRC]] |date=October 9, 2024 |url=https://www.nrc.nl/nieuws/2024/10/09/leidse-proefschriften-worden-toch-niet-vernietigd-400000-dissertaties-gaan-naar-de-vs-a4868698 |archive-url=https://archive.today/20241009111945/https://www.nrc.nl/nieuws/2024/10/09/leidse-proefschriften-worden-toch-niet-vernietigd-400000-dissertaties-gaan-naar-de-vs-a4868698 |archive-date=October 9, 2024}}</ref> ===Open Library=== {{main|Open Library}} The Open Library is another project of the Internet Archive. The project seeks to include a web page for every book ever published: it holds 25 million catalog records of editions. It also seeks to be a web-accessible public library: it contains the full texts of approximately 1,600,000 public domain books (out of the more than five million from the main [[#Text collection|texts collection]]), as well as in-print and in-copyright books,<ref>{{cite web |title=FAQ on Controlled Digital Lending (CDL) |date=February 13, 2019 |url=https://nwu.org/book-division/cdl/faq/ |publisher=National Writers Union |access-date=February 15, 2019 |archive-date=March 30, 2020 |archive-url=https://wayback.archive-it.org/all/20200330193826/https://nwu.org/book%2Ddivision/cdl/faq/ |url-status=live }}</ref> many of which are fully readable, downloadable<ref>{{cite magazine|first=Antone |last=Gonsalves |title=Internet Archive Claims Progress Against Google Library Initiative |url=http://www.informationweek.com/story/showArticle.jhtml?articleID=196701339 |magazine=InformationWeek |date=December 20, 2006 |url-status=live 
|archive-url=https://web.archive.org/web/20071014174528/http://informationweek.com/story/showArticle.jhtml?articleID=196701339 |archive-date=October 14, 2007 }}</ref><ref>{{cite news|title=The Open Library Makes Its Online Debut |url=http://chronicle.com/wiredcampus/index.php?id=2235?=atwc |publisher=Chronicle of Higher Education |work=The Wired Campus |date=July 19, 2007 |archive-url=https://web.archive.org/web/20070930184259/http://chronicle.com/wiredcampus/index.php?id=2235%3F%3Datwc |archive-date=September 30, 2007 |url-status=dead }}</ref> and [[full-text search]]able;<ref>{{Cite web |title=Search Inside |url=https://openlibrary.org/search/inside |url-status=live |archive-url=https://web.archive.org/web/20131020130821/http://openlibrary.org/search/inside |archive-date=October 20, 2013 |website=OpenLibrary.org}}</ref> it offers a two-week loan of [[ebook|e-book]]s in its [[controlled digital lending]] program for over 647,784 books not in the public domain, in partnership with over 1,000 library partners from six countries<ref name="Hoffelder2013">{{Cite web |last=Hoffelder |first=Nate |date=July 9, 2013 |title=Internet Archive Now Hosts 4.4 Million eBooks, Sees 15 Million eBooks Downloaded Each Month |url=https://www.the-digital-reader.com/2013/07/09/internet-archive-now-hosts-4-4-million-ebooks-sees-15-million-ebooks-downloaded-each-month/ |url-status=live |archive-url=https://web.archive.org/web/20230527192428/https://the-digital-reader.com/internet-archive-now-hosts-4-4-million-ebooks-sees-15-million-ebooks-downloaded-each-month/ |archive-date=2023-05-27 |publisher=The Digital Reader}}</ref><ref>{{Cite web |date=June 25, 2011 |title=In-Library eBook Lending Program Expands to 1,000 Libraries |url=https://blog.archive.org/2011/06/25/in-library-ebook-lending-program-expands-to-1000-libraries/ |url-status=live |archive-url=https://web.archive.org/web/20140813035522/https://blog.archive.org/2011/06/25/in-library-ebook-lending-program-expands-to-1000-libraries/ 
|archive-date=August 13, 2014 |website=Internet Archive Blogs |publisher=Internet Archive}}</ref> after a free registration on the web site. Open Library is a [[free and open-source software]] project, with its source code freely available on [[GitHub]]. The Open Library faces objections from some authors and the [[Society of Authors]], who hold that the project is distributing books without authorization and is thus in violation of copyright laws,<ref>{{cite news|work=The Guardian|url=https://www.theguardian.com/books/2019/jan/22/internet-archives-ebook-loans-face-uk-copyright-challenge|title=Internet Archive's ebook loans face UK copyright challenge|first=Alison|last=Flood|date=22 Jan 2019|access-date=March 28, 2020|archive-date=February 12, 2019|archive-url=https://web.archive.org/web/20190212070623/https://www.theguardian.com/books/2019/jan/22/internet-archives-ebook-loans-face-uk-copyright-challenge|url-status=live}}</ref> and four major publishers initiated a copyright infringement lawsuit against the Internet Archive in June 2020 to stop the Open Library project.<ref name="open library lawsuit">{{cite web | url = https://www.theverge.com/2020/6/1/21277036/internet-archive-publishers-lawsuit-open-library-ebook-lending | title = Publishers sue Internet Archive over Open Library ebook lending | first = Russell | last = Brandom | date = June 1, 2020 | access-date = June 1, 2020 | work = [[The Verge]] | archive-date = June 1, 2020 | archive-url = https://web.archive.org/web/20200601185706/https://www.theverge.com/2020/6/1/21277036/internet-archive-publishers-lawsuit-open-library-ebook-lending | url-status = live }}</ref> ===Digitizing sponsors for books=== Many large institutional sponsors have helped the Internet Archive provide millions of scanned publications (text items).<ref>For example, the [[Princeton Theological Seminary Library]] has described how it and other academic libraries are digitization partners with the Internet Archive: {{cite web 
|url=https://library.ptsem.edu/partnering-with-the-internet-archive |title=Partnering with the Internet Archive |website=[[Princeton Theological Seminary Library]] |access-date=December 4, 2020 |archive-date=November 30, 2020 |archive-url=https://web.archive.org/web/20201130110259/https://library.ptsem.edu/partnering-with-the-internet-archive |url-status=live }}</ref> Some sponsors that have digitized large quantities of texts include the University of Toronto's [[Robarts Library]], [[University of Alberta Libraries]], [[University of Ottawa]], [[Library of Congress]], [[Boston Library Consortium]] member libraries, [[Boston Public Library]], [[Princeton Theological Seminary Library]], and many others.<ref>{{cite web |url=https://archive.org/search.php?query=collection%3A%28texts%29+AND+mediatype%3A%28collection%29&sort=-downloads |title=Internet Archive Search: collection:(texts) |website=archive.org |access-date=December 4, 2020}}</ref> In 2017, the [[MIT Press]] authorized the Internet Archive to digitize and lend books from the press's [[backlist]],<ref>{{cite web |url=https://archive.org/details/mitpress |title=The MIT Press |website=archive.org |access-date=2020-06-27}}</ref> with financial support from the [[Arcadia Fund]].<ref>{{cite web |last=Hanamura |first=Wendy |date=May 30, 2017 |title=MIT Press Classics Available Soon at Archive.org |url=https://blog.archive.org/2017/05/30/mit-press-classics-available-soon-at-archive-org/ |website=blog.archive.org |access-date=2020-06-27 |quote=For more than eighty years, MIT Press has been publishing acclaimed titles in science, technology, art and architecture. 
Now, thanks to a new partnership between the Internet Archive and MIT Press, readers will be able to borrow these classics online for the first time.}}</ref><ref>{{cite web |last=Green |first=Alex |date=December 1, 2019 |title=New Takes on Academic Publishing: Three university presses find new ways to keep up with a changing market |url=https://www.publishersweekly.com/pw/by-topic/industry-news/publisher-news/article/81872-new-takes-on-academic-publishing.html |website=[[Publishers Weekly]] |access-date=2020-06-27 |quote=Since she became director [of the MIT Press] in 2015, there's little that Brand hasn't reenvisioned at the press. In 2017, the press partnered with the Internet Archive to make its deep backlist available free at libraries, resurrecting books that had not seen the light of day in generations. |archive-date=June 27, 2020 |archive-url=https://web.archive.org/web/20200627161843/https://www.publishersweekly.com/pw/by-topic/industry-news/publisher-news/article/81872-new-takes-on-academic-publishing.html |url-status=live }}</ref> A year later, the Internet Archive received further funding from the Arcadia Fund to invite some other university presses to partner with the Internet Archive to digitize books, a project called "Unlocking University Press Books".<ref>{{cite web |last=Freeland |first=Chris |date=May 21, 2018 |title=Internet Archive awarded grant from Arcadia Fund to digitize university press collections |url=https://blog.archive.org/2018/05/21/internet-archive-awarded-grant-from-arcadia-fund-to-digitize-university-press-collections/ |website=blog.archive.org |access-date=2020-06-27 |quote=Internet Archive has received a $1 million dollar grant from Arcadia – a charitable fund of Lisbet Rausing and Peter Baldwin – to digitize titles from university press collections to make them available via controlled digital lending.}}</ref><ref>{{cite web |last=Albanese |first=Andrew |date=May 25, 2018 |title=Internet Archive Lands Grant to Digitize and Lend 
University Press Collections |url=https://www.publishersweekly.com/pw/by-topic/industry-news/libraries/article/76974-is-it-time-to-rethink-how-we-do-library-advocacy.html |website=[[Publishers Weekly]] |access-date=2020-06-27 |archive-date=June 27, 2020 |archive-url=https://web.archive.org/web/20200627172007/https://www.publishersweekly.com/pw/by-topic/industry-news/libraries/article/76974-is-it-time-to-rethink-how-we-do-library-advocacy.html |url-status=live }}</ref> The [[Library of Congress]] created numerous [[Handle System]] identifiers that pointed to free digitized books in the Internet Archive.<ref>For example: {{cite web |title=hdl.loc.gov/loc.gdc/scd0001.00198115083 |url=https://hdl.loc.gov/loc.gdc/scd0001.00198115083 |access-date=November 25, 2020 |mode=cs2 |archive-date=July 4, 2021 |archive-url=https://web.archive.org/web/20210704233132/https://hdl.loc.gov/loc.gdc/scd0001.00198115083 |url-status=dead }}; {{cite web |title=hdl.loc.gov/loc.gdc/scd0001.00060921933 |url=https://hdl.loc.gov/loc.gdc/scd0001.00060921933 |access-date=November 25, 2020 |mode=cs2 |archive-date=July 4, 2021 |archive-url=https://web.archive.org/web/20210704233036/https://hdl.loc.gov/loc.gdc/scd0001.00060921933 |url-status=dead }}; {{cite web |title=hdl.loc.gov/loc.gdc/scd0001.00060927248 |url=https://hdl.loc.gov/loc.gdc/scd0001.00060927248 |access-date=November 25, 2020 |mode=cs2 |archive-date=July 4, 2021 |archive-url=https://web.archive.org/web/20210704233041/https://hdl.loc.gov/loc.gdc/scd0001.00060927248 |url-status=dead }}; {{cite web |title=hdl.loc.gov/loc.gdc/scd0001.00001740908 |url=https://hdl.loc.gov/loc.gdc/scd0001.00001740908 |access-date=November 25, 2020 |mode=cs2 |archive-date=July 4, 2021 |archive-url=https://web.archive.org/web/20210704233037/https://hdl.loc.gov/loc.gdc/scd0001.00001740908 |url-status=dead }}; {{cite web |title=hdl.loc.gov/loc.gdc/scd0001.00027740005 |url=https://hdl.loc.gov/loc.gdc/scd0001.00027740005 |access-date=November 25, 2020 |mode=cs2 
|archive-date=July 4, 2021 |archive-url=https://web.archive.org/web/20210704233038/https://hdl.loc.gov/loc.gdc/scd0001.00027740005 |url-status=dead }}.</ref> The Internet Archive and Open Library are listed on the Library of Congress website as a source of e-books.<ref>{{cite web |title=External Web Sites – Finding E-books: A Guide – Library of Congress Bibliographies, Research Guides, and Finding Aids (Virtual Programs & Services) |url=https://www.loc.gov/rr/program/bib/ebooks/external.html |website=Library of Congress |access-date=November 25, 2020 |quote=The Internet Archive includes the full text of more than 2.5 million e-books, including e-books supplied by the Library of Congress. Books can be read online or downloaded and read in a variety of formats. E-books from the Internet Archive can also be found through Open Library, an Internet Archive initiative devoted to texts. |date=2017 |orig-date=April 2011 |first1=J. Cheyenne |last1=Hohman |first2=Yasmeen |last2=Mughal |archive-date=November 25, 2020 |archive-url=https://web.archive.org/web/20201125170006/https://www.loc.gov/rr/program/bib/ebooks/external.html |url-status=deviated }} And: {{cite web |title=Devices and Formats – Finding E-books: A Guide – Library of Congress Bibliographies, Research Guides, and Finding Aids (Virtual Programs & Services) |url=https://www.loc.gov/rr/program/bib/ebooks/devicesformats.html |website=Library of Congress |access-date=November 25, 2020 |quote=Library of Congress publications are available for free download to the Kindle from the Internet Archive. ... The iPad can be used as an e-reader via apps such as iBooks, which support both ePub (.epub) and PDF (.pdf) formats. Both formats are available from the Internet Archive. |date=2017 |orig-date=April 2011 |first1=J. 
Cheyenne |last1=Hohman |first2=Yasmeen |last2=Mughal |archive-date=February 12, 2021 |archive-url=https://web.archive.org/web/20210212005106/https://www.loc.gov/rr/program/bib/ebooks/devicesformats.html |url-status=deviated }}</ref>