Harvard Releases Institutional Books 1.0, a Dataset for AI Researchers with 242B Tokens, From 394M Scanned Pages and 983K Public Domain Books in 254 Languages
Summary by Techmeme
2 Articles
Institutional Books 1.0: A 242B Token Dataset from Harvard Library’s Collections, Refined for Accuracy and Usability (Harvard Library) | ResearchBuzz: Firehose
Harvard Library: Institutional Books 1.0: A 242B Token Dataset from Harvard Library’s Collections, Refined for Accuracy and Usability. “The rapid development and adoption of LLMs of varying quality has brought into focus the scarcity of publicly available, high-quality training data and revealed an urgent need to ground the stewardship of these datasets in sustainable practices with clear provenance chains. To that end, this technical report int…