Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability Paper • 2506.08300 • Published Jun 10, 2025 • 9
institutional/institutional-books-topic-classifier-bert Text Classification • 0.2B • Updated Jun 12, 2025 • 37 • 13
Institutional Books Collection A growing corpus of public domain books from library collections, seeded by Harvard Library. • 3 items • Updated Jun 11, 2025 • 7
Towards Best Practices for Open Datasets for LLM Training Paper • 2501.08365 • Published Jan 14, 2025 • 62