7 datasets found

Data from: newspaper-navigator
huggingface.co
Updated May 20, 2025
Cite
BigLAM: BigScience Libraries, Archives and Museums (2025). newspaper-navigator [Dataset]. https://huggingface.co/datasets/biglam/newspaper-navigator
Dataset updated
May 20, 2025
Dataset authored and provided by
BigLAM: BigScience Libraries, Archives and Museums
License
https://choosealicense.com/licenses/cc0-1.0/
Description
Dataset Card for Newspaper Navigator

Dataset Summary

This dataset provides a Parquet-converted version of the Newspaper Navigator dataset from the Library of Congress. Originally released as JSON, Newspaper Navigator contains over 16 million pages of historic US newspapers annotated with bounding boxes, predicted visual types (e.g., photographs, maps), and OCR content. This work was carried out as part of a project by Benjamin Germain Lee et al. This version of the… See the full description on the dataset page: https://huggingface.co/datasets/biglam/newspaper-navigator.
Images from Newspaper Navigator predicted as maps, with human corrected...
zenodo.org
csv, json, txt, zip
Updated Mar 15, 2021
Cite
Daniel van Strien (2021). Images from Newspaper Navigator predicted as maps, with human corrected labels [Dataset]. http://doi.org/10.5281/zenodo.4156510
Available download formats: txt, json, zip, csv
Unique identifier
https://doi.org/10.5281/zenodo.4156510
Dataset updated
Mar 15, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Daniel van Strien
Description
The Dataset contains images derived from the Newspaper Navigator (news-navigator.labs.loc.gov/), a dataset of images drawn from the Library of Congress Chronicling America collection (chroniclingamerica.loc.gov/).

[The Newspaper Navigator dataset] consists of extracted visual content for 16,358,041 historic newspaper pages in Chronicling America. The visual content was identified using an object detection model trained on annotations of World War 1-era Chronicling America pages, including annotations made by volunteers as part of the Beyond Words crowdsourcing project.

source: https://news-navigator.labs.loc.gov/

One of the visual content categories in Newspaper Navigator is 'maps'. In the original training data for Newspaper Navigator, there were relatively few labelled examples of maps. Predictions for maps have an Average Precision of 69.5%, and there were 34 map images in the validation data.

This dataset contains a sample of the images predicted as 'maps', along with additional labels indicating whether each predicted map image is a 'map' or 'not a map'.

The data is organised as follows:

The images themselves can be found in `newspaper_maps.zip`.

`2020_30_10_13_19_228_sample.json` contains metadata about each image drawn from the Newspaper Navigator Dataset.

`map_labels.csv` contains the labels for the images as a CSV file.
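Because every image in this sample was predicted as a map, the human labels give a direct estimate of how often those predictions are correct. The sketch below computes that share from a labels CSV. The column names `filename` and `label`, and the exact label strings, are assumptions; check the header of `map_labels.csv` before relying on them.

```python
import csv
import io
from collections import Counter


def map_precision(csv_text: str, label_column: str = "label") -> float:
    """Share of predicted-map images that a human labelled 'map'.

    Assumes (hypothetically) one row per image and a label column whose
    values are 'map' or 'not a map'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[label_column].strip().lower() for row in reader)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labelled rows found")
    # Every image in the sample was predicted as a map, so precision is
    # simply the fraction that humans confirmed as maps.
    return counts["map"] / total


# Toy rows in the assumed layout; not real values from the dataset.
example = """filename,label
img_001.jpg,map
img_002.jpg,not a map
img_003.jpg,map
img_004.jpg,map
"""
print(map_precision(example))  # 0.75
```

This is precision on the sampled predictions at whatever threshold produced them, not the Average Precision quoted above, which summarises precision across recall levels.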
19th Century United States Newspaper Advert images with 'illustrated' or...
zenodo.org
data.niaid.nih.gov
csv, zip
Updated Jan 12, 2022
Cite
Daniel van Strien (2022). 19th Century United States Newspaper Advert images with 'illustrated' or 'non illustrated' labels [Dataset]. http://doi.org/10.5281/zenodo.5838410
Available download formats: csv, zip
Unique identifier
https://doi.org/10.5281/zenodo.5838410
Dataset updated
Jan 12, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Daniel van Strien
Area covered
United States
Description
The Dataset contains images derived from the Newspaper Navigator (news-navigator.labs.loc.gov/), a dataset of images drawn from the Library of Congress Chronicling America collection (chroniclingamerica.loc.gov/).

[The Newspaper Navigator dataset] consists of extracted visual content for 16,358,041 historic newspaper pages in Chronicling America. The visual content was identified using an object detection model trained on annotations of World War 1-era Chronicling America pages, including annotations made by volunteers as part of the Beyond Words crowdsourcing project.

source: https://news-navigator.labs.loc.gov/

One of the visual content categories in Newspaper Navigator is 'advertisements'. This dataset contains a sample of these images with additional labels indicating whether the advert is 'illustrated' or 'not illustrated'.

The data is organised as follows:

The images themselves can be found in `images.zip`

`newspaper-navigator-sample-metadata.csv` contains metadata about each image drawn from the Newspaper Navigator Dataset.

`ads.csv` contains the labels for the images as a CSV file

`sample.csv` contains additional metadata about the images (based on the newspapers those images came from).
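Since the labels are meant for training an image classifier, a useful first check is class balance: a classifier must beat the accuracy of always predicting the most common label. The sketch below computes both from a labels CSV. The column name `label` is a hypothetical placeholder for whatever `ads.csv` actually uses, and the toy rows are invented.

```python
import csv
import io
from collections import Counter


def label_balance(csv_text: str, label_column: str = "label"):
    """Return label counts, the majority label, and majority-class accuracy.

    The column name is a hypothetical placeholder for whatever ads.csv uses.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no labelled rows found")
    counts = Counter(row[label_column] for row in rows)
    majority_label, majority_count = counts.most_common(1)[0]
    # Accuracy of a "classifier" that always predicts the majority label.
    baseline = majority_count / len(rows)
    return counts, majority_label, baseline


example = """file,label
ad_1.jpg,illustrated
ad_2.jpg,not illustrated
ad_3.jpg,illustrated
ad_4.jpg,illustrated
"""
counts, majority, baseline = label_balance(example)
print(majority, baseline)  # illustrated 0.75
```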

This dataset was created for use in an under-review Programming Historian tutorial (http://programminghistorian.github.io/ph-submissions/lessons/computer-vision-deep-learning-pt1). The primary aim of the data was to provide a realistic example dataset for teaching computer vision methods for working with digitised heritage material. The data is shared here since it may be useful for others. This data documentation is a work in progress and will be updated when the Programming Historian tutorial is released publicly.

The metadata CSV file contains the following columns:

- filepath
- pub_date
- page_seq_num
- edition_seq_num
- batch
- lccn
- box
- score
- ocr
- place_of_publication
- geographic_coverage
- name
- publisher
- url
- page_url
- month
- year
- iiif_url
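The `box` and `score` columns can be combined with an IIIF image service to fetch just the detected region of a page. The sketch below assumes, without confirmation from this listing, that `box` is stored as a JSON-style list of normalized `x1, y1, x2, y2` fractions of the page, that `score` is the model's confidence, and that you have an IIIF Image API base URL for the page image (the `example.org` URL is hypothetical). It uses the IIIF `pct:x,y,w,h` region form.

```python
import json


def parse_box(box_text: str):
    """Parse a box string such as '[0.1, 0.2, 0.5, 0.6]'.

    Assumes the values are normalized (x1, y1, x2, y2) fractions of the page.
    """
    x1, y1, x2, y2 = (float(v) for v in json.loads(box_text))
    if not (0.0 <= x1 < x2 <= 1.0 and 0.0 <= y1 < y2 <= 1.0):
        raise ValueError(f"box is not a normalized (x1, y1, x2, y2) box: {box_text}")
    return x1, y1, x2, y2


def iiif_crop_url(image_base: str, box_text: str, size: str = "full") -> str:
    """Build an IIIF Image API URL that crops one region from a page image."""
    x1, y1, x2, y2 = parse_box(box_text)
    # The IIIF region form "pct:x,y,w,h" gives the crop as percentages.
    region = "pct:{:g},{:g},{:g},{:g}".format(
        x1 * 100, y1 * 100, (x2 - x1) * 100, (y2 - y1) * 100
    )
    return f"{image_base.rstrip('/')}/{region}/{size}/0/default.jpg"


def confident(rows, threshold: float = 0.9):
    """Keep metadata rows whose (assumed) confidence score meets the threshold."""
    return [row for row in rows if float(row["score"]) >= threshold]


rows = [
    {"box": "[0.1, 0.2, 0.5, 0.6]", "score": "0.97"},
    {"box": "[0.0, 0.0, 0.3, 0.3]", "score": "0.55"},
]
base = "https://example.org/iiif/page-id"  # hypothetical image-service base URL
for row in confident(rows):
    print(iiif_crop_url(base, row["box"]))
# https://example.org/iiif/page-id/pct:10,20,40,40/full/0/default.jpg
```

If the `iiif_url` column already points at a cropped image, none of this is needed; the sketch only applies when you start from a page-level image service.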
19th Century United States Newspaper images predicted as Photographs with...
explore.openaire.eu
data.niaid.nih.gov
Updated Jan 11, 2022
Cite
Daniel van Strien (2022). 19th Century United States Newspaper images predicted as Photographs with labels for "human", "animal", "human-structure" and "landscape" [Dataset]. http://doi.org/10.5281/zenodo.4487141
Unique identifier
https://doi.org/10.5281/zenodo.4487141
Dataset updated
Jan 11, 2022
Authors
Daniel van Strien
Area covered
United States
Description
The Dataset contains images derived from the Newspaper Navigator (news-navigator.labs.loc.gov/), a dataset of images drawn from the Library of Congress Chronicling America collection (chroniclingamerica.loc.gov/).

[The Newspaper Navigator dataset] consists of extracted visual content for 16,358,041 historic newspaper pages in Chronicling America. The visual content was identified using an object detection model trained on annotations of World War 1-era Chronicling America pages, including annotations made by volunteers as part of the Beyond Words crowdsourcing project.

source: https://news-navigator.labs.loc.gov/

One of the visual content categories in Newspaper Navigator is 'photographs'. This dataset contains a sample of these images with additional labels indicating whether the photograph has one or more of the following labels: "human", "animal", "human-structure" and "landscape".

The data is organised as follows:

The images themselves can be found in `images.zip`.

`newspaper-navigator-sample-metadata.csv` contains metadata about each image drawn from the Newspaper Navigator Dataset.

`multi_label.csv` contains the labels for the images as a CSV file.

`annotations.csv` contains the labels for the images with additional metadata.

This dataset was created for use in an under-review Programming Historian tutorial (http://programminghistorian.github.io/ph-submissions/lessons/computer-vision-deep-learning-pt2). The primary aim of the data was to provide a realistic example dataset for teaching computer vision methods for working with digitised heritage material. The data is shared here since it may be useful for others. This data documentation is a work in progress and will be updated when the Programming Historian tutorial is released publicly.

The metadata CSV file contains the following columns:

- filepath
- pub_date
- page_seq_num
- edition_seq_num
- batch
- lccn
- box
- score
- ocr
- place_of_publication
- geographic_coverage
- name
- publisher
- url
- page_url
- month
- year
- iiif_url
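Because a photograph can carry several of the four labels at once, a multi-label classifier usually expects a fixed-length multi-hot vector per image rather than a single class. The sketch below builds one. The `|` separator is an assumption about how `multi_label.csv` stores several labels in one cell; adjust it to match the actual file.

```python
LABELS = ["human", "animal", "human-structure", "landscape"]


def encode_labels(label_text: str, sep: str = "|") -> list[int]:
    """Turn a delimited label string into a multi-hot vector over LABELS.

    The '|' separator is an assumption about how multi_label.csv stores
    several labels in one cell; adjust it to match the actual file.
    """
    present = {part.strip() for part in label_text.split(sep) if part.strip()}
    unknown = present - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return [1 if label in present else 0 for label in LABELS]


print(encode_labels("human|human-structure"))  # [1, 0, 1, 0]
print(encode_labels(""))  # [0, 0, 0, 0] (no label applies)
```

Rejecting unknown labels, rather than silently dropping them, surfaces spelling variants early instead of producing vectors that quietly miss a class.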
newspaper_navigator
huggingface.co
Updated Oct 14, 2022
Cite
Daniel van Strien (2022). newspaper_navigator [Dataset]. https://huggingface.co/datasets/davanstrien/newspaper_navigator
Available in Croissant format. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
Dataset updated
Oct 14, 2022
Authors
Daniel van Strien
License
https://choosealicense.com/licenses/undefined/
Description
davanstrien/newspaper_navigator dataset hosted on Hugging Face and contributed by the HF Datasets community
Data from: Newspaper Navigator
opendatalab.com
zip
Updated Mar 24, 2023
Cite
Library of Congress (2023). Newspaper Navigator [Dataset]. https://opendatalab.com/OpenDataLab/Newspaper_Navigator
Available download formats: zip
Dataset updated
Mar 24, 2023
Dataset provided by
LC Labs
University of Washington
Library of Congress
License
https://github.com/LibraryOfCongress/newspaper-navigator/blob/master/LICENSE
Description
The goal of Newspaper Navigator is to re-imagine searching over the visual content in Chronicling America. The project consists of two stages:

1. Creating the Newspaper Navigator dataset by extracting headlines, photographs, illustrations, maps, comics, cartoons, and advertisements from 16.3 million historic newspaper pages in Chronicling America using emerging machine learning techniques. In addition to the visual content, the dataset includes captions and other relevant text derived from the METS/ALTO OCR, as well as image embeddings for fast similarity querying.
2. Creating an exploratory search application for the Newspaper Navigator dataset in order to enable new ways for the American public to navigate Chronicling America.
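Image embeddings make "find pictures like this one" a nearest-neighbour search over vectors. The sketch below shows the idea with brute-force cosine similarity over toy 2-dimensional vectors. It is not Newspaper Navigator's implementation: a system built for fast querying over millions of images would likely precompute normalized vectors or use an index, and the item names here are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, embeddings, k=2):
    """Brute-force k nearest neighbours by cosine similarity."""
    scored = sorted(
        ((cosine(query, vec), key) for key, vec in embeddings.items()),
        reverse=True,
    )
    return [key for _, key in scored[:k]]


# Toy 2-d "embeddings"; real image embeddings have many more dimensions.
toy = {"photo_a": [1.0, 0.0], "photo_b": [0.9, 0.1], "map_c": [0.0, 1.0]}
print(most_similar([1.0, 0.0], toy))  # ['photo_a', 'photo_b']
```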
loc_beyond_words
huggingface.co
Updated Mar 2, 2023
Cite
BigLAM: BigScience Libraries, Archives and Museums (2023). loc_beyond_words [Dataset]. https://huggingface.co/datasets/biglam/loc_beyond_words
Dataset updated
Mar 2, 2023
Dataset authored and provided by
BigLAM: BigScience Libraries, Archives and Museums
License
https://choosealicense.com/licenses/cc0-1.0/
Description
Dataset Card for Beyond Words

Dataset Summary

The Beyond Words dataset is a crowdsourced collection of bounding box annotations on World War I-era historical newspaper pages from the Library of Congress’s Chronicling America collection. Volunteers marked seven types of visual content — photographs, illustrations, maps, comics, editorial cartoons, headlines, and advertisements — enabling the training of the visual content recognition model behind the Newspaper Navigator… See the full description on the dataset page: https://huggingface.co/datasets/biglam/loc_beyond_words.
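Volunteer-drawn boxes like these are commonly compared with a detector's predicted boxes using intersection over union (IoU); detection metrics such as Average Precision are typically computed by matching boxes at an IoU threshold. The sketch below computes IoU for two axis-aligned boxes given as `(x1, y1, x2, y2)`; that coordinate convention is an assumption, not the Beyond Words storage format.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# A volunteer box and a predicted box that overlap in a 1x1 square.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```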