Archaeological Sites in the Amazon Biome

    Updated Jun 22, 2025
    Cite
    JP (2025). Archaeological Sites in the Amazon Biome [Dataset]. https://www.kaggle.com/datasets/josepart/archeological-sites-in-the-amazon-biome
    Dataset updated
    Jun 22, 2025
    Dataset provided by
    Kaggle
    Authors
    JP
    License
    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Amazon Rainforest
    Description

    This dataset compiles geolocation information for archaeological sites in the Amazon Biome and the surrounding area. It contains a total of 5442 site records. Note, however, that these records include duplicates. This was a deliberate choice that lets users select the subset of data points they want to use: for example, they can deduplicate the dataset with a strategy of their own choosing, or use only the sites from a particular source. The site locations are shown in Fig. 1.

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F358347%2Ff8916b3723b9297a397ba1732e3afa74%2Famazon_biome_sites.png?generation=1750548569960405&alt=media
    Fig. 1: Archaeological sites spread across the Amazon Biome (black outline). Each color corresponds to a different source (see References): de Souza (red), Coomes (green), Kalliola (blue), Walker (orange), and Jacobs (purple).

    Warning: amazon_biome_sites.csv contains duplicates because some of the sources are compilations that include sites from other sources in this dataset. For example, Jacobs' compilation contains some of the sites in Kalliola's list, and Walker's list overlaps substantially with all the other lists (as can be seen in Fig. 1). Site naming is often inconsistent across sources, and coordinates may also differ: Walker did not provide site names and appears to have rounded the coordinates, while Jacobs updated coordinates after verifying the sites on platforms such as Google Earth. In addition, some authors treat neighboring structures as separate sites, whereas others treat them as parts of the same site. Deduplication is therefore not trivial, but several strategies are possible depending on the intended use, such as using only the sites from a particular source or removing duplicates based on approximate coordinates. See this notebook for an example of how to deduplicate the data.
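    As a rough illustration of the "approximate coordinates" strategy (not the method used in the notebook mentioned above), the Python sketch below greedily keeps a site only if no already-kept site lies within a given radius, measured with the haversine great-circle distance. It assumes the CSV has numeric latitude and longitude columns and a source column; these column names, the function names, and the 1 km default radius are hypothetical choices for illustration. An optional source priority list decides which record survives when two sources describe the same place, for example preferring Jacobs' Google Earth-verified coordinates over Walker's rounded ones.

```python
import csv
import math
from typing import Dict, List, Sequence

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius in kilometres

# Hypothetical row type: one CSV record keyed by (assumed) column names.
Row = Dict[str, str]


def load_rows(path: str) -> List[Row]:
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def dedupe_by_distance(rows: List[Row], radius_km: float = 1.0,
                       source_priority: Sequence[str] = ()) -> List[Row]:
    """Keep a row only if no previously kept row lies within radius_km.

    Rows whose source appears earlier in source_priority are visited first,
    so their records are the ones kept when a near-duplicate is found.
    Unlisted sources come last, in their original order (sorted() is stable).
    """
    rank = {src: i for i, src in enumerate(source_priority)}
    ordered = sorted(rows, key=lambda r: rank.get(r["source"], len(rank)))
    kept: List[Row] = []
    for row in ordered:
        lat, lon = float(row["latitude"]), float(row["longitude"])
        # Pairwise check against everything kept so far (quadratic overall).
        if all(haversine_km(lat, lon, float(k["latitude"]), float(k["longitude"])) > radius_km
               for k in kept):
            kept.append(row)
    return kept
```

    A call such as dedupe_by_distance(load_rows("amazon_biome_sites.csv"), radius_km=0.5, source_priority=["Jacobs", "Kalliola"]) would favour those two sources, provided the labels match the values actually used in the source column. The pairwise comparison is quadratic, which is acceptable for a few thousand rows; a spatial index would scale better. Because authors split or merge neighboring structures differently, any single radius will merge some distinct structures and miss some duplicates, so the result is worth spot-checking.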

    Data Collection

    • The Excel sheet with a list of archaeological sites in the Department of Loreto in Peru (sources/original/coomes/Coomes et al_Table of archaeological sites in Department of Loreto.xlsx) was processed manually and converted into a CSV file (sources/processed/coomes_loreto_peruvian_amazon_sites.csv).
    • The PDF file containing archaeological sites in the Upper Tapajós Basin in Brazil (sources/original/desouza/41467_2018_3510_MOESM1_ESM.pdf) was processed by first extracting the pages containing the relevant table (sources/processed/desouza_upper_tapajos_basin_sites.pdf) and then asking Gemini 2.5 Flash to convert that table into a CSV file. The resulting file was verified manually and corrected; among other fixes, the latitude and longitude values, which were swapped in the source file, were put back in their correct columns.
    • The PDF file containing archaeological sites in the southwestern Amazon (sources/original/kalliola/List of Southwestern Amazonian Earthworks 25.08.2024b.pdf) was processed by asking Gemini 2.5 Flash to convert the PDF table into a CSV file. The resulting file was verified manually by spot-checking entries, and the latitude and longitude values, which were swapped in the source file, were put back in their correct columns. Typos present in the original file, as well as some introduced by Gemini, were also fixed, and some entries were reordered (e.g., the structures belonging to the same site were sometimes out of sequence). When in doubt, entries were left as in the original file. Despite efforts to fix all mistakes and to make entry naming consistent, there is no guarantee that all errors have been corrected.
    • The file with the sites compiled by Walker et al. (sources/original/walker/submit.csv) was copied, with the coordinate columns renamed. In addition, sources/original/walker/variables.xlsx was converted into a CSV file, and a column was added indicating which variable corresponds to which column in the sites file.
    • The file sources/original/jacobs/amazon_geoglyphs.xls, compiled by Jacobs, was converted into a CSV file by concatenating the sheets for geoglyphs, mound villages, and earthworks in Mato Grosso. Some entries with wrongly signed coordinates were also fixed; in particular, ronq18 had a positive latitude and mgro5 had a positive longitude.
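    As a hedged illustration (not how this dataset was processed), a simple range check can flag the kinds of coordinate problems described in the list above, namely swapped latitude and longitude in the de Souza and Kalliola tables and wrongly signed values in Jacobs' list, for review. The sketch assumes every site lies west of Greenwich (negative longitude) and uses illustrative bounds of -25° to 15° latitude and -85° to -40° longitude; these bounds and the function names are hypothetical. A wrong latitude sign, as in ronq18, generally cannot be detected this way, because the biome straddles the equator and positive latitudes can be valid, so such entries still need checking against the source.

```python
from typing import List, Tuple

# Illustrative bounding box for the Amazon region (assumed, not from the dataset).
LAT_MIN, LAT_MAX = -25.0, 15.0
LON_MIN, LON_MAX = -85.0, -40.0


def in_region(lat: float, lon: float) -> bool:
    """True if (lat, lon) falls inside the assumed bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def check_coordinates(lat: float, lon: float) -> Tuple[float, float, List[str]]:
    """Return (lat, lon, issues) after two heuristic fixes.

    1. If the pair only fits the region with the values exchanged, swap them.
    2. If the longitude is positive but fits once negated, flip its sign
       (every site in the region is in the Western Hemisphere).
    Latitude signs are never changed automatically; anything still outside
    the box is reported for manual review.
    """
    issues: List[str] = []
    if not in_region(lat, lon) and in_region(lon, lat):
        lat, lon = lon, lat
        issues.append("latitude/longitude swapped")
    if lon > 0 and in_region(lat, -lon):
        lon = -lon
        issues.append("longitude sign flipped")
    if not in_region(lat, lon):
        issues.append("outside expected region; review manually")
    return lat, lon, issues
```

    With made-up values, check_coordinates(-55.1, -7.3) returns (-7.3, -55.1, ['latitude/longitude swapped']), and check_coordinates(-12.0, 61.0) returns (-12.0, -61.0, ['longitude sign flipped']). Applying only conservative, reversible fixes and reporting everything else keeps the final decision with a human reviewer, in line with the "when in doubt, leave as in the original" approach described above.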

    Data Processing

    After all the source files had been compiled and processed, the following additional steps were performed to generate amazon_biome_sites.csv:

    • For each source file, site names were modified (when necessary) by combining information from m...