Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset compiles geolocation information of archaeological sites in the Amazon Biome and surrounding area. It includes a total of 5442 sites. However, please note that the dataset contains duplicates. This was a deliberate choice to allow the user to select the subset of data points that they want to use. For example, the user could choose to deduplicate the dataset using their own chosen strategy, or use a subset from a particular source. The locations of the sites are illustrated in the figure below.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F358347%2Ff8916b3723b9297a397ba1732e3afa74%2Famazon_biome_sites.png?generation=1750548569960405&alt=media
Fig. 1: Archaeological sites spread across the Amazon Biome (black outline). Each color corresponds to a different source (see References) as follows: de Souza (red), Coomes (green), Kalliola (blue), Walker (orange) and Jacobs (purple).
Warning: amazon_biome_sites.csv contains duplicates. This is because some of the sources are compilations that include other sources in this dataset. For example, Jacobs' compilation contains some of the sites present in Kalliola's list. Likewise, Walker's list overlaps heavily with all the other lists (as can be seen in Fig. 1). In many cases, site naming is not consistent, and coordinates may also vary. For example, Walker did not provide site names and seems to have rounded the location coordinates, and Jacobs updated coordinates based on his verification of the sites on platforms like Google Earth. Additionally, some authors treat neighboring structures as different sites, whereas others treat them as part of the same site. Therefore, deduplication is not trivial. However, depending on the intended use, several strategies could be employed. For example, the user could choose to use the sites from a particular source, or remove duplicates based on approximate coordinates. You can have a look at this notebook for an example of how to deduplicate the data.
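The two strategies mentioned above (keeping a single source, or removing duplicates based on approximate coordinates) can be sketched roughly as follows. This is only an illustration, not the method used in the notebook: the column names `source`, `latitude` and `longitude`, the source labels, and the sample rows are all assumptions made for the example, and the rounding precision is an arbitrary choice.

```python
def select_source(rows, source, source_key="source"):
    """Keep only the rows that come from one source (column name is assumed)."""
    return [row for row in rows if row[source_key] == source]


def deduplicate_by_rounded_coords(rows, decimals=3,
                                  lat_key="latitude", lon_key="longitude"):
    """Keep the first row seen for each rounded (latitude, longitude) pair.

    Rows whose coordinates agree after rounding to `decimals` places are
    treated as the same site. Row order decides which record survives.
    """
    seen = set()
    kept = []
    for row in rows:
        key = (round(float(row[lat_key]), decimals),
               round(float(row[lon_key]), decimals))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Made-up rows for illustration only; these are not values from the dataset.
rows = [
    {"source": "kalliola", "latitude": "-10.12341", "longitude": "-67.50002"},
    {"source": "jacobs", "latitude": "-10.12338", "longitude": "-67.49998"},
    {"source": "walker", "latitude": "-3.5", "longitude": "-60.0"},
]

unique_rows = deduplicate_by_rounded_coords(rows, decimals=3)
kalliola_rows = select_source(rows, "kalliola")
```

Rounding is a coarse criterion: two records of the same site can still fall on opposite sides of a rounding boundary, and distinct neighboring structures can be merged if `decimals` is too small. To read the real file, `csv.DictReader` from the standard library would produce rows in the shape used here, provided the actual column names are substituted.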
The file sources/original/coomes/Coomes et al_Table of archaeological sites in Department of Loreto.xlsx was processed manually and converted into a CSV file (sources/processed/coomes_loreto_peruvian_amazon_sites.csv).
The file sources/original/desouza/41467_2018_3510_MOESM1_ESM.pdf was processed by first extracting the pages containing the relevant table (sources/processed/desouza_upper_tapajos_basin_sites.pdf), and then asking Gemini 2.5 Flash to convert the PDF table into a CSV file. The final file was manually verified to correct any mistakes, and the latitude and longitude values, which were inverted in the source file, were swapped.
The file sources/original/kalliola/List of Southwestern Amazonian Earthworks 25.08.2024b.pdf was processed by asking Gemini 2.5 Flash to convert the PDF table into a CSV file. The final file was verified manually by checking some entries, and the latitude and longitude values, which were inverted in the source file, were swapped. Additionally, typos in the original file, as well as some introduced by Gemini, were fixed, as was the order of some entries (e.g., the structures corresponding to the same site were sometimes out of sequence). When in doubt, things were left as in the original file. Despite efforts to fix all mistakes and make the naming of the entries consistent, there is no guarantee that all errors have been fixed.
The file sources/original/walker/submit.csv was copied with the coordinate columns renamed. Additionally, the file sources/original/walker/variables.xlsx was converted into a CSV file, and a column was added to indicate which variables correspond to which columns in the sites file.
The file sources/original/jacobs/amazon_geoglyphs.xls, compiled by Jacobs, was turned into a CSV file by concatenating the sheets corresponding to geoglyphs, mound villages and earthworks in Mato Grosso. Additionally, some entries whose coordinates had the wrong sign were fixed. In particular, ronq18 had a positive latitude, and mgro5 had a positive longitude.
Once all the source files were compiled and processed, some further processing was performed to generate amazon_biome_sites.csv. In particular, these are the steps that were followed:
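Separately from those steps, the per-source coordinate corrections described earlier (latitude and longitude swapped in the de Souza and Kalliola tables, and wrong signs in two Jacobs entries) can be illustrated with a short sketch. It is not the code that was used to build the dataset: the column names `id`, `latitude` and `longitude`, the helper names, and the coordinate values are hypothetical. Only the identifiers ronq18 and mgro5 come from the description above.

```python
def swap_lat_lon(rows, lat_key="latitude", lon_key="longitude"):
    """Return copies of the rows with latitude and longitude exchanged."""
    return [{**row, lat_key: row[lon_key], lon_key: row[lat_key]}
            for row in rows]


def positive_values(rows, key):
    """Return the rows whose value in `key` is positive.

    These are only candidates for manual review: part of the Amazon Biome
    lies north of the equator, so a positive latitude is not always wrong.
    """
    return [row for row in rows if float(row[key]) > 0]


def flip_sign(rows, site_id, key, id_key="id"):
    """Return copies of the rows with the sign of `key` flipped for one site."""
    fixed = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        if row[id_key] == site_id:
            row[key] = -float(row[key])
        fixed.append(row)
    return fixed


# Placeholder coordinates, NOT the real values of these entries.
jacobs = [
    {"id": "ronq18", "latitude": 10.5, "longitude": -63.2},
    {"id": "mgro5", "latitude": -12.1, "longitude": 55.3},
]

to_review = positive_values(jacobs, "latitude") + positive_values(jacobs, "longitude")
fixed = flip_sign(jacobs, "ronq18", "latitude")
fixed = flip_sign(fixed, "mgro5", "longitude")
```

Applying the sign fix by site identifier, rather than negating every positive value, mirrors the manual, entry-by-entry corrections described for the Jacobs file and avoids altering sites that are legitimately north of the equator.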