13 datasets found

Census of Population, 2021 [Canada]: Data Tables, Indigenous Peoples [B2020 & CSV]
search.dataone.org
dataone.org
Updated Dec 4, 2024
Cite
Statistics Canada (2024). Census of Population, 2021 [Canada]: Data Tables, Indigenous Peoples [B2020 & CSV] [Dataset]. http://doi.org/10.5683/SP3/XZXQRW
Unique identifier
https://doi.org/10.5683/SP3/XZXQRW
Dataset updated
Dec 4, 2024
Dataset provided by
Borealis
Authors
Statistics Canada
Area covered
Canada
Description
The Data Tables are a series of cross-tabulations that present a portrait of Canada based on the various census topics. They range in complexity and are available for various levels of geography. The topic of this dataset is Indigenous Peoples.
Population statistics on marital status by age group for indigenous people aged 15 and above in the statistical region
data.gov.tw
xml
Updated Jun 1, 2025
Cite
Dept. of Statistics (2025). Population statistics on marital status by age group for indigenous people aged 15 and above in the statistical region [Dataset]. https://data.gov.tw/en/datasets/18676
Available download formats: xml
Dataset updated
Jun 1, 2025
Dataset authored and provided by
Dept. of Statistics
License
https://data.gov.tw/license
Description
The statistics office of the Ministry of the Interior provides the latest annual statistics for each county and city in XML format on the government's open data platform. When viewed in a browser, the file appears as a series of text and numbers. This is not gibberish: it is structured data that programmers can use to develop applications. To download the data in CSV format (viewable in Excel), visit the Social and Economic Data Service Platform on the website of the Social and Economic Database Group of the National Land Information System (segis.moi.gov.tw).
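The XML files can be flattened into CSV-style rows with a short script. The sketch below uses only the Python standard library. It assumes a simple layout: each child of the root element is one record, and each child of a record is one field. The tag names in the sample (`dataset`, `row`, `county`, `age_group`, `count`) and its values are hypothetical placeholders. They are not the actual data.gov.tw schema or data.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_records_to_rows(xml_text):
    """Flatten <root><record><field>value</field>...</record>...</root> into dicts.

    Assumption: every direct child of the root element is one record and every
    child of a record is one field. The real data.gov.tw schema may differ.
    """
    root = ET.fromstring(xml_text)
    return [
        {field.tag: (field.text or "").strip() for field in record}
        for record in root
    ]


def rows_to_csv(rows):
    """Serialize the rows as CSV text; the header is the union of field names."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical tag names and dummy values, for illustration only.
sample = """<dataset>
  <row><county>County A</county><age_group>15-24</age_group><count>100</count></row>
  <row><county>County B</county><age_group>15-24</age_group><count>200</count></row>
</dataset>"""

rows = xml_records_to_rows(sample)
print(rows_to_csv(rows))
```

If the real files declare an XML namespace, ElementTree reports tags as `{namespace}tag`, and the keys would need to be stripped accordingly.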
Population statistics of indigenous people aged 15 and above by age group, gender, and educational level population statistics
data.gov.tw
xml
Updated Jun 1, 2025
Cite
Dept. of Statistics (2025). Population statistics of indigenous people aged 15 and above by age group, gender, and educational level population statistics [Dataset]. https://data.gov.tw/en/datasets/18672
Available download formats: xml
Dataset updated
Jun 1, 2025
Dataset authored and provided by
Dept. of Statistics
License
https://data.gov.tw/license
Description
Indigenous population aged 15 and over in statistical areas, by age group, gender, and educational attainment, released for second-tier and first-tier statistical areas. The Statistics Department of the Ministry of the Interior provides the latest annual statistical data for each county and city on the government's open data platform in XML format. When viewed in a browser, the data appear as a series of alphanumeric characters. This is not gibberish: it is data that programmers can use to develop applications. To download the data in CSV format (viewable in Excel), visit the Social and Economic Data Service Platform on the subgroup webpage of the National Land Information System (segis.moi.gov.tw).
Statistical region indigenous population statistics by gender in the ten-year age group
data.gov.tw
xml
Updated Jun 26, 2024
Cite
Dept. of Statistics (2024). Statistical region indigenous population statistics by gender in the ten-year age group [Dataset]. https://data.gov.tw/en/datasets/18665
Available download formats: xml
Dataset updated
Jun 26, 2024
Dataset authored and provided by
Dept. of Statistics
License
https://data.gov.tw/license
Description
The statistics department of the Ministry of the Interior provides the latest statistical data for each county and city on the government's open data platform in XML format. When viewed in a browser, the data appear as a series of characters and numbers. This is not garbled code: it is data that programmers can process. To download the data in CSV format (viewable in Excel), visit the Social and Economic Data Service Platform on the webpage of the Social and Economic Database Group of the National Spatial Information System (segis.moi.gov.tw).
Census of Population, 2021 [Canada]: Data Quality Tables [B2020 & CSV]
search.dataone.org
borealisdata.ca
Updated Dec 4, 2024
Cite
Statistics Canada (2024). Census of Population, 2021 [Canada]: Data Quality Tables [B2020 & CSV] [Dataset]. http://doi.org/10.5683/SP3/0NGD3X
Unique identifier
https://doi.org/10.5683/SP3/0NGD3X
Dataset updated
Dec 4, 2024
Dataset provided by
Borealis
Authors
Statistics Canada
Area covered
Canada
Description
These tables present data quality indicators for the Census of Population. Data quality indicators are available for various levels of geography, most questions on the Census of Population questionnaires and other key variables of the Census of Population.
Data from: Projected speaker numbers and dormancy risks of Canada's Indigenous languages
zenodo.org
bin, csv, txt
Updated Dec 3, 2024
Cite
Michael Boissonneault (2024). Projected speaker numbers and dormancy risks of Canada's Indigenous languages [Dataset]. http://doi.org/10.5281/zenodo.14221520
Available download formats: txt, csv, bin
Unique identifier
https://doi.org/10.5281/zenodo.14221520
Dataset updated
Dec 3, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Michael Boissonneault
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Canada
Description
Data and code for the paper "Projected speaker numbers and dormancy risks of Canada’s Indigenous languages".

The data contain speaker numbers by age and language (Indigenous mother tongue, unique responses).

There is one file per year (2001, 2006, 2011, 2016, 2021).

Data were provided by Statistics Canada.

These include:

- indigenousmothertongue2001.csv
- indigenousmothertongue2006.csv
- indigenousmothertongue2011.csv
- indigenousmothertongue2016.csv
- indigenousmothertongue2021.csv

Additionally, the file 'coordinates.xlsx' contains the geographic coordinates needed for Fig. 1. This information comes from Ethnologue, with modifications.

Also included is the life table information produced by World Population Prospects (WPP) 2024, available at https://population.un.org/wpp/Download/Standard/Mortality/. These tables are provided here for convenience and to guard against later updates by the WPP.

The repository also contains the complete code used to produce the results described in the paper, and a README file.
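Because the speaker counts are split into one file per census year, a first step is usually to stack them with a year label. This sketch assumes the files sit together in one directory and are UTF-8 encoded. It makes no assumption about column names (it reads each file's header). It also reports missing files instead of skipping them silently. The function name `load_all` is hypothetical and is not part of the authors' code.

```python
import csv
from pathlib import Path

# Census years and file-name pattern taken from the repository description.
YEARS = (2001, 2006, 2011, 2016, 2021)
FILES = {year: f"indigenousmothertongue{year}.csv" for year in YEARS}


def load_all(directory):
    """Read every yearly file found in `directory` and tag each row with its year.

    Returns (rows, missing): `rows` is a list of dicts keyed by each file's own
    header plus a "year" key (added, or overwritten if already present);
    `missing` lists expected file names that were not found.
    UTF-8 encoding is an assumption.
    """
    rows, missing = [], []
    for year, name in FILES.items():
        path = Path(directory) / name
        if not path.exists():
            missing.append(name)
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            for record in csv.DictReader(handle):
                record["year"] = year
                rows.append(record)
    return rows, missing
```

Usage would look like `rows, missing = load_all("path/to/download")`, where the path is a placeholder for wherever the Zenodo archive was extracted.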
COVID speed reach and spread dataset (.csv file)
figshare.com
xlsx
Updated Jan 15, 2024
Cite
Alexandre Augusto de Paula da Silva; Rodrigo Reis; Franciele Iachecen; Fabio Duarte; Cristina Pellegrino Baena; Adriano Akira Hino (2024). COVID speed reach and spread dataset (.csv file) [Dataset]. http://doi.org/10.6084/m9.figshare.24999911.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.24999911.v1
Dataset updated
Jan 15, 2024
Dataset provided by
figshare
Authors
Alexandre Augusto de Paula da Silva; Rodrigo Reis; Franciele Iachecen; Fabio Duarte; Cristina Pellegrino Baena; Adriano Akira Hino
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
City-level open-access data were obtained from 26 states and the Federal District, and from the following sources:
- the Brazilian Institute of Geography and Statistics (IBGE) [20];
- the Department of Informatics of the Brazilian Public Health System (DATASUS), Ministry of Health;
- the Brazilian Agricultural Research Corporation (Embrapa);
- Brazil.io.

Data from all 5,570 cities in Brazil were included in the analysis. COVID-19 data included cases and deaths reported between February 26, 2020 and February 4, 2021. The following outcomes were computed:
- a) days from the first case in Brazil to the first case in the city;
- b) days from the first case in the city to the day when 1,000 cases were reported;
- c) days from the first death in the city to the day when 50 deaths were reported.

Descriptive analyses were performed on the following: proportion of cities reaching 1,000 cases; number of cases at three, six, nine and 12 months after the first case; cities reporting at least one COVID-19-related death; and number of COVID-19-related deaths at three, six, nine and 12 months after the first death in the country.
All incidence data are expressed per 100,000 inhabitants. The following covariates were included: a) geographic region where the city is located (Midwest, North, Northeast, Southeast and South), metropolitan city (no/yes) and urban or rural; b) social and environmental city characteristics [total area (km2), urban area (km2), population size (inhabitants), population living within the urban area (inhabitants), population older than 60 years (%), indigenous population (%), black population (%), illiterate people older than 25 years (%) and city in extreme poverty (no/yes)]; c) housing conditions [households with density >2 per dormitory (%), households with garbage collection (%), households connected to the water supply system (%) and households connected to the sewer system (%)]; d) job characteristics [commerce (%) and informal workers (%)]; e) socioeconomic and inequality characteristics [GINI index; income per capita; poor or extremely poor (%) and households in informal urban settlements (%)]; f) health services access and coverage [number of National Public Health System (SUS) physicians per 100,000 inhabitants, number of SUS nurses per 100,000 inhabitants, and number of intensive care units (ICUs) per 100,000 inhabitants]. All health services access and coverage variables were standardized using z-scores and combined into a single variable categorized into tertiles.
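The outcome and covariate definitions above can be made concrete with a few small helpers. This is a minimal sketch, not the authors' code. It makes three assumptions: the combined health-access score is the mean of the three z-scores (the description does not say how the variables were combined), z-scores use the population standard deviation, and tertile cut points come from `statistics.quantiles` (Python 3.8+). The dates and counts in the example series are dummy values.

```python
from datetime import date
from statistics import mean, pstdev, quantiles


def per_100k(count, population):
    """Incidence expressed per 100,000 inhabitants."""
    return count * 100_000 / population


def days_between(start, end):
    """Whole days from `start` to `end` (both datetime.date)."""
    return (end - start).days


def first_date_reaching(cumulative, threshold):
    """First date whose cumulative count is >= threshold, or None if never reached.

    `cumulative` is a list of (date, cumulative_count) pairs sorted by date.
    """
    for day, total in cumulative:
        if total >= threshold:
            return day
    return None


def zscores(values):
    """Standardize values (population SD assumed; fails if all values are equal)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def tertiles(values):
    """Label each value 1, 2 or 3 using the 1/3 and 2/3 quantiles as cut points."""
    low_cut, high_cut = quantiles(values, n=3)
    return [1 if v <= low_cut else 2 if v <= high_cut else 3 for v in values]


def health_access_tertile(physicians, nurses, icus):
    """Combine per-100k health-service rates into one tertile label per city.

    Assumption: the combined score is the mean of the three z-scores.
    """
    z_columns = [zscores(column) for column in (physicians, nurses, icus)]
    combined = [mean(city) for city in zip(*z_columns)]
    return tertiles(combined)


# Dummy cumulative case series for one city (outcome b: days to 1,000 cases).
series = [(date(2020, 3, 1), 10), (date(2020, 3, 5), 999), (date(2020, 3, 9), 1200)]
reached = first_date_reaching(series, 1000)
print(days_between(series[0][0], reached))  # 8
```

Returning `None` when a threshold is never reached keeps cities that did not reach 1,000 cases distinguishable. Without that, they could not be counted correctly in the "proportion of cities reaching 1,000 cases" summary.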
Census of Population, 2021 [Canada]: Census Profile [B2020 & CSV]
search.dataone.org
borealisdata.ca
Updated Dec 4, 2024
Cite
Statistics Canada (2024). Census of Population, 2021 [Canada]: Census Profile [B2020 & CSV] [Dataset]. http://doi.org/10.5683/SP3/V8OZ4R
Unique identifier
https://doi.org/10.5683/SP3/V8OZ4R
Dataset updated
Dec 4, 2024
Dataset provided by
Borealis
Authors
Statistics Canada
Area covered
Canada
Description
This profile presents information from the 2021 Census of Population for various levels of geography, including provinces and territories, census metropolitan areas and census tracts.
Data_Sheet_6_The Role of Language in Structuring Social Networks Following Market Integration in a Yucatec Maya Population.CSV
frontiersin.figshare.com
txt
Updated Jun 8, 2023
Cite
Cecilia Padilla-Iglesias; Karen L. Kramer (2023). Data_Sheet_6_The Role of Language in Structuring Social Networks Following Market Integration in a Yucatec Maya Population.CSV [Dataset]. http://doi.org/10.3389/fpsyg.2021.656963.s006
Available download formats: txt
Unique identifier
https://doi.org/10.3389/fpsyg.2021.656963.s006
Dataset updated
Jun 8, 2023
Dataset provided by
Frontiers
Authors
Cecilia Padilla-Iglesias; Karen L. Kramer
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Language is the human universal mode of communication, and is dynamic and constantly in flux accommodating user needs as individuals interface with a changing world. However, we know surprisingly little about how language responds to market integration, a pressing force affecting indigenous communities worldwide today. While models of culture change often emphasize the replacement of one language, trait, or phenomenon with another following socioeconomic transitions, we present a more nuanced framework. We use demographic, economic, linguistic, and social network data from a rural Maya community that spans a 27-year period and the transition to market integration. By adopting this multivariate approach for the acquisition and use of languages, we find that while the number of bilingual speakers has significantly increased over time, bilingualism appears stable rather than transitionary. We provide evidence that when indigenous and majority languages provide complementary social and economic payoffs, both can be maintained. Our results predict the circumstances under which indigenous language use may be sustained or at risk. More broadly, the results point to the evolutionary dynamics that shaped the current distribution of the world’s linguistic diversity.
Metapopulation genetic management of the Macquarie perch
researchdata.edu.au
Updated Jul 4, 2025
Cite
Paul Sunnucks; Diana Robledo Ruiz; Alexandra Pavlova (2025). Metapopulation genetic management of the Macquarie perch [Dataset]. http://doi.org/10.26180/26983255.V2
Unique identifier
https://doi.org/10.26180/26983255.V2
Dataset updated
Jul 4, 2025
Dataset provided by
Monash University
Authors
Paul Sunnucks; Diana Robledo Ruiz; Alexandra Pavlova
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains the data and R script used in the manuscript

A shift to metapopulation genetic management for persistence of a species threatened by fragmentation: the case of an endangered Australian freshwater fish, by Pavlova A, Tonkin Z, Pearce L, Robledo-Ruiz D, Lintermans M, Ingram B, Lyon J, Beitzel M, Broadhurst B, Rourke ML, Sturgiss F, Lake E, Castrejón-Figueroa, Stocks JR, and Sunnucks P. Molecular Ecology MEC-24-1090.R1, accepted pending minor revision.

The content of this data repository:

MaccaGM_SNPs.R –– R script for analyses of DArT SNP genotypic data and plotting the results of analyses of JeDi pipeline

Input files for MaccaGM_SNPs.R:

Report_DMacq23-8576_14_moreOrders_SNP_mapping_1.csv –– DArT genotypes in original format
Covariate_MaccaGM_recaller25inds_then_max50per_pop.csv –– covariate file for individuals
Unbiased genetic diversity: outputs of JeDi pipeline
Estimates of unbiased heterozygosity (piawka_uHe) from JeDi pipeline runs for all sites, biallelic sites, and tri- and tetra-allelic sites. Analyses were run with 9 different settings (A-I). Estimates of individual unbiased heterozygosity from the JeDi pipeline are appended to the individual covariate file, together with the results of analyses of the SNP dataset (SNP heterozygosity, PHt, and membership in STRUCTURE clusters X1of15 to X15of15).
ind.gen.div_nonaPHt_noNAuHe.with.PHt.residuals.noB_qual30_minDP10_SETA.csv –– Output of SetA: Piawka run with Settings: no (-B), QUAL30, MinDP10, No doubleton filter
ind.gen.div_nonaPHt_noNAuHe.with.PHt.residuals.B_qual30_minDP10_SETB.csv –– Output of SetB: Piawka run with Settings: (-B), QUAL30, MinDP10, No doubleton filter
ind.gen.div_nonaPHt_noNAuHe.with.PHt.residuals.B_qual40_minDP10_SETC.csv –– Output of SetC: Piawka run with Settings: (-B), QUAL40, MinDP10, No doubleton filter
ind.gen.div_nonaPHt_noNAuHe.with.PHt.residuals.B_qual30_minDP12_setD.csv –– Output of SetD: Piawka run with Settings: (-B), QUAL30, MinDP12, No doubleton filter
ind.gen.div_nonaPHt_noNAuHe.with.PHt.residuals.B_qual30_minDP10_doubleton_SETE.csv –– Output of SetE: Piawka run with Settings: (-B), QUAL30, MinDP10, doubleton filter
ind.gen.div_nonaPHt_noNAuHe.with.PHt.residuals.B_qual50_minDP15_SETF.csv –– Output of SetF: Piawka run with Settings: (-B), QUAL50, MinDP15, No doubleton filter
ind.gen.div_nonaPHt_noNAuHe.with.PHt.residuals.noB_qual30_minDP10_doub_SETG.csv –– Output of SetG: Piawka run with Settings: No (-B), QUAL30, MinDP10, doubleton filter
ind.gen.div_nonaPHt_noNAuHe.with.PHt.residuals.noB_qual30_minDP10_doub_sing_SETH.csv –– Output of SetH: Piawka run with Settings: No (-B), QUAL30, MinDP10, doubleton and singleton filter
ind.gen.div_nonaPHt_noNAuHe.with.PHt.residuals.noB_qual30_minDP15_doub_sing_SETI.csv –– Output of SetI: Piawka run with Settings: No (-B), QUAL30, MinDP15, doubleton and singleton filter
Estimates of population nucleotide diversity (pi): outputs of JeDi pipeline run under three settings:
genomic_pi_table_SETA.tsv –– pi from JeDi pipeline run under Settings A: no (-B), QUAL30, MinDP10, No doubleton filter
genomic_pi_table SETG.tsv –– pi from JeDi pipeline run under Settings G: No (-B), QUAL30, MinDP10, doubleton filter
genomic_pi_table - SETH.tsv –– pi from JeDi pipeline run under Settings H: No (-B), QUAL30, MinDP10, doubleton and singleton filter
Population summaries of the genetic estimates from the SNP dataset (Ho - observed heterozygosity, He - expected heterozygosity, private.alleles.dartr.one2rest and PA - number of private alleles per population including and excluding three populations with sample size <13, Mean_Allelic_Richness - allelic richness, Ne - effective population size estimated by LDNe) and from the sequencing dataset (pi_SetI and pi_SetH - nucleotide diversity estimated with settings I and H, mean_uHe_SetI and mean_uHe_SetH - mean unbiased heterozygosity estimated with settings I and H, respectively):
Pop.genetic.div.estimates.csv––collated population outputs of the script MaccaGM_SNPs.R and JeDi pipeline

Metapop2 files:
Metapop_res.csv –– results of the Metapop2 analyses to create a new population (Fig. 4 of the manuscript)
Metapop2_24sims.zip––input and output files for 24 realistic simulations (Table 3 of the main manuscript).

SRA_PRJNA1242510_accession_numbers.txt––GenBank SRA accession numbers for DArT sequences

Appendices for the manuscript:
Pavlova_etal_AppendixA_Genetic_augmentations.xlsx–– Genetic augmentation of Macquarie perch via translocations and stocking conducted from 2010 onwards.
Pavlova_et_al_AppendixB_JeDi_pipeline_revised.png–– Schematic representation of revised JeDi pipeline for estimating unbiased individual heterozygosity and population nucleotide diversity from reduced-representation sequencing data, in presence of reference genome.
Pavlova_et_al_MEC_Supplemental_Information_revised.docx––Supplementary Information for the revised version of the main manuscript: S1. Additional tables and figures, S2. Details and optimization of the JeDi pipeline.
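The population summaries above include observed heterozygosity (Ho) and unbiased heterozygosity. This sketch computes both for a single biallelic SNP using the textbook formulas, with Nei's small-sample correction 2N/(2N-1) applied to expected heterozygosity. It is not the JeDi pipeline, piawka or dartR implementation. It assumes genotypes are coded as 0, 1 or 2 copies of the alternative allele, with `None` marking a missing call, and it needs at least one called individual.

```python
def called(genotypes):
    """Drop missing calls (None)."""
    return [g for g in genotypes if g is not None]


def observed_heterozygosity(genotypes):
    """Ho: fraction of called diploid individuals that are heterozygous (coded 1)."""
    calls = called(genotypes)
    return sum(1 for g in calls if g == 1) / len(calls)


def unbiased_expected_heterozygosity(genotypes):
    """Nei's unbiased He for one biallelic locus: 2N/(2N-1) * (1 - sum of p_i^2)."""
    calls = called(genotypes)
    n_copies = 2 * len(calls)              # 2N gene copies for N diploid individuals
    p = sum(calls) / n_copies              # alternative-allele frequency
    he = 1 - (p ** 2 + (1 - p) ** 2)       # expected heterozygosity before correction
    return n_copies / (n_copies - 1) * he  # small-sample correction


genotypes = [0, 1, 2, 1, None]  # dummy genotypes for four called individuals
print(observed_heterozygosity(genotypes), unbiased_expected_heterozygosity(genotypes))
```

The correction matters most for the small samples mentioned above (populations with fewer than 13 individuals), where the uncorrected value underestimates expected heterozygosity.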

We acknowledge the First Nations throughout Australia, recognise their continuing connection to land, waters and culture, and pay our respects to their Elders past, present and emerging. This research was conducted on Ngarigo, Ngambri and Ngunnawal, Taungurung, Wirajuri and Wurundjeri Woi-wurrung Countries.
Food retail in remote Australia
search.dataone.org
dataone.org
Updated Jul 21, 2025
Cite
Luke Greenacre; Emma van Burgel; Amanda Hill; Emma McMahon; Megan Ferguson; Cristina Rodrigues; Julie Brimblecombe (2025). Food retail in remote Australia [Dataset]. http://doi.org/10.5061/dryad.vmcvdnczc
Unique identifier
https://doi.org/10.5061/dryad.vmcvdnczc
Dataset updated
Jul 21, 2025
Dataset provided by
Dryad Digital Repository
Authors
Luke Greenacre; Emma van Burgel; Amanda Hill; Emma McMahon; Megan Ferguson; Cristina Rodrigues; Julie Brimblecombe
Time period covered
Jan 1, 2023
Area covered
Australia
Description
This dataset documents food retail stores that serve remote Indigenous communities in Australia. A seed list created by the National Indigenous Australians Agency (NIAA) was extended and validated during 2022, including through reviews by experts and stakeholders. Store location, contact information, management, and ownership/legal registration were identified, along with the size of the community each store serves. A final dataset of 233 remote or very remote stores was created. The NIAA seed list was extended using desk research, with additional reviews by experts and stakeholders during 2022. The data are stored in a single comma-delimited CSV file, and a data dictionary is provided in a separate CSV file.
Census local area profiles 2001
opendata.vancouver.ca
Updated Mar 25, 2013
Cite
(2013). Census local area profiles 2001 [Dataset]. https://opendata.vancouver.ca/explore/dataset/census-local-area-profiles-2001/
Dataset updated
Mar 25, 2013
License
https://opendata.vancouver.ca/pages/licence/
Description
The census, conducted by Statistics Canada every five years, is Canada's largest and most comprehensive data source. The Census of Population collects demographic and linguistic information on every man, woman and child living in Canada. The data shown here were provided by Statistics Canada from the 2001 Census as a custom profile data order for the City of Vancouver, using the City's 22 local planning areas. The data may be reproduced provided they are credited to Statistics Canada, Census 2001, custom order for City of Vancouver Local Areas.

Data access: This dataset has not yet been converted to a format compatible with our new platform. Please use the following files from our legacy site: Census local area profiles 2001 (CSV) and Census local area profiles 2001 (XLS).

Dataset schema (attributes): Please see the Census local area profiles 2001 attributes page.

Note: The 22 local areas are defined by census blocks, are equal to the City's 22 local planning areas, and include the Musqueam 2 reserve. Vancouver CSD (Census Subdivision) is defined by the City of Vancouver municipal boundary, which excludes the Musqueam 2 reserve but includes Stanley Park. Vancouver CMA (Census Metropolitan Area) is defined by the Metro Vancouver boundary, which includes the following census subdivisions: Vancouver, Surrey, Burnaby, Richmond, Coquitlam, District of Langley, Delta, District of North Vancouver, Maple Ridge, New Westminster, Port Coquitlam, City of North Vancouver, West Vancouver, Port Moody, City of Langley, White Rock, Pitt Meadows, Greater Vancouver A, Bowen Island, Capilano 5, Anmore, Musqueam 2, Burrard Inlet 3, Lions Bay, Tsawwassen, Belcarra, Mission 1, Matsqui 4, Katzie 1, Semiahmoo, Seymour Creek 2, McMillian Island 6, Coquitlam 1, Musqueam 4, Coquitlam 2, Katzie 2, Whonnock 1, Barnston Island 3, and Langley 5.

Data products identified as 20% sample data refer to information collected using the long census questionnaire. For the most part, these data were collected from 20% of households. However, they also include some areas, such as First Nations communities and remote areas, where long-form data were collected from 100% of households.

The following changes were made to the census family concept for 2001 and account for some of the increase in the total number of families, single-parent families and children living at home:
- Two persons living in a same-sex common-law relationship are now considered a family.
- Children living at home now include previously married children, provided they are not currently living with a spouse or common-law partner.
- A grandchild living in a three-generation household where the parent (middle generation) was never married is now considered a child of the census family.
- A grandchild in a three-generation household where the middle generation is not present is now considered a child of the census family.

Mode of transportation to work data are not reliable for the 2001 Census because of the TransLink transit strike that occurred during the data collection period.

Data currency: The data for Census 2001 were collected in May 2001.

Data accuracy: Statistics Canada is committed to protecting the privacy of all Canadians and the confidentiality of the data they provide to us. As part of this commitment, some population counts of geographic areas are adjusted to ensure confidentiality. Counts of the total population are rounded to a base of 5 for any dissemination block with a population of less than 15. Population counts for all standard geographic areas above the dissemination block level are derived by summing the adjusted dissemination block counts. The adjustment of dissemination block counts is controlled to ensure that the population counts for dissemination areas are always within 5 of the actual values. The adjustment has no impact on the population counts of census divisions and large census subdivisions.

Websites for further information: Statistics Canada; 2001 Census Dictionary; Local area boundary dataset.
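The block-level confidentiality adjustment can be illustrated with a short sketch. It is a simplification that assumes deterministic rounding to the nearest multiple of 5; the description does not say how Statistics Canada chooses the rounded value. It also does not implement the control that keeps dissemination-area counts within 5 of the actual values. The block counts in the example are made-up.

```python
def adjust_block_count(count, base=5, threshold=15):
    """Round a block's population to the nearest multiple of `base` if below `threshold`.

    Assumes a non-negative integer count and an odd base (so there are no ties).
    Blocks at or above the threshold are left unchanged.
    """
    if count < threshold:
        return (count + base // 2) // base * base
    return count


def higher_level_count(block_counts):
    """Higher-level counts are sums of the adjusted (not the actual) block counts."""
    return sum(adjust_block_count(c) for c in block_counts)


# Made-up blocks: actual total is 3 + 8 + 40 = 51.
print(higher_level_count([3, 8, 40]))  # 55
```

The example shows why a control step is needed. Summing independently rounded blocks can drift away from the true total as the number of small blocks grows.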
A dataset of 5 million city trees from 63 US cities: species, location, nativity status, health, and more.
data.niaid.nih.gov
search.dataone.org
zip
Updated Aug 31, 2022
Cite
Dakota McCoy; Benjamin Goulet-Scott; Weilin Meng; Bulent Atahan; Hana Kiros; Misako Nishino; John Kartesz (2022). A dataset of 5 million city trees from 63 US cities: species, location, nativity status, health, and more. [Dataset]. http://doi.org/10.5061/dryad.2jm63xsrf
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.2jm63xsrf
Dataset updated
Aug 31, 2022
Dataset provided by
The Biota of North America Program (BONAP)
Cornell University
Harvard University
Stanford University
Worcester Polytechnic Institute
Authors
Dakota McCoy; Benjamin Goulet-Scott; Weilin Meng; Bulent Atahan; Hana Kiros; Misako Nishino; John Kartesz
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
United States
Description
Sustainable cities depend on urban forests. City trees, a pillar of urban forests, improve our health, clean the air, store CO2, and cool local temperatures. Comparatively little is known about urban forests as ecosystems, particularly their spatial composition, nativity statuses, biodiversity, and tree health. Here, we assembled and standardized a new dataset of N=5,660,237 trees from 63 of the largest US cities. The data come from tree inventories conducted at the level of cities and/or neighborhoods. Each data sheet includes detailed information on tree location, species, nativity status (whether a tree species is naturally occurring or introduced), health, size, whether it is in a park or urban area, and more (28 standardized columns per datasheet). This dataset could be analyzed in combination with citizen-science datasets on bird, insect, or plant biodiversity; social and demographic data; or data on the physical environment. Urban forests offer a rare opportunity to intentionally design biodiverse, heterogeneous, rich ecosystems. Methods: See the eLife manuscript for full details. Below, we provide a summary of how the dataset was collected and processed.

Data acquisition: We limited our search to the 150 largest cities in the USA (by census population). To acquire raw data on street tree communities, we used a search protocol on both Google and Google Datasets Search (https://datasetsearch.research.google.com/). We first searched the city name plus each of the following: street trees, city trees, tree inventory, urban forest, and urban canopy (all combinations totaled 20 searches per city, 10 each in Google and Google Datasets Search). We then read the first page of Google results and the top 20 results from Google Datasets Search. If a city with the same name in a different state appeared in the results, we redid the 20 searches, adding the state name. If no data were found, we contacted a relevant state official via email or phone with an inquiry about their street tree inventory. Datasheets were received and converted to .csv format (if they were not already in that format). We received data on street trees from 64 cities. One city, El Paso, had data only in summary format and was therefore excluded from analyses.

Data cleaning: All code used is in the zipped folder Data S5 in the eLife publication. Before cleaning the data, we ensured that all reported trees for each city were located within the greater metropolitan area of the city (for certain inventories, many suburbs were reported, some within the greater metropolitan area and others not).

The cleaning proceeded in seven steps, with manual spot checks after each stage to identify any issues:
- First, we renamed all columns in the received .csv sheets, referring to the metadata and according to our standardized definitions (Table S4). To harmonize tree health and condition data across cities, we inspected metadata from the tree inventories and converted all numeric scores to a descriptive scale including “excellent”, “good”, “fair”, “poor”, “dead”, and “dead/dying”. Some cities included only three points on this scale (e.g., “good”, “poor”, “dead/dying”) while others included five (e.g., “excellent”, “good”, “fair”, “poor”, “dead”).
- Second, we used pandas in Python (W. McKinney & Others, 2011) to correct typos, non-ASCII characters, variable spellings, date formats, units (we converted all units to metric), address issues, and common-name formats. In some cases, units were not specified for tree diameter at breast height (DBH) and tree height; we determined the units based on typical sizes for trees of a particular species. Wherever diameter was reported, we assumed it was DBH. We standardized health and condition data across cities, preserving the highest granularity available for each city. For our analysis, we converted this variable to a binary (see section Condition and Health). We created a column called “location_type” to label whether a given tree was growing in the built environment or in green space. All of the changes we made, and decision points, are preserved in Data S9.
- Third, we checked the reported scientific names using gnr_resolve in the R library taxize (Chamberlain & Szöcs, 2013), with the option Best_match_only set to TRUE (Data S9). Through an iterative process, we manually checked the results and corrected typos in the scientific names until all names were either a perfect match (n=1771 species) or a partial match with a threshold greater than 0.75 (n=453 species). BGS manually reviewed all partial matches to ensure that they were the correct species name, and then we programmatically corrected these partial matches. For example, Magnolia grandifolia, which is not the name of a known tree species, was corrected to Magnolia grandiflora, and Pheonix canariensus was corrected to its proper spelling, Phoenix canariensis. Because many of these tree inventories were crowd-sourced or generated in part through citizen science, such typos and misspellings are to be expected.
- Fourth, because some tree inventories reported species by common names only, we converted common names to scientific names. We generated a lookup table by summarizing all pairings of common and scientific names in the inventories for which both were reported. We manually reviewed the common-to-scientific name pairings, confirming that all were correct. Then we programmatically assigned scientific names to all common names (Data S9).
- Fifth, we assigned native status to each tree by reference to the Biota of North America Project (Kartesz, 2018), which has collected data on all native and non-native species occurrences throughout the US states. Specifically, we determined whether each tree species in a given city was native to that state, not native to that state, or of undetermined nativity because of insufficient information (for cases where only the genus was known).
- Sixth, some cities reported only the street address but not latitude and longitude. For these cities, we used the OpenCageGeocoder (https://opencagedata.com/) to convert addresses to latitude and longitude coordinates (Data S9). OpenCageGeocoder leverages open data and is used by many academic institutions (see https://opencagedata.com/solutions/academia).
- Seventh, we trimmed each city dataset to include only the standardized columns identified in Table S4.
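The name-cleaning steps (checking scientific names against a reference and filling in scientific names from common names) can be sketched in Python. This sketch substitutes the standard library's `difflib.get_close_matches` for taxize's `gnr_resolve`. Its similarity score is therefore not the one the authors used, even though the 0.75 cutoff mirrors the threshold in the text. The reference list is a hypothetical placeholder. Resolving conflicting common-name pairings by majority vote is also an assumption, since the authors reviewed the pairings manually.

```python
from collections import Counter, defaultdict
from difflib import get_close_matches

# Hypothetical placeholder for an authoritative list of accepted species names.
REFERENCE_NAMES = ["Magnolia grandiflora", "Phoenix canariensis", "Quercus rubra"]


def resolve_scientific_name(raw, reference=REFERENCE_NAMES, cutoff=0.75):
    """Return (name, status), where status is "exact", "partial" or "unmatched".

    Partial matches use difflib similarity (not taxize scoring) and should be
    reviewed manually before being accepted.
    """
    name = " ".join(raw.split()).capitalize()  # collapse spaces; "Genus species" case
    if name in reference:
        return name, "exact"
    matches = get_close_matches(name, reference, n=1, cutoff=cutoff)
    if matches:
        return matches[0], "partial"
    return None, "unmatched"


def build_common_name_lookup(pairs):
    """Map each normalized common name to its most frequent scientific name.

    `pairs` are (common, scientific) tuples from inventories that report both.
    Majority vote is an assumption standing in for manual review.
    """
    counts = defaultdict(Counter)
    for common, scientific in pairs:
        counts[common.strip().lower()][scientific] += 1
    return {common: c.most_common(1)[0][0] for common, c in counts.items()}


print(resolve_scientific_name("magnolia  grandifolia"))
print(resolve_scientific_name("Pheonix canariensus"))
```

Returning a status alongside the name keeps partial matches separate from exact ones. They can then be queued for the kind of manual review described above instead of being accepted automatically.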