Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Combine, TX, as reported by the U.S. Census Bureau. It highlights how median household income varies with the size of the family unit, offering insight into economic trends and disparities across household sizes and aiding data analysis and decision-making.
Key observations
Chart: Combine, TX median household income, by household size (in 2022 inflation-adjusted dollars). Image: https://i.neilsberg.com/ch/combine-tx-median-household-income-by-household-size.jpeg
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Combine median household income. You can refer to it here.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Glycan arrays are indispensable for learning about the specificities of glycan-binding proteins. Despite the abundance of available data, the current analysis methods do not have the ability to interpret and use the variety of data types and to integrate information across datasets. Here, we evaluated whether a novel, automated algorithm for glycan-array analysis could meet that need. We developed a regression-tree algorithm with simultaneous motif optimization and packaged it in software called MotifFinder. We applied the software to analyze data from eight different glycan-array platforms with widely divergent characteristics and observed an accurate analysis of each dataset. We then evaluated the feasibility and value of the combined analyses of multiple datasets. In an integrated analysis of datasets covering multiple lectin concentrations, the software determined approximate binding constants for distinct motifs and identified major differences between the motifs that were not apparent from single-concentration analyses. Furthermore, an integrated analysis of data sources with complementary sets of glycans produced broader views of lectin specificity than produced by the analysis of just one data source. MotifFinder, therefore, enables the optimal use of the expanding resource of the glycan-array data and promises to advance the studies of protein–glycan interactions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Combine population distribution across 18 age groups. It lists the population in each age group along with each group's percentage of the total population of Combine. The dataset can be utilized to understand the population distribution of Combine by age. For example, using this dataset, we can identify the largest age group in Combine.
Key observations
The largest age group in Combine, TX was 5 to 9 years, with a population of 311 (11.19%), according to the ACS 2019-2023 5-Year Estimates. At the same time, the smallest age group in Combine, TX was 85 years and over, with a population of 6 (0.22%). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Combine Population by Age. You can refer to it here.
A broad and generalized selection of 2014-2018 US Census Bureau 2018 5-year American Community Survey population data estimates, obtained via the Census API and joined to the appropriate geometry (in this case, New Mexico Census tracts). The selection is not comprehensive, but allows a first-level characterization of total population, male and female, and both broad and narrowly-defined age groups. In addition to the standard selection of age-group breakdowns (by male or female), the dataset provides supplemental calculated fields which combine several attributes into one (for example, the total population of persons under 18, or the number of females over 65 years of age). The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users.

The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. The ACS combines population or housing data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. While the ACS contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website.

This dataset is organized by Census tract boundaries in New Mexico. Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, and were defined by local participants as part of the 2010 Census Participant Statistical Areas Program. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of census data and comparison back to previous decennial censuses. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. State and county boundaries always are census tract boundaries in the standard census geographic hierarchy. In a few rare instances, a census tract may consist of noncontiguous areas. These noncontiguous areas may occur where the census tracts are coextensive with all or parts of legal entities that are themselves noncontiguous.
For the 2010 Census, the census tract code range of 9400 through 9499 was enforced for census tracts that include a majority American Indian population according to Census 2000 data and/or their area was primarily covered by federally recognized American Indian reservations and/or off-reservation trust lands; the code range 9800 through 9899 was enforced for those census tracts that contained little or no population and represented a relatively large special land use area such as a National Park, military installation, or a business/industrial park; and the code range 9900 through 9998 was enforced for those census tracts that contained only water area, no land area.
Licence to use Copernicus Products: https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/licence-to-use-copernicus-products/licence-to-use-copernicus-products_b4b9451f54cffa16ecef5c912c9cebd6979925a956e3fa677976e0cf198c2c18.pdf
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. This catalogue entry provides post-processed ERA5 hourly single-level data aggregated to daily time steps. In addition to the data selection options found on the hourly page, the following options can be selected for the daily statistic calculation:
The daily aggregation statistic (daily mean, daily max, daily min, daily sum*)
The sub-daily frequency sampling of the original data (1 hour, 3 hours, 6 hours)
The option to shift to any local time zone in UTC (no shift means the statistic is computed from UTC+00:00)
*The daily sum is only available for the accumulated variables (see ERA5 documentation for more details). Users should be aware that the daily aggregation is calculated during the retrieval process and is not part of a permanently archived dataset. For more details on how the daily statistics are calculated, including demonstrative code, please see the documentation. For more details on the hourly data used to calculate the daily statistics, please refer to the ERA5 hourly single-level data catalogue entry and the documentation found therein.
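As a minimal sketch of retrieving these daily statistics through the CDS API, the request below is illustrative only: the dataset id, variable name, and request keys are assumptions based on the options described above, so check the documentation for the exact request schema.

```python
# Minimal sketch using the cdsapi client; the dataset id and request keys
# below are assumptions, not a verified request schema.
import cdsapi

client = cdsapi.Client()  # reads credentials from ~/.cdsapirc
client.retrieve(
    "derived-era5-single-levels-daily-statistics",  # assumed dataset id
    {
        "variable": "2m_temperature",     # assumed variable name
        "year": "2020",
        "month": "01",
        "day": ["01", "02", "03"],
        "daily_statistic": "daily_mean",  # daily mean/max/min/sum
        "frequency": "1_hourly",          # sub-daily sampling: 1, 3 or 6 hours
        "time_zone": "utc+00:00",         # optional local-time-zone shift
    },
    "era5_daily_mean_t2m.nc",
)
```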
This dataset is a compilation of address point data for the City of Tempe. The dataset contains a point location and the official address (as defined by the Building Safety Division of Community Development) for all occupiable units and any other official addresses in the City. There are several additional attributes that may be populated for an address, but they may not be populated for every address.
Contact: Lynn Flaaen-Hanna, Development Services Specialist
Contact E-mail Link
Map that Lets You Explore and Export Address Data
Data Source: The initial dataset was created by combining several datasets and then reviewing the information to remove duplicates and identify errors. This published dataset is the system of record for Tempe addresses going forward, with the address information being created and maintained by the Building Safety Division of Community Development.
Data Source Type: ESRI ArcGIS Enterprise Geodatabase
Preparation Method: N/A
Publish Frequency: Weekly
Publish Method: Automatic
Data Dictionary
A broad and generalized selection of 2013-2017 US Census Bureau 2017 5-year American Community Survey population data estimates, obtained via the Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection is not comprehensive, but allows a first-level characterization of total population, male and female, and both broad and narrowly-defined age groups. In addition to the standard selection of age-group breakdowns (by male or female), the dataset provides supplemental calculated fields which combine several attributes into one (for example, the total population of persons under 18, or the number of females over 65 years of age). The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users.

The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or housing data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries.
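Since the estimates were obtained via the Census API, a minimal sketch of the kind of request involved is shown below; the variable code and query are illustrative assumptions, not the exact queries used to build this dataset.

```python
# Illustrative ACS 5-year API request: total population for every New Mexico
# county (FIPS 35). B01001_001E is the total-population estimate variable.
import requests

url = "https://api.census.gov/data/2017/acs/acs5"
params = {
    "get": "NAME,B01001_001E",  # county name and total population
    "for": "county:*",          # all counties...
    "in": "state:35",           # ...in New Mexico
}
rows = requests.get(url, params=params, timeout=30).json()
header, data = rows[0], rows[1:]
print(header)   # ['NAME', 'B01001_001E', 'state', 'county']
print(data[0])  # one row per county
```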
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was constructed to compare the performance of various neural network architectures learning the flow maps of Hamiltonian systems. It was created for the paper: A Generalized Framework of Neural Networks for Hamiltonian Systems.
The dataset consists of trajectory data from three Hamiltonian systems: the single pendulum, the double pendulum, and the 3-body problem. The data was generated using numerical integrators. For the single pendulum, the symplectic Euler method with a step size of 0.01 was used. The data for the double pendulum was also computed with the symplectic Euler method, but with an adaptive step size. The trajectories of the 3-body problem were calculated by the arbitrarily high-precision code Brutus.
For each Hamiltonian system, there is one file containing the entire trajectory information (*_all_runs.h5.1). In these files, the states along all trajectories are recorded with a step size of 0.01. These files are composed of several Pandas DataFrames: one DataFrame per trajectory, called "run0", "run1", ..., and one large DataFrame combining all trajectories, called "all_runs". Additionally, each file contains a Pandas Series called "constants", in which several parameters of the data are listed.
Also, there is a second file per Hamiltonian system in which the data is prepared as features and labels ready for neural networks to be trained (*_training.h5.1). Similar to the first type of files, they contain a Series called "constants". The features and labels are then separated into 6 DataFrames called "features", "labels", "val_features", "val_labels", "test_features" and "test_labels". The data is split into 80% training data, 10% validation data and 10% test data.
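A minimal sketch for loading these files with pandas follows; the file names are placeholders following the patterns above, so substitute the actual *_all_runs.h5.1 / *_training.h5.1 paths.

```python
# Minimal loading sketch, assuming pandas HDF5 (PyTables) storage and
# placeholder file names; the keys follow the description above.
import pandas as pd

# Trajectory file: one DataFrame per run plus the combined "all_runs".
all_runs = pd.read_hdf("single_pendulum_all_runs.h5.1", key="all_runs")
run0 = pd.read_hdf("single_pendulum_all_runs.h5.1", key="run0")
constants = pd.read_hdf("single_pendulum_all_runs.h5.1", key="constants")

# Training file: features/labels pre-split 80/10/10.
features = pd.read_hdf("single_pendulum_training.h5.1", key="features")
labels = pd.read_hdf("single_pendulum_training.h5.1", key="labels")
print(all_runs.shape, features.shape, dict(constants))
```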
The code used to train various neural network architectures on this data can be found on GitHub at: https://github.com/AELITTEN/GHNN.
Already trained neural networks can be found on GitHub at: https://github.com/AELITTEN/NeuralNets_GHNN.
| | Single pendulum | Double pendulum | 3-body problem |
|---|---|---|---|
| Number of trajectories | 500 | 2000 | 5000 |
| Final time in all_runs | T (one period of the pendulum) | 10 | 10 |
| Final time in training data | 0.25*T | 5 | 5 |
| Step size in training data | 0.1 | 0.1 | 0.5 |
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BirdVox-scaper-10k: a synthetic dataset for multilabel species classification of flight calls from 10-second audio recordings
=============================================================================================
Version 1.0, September 2019.
Created By
-------------
Elizabeth Mendoza (1), Vincent Lostanlen (2, 3, 4), Justin Salamon (3, 4), Andrew Farnsworth (2), Steve Kelling (2), and Juan Pablo Bello (3, 4).
(1): Forest Hills High School, New York, NY, USA
(2): Cornell Lab of Ornithology, Cornell University, Ithaca, NY, USA
(3): Center for Urban Science and Progress, New York University, New York, NY, USA
(4): Music and Audio Research Lab, New York University, New York, NY, USA
Description
--------------
The BirdVox-scaper-10k dataset contains 9983 artificial soundscapes. Each soundscape lasts exactly ten seconds and contains one or several avian flight calls from up to 30 different species of New World warblers (Parulidae). Alongside each audio file, we include an annotation file describing the start time and end time of each flight call in the corresponding soundscape, as well as the species of warbler it belongs to.
In order to synthesize soundscapes in BirdVox-scaper-10k, we mixed natural sounds from various pre-recorded sources. First, we extracted isolated recordings of flight calls containing little or no background noise from the CLO-43SD dataset [1]. Secondly, we extracted 10-second "empty" acoustic scenes from the BirdVox-DCASE-20k dataset [2]. These acoustic scenes contain various sources of real-world background noise, including biophony (insects) and anthropophony (vehicles), yet are guaranteed to be devoid of any flight calls. Lastly, we "fill" each acoustic scene by mixing it with flight calls sampled at random.
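The description above does not name the synthesis tool, but the dataset's name points to the Scaper soundscape-synthesis library; the sketch below is a hypothetical illustration of how such a 10-second soundscape could be assembled, with all paths and parameter distributions assumed rather than taken from the dataset's actual generation settings.

```python
# Hypothetical synthesis sketch with the Scaper library; the foreground /
# background paths and the parameter distributions are assumptions.
import scaper

sc = scaper.Scaper(
    duration=10.0,
    fg_path="CLO-43SD_flight_calls/",    # assumed: isolated flight calls
    bg_path="BirdVox-DCASE-20k_empty/",  # assumed: call-free acoustic scenes
)
sc.ref_db = -50

# One call-free background scene drawn at random.
sc.add_background(label=("choose", []),
                  source_file=("choose", []),
                  source_time=("const", 0))

# One flight call sampled at random, roughly 150 ms long.
sc.add_event(label=("choose", []),
             source_file=("choose", []),
             source_time=("const", 0),
             event_time=("uniform", 0.0, 9.8),
             event_duration=("const", 0.15),
             snr=("uniform", 6, 30),
             pitch_shift=None,
             time_stretch=None)

# Writes the mixed audio plus a JAMS annotation of event times and labels.
sc.generate("soundscape_0001.wav", "soundscape_0001.jams")
```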
Although the BirdVox-scaper-10k does not consist of natural recordings, we have taken several measures to ensure the plausibility of each synthesized soundscape, from both qualitative and quantitative standpoints.
The BirdVox-scaper-10k dataset can be used, among other things, for the research, development, and testing of bioacoustic classification models.
For details on the hardware of ROBIN recording units, we refer the reader to [2].
[1] J. Salamon, J. Bello. Fusing shallow and deep learning for bioacoustic bird species classification. Proc. IEEE ICASSP, 2017.
[2] V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, and J. Bello. BirdVox-full-night: a dataset and benchmark for avian flight call detection. Proc. IEEE ICASSP, 2018.
[3] J. Salamon, J. P. Bello, A. Farnsworth, M. Robbins, S. Keen, H. Klinck, and S. Kelling. Towards the Automatic Classification of Avian Flight Calls for Bioacoustic Monitoring. PLoS One, 2016.
@inproceedings{lostanlen2018icassp,
title = {BirdVox-full-night: a dataset and benchmark for avian flight call detection},
author = {Lostanlen, Vincent and Salamon, Justin and Farnsworth, Andrew and Kelling, Steve and Bello, Juan Pablo},
booktitle = {Proc. IEEE ICASSP},
year = {2018},
publisher = {IEEE},
address = {Calgary, Canada},
month = {April},
}
Coastal habitats are utilized and altered for a suite of human uses. Habitat modification is here defined as the alteration or removal of geomorphic structure as a result of human use. This includes several habitat-modifying features like seawalls, piers, breakwaters, dredged areas, artificial land (i.e., filled wetlands), and offshore structures. This data layer represents the presence of habitat modification in shallow waters of the Main Hawaiian Islands. The Ocean Tipping Points (OTP) project mapped the presence of habitat-modifying features by combining several existing datasets derived primarily from satellite and aerial imagery, including the following datasets: benthic habitat maps (NOAA Center for Coastal Monitoring and Assessment (CCMA), 2007); NOAA Environmental Sensitivity Index (ESI) line data (NOAA Office of Response and Restoration (OR&R), 2001); maintained channels (NOAA, US Army Corps of Engineers (USACE), MarineCadastre.gov); and locations of offshore aquaculture. The layer represents the presence or absence of habitat modification, with a cell size of 500 m. Relevant man-made features were extracted from each individual dataset and saved (features classified as artificial and dredged areas in NOAA benthic habitat maps; coastal segments designated as man-made structures and riprap in NOAA ESI line data; all features from the maintained channels and aquaculture datasets). The resulting polygon datasets were merged together. A field was added to all vector layers with a value of 1 for each feature to represent the presence of habitat modification. Vector data were then converted to 500-m rasters and combined into a mosaic.
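As a rough illustration of the merge-and-rasterize workflow described above, the sketch below uses geopandas and rasterio; the input file names, CRS, and grid extent are placeholder assumptions, not the OTP project's actual inputs.

```python
# Illustrative vector-merge and 500 m rasterization sketch; paths, CRS,
# and grid extent are placeholder assumptions.
import geopandas as gpd
import pandas as pd
import rasterio
from rasterio import features
from rasterio.transform import from_origin

# Man-made features already extracted from each source dataset.
layers = [gpd.read_file(p) for p in
          ["benthic_artificial.shp", "esi_manmade.shp",
           "maintained_channels.shp", "aquaculture.shp"]]
merged = gpd.GeoDataFrame(pd.concat(layers, ignore_index=True),
                          crs=layers[0].crs)
merged["presence"] = 1  # 1 = habitat modification present

# 500 m grid over an assumed projected extent.
cell = 500.0
width, height = 1000, 1000
transform = from_origin(0.0, 500_000.0, cell, cell)

raster = features.rasterize(
    ((geom, 1) for geom in merged.geometry),
    out_shape=(height, width), transform=transform, fill=0, dtype="uint8")

with rasterio.open("habitat_modification.tif", "w", driver="GTiff",
                   height=height, width=width, count=1, dtype="uint8",
                   crs=merged.crs, transform=transform) as dst:
    dst.write(raster, 1)
```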
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Combined Locks by race. It includes the population of Combined Locks across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Combined Locks across relevant racial categories.
Key observations
The percent distribution of Combined Locks population by race (across all racial categories recognized by the U.S. Census Bureau): 87.67% are white, 4.51% are Black or African American, 3.85% are some other race and 3.96% are multiracial.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Combined Locks Population by Race & Ethnicity. You can refer to it here.
This dataset is a modified version of the FWS-developed data depicting “Highly Important Landscapes”, as outlined in Memorandum FWS/AES/058711 and provided to the Wildlife Habitat Spatial Analysis Lab on October 29th, 2014. Other names and acronyms used to refer to this dataset have included: Areas of Significance (AoSs, the name of the GIS data set provided by FWS), Strongholds (FWS), and Sagebrush Focal Areas (SFAs, BLM). The BLM will refer to these data as Sagebrush Focal Areas (SFAs).

Data were provided as a series of ArcGIS map packages which, when extracted, contained several datasets each. Based on the recommendation of the FWS Geographer/Ecologist (email communication, see data originator for contact information), the dataset called “Outiline_AreasofSignificance” was utilized as the source for subsequent analysis and refinement. Metadata was not provided by the FWS for this dataset. For detailed information regarding the dataset’s creation, refer to Memorandum FWS/AES/058711 or contact the FWS directly.

Several operations and modifications were made to this source data, as outlined in the “Description” and “Process Step” sections of this metadata file. Generally: the source data was named by the Wildlife Habitat Spatial Analysis Lab to identify polygons as described (but not identified in the GIS) in the FWS memorandum. The Nevada/California EIS modified portions within their decision space in concert with local FWS personnel and provided the modified data back to the Wildlife Habitat Spatial Analysis Lab. Gaps around Nevada State borders, introduced by the NVCA edits, were then closed, as was a large gap between the southern Idaho and southeast Oregon areas present in the original dataset. Features with an area below 40 acres were then identified and, based on FWS guidance, either removed or retained. Finally, guidance from BLM WO resulted in the removal of additional areas, primarily non-habitat with BLM surface or subsurface management authority. Data were then provided to each EIS for use in FEIS development. Based on guidance from WO, SFAs were to be limited to BLM decision space (surface/sub-surface management areas) within PHMA. Each EIS was asked to provide the limited SFA dataset back to the National Operations Center to ensure consistent representation and analysis. Returned SFA data, modified by each individual EIS, was then consolidated at the BLM’s National Operations Center, retaining the three standardized fields contained in this dataset.

Several modifications from the original FWS dataset have been made. Below is a summary of each modification.

1. The data as received from FWS: 16,514,163 acres & 1 record.

2. Edited to name SFAs by the Wildlife Habitat Spatial Analysis Lab: upon receipt of the “Outiline_AreasofSignificance” dataset from the FWS, a copy was made and the one existing, unnamed record was exploded in an edit session within ArcMap. A text field, “AoS_Name”, was added. Using the maps provided with Memorandum FWS/AES/058711, polygons were manually selected and the “AoS_Name” field was calculated to match the names as illustrated. Once all polygons in the exploded dataset were appropriately named, the dataset was dissolved, resulting in one record representing each of the seven SFAs identified in the memorandum.

3. The NVCA EIS made modifications in concert with local FWS staff. Metadata and detailed change descriptions were not returned with the modified data. Contact Leisa Wesch, GIS Specialist, BLM Nevada State Office, 775-861-6421, lwesch@blm.gov, for details.

4. Once the data was returned to the Wildlife Habitat Spatial Analysis Lab from the NVCA EIS, gaps surrounding the State of NV were closed. These gaps were introduced by the NVCA edits, exacerbated by them, or existed in the data as provided by the FWS. The gap closing was performed in an edit session by either extending each polygon towards the other or by creating a new polygon covering the gap and merging it with the existing features. In addition to the gaps around state boundaries, a large area between the S. Idaho and S.E. Oregon SFAs was filled in. To accomplish this, ADPP habitat (current as of January 2015) and BLM GSSP SMA data were used to create a new polygon representing PHMA and BLM management that connected the two existing SFAs.

5. In an effort to simplify the FWS dataset, features whose areas were less than 40 acres were identified and FWS was consulted for guidance on possible removal. To do so, features from step 4 above were exploded once again in an ArcMap edit session. Features whose areas were less than forty acres were selected and exported (770 total features). This dataset was provided to the FWS and then returned with specific guidance on inclusion/exclusion via email by Lara Juliusson (lara_juliusson@fws.gov). The specific guidance was:
a. Remove all features whose area is less than 10 acres.
b. Remove features identified as slivers (the thinness ratio was calculated and slivers identified by Lara Juliusson according to https://tereshenkov.wordpress.com/2014/04/08/fighting-sliver-polygons-in-arcgis-thinness-ratio/) and whose area was less than 20 acres.
c. Remove features with areas less than 20 acres NOT identified as slivers and NOT adjacent to other features.
d. Keep the remainder of features identified as less than 40 acres.
To accomplish “a” and “b” above, a simple selection was applied to the dataset representing features less than 40 acres. The select by location tool was used, set to select identical, to select these features from the dataset created in step 4 above. The records count was confirmed as matching between the two data sets and then these features were deleted. To accomplish “c” above, a field (“AdjacentSH”, added by FWS but not calculated) was calculated to identify features touching or intersecting other features. A series of selections was used: first to select records

6. Based on direction from the BLM Washington Office, the portion of the Upper Missouri River Breaks National Monument (UMRBNM) that was included in the FWS SFA dataset was removed. The BLM NOC GSSP NLCS dataset was used to erase these areas from the result of step 5 above. Resulting sliver polygons were also removed and geometry was repaired.

7. In addition to removing UMRBNM, the BLM Washington Office also directed the removal of non-ADPP habitat within the SFAs, on BLM-managed lands, falling outside of Designated Wilderness & Wilderness Study Areas. An exception was the retention of the Donkey Hills ACEC and adjacent BLM lands. The BLM NOC GSSP NLCS datasets were used in conjunction with a dataset containing all ADPP habitat, BLM SMA, and BLM sub-surface management unioned into one file to identify and delete these areas.

8. The resulting dataset, after steps 2 – 8 above were completed, was dissolved to the SFA name field, yielding this feature class with one record per SFA area.

9. Data were provided to each EIS for use in FEIS allocation decision data development.

10. Data were subset to BLM decision space (surface/sub-surface) within PHMA by each EIS and returned to the NOC.

11. Due to variations in field names and values, three standardized fields were created and calculated by the NOC:
a. SFA Name: the name of the SFA.
b. Subsurface: binary “Yes” or “No” to indicate federal subsurface estate.
c. SMA: represents BLM, USFS, other federal, and non-federal surface management.

12. The consolidated data (with standardized field names and values) were dissolved on the three fields illustrated above and geometry was repaired, resulting in this dataset.
General overview
The following datasets are described by this metadata record, and are available for download from the provided URL.
####
Physical parameters raw log files
Raw log files:
1) DATE
2) Time (UTC+11)
3) PROG = automated program to control sensors and collect data
4) BAT = amount of battery remaining
5) STEP = check Aquation manual
6) SPIES = check Aquation manual
7) PAR = photoactive radiation
8) Levels = check Aquation manual
9) Pumps = program for pumps
10) WQM = check Aquation manual
####
Respiration/PAM chamber raw excel spreadsheets
Abbreviations in headers of datasets. Note: two data sets are provided in different formats, raw and cleaned (adj). These are the same data, with the PAR column moved over to PAR.all for analysis. All headers are the same. The cleaned (adj) dataframe will work with the R syntax below; alternatively, add code to do the cleaning in R.
Date: ISO 1986 - check
Time: UTC+11 unless otherwise stated
DATETIME: UTC+11 unless otherwise stated
ID (of instrument in respiration chambers):
ID43 = pulse amplitude fluorescence measurement of control
ID44 = pulse amplitude fluorescence measurement of acidified chamber
ID=1 dissolved oxygen
ID=2 dissolved oxygen
ID3 = PAR
ID4 = PAR
PAR = photoactive radiation (umols)
F0 = minimal fluorescence from PAM
Fm = maximum fluorescence from PAM
Yield = (F0 – Fm)/Fm
rChl = an estimate of chlorophyll (note this is uncalibrated and is an estimate only)
Temp = temperature, degrees C
PAR = photoactive radiation
PAR2 = photoactive radiation 2
DO = dissolved oxygen
%Sat = saturation of dissolved oxygen
Notes = the program of the underwater submersible logger, with the following abbreviations:
Notes-1) PAM=
Notes-2) PAM = gain level set (see Aquation manual for more detail)
Notes-3) Acclimatisation = program of slowly introducing treatment water into chamber
Notes-4) Shutter start up 2 sensors+sample… = Shutter PAM's automatic set-up procedure (see Aquation manual)
Notes-5) Yield step 2 = PAM yield measurement and calculation of control
Notes-6) Yield step 5 = PAM yield measurement and calculation of acidified
Notes-7) Abatus respiration DO and PAR step 1 = program to measure dissolved oxygen and PAR (see Aquation manual); steps 1-4 are different stages of this program, including pump cycles and DO and PAR measurements.
8) Rapid light curve data:
Pre LC: a yield measurement prior to the following measurement
After 10.0 sec at 0.5% to 8%: level of each of the 8 steps of the rapid light curve
Odessey PAR (only in some deployments): an extra measure of PAR (umols) using an Odessey data logger
Dataflow PAR: an extra measure of PAR (umols) using a Dataflow sensor
PAM PAR: copied from the PAR or PAR2 column
PAR all: the complete PAR file, which should be used
Deployment: identifies which deployment the data came from
####
Respiration chamber biomass data
The data is chlorophyll a biomass from cores taken from the respiration chambers. The headers are: Depth (mm); Treat (acidified or control); Chl a (pigment and indicator of biomass); Core (5 cores were collected from each chamber, three of which were analysed for chl a). These are pseudoreplicates/subsamples from the chambers and should not be treated as replicates.
####
Associated R script file for pump cycles of respiration chambers
Associated respiration chamber data to determine the times when respiration chamber pumps delivered treatment water to the chambers. Determined from Aquation log files (see associated files). Use the chamber cut times to determine net production rates. Note: users need to avoid the times when the respiration chambers are delivering water, as this will give incorrect results. The headers used in the attached/associated R file are "start regression" and "end regression". The remaining headers are not used unless called for in the associated R script. The last columns of these datasets (intercept, ElapsedTimeMincoef) are determined from the linear regressions described below.
To determine the rate of change of net production, coefficients of the regression of oxygen consumption in discrete 180-minute data blocks were determined. R-squared values for fitted regressions of these coefficients were consistently high (greater than 0.9). We make two assumptions in the calculation of net production rates: first, that heterotrophic community members do not change their metabolism under OA; and second, that the heterotrophic communities are similar between treatments.
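A minimal sketch of this block-wise regression is shown below; the file and column names are assumptions (the study's actual analysis lives in the associated R script), and only the DATETIME and DO headers are taken from the data description above.

```python
# Illustrative block-wise regression sketch; the input file name is assumed,
# while DATETIME and DO follow the header descriptions above.
import numpy as np
import pandas as pd

df = pd.read_csv("chamber_DO.csv", parse_dates=["DATETIME"])  # assumed file
df["elapsed_min"] = (df["DATETIME"] - df["DATETIME"].iloc[0]).dt.total_seconds() / 60

# Assign each sample to a discrete 180-minute block and fit DO ~ time per block.
df["block"] = (df["elapsed_min"] // 180).astype(int)
rates = []
for block, g in df.groupby("block"):
    slope, intercept = np.polyfit(g["elapsed_min"], g["DO"], deg=1)
    # R^2 of the linear fit, to check the >0.9 criterion noted above.
    pred = slope * g["elapsed_min"] + intercept
    ss_res = ((g["DO"] - pred) ** 2).sum()
    ss_tot = ((g["DO"] - g["DO"].mean()) ** 2).sum()
    rates.append({"block": block, "slope": slope,
                  "intercept": intercept, "r2": 1 - ss_res / ss_tot})
print(pd.DataFrame(rates))
```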
####
Combined dataset pH, temperature, oxygen, salinity, velocity for experiment
This data is rapid light curve data generated from a Shutter PAM fluorimeter. There are eight steps in each rapid light curve. Note: The software component of the Shutter PAM fluorimeter for sensor 44 appeared to be damaged and would not cycle through the PAR cycles. Therefore the rapid light curves and recovery curves should only be used for the control chambers (sensor ID43).
The headers are:
PAR: photoactive radiation
relETR: F0/Fm x PAR
Notes: stage/step of light curve
Treatment: acidified or control
The associated light treatments in each stage are listed below. Each actinic light intensity is held for 10 seconds, then a saturating pulse is taken (see PAM methods).
After 10.0 sec at 0.5% = 1 umols PAR
After 10.0 sec at 0.7% = 1 umols PAR
After 10.0 sec at 1.1% = 0.96 umols PAR
After 10.0 sec at 1.6% = 4.32 umols PAR
After 10.0 sec at 2.4% = 4.32 umols PAR
After 10.0 sec at 3.6% = 8.31 umols PAR
After 10.0 sec at 5.3% = 15.78 umols PAR
After 10.0 sec at 8.0% = 25.75 umols PAR
Note: this dataset appears to be missing data; the D5 rows potentially do not contain usable information.
See the word document in the download file for more information.
This task focuses on sound event detection in a few-shot learning setting for animal (mammal and bird) vocalisations. Participants will be expected to create a method that can extract information from five exemplar vocalisations (shots) of mammals or birds and detect and classify sounds in field recordings.
For more info please refer to the official website: https://dcase.community/challenge2023/task-few-shot-bioacoustic-event-detection
Few-shot learning is a highly promising paradigm for sound event detection. It is also an extremely good fit to the needs of users in bioacoustics, in which increasingly large acoustic datasets commonly need to be labelled for events of an identified category (e.g. species or call-type), even though this category might not be known in other datasets or have any yet-known label. While satisfying user needs, this will also benchmark few-shot learning for the wider domain of sound event detection (SED).
Few-shot learning describes tasks in which an algorithm must make predictions given only a few instances of each class, contrary to the standard supervised learning paradigm. The main objective is to find reliable algorithms that are capable of dealing with data sparsity, class imbalance, and noisy/busy environments. Few-shot learning is usually studied using N-way-K-shot classification, where N denotes the number of classes and K the number of examples for each class.
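As a generic illustration of N-way-K-shot episode construction (not the task's evaluation protocol, which instead provides five shots per validation file), a minimal sketch:

```python
# Generic N-way-K-shot episode sampler; illustrative only.
import random
from collections import defaultdict

def sample_episode(labelled_events, n_way=5, k_shot=5, n_query=10):
    """labelled_events: list of (event, class_label) pairs."""
    by_class = defaultdict(list)
    for event, label in labelled_events:
        by_class[label].append(event)
    # Pick N classes that have enough examples for support + query.
    classes = random.sample([c for c, ev in by_class.items()
                             if len(ev) >= k_shot + n_query], n_way)
    support, query = [], []
    for c in classes:
        events = random.sample(by_class[c], k_shot + n_query)
        support += [(e, c) for e in events[:k_shot]]   # K shots per class
        query += [(e, c) for e in events[k_shot:]]     # held-out queries
    return support, query  # adapt on support, evaluate on query
```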
Some reasons why few-shot learning has been of increasing interest:
Scarcity of supervised data can lead to unreliable generalisations of machine learning models.
Explicitly labeling a huge dataset can be costly both in time and resources.
Fixed ontologies or class labels used in SED and other DCASE tasks are often a poor fit to a given user's goal.

Development Set

The development set is pre-split into training and validation sets. The training set consists of five sub-folders, each deriving from a different source. Along with the audio files, multi-class annotations are provided for each. The validation set consists of two sub-folders, each deriving from a different source, with a single-class (class of interest) annotation file provided for each audio file.
The training set contains five different sub-folders (BV, HV, JD, MT, WMW). Statistics are given overall and for each sub-folder.
| Overall Statistics | Values |
|---|---|
| Number of audio recordings | 174 |
| Total duration | 21 hours |
| Total classes (excl. UNK) | 47 |
| Total events (excl. UNK) | 14229 |
The BirdVox-DCASE-10h (BV for short) contains five audio files from four different autonomous recording units, each lasting two hours. These autonomous recording units are all located in Tompkins County, New York, United States. Furthermore, they follow the same hardware specification: the Recording and Observing Bird Identification Node (ROBIN) developed by the Cornell Lab of Ornithology. Andrew Farnsworth, an expert ornithologist, has annotated these recordings for the presence of flight calls from migratory passerines, namely: American sparrows, cardinals, thrushes, and warblers. In total, the annotator found 2,662 flight calls from 11 different species. We estimate these flight calls to have a duration of 150 milliseconds and a fundamental frequency between 2 kHz and 10 kHz.
| Statistics | Values |
|---|---|
| Number of audio recordings | 5 |
| Total duration | 10 hours |
| Total classes (excl. UNK) | 11 |
| Total events (excl. UNK) | 9026 |
| Ratio event/duration | 0.04 |
| Sampling rate | 24,000 Hz |
Spotted hyenas are a highly social species that live in "fission-fusion" groups where group members range alone or in smaller subgroups that split and merge over time. Hyenas use a variety of types of vocalizations to coordinate with one another over both short and long distances. Spotted hyena vocalization data were recorded on custom-developed audio tags designed by Mark Johnson and integrated into combined GPS / acoustic collars (Followit Sweden AB) by Frants Jensen and Mark Johnson. Collars were deployed on female hyenas of the Talek West hyena clan at the MSU-Mara Hyena Project (directed by Kay Holekamp) in the Masai Mara, Kenya as part of a multi-species study on communication and collective behavior. Field work was carried out by Kay Holekamp, Andrew Gersick, Frants Jensen, Ariana Strandburg-Peshkin, and Benson Pion; labeling was done by Kenna Lehmann and colleagues.
| Statistics | Values |
|---|---|
| Number of audio recordings | 5 |
| Total duration | 5 hours |
| Total classes (excl. UNK) | 3 |
| Total events (excl. UNK) | 611 |
| Ratio events/duration | 0.05 |
| Sampling rate | 6000 Hz |
Jackdaws are corvid songbirds which usually breed, forage and sleep in large groups, but form a pair bond with the same partner for life. They produce thousands of vocalisations per day, but many aspects of their vocal behaviour remained unexplored due to the difficulty in recording and assigning vocalisations to specific individuals, especia...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A collection of datasets and Python scripts for extraction and analysis of isograms (and some palindromes and tautonyms) from corpus-based word-lists, specifically Google Ngram and the British National Corpus (BNC). Below follows a brief description, first, of the included datasets and, second, of the included scripts.

1. Datasets

The data from English Google Ngrams and the BNC is available in two formats: as a plain text CSV file and as a SQLite3 database.

1.1 CSV format

The CSV files for each dataset actually come in two parts: one labelled ".csv" and one ".totals". The ".csv" file contains the actual extracted data, and the ".totals" file contains some basic summary statistics about the ".csv" dataset with the same name.

The CSV files contain one row per data point, with the columns separated by a single tab stop. There are no labels at the top of the files. Each line has the following columns, in this order (the labels below are what I use in the database, which has an identical structure; see the section below):
| Label | Data type | Description |
|---|---|---|
| isogramy | int | The order of isogramy, e.g. "2" is a second-order isogram |
| length | int | The length of the word in letters |
| word | text | The actual word/isogram in ASCII |
| source_pos | text | The Part of Speech tag from the original corpus |
| count | int | Token count (total number of occurrences) |
| vol_count | int | Volume count (number of different sources which contain the word) |
| count_per_million | int | Token count per million words |
| vol_count_as_percent | int | Volume count as percentage of the total number of volumes |
| is_palindrome | bool | Whether the word is a palindrome (1) or not (0) |
| is_tautonym | bool | Whether the word is a tautonym (1) or not (0) |
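A minimal sketch for loading one of the ".csv" files with pandas, using the tab separator, missing header row, and column order documented above (the file name matches the naming convention described in section 2.4):

```python
# Load a tab-separated, header-less isogram CSV with the documented columns.
import pandas as pd

columns = ["isogramy", "length", "word", "source_pos", "count",
           "vol_count", "count_per_million", "vol_count_as_percent",
           "is_palindrome", "is_tautonym"]
df = pd.read_csv("ngrams-isograms.csv", sep="\t", header=None, names=columns)

# Example: the most frequent palindromic isograms.
print(df[df["is_palindrome"] == 1].sort_values("count", ascending=False).head())
```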
The ".totals" files have a slightly different format, with one row per data point, where the first column is the label and the second column is the associated value. The ".totals" files contain the following data:
| Label | Data type | Description |
|---|---|---|
| !total_1grams | int | The total number of words in the corpus |
| !total_volumes | int | The total number of volumes (individual sources) in the corpus |
| !total_isograms | int | The total number of isograms found in the corpus (before compacting) |
| !total_palindromes | int | How many of the isograms found are palindromes |
| !total_tautonyms | int | How many of the isograms found are tautonyms |
The CSV files are mainly useful for further automated data processing. For working with the data set directly (e.g. to do statistics or cross-check entries), I would recommend using the database format described below.

1.2 SQLite database format

On the other hand, the SQLite database combines the data from all four of the plain text files, and adds various useful combinations of the two datasets, namely:
• Compacted versions of each dataset, where identical headwords are combined into a single entry.
• A combined compacted dataset, combining and compacting the data from both Ngrams and the BNC.
• An intersected dataset, which contains only those words which are found in both the Ngrams and the BNC dataset.
The intersected dataset is by far the least noisy, but is missing some real isograms, too. The columns/layout of each of the tables in the database is identical to that described for the CSV/.totals files above. To get an idea of the various ways the database can be queried for various bits of data, see the R script described below, which computes statistics based on the SQLite database.

2. Scripts

There are three scripts: one for tidying Ngram and BNC word lists and extracting isograms, one to create a neat SQLite database from the output, and one to compute some basic statistics from the data. The first script can be run using Python 3, the second script can be run using SQLite 3 from the command line, and the third script can be run in R/RStudio (R version 3).

2.1 Source data

The scripts were written to work with word lists from Google Ngram and the BNC, which can be obtained from http://storage.googleapis.com/books/ngrams/books/datasetsv2.html and https://www.kilgarriff.co.uk/bnc-readme.html (download all.al.gz). For Ngram the script expects the path to the directory containing the various files, for BNC the direct path to the *.gz file.

2.2 Data preparation

Before processing proper, the word lists need to be tidied to exclude superfluous material and some of the most obvious noise. This will also bring them into a uniform format. Tidying and reformatting can be done by running one of the following commands:

python isograms.py --ngrams --indir=INDIR --outfile=OUTFILE
python isograms.py --bnc --indir=INFILE --outfile=OUTFILE

Replace INDIR/INFILE with the input directory or filename and OUTFILE with the filename for the tidied and reformatted output.

2.3 Isogram extraction

After preparing the data as above, isograms can be extracted by running the following command on the reformatted and tidied files:

python isograms.py --batch --infile=INFILE --outfile=OUTFILE

Here INFILE should refer to the output from the previous data cleaning process. Please note that the script will actually write two output files: one named OUTFILE with a word list of all the isograms and their associated frequency data, and one named "OUTFILE.totals" with very basic summary statistics.

2.4 Creating a SQLite3 database

The output data from the above step can be easily collated into a SQLite3 database which allows for easy querying of the data directly for specific properties. The database can be created by following these steps:
1. Make sure the files with the Ngrams and BNC data are named “ngrams-isograms.csv” and “bnc-isograms.csv” respectively. (The script assumes you have both of them; if you only want to load one, just create an empty file for the other one.)
2. Copy the “create-database.sql” script into the same directory as the two data files.
3. On the command line, go to the directory where the files and the SQL script are.
4. Type: sqlite3 isograms.db
5. This will create a database called “isograms.db”.
See section 1 for a basic description of the output data and how to work with the database.

2.5 Statistical processing

The repository includes an R script (R version 3) named “statistics.r” that computes a number of statistics about the distribution of isograms by length, frequency, contextual diversity, etc. This can be used as a starting point for running your own stats. It uses RSQLite to access the SQLite database version of the data described above.
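The database can also be queried directly from Python with the standard library; a minimal sketch follows, in which the table name is an assumption (inspect create-database.sql or run `.tables` in the sqlite3 shell for the actual names), while the column names follow the schema documented in section 1.1.

```python
# Query the isograms database; the table name below is an assumption,
# so check create-database.sql for the actual table names.
import sqlite3

con = sqlite3.connect("isograms.db")
rows = con.execute(
    """SELECT word, length, count
       FROM ngrams_isograms          -- assumed table name
       WHERE is_palindrome = 1 AND length >= 5
       ORDER BY count DESC LIMIT 10"""
).fetchall()
for word, length, count in rows:
    print(f"{word:<15} len={length:<3} count={count}")
con.close()
```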
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
EyeFi Dataset
This dataset is collected as a part of the EyeFi project at the Bosch Research and Technology Center, Pittsburgh, PA, USA. The dataset contains WiFi CSI values of human motion trajectories along with ground truth location information captured through a camera. This dataset is used in the paper "EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching", published in the IEEE International Conference on Distributed Computing in Sensor Systems 2020 (DCOSS '20). We also published a dataset paper titled "Dataset: Person Tracking and Identification using Cameras and Wi-Fi Channel State Information (CSI) from Smartphones" in the Data: Acquisition to Analysis 2020 (DATA '20) workshop, describing the details of data collection. Please check it out for more information on the dataset.
Data Collection Setup
In our experiments, we used an Intel 5300 WiFi Network Interface Card (NIC) installed in an Intel NUC, together with the Linux CSI tools [1], to extract the WiFi CSI packets. The (x,y) coordinates of the subjects are collected from a Bosch Flexidome IP Panoramic 7000 panoramic camera mounted on the ceiling, and Angles of Arrival (AoAs) are derived from the (x,y) coordinates. Both the WiFi card and camera are located at the same origin coordinates but at different heights: the camera is located around 2.85m from the ground and the WiFi antennas are around 1.12m above the ground.
The data collection environment consists of two areas: the first is a rectangular space measuring 11.8m x 8.74m, and the second is an irregularly shaped kitchen area with maximum distances of 19.74m and 14.24m between two walls. The kitchen also has numerous obstacles and different materials that pose different RF reflection characteristics, including strong reflectors such as metal refrigerators and dishwashers.
To collect the WiFi data, we used a Google Pixel 2 XL smartphone as an access point and connected the Intel 5300 NIC to it for WiFi communication. The transmission rate is about 20-25 packets per second. The same WiFi card and phone are used in both the lab and kitchen areas.
List of Files
Here is a list of files included in the dataset:
|- 1_person
   |- 1_person_1.h5
   |- 1_person_2.h5
|- 2_people
   |- 2_people_1.h5
   |- 2_people_2.h5
   |- 2_people_3.h5
|- 3_people
   |- 3_people_1.h5
   |- 3_people_2.h5
   |- 3_people_3.h5
|- 5_people
   |- 5_people_1.h5
   |- 5_people_2.h5
   |- 5_people_3.h5
   |- 5_people_4.h5
|- 10_people
   |- 10_people_1.h5
   |- 10_people_2.h5
   |- 10_people_3.h5
|- Kitchen
   |- 1_person
      |- kitchen_1_person_1.h5
      |- kitchen_1_person_2.h5
      |- kitchen_1_person_3.h5
   |- 3_people
      |- kitchen_3_people_1.h5
|- training
   |- shuffuled_train.h5
   |- shuffuled_valid.h5
   |- shuffuled_test.h5
|- View-Dataset-Example.ipynb
|- README.md
In this dataset, the folders `1_person/`, `2_people/`, `3_people/`, `5_people/`, and `10_people/` contain data collected from the lab area, whereas the `Kitchen/` folder contains data collected from the kitchen area. To see how each file is structured, please see the section "Access the data" below.
The training folder contains the training dataset we used to train the neural network discussed in our paper. It is generated by shuffling all the data from the `1_person/` folder collected in the lab area (`1_person_1.h5` and `1_person_2.h5`).
Why multiple files in one folder?
Each folder contains multiple files. For example, the `1_person` folder has two files: `1_person_1.h5` and `1_person_2.h5`. Files in the same folder always have the same number of human subjects present simultaneously in the scene. However, the person who is holding the phone can be different. Also, the data could be collected on different days, and/or the data collection system may need to be rebooted due to stability issues. As a result, we provide different files (like `1_person_1.h5`, `1_person_2.h5`) to distinguish between different people holding the phone and possible system reboots that introduce different phase offsets (see below) in the system.
Special note:
`1_person_1.h5` is generated by the same person holding the phone throughout, while `1_person_2.h5` contains different people holding the phone, with only one person present in the area at a time. Both files were also collected on different days.
Access the data
To access the data, the hdf5 library is needed to open the dataset. There is a free HDF5 viewer available on the official website: https://www.hdfgroup.org/downloads/hdfview/. We also provide example Python code, View-Dataset-Example.ipynb, to demonstrate how to access the data.
Each file is structured as follows (except the files under the *"training/"* folder):
|- csi_imag
|- csi_real
|- nPaths_1
   |- offset_00
      |- spotfi_aoa
   |- offset_11
      |- spotfi_aoa
   |- offset_12
      |- spotfi_aoa
   |- offset_21
      |- spotfi_aoa
   |- offset_22
      |- spotfi_aoa
|- nPaths_2
   |- offset_00
      |- spotfi_aoa
   |- offset_11
      |- spotfi_aoa
   |- offset_12
      |- spotfi_aoa
   |- offset_21
      |- spotfi_aoa
   |- offset_22
      |- spotfi_aoa
|- nPaths_3
   |- offset_00
      |- spotfi_aoa
   |- offset_11
      |- spotfi_aoa
   |- offset_12
      |- spotfi_aoa
   |- offset_21
      |- spotfi_aoa
   |- offset_22
      |- spotfi_aoa
|- nPaths_4
   |- offset_00
      |- spotfi_aoa
   |- offset_11
      |- spotfi_aoa
   |- offset_12
      |- spotfi_aoa
   |- offset_21
      |- spotfi_aoa
   |- offset_22
      |- spotfi_aoa
|- num_obj
|- obj_0
   |- cam_aoa
   |- coordinates
|- obj_1
   |- cam_aoa
   |- coordinates
...
|- timestamp
The `csi_real` and `csi_imag` are the real and imaginary parts of the CSI measurements. The order of antennas and subcarriers for the 90 `csi_real` and `csi_imag` values is as follows: [subcarrier1-antenna1, subcarrier1-antenna2, subcarrier1-antenna3, subcarrier2-antenna1, subcarrier2-antenna2, subcarrier2-antenna3, … subcarrier30-antenna1, subcarrier30-antenna2, subcarrier30-antenna3]. The `nPaths_x` groups are SpotFi [2]-calculated WiFi Angle of Arrival (AoA) values, with `x` the number of multiple paths specified during the calculation. Under each `nPaths_x` group are `offset_xx` subgroups, where `xx` stands for the offset combination used to correct the phase offset during the SpotFi calculation. We measured the offsets as:
| Antennas | Offset 1 (rad) | Offset 2 (rad) |
|:--------:|:--------------:|:--------------:|
| 1 & 2 | 1.1899 | -2.0071 |
| 1 & 3 | 1.3883 | -1.8129 |
The measurement is based on the work in [3], where the authors state that there are two possible offsets between two antennas, which we measured by booting the device multiple times. The combinations of the offsets are used for the `offset_xx` naming. For example, `offset_12` means that offset 1 between antennas 1 & 2 and offset 2 between antennas 1 & 3 were used in the SpotFi calculation.
The `num_obj` field stores the number of human subjects present in the scene. `obj_0` is always the subject holding the phone. In each file, there are `num_obj` `obj_x` groups. For each `obj_x`, we have the `coordinates` reported from the camera and `cam_aoa`, the AoA estimated from the camera-reported coordinates. The (x,y) coordinates and AoA listed here are chronologically ordered (except in the files in the `training` folder). They reflect the way the person carrying the phone moved in the space (for `obj_0`) and the way everyone else walked (for the other `obj_y`, where `y` > 0).
The `timestamp` is provided as a time reference for each WiFi packet.
To access the data (Python):
import h5py

# Open one of the lab-area capture files in read-only mode.
data = h5py.File('3_people_3.h5', 'r')
csi_real = data['csi_real'][()]          # real part of the CSI values
csi_imag = data['csi_imag'][()]          # imaginary part of the CSI values
cam_aoa = data['obj_0/cam_aoa'][()]      # camera-estimated AoA of the phone holder
cam_loc = data['obj_0/coordinates'][()]  # camera-reported (x,y) coordinates
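Building on this, the flattened 90-value CSI vectors can be combined into complex values and reshaped by subcarrier and antenna, following the ordering described above; this is a sketch, not part of the released notebook.

```python
# Combine real/imaginary parts and reshape to (packets, subcarriers, antennas),
# using the documented subcarrier-major, antenna-minor ordering.
import numpy as np

csi = csi_real + 1j * csi_imag  # shape: (num_packets, 90)
csi = csi.reshape(-1, 30, 3)    # 30 subcarriers x 3 antennas
amplitude = np.abs(csi)
phase = np.angle(csi)
print(csi.shape, amplitude.mean())
```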
Files inside the `training/` folder have a different data structure:
|- nPath-1
   |- aoa
   |- csi_imag
   |- csi_real
   |- spotfi
|- nPath-2
   |- aoa
   |- csi_imag
   |- csi_real
   |- spotfi
|- nPath-3
   |- aoa
   |- csi_imag
   |- csi_real
   |- spotfi
|- nPath-4
   |- aoa
   |- csi_imag
   |- csi_real
   |- spotfi
The group `nPath-x` gives the number of multiple paths specified during the SpotFi calculation. `aoa` is the camera-generated angle of arrival (AoA) (which can be considered ground truth), `csi_imag` and `csi_real` are the imaginary and real components of the CSI value, and `spotfi` is the SpotFi-calculated AoA values. The SpotFi values are chosen based on the lowest median and mean error across `1_person_1.h5` and `1_person_2.h5`. All the rows under the same `nPath-x` group are aligned (i.e., the first row of `aoa` corresponds to the first row of `csi_imag`, `csi_real`, and `spotfi`). There is no timestamp recorded, and the sequence of the data is not chronological, as the rows are randomly shuffled from the `1_person_1.h5` and `1_person_2.h5` files.
Citation
If you use the dataset, please cite our paper:
@inproceedings{eyefi2020,
title={EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching},
author={Fang, Shiwei and Islam, Tamzeed and Munir, Sirajum and Nirjon, Shahriar},
booktitle={2020 IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS)},
year={2020},
}
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Buzsaki Lab is proud to present a large selection of experimental data available for public access: https://buzsakilab.com/wp/database/. We publicly share more than a thousand sessions (about 40 TB of raw, spike-processed, and LFP-processed data) via our public data repository. The datasets are from freely moving rodents and include sleep-task-sleep sessions (3 to 24 hours of continuous recording) in various brain structures, including metadata. We are happy to assist you in using the data. Our goal is that by sharing these data, other users can provide new insights, extend, contradict, or clarify our conclusions.
The databank contains electrophysiological recordings performed in freely moving rats and mice (with a subset from head-fixed mice), collected by investigators in the Buzsaki Lab over several years. Sessions were collected with extracellular electrodes using high-channel-count silicon probes, with spike-sorted single units, and with intracellular and juxtacellular electrodes combined with extracellular electrodes. Several sessions include physiologically and optogenetically identified units. The sessions were collected from various brain region pairs: the hippocampus, thalamus, amygdala, post-subiculum, septal region, entorhinal cortex, and various neocortical regions. In most behavioral tasks, the animals performed spatial behaviors (linear mazes and open fields), preceded and followed by long sleep sessions. Brain state classification is provided.
Getting started
The top menu “Databank” serves as a navigational menu to the databank. The metadata describing the experiments is stored in a relational database, which means that there are many entry points for exploring the data. The databank is organized by projects, animal subjects, and sessions.
Accessing and downloading the datasets
We share the data through two services: our public Globus.org endpoint and our webshare at buzsakilab.nyumc.org. A subset of the datasets is also available at CRCNS.org. If you are interested in a dataset that is not listed or is lacking information, please contact us. We pledge to make our data available immediately after publication.
Support
For support, please use our Buzsaki Databank Google group. If you need more details on a given dataset, if a dataset is missing, or if a listed dataset is lacking information, please send us a request.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
About the NUDA Dataset
Media bias is a multifaceted problem, leading to one-sided views and impacting decision-making. A way to address bias in news articles is to automatically detect and indicate it through machine-learning methods. However, such detection is limited due to the difficulty of obtaining reliable training data. To facilitate the data-gathering process, we introduce NewsUnravel, a news-reading web application leveraging an initially tested feedback mechanism to collect reader feedback on machine-generated bias highlights within news articles. Our approach augments dataset quality by significantly increasing inter-annotator agreement by 26.31% and improving classifier performance by 2.49%. As the first human-in-the-loop application for media bias, NewsUnravel shows that a user-centric approach to media bias data collection can return reliable data while being scalable and evaluated as easy to use. NewsUnravel demonstrates that feedback mechanisms are a promising strategy to reduce data collection expenses, fluidly adapt to changes in language, and enhance evaluators' diversity.
General
This dataset was created through user feedback on automatically generated bias highlights on news articles on the website NewsUnravel made by ANON. Its goal is to improve the detection of linguistic media bias for analysis and to indicate it to the public. Support came from ANON. None of the funders played any role in the dataset creation process or publication-related decisions.
The dataset consists of text, namely biased sentences with binary bias labels (processed: biased or not biased), as well as metadata about the articles. It includes all feedback that was given; the single unprocessed ratings used to create the labels, together with the correlating user IDs, are also included.
For training, this dataset was combined with the BABE dataset. All data is completely anonymous. Some sentences might be offensive or triggering, as they were taken from biased or more extreme news sources. The dataset neither identifies sub-populations nor can it be considered sensitive to them, and it is not possible to identify individuals.
Description of the Data Files
This repository contains the datasets for the anonymous NewsUnravel submission. The tables contain the following data:
- NUDAdataset.csv: the NUDA dataset with 310 new sentences with bias labels
- Statistics.png: contains all Umami statistics for NewsUnravel's usage data
- Feedback.csv: holds the participant ID of a single feedback with the sentence ID (contentId), the bias rating, and provided reasons
- Content.csv: holds the participant ID of a rating with the sentence ID (contentId) of the rated sentence, the bias rating, and the reason, if given
- Article.csv: holds the article ID, title, source, article metadata, article topic, and bias amount in %
- Participant.csv: holds the participant IDs and data-processing consent
Collection Process
Data was collected through interactions with the Feedback Mechanism on NewsUnravel. A news article was displayed with automatically generated bias highlights. Each highlight could be selected, and readers were able to agree or disagree with the automatic label. Through a majority vote, labels were generated from those feedback interactions. Spammers were excluded through a spam detection approach.
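As an illustration of that label-generation step, a minimal pandas sketch, assuming `Content.csv` carries one row per individual rating with the columns described above (exact column names such as `rating` are assumptions):
import pandas as pd

# One row per individual rating (column names are assumptions)
ratings = pd.read_csv('Content.csv')

# Majority vote per sentence: a sentence is labeled biased when more than
# half of its ratings mark it as biased (rating assumed to be 0/1)
labels = (
    ratings.groupby('contentId')['rating']
    .mean()
    .gt(0.5)
    .astype(int)
    .rename('bias_label')
    .reset_index()
)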
Readers came to our website voluntarily through posts on LinkedIn and social media as well as posts on university boards. The data collection period lasted for one week, from March 4th to March 11th (2023). The landing page informed them about the goal and the data processing. After being informed, they could proceed to the article overview.
So far, the dataset has been used on top of BABE to train a linguistic bias classifier, adopting hyperparameter configurations from BABE with a pre-trained model from Hugging Face. The dataset will be open source; on acceptance, a link with all details and contact information will be provided. No third parties are involved.
The dataset will not be maintained as it captures the first test of NewsUnravel at a specific point in time. However, new datasets will arise from further iterations. Those will be linked in the repository. Please cite the NewsUnravel paper if you use the dataset and contact us if you're interested in more information or joining the project.
EN: The dataset is based on tables with detailed data for municipalities and boroughs from the population census and the occupational census of the Netherlands of 1947. These detailed tables from the archive of Statistics Netherlands have never been published. They are written on so-called ‘transparanten’, sheets in A4 format. The set contains more than 35 table types, some of which are spread over two or more sheets, and some combined on one sheet.
Image scans of the detailed tables were made in February 2005. Those scans, 29,489 in total, were published on www.volkstellingen.nl, ordered by province and municipality. At a later stage the scans were converted by data entry into Excel worksheets. In most cases one scan was converted to one Excel file; however, if a scan contains two or more tables, a separate Excel file was made for each table. The Excel files have also been converted to CSV text files.
The thematic collection ‘12th Population Census 31 May 1947’ contains 11 datasets for the provinces plus one dataset for the Netherlands as a whole. The documentation for each dataset in the collection contains a description of the contents of all table types and the instructions given for data entry.
This dataset concerns the files for the Netherlands as a whole. The files are grouped by province.
The metadata per file (details) contains the table number. An overview of the table numbers per file is given in ‘Table number per scan_Nederland.csv’; this applies to the scans as well as to the Excel files and the CSV text files. The file ‘Titles of Tables’ shows the table numbers with the corresponding table titles.
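To navigate the collection programmatically, the overview file can be loaded directly. A minimal sketch, assuming a comma-separated layout and illustrative column names (neither is documented here):
import pandas as pd

# Map each scan/file to its table number (layout and column names assumed)
overview = pd.read_csv('Table number per scan_Nederland.csv')
print(overview.head())

# e.g., select all files containing a given table type (column name assumed)
# files_for_table_3 = overview[overview['table_number'] == 3]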
NL: The dataset is based on detailed tables at the municipal and borough level from the 1947 population and occupational censuses. These detailed tables from the CBS archive have never been published. They were written on parchment-like paper (‘transparanten’) in A4 format. The set comprises more than 35 table types, some with one table per sheet, some with a table spread over two or more sheets (depending on the size of the municipality), and a few with two or three tables on one sheet.
Image scans of these detailed tables were made in JPEG format in February 2005 during the Landelijke Contactdag Document Management. The 29,489 scans were initially published on the website www.volkstellingen.nl, ordered by province and municipality. Later, the scans were converted by data entry into Excel files. In principle, one Excel file was made from each scan; only when a scan contains two or more tables was a separate Excel file made for each table. The Excel files were also converted to CSV text files.
The collection of datasets ‘Volks- en Beroepentellingen 1947’ consists of 11 datasets for the provinces plus one dataset for the Netherlands as a whole. The documentation for all datasets in this collection includes, among other things, a description of the contents of each table type and the instructions given for data entry.
This dataset concerns the files for the Netherlands as a whole. The files are grouped by province.
The metadata per file (details) contains the table number. An overview of the table number per file is given in ‘Table number per scan_Nederland.csv’; this also applies to the corresponding Excel files and CSV text files. The file ‘Titles of Tables’ gives an overview of the table numbers with the corresponding table names; it is available as a PDF document and as a CSV text file.
12de volkstelling 31 mei 1947 - Nederland