100+ datasets found

Data from: Modeling Rabbit Responses to Single and Multiple Aerosol...
catalog.data.gov
data.wu.ac.at
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Modeling Rabbit Responses to Single and Multiple Aerosol Exposures of Bacillus anthracis Spores Data Set [Dataset]. https://catalog.data.gov/dataset/modeling-rabbit-responses-to-single-and-multiple-aerosol-exposures-of-bacillus-anthracis-s
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency, http://www.epa.gov/
Description
The two Excel files contain all of the raw data that were modeled in the R code. The six Word documents contain all of the R code that can be used to model the raw rabbit data. This dataset is associated with the following publication: Bartrand, T., H. Marks, M. Coleman, D. Donahue, S. Hines, J. Comer, and S. Taft. Modeling Rabbit Responses to Single and Multiple Aerosol Exposures of Bacillus anthracis Spores (HS 4.04.02 - 475). RISK ANALYSIS. Blackwell Publishing, Malden, MA, USA, 37(5): 943-957, (2017).
Two Harbors, MN Population Breakdown by Gender Dataset: Male and Female...
neilsberg.com
csv, json
Updated Feb 24, 2025
Cite
Neilsberg Research (2025). Two Harbors, MN Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b258b218-f25d-11ef-8c1b-3860777c1fe6/
Available download formats: csv, json
Dataset updated
Feb 24, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Minnesota, Two Harbors
Variables measured
Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Two Harbors by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Two Harbors across both sexes and to determine which sex constitutes the majority.

Key observations

Males form a slight majority, at 50.53% of the total population. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Scope of gender:

Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

Variables / Data Columns

Gender: This column displays the Gender (Male / Female)

Population: This column shows the population of each gender in Two Harbors.

% of Total Population: This column displays each gender's share of the total population of Two Harbors as a percentage. Please note that the percentages may not sum to exactly 100 due to rounding.

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Two Harbors Population by Race & Ethnicity.
Climate Change: Earth Surface Temperature Data
kaggle.com
redivis.com
zip
Updated May 1, 2017
Cite
Berkeley Earth (2017). Climate Change: Earth Surface Temperature Data [Dataset]. https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data
Available download formats: zip (88843537 bytes)
Dataset updated
May 1, 2017
Dataset authored and provided by
Berkeley Earth, http://berkeleyearth.org/
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
Earth
Description
Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.

Even more than with other data sets that Kaggle has featured, there’s a huge amount of data cleaning and preparation that goes into putting together a long-time study of climate trends. Early data was collected by technicians using mercury thermometers, where any variation in the visit time impacted measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.

Given this complexity, a range of organizations collate climate trends data. The three most cited land and ocean temperature data sets are NOAA’s MLOST, NASA’s GISTEMP and the UK’s HadCRUT.

We have repackaged the data from a newer compilation put together by Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.

In this dataset, we have included several files:

Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):

Date: starts in 1750 for average land temperature and 1850 for max and min land temperatures and global ocean and land temperatures

LandAverageTemperature: global average land temperature in Celsius

LandAverageTemperatureUncertainty: the 95% confidence interval around the average

LandMaxTemperature: global average maximum land temperature in Celsius

LandMaxTemperatureUncertainty: the 95% confidence interval around the maximum land temperature

LandMinTemperature: global average minimum land temperature in Celsius

LandMinTemperatureUncertainty: the 95% confidence interval around the minimum land temperature

LandAndOceanAverageTemperature: global average land and ocean temperature in Celsius

LandAndOceanAverageTemperatureUncertainty: the 95% confidence interval around the global average land and ocean temperature

Other files include:

Global Average Land Temperature by Country (GlobalLandTemperaturesByCountry.csv)

Global Average Land Temperature by State (GlobalLandTemperaturesByState.csv)

Global Land Temperatures By Major City (GlobalLandTemperaturesByMajorCity.csv)

Global Land Temperatures By City (GlobalLandTemperaturesByCity.csv)

The raw data comes from the Berkeley Earth data page.
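
As a rough illustration of how GlobalTemperatures.csv might be summarized, the sketch below averages LandAverageTemperature per calendar year using only the Python standard library. The date header name `dt`, the ISO date format, and the sample rows are assumptions made for this example (the sample values are made up, not Berkeley Earth data); pass a different `date_col` if the file uses another header.

```python
import csv
import io
from collections import defaultdict


def annual_land_means(csv_file, date_col="dt", value_col="LandAverageTemperature"):
    """Average the land temperature readings per calendar year."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        value = row[value_col]
        if not value:  # skip dates with no reading
            continue
        year = int(row[date_col][:4])  # assumes ISO dates such as 1750-01-01
        totals[year] += float(value)
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}


# Made-up sample in the assumed layout; replace with open("GlobalTemperatures.csv").
sample = io.StringIO(
    "dt,LandAverageTemperature,LandAverageTemperatureUncertainty\n"
    "1900-01-01,2.0,0.5\n"
    "1900-02-01,4.0,0.5\n"
    "1901-01-01,,\n"
    "1901-02-01,3.0,0.4\n"
)
means = annual_land_means(sample)
print(means)
```

Rows with an empty temperature field are skipped rather than treated as zero, so a year with missing dates is averaged over the readings that exist.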
[Database] Urban Water Consumption at Multiple Spatial and Temporal Scales....
data.niaid.nih.gov
Updated Mar 2, 2021
Cite
Castelletti Andrea (2021). [Database] Urban Water Consumption at Multiple Spatial and Temporal Scales. A Review of Existing Datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4390459
Dataset updated
Mar 2, 2021
Dataset provided by
Castelletti Andrea
Di Mauro Anna
Cominola Andrea
Di Nardo Armando
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This file contains the complete catalog of datasets and publications reviewed in: Di Mauro A., Cominola A., Castelletti A., Di Nardo A. Urban Water Consumption at Multiple Spatial and Temporal Scales. A Review of Existing Datasets. Water 2021. The complete catalog contains:

92 state-of-the-art water demand datasets identified at the district, household, and end use scales;

120 related peer-reviewed publications;

57 additional datasets with electricity demand data at the end use and household scales.

The following metadata are reported, for each dataset:

Authors

Year

Location

Dataset Size

Time Series Length

Time Sampling Resolution

Access Policy.

The following metadata are reported, for each publication:

Authors

Year

Journal

Title

Spatial Scale

Type of Study: Survey (S) / Dataset (D)

Domain: Water (W)/Electricity (E)

Time Sampling Resolution

Access Policy

Dataset Size

Time Series Length

Location

Authors: Anna Di Mauro - Department of Engineering | Università degli studi della Campania Luigi Vanvitelli (Italy) | anna.dimauro@unicampania.it; Andrea Cominola - Chair of Smart Water Networks | Technische Universität Berlin - Einstein Center Digital Future (Germany) | andrea.cominola@tu-berlin.de; Andrea Castelletti - Department of Electronics, Information and Bioengineering | Politecnico di Milano (Italy) | andrea.castelletti@polimi.it; Armando Di Nardo - Department of Engineering | Università degli studi della Campania Luigi Vanvitelli (Italy) | armando.dinardo@unicampania.it

Citation and reference:

If you use this database, please consider citing our paper:

Di Mauro, A., Cominola, A., Castelletti, A., & Di Nardo, A. (2021). Urban Water Consumption at Multiple Spatial and Temporal Scales. A Review of Existing Datasets. Water, 13(1), 36, https://doi.org/10.3390/w13010036

Updates and Contributions:

The catalogue stored in this public repository can be collaboratively updated as more datasets become available. The authors will periodically update it to a new version.

New requests can be submitted to the authors, so that the dataset collection can be improved by different contributors. Contributors will be cited, step by step, in the updated versions of the dataset catalogue.

Updates history:

March 1st, 2021 - Pacheco, C.J.B., Horsburgh, J.S., Tracy, J.R. (Utah State University, Logan, UT - USA) --- The dataset associated with the paper Bastidas Pacheco, C.J.; Horsburgh, J.S.; Tracy, R.J. A Low-Cost, Open Source Monitoring System for Collecting High Temporal Resolution Water Use Data on Magnetically Driven Residential Water Meters. Sensors 2020, 20, 3655, is published in the HydroShare repository, where it is available as an OPEN dataset. Data can be found here: https://doi.org/10.4211/hs.4de42db6485f47b290bd9e17b017bb51
Event Registry titles dataset with multiple extracted features (both sparse...
data.europa.eu
data.niaid.nih.gov
unknown
Updated Jun 10, 2022
Cite
Zenodo (2022). Event Registry titles dataset with multiple extracted features (both sparse and dense) and degraded by OCR [Dataset]. https://data.europa.eu/88u/dataset/oai-zenodo-org-6631082
Available download formats: unknown (535403842)
Dataset updated
Jun 10, 2022
Dataset authored and provided by
Zenodo, http://zenodo.org/
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the same dataset as:

Guillaume Bernard. (2022). Event Registry titles only dataset with multiple extracted features (both sparse and dense) (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6630447

but with texts degraded by OCR, as described in:

Guillaume Bernard. (2022). Event Registry titles dataset texts with OCR degradations (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6630828
Tuscaloosa, AL Population Dataset: Yearly Figures, Population Change, and...
neilsberg.com
csv, json
Updated Sep 18, 2023
Cite
Neilsberg Research (2023). Tuscaloosa, AL Population Dataset: Yearly Figures, Population Change, and Percent Change Analysis [Dataset]. https://www.neilsberg.com/research/datasets/6f90a844-3d85-11ee-9abe-0aa64bf2eeb2/
Available download formats: json, csv
Dataset updated
Sep 18, 2023
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Alabama, Tuscaloosa
Variables measured
Annual Population Growth Rate, Population Between 2000 and 2022, Annual Population Growth Rate Percent
Measurement technique
The data presented in this dataset is derived from more than 20 years of data from the U.S. Census Bureau Population Estimates Program (PEP), 2000-2022. To measure the variables, namely (a) population and (b) population change (both in absolute terms and as a percentage), we first analyzed and tabulated the data for each year between 2000 and 2022. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the Tuscaloosa population over the last 20-plus years. It lists the population for each year, along with the year-on-year change in population, as well as the change in percentage terms for each year. The dataset can be used to understand how the population of Tuscaloosa has changed over the last two decades. For example, using this dataset, we can identify whether the population is declining or increasing and, if it has changed, when it peaked or whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of the United States population over the same period.

Key observations

In 2022, the population of Tuscaloosa was 110,602, a 1.39% year-on-year increase from 2021. Previously, in 2021, the Tuscaloosa population was 109,082, an increase of 4.67% compared to a population of 104,214 in 2020. Over the last 20-plus years, between 2000 and 2022, the population of Tuscaloosa increased by 31,687. In this period, the peak population was 110,602 in the year 2022. The numbers suggest that the population has not reached its peak yet and is showing a trend of further growth. Source: U.S. Census Bureau Population Estimates Program (PEP).

Content

When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

Data Coverage:

From 2000 to 2022

Variables / Data Columns

Year: This column displays the data year (Measured annually and for years 2000 to 2022)

Population: This column shows the population of Tuscaloosa for the specific year.

Year on Year Change: This column displays the change in Tuscaloosa population for each year compared to the previous year.

Change in Percent: This column displays the year on year change as a percentage. Please note that the sum of all percentages may not equal one due to rounding of values.
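
The two derived columns can be reproduced from the Population column alone. The sketch below is an illustration, not Neilsberg's own code; it uses only the three Tuscaloosa figures quoted under Key observations, and the function name and the rounding to two decimals are assumptions of this example.

```python
def year_on_year(populations):
    """Return (year, change, percent_change) rows for consecutive years."""
    years = sorted(populations)
    rows = []
    for prev, curr in zip(years, years[1:]):
        change = populations[curr] - populations[prev]
        # Percent change is measured against the previous year's population.
        percent = round(100 * change / populations[prev], 2)
        rows.append((curr, change, percent))
    return rows


# Figures quoted in the key observations above.
tuscaloosa = {2020: 104_214, 2021: 109_082, 2022: 110_602}
rows = year_on_year(tuscaloosa)
print(rows)  # [(2021, 4868, 4.67), (2022, 1520, 1.39)]
```

The output matches the quoted 4.67% and 1.39% increases, which confirms that each percentage is taken relative to the preceding year.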

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Tuscaloosa Population by Year.
uplimit-synthetic-data-week-2-with-multi-turn
huggingface.co
Updated Jul 4, 2025
Cite
Dan Jackson (2025). uplimit-synthetic-data-week-2-with-multi-turn [Dataset]. https://huggingface.co/datasets/djackson-proofpoint/uplimit-synthetic-data-week-2-with-multi-turn
Dataset updated
Jul 4, 2025
Authors
Dan Jackson
Description
Dataset Card for uplimit-synthetic-data-week-2-with-multi-turn

This dataset has been created with distilabel. The pipeline script was uploaded to easily reproduce the dataset: colab_kernel_launcher.py. It can be run directly using the CLI: distilabel pipeline run --script "https://huggingface.co/datasets/djackson-proofpoint/uplimit-synthetic-data-week-2-with-multi-turn/raw/main/colab_kernel_launcher.py"

Dataset Summary

This dataset contains a… See the full description on the dataset page: https://huggingface.co/datasets/djackson-proofpoint/uplimit-synthetic-data-week-2-with-multi-turn.
Purchase Order Data
data.ca.gov
catalog.data.gov
csv, docx, pdf
Updated Oct 23, 2019
Cite
California Department of General Services (2019). Purchase Order Data [Dataset]. https://data.ca.gov/dataset/purchase-order-data
Available download formats: csv, pdf, docx
Dataset updated
Oct 23, 2019
Dataset authored and provided by
California Department of General Services
Description
The State Contract and Procurement Registration System (SCPRS) was established in 2003 as a centralized database of information on State contracts and purchases over $5,000. eSCPRS represents the data captured in the State's eProcurement (eP) system, Bidsync, as of March 16, 2009. The data provided is an extract from that system for fiscal years 2012-2013, 2013-2014, and 2014-2015.

Data Limitations:
Some purchase orders have multiple UNSPSC numbers; however, only the first was used to identify the purchase order. Multiple UNSPSC numbers were included to provide additional data for a DGS special event; however, this affects the formatting of the file. The source system, Bidsync, is being deprecated, and these issues will be resolved in the future as state systems transition to Fi$cal.

Data Collection Methodology:

The data collection process starts with a data file from eSCPRS that is scrubbed and standardized prior to being uploaded into a SQL Server database. There are four primary tables. The Supplier, Department and United Nations Standard Products and Services Code (UNSPSC) tables are reference tables. The Supplier and Department tables are updated and mapped to the appropriate numbering schema and naming conventions. The UNSPSC table is used to categorize line item information and requires no further manipulation. The Purchase Order table contains raw data that requires conversion to the correct data format and mapping to the corresponding data fields. A stacking method is applied to the table to eliminate blanks where needed. Extraneous characters are removed from fields. The four tables are joined together and queries are executed to update the final Purchase Order Dataset table. Once the scrubbing and standardization process is complete the data is then uploaded into the SQL Server database.
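
The scrub-then-join flow described above can be sketched in a few lines. This is a hypothetical illustration only: it uses Python's built-in sqlite3 in place of SQL Server, and every table name, column name, code value, and the `;` separator between UNSPSC numbers is an assumption invented for the example, not the DGS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here
conn.executescript("""
CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE department (department_id INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE unspsc (unspsc_code TEXT PRIMARY KEY, category TEXT);
CREATE TABLE purchase_order (po_number TEXT, supplier_id INTEGER,
    department_id INTEGER, unspsc_codes TEXT, total_price TEXT);
CREATE TABLE purchase_order_clean (po_number TEXT, supplier_id INTEGER,
    department_id INTEGER, unspsc_code TEXT, total_price REAL);
-- Placeholder rows, not real procurement data.
INSERT INTO supplier VALUES (1, 'Example Supplier');
INSERT INTO department VALUES (10, 'Example Department');
INSERT INTO unspsc VALUES ('10000001', 'Example category');
INSERT INTO purchase_order VALUES ('PO-1', 1, 10, '10000001;10000002', ' $6,500.00 ');
""")


def clean_price(raw):
    """Remove extraneous characters and convert the price to a number."""
    return float(raw.strip().replace("$", "").replace(",", ""))


def first_code(raw):
    """Keep only the first UNSPSC number, as the data limitations describe."""
    return raw.split(";")[0].strip()


raw_rows = conn.execute("SELECT * FROM purchase_order").fetchall()
conn.executemany(
    "INSERT INTO purchase_order_clean VALUES (?, ?, ?, ?, ?)",
    [(po, s, d, first_code(codes), clean_price(price))
     for po, s, d, codes, price in raw_rows],
)

# Join the cleaned purchase orders to the three reference tables.
result = conn.execute("""
SELECT p.po_number, s.supplier_name, d.department_name, u.category, p.total_price
FROM purchase_order_clean AS p
JOIN supplier AS s ON s.supplier_id = p.supplier_id
JOIN department AS d ON d.department_id = p.department_id
JOIN unspsc AS u ON u.unspsc_code = p.unspsc_code
""").fetchall()
print(result)
```

Cleaning before the join matters in this sketch because the raw UNSPSC field holds two numbers and would not match any row of the reference table as-is.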

Secondary/Related Resources:

State Contract Manual (SCM) vol. 2 http://www.dgs.ca.gov/pd/Resources/publications/SCM2.aspx

State Contract Manual (SCM) vol. 3 http://www.dgs.ca.gov/pd/Resources/publications/SCM3.aspx

Buying Green http://www.dgs.ca.gov/buyinggreen/Home.aspx

United Nations Standard Products and Services Code, http://www.unspsc.org/
DISTRIBUTED ANOMALY DETECTION USING SATELLITE DATA FROM MULTIPLE MODALITIES
gimi9.com
data.nasa.gov
Updated Sep 24, 2010
Cite
(2010). DISTRIBUTED ANOMALY DETECTION USING SATELLITE DATA FROM MULTIPLE MODALITIES [Dataset]. https://gimi9.com/dataset/data-gov_distributed-anomaly-detection-using-satellite-data-from-multiple-modalities/
Dataset updated
Sep 24, 2010
Description
DISTRIBUTED ANOMALY DETECTION USING SATELLITE DATA FROM MULTIPLE MODALITIES

KANISHKA BHADURI, KAMALIKA DAS, AND PETR VOTAVA

Abstract. There has been a tremendous increase in the volume of Earth Science data over the last decade from modern satellites, in-situ sensors and different climate models. All these datasets need to be co-analyzed for finding interesting patterns or for searching for extremes or outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task not only due to the massive volume of data, but also because these datasets are physically stored at different geographical locations. Moving these petabytes of data over the network to a single location may waste a lot of bandwidth, and can take days to finish. To solve this problem, in this paper, we present a novel algorithm which can identify outliers in the global data without moving all the data to one location. The algorithm is highly accurate (close to 99%) and requires centralizing less than 5% of the entire dataset. We demonstrate the performance of the algorithm using data obtained from the NASA MODerate-resolution Imaging Spectroradiometer (MODIS) satellite images.
Two Rivers Town, Wisconsin Population Dataset: Yearly Figures, Population...
neilsberg.com
csv, json
Updated Sep 18, 2023
Cite
Neilsberg Research (2023). Two Rivers Town, Wisconsin Population Dataset: Yearly Figures, Population Change, and Percent Change Analysis [Dataset]. https://www.neilsberg.com/research/datasets/6f9150d6-3d85-11ee-9abe-0aa64bf2eeb2/
Available download formats: json, csv
Dataset updated
Sep 18, 2023
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Two Rivers, Two Rivers, Wisconsin
Variables measured
Annual Population Growth Rate, Population Between 2000 and 2022, Annual Population Growth Rate Percent
Measurement technique
The data presented in this dataset is derived from more than 20 years of data from the U.S. Census Bureau Population Estimates Program (PEP), 2000-2022. To measure the variables, namely (a) population and (b) population change (both in absolute terms and as a percentage), we first analyzed and tabulated the data for each year between 2000 and 2022. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the Two Rivers town population over the last 20-plus years. It lists the population for each year, along with the year-on-year change in population, as well as the change in percentage terms for each year. The dataset can be used to understand how the population of Two Rivers town has changed over the last two decades. For example, using this dataset, we can identify whether the population is declining or increasing and, if it has changed, when it peaked or whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of the United States population over the same period.

Key observations

In 2022, the population of Two Rivers town was 1,676, a 0.30% year-on-year decrease from 2021. Previously, in 2021, the Two Rivers town population was 1,681, an increase of 0.48% compared to a population of 1,673 in 2020. Over the last 20-plus years, between 2000 and 2022, the population of Two Rivers town decreased by 251. In this period, the peak population was 1,928 in the year 2001. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).

Content

When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

Data Coverage:

From 2000 to 2022

Variables / Data Columns

Year: This column displays the data year (Measured annually and for years 2000 to 2022)

Population: This column shows the population of Two Rivers town for the specific year.

Year on Year Change: This column displays the change in Two Rivers town population for each year compared to the previous year.

Change in Percent: This column displays the year on year change as a percentage. Please note that the sum of all percentages may not equal one due to rounding of values.
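
The peak and net-change observations for Two Rivers town can be checked with a short sketch. Only the 2001, 2020, 2021, and 2022 figures are quoted in the text; the 2000 value of 1,927 is an assumption of this example, back-calculated from the stated decrease of 251 between 2000 and 2022, and the intermediate years are omitted.

```python
# Quoted figures, plus a 2000 value derived as 1,676 + 251 (an assumption).
two_rivers = {2000: 1_927, 2001: 1_928, 2020: 1_673, 2021: 1_681, 2022: 1_676}

first_year, latest_year = min(two_rivers), max(two_rivers)
peak_year = max(two_rivers, key=two_rivers.get)  # year with the largest population
net_change = two_rivers[latest_year] - two_rivers[first_year]
past_peak = two_rivers[latest_year] < two_rivers[peak_year]

print(peak_year, two_rivers[peak_year], net_change, past_peak)
# 2001 1928 -251 True
```

A latest-year population below the peak is what the description means by the population having "already reached its peak".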

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Two Rivers town Population by Year.
Multi-domain data description sessions data - Dataset - CKAN
rdm.inesctec.pt
Updated Jan 9, 2020
Cite
(2020). Multi-domain data description sessions data - Dataset - CKAN [Dataset]. https://rdm.inesctec.pt/dataset/cs-2020-001
Dataset updated
Jan 9, 2020
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset results from 13 data description sessions conducted at U. Porto. In each session, researchers created metadata in Dendro, a research data management platform. A project for each session was created beforehand in Dendro and all the sessions were kept under the same account. All projects were kept private. This was explained to the researchers, and they could change any information if they wanted to. When scheduling the sessions, researchers were asked to choose a dataset to describe. The sessions started by introducing researchers to Dendro with a brief demonstration of its features. The researchers were then asked to create a folder and upload their datasets. During the session, the selection of descriptors was mostly up to them. Exceptionally, they were asked if a given descriptor was suitable to contextualize their data. Session audio was recorded with the researchers’ consent and was deleted after the relevant events and comments from each session had been transcribed to complement the analysis of the metadata produced. The audio was also used to mark the moment the researchers started and finished the description, in order to ascertain the session duration.
Two Harbors, MN Age Group Population Dataset: A Complete Breakdown of Two...
neilsberg.com
csv, json
Updated Jul 24, 2024
Cite
Neilsberg Research (2024). Two Harbors, MN Age Group Population Dataset: A Complete Breakdown of Two Harbors Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/aabeab9e-4983-11ef-ae5d-3860777c1fe6/
Available download formats: json, csv
Dataset updated
Jul 24, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Minnesota, Two Harbors
Variables measured
Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we first analyzed and categorized the data for each of the age groups. We divided ages below 85 into 5-year buckets and aggregated all ages of 85 and over into a single group. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the Two Harbors population distribution across 18 age groups. It lists the population in each age group along with each group's percentage of the total population of Two Harbors. The dataset can be used to understand the population distribution of Two Harbors by age. For example, using this dataset, we can identify the largest age group in Two Harbors.

Key observations

The largest age group in Two Harbors, MN was 40 to 44 years, with a population of 310 (8.56%), according to the ACS 2018-2022 5-Year Estimates. The smallest age group in Two Harbors, MN was 75 to 79 years, with a population of 109 (3.01%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over
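
The 18 groups listed above follow a regular pattern, which the sketch below encodes as a lookup from an age in whole years to its group label. The function name is hypothetical and this is not Neilsberg's code; it only mirrors the labels in the list.

```python
def age_group(age):
    """Map an age in whole years to one of the 18 group labels listed above."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 5:
        return "Under 5 years"
    if age >= 85:
        return "85 years and over"
    lower = (age // 5) * 5  # start of the 5-year bucket, e.g. 42 -> 40
    return f"{lower} to {lower + 4} years"


labels = {age_group(a) for a in range(0, 120)}
print(len(labels))    # 18
print(age_group(42))  # 40 to 44 years
```

Counting the distinct labels gives 18: one group under 5, sixteen 5-year buckets from 5 to 84, and one open-ended group for 85 and over.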

Variables / Data Columns

Age Group: This column displays the age group in consideration

Population: This column shows the population of the specific age group in Two Harbors.

% of Total Population: This column displays each age group's share of the total population of Two Harbors as a percentage. Please note that the percentages may not sum to exactly 100 due to rounding.

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Two Harbors Population by Age. You can refer to the same here
Data cleaning using unstructured data
zenodo.org
zip
Updated Jul 30, 2024
Cite
Rihem Nasfi; Antoon Bronselaer (2024). Data cleaning using unstructured data [Dataset]. http://doi.org/10.5281/zenodo.13135983
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.13135983
Dataset updated
Jul 30, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Rihem Nasfi; Antoon Bronselaer
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In this project, we work on repairing three datasets:

Trials design: This dataset was obtained from the European Union Drug Regulating Authorities Clinical Trials Database (EudraCT) register, and the ground truth was created from external registries. In the dataset, multiple countries, identified by the attribute country_protocol_code, conduct the same clinical trial, which is identified by eudract_number. Each clinical trial has a title that can help find informative details about the design of the trial.

Trials population: This dataset delineates the demographic origins of participants in clinical trials primarily conducted across European countries. It includes structured attributes indicating whether the trial pertains to a specific gender, age group, or healthy volunteers. Each of these categories is labeled '1' or '0', denoting whether it is included in the trial or not, respectively. It is important to note that the population category should remain consistent across all countries conducting the same clinical trial, identified by a eudract_number. The ground truth samples in the dataset were established by aligning information about the trial populations provided by external registries, specifically the CT.gov database and the German Trials database. Additionally, the dataset comprises other unstructured attributes that categorize the inclusion criteria for trial participants, such as inclusion.

Allergens: This dataset contains information about products and their allergens. The data was collected from the German version of 'Alnatura' (access date: 24 November 2020), from 'Open Food Facts', a free database of food products from around the world, and from the websites 'Migipedia', 'Piccantino', and 'Das Ist Drin'. There may be overlapping products across these websites. Each product in the dataset is identified by a unique code. Samples with the same code represent the same product but are extracted from a different source. The allergens are indicated by '2' if present, '1' if there are traces, and '0' if absent in a product. The dataset also includes information on ingredients in the products. Overall, the dataset comprises categorical structured data describing the presence, trace, or absence of specific allergens, and unstructured text describing ingredients.

N.B.: Each '.zip' file contains a set of 5 '.csv' files which are part of the aforementioned datasets:

"{dataset_name}_train.csv": samples used for the ML-model training (e.g., "allergens_train.csv").

"{dataset_name}_test.csv": samples used to test the ML-model performance (e.g., "allergens_test.csv").

"{dataset_name}_golden_standard.csv": samples that represent the ground truth of the test samples (e.g., "allergens_golden_standard.csv").

"{dataset_name}_parker_train.csv": samples repaired using Parker Engine, used for the ML-model training (e.g., "allergens_parker_train.csv").

"{dataset_name}_parker_test.csv": samples repaired using Parker Engine, used to test the ML-model performance (e.g., "allergens_parker_test.csv").
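
The naming scheme above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names `dataset_files` and `load_rows`, the `root` directory argument, and the `ALLERGEN_LEVELS` mapping are hypothetical conveniences, and the CSV column layout is not assumed beyond having a header row.

```python
import csv
from pathlib import Path

# The five file roles described above (suffixes taken from the example file names).
FILE_ROLES = ("train", "test", "golden_standard", "parker_train", "parker_test")

# Allergen encoding described for the Allergens dataset.
ALLERGEN_LEVELS = {"2": "present", "1": "traces", "0": "absent"}


def dataset_files(dataset_name: str, root: str = ".") -> dict[str, Path]:
    """Return the expected path of each CSV file for one dataset."""
    return {role: Path(root) / f"{dataset_name}_{role}.csv" for role in FILE_ROLES}


def load_rows(path: Path) -> list[dict[str, str]]:
    """Read one CSV file into a list of dicts keyed by its header row."""
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    for role, path in dataset_files("allergens").items():
        print(role, path)
```

Comparing the rows of a repaired test file against the golden standard file is then a matter of loading both with `load_rows` and matching samples by their identifying attributes.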
ACLED Conflict and Demonstrations Event Data
hub.arcgis.com
cacgeoportal.com
Updated May 23, 2024
+ more versions
Cite
Central Asia and the Caucasus GeoPortal (2024). ACLED Conflict and Demonstrations Event Data [Dataset]. https://hub.arcgis.com/maps/1bacc9e3d30f4383af61c12cbf0401d8
Explore at:
Dataset updated
May 23, 2024
Dataset authored and provided by
Central Asia and the Caucasus GeoPortal
Description
The Armed Conflict Location & Event Data Project (ACLED) is a US-registered non-profit whose mission is to provide the highest quality real-time data on political violence and demonstrations globally. The information collected includes the type of event, its date, the location, the actors involved, a brief narrative summary, and any reported fatalities. ACLED users rely on our robust global dataset to support decision-making around policy and programming, accurately analyze political and country risk, support operational security planning, and improve supply chain management.

ACLED’s transparent methodology, expert team composed of 250 individuals speaking more than 70 languages, real-time coding system, and weekly update schedule are unrivaled in the field of data collection on conflict and disorder.

Global Coverage: We track political violence, demonstrations, and strategic developments around the world, covering more than 240 countries and territories.
Published Weekly: Our data are collected in real time and published weekly. It is the only dataset of its kind to provide such a high update frequency, with peer datasets most often updating monthly or yearly.
Historical Data: Our dataset contains at least two full years of data for all countries and territories, with more extensive coverage available for multiple regions.
Experienced Researchers: Our data are coded by experienced researchers with local, country, and regional expertise and language skills.
Thorough Data Collection and Sourcing: Pulling from traditional media, reports, local partner data, and verified new media, ACLED uses a tailor-made sourcing methodology for individual regions/countries.
Extensive Review Process: Our data go through an exhaustive multi-stage quality assurance process to ensure their accuracy and reliability. This process includes both manual and automated error checking and contextual review.
Clean, Standardized, and Validated: Our data can be easily connected with internal dashboards through our API or downloaded through the Data Export Tool on our website.

Resources Available on ESRI’s Living Atlas

ACLED data are available through the Living Atlas for the most recent 12 month period. The data are mapped to the centroid of first administrative divisions (“admin1”) within countries (e.g., states, districts, provinces) and aggregated by month. Variables in the data include:

The number of events per admin1-month, disaggregated by event type (protests, riots, battles, violence against civilians, explosions/remote violence, and strategic developments)
A conservative estimate of reported fatalities per admin1-month
The total number of distinct violent actors active in the corresponding admin1 for each month

This Living Atlas item is a Web Map, which provides a pre-configured view of ACLED event data in a few layers:

ACLED Event Counts layer: events per admin1-month, styled by predominant event type for each location.
ACLED Violent Actors layer: the number of distinct violent actors per admin1-month.
ACLED Fatality Estimates layer: the estimated number of fatalities from political violence per admin1-month.

These layers are based on the ACLED Conflict and Demonstrations Event Data Feature Layer, which has the same data but only a basic default styling that is similar to the Event Counts layer. The Web Map layers are configured with a time-slider component to account for the multiple months of data per admin1 unit. These indicators are also available in the ACLED Conflict and Demonstrations Data Key Indicators Group Layer, which includes the same preconfigured layers but without the time-slider component or background layers.

Resources Available on the ACLED Website

The fully disaggregated dataset is available for download on ACLED's website including:

Date (day, month, year)
Actors, associated actors, and actor types
Location information (ADMIN1, ADMIN2, ADMIN3, location and geo coordinates)
A conservative fatality estimate
Disorder type, event types, and sub-event types
Tags further categorizing the data
A notes column providing a narrative of the event

For more information, please see the ACLED Codebook.

To explore ACLED’s full dataset, please register on the ACLED Access Portal, following the instructions available in this Access Guide. Upon registration, you’ll receive access to ACLED data on a limited basis. Commercial users have access to 3 free data downloads company-wide with access to up to one year of historical data. Public sector users have access to 6 downloads of up to three years of historical data organization-wide. To explore options for extended access, please reach out to our Access Team (access@acleddata.com).

With an ACLED license, users can also leverage ACLED’s interactive Global Dashboard and check in for weekly data updates and analysis tracking key political violence and protest trends around the world. ACLED also has several analytical tools available such as our Early Warning Dashboard, Conflict Alert System (CAST), and Conflict Index Dashboard.
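
The admin1-month aggregation described in this entry (event counts by type, plus a predominant event type per location and month) can be sketched in Python. This is only an illustration of the idea, not ACLED's pipeline or API: the record keys `admin1`, `date`, and `event_type`, the function names, and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def events_per_admin1_month(events):
    """Count events by type for each (admin1, 'YYYY-MM') pair."""
    counts = defaultdict(Counter)
    for event in events:
        key = (event["admin1"], event["date"].strftime("%Y-%m"))
        counts[key][event["event_type"]] += 1
    return counts


def predominant_event_type(type_counts: Counter) -> str:
    """Return the most frequent event type in one admin1-month."""
    return type_counts.most_common(1)[0][0]


# Made-up sample records, for illustration only.
SAMPLE_EVENTS = [
    {"admin1": "Region A", "date": date(2024, 3, 2), "event_type": "protests"},
    {"admin1": "Region A", "date": date(2024, 3, 9), "event_type": "protests"},
    {"admin1": "Region A", "date": date(2024, 3, 20), "event_type": "riots"},
    {"admin1": "Region B", "date": date(2024, 4, 1), "event_type": "battles"},
]

if __name__ == "__main__":
    for key, type_counts in events_per_admin1_month(SAMPLE_EVENTS).items():
        print(key, dict(type_counts), predominant_event_type(type_counts))
```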
Two Creeks, Wisconsin Age Group Population Dataset: A complete breakdown of...
neilsberg.com
csv, json
Updated Sep 16, 2023
+ more versions
Cite
Neilsberg Research (2023). Two Creeks, Wisconsin Age Group Population Dataset: A complete breakdown of Two Creeks town age demographics from 0 to 85 years, distributed across 18 age groups [Dataset]. https://www.neilsberg.com/research/datasets/5fd02065-3d85-11ee-9abe-0aa64bf2eeb2/
Explore at:
Available download formats: csv, json
Dataset updated
Sep 16, 2023
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Wisconsin, Two Creeks
Variables measured
Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we first analyzed and categorized the data for each of the age groups. We divided ages between 0 and 85 into buckets of roughly 5 years each. For ages over 85, we aggregated the data into a single group. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the Two Creeks town population distribution across 18 age groups. It lists the population in each age group along with its percentage relative to the total population of Two Creeks town. The dataset can be utilized to understand the population distribution of Two Creeks town by age. For example, using this dataset, we can identify the largest age group in Two Creeks town.

Key observations

The largest age group in Two Creeks, Wisconsin was the 15-19 years group, with a population of 41 (11.68%), according to the 2021 American Community Survey. At the same time, the smallest age group in Two Creeks, Wisconsin was the 85+ years group, with a population of 3 (0.85%). Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Variables / Data Columns

Age Group: This column displays the age group in consideration

Population: The population for the specific age group in Two Creeks town is shown in this column.

% of Total Population: This column displays the population of each age group as a proportion of Two Creeks town total population. Please note that the sum of all percentages may not equal one due to rounding of values.
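
The relationship between the Population and % of Total Population columns, and the rounding caveat noted above, can be illustrated with a short Python sketch. The function names and the three-group input are hypothetical and are not taken from the dataset; rounding to two decimals is an assumption based on the figures quoted in the key observations.

```python
def percent_of_total(populations: dict[str, int]) -> dict[str, float]:
    """Express each group's population as a percentage of the total, rounded to 2 decimals."""
    total = sum(populations.values())
    if total == 0:
        raise ValueError("total population is zero")
    return {group: round(100 * count / total, 2) for group, count in populations.items()}


def largest_group(populations: dict[str, int]) -> str:
    """Return the label of the group with the highest population."""
    return max(populations, key=populations.get)


if __name__ == "__main__":
    # Made-up counts: three equal groups each round to 33.33%,
    # so the rounded shares sum to 99.99 rather than 100.
    sample = {"Under 5 years": 1, "5 to 9 years": 1, "10 to 14 years": 1}
    shares = percent_of_total(sample)
    print(shares, sum(shares.values()))
```

As a check against the key observations above, a group of 41 people in a total of 351 gives round(100 * 41 / 351, 2) = 11.68; the total of 351 is inferred from the quoted figures, not stated in the dataset description.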

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Two Creeks town Population by Age. You can refer to the same here
CDC WONDER: Mortality - Multiple Cause of Death
catalog.data.gov
healthdata.gov
+1 more
Updated Jul 29, 2025
+ more versions
Cite
Centers for Disease Control and Prevention, Department of Health & Human Services (2025). CDC WONDER: Mortality - Multiple Cause of Death [Dataset]. https://catalog.data.gov/dataset/cdc-wonder-mortality-multiple-cause-of-death
Explore at:
Dataset updated
Jul 29, 2025
Dataset provided by
United States Department of Health and Human Services http://www.hhs.gov/
Centers for Disease Control and Prevention http://www.cdc.gov/
Description
The Mortality - Multiple Cause of Death data on CDC WONDER are county-level national mortality and population data spanning the years 1999-2009. Data are based on death certificates for U.S. residents. Each death certificate contains a single underlying cause of death, up to twenty additional multiple causes (Boolean set analysis), and demographic data. The number of deaths, crude death rates, age-adjusted death rates, standard errors and 95% confidence intervals for death rates can be obtained by place of residence (total U.S., region, state, and county), age group (including infants and single-year-of-age cohorts), race (4 groups), Hispanic ethnicity, sex, year of death, and cause-of-death (4-digit ICD-10 code or group of codes, injury intent and mechanism categories, or drug and alcohol related causes), year, month and week day of death, place of death and whether an autopsy was performed. The data are produced by the National Center for Health Statistics.
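
To make the quantities named above concrete, the following Python sketch computes a crude death rate with a standard error and a 95% confidence interval. It assumes the common conventions of rates per 100,000 population, a standard error of rate / sqrt(deaths), and a normal approximation; the description above does not state which formulas CDC WONDER uses, so treat these as assumptions. The function names and the input numbers are hypothetical.

```python
import math


def crude_rate(deaths: int, population: int, per: int = 100_000) -> float:
    """Deaths per `per` population (assumed convention: per 100,000)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths * per / population


def rate_with_interval(deaths: int, population: int, per: int = 100_000, z: float = 1.96):
    """Return (rate, standard error, (lower, upper)) under a normal approximation.

    Assumes SE = rate / sqrt(deaths); this is a sketch, and it is only
    reasonable when the death count is not small.
    """
    if deaths <= 0:
        raise ValueError("deaths must be positive")
    rate = crude_rate(deaths, population, per)
    std_err = rate / math.sqrt(deaths)
    return rate, std_err, (rate - z * std_err, rate + z * std_err)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(rate_with_interval(100, 200_000))
```

Age-adjusted rates additionally weight age-specific rates by a standard population, which is not shown here.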
O*NET Database
onetcenter.org
excel, mysql, oracle +2
Updated May 22, 2025
Share
Cite
National Center for O*NET Development (2025). O*NET Database [Dataset]. https://www.onetcenter.org/database.html
Explore at:
Available download formats: oracle, sql server, text, mysql, excel
Dataset updated
May 22, 2025
Dataset provided by
Occupational Information Network
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Dataset funded by
United States Department of Labor http://www.dol.gov/
Description
The O*NET Database contains hundreds of standardized and occupation-specific descriptors on almost 1,000 occupations covering the entire U.S. economy. The database, which is available to the public at no cost, is continually updated by a multi-method data collection program. Sources of data include: job incumbents, occupational experts, occupational analysts, employer job postings, and customer/professional association input.
Data content areas include:
Worker Characteristics (e.g., Abilities, Interests, Work Styles)
Worker Requirements (e.g., Education, Knowledge, Skills)
Experience Requirements (e.g., On-the-Job Training, Work Experience)
Occupational Requirements (e.g., Detailed Work Activities, Work Context)
Occupation-Specific Information (e.g., Job Titles, Tasks, Technology Skills)
Data from: ODDS: Real-Time Object Detection using Depth Sensors on Embedded...
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Share
Cite
Munir, Sirajum (2020). ODDS: Real-Time Object Detection using Depth Sensors on Embedded GPUs [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1163769
Explore at:
Dataset updated
Jan 24, 2020
Dataset provided by
Guo, Karen
Mithun, Niluthpol Chowdhury
Munir, Sirajum
Shelton, Charles
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ODDS Smart Building Depth Dataset

Introduction:

The goal of this dataset is to facilitate research focusing on recognizing objects in smart buildings using the depth sensor mounted at the ceiling. This dataset contains annotations of depth images for eight frequently seen object classes. The classes are: person, backpack, laptop, gun, phone, umbrella, cup, and box.

Data Collection:

We collected data from two settings. We had a Kinect mounted on a 9.3-foot ceiling near a 6-foot-wide door. We also used a tripod with a horizontal extender holding the Kinect at a similar height looking downwards. We asked about 20 volunteers to enter and exit a number of times each in different directions (3 times walking straight, 3 times walking towards the left side, 3 times walking towards the right side), holding objects in many different ways and poses underneath the Kinect. Each subject used his/her own backpack, purse, laptop, etc. As a result, we captured variety within the same object class, e.g., for laptops, we considered Macbooks, HP laptops, and Lenovo laptops of different years and models, and for backpacks, we considered backpacks, side bags, and women's purses. We asked the subjects to walk while holding each object in many ways, e.g., the laptop was fully open, partially closed, or fully closed while carried. Also, people held laptops in front of and beside their bodies, and underneath their elbows. The subjects carried their backpacks on their backs and at their sides at different levels from foot to shoulder. We wanted to collect data with real guns. However, bringing real guns to the office is prohibited. So, we obtained a few nerf guns, and the subjects carried these guns pointing them to the front, side, up, and down while walking.

Annotated Data Description:

The annotated dataset follows the structure of the Pascal VOC devkit, so that data preparation is simple and the data can be used quickly with different object detection libraries that are friendly to Pascal VOC style annotations (e.g. Faster-RCNN, YOLO, SSD). The annotated data consists of a set of images; each image has an annotation file giving a bounding box and object class label for each object in one of the eight classes present in the image. Multiple objects from multiple classes may be present in the same image. The dataset has 3 main directories:

1) DepthImages: Contains all the images of the training set and validation set.

2) Annotations: Contains one xml file per image file (e.g., 1.xml for image file 1.png). The xml file includes the bounding box annotations for all objects in the corresponding image.

3) ImagesSets: Contains two text files, training_samples.txt and testing_samples.txt. The training_samples.txt file has the names of the images used in training and the testing_samples.txt file has the names of the images used for testing. (We randomly chose an 80%/20% split.)
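
Since the annotations are described as Pascal VOC style, reading one annotation file can be sketched in Python with the standard library. This assumes the files use the usual VOC tags (`object`, `name`, `bndbox`, `xmin`, `ymin`, `xmax`, `ymax`), which the description implies but does not spell out; the function name and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_text: str) -> list[dict]:
    """Extract (label, bounding box) pairs from one VOC-style annotation document."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        coords = tuple(
            int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append({"label": obj.findtext("name"), "box": coords})
    return objects


# Made-up annotation in the usual VOC layout, for illustration only.
SAMPLE_XML = """
<annotation>
  <filename>1.png</filename>
  <object>
    <name>person</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>backpack</name>
    <bndbox><xmin>40</xmin><ymin>60</ymin><xmax>90</xmax><ymax>130</ymax></bndbox>
  </object>
</annotation>
"""

if __name__ == "__main__":
    for item in parse_voc_annotation(SAMPLE_XML):
        print(item["label"], item["box"])
```

To process a real file, read its contents (for example with `pathlib.Path(...).read_text()`) and pass the text to `parse_voc_annotation`.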

UnAnnotated Data Description:

The un-annotated data consists of several sets of depth images. No ground-truth annotation is available for these images yet. These un-annotated sets contain several challenging scenarios, and no data was collected from this office during construction of the annotated dataset. Hence, they provide a way to test the generalization performance of an algorithm.

Citation:

If you use ODDS Smart Building dataset in your work, please cite the following reference in any publications:

@inproceedings{mithun2018odds,
  title={ODDS: Real-Time Object Detection using Depth Sensors on Embedded GPUs},
  author={Niluthpol Chowdhury Mithun and Sirajum Munir and Karen Guo and Charles Shelton},
  booktitle={ACM/IEEE Conference on Information Processing in Sensor Networks (IPSN)},
  year={2018}
}
ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of...
data.niaid.nih.gov
elki-project.github.io
+2 more
Updated May 2, 2024
Share
Cite
Zimek, Arthur (2024). ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of Object Images (ALOI) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6355683
Explore at:
Dataset updated
May 2, 2024
Dataset provided by
Schubert, Erich
Zimek, Arthur
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These data sets were originally created for the following publications:

M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek. Can Shared-Neighbor Distances Defeat the Curse of Dimensionality? In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.

H.-P. Kriegel, E. Schubert, A. Zimek. Evaluation of Multiple Clustering Solutions. In 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings, Held in Conjunction with ECML PKDD 2011, Athens, Greece, 2011.

The outlier data set versions were introduced in:

E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel. On Evaluation of Outlier Rankings and Outlier Scores. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.

They are derived from the original image data available at https://aloi.science.uva.nl/

The image acquisition process is documented in the original ALOI work: J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders, The Amsterdam library of object images, Int. J. Comput. Vision, 61(1), 103-112, January, 2005

Additional information is available at: https://elki-project.github.io/datasets/multi_view

The following views are currently available:

Each view is listed by feature type, followed by its description and files.

Object number: Sparse 1000 dimensional vectors that give the true object assignment. Files: objs.arff.gz

RGB color histograms: Standard RGB color histograms (uniform binning). Files: aloi-8d.csv.gz, aloi-27d.csv.gz, aloi-64d.csv.gz, aloi-125d.csv.gz, aloi-216d.csv.gz, aloi-343d.csv.gz, aloi-512d.csv.gz, aloi-729d.csv.gz, aloi-1000d.csv.gz

HSV color histograms: Standard HSV/HSB color histograms in various binnings. Files: aloi-hsb-2x2x2.csv.gz, aloi-hsb-3x3x3.csv.gz, aloi-hsb-4x4x4.csv.gz, aloi-hsb-5x5x5.csv.gz, aloi-hsb-6x6x6.csv.gz, aloi-hsb-7x7x7.csv.gz, aloi-hsb-7x2x2.csv.gz, aloi-hsb-7x3x3.csv.gz, aloi-hsb-14x3x3.csv.gz, aloi-hsb-8x4x4.csv.gz, aloi-hsb-9x5x5.csv.gz, aloi-hsb-13x4x4.csv.gz, aloi-hsb-14x5x5.csv.gz, aloi-hsb-10x6x6.csv.gz, aloi-hsb-14x6x6.csv.gz

Color similarity: Average similarity to 77 reference colors (not histograms): 18 colors x 2 sat x 2 bri + 5 grey values (incl. white, black). Files: aloi-colorsim77.arff.gz (feature subsets are meaningful here, as these features are computed independently of each other)

Haralick features: First 13 Haralick features (radius 1 pixel). Files: aloi-haralick-1.csv.gz

Front to back: Vectors representing front face vs. back faces of individual objects. Files: front.arff.gz

Basic light: Vectors indicating basic light situations. Files: light.arff.gz

Manual annotations: Manually annotated object groups of semantically related objects such as cups. Files: manual1.arff.gz
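
The RGB histogram dimensionalities in the file names (8, 27, 64, ..., 1000) are cubes of 2 through 10, which is what uniform binning with the same number of bins per channel produces. The Python sketch below illustrates that idea only; it is not the ELKI feature extraction code, the bin ordering and the use of raw counts (no normalization) are assumptions, and the function name and sample pixels are made up.

```python
def rgb_histogram(pixels, bins_per_channel: int) -> list[int]:
    """Count 8-bit RGB pixels into bins_per_channel**3 uniform bins."""
    n = bins_per_channel
    histogram = [0] * (n ** 3)
    for red, green, blue in pixels:
        # Map each 0..255 channel value to a bin index in 0..n-1.
        r_bin = red * n // 256
        g_bin = green * n // 256
        b_bin = blue * n // 256
        histogram[(r_bin * n + g_bin) * n + b_bin] += 1
    return histogram


if __name__ == "__main__":
    # Made-up pixels, for illustration only.
    sample_pixels = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (250, 10, 5)]
    hist = rgb_histogram(sample_pixels, 3)
    print(len(hist), sum(hist))
```

With 3 bins per channel the result has 27 entries, matching the dimensionality in a name such as aloi-27d.csv.gz.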

Outlier Detection Versions

Additionally, we generated a number of subsets for outlier detection:

Each subset is listed by feature type and description, followed by its files.

RGB Histograms, downsampled to 100000 objects (553 outliers). Files: aloi-27d-100000-max10-tot553.csv.gz, aloi-64d-100000-max10-tot553.csv.gz

RGB Histograms, downsampled to 75000 objects (717 outliers). Files: aloi-27d-75000-max4-tot717.csv.gz, aloi-64d-75000-max4-tot717.csv.gz

RGB Histograms, downsampled to 50000 objects (1508 outliers). Files: aloi-27d-50000-max5-tot1508.csv.gz, aloi-64d-50000-max5-tot1508.csv.gz
BUTTER - Empirical Deep Learning Dataset
data.openei.org
datasets.ai
+2 more
code, data, website
Updated May 20, 2022
Share
Cite
Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek (2022). BUTTER - Empirical Deep Learning Dataset [Dataset]. http://doi.org/10.25984/1872441
Explore at:
Available download formats: code, website, data
Unique identifier
https://doi.org/10.25984/1872441
Dataset updated
May 20, 2022
Dataset provided by
USDOE Office of Energy Efficiency and Renewable Energy (EERE), Multiple Programs (EE)
National Renewable Energy Laboratory
Open Energy Data Initiative (OEDI)
Authors
Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The BUTTER Empirical Deep Learning Dataset represents an empirical study of the deep learning phenomena on dense fully connected networks, scanning across thirteen datasets, eight network shapes, fourteen depths, twenty-three network sizes (number of trainable parameters), four learning rates, six minibatch sizes, four levels of label noise, and fourteen levels of L1 and L2 regularization each. Multiple repetitions (typically 30, sometimes 10) of each combination of hyperparameters were performed, and statistics including training and test loss (using an 80% / 20% shuffled train-test split) are recorded at the end of each training epoch. In total, this dataset covers 178 thousand distinct hyperparameter settings ("experiments"), 3.55 million individual training runs (an average of 20 repetitions of each experiment), and a total of 13.3 billion training epochs (three thousand epochs were covered by most runs). Accumulating this dataset consumed 5,448.4 CPU core-years, 17.8 GPU-years, and 111.2 node-years.
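
The 80% / 20% shuffled train-test split mentioned above can be sketched in a few lines of Python. This is a generic illustration, not the BUTTER project's code; the function name, the fixed seed, and the rounding of the test-set size are assumptions.

```python
import random


def shuffled_split(samples, test_fraction: float = 0.2, seed: int = 0):
    """Shuffle the samples with a seeded RNG and return (train, test) lists."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


if __name__ == "__main__":
    train, test = shuffled_split(range(10))
    print(len(train), len(test))
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without touching the global random state.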