100+ datasets found

d
Agency Data Coordinator Role Description
catalog.data.gov
data.oregon.gov
+1more
Updated Aug 7, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.oregon.gov (2021). Agency Data Coordinator Role Description [Dataset]. https://catalog.data.gov/dataset/agency-data-coordinator-role-description
Explore at:
Dataset updated
Aug 7, 2021
Dataset provided by
data.oregon.gov
Description
This document is a sample role description for the Agency Data Coordinator to assist agencies in selecting the appropriate individual to serve as Data Coordinator and in setting expectations for their duties.
d
Data Management Plan Examples Database
search.dataone.org
borealisdata.ca
Updated Sep 4, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Evering, Danica; Acharya, Shrey; Pratt, Isaac; Behal, Sarthak (2024). Data Management Plan Examples Database [Dataset]. http://doi.org/10.5683/SP3/SDITUG
Explore at:
Unique identifier
https://doi.org/10.5683/SP3/SDITUG
Dataset updated
Sep 4, 2024
Dataset provided by
Borealis
Authors
Evering, Danica; Acharya, Shrey; Pratt, Isaac; Behal, Sarthak
Time period covered
Jan 1, 2011 - Jan 1, 2023
Description
This dataset is comprised of a collection of example DMPs from a wide array of fields; obtained from a number of different sources outlined below. Data included/extracted from the examples include the discipline and field of study, author, institutional affiliation and funding information, location, date created, title, research and data-type, description of project, link to the DMP, and where possible external links to related publications or grant pages. This CSV document serves as the content for a McMaster Data Management Plan (DMP) Database as part of the Research Data Management (RDM) Services website, located at https://u.mcmaster.ca/dmps. Other universities and organizations are encouraged to link to the DMP Database or use this dataset as the content for their own DMP Database. This dataset will be updated regularly to include new additions and will be versioned as such. We are gathering submissions at https://u.mcmaster.ca/submit-a-dmp to continue to expand the collection.
Data Description for Risk of Dementia Due to Co-Exposure to Air Pollution...
catalog.data.gov
s.cnmilf.com
Updated May 13, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. EPA Office of Research and Development (ORD) (2024). Data Description for Risk of Dementia Due to Co-Exposure to Air Pollution and Neighborhood Disadvantage [Dataset]. https://catalog.data.gov/dataset/data-description-for-risk-of-dementia-due-to-co-exposure-to-air-pollution-and-neighborhood
Explore at:
Dataset updated
May 13, 2024
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
This dataset contains information on air pollution exposure, social determinants of health, dementia outcomes, and related confounders from the National Health and Aging Trends Study. This dataset is not publicly accessible because: This data was not produced by the EPA and is not owned by the EPA. It can be accessed through the following means: The data can be accessed by contacting the corresponding author listed in the manuscript. Format: The data is in tabular format with rows corresponding to observations and columns corresponding to outcomes, confounders, and exposures. This dataset is associated with the following publication: Frndak, S., Z. Deng, C. Ward-Caviness, I. Gorski-Steiner, R. Thorpe, and A. Dickerson. Risk of Dementia Due to Co-Exposure to Air Pollution and Neighborhood Disadvantage. ENVIRONMENTAL RESEARCH. Elsevier B.V., Amsterdam, NETHERLANDS, 251(Part 2): 118709, (2024).
D
Description of data management practices for SANDBOX research data
data.4tu.nl
4tu.edu.hpc.n-helix.com
zip
Updated May 23, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Erik Hendriks; Chiu H. Cheng; Bram van Prooijen (2022). Description of data management practices for SANDBOX research data [Dataset]. http://doi.org/10.4121/19786174.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.4121/19786174.v1
Dataset updated
May 23, 2022
Dataset provided by
4TU.ResearchData
Authors
Erik Hendriks; Chiu H. Cheng; Bram van Prooijen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jun 2017 - Oct 2017
Area covered

Dataset funded by
Dutch Research Council
Description
In the SANDBOX research project, we investigated the natural dynamics of the North Sea bed. As part of this research, we conducted multiple research cruises on the North Sea. The documents in this dataset explain which data was collected, when it was collected and the structure of the data repository (svn.citg.tudelft.nl/sandbox).
u
Data from: Current and projected research data storage needs of Agricultural...
agdatacommons.nal.usda.gov
datasets.ai
+4more
pdf
Updated Nov 30, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cynthia Parr (2023). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. http://doi.org/10.15482/USDA.ADC/1346946
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.15482/USDA.ADC/1346946
Dataset updated
Nov 30, 2023
Dataset provided by
Ag Data Commons
Authors
Cynthia Parr
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The USDA Agricultural Research Service (ARS) recently established SCINet , which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey.
Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

Resources in this dataset:Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdfResource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat,url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csvResource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is that same data as in the Excel spreadsheet (also provided).Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsxResource Description: MS Excel worksheet that Includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel
o
University SET data, with faculty and courses characteristics
openicpsr.org
Updated Sep 12, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Under blind review in refereed journal (2021). University SET data, with faculty and courses characteristics [Dataset]. http://doi.org/10.3886/E149801V1
Explore at:
Unique identifier
https://doi.org/10.3886/E149801V1
Dataset updated
Sep 12, 2021
Authors
Under blind review in refereed journal
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This paper explores a unique dataset of all the SET ratings provided by students of one university in Poland at the end of the winter semester of the 2020/2021 academic year. The SET questionnaire used by this university is presented in Appendix 1. The dataset is unique for several reasons. It covers all SET surveys filled by students in all fields and levels of study offered by the university. In the period analysed, the university was entirely in the online regime amid the Covid-19 pandemic. While the expected learning outcomes formally have not been changed, the online mode of study could have affected the grading policy and could have implications for some of the studied SET biases. This Covid-19 effect is captured by econometric models and discussed in the paper. The average SET scores were matched with the characteristics of the teacher for degree, seniority, gender, and SET scores in the past six semesters; the course characteristics for time of day, day of the week, course type, course breadth, class duration, and class size; the attributes of the SET survey responses as the percentage of students providing SET feedback; and the grades of the course for the mean, standard deviation, and percentage failed. Data on course grades are also available for the previous six semesters. This rich dataset allows many of the biases reported in the literature to be tested for and new hypotheses to be formulated, as presented in the introduction section. The unit of observation or the single row in the data set is identified by three parameters: teacher unique id (j), course unique id (k) and the question number in the SET questionnaire (n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9} ). It means that for each pair (j,k), we have nine rows, one for each SET survey question, or sometimes less when students did not answer one of the SET questions at all. For example, the dependent variable SET_score_avg(j,k,n) for the triplet (j=Calculus, k=John Smith, n=2) is calculated as the average of all Likert-scale answers to question nr 2 in the SET survey distributed to all students that took the Calculus course taught by John Smith. The data set has 8,015 such observations or rows. The full list of variables or columns in the data set included in the analysis is presented in the attached filesection. Their description refers to the triplet (teacher id = j, course id = k, question number = n). When the last value of the triplet (n) is dropped, it means that the variable takes the same values for all n ϵ {1, 2, 3, 4, 5, 6, 7, 8, 9}.Two attachments:- word file with variables description- Rdata file with the data set (for R language).Appendix 1. Appendix 1. The SET questionnaire was used for this paper. Evaluation survey of the teaching staff of [university name] Please, complete the following evaluation form, which aims to assess the lecturer’s performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5- I strongly agree; 4- I agree; 3- Neutral; 2- I don’t agree; 1- I strongly don’t agree. Questions 1 2 3 4 5 I learnt a lot during the course. ○ ○ ○ ○ ○ I think that the knowledge acquired during the course is very useful. ○ ○ ○ ○ ○ The professor used activities to make the class more engaging. ○ ○ ○ ○ ○ If it was possible, I would enroll for the course conducted by this lecturer again. ○ ○ ○ ○ ○ The classes started on time. ○ ○ ○ ○ ○ The lecturer always used time efficiently. ○ ○ ○ ○ ○ The lecturer delivered the class content in an understandable and efficient way. ○ ○ ○ ○ ○ The lecturer was available when we had doubts. ○ ○ ○ ○ ○ The lecturer treated all students equally regardless of their race, background and ethnicity. ○ ○
Drinking Water - Public Water System Information
data.ca.gov
data.cnra.ca.gov
+2more
csv, pdf
Updated Aug 22, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
California State Water Resources Control Board (2024). Drinking Water - Public Water System Information [Dataset]. https://data.ca.gov/dataset/drinking-water-public-water-system-information
Explore at:
csv(2814168), pdf(213048)Available download formats
Dataset updated
Aug 22, 2024
Dataset authored and provided by
California State Water Resources Control Board
Description
The is a curated dataset of information for all public water systems (PWS) in California, including the name, location and some general informaiton for each PWS. The source of the data https://sdwis.waterboards.ca.gov/PDWW/ is a public web portal to view public water systems (PWS) location, facilities, sources, and samples.
d
Major Object Descriptions
catalog.data.gov
opendata.hawaii.gov
+2more
Updated Apr 10, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hawaii (2024). Major Object Descriptions [Dataset]. https://catalog.data.gov/dataset/major-object-descriptions
Explore at:
Dataset updated
Apr 10, 2024
Dataset provided by
Hawaii
Description
for use with Expenditure data
Household Survey on Information and Communications Technology, 2014 - West...
pcbs.gov.ps
Updated Jan 28, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Palestinian Central Bureau of statistics (2020). Household Survey on Information and Communications Technology, 2014 - West Bank and Gaza [Dataset]. https://www.pcbs.gov.ps/PCBS-Metadata-en-v5.2/index.php/catalog/465
Explore at:
Dataset updated
Jan 28, 2020
Dataset provided by
Palestinian Central Bureau of Statisticshttp://pcbs.gov.ps/
Authors
Palestinian Central Bureau of statistics
Time period covered
2014
Area covered
Gaza, Gaza Strip, West Bank
Description
Abstract

Within the frame of PCBS' efforts in providing official Palestinian statistics in the different life aspects of Palestinian society and because the wide spread of Computer, Internet and Mobile Phone among the Palestinian people, and the important role they may play in spreading knowledge and culture and contribution in formulating the public opinion, PCBS conducted the Household Survey on Information and Communications Technology, 2014.

The main objective of this survey is to provide statistical data on Information and Communication Technology in the Palestine in addition to providing data on the following: -

· Prevalence of computers and access to the Internet. · Study the penetration and purpose of Technology use.

Geographic coverage

Palestine (West Bank and Gaza Strip) , type of locality (Urban, Rural, Refugee Camps) and governorate

Analysis unit

Household. Person 10 years and over .

Universe

All Palestinian households and individuals whose usual place of residence in Palestine with focus on persons aged 10 years and over in year 2014.

Kind of data

Sample survey data [ssd]

Sampling procedure

Sampling Frame The sampling frame consists of a list of enumeration areas adopted in the Population, Housing and Establishments Census of 2007. Each enumeration area has an average size of about 124 households. These were used in the first phase as Preliminary Sampling Units in the process of selecting the survey sample.

Sample Size The total sample size of the survey was 7,268 households, of which 6,000 responded.

Sample Design The sample is a stratified clustered systematic random sample. The design comprised three phases:

Phase I: Random sample of 240 enumeration areas. Phase II: Selection of 25 households from each enumeration area selected in phase one using systematic random selection. Phase III: Selection of an individual (10 years or more) in the field from the selected households; KISH TABLES were used to ensure indiscriminate selection.

Sample Strata Distribution of the sample was stratified by: 1- Governorate (16 governorates, J1). 2- Type of locality (urban, rural and camps).

Sampling deviation

-

Mode of data collection

Face-to-face [f2f]

Research instrument

The survey questionnaire consists of identification data, quality controls and three main sections: Section I: Data on household members that include identification fields, the characteristics of household members (demographic and social) such as the relationship of individuals to the head of household, sex, date of birth and age.

Section II: Household data include information regarding computer processing, access to the Internet, and possession of various media and computer equipment. This section includes information on topics related to the use of computer and Internet, as well as supervision by households of their children (5-17 years old) while using the computer and Internet, and protective measures taken by the household in the home.

Section III: Data on persons (aged 10 years and over) about computer use, access to the Internet and possession of a mobile phone.

Cleaning operations

Preparation of Data Entry Program: This stage included preparation of the data entry programs using an ACCESS package and defining data entry control rules to avoid errors, plus validation inquiries to examine the data after it had been captured electronically.

Data Entry: The data entry process started on 8 May 2014 and ended on 23 June 2014. The data entry took place at the main PCBS office and in field offices using 28 data clerks.

Editing and Cleaning procedures: Several measures were taken to avoid non-sampling errors. These included editing of questionnaires before data entry to check field errors, using a data entry application that does not allow mistakes during the process of data entry, and then examining the data by using frequency and cross tables. This ensured that data were error free; cleaning and inspection of the anomalous values were conducted to ensure harmony between the different questions on the questionnaire.

Response rate

Response Rates= 79%

Sampling error estimates

There are many aspects of the concept of data quality; this includes the initial planning of the survey to the dissemination of the results and how well users understand and use the data. There are three components to the quality of statistics: accuracy, comparability, and quality control procedures.

Checks on data accuracy cover many aspects of the survey and include statistical errors due to the use of a sample, non-statistical errors resulting from field workers or survey tools, and response rates and their effect on estimations. This section includes:

Statistical Errors Data of this survey may be affected by statistical errors due to the use of a sample and not a complete enumeration. Therefore, certain differences can be expected in comparison with the real values obtained through censuses. Variances were calculated for the most important indicators.

Variance calculations revealed that there is no problem in disseminating results nationally or regionally (the West Bank, Gaza Strip), but some indicators show high variance by governorate, as noted in the tables of the main report.

Non-Statistical Errors Non-statistical errors are possible at all stages of the project, during data collection or processing. These are referred to as non-response errors, response errors, interviewing errors and data entry errors. To avoid errors and reduce their effects, strenuous efforts were made to train the field workers intensively. They were trained on how to carry out the interview, what to discuss and what to avoid, and practical and theoretical training took place during the training course. Training manuals were provided for each section of the questionnaire, along with practical exercises in class and instructions on how to approach respondents to reduce refused cases. Data entry staff were trained on the data entry program, which was tested before starting the data entry process.

Several measures were taken to avoid non-sampling errors. These included editing of questionnaires before data entry to check field errors, using a data entry application that does not allow mistakes during the process of data entry, and then examining the data by using frequency and cross tables. This ensured that data were error free; cleaning and inspection of the anomalous values were conducted to ensure harmony between the different questions on the questionnaire.

The sources of non-statistical errors can be summarized as: 1. Some of the households were not at home and could not be interviewed, and some households refused to be interviewed. 2. In unique cases, errors occurred due to the way the questions were asked by interviewers and respondents misunderstood some of the questions.
d
Campaign Finance Summary
catalog.data.gov
data.wa.gov
Updated Mar 22, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.wa.gov (2025). Campaign Finance Summary [Dataset]. https://catalog.data.gov/dataset/campaign-finance-summary
Explore at:
Dataset updated
Mar 22, 2025
Dataset provided by
data.wa.gov
Description
This data set contains a summary of information about candidate campaigns and political committees by election year. For candidate campaigns and single-year/election committees, a single record is provided that covers all activity of the campaign for the given election year. Information for continuing political committees is summarized by calendar/reporting year. The data set covers that prior 16 years plus the current election year. The data are compiled from the campaign reports deposit (C3), campaign summary reports (C4), campaign registrations (C1/C1pc) and candidate declarations and elections data provided to the PDC by the Washington Secretary of State. Records are updated in near real-time, typically less than 2 minutes from the time the campaign submits new data. This dataset is a best-effort by the PDC to provide a complete set of records as described herewith. The PDC provides access to the original reports for the purpose of record verification. Descriptions attached to this dataset do not constitute legal definitions; please consult RCW 42.17A and WAC Title 390 for legal definitions and additional information regarding political finance disclosure requirements. CONDITION OF RELEASE: This publication and or referenced documents constitutes a list of individuals prepared by the Washington State Public Disclosure Commission and may not be used for commercial purposes. This list is provided on the condition and with the understanding that the persons receiving it agree to this statutorily imposed limitation on its use. See RCW 42.56.070(9) and AGO 1975 No. 15.
C
National Hydrography Data - NHD and 3DHP
data.cnra.ca.gov
data.ca.gov
+3more
Updated Oct 15, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
California Department of Water Resources (2024). National Hydrography Data - NHD and 3DHP [Dataset]. https://data.cnra.ca.gov/dataset/national-hydrography-dataset-nhd
Explore at:
pdf(1634485), pdf(9867020), pdf(182651), pdf(3684753), website, pdf(4856863), zip(578260992), pdf, zip(15824984), csv(12977), arcgis geoservices rest api, zip(10029073), zip(1647291), zip(972664), zip(128966494), pdf(1175775), zip(13901824), zip(73817620), zip(4657694), pdf(1436424), zip(39288832)Available download formats
Dataset updated
Oct 15, 2024
Dataset authored and provided by
California Department of Water Resources
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Description
The USGS National Hydrography Dataset (NHD) Downloadable Data Collection from The National Map (TNM) is a comprehensive set of digital spatial data that encodes information about naturally occurring and constructed bodies of surface water (lakes, ponds, and reservoirs), paths through which water flows (canals, ditches, streams, and rivers), and related entities such as point features (springs, wells, stream gages, and dams). The information encoded about these features includes classification and other characteristics, delineation, geographic name, position and related measures, a "reach code" through which other information can be related to the NHD, and the direction of water flow. The network of reach codes delineating water and transported material flow allows users to trace movement in upstream and downstream directions. In addition to this geographic information, the dataset contains metadata that supports the exchange of future updates and improvements to the data. The NHD supports many applications, such as making maps, geocoding observations, flow modeling, data maintenance, and stewardship. For additional information on NHD, go to https://www.usgs.gov/core-science-systems/ngp/national-hydrography.

DWR was the steward for NHD and Watershed Boundary Dataset (WBD) in California. We worked with other organizations to edit and improve NHD and WBD, using the business rules for California. California's NHD improvements were sent to USGS for incorporation into the national database. The most up-to-date products are accessible from the USGS website. Please note that the California portion of the National Hydrography Dataset is appropriate for use at the 1:24,000 scale.

For additional derivative products and resources, including the major features in geopackage format, please go to this page: https://data.cnra.ca.gov/dataset/nhd-major-features Archives of previous statewide extracts of the NHD going back to 2018 may be found at https://data.cnra.ca.gov/dataset/nhd-archive.

In September 2022, USGS officially notified DWR that the NHD would become static as USGS resources will be devoted to the transition to the new 3D Hydrography Program (3DHP). 3DHP will consist of LiDAR-derived hydrography at a higher resolution than NHD. Upon completion, 3DHP data will be easier to maintain, based on a modern data model and architecture, and better meet the requirements of users that were documented in the Hydrography Requirements and Benefits Study (2016). The initial releases of 3DHP will be the NHD data cross-walked into the 3DHP data model. It will take several years for the 3DHP to be built out for California. Please refer to the resources on this page for more information.

The FINAL,STATIC version of the National Hydrography Dataset for California was published for download by USGS on December 27, 2023. This dataset can no longer be edited by the state stewards.

The first public release of the 3D Hydrography Program map service may be accessed at https://hydro.nationalmap.gov/arcgis/rest/services/3DHP_all/MapServer.

Questions about the California stewardship of these datasets may be directed to nhd_stewardship@water.ca.gov.
Z
Condition Monitoring for Packaging Industry dataset (CoMoPI)
data.niaid.nih.gov
zenodo.org
Updated Feb 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gian Antonio Susto (2023). Condition Monitoring for Packaging Industry dataset (CoMoPI) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7572500
Explore at:
Dataset updated
Feb 2, 2023
Dataset provided by
Diego Tosato
Gian Antonio Susto
Chiara Masiero
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Condition Monitoring for Packaging Industry dataset (CoMoPI)

CoMoPI dataset contains data from eight industrial packaging machines:

serial

Starting Date

Ending Date

A001

2022-08-10 03:45:00+00:00

2023-01-09 07:00:00+00:00

A005

2022-09-19 13:15:00+00:00

2023-01-09 15:45:00+00:00

B002

2022-07-14 14:00:00+00:00

2023-01-09 15:50:00+00:00

B005

2022-10-04 12:55:00+00:00

2023-01-05 01:55:00+00:00

C003

2022-12-15 06:50:00+00:00

2023-01-03 16:35:00+00:00

C004

2022-07-18 07:55:00+00:00

2022-12-28 08:10:00+00:00

E002

2022-07-14 14:20:00+00:00

2022-12-03 14:00:00+00:00

E004

2022-11-23 15:25:00+00:00

2023-01-09 10:00:00+00:00

The dataset provides sensor measurements related to a specific module involved in the watertight closure of packages. It also provides all alarms and warnings generated by the packaging equipment.

Dataset description

For the sake of simplicity, the dataset is divided into three files, one for sensor measurements, one for alarms and one for warnings.

Sensor measurements

This data is included in the file called industrial_dataset_sensors_10m_agg.csv. Each row corresponds to average sensor value in a 10-minute window. Beyond the machine identifier ('_serial') and timestamp ('_time'), the following measurements are available: 'AE', 'BE', 'AF', 'BF', 'APP', 'BPP', 'AP', 'BP', 'ALE', 'BLE', 'ALP', 'BLP', 'ADS', 'BDS', 'AES', 'BES'. Suffixes A_ and B_ relate to the two elements that can perform a specific operation required by the watertight closure of packages. For confidentiality reasons, it is not possible to provide further details.

Alarm measurements

This data is included in the file called industrial_dataset_alarm_10m_agg.csv, according to the following schema:

'_serial' : 'machine unique identifier'

'_time' : 'timestamp',

'AL_1' : 'Counter of AL_1 alarms' ,

...

'AL_123' : 'Counter of AL_123 alarms'

Warning measurements

This data is included in the file called industrial_dataset_warnings_10m_agg.csv_, according to the following schema:

'_serial' : 'machine unique identifier'

'_time' : 'timestamp',

'WR_1' : 'Counter of AL_1 alarms' ,

...

'WR_358' : 'Counter of WR_358 alarms'

Condition monitoring

Alarms 'AL_53' are 'AL_54' are related to faulty conditions of the components and can be used as prediction target. The following alarms are generated by the target module: 'AL_17', 'AL_18', 'AL_40', 'AL_41', 'AL_42', 'AL_43', 'AL_45', 'AL_46', 'AL_47', 'AL_48', 'AL_49', 'AL_50', 'AL_51', 'AL_52', 'AL_53', 'AL_54'.

Anonymization procedure

The anonymization procedure is the following:

To make it impossible to identify the specific piece of equipment, its serial number was replaced with a mock equipment_ID. Moreover, no additional information is provided about the machine, its location and the processed products.

Sensor measurements were renamed and rescaled to remove all information related to actual working setup.

Alarms and warnings underwent an anonymization process: no description is provided, and the original alarm codes were mapped to mock codes.
d
3.34 Community Health and Well-Being (summary)
catalog.data.gov
performance.tempe.gov
+6more
Updated Jan 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
City of Tempe (2025). 3.34 Community Health and Well-Being (summary) [Dataset]. https://catalog.data.gov/dataset/3-34-community-health-and-well-being-summary
Explore at:
Dataset updated
Jan 17, 2025
Dataset provided by
City of Tempe
Description
This dataset comes from the Community Survey questions relating to the Community Health & Well-Being performance measure: "With “10” representing the best possible life for you and “0” representing the worst, how would you say you personally feel you stand at this time?" and "With “10” representing the best possible life for you and “0” representing the worst, how do you think you will stand about five years from now?" – the results of both scores are then used to assess a Cantril Scale which is a way of assessing general life satisfaction. As per the Cantril Self-Anchoring Striving Scale the three categories of identification are as follows: Thriving – Respondents rate their current life as a 7 or higher AND their future life as an 8 or higher. Struggling – Respondents either rate their current life moderately (5 or 6) OR rate their future life moderately (5, 6 or 7) or negatively (0 to 4). Suffering – Respondents rate their current life negatively (0 to 4) AND their future life negatively (0 to 4). The survey is mailed to a random sample of households in the City of Tempe and has a 95% confidence level.This page provides data for the Community Health and Well-Being performance measure.The performance measure dashboard is available at 3.34 Community Health and Well-Being.Additional InformationSource: Community Attitude Survey (Vendor: ETC Institute)Contact: Adam SamuelsContact email: adam_samuels@tempe.govPreparation Method: Survey results from two questions are calculated to create a Cantril Scale value that falls into the categories of Thriving, Struggling, and Suffering.Publish Frequency: AnnuallyPublish Method: ManualData Dictionary
d
Data from: A dataset containing S&P500 information security breaches and...
b2find.dkrz.de
dataverse.nl
Updated Jul 8, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). A dataset containing S&P500 information security breaches and related financial firm performances [Dataset]. https://b2find.dkrz.de/dataset/3dda16ee-4e98-570f-8db0-53837a6b899c
Explore at:
Dataset updated
Jul 8, 2024
Description
In this document, comprehensive datasets are presented to advance research on information security breaches. The datasets include data on disclosed information security breaches affecting S&P500 companies between 2020 and 2023, collected through manual search of the Internet. Overall, the datasets include 504 companies, with detailed information security breach and financial data available for 97 firms that experienced a disclosed information security breach. This document will describe the datasets in detail, explain the data collection procedure and shows the initial versions of the datasets. Contact at Tilburg University Francesco Lelli Data files: 6 raw Microsoft Excel files (.xls) Supplemental material: Data_Publication_Package.pdf Detailed description of the data has been released in the following preprint: [Preprint in progress] Structure data package The folder contains the 6 .xls documents, the data publication package. Link to the preprint describing the dataset is in the description of the dataset itself. The six .xls documents are also present in their preferred file format csv (see Notes for further explanation). Production date: 01-2024---- 05-2024 Method: Data on information security breaches through manual search of the Internet, financial data through Refinitiv (LSEG). (Approval obtained from Refinitiv to publish these data) Universe: S&P500 companies Country / Nation: USA
Small Molecule-Protein Interaction Data
kaggle.com
zip
Updated Apr 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Indranil Bhattacharyya (2024). Small Molecule-Protein Interaction Data [Dataset]. https://www.kaggle.com/datasets/photon98/leash-bio-engineered-data-training
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Apr 19, 2024
Authors
Indranil Bhattacharyya
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
About the Dataset and How I augmented the data:

The dataset used in this augmentation process(used a subset of the original training data) is sourced from the Leash Bio - Predict New Medicines with BELKA competition(Read More). It comprises examples of small molecules categorized through binary classification, determining whether each molecule is a binder to one of three protein targets. The data collection method involves utilizing DNA-encoded chemical library (DEL) technology.

Chemical representations are expressed in SMILES (Simplified Molecular-Input Line-Entry System), while the labels denote binary binding classifications, corresponding to three distinct protein targets.

I've expanded the original dataset by augmenting it with additional features derived from the existing data. Specifically, I've calculated and included three new features:

mol_wt (Molecular Weight): Calculated based on the SMILES data using RDKit, providing insight into the mass of each molecule.

logP (Partition Coefficient): Also derived from the SMILES data using RDKit, representing the logarithm of the partition coefficient, a measure of a molecule's hydrophobicity and its ability to partition between a hydrophobic solvent and water.

rotamers (Number of Rotamers): Determined from the SMILES data using RDKit, indicating the number of distinct conformations or rotational isomers a molecule can adopt. These additional features aim to enrich the feature matrix, potentially enhancing the predictive power of models trained on the augmented dataset.

Data Description:

id- A unique example_id we use to identify the molecule-binding target pair. buildingblock1_smiles - The structure, in SMILES, of the first building block **buildingblock2_smiles **- The structure, in SMILES, of the second building block buildingblock3_smiles - The structure, in SMILES, of the third building block **molecule_smiles **- The structure of the fully assembled molecule, in SMILES. This includes the three building blocks and the triazine core. Note we use a [Dy] as the stand-in for the DNA linker. protein_name - The protein target name binds - The target column. A binary class label of whether the molecule binds to the protein. Not available for the test set. mol_wt - The molecule's molecular weight derived from SMILES data using RDKit. logP - The logP of the molecule derived from SMILES data using RDKit. **rotamers **- The number of rotamers of the molecule derived from SMILES data using RDKit.

Targets: binds

Proteins are encoded in the genome, and names of the genes encoding those proteins are typically bestowed by their discoverers and regulated by the Hugo Gene Nomenclature Committee. The protein products of these genes can sometimes have different names, often due to the history of their discovery.
n
InFORM Fire Occurrence Data Records - Dataset - CKAN
nationaldataplatform.org
Updated Feb 28, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). InFORM Fire Occurrence Data Records - Dataset - CKAN [Dataset]. https://nationaldataplatform.org/catalog/dataset/inform-fire-occurrence-data-records
Explore at:
Dataset updated
Feb 28, 2024
Description
This data set is part of an ongoing project to consolidate interagency fire perimeter data. The record is complete from the present back to 2020. The incorporation of all available historic data is in progress.The InFORM (Interagency Fire Occurrence Reporting Modules) FODR (Fire Occurrence Data Records) are the official record of fire events. Built on top of IRWIN (Integrated Reporting of Wildland Fire Information), the FODR starts with an IRWIN record and then captures the final incident information upon certification of the record by the appropriate local authority. This service contains all wildland fire incidents from the InFORM FODR incident service that meet the following criteria:Categorized as a Wildfire (WF) or Prescribed Fire (RX) recordIs Valid and not "quarantined" due to potential conflicts with other recordsNo "fall-off" rules are applied to this service.Service is a real time display of data.Warning: Please refrain from repeatedly querying the service using a relative date range. This includes using the “(not) in the last” operators in a Web Map filter and any reference to CURRENT_TIMESTAMP. This type of query puts undue load on the service and may render it temporarily unavailable.Attributes:ABCDMiscA FireCode used by USDA FS to track and compile cost information for emergency initial attack fire suppression expenditures. for A, B, C & D size class fires on FS lands.ADSPermissionStateIndicates the permission hierarchy that is currently being applied when a system utilizes the UpdateIncident operation.CalculatedAcresA measure of acres calculated (i.e., infrared) from a geospatial perimeter of a fire. More specifically, the number of acres within the current perimeter of a specific, individual incident, including unburned and unburnable islands. The minimum size must be 0.1.ContainmentDateTimeThe date and time a wildfire was declared contained. ControlDateTimeThe date and time a wildfire was declared under control.CreatedBySystemArcGIS Server Username of system that created the IRWIN Incident record.CreatedOnDateTimeDate/time that the Incident record was created.IncidentSizeReported for a fire. The minimum size is 0.1.DiscoveryAcresAn estimate of acres burning upon the discovery of the fire. More specifically when the fire is first reported by the first person that calls in the fire. The estimate should include number of acres within the current perimeter of a specific, individual incident, including unburned and unburnable islands.DispatchCenterIDA unique identifier for a dispatch center responsible for supporting the incident.EstimatedCostToDateThe total estimated cost of the incident to date.FinalAcresReported final acreage of incident.FinalFireReportApprovedByTitleThe title of the person that approved the final fire report for the incident.FinalFireReportApprovedByUnitNWCG Unit ID associated with the individual who approved the final report for the incident.FinalFireReportApprovedDateThe date that the final fire report was approved for the incident.FireBehaviorGeneralA general category describing the manner in which the fire is currently reacting to the influences of fuel, weather, and topography. FireCodeA code used within the interagency wildland fire community to track and compile cost information for emergency fire suppression expenditures for the incident. FireDepartmentIDThe U.S. Fire Administration (USFA) has created a national database of Fire Departments. Most Fire Departments do not have an NWCG Unit ID and so it is the intent of the IRWIN team to create a new field that includes this data element to assist the National Association of State Foresters (NASF) with data collection.FireDiscoveryDateTimeThe date and time a fire was reported as discovered or confirmed to exist. May also be the start date for reporting purposes.FireMgmtComplexityThe highest management level utilized to manage a wildland fire event. FireOutDateTimeThe date and time when a fire is declared out. FSJobCodeA code use to indicate the Forest Service job accounting code for the incident. This is specific to the Forest Service. Usually displayed as 2 char prefix on FireCode.FSOverrideCodeA code used to indicate the Forest Service override code for the incident. This is specific to the Forest Service. Usually displayed as a 4 char suffix on FireCode. For example, if the FS is assisting DOI, an override of 1502 will be used.GACCA code that identifies one of the wildland fire geographic area coordination center at the point of origin for the incident.A geographic area coordination center is a facility that is used for the coordination of agency or jurisdictional resources in support of one or more incidents within a geographic coordination area.IncidentNameThe name assigned to an incident.IncidentShortDescriptionGeneral descriptive location of the incident such as the number of miles from an identifiable town. IncidentTypeCategoryThe Event Category is a sub-group of the Event Kind code and description. The Event Category further breaks down the Event Kind into more specific event categories.IncidentTypeKindA general, high-level code and description of the types of incidents and planned events to which the interagency wildland fire community responds.InitialLatitudeThe latitude location of the initial reported point of origin specified in decimal degrees.InitialLongitudeThe longitude location of the initial reported point of origin specified in decimal degrees.InitialResponseDateTimeThe date/time of the initial response to the incident. More specifically when the IC arrives and performs initial size up. IsFireCauseInvestigatedIndicates if an investigation is underway or was completed to determine the cause of a fire.IsFSAssistedIndicates if the Forest Service provided assistance on an incident outside their jurisdiction.IsReimbursableIndicates the cost of an incident may be another agency’s responsibility.IsTrespassIndicates if the incident is a trespass claim or if a bill will be pursued.LocalIncidentIdentifierA number or code that uniquely identifies an incident for a particular local fire management organization within a particular calendar year.ModifiedBySystemArcGIS Server username of system that last modified the IRWIN Incident record.ModifiedOnDateTimeDate/time that the Incident record was last modified.PercentContainedIndicates the percent of incident area that is no longer active. Reference definition in fire line handbook when developing standard.POOCityThe closest city to the incident point of origin.POOCountyThe County Name identifying the county or equivalent entity at point of origin designated at the time of collection.POODispatchCenterIDA unique identifier for the dispatch center that intersects with the incident point of origin. POOFipsThe code which uniquely identifies counties and county equivalents. The first two digits are the FIPS State code and the last three are the county code within the state.POOJurisdictionalAgencyThe agency having land and resource management responsibility for a incident as provided by federal, state or local law. POOJurisdictionalUnitNWCG Unit Identifier to identify the unit with jurisdiction for the land where the point of origin of a fire falls. POOJurisdictionalUnitParentUnitThe unit ID for the parent entity, such as a BLM State Office or USFS Regional Office, that resides over the Jurisdictional Unit.POOLandownerCategoryMore specific classification of land ownership within land owner kinds identifying the deeded owner at the point of origin at the time of the incident.POOLandownerKindBroad classification of land ownership identifying the deeded owner at the point of origin at the time of the incident.POOProtectingAgencyIndicates the agency that has protection responsibility at the point of origin.POOProtectingUnitNWCG Unit responsible for providing direct incident management and services to a an incident pursuant to its jurisdictional responsibility or as specified by law, contract or agreement. Definition Extension: - Protection can be re-assigned by agreement. - The nature and extent of the incident determines protection (for example Wildfire vs. All Hazard.)POOStateThe State alpha code identifying the state or equivalent entity at point of origin.PredominantFuelGroupThe fuel majority fuel model type that best represents fire behavior in the incident area, grouped into one of seven categories.PredominantFuelModelDescribes the type of fuels found within the majority of the incident area. UniqueFireIdentifierUnique identifier assigned to each wildland fire. yyyy = calendar year, SSUUUU = POO protecting unit identifier (5 or 6 characters), xxxxxx = local incident identifier (6 to 10 characters) FORIDUnique identifier assigned to each incident record in the FODR database.
d
Official Information - Dataset - data.govt.nz - discover and use data
catalogue.data.govt.nz
Updated Nov 10, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2020). Official Information - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/official-information
Explore at:
Dataset updated
Nov 10, 2020
License
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
Te Kawa Mataaho Public Service Commission has a lead role in providing advice and assistance to agencies on the management of official information and is committed to improving agency practices in this area. This includes improving compliance with the letter and spirit of the Official Information Act 1982 when requests are made and promoting the proactive release of information by agencies.
D
Freight Analysis Framework - All FAF summary datasets
data.transportation.gov
data.virginia.gov
+1more
application/rdfxml +5
Updated Dec 17, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2018). Freight Analysis Framework - All FAF summary datasets [Dataset]. https://data.transportation.gov/Roadways-and-Bridges/Freight-Analysis-Framework-All-FAF-summary-dataset/miub-cu89
Explore at:
application/rdfxml, xml, csv, json, tsv, application/rssxmlAvailable download formats
Dataset updated
Dec 17, 2018
Description
The Freight Analysis Framework (FAF) integrates data from a variety of sources to create a comprehensive picture of freight movement among states and major metropolitan areas by all modes of transportation. With data from the 2007 Commodity Flow Survey and additional sources, FAF version 3 (FAF3) provides estimates for tonnage, value, and domestic ton-miles by region of origin and destination, commodity type, and mode for 2007, the most recent year, and forecasts through 2040. Also included are state-to-state flows for these years plus 1997 and 2002, summary statistics, and flows by truck assigned to the highway network for 2007 and 2040.
S
Data from: Multi-sensor dataset for normal air, Methyl Mercaptan and...
snd.se
data.europa.eu
csv, pdf
Updated Apr 6, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mazhar Hussain (2023). Multi-sensor dataset for normal air, Methyl Mercaptan and Hydrogen Sulfide gas classification [Dataset]. http://doi.org/10.5878/7zs8-5611
Explore at:
pdf(211275), csv(29170241)Available download formats
Unique identifier
https://doi.org/10.5878/7zs8-5611
Dataset updated
Apr 6, 2023
Dataset provided by
Swedish National Data Service
Mid Sweden University
Authors
Mazhar Hussain
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset includes time-series data collected by four different sensors, which measure two target gases, Hydrogen Sulfide and Methyl Mercaptan, in the presence of air. To obtain measurements, each gas was individually exposed to the multi-sensor setup while in the presence of air. The dataset is particularly useful for gas classification tasks, as deep learning and data fusion techniques can be applied to identify the target gases.

The dataset comprises time-series data collected by four sensors, which measure two target gases, Hydrogen Sulfide (H2S) and Methyl Mercaptan (CH3SH) in the presence of air. To obtain measurements, each gas was individually exposed to the multi-sensor setup, while maintaining room temperature. Table 2 in the description file presents the distribution of data samples for the target gases collected from the multi-sensor system against the true gas concentration in parts per million (ppm) at two different humidity levels. The dataset file is available in CSV format and contains 9 columns with a total of 654440x4 gas samples. The CSV file also includes additional information on temperature, humidity, and true concentrations of Hydrogen Sulfide and Methyl Mercaptan. Out of the 654440x4 samples, there are 151682x4 samples of Methyl Mercaptan, 126142x4 samples of Hydrogen Sulfide, and the remaining samples are normal air samples.

The dataset was originally published in DiVA and moved to SND in 2024.
w
Work Programme statistical summary: data to December 2013
gov.uk
Updated May 22, 2014
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Department for Work and Pensions (2014). Work Programme statistical summary: data to December 2013 [Dataset]. https://www.gov.uk/government/statistics/work-programme-statistical-summary-march-2014
Explore at:
Dataset updated
May 22, 2014
Dataset provided by
GOV.UK
Authors
Department for Work and Pensions
Description
This summary contains the latest Work Programme official statistics on referrals, attachments and validated job outcome and sustainment payments to 31 December 2013. The official statistics aim to reflect the Work Programme payment model and contractual agreements which are primarily based on sustained employment.

There is more information on the Work Programme statistics page.

Facebook

Twitter

Click to copy link

Link copied

Cite

data.oregon.gov (2021). Agency Data Coordinator Role Description [Dataset]. https://catalog.data.gov/dataset/agency-data-coordinator-role-description

Agency Data Coordinator Role Description

Explore at:

Dataset updated

Aug 7, 2021

Dataset provided by

data.oregon.gov

Description

This document is a sample role description for the Agency Data Coordinator to assist agencies in selecting the appropriate individual to serve as Data Coordinator and in setting expectations for their duties.

Clear search

Close search

Google apps

Main menu

Agency Data Coordinator Role Description

Data Management Plan Examples Database

Data Description for Risk of Dementia Due to Co-Exposure to Air Pollution...

Description of data management practices for SANDBOX research data

Data from: Current and projected research data storage needs of Agricultural...

University SET data, with faculty and courses characteristics

Drinking Water - Public Water System Information

Major Object Descriptions

Household Survey on Information and Communications Technology, 2014 - West...

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Sampling deviation

Mode of data collection

Research instrument

Cleaning operations

Response rate

Sampling error estimates

Campaign Finance Summary

National Hydrography Data - NHD and 3DHP

Condition Monitoring for Packaging Industry dataset (CoMoPI)

3.34 Community Health and Well-Being (summary)

Data from: A dataset containing S&P500 information security breaches and...

Small Molecule-Protein Interaction Data

About the Dataset and How I augmented the data:

Data Description:

Targets: binds

InFORM Fire Occurrence Data Records - Dataset - CKAN

Official Information - Dataset - data.govt.nz - discover and use data

Freight Analysis Framework - All FAF summary datasets

Data from: Multi-sensor dataset for normal air, Methyl Mercaptan and...

Work Programme statistical summary: data to December 2013

Agency Data Coordinator Role Description

Targets: `binds`