48 datasets found
  1. Data from: Pesticide Data Program (PDP)

    • agdatacommons.nal.usda.gov
    txt
    Updated Dec 2, 2025
    Cite
    U.S. Department of Agriculture (USDA), Agricultural Marketing Service (AMS) (2025). Pesticide Data Program (PDP) [Dataset]. http://doi.org/10.15482/USDA.ADC/1520764
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 2, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    U.S. Department of Agriculture (USDA), Agricultural Marketing Service (AMS)
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The Pesticide Data Program (PDP) is a national pesticide residue database program. Through cooperation with State agriculture departments and other Federal agencies, PDP manages the collection, analysis, data entry, and reporting of pesticide residues on agricultural commodities in the U.S. food supply, with an emphasis on commodities highly consumed by infants and children.

    This dataset provides information on where each tested sample was collected, where the product originated, what type of product it was, and what residues were found on it, for calendar years 1992 through 2023. The data can measure residues of individual compounds and classes of compounds, as well as provide information about the geographic distribution of the origin of samples from growers, packers, and distributors. The dataset also records where the samples were taken, which laboratory tested them, and all testing procedures (recorded per sample, so they can be linked to the compounds identified). A reference variable for each compound denotes the limit of detection for the pesticide/commodity pair (LOD variable), and the metadata includes EPA tolerance levels or action levels for each pesticide/commodity pair. The dataset will be updated on a continual basis, with a new resource data file added annually after the PDP calendar-year survey data is released.

    Resources in this dataset:

    - CSV Data Dictionary for PDP (PDP_DataDictionary.csv): machine-readable Comma Separated Values (CSV) data dictionary for the PDP Database Zip files. Defines variables for the sample-identity and analytical-results data tables/files. The ## characters in the table and text data file names refer to the 2-digit year of the PDP survey, such as 97 for 1997 or 01 for 2001. For details on table linking, see the PDF. Recommended software: Microsoft Excel, https://www.microsoft.com/en-us/microsoft-365/excel
    - Data dictionary for Pesticide Data Program (PDP DataDictionary.pdf): data dictionary for the PDP Database Zip files. Recommended software: Adobe Acrobat, https://www.adobe.com
    - PDP Database Zip Files, one per survey year from 1992 through 2023, each named YYYYPDPDatabase.zip (for example, 2023PDPDatabase.zip)
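    The data dictionary describes sample-identity and analytical-results tables that link per sample. A minimal sketch of that join, with hypothetical column names and invented rows (the real names and linking rules are in PDP_DataDictionary.csv and the PDF):

    ```python
    import io
    import pandas as pd

    # Hedged sketch of the sample-identity / analytical-results link. Column
    # names (SAMPLE_PK, COMMOD, PESTICIDE, CONCEN) are hypothetical stand-ins
    # for the real names defined in PDP_DataDictionary.csv.
    samples = pd.read_csv(io.StringIO(
        "SAMPLE_PK,COMMOD\n1,AP\n2,GR\n"))          # one row per tested sample
    results = pd.read_csv(io.StringIO(
        "SAMPLE_PK,PESTICIDE,CONCEN\n1,captan,0.02\n1,thiabendazole,0.11\n"))

    # A left join keeps samples with no detections (NaN in the result columns),
    # which matters when combining detections with the per-pair LOD variable.
    linked = samples.merge(results, on="SAMPLE_PK", how="left")
    print(len(linked))  # 3 rows: two detections for sample 1, none for sample 2
    ```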

  2. LScDC Word-Category RIG Matrix

    • figshare.le.ac.uk
    pdf
    Updated Apr 28, 2020
    Cite
    Neslihan Suzen (2020). LScDC Word-Category RIG Matrix [Dataset]. http://doi.org/10.25392/leicester.data.12133431.v2
    Explore at:
    Available download formats: pdf
    Dataset updated
    Apr 28, 2020
    Dataset provided by
    University of Leicester
    Authors
    Neslihan Suzen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LScDC Word-Category RIG Matrix. April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com). Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes.

    Getting Started

    This file describes the Word-Category RIG Matrix for the Leicester Scientific Corpus (LSC) [1], the procedure used to build the matrix, and the Leicester Scientific Thesaurus (LScT) with its construction process. The Word-Category RIG Matrix is a 103,998 by 252 matrix whose rows correspond to words of the Leicester Scientific Dictionary-Core (LScDC) [2] and whose columns correspond to 252 Web of Science (WoS) categories [3, 4, 5]. Each entry corresponds to a pair (category, word); its value is the Relative Information Gain (RIG) on the belonging of a text from the LSC to the category from observing the word in that text. The CSV file of the Word-Category RIG Matrix in the published archive carries two additional columns, the sum of RIGs over categories and the maximum of RIGs over categories, so the file ‘Word-Category RIG Matrix.csv’ contains 254 columns in total.

    The matrix was created for future research on quantifying meaning in scientific texts, under the assumption that words have scientifically specific meanings in subject categories and that this meaning can be estimated by the information gained about categories from observing a word. LScT (Leicester Scientific Thesaurus) is a scientific thesaurus of English comprising 5,000 words from the LScDC. We order the words of the LScDC by the sum of their RIGs over categories, that is, by their informativeness in the scientific corpus LSC; the meaningfulness of a word is thus evaluated by its average informativeness over the categories. The 5,000 most informative words were included in the thesaurus.

    Words as a Vector of Frequencies in WoS Categories

    Each word of the LScDC is represented as a vector of frequencies over WoS categories. Given the collection of LSC texts, each entry of the vector is the number of texts in the corresponding category that contain the word. Note that texts in a corpus do not necessarily belong to a single category; categories are not exclusive, especially in a corpus of scientific texts, where many studies are multidisciplinary. There are 252 WoS categories, and a text in the LSC is assigned to at least 1 and at most 6 of them. Using a binary notion of occurrence, we record the presence of a word in a category: the occurrence count of a word in a category is the number of LSC texts in that category containing the word. The collection of these vectors over all words and categories forms a table in which each entry corresponds to a pair (word, category). This table was built for the LScDC with the 252 WoS categories and is presented in the published archive alongside this file.

    Words as a Vector of Relative Information Gains Extracted for Categories

    This section introduces our representation of a word as a vector of relative information gains for categories, under the assumption that the meaning of a word can be quantified by the information it provides about categories. For each category, a function is defined on texts that takes the value 1 if the text belongs to the category, and 0 otherwise. For each word, a function is defined on texts that takes the value 1 if the word occurs in the text, and 0 otherwise. Consider the LSC as a probabilistic sample space (the space of equally probable elementary outcomes). For these Boolean random variables, the joint probability distribution, the entropy, and information gains are defined. The information gain about a category from a word is the amount of information obtained on the belonging of a text from the LSC to the category from observing the word in the text [6]. We use the Relative Information Gain (RIG), a normalised measure of information gain, which makes information gains comparable across categories. The calculations of entropy, information gain, and relative information gain are detailed in the README file in the published archive.

    For each word, we form a vector with one component per category, so each word is represented as a vector of relative information gains whose dimension is the number of categories. These vectors form the Word-Category RIG Matrix, in which each column corresponds to a category, each row to a word, and each entry is the relative information gain from the word to the category. A row vector represents the corresponding word as a vector of RIGs over categories; a column vector represents the RIGs of all words for an individual category, so for any given category, words can be ordered from most to least informative. Words can also be ordered globally by two criteria, the sum and the maximum of RIGs over categories, and the top n words in such a list can be considered the most informative words in scientific texts. For each word of the LScDC, RIGs in the 252 categories were calculated and the vectors formed; the Word-Category RIG Matrix for the LSC was then assembled, and the sum (S) and maximum (M) of RIGs over categories were appended as the last two columns. The resulting matrix can be found in the published archive.

    Leicester Scientific Thesaurus (LScT)

    The LScT is a list of 5,000 words from the LScDC [2]. Words of the LScDC are sorted in descending order by the sum (S) of RIGs over categories, and the top 5,000 words are selected for inclusion in the LScT. We consider these the most meaningful words in the scientific corpus: the meaningfulness of a word is evaluated by its average informativeness over the categories, and the resulting list is treated as a ‘thesaurus’ for science. The LScT, with the sum values, is provided as a CSV file in the published archive.

    The published archive contains the following files:

    1) Word_Category_RIG_Matrix.csv: a 103,998 by 254 matrix whose columns are the 252 WoS categories plus the sum (S) and maximum (M) of RIGs over categories (last two columns), and whose rows are the words of the LScDC. Each entry in the first 252 columns is the RIG from the word to the category. Words are ordered as in the LScDC.
    2) Word_Category_Frequency_Matrix.csv: a 103,998 by 252 matrix whose columns are the 252 WoS categories and whose rows are the words of the LScDC. Each entry is the number of texts in the category containing the word. Words are ordered as in the LScDC.
    3) LScT.csv: the words of the LScT with their sum (S) values.
    4) Text_No_in_Cat.csv: the number of texts in each category.
    5) Categories_in_Documents.csv: the WoS categories of each document of the LSC.
    6) README.txt: description of the Word-Category RIG Matrix, the Word-Category Frequency Matrix, and the LScT, with the procedures used to form them.
    7) README.pdf: same as 6, in PDF format.

    References

    [1] Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2
    [2] Suzen, Neslihan (2019): LScDC (Leicester Scientific Dictionary-Core). figshare. Dataset. https://doi.org/10.25392/leicester.data.9896579.v3
    [3] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
    [4] WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html
    [5] Suzen, N., Mirkes, E. M., & Gorban, A. N. (2019). LScDC-new large scientific dictionary. arXiv preprint arXiv:1912.06858.
    [6] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423.
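    The RIG construction described above follows directly from the definitions: texts are equiprobable outcomes, and the category and word indicators are Boolean random variables, with RIG = (H(C) - H(C|W)) / H(C). A toy sketch with invented indicator data (not from the LSC):

    ```python
    from collections import Counter
    from math import log2

    # C: "text belongs to the category"; W: "text contains the word".
    # Six equiprobable texts, indicators invented for illustration.
    in_cat   = [1, 1, 0, 0, 0, 0]
    has_word = [1, 1, 1, 0, 0, 0]

    def entropy(bits):
        """Shannon entropy of a Boolean sample, in bits."""
        n = len(bits)
        return -sum((c / n) * log2(c / n) for c in Counter(bits).values())

    def cond_entropy(c_bits, w_bits):
        """H(C|W): average entropy of C within each value of W."""
        n = len(c_bits)
        h = 0.0
        for w in (0, 1):
            group = [c for c, ww in zip(c_bits, w_bits) if ww == w]
            if group:
                h += (len(group) / n) * entropy(group)
        return h

    h_c = entropy(in_cat)
    rig = (h_c - cond_entropy(in_cat, has_word)) / h_c  # normalised to [0, 1]
    print(round(rig, 3))  # 0.5
    ```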

  3. Dictionary of Titles

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Apr 6, 2022
    Cite
    Shahad Althobaiti; Ahmad Alabdulkareem; Judy Hanwen Shen; Iyad Rahwan; Esteban Moro; Alex Rutherford (2022). Dictionary of Titles [Dataset]. http://doi.org/10.7910/DVN/DQW8IP
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 6, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Shahad Althobaiti; Ahmad Alabdulkareem; Judy Hanwen Shen; Iyad Rahwan; Esteban Moro; Alex Rutherford
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Hand-transcribed content from the United States Bureau of Labor Statistics Dictionary of Titles (DoT). The DoT is a record of occupations and a description of the tasks performed in each. Five editions exist, from 1939, 1949, 1965, 1977, and 1991. The DoT was replaced by O*NET, which provides structured data on jobs, workers, and their characteristics. Apart from the 1991 edition, however, the DoT data is not easily ingestible, existing only in scanned PDF documents, and attempts at Optical Character Recognition yielded low accuracy. For that reason we present here hand-transcribed textual data from these documents. Various data are available for each occupation, e.g. numerical codes, references to other occupations, and the free-text description. The data for each edition is therefore presented in 'long' format, with a variable number of lines per occupation and a blank line between occupations; consult the transcription instructions for more details. Structured metadata (see here) on occupations is also available for the 1965, 1977, and 1991 editions. For these editions, the metadata can be extracted from the numerical codes accompanying the occupational entries; the key for these codes is found in separate tables in the 1965 edition, which were also transcribed. The instructions provided to transcribers for this edition are also included in the repository. The original documents are freely available in PDF format (e.g. here). This data accompanies the paper 'Longitudinal Complex Dynamics of Labour Markets Reveal Increasing Polarisation' by Althobaiti et al.
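    The 'long' format described above, with occupations separated by a blank line, can be split into records with a few lines of code. A sketch with invented sample entries (the real codes and layout are defined by the transcription instructions):

    ```python
    # Invented sample in the blank-line-separated 'long' format; the real
    # files carry a variable number of lines per occupation.
    sample = """001.061-010
    ARCHITECT
    Plans and designs structures.

    001.061-014
    LANDSCAPE ARCHITECT
    Plans and designs land areas."""

    def split_records(text):
        """Return one list of lines per occupation, splitting on blank lines."""
        records, current = [], []
        for line in text.splitlines():
            if line.strip():
                current.append(line.strip())
            elif current:
                records.append(current)
                current = []
        if current:
            records.append(current)
        return records

    records = split_records(sample)
    print(len(records))   # 2 occupations
    print(records[0][1])  # ARCHITECT
    ```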

  4. Steam Dataset 2025: Multi-Modal Gaming Analytics

    • kaggle.com
    zip
    Updated Oct 7, 2025
    Cite
    CrainBramp (2025). Steam Dataset 2025: Multi-Modal Gaming Analytics [Dataset]. https://www.kaggle.com/datasets/crainbramp/steam-dataset-2025-multi-modal-gaming-analytics
    Explore at:
    Available download formats: zip (12478964226 bytes)
    Dataset updated
    Oct 7, 2025
    Authors
    CrainBramp
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Steam Dataset 2025: Multi-Modal Gaming Analytics Platform

    The first multi-modal Steam dataset with semantic search capabilities. 239,664 applications collected from official Steam Web APIs with PostgreSQL database architecture, vector embeddings for content discovery, and comprehensive review analytics.

    Made by a lifelong gamer for the gamer in all of us. Enjoy!🎮

    GitHub Repository https://github.com/vintagedon/steam-dataset-2025

    [Figure] 1024-dimensional game embeddings projected to 2D via UMAP reveal natural genre clustering in semantic space

    What Makes This Different

    Unlike traditional flat-file Steam datasets, this is built as an analytically-native database optimized for advanced data science workflows:

    ☑️ Semantic Search Ready - 1024-dimensional BGE-M3 embeddings enable content-based game discovery beyond keyword matching

    ☑️ Multi-Modal Architecture - PostgreSQL + JSONB + pgvector in unified database structure

    ☑️ Production Scale - 239K applications vs typical 6K-27K in existing datasets

    ☑️ Complete Review Corpus - 1,048,148 user reviews with sentiment and metadata

    ☑️ 28-Year Coverage - Platform evolution from 1997-2025

    ☑️ Publisher Networks - Developer and publisher relationship data for graph analysis

    ☑️ Complete Methodology & Infrastructure - Full work logs document every technical decision and challenge encountered. My API collection scripts, database schemas, and processing pipelines let you update the dataset, fork it for customized analysis, learn from real-world data engineering workflows, or critique and improve the methodology.

    [Figure] Market segmentation and pricing strategy analysis across top 10 genres

    What's Included

    Core Data (CSV Exports):
    - 239,664 Steam applications with complete metadata
    - 1,048,148 user reviews with scores and statistics
    - 13 normalized relational tables for pandas/SQL workflows
    - Genre classifications, pricing history, platform support
    - Hardware requirements (min/recommended specs)
    - Developer and publisher portfolios

    Advanced Features (PostgreSQL):
    - Full database dump with optimized indexes
    - JSONB storage preserving complete API responses
    - Materialized columns for sub-second query performance
    - Vector embeddings table (pgvector-ready)

    Documentation:
    - Complete data dictionary with field specifications
    - Database schema documentation
    - Collection methodology and validation reports

    Example Analysis: Published Notebooks (v1.0)

    Three comprehensive analysis notebooks demonstrate dataset capabilities. All notebooks render directly on GitHub with full visualizations and output:

    📊 Platform Evolution & Market Landscape

    View on GitHub | PDF Export
    28 years of Steam's growth, genre evolution, and pricing strategies.

    🔍 Semantic Game Discovery

    View on GitHub | PDF Export
    Content-based recommendations using vector embeddings across genre boundaries.

    🎯 The Semantic Fingerprint

    View on GitHub | PDF Export
    Genre prediction from game descriptions - demonstrates text analysis capabilities.

    Notebooks render with full output on GitHub. Kaggle-native versions planned for v1.1 release. CSV data exports included in dataset for immediate analysis.
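    Content-based discovery of the kind shown in the semantic search notebook reduces to nearest-neighbour lookup over embedding vectors. A toy sketch with invented 4-dimensional vectors (the dataset's real embeddings are 1024-dimensional BGE-M3 vectors, normally queried through pgvector rather than in Python):

    ```python
    from math import sqrt

    # Invented stand-ins for game description embeddings.
    games = {
        "space_shooter_a": [0.9, 0.1, 0.0, 0.1],
        "space_shooter_b": [0.8, 0.2, 0.1, 0.0],
        "farming_sim":     [0.0, 0.1, 0.9, 0.3],
    }

    def cosine(u, v):
        """Cosine similarity between two vectors."""
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

    # Recommend the nearest neighbour of a query game, excluding itself.
    query = games["space_shooter_a"]
    best = max((g for g in games if g != "space_shooter_a"),
               key=lambda g: cosine(query, games[g]))
    print(best)  # space_shooter_b
    ```

    pgvector performs the same ranking server-side with its distance operators over an indexed embeddings column, which is what makes the search usable at 239K-application scale.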

    [Figure] Steam platfor...

  5. ESS-DIVE Reporting Format for Dataset Package Metadata

    • search.dataone.org
    • knb.ecoinformatics.org
    Updated Jun 11, 2022
    Cite
    Deb Agarwal; Shreyas Cholia; Valerie C. Hendrix; Robert Crystal-Ornelas; Cory Snavely; Joan Damerow; Charuleka Varadharajan (2022). ESS-DIVE Reporting Format for Dataset Package Metadata [Dataset]. http://doi.org/10.15485/1866026
    Explore at:
    Dataset updated
    Jun 11, 2022
    Dataset provided by
    ESS-DIVE
    Authors
    Deb Agarwal; Shreyas Cholia; Valerie C. Hendrix; Robert Crystal-Ornelas; Cory Snavely; Joan Damerow; Charuleka Varadharajan
    Time period covered
    Jan 1, 2017
    Description

    ESS-DIVE’s (Environmental Systems Science Data Infrastructure for a Virtual Ecosystem) dataset metadata reporting format is intended to compile information about a dataset (e.g., title, description, funding sources) that can enable reuse of data submitted to the ESS-DIVE data repository. The files in this dataset include instructions (dataset_metadata_guide.md and README.md) that explain the types of metadata ESS-DIVE collects. The data dictionary (dd.csv) follows ESS-DIVE’s file-level metadata reporting format and includes brief descriptions of each element of the dataset metadata reporting format. This dataset also includes a terminology crosswalk (dataset_metadata_crosswalk.csv) that shows how ESS-DIVE’s metadata reporting format maps onto other existing metadata standards and reporting formats. Data contributors to ESS-DIVE can provide this metadata by manual entry using a web form or programmatically via ESS-DIVE’s API (Application Programming Interface). A metadata template (dataset_metadata_template.docx or dataset_metadata_template.pdf) can be used to collaboratively compile metadata before providing it to ESS-DIVE. Since being incorporated into ESS-DIVE’s data submission user interface, the dataset metadata reporting format has enabled features like automated metadata quality checks and dissemination of ESS-DIVE datasets to other data platforms, including Google Dataset Search and DataCite.
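    A terminology crosswalk like dataset_metadata_crosswalk.csv lends itself to a simple lookup table for translating element names between standards. A hedged sketch with hypothetical column names and rows (the real columns and element names are defined by the reporting format files, not reproduced here):

    ```python
    import csv
    import io

    # Hypothetical stand-in for dataset_metadata_crosswalk.csv: map ESS-DIVE
    # metadata elements to a target standard's property names.
    crosswalk_csv = """essdive_element,datacite_property
    title,Title
    description,Description
    funding_sources,FundingReference
    """

    rows = list(csv.DictReader(io.StringIO(crosswalk_csv)))
    mapping = {r["essdive_element"]: r["datacite_property"] for r in rows}
    print(mapping["funding_sources"])  # FundingReference
    ```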

  6. Prescription Drugs Introduced to Market

    • data.ca.gov
    • data.chhs.ca.gov
    csv, xlsx, zip
    Updated Nov 7, 2025
    Cite
    Department of Health Care Access and Information (2025). Prescription Drugs Introduced to Market [Dataset]. https://data.ca.gov/dataset/prescription-drugs-introduced-to-market
    Explore at:
    Available download formats: csv, xlsx, zip
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    Department of Health Care Access and Information
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides data for new prescription drugs introduced to market in California with a Wholesale Acquisition Cost (WAC) that exceeds the Medicare Part D specialty drug cost threshold. Prescription drug manufacturers submit information to HCAI within a specified time period after a drug is introduced to market. Key data elements include the National Drug Code (NDC) administered by the FDA, a narrative description of marketing and pricing plans, and WAC, among other information. Manufacturers may withhold information that is not in the public domain. Note that prescription drug manufacturers are able to submit new drug reports for a prior quarter at any time. Therefore, the data set may include additional new drug report(s) from previous quarter(s).

    There are two types of New Drug data sets: Monthly and Annual. The Monthly data sets include the data in completed reports submitted by manufacturers for calendar year 2025, as of November 7, 2025. The Annual data sets include data in completed reports submitted by manufacturers for the specified year. The data sets may include reports that do not meet the specified minimum thresholds for reporting.

    The program regulations are available here: https://hcai.ca.gov/wp-content/uploads/2024/03/CTRx-Regulations-Text.pdf

    The data format and file specifications are available here: https://hcai.ca.gov/wp-content/uploads/2024/03/Format-and-File-Specifications-version-2.0-ada.pdf

    DATA NOTES: Due to recent changes in Excel's capabilities, saving these files to .csv format is not recommended: when the .csv is imported back into Excel, the leading zeros in the NDC number column will be dropped. If you need a format other than .xlsx, use .txt.
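    The same leading-zero hazard applies outside Excel: NDC codes are identifiers, not numbers, so they should be read as text. A sketch in pandas with an invented sample row (the real column headers follow the HCAI file specifications):

    ```python
    import io
    import pandas as pd

    # Invented sample row; "ndc_number" is a hypothetical column name.
    csv_data = "ndc_number,wac\n00002143380,1250.00\n"

    as_number = pd.read_csv(io.StringIO(csv_data))  # inferred as int
    as_text = pd.read_csv(io.StringIO(csv_data), dtype={"ndc_number": str})

    print(as_number["ndc_number"][0])  # 2143380 -- leading zeros lost
    print(as_text["ndc_number"][0])    # 00002143380 -- preserved
    ```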

    Submitted reports that are still under review by HCAI are not included in these files.

    DATA UPDATES: Drug manufacturers may submit New Drug reports to HCAI for prescription drugs which were not initially reported when they were introduced to market. CTRx staff update the posted datasets monthly for current year data and as needed for previous years. Please check the 'Data last updated' date on each dataset page to ensure you are viewing the most current data.

  7. Personal Property Data Extract EOY19

    • hub.arcgis.com
    • dataold-stlcogis.opendata.arcgis.com
    Updated Feb 25, 2020
    Cite
    Saint Louis County GIS Service Center (2020). Personal Property Data Extract EOY19 [Dataset]. https://hub.arcgis.com/datasets/44b454338d9449f68dd64cd3c92343a8
    Explore at:
    Dataset updated
    Feb 25, 2020
    Dataset authored and provided by
    Saint Louis County GIS Service Center
    Description

    This is a comprehensive collection of tax and assessment data extracted at a specific time. The data is in CSV format. A data dictionary (pdf) and the current tax rate book (pdf) are also included.

  8. Assessment Real Estate Data Extract Preliminary Values 19

    • hub.arcgis.com
    • data.stlouisco.com
    Updated Mar 15, 2019
    Cite
    Saint Louis County GIS Service Center (2019). Assessment Real Estate Data Extract Preliminary Values 19 [Dataset]. https://hub.arcgis.com/datasets/5be155a36a3a44a0b222f155c64ffb29
    Explore at:
    Dataset updated
    Mar 15, 2019
    Dataset authored and provided by
    Saint Louis County GIS Service Center
    Description

    This is a comprehensive collection of tax and assessment data extracted at a specific time. The data is in CSV format. A data dictionary (pdf) and the current tax rate book (pdf) are also included.

  9. Data from: Cancer-Dictionary

    • kaggle.com
    zip
    Updated Apr 4, 2020
    Cite
    AnandaVelMurugan Chandra Mohan (2020). Cancer-Dictionary [Dataset]. https://www.kaggle.com/anandavelmuruganchandramohan/cancerdictionary
    Explore at:
    Available download formats: zip (11881 bytes)
    Dataset updated
    Apr 4, 2020
    Authors
    AnandaVelMurugan Chandra Mohan
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This is a dictionary of common medical terms related to cancer. It could be used in NLP (natural language processing) tasks related to cancer.

    Content

    There are two columns: the first contains the terminology and the second contains the description. For example, the term "adenocarcinoma" with its meaning "a cancer that grows in gland tissue" can be found in this dataset.

    Acknowledgements

    I sourced this list from https://www.petermac.org/sites/default/files/page/downloads/Peter_Mac_cancer_words_terms-A_to_Z_WEB.pdf. I used the "tabulizer" R package to extract the tabular contents from the PDF.

    Inspiration

    My father is a cancer survivor. I want the world to get better in terms of handling cancer and eradicating this deadly disease. I believe this dataset would help that cause.

  10. Data from: Traffic and Log Data Captured During a Cyber Defense Exercise

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jun 12, 2020
    Cite
    Daniel Tovarňák; Stanislav Špaček; Jan Vykopal (2020). Traffic and Log Data Captured During a Cyber Defense Exercise [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3746128
    Explore at:
    Dataset updated
    Jun 12, 2020
    Dataset provided by
    Masarykova Univerzita, Brno, CZ
    Authors
    Daniel Tovarňák; Stanislav Špaček; Jan Vykopal
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was acquired during Cyber Czech – a hands-on cyber defense exercise (Red Team/Blue Team) held in March 2019 at Masaryk University, Brno, Czech Republic. Network traffic flows and a high variety of event logs were captured in an exercise network deployed in the KYPO Cyber Range Platform.

    Contents

    The dataset covers two distinct time intervals, which correspond to the official schedule of the exercise. The timestamps provided below are in the ISO 8601 date format.

    Day 1, March 19, 2019

    Start: 2019-03-19T11:00:00.000000+01:00

    End: 2019-03-19T18:00:00.000000+01:00

    Day 2, March 20, 2019

    Start: 2019-03-20T08:00:00.000000+01:00

    End: 2019-03-20T15:30:00.000000+01:00

    The captured and collected data were normalized into three distinct event types and they are stored as structured JSON. The data are sorted by a timestamp, which represents the time they were observed. Each event type includes a raw payload ready for further processing and analysis. The description of the respective event types and the corresponding data files follows.

    cz.muni.csirt.IpfixEntry.tgz – an archive of IPFIX traffic flows enriched with an additional payload of parsed application protocols in raw JSON.

    cz.muni.csirt.SyslogEntry.tgz – an archive of Linux Syslog entries with the payload of corresponding text-based log messages.

    cz.muni.csirt.WinlogEntry.tgz – an archive of Windows Event Log entries with the payload of original events in raw XML.

    Each archive listed above includes a directory of the same name with the following four files, ready to be processed.

    data.json.gz – the actual data entries in a single gzipped JSON file.

    dictionary.yml – data dictionary for the entries.

    schema.ddl – data schema for Apache Spark analytics engine.

    schema.jsch – JSON schema for the entries.
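    As a sketch of how these files might be consumed, the gzipped data entries can be streamed without unpacking everything into memory. This is an illustrative example only: it assumes one JSON object per line and a top-level "timestamp" field, both of which are assumptions here – consult dictionary.yml and schema.jsch for the actual structure.

```python
import gzip
import json

def load_events(path, day_prefix="2019-03-19"):
    """Stream a gzipped JSON-lines file and keep events from one exercise day.

    Assumes one JSON object per line and a top-level "timestamp" field
    (hypothetical names; check dictionary.yml / schema.jsch for the real ones).
    """
    events = []
    # Open the gzip archive in text mode and parse each line as one event.
    with gzip.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            # Keep only events observed on the requested exercise day.
            if str(event.get("timestamp", "")).startswith(day_prefix):
                events.append(event)
    return events

# Hypothetical usage against one of the extracted archives:
# day1 = load_events("cz.muni.csirt.SyslogEntry/data.json.gz")
```

    Streaming line by line keeps memory use flat even for large captures; the schema.ddl file can be used instead if the analysis is done in Apache Spark.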

    Finally, the exercise network topology is described in the machine-readable NetJSON format and is part of an archive of auxiliary files – auxiliary-material.tgz – which includes the following.

    global-gateway-config.json – the network configuration of the global gateway in the NetJSON format.

    global-gateway-routing.json – the routing configuration of the global gateway in the NetJSON format.

    redteam-attack-schedule.{csv,odt} – the schedule of the Red Team attacks in CSV and ODT format. Source for Table 2.

    redteam-reserved-ip-ranges.{csv,odt} – the list of IP segments reserved for the Red Team in CSV and ODT format. Source for Table 1.

    topology.{json,pdf,png} – the topology of the complete Cyber Czech exercise network in the NetJSON, PDF and PNG format.

    topology-small.{pdf,png} – simplified topology in the PDF and PNG format. Source for Figure 1.
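    Because NetJSON is plain JSON, the topology files can be inspected with the standard library alone. A minimal sketch, assuming topology.json is a NetJSON NetworkGraph object with the usual nodes and links arrays:

```python
import json

def summarize_topology(path):
    """Count the nodes and links in a NetJSON NetworkGraph file."""
    with open(path, encoding="utf-8") as f:
        graph = json.load(f)
    # NetJSON NetworkGraph objects describe a network as "nodes" and "links".
    return {
        "type": graph.get("type"),
        "nodes": len(graph.get("nodes", [])),
        "links": len(graph.get("links", [])),
    }

# Hypothetical usage after extracting auxiliary-material.tgz:
# print(summarize_topology("topology.json"))
```

    The same loader works for the gateway configuration and routing files, since they are NetJSON as well.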

  11. Census of Population and Housing, 1980 [United States]: Census Software...

    • search.gesis.org
    Updated May 6, 2021
    + more versions
    Cite
    United States Department of Commerce. Bureau of the Census (2021). Census of Population and Housing, 1980 [United States]: Census Software Package (CENSPAC) Version 3.2 with STF4 Data Dictionaries - Archival Version [Dataset]. http://doi.org/10.3886/ICPSR07789
    Explore at:
    Dataset updated
    May 6, 2021
    Dataset provided by
    GESIS search
    ICPSR - Interuniversity Consortium for Political and Social Research
    Authors
    United States Department of Commerce. Bureau of the Census
    License

    https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de442109

    Area covered
    United States
    Description

    Abstract (en): This data collection contains the Census Software Package (CENSPAC), a generalized data retrieval system that the Census Bureau developed for use with its public use statistical data files. CENSPAC primarily provides processing capabilities for summary data files, but it also has some features that are applicable to microdata files. The software includes sample JCL for system installation, programs for system reconfiguration, source code for CENSPAC, and machine-readable data dictionaries for STF 1, STF 2, STF 3, and STF 4. 2006-01-12: All files were removed from dataset 19 and flagged as study-level files, so that they will accompany all downloads. (1) The codebook is provided by ICPSR as a Portable Document Format (PDF) file. The PDF file format was developed by Adobe Systems Incorporated and can be accessed using PDF reader software, such as the Adobe Acrobat Reader. Information on how to obtain a copy of the Acrobat Reader is provided on the ICPSR Web site. (2) Documentation is provided by the Bureau of the Census detailing the CENSPAC command language for file definition and report generation, the Census documentor for preparing file documentation, and information on system installation. (3) Version 3.2 of the Census Software Package consists of programs written in 1974 ANSI COBOL and requires 170K bytes of main memory, direct-access storage for dictionary files, and input and output devices. CENSPAC was developed on an IBM 370/168 VS, but is also operational under UNIVAC EXEC-8, IBM OS, IBM DOS, Burroughs 7700, CDC 7000, UNIVAC 90/80, Honeywell 6600, DEC 20, DEC VAX, and Apple II operating systems.

  12. Assessment Real Estate Data Extract EOY18

    • data.stlouisco.com
    • hub.arcgis.com
    • +5more
    Updated Mar 6, 2019
    Cite
    Saint Louis County GIS Service Center (2019). Assessment Real Estate Data Extract EOY18 [Dataset]. https://data.stlouisco.com/datasets/48f9a368831e4553ba504bc8d74cb352
    Explore at:
    Dataset updated
    Mar 6, 2019
    Dataset authored and provided by
    Saint Louis County GIS Service Center
    Description

    This is a comprehensive collection of tax and assessment data extracted at a specific point in time. The data are in CSV format. A data dictionary (PDF) and the current tax rate book (PDF) are also included.

  13. Stanford Mass Shootings in America (MSA)

    • kaggle.com
    zip
    Updated Oct 7, 2017
    Cite
    Carlos Paradis (2017). Stanford Mass Shootings in America (MSA) [Dataset]. https://www.kaggle.com/carlosparadis/stanford-msa
    Explore at:
    zip(708798 bytes)Available download formats
    Dataset updated
    Oct 7, 2017
    Authors
    Carlos Paradis
    Area covered
    Stanford, United States
    Description

    https://www.youtube.com/watch?v=A8syQeFtBKc

    Context

    The Stanford Mass Shootings in America (MSA) is a dataset released by the Stanford Geospatial Center under a Creative Commons Attribution 4.0 International license. While not an exhaustive collection of mass shootings, it is a high-quality dataset covering 1966 to 2016, with a well-defined methodology, definitions, and source URLs for user validation.

    This dataset can be used to validate other datasets, such as us-mass-shootings-last-50-years, which contains more recent data, or to conduct other analyses, since additional information is provided.

    Content

    This dataset contains data published by the MSA project from both its website and its GitHub account. The two sources differ only in data format (i.e., .csv versus .geojson for the data, and .csv versus .pdf for the dictionary).

    • mass_shooting_events_stanford_msa_release_06142016
      • Contains a non-exhaustive list of US mass shootings from 1966 to 2016 in both .csv and .geojson formats.
    • dictionary_stanford_msa_release_06142016
      • Contains the data dictionary in .csv and .pdf formats. Note the .pdf format provides an easier way to visualize sub-fields.

    Note that the data were reproduced here without any modification other than file renaming for clarity; the content is the same as in the source.
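    Since the two release files above differ only in format, a quick way to orient yourself in the .csv release is to read its header row. A minimal sketch that makes no assumption about what the actual column names are:

```python
import csv

def list_columns(csv_path):
    """Return the header row of a CSV file as a list of column names."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        # The first row of the release files is the header.
        return next(reader)

# Hypothetical usage against a local copy of the release file:
# columns = list_columns("mass_shooting_events_stanford_msa_release_06142016.csv")
```

    The returned names can then be checked against the data dictionary file shipped alongside the data.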

    The following sections are reproduced from the dataset creator's website. For more details, please see the source.

    Project background

    The Stanford Mass Shootings of America (MSA) data project began in 2012, in reaction to the mass shooting in Sandy Hook, CT. In our initial attempts to map this phenomenon, it was determined that no comprehensive collection of these incidents existed online. The Stanford Geospatial Center set out to create, as best we could, a single-point repository for as many mass shooting events as could be collected via online media. The result was the Stanford MSA.

    What the Stanford MSA is

    The Stanford MSA is a data aggregation effort. It is a curated set of spatial and temporal data about mass shootings in America, taken from online media sources. It is an attempt to facilitate research on gun violence in the US by making raw data more accessible.

    What the Stanford MSA is not

    The Stanford MSA is not a comprehensive, longitudinal research project. The data collected in the MSA are not investigated past the assessment for inclusion in the database. The MSA is not an attempt to answer specific questions about gun violence or gun laws.

    The Stanford Geospatial Center does not provide analysis or commentary on the contents of this database or any derivatives produced with it.

    Data collection methodology

    The information collected for the Stanford MSA is limited to online resources. An initial intensive investigation was completed, looking back over existing online reports to fill in the historic record going back to 1966. Contemporary records come in as new events occur and are cross-referenced against a number of online reporting sources. In general, a minimum of three corroborating sources is required to add a full record to the MSA (in many cases as many as six or seven sources were consulted). All sources for each event are listed in the database.

    Due to the time involved in vetting the details of any new incident, there is often a 2 to 4 week lag between a mass shooting event and its inclusion in the public release database.

    It is important to note that the records in the Stanford MSA span a period from well before the advent of online media reporting, through its infancy, to the modern era of web-based news and information resources. Researchers using this database need to be aware of the reporting bias these changes in technology present. A spike in incidents in recent years is likely due to increased online reporting and not necessarily indicative of the rate of mass shootings alone. Researchers should treat this database as a curated collection of quality-checked data regarding mass shootings, not an exhaustive research dataset in itself. Independent verification and analysis will be required to use these data to examine trends in mass shootings over time.

    Definition of Mass Shooting

    The definition of mass shooting used for the Stanford database is 3 or more shooting victims (not necessarily fatalities), not including the shooter. The shooting must not be identifiably gang, drug, or organized crime related.

    Acknowledgements

    The Stanford Mass Shootings in America (MSA) is a dataset released under [Creative Commons Attribution 4.0 int...

  14. City of Tempe 2022 Community Survey Data

    • catalog.data.gov
    • performance.tempe.gov
    • +10more
    Updated Sep 20, 2024
    + more versions
    Cite
    City of Tempe (2024). City of Tempe 2022 Community Survey Data [Dataset]. https://catalog.data.gov/dataset/city-of-tempe-2022-community-survey-data
    Explore at:
    Dataset updated
    Sep 20, 2024
    Dataset provided by
    City of Tempe
    Area covered
    Tempe
    Description

    Description and Purpose

    These data include the individual responses for the City of Tempe Annual Community Survey conducted by ETC Institute. These data help determine priorities for the community as part of the City's on-going strategic planning process. Averaged Community Survey results are used as indicators for several city performance measures. The summary data for each performance measure is provided as an open dataset for that measure (separate from this dataset). The performance measures with indicators from the survey include the following (as of 2022):

    1. Safe and Secure Communities
       • 1.04 Fire Services Satisfaction
       • 1.06 Crime Reporting
       • 1.07 Police Services Satisfaction
       • 1.09 Victim of Crime
       • 1.10 Worry About Being a Victim
       • 1.11 Feeling Safe in City Facilities
       • 1.23 Feeling of Safety in Parks
    2. Strong Community Connections
       • 2.02 Customer Service Satisfaction
       • 2.04 City Website Satisfaction
       • 2.05 Online Services Satisfaction Rate
       • 2.15 Feeling Invited to Participate in City Decisions
       • 2.21 Satisfaction with Availability of City Information
    3. Quality of Life
       • 3.16 City Recreation, Arts, and Cultural Centers
       • 3.17 Community Services Programs
       • 3.19 Value of Special Events
       • 3.23 Right of Way Landscape Maintenance
       • 3.36 Quality of City Services
    4. Sustainable Growth & Development
       • No performance measures in this category presently relate directly to the Community Survey
    5. Financial Stability & Vitality
       • No performance measures in this category presently relate directly to the Community Survey

    Methods

    The survey is mailed to a random sample of households in the City of Tempe. Follow-up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe, or who were not selected as part of the random sample, from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent’s address did not match, the response was not used. To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city. Additionally, demographic data were used to monitor the distribution of responses to ensure the responding population of each survey is representative of the city population.

    Processing and Limitations

    The location data in this dataset are generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When the data are shared with the city, only the latitude/longitude of the block-level address points are provided. This results in points that overlap. In order to better visualize the data, overlapping points were randomly dispersed to remove the overlap. The result of these two adjustments ensures that the points are not related to a specific address, but are still close enough to allow insights about service delivery in different areas of the city. This dataset is the weighted data provided by the ETC Institute, which is used in the final published PDF report. The 2022 Annual Community Survey report is available on data.tempe.gov. The individual survey questions, as well as the definition of the response scale (for example, 1 means “very dissatisfied” and 5 means “very satisfied”), are provided in the data dictionary.

    Additional Information

    Source: Community Attitude Survey
    Contact (author): Wydale Holmes
    Contact E-Mail (author): wydale_holmes@tempe.gov
    Contact (maintainer): Wydale Holmes
    Contact E-Mail (maintainer): wydale_holmes@tempe.gov
    Data Source Type: Excel table
    Preparation Method: Data received from vendor after report is completed
    Publish Frequency: Annual
    Publish Method: Manual
    Data Dictionary
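    The dispersion step described above can be illustrated in code. The following is a hypothetical reimplementation of the general idea – randomly offsetting duplicate block-level points so they no longer overlap – not the city's or ETC Institute's actual procedure, and the offset radius is an assumed value:

```python
import random

def disperse_points(points, radius=0.0005, seed=42):
    """Randomly offset duplicate lat/lon points so they no longer overlap.

    points: list of (lat, lon) tuples, possibly containing exact duplicates
    radius: maximum offset in degrees (an assumed value, for illustration only)
    """
    rng = random.Random(seed)
    seen = set()
    dispersed = []
    for lat, lon in points:
        if (lat, lon) in seen:
            # Duplicate of an earlier point: nudge it by a small random offset.
            lat += rng.uniform(-radius, radius)
            lon += rng.uniform(-radius, radius)
        else:
            seen.add((lat, lon))
        dispersed.append((lat, lon))
    return dispersed
```

    A fixed seed keeps the jitter reproducible between runs, and the small radius keeps each dispersed point near its original block-level location.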

  15. ⛈ NOAA Storm Events Database

    • kaggle.com
    zip
    Updated Aug 12, 2025
    Cite
    BwandoWando (2025). ⛈ NOAA Storm Events Database [Dataset]. https://www.kaggle.com/datasets/bwandowando/noaa-storm-events-database/code
    Explore at:
    zip(226678534 bytes)Available download formats
    Dataset updated
    Aug 12, 2025
    Authors
    BwandoWando
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context


    The database currently contains data from January 1950 to April 2025, as entered by NOAA's National Weather Service (NWS). Bulk data are available as comma-separated values (CSV) files, which can be viewed in Excel and other spreadsheet applications.

    https://www.ncdc.noaa.gov/stormevents/ftp.jsp
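    As a sketch of working with the bulk files once downloaded, a CSV can be streamed row by row and tallied by column. The column name used below is a placeholder – consult the Storm Events data dictionary for the actual headers:

```python
import csv
from collections import Counter

def count_by_column(csv_path, column):
    """Stream a CSV file and tally the distinct values in one column.

    Streaming with DictReader avoids loading a large bulk file into memory.
    """
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[row[column]] += 1
    return counts

# Hypothetical usage with an assumed column name:
# counts = count_by_column("storm_events.csv", "EVENT_TYPE")
```

    The same pattern works for any of the bulk files, whichever column you aggregate on.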

    Disclaimer

    Storm Data is an official publication of the National Oceanic and Atmospheric Administration (NOAA) which documents the occurrence of storms and other significant weather phenomena having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce. In addition, it is a partial record of other significant meteorological events, such as record maximum or minimum temperatures or precipitation that occurs in connection with another event. Some information appearing in Storm Data may be provided by or gathered from sources outside the National Weather Service (NWS), such as the media, law enforcement and/or other government agencies, private companies, individuals, etc. An effort is made to use the best available information but because of time and resource constraints, information from these sources may be unverified by the NWS. Therefore, when using information from Storm Data, customers should be cautious as the NWS does not guarantee the accuracy or validity of the information. Further, when it is apparent information appearing in Storm Data originated from a source outside the NWS (frequently credit is provided), Storm Data customers requiring additional information should contact that source directly. In most cases, NWS employees will not have the knowledge to respond to such requests. In cases of legal proceedings, Federal regulations generally prohibit NWS employees from appearing as witnesses in litigation not involving the United States.

    https://www.ncdc.noaa.gov/stormevents/faq.jsp

    Data Dictionary

    Please see Data Dictionary

    What is NOAA?

    The National Oceanic and Atmospheric Administration (abbreviated as NOAA /ˈnoʊ.ə/ NOH-ə) is a US scientific and regulatory agency charged with forecasting weather, monitoring oceanic and atmospheric conditions, charting the seas, conducting deep-sea exploration, and managing fishing and protection of marine mammals and endangered species in the US exclusive economic zone. The agency is part of the United States Department of Commerce and is headquartered in Silver Spring, Maryland.

    Contact NOAA

    https://www.ncei.noaa.gov/contact


  16. Rapid Creation of a Data Product for the World's Specimens of Horseshoe Bats...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 18, 2024
    Cite
    Mast, Austin R.; Paul, Deborah L.; Rios, Nelson; Bruhn, Robert; Dalton, Trevor; Krimmel, Erica R.; Pearson, Katelin D.; Sherman, Aja; Shorthouse, David P.; Simmons, Nancy B.; Soltis, Pam; Upham, Nathan; Abibou, Djihbrihou (2024). Rapid Creation of a Data Product for the World's Specimens of Horseshoe Bats and Relatives, a Known Reservoir for Coronaviruses [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3974999
    Explore at:
    Dataset updated
    Jul 18, 2024
    Dataset provided by
    University of Florida
    Yale University Peabody Museum of Natural History
    Agriculture and Agri-Food Canada
    American Museum of Natural History
    Florida State University
    Arizona State University
    Authors
    Mast, Austin R.; Paul, Deborah L.; Rios, Nelson; Bruhn, Robert; Dalton, Trevor; Krimmel, Erica R.; Pearson, Katelin D.; Sherman, Aja; Shorthouse, David P.; Simmons, Nancy B.; Soltis, Pam; Upham, Nathan; Abibou, Djihbrihou
    License

    https://creativecommons.org/licenses/publicdomain/

    Area covered
    World
    Description

    This repository is associated with NSF DBI 2033973, RAPID Grant: Rapid Creation of a Data Product for the World's Specimens of Horseshoe Bats and Relatives, a Known Reservoir for Coronaviruses (https://www.nsf.gov/awardsearch/showAward?AWD_ID=2033973). Specifically, this repository contains (1) raw data from iDigBio (http://portal.idigbio.org) and GBIF (https://www.gbif.org), (2) R code for reproducible data wrangling and improvement, (3) protocols associated with data enhancements, and (4) enhanced versions of the dataset published at various project milestones. Additional code associated with this grant can be found in the BIOSPEX repository (https://github.com/iDigBio/Biospex). Long-term data management of the enhanced specimen data created by this project is expected to be accomplished by the natural history collections curating the physical specimens, a list of which can be found in this Zenodo resource.

    Grant abstract: "The award to Florida State University will support research contributing to the development of georeferenced, vetted, and versioned data products of the world's specimens of horseshoe bats and their relatives for use by researchers studying the origins and spread of SARS-like coronaviruses, including the causative agent of COVID-19. Horseshoe bats and other closely related species are reported to be reservoirs of several SARS-like coronaviruses. Species of these bats are primarily distributed in regions where these viruses have been introduced to populations of humans. Currently, data associated with specimens of these bats are housed in natural history collections that are widely distributed both nationally and globally. Additionally, information tying these specimens to localities are mostly vague, or in many instances missing. This decreases the utility of the specimens for understanding the source, emergence, and distribution of SARS-COV-2 and similar viruses. This project will provide quality georeferenced data products through the consolidation of ancillary information linked to each bat specimen, using the extended specimen model. The resulting product will serve as a model of how data in biodiversity collections might be used to address emerging diseases of zoonotic origin. Results from the project will be disseminated widely in opensource journals, at scientific meetings, and via websites associated with the participating organizations and institutions. Support of this project provides a quality resource optimized to inform research relevant to improving our understanding of the biology and spread of SARS-CoV-2. 
    The overall objectives are to deliver versioned data products, in formats used by the wider research and biodiversity collections communities, through an open-access repository; project protocols and code, via GitHub and described in a peer-reviewed paper; and sustained engagement with biodiversity collections throughout the project for reintegration of improved data into their local specimen data management systems, improving long-term curation.

    This RAPID award will produce and deliver a georeferenced, vetted and consolidated data product for horseshoe bats and related species to facilitate understanding of the sources, distribution, and spread of SARS-CoV-2 and related viruses, a timely response to the ongoing global pandemic caused by SARS-CoV-2 and an important contribution to the global effort to consolidate and provide quality data that are relevant to understanding emergent and other properties the current pandemic. This RAPID award is made by the Division of Biological Infrastructure (DBI) using funds from the Coronavirus Aid, Relief, and Economic Security (CARES) Act.

    This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria."

    Files included in this resource

    9d4b9069-48c4-4212-90d8-4dd6f4b7f2a5.zip: Raw data from iDigBio, DwC-A format

    0067804-200613084148143.zip: Raw data from GBIF, DwC-A format

    0067806-200613084148143.zip: Raw data from GBIF, DwC-A format

    1623690110.zip: Full export of this project's data (enhanced and raw) from BIOSPEX, CSV format

    bionomia-datasets-attributions.zip: Directory containing 103 Frictionless Data packages for datasets that have attributions made containing Rhinolophids or Hipposiderids, each package also containing a CSV file for mismatches in person date of birth/death and specimen eventDate. File bionomia-datasets-attributions-key_2021-02-25.csv included in this directory provides a key between dataset identifier (how the Frictionless Data package files are named) and dataset name.

    bionomia-problem-dates-all-datasets_2021-02-25.csv: List of 21 Hipposiderid or Rhinolophid records whose eventDate or dateIdentified mismatches a wikidata recipient’s date of birth or death across all datasets.

    flagEventDate.txt: file containing term definition to reference in DwC-A

    flagExclude.txt: file containing term definition to reference in DwC-A

    flagGeoreference.txt: file containing term definition to reference in DwC-A

    flagTaxonomy.txt: file containing term definition to reference in DwC-A

    georeferencedByID.txt: file containing term definition to reference in DwC-A

    identifiedByNames.txt: file containing term definition to reference in DwC-A

    instructions-to-get-people-data-from-bionomia-via-datasetKey: instructions given to data providers

    RAPID-code_collection-date.R: code associated with enhancing collection dates

    RAPID-code_compile-deduplicate.R: code associated with compiling and deduplicating raw data

    RAPID-code_external-linkages-bold.R: code associated with enhancing external linkages

    RAPID-code_external-linkages-genbank.R: code associated with enhancing external linkages

    RAPID-code_external-linkages-standardize.R: code associated with enhancing external linkages

    RAPID-code_people.R: code associated with enhancing data about people

    RAPID-code_standardize-country.R: code associated with standardizing country data

    RAPID-data-dictionary.pdf: metadata about terms included in this project’s data, in PDF format

    RAPID-data-dictionary.xlsx: metadata about terms included in this project’s data, in spreadsheet format

    rapid-data-providers_2021-05-03.csv: list of data providers and number of records provided to rapid-joined-records_country-cleanup_2020-09-23.csv

    rapid-final-data-product_2021-06-29.zip: Enhanced data from BIOSPEX, DwC-A format

    rapid-final-gazetteer.zip: Gazetteer providing georeference data and metadata for 10,341 localities assessed as part of this project

    rapid-joined-records_country-cleanup_2020-09-23.csv: data product initial version where raw data has been compiled and deduplicated, and country data has been standardized

    RAPID-protocol_collection-date.pdf: protocol associated with enhancing collection dates

    RAPID-protocol_compile-deduplicate.pdf: protocol associated with compiling and deduplicating raw data

    RAPID-protocol_external-linkages.pdf: protocol associated with enhancing external linkages

    RAPID-protocol_georeference.pdf: protocol associated with georeferencing

    RAPID-protocol_people.pdf: protocol associated with enhancing data about people

    RAPID-protocol_standardize-country.pdf: protocol associated with standardizing country data

    RAPID-protocol_taxonomic-names.pdf: protocol associated with enhancing taxonomic name data

    RAPIDAgentStrings1_archivedCopy_30March2021.ods: resource used in conjunction with RAPID people protocol

    recordedByNames.txt: file containing term definition to reference in DwC-A

    Rhinolophid-HipposideridAgentStrings_and_People2_archivedCopy_30March2021.ods: resource used in conjunction with RAPID people protocol

    wikidata-notes-for-bat-collectors_leachman_2020: please see https://zenodo.org/record/4724139 for this resource

  17. City of Tempe 2023 Community Survey Data

    • catalog.data.gov
    • open.tempe.gov
    • +6more
    Updated Apr 12, 2024
    + more versions
    Cite
    City of Tempe (2024). City of Tempe 2023 Community Survey Data [Dataset]. https://catalog.data.gov/dataset/city-of-tempe-2023-community-survey-data
    Explore at:
    Dataset updated
    Apr 12, 2024
    Dataset provided by
    City of Tempe
    Area covered
    Tempe
    Description

    These data include the individual responses for the City of Tempe Annual Community Survey conducted by ETC Institute. This dataset has two layers and includes both the weighted data and the unweighted data. Weighting is a statistical method in which datasets are adjusted through calculations in order to more accurately represent the population being studied. The weighted data are used in the final published PDF report. These data help determine priorities for the community as part of the City's on-going strategic planning process. Averaged Community Survey results are used as indicators for several city performance measures. The summary data for each performance measure is provided as an open dataset for that measure (separate from this dataset). The performance measures with indicators from the survey include the following (as of 2023):

    1. Safe and Secure Communities
       • 1.04 Fire Services Satisfaction
       • 1.06 Crime Reporting
       • 1.07 Police Services Satisfaction
       • 1.09 Victim of Crime
       • 1.10 Worry About Being a Victim
       • 1.11 Feeling Safe in City Facilities
       • 1.23 Feeling of Safety in Parks
    2. Strong Community Connections
       • 2.02 Customer Service Satisfaction
       • 2.04 City Website Satisfaction
       • 2.05 Online Services Satisfaction Rate
       • 2.15 Feeling Invited to Participate in City Decisions
       • 2.21 Satisfaction with Availability of City Information
    3. Quality of Life
       • 3.16 City Recreation, Arts, and Cultural Centers
       • 3.17 Community Services Programs
       • 3.19 Value of Special Events
       • 3.23 Right of Way Landscape Maintenance
       • 3.36 Quality of City Services
    4. Sustainable Growth & Development
       • No performance measures in this category presently relate directly to the Community Survey
    5. Financial Stability & Vitality
       • No performance measures in this category presently relate directly to the Community Survey

    Methods

    The survey is mailed to a random sample of households in the City of Tempe. Follow-up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe, or who were not selected as part of the random sample, from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent’s address did not match, the response was not used. To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city. Additionally, demographic data were used to monitor the distribution of responses to ensure the responding population of each survey is representative of the city population.

    Processing and Limitations

    The location data in this dataset are generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When the data are shared with the city, only the latitude/longitude of the block-level address points are provided. This results in points that overlap. In order to better visualize the data, overlapping points were randomly dispersed to remove the overlap. The result of these two adjustments ensures that the points are not related to a specific address, but are still close enough to allow insights about service delivery in different areas of the city. The weighted data are used by the ETC Institute in the final published PDF report. The 2023 Annual Community Survey report is available on data.tempe.gov or by visiting https://www.tempe.gov/government/strategic-management-and-innovation/signature-surveys-research-and-data. The individual survey questions, as well as the definition of the response scale (for example, 1 means “very dissatisfied” and 5 means “very satisfied”), are provided in the data dictionary.

    Additional Information

    Source: Community Attitude Survey
    Contact (author): Adam Samuels
    Contact E-Mail (author): Adam_Samuels@tempe.gov
    Contact (maintainer):
    Contact E-Mail (maintainer):
    Data Source Type: Excel table
    Preparation Method: Data received from vendor after report is completed
    Publish Frequency: Annual
    Publish Method: Manual
    Data Dictionary

  18. ByteEnc dataset

    • figshare.com
    7z
    Updated Sep 15, 2025
    Young-Seob Jeong (2025). ByteEnc dataset [Dataset]. http://doi.org/10.6084/m9.figshare.30127783.v1
    Explore at:
    Available download formats: 7z
    Dataset updated
    Sep 15, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Young-Seob Jeong
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Introduction

    The ByteEnc dataset contains two sub-datasets:

    1. Pretraining dataset: byte sequences extracted from HWP files.
       - data.pretrain.ngram1.ms100: byte sequences of 1-grams
       - data.pretrain.ngram2.ms100: byte sequences of 2-grams
    2. Malware detection dataset: byte sequences extracted from HWP, MSOffice and PDF files.
       - data.task.HWP.test.ngram1.ms100: HWP byte sequences of 1-grams for test
       - data.task.HWP.test.ngram2.ms100: HWP byte sequences of 2-grams for test
       - data.task.HWP.train.ngram1.ms100: HWP byte sequences of 1-grams for training
       - data.task.HWP.train.ngram2.ms100: HWP byte sequences of 2-grams for training
       - data.task.MSOffice.test.ngram1.ms100: MSOffice byte sequences of 1-grams for test
       - data.task.MSOffice.test.ngram2.ms100: MSOffice byte sequences of 2-grams for test
       - data.task.MSOffice.train.ngram1.ms100: MSOffice byte sequences of 1-grams for training
       - data.task.MSOffice.train.ngram2.ms100: MSOffice byte sequences of 2-grams for training
       - data.task.PDF.test.ngram1.ms100: PDF byte sequences of 1-grams for test
       - data.task.PDF.test.ngram2.ms100: PDF byte sequences of 2-grams for test
       - data.task.PDF.train.ngram1.ms100: PDF byte sequences of 1-grams for training
       - data.task.PDF.train.ngram2.ms100: PDF byte sequences of 2-grams for training

    How to use (in Python)

    You can easily load either sub-dataset using pickle:

    ```python
    import pickle

    with open(dataset_filepath, 'rb') as f:
        D = pickle.load(f)
    ```

    Details of the pre-training dataset

    Loading the pre-training dataset yields a list D of dictionary objects. Every dictionary has the keys input_ids, token_type_ids, next_sentence_label and metainfo. input_ids and token_type_ids are the input byte sequence and the segment labels (0 or 1) for the next sentence prediction (NSP) pre-training algorithm; next_sentence_label is the binary label (0 or 1) for the NSP pre-training algorithm. metainfo contains several items, where the first and second items indicate the malware label (0 or 1) and the source file name; the other items are negligible. For example, the 80010-th instance of D looks like the following (tensor contents abbreviated):

    ```python
    print(D[80010])
    {'input_ids': tensor([3, 0, 0, ..., 0, 0, 1]),
     'token_type_ids': tensor([0, 0, 0, ..., 1, 1, 1]),
     'next_sentence_label': tensor(0),
     'metainfo': array(['1', 'stream_898783cf245a86ac2722b243a56713fe98609df3ba095f11ef82718e346f316a.csv', '\t9451520', '42', '00', '60'], ...)}
    ```

    If your code doesn't work due to an error related to metainfo, removing it will resolve the error:

    ```python
    for d in D:
        d.pop('metainfo', None)
    ```

    Details of the malware detection dataset

    The following is an example from data.task.MSOffice.train.ngram1.ms100. As you can see, the type and structure are similar to the pre-training dataset, except that each instance of the malware detection dataset carries the key labels, the malware binary label (0 or 1):

    ```python
    print(D[0])
    {'input_ids': tensor([3, 0, 0, ..., 0, 0, 1]),
     'token_type_ids': tensor([0, 0, 0, ..., 0, 0, 0]),
     'labels': tensor(0),
     'metainfo': array(['0', ...], ...)}
    ```
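    Putting the loading steps together, one way to prepare a task file for a detector might look like the following sketch. The helper name is illustrative (not part of the dataset), and plain lists stand in for the torch tensors stored in the real files:

    ```python
    import pickle

    def load_task_dataset(path):
        """Load a ByteEnc task file (e.g. data.task.MSOffice.train.ngram1.ms100)
        and return (inputs, labels). 'metainfo' is dropped up front, as the
        dataset description recommends when it causes errors."""
        with open(path, 'rb') as f:
            D = pickle.load(f)
        for d in D:
            d.pop('metainfo', None)
        inputs = [d['input_ids'] for d in D]
        labels = [d['labels'] for d in D]
        return inputs, labels
    ```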

  19. Online survey data for the 2017 Aesthetic value project (NESP TWQ 3.2.3, Griffith Institute for Tourism Research)

    • catalogue.eatlas.org.au
    Updated Nov 22, 2019
    + more versions
    Australian Institute of Marine Science (AIMS) (2019). Online survey data for the 2017 Aesthetic value project (NESP TWQ 3.2.3, Griffith Institute for Tourism Research) [Dataset]. https://catalogue.eatlas.org.au/geonetwork/srv/api/records/595f79c7-b553-4aab-9ad8-42c092508f81
    Explore at:
    Available download formats: www:link-1.0-http--downloaddata, www:link-1.0-http--related
    Dataset updated
    Nov 22, 2019
    Dataset provided by
    Australian Institute of Marine Science (AIMS)
    Time period covered
    Jan 28, 2017 - Jan 28, 2018
    Description

    This dataset consists of three data folders including all related documents of the online survey conducted within the NESP 3.2.3 project (Tropical Water Quality Hub), plus a survey format document representing how the survey was designed. Apart from participants’ demographic information, the survey consists of three sections: conjoint analysis, picture rating and an open question. The corresponding outcomes of these three sections were downloaded from the Qualtrics website and used in three different data analysis processes.

    Data related to the first section, “conjoint analysis”, is saved in the Conjoint analysis folder, which contains two sub-folders. The first includes a plan file in SAV format representing the design suggested by SPSS orthogonal analysis for testing beauty factors, along with the 9 photoshopped pictures used in the survey. The second (i.e. Final results) contains one SAV file named “data1”, which is the conjoint analysis section results imported into SPSS; one SPS file named “Syntax1”, representing the code used to run the conjoint analysis; two SAV files output by the SPSS conjoint analysis; and one SPV file named “Final output” showing the results of further SPSS analysis based on the utility and importance data.

    Data related to the second section, “Picture rating”, is saved in the Picture rating folder, which includes two subfolders. One subfolder contains the 2,500 pictures of the Great Barrier Reef used in the rating survey section. These pictures are organised by name and stored in two folders, “Survey Part 1” and “Survey Part 2”, corresponding to the two parts of the rating survey section. The other subfolder, “Rating results”, consists of one XLSX file with the survey results downloaded from the Qualtrics website.

    Finally, data related to the open question is saved in the “Open question” folder. It contains one CSV file and one PDF file recording participants’ answers to the open question, as well as one PNG file representing a screenshot of the Leximancer analysis outcome.

    Methods: This dataset resulted from the input and output of an online survey on how people assess the beauty of the Great Barrier Reef. The survey was designed for multiple purposes, with three main sections: (1) conjoint analysis (ranking 9 photoshopped pictures to determine the relative importance weights of beauty attributes), (2) picture rating (2,500 pictures to be rated) and (3) an open question on the factors that make a picture of the Great Barrier Reef beautiful in participants’ opinion (determining beauty factors from the tourist perspective). Pictures used in this survey were downloaded from public sources, such as the websites of Tourism and Events Queensland and Tropical Tourism North Queensland, as well as tourist-sharing sources (i.e. Flickr). Flickr pictures were downloaded using the key words “Great Barrier Reef”. About 10,000 pictures were downloaded in August and September 2017; 2,500 were then selected based on several research criteria: (1) underwater pictures of the GBR, (2) without humans, (3) viewed from 1-2 metres from the objects and (4) of high resolution.

    The survey was created on the Qualtrics website and launched on 4 October 2017 using the Qualtrics survey service. Each participant rated 50 pictures randomly selected from the pool of 2,500 survey pictures. 772 survey completions were recorded, and 705 questionnaires were eligible for data analysis after unqualified questionnaires were filtered out. Conjoint analysis data was imported into IBM SPSS in SAV format, and the output was saved in SPV format. For the automatic aesthetic rating work, all 2,500 Great Barrier Reef pictures were rated (on a 1-10 scale) by at least 10 participants; this dataset was saved in an XLSX file used to train and test an Artificial Intelligence (AI)-based system for recognising and assessing the beauty of natural scenes. Answers to the open question were saved in an XLSX file and a PDF file for theme analysis with Leximancer software.
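    Aggregating the ratings described above (each respondent sees only 50 of the 2,500 pictures, and each picture is scored 1-10 by at least 10 participants) can be sketched in plain Python. The row-dict layout and the picture column names (Q1_1, Q2.1_1, following the data dictionary) are assumptions about the exported XLSX, not the project's actual analysis code:

    ```python
    from statistics import mean

    def picture_means(responses):
        """Average the 1-10 beauty scores per picture column.
        `responses` is a list of per-respondent row dicts keyed by picture
        columns such as 'Q1_1' (survey part 1) or 'Q2.1_1' (survey part 2);
        pictures a respondent did not rate are None."""
        scores = {}
        for row in responses:
            for col, val in row.items():
                if val is not None:
                    scores.setdefault(col, []).append(val)
        return {col: mean(vals) for col, vals in scores.items()}
    ```

    A per-picture mean table of this shape is what an AI rating system would be trained and evaluated against.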

    Further information can be found in the following publication: Becken, S., Connolly R., Stantic B., Scott N., Mandal R., Le D., (2018), Monitoring aesthetic value of the Great Barrier Reef by using innovative technologies and artificial intelligence, Griffith Institute for Tourism Research Report No 15.

    Format: The Online survey dataset includes one PDF file representing the survey format with all sections and questions. It also contains three subfolders, each with multiple files. The Conjoint analysis subfolder contains an image of the 9 JPG pictures, one SAV file for the Orthoplan subroutine outcome and 5 outcome documents (i.e. 3 SAV files, 1 SPS file, 1 SPV file). The Picture rating subfolder contains a capture of the 2,500 pictures used in the survey and one Excel file of rating results. The Open question subfolder includes one CSV file and one PDF file representing participants’ answers, and one PNG file for the analysis outcome.

    Data Dictionary:

    Card 1: Picture design option number 1 suggested by SPSS orthogonal analysis.
    Importance value: The relative importance weight of each beauty attribute calculated by SPSS conjoint analysis.
    Utility: Score reflecting the influence valence and degree of each beauty attribute on the beauty score.
    Syntax: Code used to run conjoint analysis in SPSS.
    Leximancer: Specialised software for qualitative data analysis.
    Concept map: A map showing the relationships between identified concepts.
    Q1_1: Beauty score of picture Q1_1 by the corresponding participant (i.e. survey part 1).
    Q2.1_1: Beauty score of picture Q2.1_1 by the corresponding participant (i.e. survey part 2).
    Conjoint_1: Ranking of picture 1 designed for conjoint analysis by the corresponding participant.

    References: Becken, S., Connolly R., Stantic B., Scott N., Mandal R., Le D., (2018), Monitoring aesthetic value of the Great Barrier Reef by using innovative technologies and artificial intelligence, Griffith Institute for Tourism Research Report No 15.

    Data Location:

    This dataset is filed in the eAtlas enduring data repository at: data esp3\3.2.3_Aesthetic-value-GBR

  20. Real Estate Data Extract CERT19

    • data.stlouisco.com
    • datav3-stlcogis.opendata.arcgis.com
    • +5more
    Updated Jul 2, 2019
    Saint Louis County GIS Service Center (2019). Real Estate Data Extract CERT19 [Dataset]. https://data.stlouisco.com/datasets/1cd364e6e3de4ecca23d51f468b16091
    Explore at:
    Dataset updated
    Jul 2, 2019
    Dataset authored and provided by
    Saint Louis County GIS Service Center
    Description

    This is a comprehensive collection of tax and assessment data extracted at a specific time. The data is in CSV format. A data dictionary (pdf) and the current tax rate book (pdf) are also included.

U.S. Department of Agriculture (USDA), Agricultural Marketing Service (AMS) (2025). Pesticide Data Program (PDP) [Dataset]. http://doi.org/10.15482/USDA.ADC/1520764

Data from: Pesticide Data Program (PDP)

Related Article
Explore at:
Available download formats: txt
Dataset updated
Dec 2, 2025
Dataset provided by
Ag Data Commons
Authors
U.S. Department of Agriculture (USDA), Agricultural Marketing Service (AMS)
License

U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically

Description

The Pesticide Data Program (PDP) is a national pesticide residue database program. Through cooperation with State agriculture departments and other Federal agencies, PDP manages the collection, analysis, data entry, and reporting of pesticide residues on agricultural commodities in the U.S. food supply, with an emphasis on those commodities highly consumed by infants and children.

This dataset provides information on where each tested sample was collected, where the product originated from, what type of product it was, and what residues were found on the product, for calendar years 1992 through 2023. The data can measure residues of individual compounds and classes of compounds, as well as provide information about the geographic distribution of the origin of samples, from growers, packers and distributors. The dataset also includes information on where the samples were taken, what laboratory was used to test them, and all testing procedures (by sample, so they can be linked to the compound that is identified). The dataset also contains a reference variable for each compound that denotes the limit of detection for a pesticide/commodity pair (LOD variable). The metadata also includes EPA tolerance levels or action levels for each pesticide/commodity pair. The dataset will be updated on a continual basis, with a new resource data file added annually after the PDP calendar-year survey data is released.

Resources in this dataset:

Resource Title: CSV Data Dictionary for PDP. File Name: PDP_DataDictionary.csv. Resource Description: Machine-readable Comma Separated Values (CSV) format data dictionary for PDP Database Zip files. Defines variables for the sample identity and analytical results data tables/files. The ## characters in the Table and Text Data File name refer to the 2-digit year for the PDP survey, like 97 for 1997 or 01 for 2001. For details on table linking, see PDF. Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel

Resource Title: Data dictionary for Pesticide Data Program. File Name: PDP DataDictionary.pdf. Resource Description: Data dictionary for PDP Database Zip files. Resource Software Recommended: Adobe Acrobat, url: https://www.adobe.com

Resource Title: 2023 PDP Database Zip File. File Name: 2023PDPDatabase.zip
Resource Title: 2022 PDP Database Zip File. File Name: 2022PDPDatabase.zip
Resource Title: 2021 PDP Database Zip File. File Name: 2021PDPDatabase.zip
Resource Title: 2020 PDP Database Zip File. File Name: 2020PDPDatabase.zip
Resource Title: 2019 PDP Database Zip File. File Name: 2019PDPDatabase.zip
Resource Title: 2018 PDP Database Zip File. File Name: 2018PDPDatabase.zip
Resource Title: 2017 PDP Database Zip File. File Name: 2017PDPDatabase.zip
Resource Title: 2016 PDP Database Zip File. File Name: 2016PDPDatabase.zip
Resource Title: 2015 PDP Database Zip File. File Name: 2015PDPDatabase.zip
Resource Title: 2014 PDP Database Zip File. File Name: 2014PDPDatabase.zip
Resource Title: 2013 PDP Database Zip File. File Name: 2013PDPDatabase.zip
Resource Title: 2012 PDP Database Zip File. File Name: 2012PDPDatabase.zip
Resource Title: 2011 PDP Database Zip File. File Name: 2011PDPDatabase.zip
Resource Title: 2010 PDP Database Zip File. File Name: 2010PDPDatabase.zip
Resource Title: 2009 PDP Database Zip File. File Name: 2009PDPDatabase.zip
Resource Title: 2008 PDP Database Zip File. File Name: 2008PDPDatabase.zip
Resource Title: 2007 PDP Database Zip File. File Name: 2007PDPDatabase.zip
Resource Title: 2006 PDP Database Zip File. File Name: 2006PDPDatabase.zip
Resource Title: 2005 PDP Database Zip File. File Name: 2005PDPDatabase.zip
Resource Title: 2004 PDP Database Zip File. File Name: 2004PDPDatabase.zip
Resource Title: 2003 PDP Database Zip File. File Name: 2003PDPDatabase.zip
Resource Title: 2002 PDP Database Zip File. File Name: 2002PDPDatabase.zip
Resource Title: 2001 PDP Database Zip File. File Name: 2001PDPDatabase.zip
Resource Title: 2000 PDP Database Zip File. File Name: 2000PDPDatabase.zip
Resource Title: 1999 PDP Database Zip File. File Name: 1999PDPDatabase.zip
Resource Title: 1998 PDP Database Zip File. File Name: 1998PDPDatabase.zip
Resource Title: 1997 PDP Database Zip File. File Name: 1997PDPDatabase.zip
Resource Title: 1996 PDP Database Zip File. File Name: 1996PDPDatabase.zip
Resource Title: 1995 PDP Database Zip File. File Name: 1995PDPDatabase.zip
Resource Title: 1994 PDP Database Zip File. File Name: 1994PDPDatabase.zip
Resource Title: 1993 PDP Database Zip File. File Name: 1993PDPDatabase.zip
Resource Title: 1992 PDP Database Zip File. File Name: 1992PDPDatabase.zip
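The naming conventions above (one zip per survey year, and a 2-digit year standing in for the ## characters in table/text data file names) can be captured in a small helper. This is a sketch; the function name is illustrative and not part of the PDP release:

```python
def pdp_file_names(year):
    """Derive PDP file-name pieces for a survey year (1992-2023):
    the annual zip name and the 2-digit year used for the ## characters
    in table/text data file names (97 for 1997, 01 for 2001)."""
    two_digit = f"{year % 100:02d}"
    return {
        'zip': f"{year}PDPDatabase.zip",
        'two_digit_year': two_digit,
    }
```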
