100+ datasets found

f
Data Dictionary
mcri.figshare.com
txt
Updated Sep 6, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jennifer Piscionere (2018). Data Dictionary [Dataset]. http://doi.org/10.25374/MCRI.7039280.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.25374/MCRI.7039280.v1
Dataset updated
Sep 6, 2018
Dataset provided by
Murdoch Childrens Research Institute
Authors
Jennifer Piscionere
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
This is a data dictionary example we will use in the MVP presentation. It can be deleted after 13/9/18.
d
Data from: Data Dictionary Template
catalog.data.gov
data-academy.tempe.gov
+10more
Updated Mar 18, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
City of Tempe (2023). Data Dictionary Template [Dataset]. https://catalog.data.gov/dataset/data-dictionary-template-2e170
Explore at:
Dataset updated
Mar 18, 2023
Dataset provided by
City of Tempe
Description
Data Dictionary template for Tempe Open Data.
Database Creation Description and Data Dictionaries
figshare.com
txt
Updated Aug 11, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jordan Kempker; John David Ike (2016). Database Creation Description and Data Dictionaries [Dataset]. http://doi.org/10.6084/m9.figshare.3569067.v3
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.3569067.v3
Dataset updated
Aug 11, 2016
Dataset provided by
Figsharehttp://figshare.com/
Authors
Jordan Kempker; John David Ike
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
There are several Microsoft Word documents here detailing data creation methods and with various dictionaries describing the included and derived variables.The Database Creation Description is meant to walk a user through some of the steps detailed in the SAS code with this project.The alphabetical list of variables is intended for users as sometimes this makes some coding steps easier to copy and paste from this list instead of retyping.The NIS Data Dictionary contains some general dataset description as well as each variable's responses.
d
Open Data Dictionary Template Individual
catalog.data.gov
opendata.dc.gov
+2more
Updated Feb 4, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Office of the Chief Tecnology Officer (2025). Open Data Dictionary Template Individual [Dataset]. https://catalog.data.gov/dataset/open-data-dictionary-template-individual
Explore at:
Dataset updated
Feb 4, 2025
Dataset provided by
Office of the Chief Tecnology Officer
Description
This template covers section 2.5 Resource Fields: Entity and Attribute Information of the Data Discovery Form cited in the Open Data DC Handbook (2022). It completes documentation elements that are required for publication. Each field column (attribute) in the dataset needs a description clarifying the contents of the column. Data originators are encouraged to enter the code values (domains) of the column to help end-users translate the contents of the column where needed, especially when lookup tables do not exist.
A
Data Dictionary - Statewide Probability Sample (2009-2012)
data.amerigeoss.org
pdf
Updated Jan 11, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
United States (2022). Data Dictionary - Statewide Probability Sample (2009-2012) [Dataset]. https://data.amerigeoss.org/dataset/data-dictionary-statewide-probability-sample-2009-20121
Explore at:
pdfAvailable download formats
Dataset updated
Jan 11, 2022
Dataset provided by
United States
Description
Column descriptions for the data set
o
Getting Started Creating Data Dictionaries: How to Create a Shareable...
osf.io
Updated Jan 27, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Erin Buchanan; Sarah Crain; Ari Wynn; Hannah Johnson; Hannah Hooven (Stash); Marietta Papadatou-Pastou; Peder Isager; Rickard Carlsson; Balazs Aczel (2021). Getting Started Creating Data Dictionaries: How to Create a Shareable Dataset [Dataset]. http://doi.org/10.17605/OSF.IO/3Y2EX
Explore at:
Unique identifier
https://doi.org/10.17605/OSF.IO/3Y2EX
Dataset updated
Jan 27, 2021
Dataset provided by
Center For Open Science
Authors
Erin Buchanan; Sarah Crain; Ari Wynn; Hannah Johnson; Hannah Hooven (Stash); Marietta Papadatou-Pastou; Peder Isager; Rickard Carlsson; Balazs Aczel
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As researchers embrace open and transparent data sharing, they will need to provide information about their data that effectively helps others understand its contents. Without proper documentation, data stored in online repositories such as OSF will often be rendered unfindable and unreadable by other researchers and indexing search engines. Data dictionaries and codebooks provide a wealth of information about variables, data collection, and other important facets of a dataset. This information, called metadata, provides key insights into how the data might be further used in research and facilitates search engine indexing to reach a broader audience of interested parties. This tutorial first explains the terminology and standards surrounding data dictionaries and codebooks. We then present a guided workflow of the entire process from source data (e.g., survey answers on Qualtrics) to an openly shared dataset accompanied by a data dictionary or codebook that follows an agreed-upon standard. Finally, we explain how to use freely available web applications to assist this process of ensuring that psychology data are findable, accessible, interoperable, and reusable (FAIR; Wilkinson et al., 2016).
l
LScD (Leicester Scientific Dictionary)
figshare.le.ac.uk
docx
Updated Apr 15, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neslihan Suzen (2020). LScD (Leicester Scientific Dictionary) [Dataset]. http://doi.org/10.25392/leicester.data.9746900.v3
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.25392/leicester.data.9746900.v3
Dataset updated
Apr 15, 2020
Dataset provided by
University of Leicester
Authors
Neslihan Suzen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Leicester
Description
LScD (Leicester Scientific Dictionary)April 2020 by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk/suzenneslihan@hotmail.com)Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes[Version 3] The third version of LScD (Leicester Scientific Dictionary) is created from the updated LSC (Leicester Scientific Corpus) - Version 2*. All pre-processing steps applied to build the new version of the dictionary are the same as in Version 2** and can be found in description of Version 2 below. We did not repeat the explanation. After pre-processing steps, the total number of unique words in the new version of the dictionary is 972,060. The files provided with this description are also same as described as for LScD Version 2 below.* Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2** Suzen, Neslihan (2019): LScD (Leicester Scientific Dictionary). figshare. Dataset. https://doi.org/10.25392/leicester.data.9746900.v2[Version 2] Getting StartedThis document provides the pre-processing steps for creating an ordered list of words from the LSC (Leicester Scientific Corpus) [1] and the description of LScD (Leicester Scientific Dictionary). This dictionary is created to be used in future work on the quantification of the meaning of research texts. R code for producing the dictionary from LSC and instructions for usage of the code are available in [2]. The code can be also used for list of texts from other sources, amendments to the code may be required.LSC is a collection of abstracts of articles and proceeding papers published in 2014 and indexed by the Web of Science (WoS) database [3]. Each document contains title, list of authors, list of categories, list of research areas, and times cited. The corpus contains only documents in English. The corpus was collected in July 2018 and contains the number of citations from publication date to July 2018. The total number of documents in LSC is 1,673,824.LScD is an ordered list of words from texts of abstracts in LSC.The dictionary stores 974,238 unique words, is sorted by the number of documents containing the word in descending order. All words in the LScD are in stemmed form of words. The LScD contains the following information:1.Unique words in abstracts2.Number of documents containing each word3.Number of appearance of a word in the entire corpusProcessing the LSCStep 1.Downloading the LSC Online: Use of the LSC is subject to acceptance of request of the link by email. To access the LSC for research purposes, please email to ns433@le.ac.uk. The data are extracted from Web of Science [3]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.Step 2.Importing the Corpus to R: The full R code for processing the corpus can be found in the GitHub [2].All following steps can be applied for arbitrary list of texts from any source with changes of parameter. The structure of the corpus such as file format and names (also the position) of fields should be taken into account to apply our code. The organisation of CSV files of LSC is described in README file for LSC [1].Step 3.Extracting Abstracts and Saving Metadata: Metadata that include all fields in a document excluding abstracts and the field of abstracts are separated. Metadata are then saved as MetaData.R. Fields of metadata are: List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.Step 4.Text Pre-processing Steps on the Collection of Abstracts: In this section, we presented our approaches to pre-process abstracts of the LSC.1.Removing punctuations and special characters: This is the process of substitution of all non-alphanumeric characters by space. We did not substitute the character “-” in this step, because we need to keep words like “z-score”, “non-payment” and “pre-processing” in order not to lose the actual meaning of such words. A processing of uniting prefixes with words are performed in later steps of pre-processing.2.Lowercasing the text data: Lowercasing is performed to avoid considering same words like “Corpus”, “corpus” and “CORPUS” differently. Entire collection of texts are converted to lowercase.3.Uniting prefixes of words: Words containing prefixes joined with character “-” are united as a word. The list of prefixes united for this research are listed in the file “list_of_prefixes.csv”. The most of prefixes are extracted from [4]. We also added commonly used prefixes: ‘e’, ‘extra’, ‘per’, ‘self’ and ‘ultra’.4.Substitution of words: Some of words joined with “-” in the abstracts of the LSC require an additional process of substitution to avoid losing the meaning of the word before removing the character “-”. Some examples of such words are “z-test”, “well-known” and “chi-square”. These words have been substituted to “ztest”, “wellknown” and “chisquare”. Identification of such words is done by sampling of abstracts form LSC. The full list of such words and decision taken for substitution are presented in the file “list_of_substitution.csv”.5.Removing the character “-”: All remaining character “-” are replaced by space.6.Removing numbers: All digits which are not included in a word are replaced by space. All words that contain digits and letters are kept because alphanumeric characters such as chemical formula might be important for our analysis. Some examples are “co2”, “h2o” and “21st”.7.Stemming: Stemming is the process of converting inflected words into their word stem. This step results in uniting several forms of words with similar meaning into one form and also saving memory space and time [5]. All words in the LScD are stemmed to their word stem.8.Stop words removal: Stop words are words that are extreme common but provide little value in a language. Some common stop words in English are ‘I’, ‘the’, ‘a’ etc. We used ‘tm’ package in R to remove stop words [6]. There are 174 English stop words listed in the package.Step 5.Writing the LScD into CSV Format: There are 1,673,824 plain processed texts for further analysis. All unique words in the corpus are extracted and written in the file “LScD.csv”.The Organisation of the LScDThe total number of words in the file “LScD.csv” is 974,238. Each field is described below:Word: It contains unique words from the corpus. All words are in lowercase and their stem forms. The field is sorted by the number of documents that contain words in descending order.Number of Documents Containing the Word: In this content, binary calculation is used: if a word exists in an abstract then there is a count of 1. If the word exits more than once in a document, the count is still 1. Total number of document containing the word is counted as the sum of 1s in the entire corpus.Number of Appearance in Corpus: It contains how many times a word occurs in the corpus when the corpus is considered as one large document.Instructions for R CodeLScD_Creation.R is an R script for processing the LSC to create an ordered list of words from the corpus [2]. Outputs of the code are saved as RData file and in CSV format. Outputs of the code are:Metadata File: It includes all fields in a document excluding abstracts. Fields are List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.File of Abstracts: It contains all abstracts after pre-processing steps defined in the step 4.DTM: It is the Document Term Matrix constructed from the LSC[6]. Each entry of the matrix is the number of times the word occurs in the corresponding document.LScD: An ordered list of words from LSC as defined in the previous section.The code can be used by:1.Download the folder ‘LSC’, ‘list_of_prefixes.csv’ and ‘list_of_substitution.csv’2.Open LScD_Creation.R script3.Change parameters in the script: replace with the full path of the directory with source files and the full path of the directory to write output files4.Run the full code.References[1]N. Suzen. (2019). LSC (Leicester Scientific Corpus) [Dataset]. Available: https://doi.org/10.25392/leicester.data.9449639.v1[2]N. Suzen. (2019). LScD-LEICESTER SCIENTIFIC DICTIONARY CREATION. Available: https://github.com/neslihansuzen/LScD-LEICESTER-SCIENTIFIC-DICTIONARY-CREATION[3]Web of Science. (15 July). Available: https://apps.webofknowledge.com/[4]A. Thomas, "Common Prefixes, Suffixes and Roots," Center for Development and Learning, 2013.[5]C. Ramasubramanian and R. Ramya, "Effective pre-processing activities in text mining using improved porter’s stemming algorithm," International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, no. 12, pp. 4536-4538, 2013.[6]I. Feinerer, "Introduction to the tm Package Text Mining in R," Accessible en ligne: https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf, 2013.
a
Commercial Fishing Regulations Data Dictionary
hub.arcgis.com
Updated Jan 10, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ministry for Primary Industries (2020). Commercial Fishing Regulations Data Dictionary [Dataset]. https://hub.arcgis.com/documents/95037c15c1f44366be4cc0d9843344b7
Explore at:
Dataset updated
Jan 10, 2020
Dataset authored and provided by
Ministry for Primary Industries
Description
This data dictionary describes the field names, expected data, examples of data and field types (schema) of the commercial fishing regulations data set.
d
Data from: Delta Neighborhood Physical Activity Study
catalog.data.gov
agdatacommons.nal.usda.gov
+1more
Updated Jun 5, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). Delta Neighborhood Physical Activity Study [Dataset]. https://catalog.data.gov/dataset/delta-neighborhood-physical-activity-study-f82d7
Explore at:
Dataset updated
Jun 5, 2025
Dataset provided by
Agricultural Research Service
Description
The Delta Neighborhood Physical Activity Study was an observational study designed to assess characteristics of neighborhood built environments associated with physical activity. It was an ancillary study to the Delta Healthy Sprouts Project and therefore included towns and neighborhoods in which Delta Healthy Sprouts participants resided. The 12 towns were located in the Lower Mississippi Delta region of Mississippi. Data were collected via electronic surveys between August 2016 and September 2017 using the Rural Active Living Assessment (RALA) tools and the Community Park Audit Tool (CPAT). Scale scores for the RALA Programs and Policies Assessment and the Town-Wide Assessment were computed using the scoring algorithms provided for these tools via SAS software programming. The Street Segment Assessment and CPAT do not have associated scoring algorithms and therefore no scores are provided for them. Because the towns were not randomly selected and the sample size is small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi. Dataset one contains data collected with the RALA Programs and Policies Assessment (PPA) tool. Dataset two contains data collected with the RALA Town-Wide Assessment (TWA) tool. Dataset three contains data collected with the RALA Street Segment Assessment (SSA) tool. Dataset four contains data collected with the Community Park Audit Tool (CPAT). [Note : title changed 9/4/2020 to reflect study name] Resources in this dataset:Resource Title: Dataset One RALA PPA Data Dictionary. File Name: RALA PPA Data Dictionary.csvResource Description: Data dictionary for dataset one collected using the RALA PPA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Two RALA TWA Data Dictionary. File Name: RALA TWA Data Dictionary.csvResource Description: Data dictionary for dataset two collected using the RALA TWA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Three RALA SSA Data Dictionary. File Name: RALA SSA Data Dictionary.csvResource Description: Data dictionary for dataset three collected using the RALA SSA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Four CPAT Data Dictionary. File Name: CPAT Data Dictionary.csvResource Description: Data dictionary for dataset four collected using the CPAT.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset One RALA PPA. File Name: RALA PPA Data.csvResource Description: Data collected using the RALA PPA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Two RALA TWA. File Name: RALA TWA Data.csvResource Description: Data collected using the RALA TWA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Three RALA SSA. File Name: RALA SSA Data.csvResource Description: Data collected using the RALA SSA tool.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Dataset Four CPAT. File Name: CPAT Data.csvResource Description: Data collected using the CPAT.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel Resource Title: Data Dictionary. File Name: DataDictionary_RALA_PPA_SSA_TWA_CPAT.csvResource Description: This is a combined data dictionary from each of the 4 dataset files in this set.
v
Ecological Concerns Data Dictionary - Ecological Concerns data dictionary
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
fisheries.noaa.gov
+1more
Updated May 24, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(Point of Contact, Custodian) (2025). Ecological Concerns Data Dictionary - Ecological Concerns data dictionary [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/ecological-concerns-data-dictionary-ecological-concerns-data-dictionary2
Explore at:
Dataset updated
May 24, 2025
Dataset provided by
(Point of Contact, Custodian)
Description
Evaluating the status of threatened and endangered salmonid populations requires information on the current status of the threats (e.g., habitat, hatcheries, hydropower, and invasives) and the risk of extinction (e.g., status and trend in the Viable Salmonid Population criteria). For salmonids in the Pacific Northwest, threats generally result in changes to physical and biological characteristics of freshwater habitat. These changes are often described by terms like "limiting factors" or "habitat impairment." For example, the condition of freshwater habitat directly impacts salmonid abundance and population spatial structure by affecting carrying capacity and the variability and accessibility of rearing and spawning areas. Thus, one way to assess or quantify threats to ESUs and populations is to evaluate whether the ecological conditions on which fish depend is improving, becoming more degraded, or remains unchanged. In the attached spreadsheets, we have attempted to consistently record limiting factors and threats across all populations and ESUs to enable comparison to other datasets (e.g., restoration projects) in a consistent way. Limiting factors and threats (LF/T) identified in salmon recovery plans were translated in a common language using an ecological concerns data dictionary (see "Ecological Concerns" tab in the attached spreadsheets) (a data dictionaries defines the wording, meaning and scope of categories). The ecological concerns data dictionary defines how different elements are related, such as the relationships between threats, ecological concerns and life history stages. The data dictionary includes categories for ecological dynamics and population level effects such as "reduced genetic fitness" and "behavioral changes." The data dictionary categories are meant to encompass the ecological conditions that directly impact salmonids and can be addressed directly or indirectly by management (habitat restoration, hatchery reform, etc.) actions. Using the ecological concerns data dictionary enables us to more fully capture the range of effects of hydro, hatchery, and invasive threats as well as habitat threat categories. The organization and format of the data dictionary was also chosen so the information we record can be easily related to datasets we already posses (e.g., restoration data). Data Dictionary.
E
Viking II Data Dictionary
dtechtive.com
find.data.gov.scot
csv, docx, pdf, txt +1
Updated Oct 8, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
University of Edinburgh. Institute of Genetics and Cancer. MRC Human Genetics Unit (2021). Viking II Data Dictionary [Dataset]. http://doi.org/10.7488/ds/3145
Explore at:
csv(0.0038 MB), csv(0.0065 MB), csv(0.0012 MB), docx(0.015 MB), csv(0.0098 MB), csv(0.0063 MB), csv(0.007 MB), csv(0.004 MB), csv(0.0042 MB), csv(0.0029 MB), csv(0.0068 MB), csv(0.01 MB), xlsx(0.0923 MB), csv(0.0008 MB), csv(0.0015 MB), pdf(1.215 MB), csv(0.0043 MB), csv(0.0021 MB), csv(0.0071 MB), csv(0.0051 MB), txt(0.0166 MB)Available download formats
Unique identifier
https://doi.org/10.7488/ds/3145
Dataset updated
Oct 8, 2021
Dataset provided by
University of Edinburgh. Institute of Genetics and Cancer. MRC Human Genetics Unit
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
UNITED KINGDOM
Description
VIKING II was made possible thanks to Medical Research Council (MRC) funding. We aim to better understand what might cause diseases such as heart disease, eye disease, stroke, diabetes and others by inviting 4,000 people with 2 or more grandparents from Orkney and Shetland to complete a questionnaire and provide a saliva sample. This data dictionary outlines what volunteers were asked and indicates the data you can access. To access the data, please e-mail viking@ed.ac.uk.
H
Replication Data for: Automated Dictionary Generation for Political Event...
dataverse.harvard.edu
Updated Nov 12, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Benjamin J Radford (2018). Replication Data for: Automated Dictionary Generation for Political Event Coding [Dataset]. http://doi.org/10.7910/DVN/N3Y3MF
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/N3Y3MF
Dataset updated
Nov 12, 2018
Dataset provided by
Harvard Dataverse
Authors
Benjamin J Radford
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Event data provide high-resolution and high-volume information about political events and have supported a variety of research efforts across fields within and beyond political science. While these datasets are machine coded from vast amounts of raw text input, the necessary dictionaries require substantial prior knowledge and human effort to produce and update, effectively limiting the application of automated event-coding solutions to those domains for which dictionaries already exist. I introduce a novel method for generating dictionaries appropriate for event coding given only a small sample dictionary. This technique leverages recent advances in natural language processing and machine learning to reduce the prior knowledge and researcher-hours required to go from defining a new domain-of-interest to producing structured event data that describe that domain. I evaluate the method via actor-country classification and demonstrate the method’s ability to generalize to new domains with the production of a novel event dataset on cybersecurity.
u
Data from: Pesticide Data Program (PDP)
agdatacommons.nal.usda.gov
txt
Updated Nov 30, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Department of Agriculture (USDA), Agricultural Marketing Service (AMS) (2023). Pesticide Data Program (PDP) [Dataset]. http://doi.org/10.15482/USDA.ADC/1520764
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.15482/USDA.ADC/1520764
Dataset updated
Nov 30, 2023
Dataset provided by
Ag Data Commons
Authors
U.S. Department of Agriculture (USDA), Agricultural Marketing Service (AMS)
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Description
The Pesticide Data Program (PDP) is a national pesticide residue database program. Through cooperation with State agriculture departments and other Federal agencies, PDP manages the collection, analysis, data entry, and reporting of pesticide residues on agricultural commodities in the U.S. food supply, with an emphasis on those commodities highly consumed by infants and children. This dataset provides information on where each tested sample was collected, where the product originated from, what type of product it was, and what residues were found on the product, for calendar years 1992 through 2020. The data can measure residues of individual compounds and classes of compounds, as well as provide information about the geographic distribution of the origin of samples, from growers, packers and distributors. The dataset also includes information on where the samples were taken, what laboratory was used to test them, and all testing procedures (by sample, so can be linked to the compound that is identified). The dataset also contains a reference variable for each compound that denotes the limit of detection for a pesticide/commodity pair (LOD variable). The metadata also includes EPA tolerance levels or action levels for each pesticide/commodity pair. The dataset will be updated on a continual basis, with a new resource data file added annually after the PDP calendar-year survey data is released. Resources in this dataset:Resource Title: CSV Data Dictionary for PDP. File Name: PDP_DataDictionary.csvResource Description: Machine-readable Comma Separated Values (CSV) format data dictionary for PDP Database Zip files. Defines variables for the sample identity and analytical results data tables/files. The ## characters in the Table and Text Data File name refer to the 2-digit year for the PDP survey, like 97 for 1997 or 01 for 2001. For details on table linking, see PDF. Resource Software Recommended: Microsoft Excel,url: https://www.microsoft.com/en-us/microsoft-365/excel Resource Title: Data dictionary for Pesticide Data Program. File Name: PDP DataDictionary.pdfResource Description: Data dictionary for PDP Database Zip files.Resource Software Recommended: Adobe Acrobat,url: https://www.adobe.com Resource Title: 2019 PDP Database Zip File. File Name: 2019PDPDatabase.zipResource Title: 2018 PDP Database Zip File. File Name: 2018PDPDatabase.zipResource Title: 2017 PDP Database Zip File. File Name: 2017PDPDatabase.zipResource Title: 2016 PDP Database Zip File. File Name: 2016PDPDatabase.zipResource Title: 2015 PDP Database Zip File. File Name: 2015PDPDatabase.zipResource Title: 2014 PDP Database Zip File. File Name: 2014PDPDatabase.zipResource Title: 2013 PDP Database Zip File. File Name: 2013PDPDatabase.zipResource Title: 2012 PDP Database Zip File. File Name: 2012PDPDatabase.zipResource Title: 2011 PDP Database Zip File. File Name: 2011PDPDatabase.zipResource Title: 2010 PDP Database Zip File. File Name: 2010PDPDatabase.zipResource Title: 2009 PDP Database Zip File. File Name: 2009PDPDatabase.zipResource Title: 2008 PDP Database Zip File. File Name: 2008PDPDatabase.zipResource Title: 2007 PDP Database Zip File. File Name: 2007PDPDatabase.zipResource Title: 2005 PDP Database Zip File. File Name: 2005PDPDatabase.zipResource Title: 2004 PDP Database Zip File. File Name: 2004PDPDatabase.zipResource Title: 2003 PDP Database Zip File. File Name: 2003PDPDatabase.zipResource Title: 2002 PDP Database Zip File. File Name: 2002PDPDatabase.zipResource Title: 2001 PDP Database Zip File. File Name: 2001PDPDatabase.zipResource Title: 2000 PDP Database Zip File. File Name: 2000PDPDatabase.zipResource Title: 1999 PDP Database Zip File. File Name: 1999PDPDatabase.zipResource Title: 1998 PDP Database Zip File. File Name: 1998PDPDatabase.zipResource Title: 1997 PDP Database Zip File. File Name: 1997PDPDatabase.zipResource Title: 1996 PDP Database Zip File. File Name: 1996PDPDatabase.zipResource Title: 1995 PDP Database Zip File. File Name: 1995PDPDatabase.zipResource Title: 1994 PDP Database Zip File. File Name: 1994PDPDatabase.zipResource Title: 1993 PDP Database Zip File. File Name: 1993PDPDatabase.zipResource Title: 1992 PDP Database Zip File. File Name: 1992PDPDatabase.zipResource Title: 2006 PDP Database Zip File. File Name: 2006PDPDatabase.zipResource Title: 2020 PDP Database Zip File. File Name: 2020PDPDatabase.zipResource Description: Data and supporting files for PDP 2020 surveyResource Software Recommended: Microsoft Access,url: https://products.office.com/en-us/access
d
Data Dictionary for Electron Microprobe Data Collected with Probe for EPMA...
catalog.data.gov
data.usgs.gov
+1more
Updated Jul 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2024). Data Dictionary for Electron Microprobe Data Collected with Probe for EPMA Software Package Developed by Probe Software [Dataset]. https://catalog.data.gov/dataset/data-dictionary-for-electron-microprobe-data-collected-with-probe-for-epma-software-packag
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
U.S. Geological Survey
Description
This data dictionary describes most of the possible output options given in the Probe for EPMA software package developed by Probe Software. Examples of the data output options include sample identification, analytical conditions, elemental weight percents, atomic percents, detection limits, and stage coordinates. Many more options are available and the data that is output will depend upon the end use.
f
Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov"
figshare.com
zip
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Laura Miron; Rafael Gonçalves; Mark A. Musen (2023). Data from "Obstacles to the Reuse of Study Metadata in ClinicalTrials.gov" [Dataset]. http://doi.org/10.6084/m9.figshare.12743939.v2
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.12743939.v2
Dataset updated
Jun 1, 2023
Dataset provided by
figshare
Authors
Laura Miron; Rafael Gonçalves; Mark A. Musen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This fileset provides supporting data and corpora for the empirical study described in: Laura Miron, Rafael S. Goncalves and Mark A. Musen. Obstacles to the Reuse of Metadata in ClinicalTrials.govDescription of filesOriginal data files:- AllPublicXml.zip contains the set of all public XML records in ClinicalTrials.gov (protocols and summary results information), on which all remaining analyses are based. Set contains 302,091 records downloaded on April 3, 2019.- public.xsd is the XML schema downloaded from ClinicalTrials.gov on April 3, 2019, used to validate records in AllPublicXML.BioPortal API Query Results- condition_matches.csv contains the results of querying the BioPortal API for all ontology terms that are an 'exact match' to each condition string scraped from the ClinicalTrials.gov XML. Columns={filename, condition, url, bioportal term, cuis, tuis}. - intervention_matches.csv contains BioPortal API query results for all interventions scraped from the ClinicalTrials.gov XML. Columns={filename, intervention, url, bioportal term, cuis, tuis}.Data Element Definitions- supplementary_table_1.xlsx Mapping of element names, element types, and whether elements are required in ClinicalTrials.gov data dictionaries, the ClinicalTrials.gov XML schema declaration for records (public.XSD), the Protocol Registration System (PRS), FDAAA801, and the WHO required data elements for clinical trial registrations.Column and value definitions: - CT.gov Data Dictionary Section: Section heading for a group of data elements in the ClinicalTrials.gov data dictionary (https://prsinfo.clinicaltrials.gov/definitions.html) - CT.gov Data Dictionary Element Name: Name of an element/field according to the ClinicalTrials.gov data dictionaries (https://prsinfo.clinicaltrials.gov/definitions.html) and (https://prsinfo.clinicaltrials.gov/expanded_access_definitions.html) - CT.gov Data Dictionary Element Type: "Data" if the element is a field for which the user provides a value, "Group Heading" if the element is a group heading for several sub-fields, but is not in itself associated with a user-provided value. - Required for CT.gov for Interventional Records: "Required" if the element is required for interventional records according to the data dictionary, "CR" if the element is conditionally required, "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule, "-" indicates if this element is not applicable to interventional records (only observational or expanded access) - Required for CT.gov for Observational Records: "Required" if the element is required for interventional records according to the data dictionary, "CR" if the element is conditionally required, "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule, "-" indicates if this element is not applicable to observational records (only interventional or expanded access) - Required in CT.gov for Expanded Access Records?: "Required" if the element is required for interventional records according to the data dictionary, "CR" if the element is conditionally required, "Jan 2017" if the element is required for studies starting on or after January 18, 2017, the effective date of the FDAAA801 Final Rule, "-" indicates if this element is not applicable to expanded access records (only interventional or observational) - CT.gov XSD Element Definition: abbreviated xpath to the corresponding element in the ClinicalTrials.gov XSD (public.XSD). The full xpath includes 'clinical_study/' as a prefix to every element. (There is a single top-level element called "clinical_study" for all other elements.) - Required in XSD? : "Yes" if the element is required according to public.XSD, "No" if the element is optional, "-" if the element is not made public or included in the XSD - Type in XSD: "text" if the XSD type was "xs:string" or "textblock", name of enum given if type was enum, "integer" if type was "xs:integer" or "xs:integer" extended with the "type" attribute, "struct" if the type was a struct defined in the XSD - PRS Element Name: Name of the corresponding entry field in the PRS system - PRS Entry Type: Entry type in the PRS system. This column contains some free text explanations/observations - FDAAA801 Final Rule FIeld Name: Name of the corresponding required field in the FDAAA801 Final Rule (https://www.federalregister.gov/documents/2016/09/21/2016-22129/clinical-trials-registration-and-results-information-submission). This column contains many empty values where elements in ClinicalTrials.gov do not correspond to a field required by the FDA - WHO Field Name: Name of the corresponding field required by the WHO Trial Registration Data Set (v 1.3.1) (https://prsinfo.clinicaltrials.gov/trainTrainer/WHO-ICMJE-ClinTrialsgov-Cross-Ref.pdf)Analytical Results:- EC_human_review.csv contains the results of a manual review of random sample eligibility criteria from 400 CT.gov records. Table gives filename, criteria, and whether manual review determined the criteria to contain criteria for "multiple subgroups" of participants.- completeness.xlsx contains counts and percentages of interventional records missing fields required by FDAAA801 and its Final Rule.- industry_completeness.xlsx contains percentages of interventional records missing required fields, broken up by agency class of trial's lead sponsor ("NIH", "US Fed", "Industry", or "Other"), and before and after the effective date of the Final Rule- location_completeness.xlsx contains percentages of interventional records missing required fields, broken up by whether record listed at least one location in the United States and records with only international location (excluding trials with no listed location), and before and after the effective date of the Final RuleIntermediate Results:- cache.zip contains pickle and csv files of pandas dataframes with values scraped from the XML records in AllPublicXML. Downloading these files greatly speeds up running analysis steps from jupyter notebooks in our github repository.
E
Data from: Slovenian datasets for contextual synonym and antonym detection
live.european-language-grid.eu
binary format
Updated Oct 25, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). Slovenian datasets for contextual synonym and antonym detection [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/20526
Explore at:
binary formatAvailable download formats
Dataset updated
Oct 25, 2022
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Slovenian datasets for contextual synonym and antonym detection can be used for training machine learning classifiers as described in the MSc thesis of Jasmina Pegan "Semantic detection of synonyms and antonyms with contextual embeddings" (https://repozitorij.uni-lj.si/IzpisGradiva.php?id=141456). Datasets contain example pairs of synonyms and antonyms in contexts together with additional information on a sense pair. Candidates for synonyms and antonyms were retrieved from the dataset created in the BSc thesis of Jasmina Pegan "Antonym detection with word embeddings" (https://repozitorij.uni-lj.si/IzpisGradiva.php?id=110533). Example sentences were retrieved from The comprehensive Slovenian-Hungarian dictionary (VSMS) (https://www.clarin.si/repository/xmlui/handle/11356/1453). Each dataset is class balanced and contains an equal amount of examples and counterexamples. An example is a pair of example sentences where the two words are synonyms/antonyms. A counterexample is a pair of example sentences where two words are not synonyms/antonyms. Note that a word pair can be synonymous or antonymous in some sense of the two words (but not in the given context).

Datasets are divided into two categories, datasets for synonyms and datasets for antonyms. Each category is further divided into base and updated datasets. These contain three dataset files: train, validation and test dataset. Base datasets include only manually-reviewed sense pairs. These are generated from all pairs of VSMS sense examples for all confirmed pairs of antonym and synonym senses. Updated datasets include automatically generated sense pairs while constraining the maximal number of examples per word. In this way, the dataset is more balanced word-wise, but is not fully manually-reviewed and contains less accurate data.

A single dataset entry contains the information on the base word, followed by data on synonym/antonym candidate. The last column discerns whether the sense pair is a pair of synonyms/antonyms or not. More details on this can be found inside the included README file.
Health Facilities State Enforcement Actions
healthdata.gov
data.chhs.ca.gov
+4more
application/rdfxml +5
Updated Apr 8, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
chhs.data.ca.gov (2025). Health Facilities State Enforcement Actions [Dataset]. https://healthdata.gov/State/Health-Facilities-State-Enforcement-Actions/kgyc-sgj5
Explore at:
csv, xml, application/rssxml, json, tsv, application/rdfxmlAvailable download formats
Dataset updated
Apr 8, 2025
Dataset provided by
chhs.data.ca.gov
Description
Note: This web page provides data on health facilities only. To file a complaint against a facility, please see: https://www.cdph.ca.gov/Programs/CHCQ/LCP/Pages/FileAComplaint.aspx

The California Department of Public Health (CDPH), Center for Health Care Quality, Licensing and Certification (L&C) Program licenses more than 30 types of healthcare facilities. The Electronic Licensing Management System (ELMS) is a CDPH data system created to manage state licensing-related data and enforcement actions.

The “Health Facilities’ State Enforcement Actions” dataset includes summary information for state enforcement actions (state citations or administrative penalties) issued to California healthcare facilities. This file, a sub-set of the ELMS system data, includes state enforcement actions that have been issued from July 1, 1997 through June 30, 2024. Data are presented for each citation/penalty, and include information about the type of enforcement action, violation category, penalty amount, violation date, appeal status, and facility.

The “LTC Citation Narrative” dataset contains the full text of citations that were issued to long-term care (LTC) facilities between January 1, 2012 – December 31, 2017. DO NOT DOWNLOAD in Excel as this file has large blocks of text which may truncate. For example, Excel 2007 and later display, and allow up to, 32,767 characters in each cell, whereas earlier versions of Excel allow 32,767 characters, but only display the first 1,024 characters. Please refer to instructions in “E_Citation_Access_DB_How_To_Docs”, about how to download and view data.

These files enable providers and the public to identify facility non-compliance and quality issues. By making this information available, quality issues can be identified and addressed. Please refer to the background paper, “About Health Facilities’ State Enforcement Actions” for information regarding California state enforcement actions before using these data. Data dictionaries and data summary charts are also available.

Note: The Data Dictionary at the bottom of the dataset incorrectly lists the data column formats as all Text. For proper format labels, please go here.
E
New Oxford Dictionary of English, 2nd Edition
live.european-language-grid.eu
catalog.elra.info
Updated Dec 6, 2005
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2005). New Oxford Dictionary of English, 2nd Edition [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/2276
Explore at:
Dataset updated
Dec 6, 2005
License
http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdfhttp://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Description
This is Oxford University Press's most comprehensive single-volume dictionary, with 170,000 entries covering all varieties of English worldwide. The NODE data set constitutes a fully integrated range of formal data types suitable for language engineering and NLP applications: It is available in XML or SGML. - Source dictionary data. The NODE data set includes all the information present in the New Oxford Dictionary of English itself, such as definition text, example sentences, grammatical indicators, and encyclopaedic material. - Morphological data. Each NODE lemma (both headwords and subentries) has a full listing of all possible syntactic forms (e.g. plurals for nouns, inflections for verbs, comparatives and superlatives for adjectives), tagged to show their syntactic relationships. Each form has an IPA pronunciation. Full morphological data is also given for spelling variants (e.g. typical American variants), and a system of links enables straightforward correlation of variant forms to standard forms. The data set thus provides robust support for all look-up routines, and is equally viable for applications dealing with American and British English. - Phrases and idioms. The NODE data set provides a rich and flexible codification of over 10,000 phrasal verbs and other multi-word phrases. It features comprehensive lexical resources enabling applications to identify a phrase not only in the form listed in the dictionary but also in a range of real-world variations, including alternative wording, variable syntactic patterns, inflected verbs, optional determiners, etc. - Subject classification. Using a categorization scheme of 200 key domains, over 80,000 words and senses have been associated with particular subject areas, from aeronautics to zoology. As well as facilitating the extraction of subject-specific sub-lexicons, this also provides an extensive resource for document categorization and information retrieval. - Semantic relationships. The relationships between every noun and noun sense in the dictionary are being codified using an extensive semantic taxonomy on the model of the Princeton WordNet project. (Mapping to WordNet 1.7 is supported.) This structure allows elements of the basic lexical database to function as a formal knowledge database, enabling functionality such as sense disambiguation and logical inference. - Derived from the detailed and authoritative corpus-based research of Oxford University Press's lexicographic team, the NODE data set is a powerful asset for any task dealing with real-world contemporary English usage. By integrating a number of different data types into a single structure, it creates a coherent resource which can be queried along numerous axes, allowing open-ended exploitation by many kinds of language-related applications.
H
Replication Code for: Proxy Advisory Firms and Corporate Shareholder...
dataverse.harvard.edu
search.dataone.org
Updated Sep 5, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Joshua White (2024). Replication Code for: Proxy Advisory Firms and Corporate Shareholder Engagement [Dataset]. http://doi.org/10.7910/DVN/ABLKE4
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/ABLKE4
Dataset updated
Sep 5, 2024
Dataset provided by
Harvard Dataverse
Authors
Joshua White
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Contains compressed file, "Proxy Advisory Firms and Corporate Shareholder Engagement.zip", which contains Stata code, Stata pseudo-datasets (to demonstrate format of data), and a data dictionary. Review of Financial Studies, forthcoming. (2024)
u
Data from: Delta Produce Sources Study
agdatacommons.nal.usda.gov
catalog.data.gov
txt
Updated Feb 15, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jessica Thomson (2024). Delta Produce Sources Study [Dataset]. http://doi.org/10.15482/USDA.ADC/1519430
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.15482/USDA.ADC/1519430
Dataset updated
Feb 15, 2024
Dataset provided by
Ag Data Commons
Authors
Jessica Thomson
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The Delta Produce Sources Study was an observational study designed to measure and compare food environments of farmers markets (n=3) and grocery stores (n=12) in 5 rural towns located in the Lower Mississippi Delta region of Mississippi. Data were collected via electronic surveys from June 2019 to March 2020 using a modified version of the Nutrition Environment Measures Survey (NEMS) Farmers Market Audit tool. The tool was modified to collect information pertaining to source of fresh produce and also for use with both farmers markets and grocery stores. Availability, source, quality, and price information were collected and compared between farmers markets and grocery stores for 13 fresh fruits and 32 fresh vegetables via SAS software programming. Because the towns were not randomly selected and the sample sizes are relatively small, the data may not be generalizable to all rural towns in the Lower Mississippi Delta region of Mississippi. Resources in this dataset:Resource Title: Delta Produce Sources Study dataset . File Name: DPS Data Public.csvResource Description: The dataset contains variables corresponding to availability, source (country, state and town if country is the United States), quality, and price (by weight or volume) of 13 fresh fruits and 32 fresh vegetables sold in farmers markets and grocery stores located in 5 Lower Mississippi Delta towns.Resource Software Recommended: Microsoft Excel,url: https://www.microsoft.com/en-us/microsoft-365/excel Resource Title: Delta Produce Sources Study data dictionary. File Name: DPS Data Dictionary Public.csvResource Description: This file is the data dictionary corresponding to the Delta Produce Sources Study dataset.Resource Software Recommended: Microsoft Excel,url: https://www.microsoft.com/en-us/microsoft-365/excel