85 datasets found

Integrated Library System (ILS) Data Dictionary
cos-data.seattle.gov
data.seattle.gov
+1 more
application/rdfxml +5
Updated May 10, 2017
Cite
City of Seattle (2017). Integrated Library System (ILS) Data Dictionary [Dataset]. https://cos-data.seattle.gov/Community-and-Culture/Integrated-Library-System-ILS-Data-Dictionary/pbt3-ytbc
Explore at:
Available download formats: tsv, csv, application/rssxml, application/rdfxml, json, xml
Dataset updated
May 10, 2017
Dataset authored and provided by
City of Seattle
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Description
Lookup table of Horizon item and borrower codes. The sources of this data are the code definition tables in Horizon, such as bstat, btype, collection, and itype.

This dataset is useful for understanding the codes used in some of Seattle Public Library's other open datasets. These codes (namely "ItemType" and "ItemCollection") are used systematically in the cataloging of items within the Integrated Library System (ILS), Horizon (SirsiDynix).
Datasets for figures and tables
datasets.ai
catalog.data.gov
+1 more
57
Updated Aug 6, 2024
+ more versions
Cite
U.S. Environmental Protection Agency (2024). Datasets for figures and tables [Dataset]. https://datasets.ai/datasets/datasets-for-figures-and-tables
Explore at:
Available download formats: 57
Dataset updated
Aug 6, 2024
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Authors
U.S. Environmental Protection Agency
Description
Software

Model simulations were conducted using WRF version 3.8.1 (available at https://github.com/NCAR/WRFV3) and CMAQ version 5.2.1 (available at https://github.com/USEPA/CMAQ). The meteorological and concentration fields created using these models are too large to archive on ScienceHub, approximately 1 TB, and are archived on EPA’s high performance computing archival system (ASM) at /asm/MOD3APP/pcc/02.NOAH.v.CLM.v.PX/.

Figures

Figures 1–6 and Figure 8 were created using NCAR Command Language (NCL) scripts (https://www.ncl.ucar.edu/get_started.shtml). NCL code can be downloaded from the NCAR website (https://www.ncl.ucar.edu/Download/) at no cost. The data used for these figures are archived on EPA’s ASM system and are available upon request.

Figures 7, 8b-c, 8e-f, 8h-i, and 9 were created using the AMET utility developed by U.S. EPA/ORD. AMET can be freely downloaded and used at https://github.com/USEPA/AMET. The modeled data paired in space and time provided in this archive can be used to recreate these figures.

The data contained in the compressed zip files are organized in comma delimited files with descriptive headers or space delimited files that match tabular data in the manuscript. The data dictionary provides additional information about the files and their contents.

This dataset is associated with the following publication: Campbell, P., J. Bash, and T. Spero. Updates to the Noah Land Surface Model in WRF‐CMAQ to Improve Simulated Meteorology, Air Quality, and Deposition. Journal of Advances in Modeling Earth Systems. John Wiley & Sons, Inc., Hoboken, NJ, USA, 11(1): 231-256, (2019).
Open Data Portal Catalogue
open.canada.ca
datasets.ai
+1 more
csv, json, jsonl, png +2
Updated Jun 14, 2025
Cite
Treasury Board of Canada Secretariat (2025). Open Data Portal Catalogue [Dataset]. https://open.canada.ca/data/en/dataset/c4c5c7f1-bfa6-4ff6-b4a0-c164cb2060f7
Explore at:
Available download formats: csv, sqlite, json, png, jsonl, xlsx
Dataset updated
Jun 14, 2025
Dataset provided by
Treasury Board of Canada Secretariat (http://www.tbs-sct.gc.ca/)
Treasury Board of Canada (https://www.canada.ca/en/treasury-board-secretariat/corporate/about-treasury-board.html)
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Description
The open data portal catalogue is a downloadable dataset containing some key metadata for the general datasets available on the Government of Canada's Open Data portal. Resource 1 is generated using the ckanapi tool. Resources 2 - 8 are generated using the Flatterer utility.

Description of resources:

1. Dataset is a JSON Lines file where the metadata of each Dataset/Open Information Record is one line of JSON. The file is compressed with GZip. The file is heavily nested and is recommended for users familiar with working with nested JSON.
2. Catalogue is an XLSX workbook where the nested metadata of each Dataset/Open Information Record is flattened into worksheets for each type of metadata.
3. Datasets Metadata contains metadata at the dataset level. This is also referred to as the package in some CKAN documentation. This is the main table/worksheet in the SQLite database and XLSX output.
4. Resources Metadata contains the metadata for the resources contained within each dataset.
5. Resource Views Metadata contains the metadata for the views applied to each resource, if a resource has a view configured.
6. DataStore Fields Metadata contains the DataStore information for CSV datasets that have been loaded into the DataStore. This information is displayed in the Data Dictionary for DataStore-enabled CSVs.
7. Data Package Fields contains a description of the fields available in each of the tables within the Catalogue, as well as the count of the number of records each table contains.
8. Data Package Entity Relation Diagram displays the title and format for each column in each table in the Data Package, in the form of an entity relation diagram (ERD). The Data Package resource offers a text-based version.
9. SQLite Database is a .db database, similar in structure to Catalogue. It can be queried with database or analytical software tools for analysis.
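Because resource 1 is described as a GZip-compressed JSON Lines file with nested records, a reader may find it helpful to stream it one record at a time rather than loading it whole. The following is a minimal sketch using only the Python standard library. The field names (`id`, `title`, `resources`, `format`) and the in-memory sample are illustrative assumptions, not taken from the catalogue's documentation.

```python
import gzip
import io
import json


def iter_jsonl_gz(fileobj):
    """Yield one parsed record per non-empty line of a gzip-compressed JSON Lines stream."""
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)


# Build a tiny in-memory stand-in for the real file (hypothetical records).
sample_records = [
    {"id": "a1", "title": "Example A", "resources": [{"format": "CSV"}, {"format": "JSON"}]},
    {"id": "b2", "title": "Example B", "resources": []},
]
buffer = io.BytesIO()
with gzip.open(buffer, mode="wt", encoding="utf-8") as out:
    for record in sample_records:
        out.write(json.dumps(record) + "\n")
buffer.seek(0)

# Stream the records back and pull one nested value out of each.
records = list(iter_jsonl_gz(buffer))
formats_per_record = {
    record["id"]: [resource["format"] for resource in record["resources"]]
    for record in records
}
print(formats_per_record)
```

For a downloaded file, the same function should accept a file path in place of the in-memory buffer, since `gzip.open` takes either.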
APD Data Dictionary
catalog.data.gov
datahub.austintexas.gov
+1 more
Updated Apr 25, 2025
Cite
data.austintexas.gov (2025). APD Data Dictionary [Dataset]. https://catalog.data.gov/dataset/apd-data-dictionary
Explore at:
Dataset updated
Apr 25, 2025
Dataset provided by
data.austintexas.gov
Description
A table of the values and definitions of fields used in Austin Police Department datasets. City of Austin Open Data Terms of Use - https://data.austintexas.gov/stories/s/ranj-cccq
[Superseded] Intellectual Property Government Open Data 2019
researchdata.edu.au
data.gov.au
Updated Jun 6, 2019
+ more versions
Cite
IP Australia (2019). [Superseded] Intellectual Property Government Open Data 2019 [Dataset]. https://researchdata.edu.au/superseded-intellectual-property-data-2019/2994670
Explore at:
Dataset updated
Jun 6, 2019
Dataset provided by
Data.gov (https://data.gov/)
Authors
IP Australia
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
What is IPGOD?

The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.

How do I use IPGOD?

IPGOD is large, with millions of data points across up to 40 tables, making the tables too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables, which would need specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scala.

IP Data Platform

IP Australia is also providing free trials to a cloud-based analytics platform with the capabilities to enable working with large intellectual property datasets, such as the IPGOD, through the web browser, without any installation of software.

References

The following pages can help you gain an understanding of intellectual property administration and processes in Australia to help your analysis of the dataset.

* Patents
* Trade Marks
* Designs
* Plant Breeder’s Rights

Updates

Tables and columns

Due to the changes in our systems, some tables have been affected.

* We have added IPGOD 225 and IPGOD 325 to the dataset!
* The IPGOD 206 table is not available this year.
* Many tables have been re-built, and as a result may have different columns or different possible values. Please check the data dictionary for each table before use.

Data quality improvements

Data quality has been improved across all tables.

* Null values are simply empty rather than '31/12/9999'.
* All date columns are now in ISO format 'yyyy-mm-dd'.
* All indicator columns have been converted to Boolean data type (True/False) rather than Yes/No, Y/N, or 1/0.
* All tables are encoded in UTF-8.
* All tables use the backslash \ as the escape character.
* The applicant name cleaning and matching algorithms have been updated. We believe that this year's method improves the accuracy of the matches. Please note that the "ipa_id" generated in IPGOD 2019 will not match with those in previous releases of IPGOD.
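The conventions in the data quality list (empty values for nulls, ISO dates, True/False indicators, backslash escaping) translate fairly directly into a small parsing routine. The sketch below uses Python's standard `csv` module. The column names and sample rows are hypothetical, and treating an escaped comma as the only use of the backslash is an assumption about how the files are written.

```python
import csv
import io
from datetime import date

BOOLEAN_TEXT = {"True": True, "False": False}


def parse_row(row, date_columns, bool_columns):
    """Convert one CSV row (all strings) using the conventions described above."""
    parsed = {}
    for name, value in row.items():
        if value == "":
            parsed[name] = None  # nulls are stored as empty fields
        elif name in date_columns:
            parsed[name] = date.fromisoformat(value)  # 'yyyy-mm-dd'
        elif name in bool_columns:
            parsed[name] = BOOLEAN_TEXT[value]  # 'True' / 'False'
        else:
            parsed[name] = value
    return parsed


# Hypothetical two-row sample; real files would be opened with encoding="utf-8".
sample = io.StringIO(
    "application_id,filing_date,is_granted,applicant_name\n"
    "1001,2018-03-07,True,Acme Pty Ltd\n"
    "1002,,False,Smith\\, Jones and Co\n"
)
reader = csv.DictReader(sample, escapechar="\\")
rows = [
    parse_row(row, date_columns={"filing_date"}, bool_columns={"is_granted"})
    for row in reader
]
for row in rows:
    print(row)
```

Checking each table's data dictionary first remains advisable, since the source notes that columns and possible values may differ between tables.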
Open Data Dictionary Template Individual
opendata.dc.gov
catalog.data.gov
Updated Jan 5, 2023
Cite
City of Washington, DC (2023). Open Data Dictionary Template Individual [Dataset]. https://opendata.dc.gov/documents/cb6a686b1e344eeb8136d0103c942346
Explore at:
Dataset updated
Jan 5, 2023
Dataset authored and provided by
City of Washington, DC
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This template covers section 2.5 Resource Fields: Entity and Attribute Information of the Data Discovery Form cited in the Open Data DC Handbook (2022). It completes documentation elements that are required for publication. Each field column (attribute) in the dataset needs a description clarifying the contents of the column. Data originators are encouraged to enter the code values (domains) of the column to help end-users translate the contents of the column where needed, especially when lookup tables do not exist.
Film Circulation dataset
zenodo.org
data.niaid.nih.gov
bin, csv, png
Updated Jul 12, 2024
Cite
Skadi Loist; Evgenia (Zhenya) Samoilova (2024). Film Circulation dataset [Dataset]. http://doi.org/10.5281/zenodo.7887672
Explore at:
Available download formats: csv, png, bin
Unique identifier
https://doi.org/10.5281/zenodo.7887672
Dataset updated
Jul 12, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Skadi Loist; Evgenia (Zhenya) Samoilova
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Complete dataset of “Film Circulation on the International Film Festival Network and the Impact on Global Film Culture”

A peer-reviewed data paper for this dataset is in review to be published in NECSUS_European Journal of Media Studies - an open access journal aiming at enhancing data transparency and reusability, and will be available from https://necsus-ejms.org/ and https://mediarep.org

Please cite this when using the dataset.

Detailed description of the dataset:

1 Film Dataset: Festival Programs

The Film Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

The codebook (csv file “1_codebook_film-dataset_festival-program”) offers a detailed description of all variables within the Film Dataset. Along with the definition of variables it lists explanations for the units of measurement, data sources, coding and information on missing data.

The csv file “1_film-dataset_festival-program_long” comprises a dataset of all films and the festivals, festival sections, and the year of the festival edition that they were sampled from. The dataset is structured in the long format, i.e. the same film can appear in several rows when it appeared in more than one sample festival. However, films are identifiable via their unique ID.

The csv file “1_film-dataset_festival-program_wide” consists of the dataset listing only unique films (n=9,348). The dataset is in the wide format, i.e. each row corresponds to a unique film, identifiable via its unique ID. For easy analysis, and since the overlap is only six percent, in this dataset the variable sample festival (fest) corresponds to the first sample festival where the film appeared. For instance, if a film was first shown at Berlinale (in February) and then at Frameline (in June of the same year), the sample festival will list “Berlinale”. This file includes information on unique and IMDb IDs, the film title, production year, length, categorization in length, production countries, regional attribution, director names, genre attribution, the festival, festival section and festival edition the film was sampled from, and information whether there is festival run information available through the IMDb data.
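The relationship between the long and wide files described above can be illustrated with a small reduction: several rows per film collapse to one row, keeping the first sample festival. This is a sketch only. The variable `fest` comes from the description, while `film_id`, `fest_date`, and the sample rows are hypothetical stand-ins, and the authors' actual procedure may differ.

```python
from datetime import date

# Hypothetical long-format rows: one row per (film, sample festival) pair.
long_rows = [
    {"film_id": "F1", "fest": "Frameline", "fest_date": date(2017, 6, 15)},
    {"film_id": "F1", "fest": "Berlinale", "fest_date": date(2017, 2, 10)},
    {"film_id": "F2", "fest": "Berlinale", "fest_date": date(2017, 2, 11)},
]


def to_wide_first_festival(rows):
    """Keep one row per film, taking the festival with the earliest date."""
    first = {}
    for row in rows:
        current = first.get(row["film_id"])
        if current is None or row["fest_date"] < current["fest_date"]:
            first[row["film_id"]] = row
    return [
        {"film_id": film_id, "fest": row["fest"]}
        for film_id, row in sorted(first.items())
    ]


wide_rows = to_wide_first_festival(long_rows)
print(wide_rows)
```

In this sample, film F1 appears twice in the long rows and once in the wide rows, listed under Berlinale because that festival came first.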

2 Survey Dataset

The Survey Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

The codebook “2_codebook_survey-dataset” includes coding information for both survey datasets. It lists the definition of the variables or survey questions (corresponding to Samoilova/Loist 2019), units of measurement, data source, variable type, range and coding, and information on missing data.

The csv file “2_survey-dataset_long-festivals_shared-consent” consists of a subset (n=161) of the original survey dataset (n=454), where respondents provided festival run data for films (n=206) and gave consent to share their data for research purposes. This dataset consists of the festival data in a long format, so that each row corresponds to the festival appearance of a film.

The csv file “2_survey-dataset_wide-no-festivals_shared-consent” consists of a subset (n=372) of the original dataset (n=454) of survey responses corresponding to sample films. It includes data only for those films for which respondents provided consent to share their data for research purposes. This dataset is shown in wide format of the survey data, i.e. information for each response corresponding to a film is listed in one row. This includes data on film IDs, film title, survey questions regarding completeness and availability of provided information, information on number of festival screenings, screening fees, budgets, marketing costs, market screenings, and distribution. As the file name suggests, no data on festival screenings is included in the wide format dataset.

3 IMDb & Scripts

The IMDb dataset consists of a data scheme image file, one codebook and eight datasets, all in csv format. It also includes the R scripts that we used for scraping and matching.

The codebook “3_codebook_imdb-dataset” includes information for all IMDb datasets. This includes ID information and their data source, coding and value ranges, and information on missing data.

The csv file “3_imdb-dataset_aka-titles_long” contains film title data in different languages scraped from IMDb in a long format, i.e. each row corresponds to a title in a given language.

The csv file “3_imdb-dataset_awards_long” contains film award data in a long format, i.e. each row corresponds to an award of a given film.

The csv file “3_imdb-dataset_companies_long” contains data on production and distribution companies of films. The dataset is in a long format, so that each row corresponds to a particular company of a particular film.

The csv file “3_imdb-dataset_crew_long” contains data on names and roles of crew members in a long format, i.e. each row corresponds to each crew member. The file also contains binary gender assigned to directors based on their first names using the GenderizeR application.

The csv file “3_imdb-dataset_festival-runs_long” contains festival run data scraped from IMDb in a long format, i.e. each row corresponds to the festival appearance of a given film. The dataset does not include each film screening, but the first screening of a film at a festival within a given year. The data includes festival runs up to 2019.

The csv file “3_imdb-dataset_general-info_wide” contains general information about films such as genre as defined by IMDb, languages in which a film was shown, ratings, and budget. The dataset is in wide format, so that each row corresponds to a unique film.

The csv file “3_imdb-dataset_release-info_long” contains data about non-festival releases (e.g., theatrical, digital, TV, DVD/Blu-ray). The dataset is in a long format, so that each row corresponds to a particular release of a particular film.

The csv file “3_imdb-dataset_websites_long” contains data on available websites (official websites, miscellaneous, photos, video clips). The dataset is in a long format, so that each row corresponds to a website of a particular film.

The dataset includes 8 text files containing the scripts for web scraping. They were written using R version 3.6.3 for Windows.

The R script “r_1_unite_data” demonstrates the structure of the dataset that we use in the following steps to identify, scrape, and match the film data.

The R script “r_2_scrape_matches” reads in the dataset with the film characteristics described in “r_1_unite_data” and uses various R packages to create a search URL for each film from the core dataset on the IMDb website. The script attempts to match each film from the core dataset to IMDb records by first conducting an advanced search based on the movie title and year, and then potentially using an alternative title and a basic search if no matches are found in the advanced search. The script scrapes the title, release year, directors, running time, genre, and IMDb film URL from the first page of the suggested records from the IMDb website. The script then defines a loop that matches (including matching scores) each film in the core dataset with suggested films on the IMDb search page. Matching used data on directors, production year (+/- one year), and title, with a fuzzy matching approach based on two methods, “cosine” and “osa”: cosine similarity is used to match titles with a high degree of similarity, and the OSA algorithm is used to match titles that may have typos or minor variations.
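To make the two title-matching methods concrete, here is a rough Python sketch of cosine similarity over character q-gram counts and of the optimal string alignment (OSA) distance, combined with the +/- one year check. It is not the authors' R script: the q-gram size, the thresholds `min_cosine` and `max_osa`, and the record layout are assumptions chosen for illustration.

```python
import math
from collections import Counter


def qgram_profile(text, q=2):
    """Count overlapping character q-grams in a string."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def cosine_similarity(a, b, q=2):
    """Cosine similarity between the q-gram count vectors of two strings."""
    pa, pb = qgram_profile(a, q), qgram_profile(b, q)
    dot = sum(pa[g] * pb[g] for g in pa.keys() & pb.keys())
    norm = math.sqrt(sum(v * v for v in pa.values())) * math.sqrt(
        sum(v * v for v in pb.values())
    )
    return dot / norm if norm else 0.0


def osa_distance(a, b):
    """Edit distance allowing insert, delete, substitute, and adjacent transposition."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def is_candidate(film, candidate, min_cosine=0.8, max_osa=2):
    """Hypothetical rule: years within one of each other and titles close by either method."""
    if abs(film["year"] - candidate["year"]) > 1:
        return False
    t1, t2 = film["title"].lower(), candidate["title"].lower()
    return cosine_similarity(t1, t2) >= min_cosine or osa_distance(t1, t2) <= max_osa


film = {"title": "The Piano", "year": 1993}
print(is_candidate(film, {"title": "Teh Piano", "year": 1994}))
print(is_candidate(film, {"title": "The Piano", "year": 1996}))
```

The two measures complement each other: the q-gram cosine tolerates reordered or partially shared wording, while OSA counts a swapped pair of adjacent letters as a single edit, which suits typos.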

The script “r_3_matching” creates a dataset with the matches for a manual check. Each pair of films (the original film from the core dataset and the suggested match from the IMDb website) was categorized into one of the following five categories: a) 100% match: perfect match on title, year, and director; b) likely good match; c) maybe match; d) unlikely match; and e) no match. The script also checks for possible doubles in the dataset and identifies them for a manual check.

The script “r_4_scraping_functions” creates a function for scraping the data from the identified matches (based on the scripts described above and manually checked). These functions are used for scraping the data in the next script.

The script “r_5a_extracting_info_sample” uses the function defined in “r_4_scraping_functions” to scrape the IMDb data for the identified matches. This script does that for the first 100 films, to check that everything works. Scraping the entire dataset took a few hours, so a test with a subsample of 100 films is advisable.

The script “r_5b_extracting_info_all” extracts the data for the entire dataset of the identified matches.

The script “r_5c_extracting_info_skipped” checks the films with missing data (where data was not scraped) and tries to extract the data one more time, to make sure that the errors were not caused by disruptions in the internet connection or other technical issues.

The script “r_check_logs” is used for troubleshooting and tracking the progress of all of the R scripts used. It gives information on the amount of missing values and errors.

4 Festival Library Dataset

The Festival Library Dataset consists of a data scheme image file, one codebook and one dataset, all in csv format.

The codebook (csv file “4_codebook_festival-library_dataset”) offers a detailed description of all variables within the Library Dataset. It lists the definition of variables, such as location and festival name, and festival categories,
Census 2020 Table P1 12011 Blocks
data.pompanobeachfl.gov
broward-county-demographics-bcgis.hub.arcgis.com
+2 more
Updated Feb 28, 2023
+ more versions
Cite
External Datasets (2023). Census 2020 Table P1 12011 Blocks [Dataset]. https://data.pompanobeachfl.gov/dataset/census-2020-table-p1-12011-blocks
Explore at:
Available download formats: zip, kml, geojson, html, csv, arcgis geoservices rest api
Dataset updated
Feb 28, 2023
Dataset provided by
cjennings_BCGIS
Authors
External Datasets
Description
2020 Census P.L. 94-171 is the first detailed data release from the 2020 Decennial Census of Population and Housing. The web layer is based on an extract for Table P1 – Race at the block level geography of Broward County, Florida. The data extract was then joined to the 2020 Census TIGER/Line Shapefiles.
For details on field names, table hierarchy, and table contents, refer to the TABLE (MATRIX) SECTION in Chapter 6, Data Dictionary, of the 2020 Census State Public Law 94-171 Summary File Technical Documentation: https://www2.census.gov/programs-surveys/decennial/2020/technical-documentation/complete-tech-docs/summary-file/2020Census_PL94_171Redistricting_StatesTechDoc_English.pdf

Medical Service Study Area Data Dictionary

data.chhs.ca.gov

Updated Sep 5, 2024

Cite

Department of Health Care Access and Information (2024). Medical Service Study Area Data Dictionary [Dataset]. https://data.chhs.ca.gov/dataset/medical-service-study-area-data-dictionary

Explore at:

Available download formats: csv, geojson, html, zip, kml, arcgis geoservices rest api

Dataset updated

Sep 5, 2024

Dataset provided by

CA Department of Health Care Access and Information

Authors

Department of Health Care Access and Information

Description

Field Name	Data Type	Description
Statefp	Number	US Census Bureau unique identifier of the state
Countyfp	Number	US Census Bureau unique identifier of the county
Countynm	Text	County name
Tractce	Number	US Census Bureau unique identifier of the census tract
Geoid	Number	US Census Bureau unique identifier of the state + county + census tract
Aland	Number	US Census Bureau defined land area of the census tract
Awater	Number	US Census Bureau defined water area of the census tract
Asqmi	Number	Area calculated in square miles from the Aland
MSSAid	Text	ID of the Medical Service Study Area (MSSA) the census tract belongs to
MSSAnm	Text	Name of the Medical Service Study Area (MSSA) the census tract belongs to
Definition	Text	Type of MSSA, possible values are urban, rural and frontier.
TotalPovPop	Number	US Census Bureau total population for whom poverty status is determined of the census tract, taken from the 2020 ACS 5 YR S1701
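Since the table lists Statefp, Countyfp, Tractce, and Geoid as numbers, leading zeros may be lost on import, which matters when the identifiers are concatenated. The sketch below rebuilds the tract-level identifier using the Census Bureau's standard widths (2-digit state, 3-digit county, 6-digit tract) and converts a land area to square miles, on the assumption that Aland is in square meters as in the TIGER/Line files. The function names and sample values are illustrative.

```python
SQUARE_METERS_PER_SQUARE_MILE = 2_589_988.110336


def build_tract_geoid(statefp, countyfp, tractce):
    """Zero-pad and join state (2), county (3), and tract (6) codes."""
    return f"{int(statefp):02d}{int(countyfp):03d}{int(tractce):06d}"


def split_tract_geoid(geoid):
    """Split an 11-character tract identifier back into its three parts."""
    text = str(geoid).zfill(11)  # restores a leading zero lost to numeric storage
    return text[:2], text[2:5], text[5:]


def aland_to_square_miles(aland_square_meters):
    """Convert a land area from square meters to square miles (assumed units)."""
    return aland_square_meters / SQUARE_METERS_PER_SQUARE_MILE


# Illustrative values: state 6 (California), county 1, tract 400100.
geoid = build_tract_geoid(6, 1, 400100)
print(geoid)
print(split_tract_geoid(6001400100))
print(round(aland_to_square_miles(5_000_000), 3))
```

Storing these identifiers as text rather than numbers avoids the padding step altogether.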

Integrated Tax System Data Dictionary
opendata.dc.gov
catalog.data.gov
+1 more
Updated Feb 26, 2018
Cite
City of Washington, DC (2018). Integrated Tax System Data Dictionary [Dataset]. https://opendata.dc.gov/datasets/integrated-tax-system-data-dictionary/api
Explore at:
Dataset updated
Feb 26, 2018
Dataset authored and provided by
City of Washington, DC
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Table defining the fields for the attribute table of the owner points feature. It reflects the new table structure OTR adopted in Tax Year 2005 reflecting the systems update to the public release file.
Census 2020 Table P2 12011 Blocks
data.pompanobeachfl.gov
Updated Feb 28, 2023
Cite
External Datasets (2023). Census 2020 Table P2 12011 Blocks [Dataset]. https://data.pompanobeachfl.gov/dataset/census-2020-table-p2-12011-blocks
Explore at:
Available download formats: csv, html, geojson, zip, kml, arcgis geoservices rest api
Dataset updated
Feb 28, 2023
Dataset provided by
cjennings_BCGIS
Authors
External Datasets
Description
2020 Census P.L. 94-171 is the first detailed data release from the 2020 Decennial Census of Population and Housing. The web layer is based on an extract for Table P2 – Hispanic or Latino, and Not Hispanic or Latino by Race at the block level geography of Broward County, Florida. The data extract was then joined to the 2020 Census TIGER/Line Shapefiles.
For details on field names, table hierarchy, and table contents, refer to the TABLE (MATRIX) SECTION in Chapter 6, Data Dictionary, of the 2020 Census State Public Law 94-171 Summary File Technical Documentation: https://www2.census.gov/programs-surveys/decennial/2020/technical-documentation/complete-tech-docs/summary-file/2020Census_PL94_171Redistricting_StatesTechDoc_English.pdf
Restaurant Menu Items
kaggle.com
Updated Apr 28, 2022
Cite
Pranali Bose (2022). Restaurant Menu Items [Dataset]. https://www.kaggle.com/datasets/pranalibose/restaurant
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Apr 28, 2022
Dataset provided by
Kaggle
Authors
Pranali Bose
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
About

The dataset contains menu item prices for around 100 restaurants.

Task

Perform EDA and share insights from the data.

Identify stores or items which are priced too high or too low with respect to other menu items or competitors.

Data Dictionary

Column	Description
Restaurant	Name of the Restaurant
Section	Food Item Section
Item	Food Item Name
Description	Description of the Food Item
Price	Cost of the Food Item
Data from: Pesticide Data Program (PDP)
agdatacommons.nal.usda.gov
txt
Updated Nov 30, 2023
+ more versions
Cite
U.S. Department of Agriculture (USDA), Agricultural Marketing Service (AMS) (2023). Pesticide Data Program (PDP) [Dataset]. http://doi.org/10.15482/USDA.ADC/1520764
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.15482/USDA.ADC/1520764
Dataset updated
Nov 30, 2023
Dataset provided by
Ag Data Commons
Authors
U.S. Department of Agriculture (USDA), Agricultural Marketing Service (AMS)
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Description
The Pesticide Data Program (PDP) is a national pesticide residue database program. Through cooperation with State agriculture departments and other Federal agencies, PDP manages the collection, analysis, data entry, and reporting of pesticide residues on agricultural commodities in the U.S. food supply, with an emphasis on those commodities highly consumed by infants and children. This dataset provides information on where each tested sample was collected, where the product originated from, what type of product it was, and what residues were found on the product, for calendar years 1992 through 2020. The data can measure residues of individual compounds and classes of compounds, as well as provide information about the geographic distribution of the origin of samples, from growers, packers and distributors. The dataset also includes information on where the samples were taken, what laboratory was used to test them, and all testing procedures (by sample, so can be linked to the compound that is identified). The dataset also contains a reference variable for each compound that denotes the limit of detection for a pesticide/commodity pair (LOD variable). The metadata also includes EPA tolerance levels or action levels for each pesticide/commodity pair. The dataset will be updated on a continual basis, with a new resource data file added annually after the PDP calendar-year survey data is released.

Resources in this dataset:

* Resource Title: CSV Data Dictionary for PDP. File Name: PDP_DataDictionary.csv. Resource Description: Machine-readable Comma Separated Values (CSV) format data dictionary for PDP Database Zip files. Defines variables for the sample identity and analytical results data tables/files. The ## characters in the Table and Text Data File name refer to the 2-digit year for the PDP survey, like 97 for 1997 or 01 for 2001. For details on table linking, see PDF. Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel
* Resource Title: Data dictionary for Pesticide Data Program. File Name: PDP DataDictionary.pdf. Resource Description: Data dictionary for PDP Database Zip files. Resource Software Recommended: Adobe Acrobat, url: https://www.adobe.com
* Resource Title: 2019 PDP Database Zip File. File Name: 2019PDPDatabase.zip
* Resource Title: 2018 PDP Database Zip File. File Name: 2018PDPDatabase.zip
* Resource Title: 2017 PDP Database Zip File. File Name: 2017PDPDatabase.zip
* Resource Title: 2016 PDP Database Zip File. File Name: 2016PDPDatabase.zip
* Resource Title: 2015 PDP Database Zip File. File Name: 2015PDPDatabase.zip
* Resource Title: 2014 PDP Database Zip File. File Name: 2014PDPDatabase.zip
* Resource Title: 2013 PDP Database Zip File. File Name: 2013PDPDatabase.zip
* Resource Title: 2012 PDP Database Zip File. File Name: 2012PDPDatabase.zip
* Resource Title: 2011 PDP Database Zip File. File Name: 2011PDPDatabase.zip
* Resource Title: 2010 PDP Database Zip File. File Name: 2010PDPDatabase.zip
* Resource Title: 2009 PDP Database Zip File. File Name: 2009PDPDatabase.zip
* Resource Title: 2008 PDP Database Zip File. File Name: 2008PDPDatabase.zip
* Resource Title: 2007 PDP Database Zip File. File Name: 2007PDPDatabase.zip
* Resource Title: 2005 PDP Database Zip File. File Name: 2005PDPDatabase.zip
* Resource Title: 2004 PDP Database Zip File. File Name: 2004PDPDatabase.zip
* Resource Title: 2003 PDP Database Zip File. File Name: 2003PDPDatabase.zip
* Resource Title: 2002 PDP Database Zip File. File Name: 2002PDPDatabase.zip
* Resource Title: 2001 PDP Database Zip File. File Name: 2001PDPDatabase.zip
* Resource Title: 2000 PDP Database Zip File. File Name: 2000PDPDatabase.zip
* Resource Title: 1999 PDP Database Zip File. File Name: 1999PDPDatabase.zip
* Resource Title: 1998 PDP Database Zip File. File Name: 1998PDPDatabase.zip
* Resource Title: 1997 PDP Database Zip File. File Name: 1997PDPDatabase.zip
* Resource Title: 1996 PDP Database Zip File. File Name: 1996PDPDatabase.zip
* Resource Title: 1995 PDP Database Zip File. File Name: 1995PDPDatabase.zip
* Resource Title: 1994 PDP Database Zip File. File Name: 1994PDPDatabase.zip
* Resource Title: 1993 PDP Database Zip File. File Name: 1993PDPDatabase.zip
* Resource Title: 1992 PDP Database Zip File. File Name: 1992PDPDatabase.zip
* Resource Title: 2006 PDP Database Zip File. File Name: 2006PDPDatabase.zip
* Resource Title: 2020 PDP Database Zip File. File Name: 2020PDPDatabase.zip. Resource Description: Data and supporting files for PDP 2020 survey. Resource Software Recommended: Microsoft Access, url: https://products.office.com/en-us/access
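The 2-digit year convention mentioned for the table and text file names (97 for 1997, 01 for 2001) can be resolved unambiguously because the survey years run from 1992 through 2020. The sketch below shows one way to do that and to derive the matching zip file name. The function names are hypothetical, and only the `<year>PDPDatabase.zip` pattern is taken from the resource list.

```python
def expand_pdp_year(two_digit, first_year=1992, last_year=2020):
    """Map a 2-digit survey year such as '97' or '01' to a 4-digit year.

    The default range reflects the calendar years named in the description.
    """
    yy = int(two_digit)
    for century in (1900, 2000):
        year = century + yy
        if first_year <= year <= last_year:
            return year
    raise ValueError(f"{two_digit!r} is outside {first_year}-{last_year}")


def pdp_zip_name(year):
    """Build the zip file name for a survey year, following the listed pattern."""
    return f"{year}PDPDatabase.zip"


for code in ("97", "01", "20"):
    year = expand_pdp_year(code)
    print(code, year, pdp_zip_name(year))
```

If later survey years are added, the `last_year` argument would need to be raised accordingly.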

Column	Description
Restaurant	Name of the Restaurant
Section	Food Item Section
Item	Food Item Name
Description	Description of the Food Item
Price	Cost of the Food Item

Predict Women's Basketball Since 1997

kaggle.com

Updated Oct 27, 2022

Cite

Aman Chauhan (2022). Predict Women's Basketball Since 1997 [Dataset]. https://www.kaggle.com/datasets/whenamancodes/predict-womens-basketball-since-1997

Explore at:

Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)

Dataset updated

Oct 27, 2022

Dataset provided by

Kaggle

Authors

Aman Chauhan

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The Women's National Basketball Association (WNBA) is a professional basketball league in the United States. It is composed of twelve teams. The league was founded on April 22, 1996, as the women's counterpart to the National Basketball Association (NBA), and league play started in 1997. The regular season is played from May to September, with the All-Star Game played midway through the season in July (except in Olympic years) and the WNBA Finals running from the end of September until the beginning of October.

About Files:

wnba_elo.csv contains game-by-game Elo ratings and forecasts since 1997.
wnba_elo_latest.csv contains game-by-game Elo ratings and forecasts for only the latest season.

Data Dictionary

Column	Definition
season	Year of season
date	Date of game
playoff	Whether game was in playoffs
neutral	Whether game was on a neutral site
status	post if the game already happened; pre if it hasn't happened yet; live if it is being played at the time of data export
home_team	Home team name
away_team	Away team name
home_team_abbr	Home team abbreviation. Multiple team names can fall under the same team_abbr because of name changes or moves. Interactive is grouped by team_abbr.
away_team_abbr	Away team abbreviation. Multiple team names can fall under the same team_abbr because of name changes or moves. Interactive is grouped by team_abbr.
home_team_pregame_rating	Home team's Elo rating before the game
away_team_pregame_rating	Away team's Elo rating before the game
home_team_winprob	Home team's probability of winning according to team ratings
away_team_winprob	Away team's probability of winning according to team ratings
home_team_score	Home team's score (will be blank for pre and live games)
away_team_score	Away team's score (will be blank for pre and live games)
home_team_postgame_rating	Home team's rating after the game (will be blank for pre and live games)
away_team_postgame_rating	Away team's rating after the game (will be blank for pre and live games)
commissioners_cup_final	Whether this game was the WNBA Commissioner's Cup Championship Game. All Championship Games shift teams' Elo ratings but are excluded from Monte Carlo simulations, as they do not count towards teams' regular season records.
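
As a hedged illustration of how the columns above fit together, the following Python sketch reads rows shaped like wnba_elo.csv, keeps completed games (status is post), and prints the listed home_team_winprob next to the textbook Elo expectation 1 / (1 + 10^(-(home - away) / 400)). The sample rows are invented for illustration, and the source does not say which adjustments (home-court advantage, for example) go into the published probabilities, so the two numbers need not agree.

```python
import csv
import io

# Hypothetical sample rows using column names from the data dictionary above;
# the values are made up for illustration and are not taken from wnba_elo.csv.
SAMPLE = """season,date,playoff,neutral,status,home_team,away_team,home_team_pregame_rating,away_team_pregame_rating,home_team_winprob,home_team_score,away_team_score
2022,2022-05-06,0,0,post,Team A,Team B,1550,1450,0.70,80,75
2022,2022-05-07,0,0,pre,Team C,Team D,1500,1500,0.58,,
"""


def elo_expected(home_rating: float, away_rating: float) -> float:
    """Textbook Elo expected score for the home team (no home-court adjustment)."""
    return 1.0 / (1.0 + 10 ** (-(home_rating - away_rating) / 400.0))


def completed_games(rows):
    """Keep only games that have already been played (status == 'post')."""
    return [r for r in rows if r["status"] == "post"]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for game in completed_games(rows):
    expected = elo_expected(float(game["home_team_pregame_rating"]),
                            float(game["away_team_pregame_rating"]))
    # Scores are only populated for completed games, so int() is safe here.
    home_won = int(game["home_team_score"]) > int(game["away_team_score"])
    print(game["date"], f"textbook={expected:.3f}",
          f"listed={float(game['home_team_winprob']):.3f}", f"home_won={home_won}")
```

Filtering on status before touching the score columns matters because, per the dictionary, scores and postgame ratings are blank for pre and live games.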

Park, Beach, Open Space, or Coastline Access
data.chhs.ca.gov
healthdata.gov
+3more
csv, html, pdf, xlsx +1
Updated Apr 21, 2025
Cite
California Department of Public Health (2025). Park, Beach, Open Space, or Coastline Access [Dataset]. https://data.chhs.ca.gov/dataset/park-beach-open-space-or-coastline-access
Explore at:
xlsx, zip, pdf, csv(129337734), html
Available download formats
Dataset updated
Apr 21, 2025
Dataset authored and provided by
California Department of Public Health https://www.cdph.ca.gov/
Description
This table contains data on access to parks, measured as the percent of the population within half a mile of a park, beach, open space, or coastline, for California, its regions, counties, county subdivisions, cities, towns, and census tracts. More information on the data table and a data dictionary can be found in the Data and Resources section. As communities become increasingly urban, parks and the protection of green and open spaces within cities increase in importance. Parks and natural areas buffer pollutants and contribute to the quality of life by providing communities with social and psychological benefits such as leisure, play, sports, and contact with nature. Parks are critical to human health because they provide spaces for health and wellness activities. The access to parks table is part of a series of indicators in the Healthy Communities Data and Indicators Project (HCI) of the Office of Health Equity. The goal of HCI is to enhance public health by providing data, a standardized set of statistical measures, and tools that a broad array of sectors can use for planning healthy communities and evaluating the impact of plans, projects, policy, and environmental changes on community health. The creation of healthy social, economic, and physical environments that promote healthy behaviors and healthy outcomes requires coordination and collaboration across multiple sectors, including transportation, housing, education, agriculture, and others. Statistical metrics, or indicators, are needed to help local, regional, and state public health and partner agencies assess community environments and plan for healthy communities that optimize public health. The format of the access to parks table is based on the standardized data format for all HCI indicators. As a result, this data table contains certain variables used in the HCI project (e.g., indicator ID and indicator definition). Some of these variables may contain the same value for all observations.

‘Sample Super Store’ analyzed by Analyst-2

analyst-2.ai

Cite

Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Sample Super Store’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-sample-super-store-2a4f/latest

Explore at:

Dataset authored and provided by

Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Analysis of ‘Sample Super Store’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ibrahimelsayed182/sample-super-store on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Context

Data from a superstore in the USA; the dataset contains about 10,000 rows.

Data Dictionary

Attributes	Definition	example
Ship Mode		Second Class
Segment	Segment Category	Consumer
Country		United State
City		Los Angeles
State		California
Postal Code		90032
Region		West
Category	Categories of product	Technology
Sub-Category		Phones
Sales	number of sales	114.9
Quantity		3
Discount		0.45
Profit		14.1694

Acknowledgements

All thanks to The Sparks Foundation for making this dataset.

Inspiration

Get the data and try to take insights. Good luck ❤️

--- Original source retains full ownership of the source dataset ---

‘Titanic dataset’ analyzed by Analyst-2

analyst-2.ai

Updated Feb 13, 2022

Cite

Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Titanic dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-dataset-7d6b/0f1d826e/?iid=009-906&v=presentation

Explore at:

Dataset updated

Feb 13, 2022

Dataset authored and provided by

Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Analysis of ‘Titanic dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ibrahimelsayed182/titanic-dataset on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Overview

This is the Titanic dataset.

Data Dictionary

Attributes	Definition	Key
sex	Sex/Gender	male/female
age	Age
sibsp	Number of siblings / spouses of the passenger aboard the Titanic	0/1/2 ...
parch	parents / children aboard the Titanic	0/1/2 ...
fare	Passenger fare
embarked	Port of Embarkation	C : Cherbourg, Q : Queenstown, S : Southampton
class	Ticket class	First / Second / Third
who	Passenger category	male, female, child
alone	Whether the passenger was alone on the ship	0/1
survived		0/1
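
To make the Key column above concrete, here is a minimal Python sketch that turns coded values into readable labels. The embarked mapping is copied from the dictionary; reading 1 as "yes" and 0 as "no" for the alone and survived flags is an assumption, and the example record is hypothetical rather than taken from the dataset.

```python
# Code-to-label mapping copied from the "Key" column of the data dictionary above.
EMBARKED = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}


def decode_passenger(record: dict) -> dict:
    """Return a copy of the record with coded fields replaced by readable values."""
    decoded = dict(record)
    # Unknown or missing port codes are left untouched rather than guessed.
    code = record.get("embarked")
    decoded["embarked"] = EMBARKED.get(code, code)
    # Assumption: 1 means "yes" and 0 means "no" for the 0/1 flags.
    for flag in ("alone", "survived"):
        if flag in record:
            decoded[flag] = bool(int(record[flag]))
    return decoded


# Hypothetical row for illustration only; not taken from the dataset.
example = {"sex": "female", "age": "29", "embarked": "S", "class": "First",
           "alone": "0", "survived": "1"}
print(decode_passenger(example))
```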

--- Original source retains full ownership of the source dataset ---

Secondary Network (SPEN_015) Data Quality Checks - Dataset - Datopian CKAN...
demo.dev.datopian.com
Updated May 27, 2025
Cite
(2025). Secondary Network (SPEN_015) Data Quality Checks - Dataset - Datopian CKAN instance [Dataset]. https://demo.dev.datopian.com/dataset/sp-energy-networks--spen_data_quality_secondary_substation
Explore at:
Dataset updated
May 27, 2025
Description
This data table provides the detailed data quality assessment scores for the Secondary Network dataset. The quality assessment was carried out on 31st March. At SPEN, we are dedicated to sharing high-quality data with our stakeholders and being transparent about its quality. This is why we openly share the results of our data quality assessments. We collaborate closely with Data Owners to address any identified issues and enhance our overall data quality; to demonstrate our progress, we conduct annual assessments of our data quality in line with the dataset refresh rate. To learn more about how we assess data quality, visit Data Quality - SP Energy Networks. We welcome feedback and questions from our stakeholders regarding this process. Our Open Data Team is available to answer any enquiries or receive feedback on the assessments. You can contact them via our Open Data mailbox at opendata@spenergynetworks.co.uk. The first phase of our comprehensive data quality assessment measures the quality of our datasets across three dimensions. Please refer to the data table schema for the definitions of these dimensions. We are now expanding our quality assessments to include additional dimensions for a more comprehensive evaluation and will update the data tables with the results when available.
Long Term Development Statement (LTDS) Table 3a Observed Peak Demand
ukpowernetworks.opendatasoft.com
Updated May 30, 2025
Cite
(2025). Long Term Development Statement (LTDS) Table 3a Observed Peak Demand [Dataset]. https://ukpowernetworks.opendatasoft.com/explore/dataset/ltds-table-3a-load-data-observed/
Explore at:
Dataset updated
May 30, 2025
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction

The Long Term Development Statements (LTDS) report on a 0-5 year period, describing a forecast of load on the network and envisioned network developments. The LTDS is published at the end of May and November each year. This is Table 3a from our current LTDS report (published 30 May 2025), showing the observed substation peak demands with no correction for demand served by generation. More information and full reports are available from the landing page below: Long Term Development Statement and Network Development Plan Landing Page

Methodological Approach

Site Functional Locations (FLOCs) are used to associate the Substation to Key characteristics of active Grid and Primary sites
UK Power Networks ID field added to identify row number for reference purposes

Quality Control Statement

Quality Control Measures include:

Verification steps to match features only with confirmed functional locations.
Manual review and correction of data inconsistencies.
Use of additional verification steps to ensure accuracy in the methodology.

Assurance Statement

The Open Data Team and Network Insights Team worked together to ensure data accuracy and consistency.

Other

Download dataset information: Metadata (JSON)
Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary: https://ukpowernetworks.opendatasoft.com/pages/glossary/
NASICON-type solid electrolyte materials named entity recognition dataset
scidb.cn
Updated Apr 27, 2023
Cite
Liu Yue; Liu Dahui; Yang Zhengwei; Shi Siqi (2023). NASICON-type solid electrolyte materials named entity recognition dataset [Dataset]. http://doi.org/10.57760/sciencedb.j00213.00001
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.57760/sciencedb.j00213.00001
Dataset updated
Apr 27, 2023
Dataset provided by
Science Data Bank
Authors
Liu Yue; Liu Dahui; Yang Zhengwei; Shi Siqi
Description
1. Framework overview. This paper proposes a pipeline to construct high-quality datasets for text mining in materials science. First, we use a traceable automatic literature acquisition scheme to ensure the traceability of the textual data. Then, a data processing method driven by downstream tasks generates high-quality pre-annotated corpora conditioned on the characteristics of materials texts. On this basis, we define a general annotation scheme derived from the materials science tetrahedron to complete high-quality annotation. Finally, a conditional data augmentation model incorporating materials domain knowledge (cDA-DK) is constructed to augment the data quantity.
2. Dataset information. The experimental datasets used in this paper include the Matscholar dataset publicly released by Weston et al. (DOI: 10.1021/acs.jcim.9b00470) and the NASICON entity recognition dataset constructed by ourselves. Here we mainly introduce the details of the NASICON entity recognition dataset.
2.1 Data collection and preprocessing. First, 55 materials science publications related to the NASICON system are collected through Crystallographic Information File (CIF), which contains a wealth of structure-activity relationship information. Note that materials science literature is mostly stored in portable document format (PDF), with content arranged in columns and mixed with tables, images, and formulas, which significantly compromises the readability of the text sequence. To tackle this issue, we employ the text parser PDFMiner (a Python toolkit) to standardize, segment, and parse the original documents, thereby converting the PDF literature into plain text. In this process, the entire textual information of each publication, encompassing title, author, abstract, keywords, institution, publisher, and publication year, is retained and stored as a unified TXT document. Subsequently, we apply rules based on Python regular expressions to remove redundant information, such as garbled characters and line breaks caused by figures, tables, and formulas. This results in a cleaner text corpus, enhancing its readability and enabling more efficient data analysis. Note that special symbols may also appear as garbled characters, but we refrain from directly deleting them, as they may contain valuable information such as chemical units. Therefore, we converted all such symbols to a special token
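
The regular-expression cleanup described above might look roughly like the following Python sketch. The specific patterns, the SPECIAL_TOKEN placeholder, and the choice to treat every non-ASCII character as a "special symbol" are all assumptions made for illustration; the description does not give the authors' actual rules or token.

```python
import re

# Hypothetical placeholder; the description above does not name the actual token.
SPECIAL_TOKEN = "<SYM>"


def clean_text(raw: str) -> str:
    """Illustrative cleanup of plain text extracted from a PDF."""
    # Re-join words hyphenated across a line break ("electro-\nlyte" -> "electrolyte").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Treat the remaining line breaks as single spaces.
    text = re.sub(r"\s*\n\s*", " ", text)
    # Assumption: any run of non-ASCII characters counts as a "special symbol"
    # and is replaced by the placeholder token instead of being deleted.
    text = re.sub(r"[^\x00-\x7F]+", SPECIAL_TOKEN, text)
    # Collapse repeated spaces left behind by the steps above.
    return re.sub(r" {2,}", " ", text).strip()


sample = "The NASICON electro-\nlyte shows a conductivity\nof 1.2 mS\u00b7cm-1 at 25 \u00b0C."
print(clean_text(sample))
```

Replacing rather than deleting the symbols keeps a marker where a unit or operator once stood, which matches the stated reason for not removing them outright.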
