40 datasets found

Open Data Portal Glossary
ukpowernetworks.opendatasoft.com
csv, excel, json
Updated Nov 7, 2025
Cite
(2025). Open Data Portal Glossary [Dataset]. https://ukpowernetworks.opendatasoft.com/explore/dataset/ukpn-business-glossary/
Explore at:
Available download formats: csv, excel, json
Dataset updated
Nov 7, 2025
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction This dataset contains the terms and definitions included on the UKPN Open Data Portal Glossary Page.

Methodological Approach This dataset is sourced from UK Power Networks internal business glossary.

Quality Control Statement Quality Control Measures include:

Manual review and correction of data inconsistencies
Use of additional verification steps to ensure accuracy in the methodology

Assurance Statement The Open Data Team and Data Governance Team worked together to ensure data accuracy and consistency.

Other The UKPN Open Data Portal Glossary helps ensure a common understanding of terms used in, or related to, the datasets published on the UKPN Open Data Portal. Download dataset information: Metadata (JSON). Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary: https://ukpowernetworks.opendatasoft.com/pages/glossary/
Glosario: A multilingual glossary for computing and data science terms.
zenodo.org
bin
Updated Sep 9, 2025
Cite
Zenodo (2025). Glosario: A multilingual glossary for computing and data science terms. [Dataset]. http://doi.org/10.5281/zenodo.17085869
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.17085869
Dataset updated
Sep 9, 2025
Dataset provided by
Zenodo http://zenodo.org/
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
glosario is an open-source glossary of terms used in data science that is available online and also as a library in both R and Python. By adding glossary keys to a lesson’s metadata, authors can indicate what the lesson teaches, what learners ought to know before they start, and where they can go to find that knowledge. Authors can also use the library’s functions to insert consistent hyperlinks for terms and definitions in their lessons in any of several languages. The master copy of the glossary lives in the glossary.yml file.

Open-Source GitHub Repos: Stars, Issues & PRs

kaggle.com

zip

Updated Sep 6, 2024

Cite

Mohammed Mebarek Mecheter (2024). Open-Source GitHub Repos: Stars, Issues & PRs [Dataset]. https://www.kaggle.com/datasets/mohammedmecheter/open-source-github-repos-stars-issues-and-prs

Explore at:

Available download formats: zip (17491462 bytes)

Dataset updated

Sep 6, 2024

Authors

Mohammed Mebarek Mecheter

License

MIT License https://opensource.org/licenses/MIT
License information was derived automatically

Description

Introduction to the Data and Fetching Process

This dataset comprises detailed information about GitHub repositories, issues, and pull requests, collected using the GitHub API. The data includes repository metadata (such as stars, forks, and open issues), along with historical data on issues and pull requests (PRs), including their creation, closure, and merging timelines.

Repositories Data Dictionary

This dataset contains information about GitHub repositories, including metadata such as stars, forks, and activity status.

Column Name	Data Type	Description
`id`	object	Unique identifier for the repository.
`name`	object	Name of the repository (e.g., "docker").
`full_name`	object	Full name of the repository (e.g., "prometheus/alertmanager").
`description`	object	Description of the repository, may be empty.
`stars`	int64	Number of stars the repository has.
`forks`	int64	Number of times the repository has been forked.
`open_issues`	int64	Number of open issues in the repository.
`created_at`	datetime	Date and time when the repository was created.
`updated_at`	datetime	Date and time when the repository was last updated.
`size_category`	object	Categorization of the repository based on the number of stars (micro, small, medium, large, mega).
`stale`	bool	Boolean flag indicating if the repository is "stale" (hasn't been updated in over 6 months).
`stars_per_fork`	float64	Number of stars per fork (calculated).
`stars_per_issue`	float64	Number of stars per open issue (calculated).
`contributor_per_star`	float64	Number of contributors per star (calculated).
`total_contributors`	int64	Total number of contributors from issues and pull requests.
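
The calculated columns in the table above could be derived roughly as follows. This is a minimal sketch, not the dataset author's code: the 183-day cutoff standing in for "over 6 months", the `None` result for repositories with zero forks, and the function names are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "over 6 months" is approximated as 183 days.
STALE_AFTER = timedelta(days=183)


def is_stale(updated_at, now):
    """Return True when the last update is more than about 6 months before `now`."""
    return now - updated_at > STALE_AFTER


def stars_per_fork(stars, forks):
    """Stars divided by forks; None when there are no forks (assumed convention)."""
    return stars / forks if forks else None


# Hypothetical repository record using the column names from the table.
now = datetime(2024, 9, 6, tzinfo=timezone.utc)
repo = {
    "stars": 1200,
    "forks": 300,
    "updated_at": datetime(2024, 1, 15, tzinfo=timezone.utc),
}
stale = is_stale(repo["updated_at"], now)
ratio = stars_per_fork(repo["stars"], repo["forks"])
```

The thresholds behind `size_category` are not stated in the description, so they are left out of the sketch.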

Issues Data Dictionary

This dataset contains details of issues raised in the repositories, including information about their creation, closing, and state.

Column Name	Data Type	Description
`id`	object	Unique identifier for the issue.
`created_at`	datetime	Date and time when the issue was created.
`updated_at`	datetime	Date and time when the issue was last updated.
`closed_at`	datetime	Date and time when the issue was closed (optional, null if open).
`number`	int64	Issue number in the GitHub repository.
`repository`	object	The repository that the issue belongs to (name).
`state`	object	Current state of the issue (either "open" or "closed").
`title`	object	Title of the issue.
`resolution_time_days`	float64	Number of days taken to resolve the issue (calculated, -1 for unresolved issues).
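
The `resolution_time_days` column can be reproduced from `created_at` and `closed_at` along these lines. This is a sketch under assumptions: the description only says the value is calculated and is -1 for unresolved issues, so the use of fractional days (suggested by the float64 type) and the function name are guesses.

```python
from datetime import datetime


def resolution_time_days(created_at, closed_at):
    """Days between creation and closure, or -1.0 when the issue is still open."""
    if closed_at is None:
        return -1.0
    # Assumption: fractional days are kept rather than rounded to whole days.
    return (closed_at - created_at).total_seconds() / 86400


# Hypothetical timestamps for one closed and one open issue.
opened = datetime(2024, 3, 1, 12, 0)
closed = datetime(2024, 3, 4, 0, 0)
closed_issue_days = resolution_time_days(opened, closed)
open_issue_days = resolution_time_days(opened, None)
```

Treating -1 as a sentinel means it must be filtered out before averaging resolution times.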

Pull Requests Data Dictionary

This dataset contains information about pull requests (PRs) in the repositories, including metadata such as their state, creation, closing, and merging time.

Column Name	Data Type	Description
`id`	object	Unique identifier for the pull request.
`created_at`	datetime	Date and time when the pull request was created.
`updated_at`	datetime	Date and time when the pull request was last updated.
`closed_at`	datetime	Date and time when the pull request was closed (optional, null if open).
`merged_at`	datetime	Date and time when the pull request was merged (optional, null if not merge...

General Offenses (Open Data)
catalog.data.gov
data.tempe.gov
+11more
Updated Oct 25, 2025
+ more versions
Cite
City of Tempe (2025). General Offenses (Open Data) [Dataset]. https://catalog.data.gov/dataset/general-offenses-open-data
Explore at:
Dataset updated
Oct 25, 2025
Dataset provided by
City of Tempe
Description
The General Offense Crime Report Dataset includes criminal and city code violation offenses which document the scope and nature of each offense or information gathering activity. It is used to compute the Uniform Crime Report Index as reported to the Federal Bureau of Investigation and for local crime reporting purposes.
Contact E-mail Link: N/A
Data Source: Versaterm Informix RMS
Data Source Type: Informix and/or SQL Server
Preparation Method: Automated View pulled from SQL Server and published as hosted resource onto ArcGIS Online
Publish Frequency: Weekly
Publish Method: Automatic
Data Dictionary
Data from: Development of Data Dictionary for neonatal intensive care unit:...
data-staging.niaid.nih.gov
data.niaid.nih.gov
+1more
zip
Updated Dec 27, 2020
Cite
Harpreet Singh; Ravneet Kaur; Satish Saluja; Su Cho; Avneet Kaur; Ashish Pandey; Shubham Gupta; Ritu Das; Praveen Kumar; Jonathan Palma; Gautam Yadav; Yao Sun (2020). Development of Data Dictionary for neonatal intensive care unit: advancement towards a better critical care unit [Dataset]. http://doi.org/10.5061/dryad.zkh18936f
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.zkh18936f
Dataset updated
Dec 27, 2020
Dataset provided by
KLKH
Indraprastha Institute of Information Technology Delhi
Ewha Womans University
UCSF Benioff Children's Hospital
CHIL
Apollo Cradle For Women & Children
Sir Ganga Ram Hospital
Post Graduate Institute of Medical Education and Research
Lucile Packard Children's Hospital
Authors
Harpreet Singh; Ravneet Kaur; Satish Saluja; Su Cho; Avneet Kaur; Ashish Pandey; Shubham Gupta; Ritu Das; Praveen Kumar; Jonathan Palma; Gautam Yadav; Yao Sun
License
https://spdx.org/licenses/CC0-1.0.html
Description
Background: Critical care units (CCUs) with wide use of various monitoring devices generate massive data. To utilize the valuable information of these devices; data are collected and stored using systems like Clinical Information System (CIS), Laboratory Information Management System (LIMS), etc. These systems are proprietary in nature, allow limited access to their database and have vendor specific clinical implementation. In this study we focus on developing an open source web-based meta-data repository for CCU representing stay of patient with relevant details.

Methods: After developing the web-based open source repository we analyzed prospective data from two sites for four months for data quality dimensions (completeness, timeliness, validity, accuracy and consistency), morbidity and clinical outcomes. We used a regression model to highlight the significance of practice variations linked with various quality indicators. Results: Data dictionary (DD) with 1447 fields (90.39% categorical and 9.6% text fields) is presented to cover clinical workflow of NICU. The overall quality of 1795 patient days data with respect to standard quality dimensions is 87%. The data exhibit 82% completeness, 97% accuracy, 91% timeliness and 94% validity in terms of representing CCU processes. The data scores only 67% in terms of consistency. Furthermore, quality indicator and practice variations are strongly correlated (p-value < 0.05).

Results: Data dictionary (DD) with 1555 fields (89.6% categorical and 11.4% text fields) is presented to cover clinical workflow of a CCU. The overall quality of 1795 patient days data with respect to standard quality dimensions is 87%. The data exhibit 82% completeness, 97% accuracy, 91% timeliness and 94% validity in terms of representing CCU processes. The data scores only 67% in terms of consistency. Furthermore, quality indicators and practice variations are strongly correlated (p-value < 0.05).

Conclusion: This study documents DD for standardized data collection in CCU. This provides robust data and insights for audit purposes and pathways for CCU to target practice improvements leading to specific quality improvements.
SF Master data dictionary
kaggle.com
zip
Updated Jul 1, 2021
+ more versions
Cite
City of San Francisco (2021). SF Master data dictionary [Dataset]. https://www.kaggle.com/san-francisco/sf-master-data-dictionary
Explore at:
Available download formats: zip (267400 bytes)
Dataset updated
Jul 1, 2021
Dataset authored and provided by
City of San Francisco
License
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Area covered
San Francisco
Description
Content

Note: This dataset is under active development and the schema is subject to change without notice. This represents the current list of fields available within the open data portal, organized by dataset. Fields may be documented within the dataset, through attached documentation, or not at all. Over time we will collect and merge all field definitions into this dataset to simplify access to field documentation. It will be updated on a rolling basis.

Context

This is a dataset hosted by the city of San Francisco. The organization has an open data platform and updates its information according to the amount of data that is brought in. Explore San Francisco's data using Kaggle and all of the data sources available through the San Francisco organization page!

Update Frequency: This dataset is updated quarterly.

Acknowledgements

This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.

Cover photo by _HealthyMond on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
Data from: Framework to Develop an Open-Source Forage Data Network to...
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Jul 11, 2025
Cite
Agricultural Research Service (2025). Data from: Framework to Develop an Open-Source Forage Data Network to Improve Primary Productivity and Enhance System Resiliency [Dataset]. https://catalog.data.gov/dataset/data-from-framework-to-develop-an-open-source-forage-data-network-to-improve-primary-produ-79f90
Explore at:
Dataset updated
Jul 11, 2025
Dataset provided by
Agricultural Research Service https://www.ars.usda.gov/
Description
A compilation of experimental forage data from 108 unique locations across the United States, with harvest dates ranging from 1958 to 2022. This dataset contains a subset of the data compiled in the initial stages of development of the Forage Data Hub. In particular, these are the 37,970 data entries used for the forage system resiliency analysis presented in the primary article. Resources in this dataset: Resource Title: FDH Data Dictionary File Name: FDH_Data_Dictionary.csv Resource Description: Data dictionary for the data compiled as a result of the efforts described in Ashworth et al. (2023) - Framework to Develop an Open-Source Forage Data Network to Improve Primary Productivity and Enhance System Resiliency (in review). Includes descriptions for the data fields in the FDH Data data file. Resource Title: FDH Data File Name: FDH_Data_03-04-2023.csv Resource Description: Data compiled as a result of the efforts described in Ashworth et al. (2023) - Framework to Develop an Open-Source Forage Data Network to Improve Primary Productivity and Enhance System Resiliency (in review). Includes a lightly preprocessed version of the data housed in the Forage Data Hub as of March 4th, 2023.
Field Alias Glossary (PRA State Assets)
lda-open-data-1-lda-ie.hub.arcgis.com
data.europa.eu
+3more
Updated Mar 16, 2023
+ more versions
Cite
Land Development Agency (2023). Field Alias Glossary (PRA State Assets) [Dataset]. https://lda-open-data-1-lda-ie.hub.arcgis.com/datasets/bd619afee7064981bf4a7b569a250f3b
Explore at:
Dataset updated
Mar 16, 2023
Dataset authored and provided by
Land Development Agency
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description: All sites shown within the “PRA State Assets” layer are folios sourced from searching the PRAI database and are accurate from the date of that PRAI search. The registered owner field may have been changed to keep consistency throughout the database.

The folio boundary data available on this site is derived from source data provided by the Property Registration Authority (PRA) and is subject to PRA copyright. The currency and accuracy of this data at the time of inspection cannot be guaranteed. Those wishing to ensure that folio boundary data is the most accurate and up to date available should access this information through landdirect.ie.

The information shown within the “PRA State Assets” layer depicts sites we believe to be within the ownership of the state; however, at the time of compiling this layer they were not registered with the PRAI. Please note the State Assets Sourced by the LDA sites have been manually sourced and drawn by the LDA and will be updated regularly.

Please contact assetdatabase@lda.ie if we have shown any incorrect information or if we are missing State-owned assets within these layers.

Access and Constraints: https://creativecommons.org/licenses/by/4.0/
National Bridge Inventory Element Data
catalog.data.gov
geodata.bts.gov
+3more
Updated Sep 5, 2025
+ more versions
Cite
Federal Highway Administration (FHWA) (Point of Contact) (2025). National Bridge Inventory Element Data [Dataset]. https://catalog.data.gov/dataset/national-bridge-inventory-element-data1
Explore at:
Dataset updated
Sep 5, 2025
Dataset provided by
Federal Highway Administration https://highways.dot.gov/
Description
The National Bridge Inventory Elements dataset is as of June 20, 2025 from the Federal Highway Administration (FHWA) and is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). The data describes more than 620,000 of the Nation's bridges located on public roads, including Interstate Highways, U.S. highways, State and county roads, as well as publicly-accessible bridges on Federal and Tribal lands. The element data present a breakdown of the condition of each structural and bridge management element for each bridge on the National Highway System (NHS). The Specification for the National Bridge Inventory Bridge Elements contains a detailed description of each data element including coding instructions and attribute definitions. The Coding Guide is available at: https://doi.org/10.21949/1519106. A data dictionary, or other source of attribute information, is accessible at https://doi.org/10.21949/1519106
Data from: Basque LMF Apertium Dictionary
dataverse.csuc.cat
dtd, html, txt, xml
Updated Oct 13, 2023
+ more versions
Cite
Universitat d'Alacant. Grup Transducens (2023). Basque LMF Apertium Dictionary [Dataset]. http://doi.org/10.34810/data277
Explore at:
Available download formats: html(20773), txt(35147), xml(12300942), dtd(7509), txt(435), txt(1652), xml(13807)
Unique identifier
https://doi.org/10.34810/data277
Dataset updated
Oct 13, 2023
Dataset provided by
CORA.Repositori de Dades de Recerca
Authors
Universitat d'Alacant. Grup Transducens
License
https://dataverse.csuc.cat/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34810/data277
Description
This is the LMF version of the Basque Apertium dictionary. Monolingual dictionaries for Spanish, Catalan, Galician and Basque have been generated from the Apertium expanded lexicons of the es-ca (for both Spanish and Catalan), es-gl (for Galician) and eu-es (for Basque) language pairs. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair; and linguistic data for a growing number of language pairs.
Data from: Esperanto-English LMF Apertium Bilingual dictionary
dataverse.csuc.cat
dtd, txt, xml, zip
Updated Oct 11, 2023
+ more versions
Cite
Jacob Nordfalk; Hèctor Alòs i Font; Universitat d'Alacant. Grup Transducens (2023). Esperanto-English LMF Apertium Bilingual dictionary [Dataset]. http://doi.org/10.34810/data317
Explore at:
Available download formats: dtd(7509), txt(1850), xml(12623), zip(1416122), txt(249), txt(35147)
Unique identifier
https://doi.org/10.34810/data317
Dataset updated
Oct 11, 2023
Dataset provided by
CORA.Repositori de Dades de Recerca
Authors
Jacob Nordfalk; Hèctor Alòs i Font; Universitat d'Alacant. Grup Transducens
License
https://dataverse.csuc.cat/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34810/data317
Description
This is the LMF version of the Apertium bilingual dictionary for the Esperanto and English languages. Bilingual LMF dictionaries were generated from Apertium bilingual dix files. For each Apertium bilingual correspondence, the corresponding source and target monolingual entries (LexicalEntry) were generated in addition to the bilingual correspondence (SenseAxis) element. Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as Esperanto-English). The platform provides: a language-independent machine translation engine; tools to manage the linguistic data necessary to build a machine translation system for a given language pair; and linguistic data for a growing number of language pairs.
Net Zero Use Cases and Data Requirements
ukpowernetworks.opendatasoft.com
csv, excel, json
Updated Oct 7, 2025
Cite
(2025). Net Zero Use Cases and Data Requirements [Dataset]. https://ukpowernetworks.opendatasoft.com/explore/dataset/top-30-use-cases/
Explore at:
Available download formats: excel, json, csv
Dataset updated
Oct 7, 2025
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction Following the identification of Local Area Energy Planning (LAEP) use cases, this dataset lists the data sources and/or information that could help facilitate this research. View our dedicated page to find out how we derived this list: Local Area Energy Plan - UK Power Networks (opendatasoft.com)

Methodological Approach Data upload: a list of datasets and ancillary details is entered into a static Excel file before being uploaded onto the Open Data Portal.

Quality Control Statement

Quality Control Measures include:

Manual review and correction of data inconsistencies
Use of additional verification steps to ensure accuracy in the methodology

Assurance Statement The Open Data Team and Local Net Zero Team worked together to ensure data accuracy and consistency.

Other Download dataset information: Metadata (JSON)

Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary: https://ukpowernetworks.opendatasoft.com/pages/glossary/

Please note that "number of records" in the top left corner is higher than the number of datasets available as many datasets are indexed against multiple use cases leading to them being counted as multiple records.
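
The gap between the record count and the dataset count can be illustrated with a small sketch. The use case and dataset names below are invented for the example and do not come from the portal.

```python
# Each record pairs one use case with one dataset (hypothetical values).
records = [
    ("Heat pump planning", "Dataset A"),
    ("EV charger siting", "Dataset A"),
    ("EV charger siting", "Dataset B"),
    ("Solar potential", "Dataset C"),
]

# One record per (use case, dataset) pair.
n_records = len(records)

# Distinct datasets, regardless of how many use cases reference them.
n_datasets = len({dataset for _, dataset in records})
```

Here "Dataset A" is indexed against two use cases, so it contributes two records but only one dataset.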
Data Centre Demand Profiles
ukpowernetworks.opendatasoft.com
Updated Nov 4, 2025
Cite
(2025). Data Centre Demand Profiles [Dataset]. https://ukpowernetworks.opendatasoft.com/explore/dataset/ukpn-data-centre-demand-profiles/
Explore at:
Dataset updated
Nov 4, 2025
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction

This dataset shows the half-hourly load profiles of identified data centres within UK Power Networks' licence areas.

The loads have been determined using actual demand data from connected sites within UK Power Networks' licence areas, from 1 January 2023 onwards.

Loads are expressed proportionally, by comparing the half-hourly observed import power seen across the site's meter point(s), against the meter's maximum import capacity. Units for both measures are apparent power, in kilovolt amperes (kVA).

To protect the identity of the sites, data points have been anonymised and only the site's voltage level information - and our estimation of the data centre type - has been provided.

Methodological Approach

Over 100 operational data centre sites (and at least 10 per voltage level) were identified through internal desktop exercises and corroboration with external sources.

After identifying these sites, their addresses, connection point, and MPAN(s) (Meter Point Administration Number(s)) were identified using internal systems.

Half-hourly smart meter import data were retrieved using internal systems. This included both half-hourly meter data, and static data (such as the MPAN's maximum import capacity and voltage group, the latter through the MPAN's Line Loss Factor Class Description). Half-hourly meter import data came in the form of active and reactive power, and the apparent power was calculated using the power triangle.

In cases where there are numerous meter points for a given data centre site, the observed import powers across all relevant meter points were summed, and compared against the sum total of maximum import capacity for the meters.

The percentage utilisation for each half-hour for each data centre was determined via the following equation:

\[
\%\,\text{Utilisation}_{\text{data centre site}} = \frac{\sum S_{\text{MPAN half-hourly observed import}}}{\sum S_{\text{MPAN Maximum Import Capacity}}}
\]

Where S = Apparent Power in kilovolt amperes (kVA)
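
The calculation described above can be sketched in a few lines. This is an illustrative sketch, not UK Power Networks' implementation: the function names and sample readings are hypothetical, and the result is returned as a fraction (multiply by 100 for a percentage).

```python
import math


def apparent_power_kva(active_kw, reactive_kvar):
    """Power triangle: S = sqrt(P^2 + Q^2)."""
    return math.hypot(active_kw, reactive_kvar)


def utilisation(meter_readings, capacities_kva):
    """Utilisation of one site for one half-hour.

    meter_readings: one (active kW, reactive kvar) pair per MPAN.
    capacities_kva: maximum import capacity per MPAN, in kVA.
    """
    total_import = sum(apparent_power_kva(p, q) for p, q in meter_readings)
    total_capacity = sum(capacities_kva)
    if total_capacity <= 0:
        raise ValueError("total maximum import capacity must be positive")
    return total_import / total_capacity


# Hypothetical site with two meter points.
readings = [(300.0, 400.0), (600.0, 800.0)]
capacities = [1000.0, 2000.0]
ratio = utilisation(readings, capacities)
```

Summing import and capacity across meter points before dividing matches the multi-meter rule stated above, and avoids averaging per-meter ratios that have different capacities.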

To ensure the dataset includes only operational data centres, the dataset was then cleansed to exclude sites where utilisation was consistently at 0% across the year.

Based on the MPAN's address and corroboration with other open data sources, a data centre type was derived: either enterprise (i.e. company-owned and operated), or co-located (i.e. one company owns the data centre, but other customers operate IT load in the premises as tenants).

Each data centre site was then anonymised by removing any identifiers other than voltage level and UK Power Networks' view of the data centre type.

Quality Control Statement

The dataset is primarily built upon customer smart meter data for connected customer sites within the UK Power Networks' licence areas.

The smart meter data that is used is sourced from external providers. While UK Power Networks does not control the quality of this data directly, these data have been incorporated into our models with careful validation and alignment.

Any missing or bad data has been addressed through robust data cleaning methods, such as omission.

Assurance Statement

The dataset is generated through a manual process, conducted by the Distribution System Operator's Regional Development Team.

The dataset will be reviewed quarterly - both in terms of the operational data centre sites identified, their maximum observed demands and their maximum import capacities - to assess any changes and determine if updates of demand specific profiles are necessary.

Deriving the data centre type is a desktop-based process based on the MPAN's address and through corroboration with external, online sources.

This process ensures that the dataset remains relevant and reflective of real-world data centre usage over time.

There are sufficient data centre sites per voltage level to assure anonymity of data centre sites.

Other Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary: https://ukpowernetworks.opendatasoft.com/pages/glossary/ Download dataset information: Metadata (JSON). To view this data please register and log in.
Arabic dictionary of inflected words
catalogue.elra.info
live.european-language-grid.eu
Updated Aug 31, 2017
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2017). Arabic dictionary of inflected words [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-L0098/
Explore at:
Dataset updated
Aug 31, 2017
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Description
The Arabic dictionary of inflected words consists of a list of 6 million inflected forms, fully vowelized, generated in compliance with the grammatical rules of Arabic and tagged with grammatical information which includes POS and grammatical features, including number, gender, case, definiteness, tense, mood and compatibility with clitic agglutination.

The data is formatted in conformity with the data formats of Unitex/GramLab, an open source corpus processing system for language processing. These data formats are publicly documented. The data can either be converted into user-specific formats, or be used directly with Unitex/GramLab. This dictionary is also available together with recognition of agglutinated clitics and inflection system in the ELRA Catalogue under reference ELRA-L0099.

Authors: Alexis NEME and Eric LAPORTE
Arabic dictionary of inflected words with recognition of agglutinated...
catalog.elda.org
catalog.elra.info
+1more
Updated Aug 31, 2017
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2017). Arabic dictionary of inflected words with recognition of agglutinated clitics and inflection system [Dataset]. https://catalog.elda.org/en-us/repository/browse/ELRA-L0099/
Explore at:
Dataset updated
Aug 31, 2017
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalog.elda.org/static/from_media/metashare/licences/ELRA_VAR.pdf
Description
This dictionary consists of 6 million inflected forms, fully vowelized, generated in compliance with the grammatical rules of Arabic and tagged with grammatical information which includes POS and grammatical features, including number, gender, case, definiteness, tense, mood and compatibility with clitic agglutination.

It is accompanied by a grammatical resource that recognizes hundreds of millions of valid agglutinated words, i.e. words consisting of one of the forms in the dictionary preceded and/or followed by clitics (conjunctions, prepositions, articles, pronouns) in compliance with the grammatical rules of Arabic.

In order to be able to update the full-form dictionary, a dictionary of 65 000 lemmas and the data required to inflect them and regenerate the full-form dictionary are also provided. This allows adapting the dictionary to specific applications by deleting and/or adding entries. The resource as it stands covers more than 98% of the forms found in any sort of literature, newspaper articles...; the remaining 2% include proper names, which can be relevant.

The data is formatted in conformity with the data formats of Unitex/GramLab, an open source corpus processing system for language processing. These data formats are publicly documented. The data can either be converted into user-specific formats, or be used directly with Unitex/GramLab.

This dictionary is also available without recognition of agglutinated clitics and without inflection system in the ELRA Catalogue under reference ELRA-L0098.

Authors: Alexis NEME and Eric LAPORTE
Large Demand List
ukpowernetworks.opendatasoft.com
Updated Nov 4, 2025
+ more versions
Cite
(2025). Large Demand List [Dataset]. https://ukpowernetworks.opendatasoft.com/explore/dataset/ukpn-large-demand-list/
Explore at:
Dataset updated
Nov 4, 2025
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction This dataset shows an anonymised list of live, committed, import-related projects within UK Power Networks' licence areas. This includes demand-only projects that are 5,000 kilovolt-amperes (kVA) and above, as well as battery energy storage systems (BESS).

This list has been determined using internal systems UK Power Networks uses to manage all committed projects in the process of connecting to our network. To protect the identity of the sites, entries have been anonymised and only the licence area, the grid supply point the project is connecting at (or under), rounded requested import capacity, and application date have been provided.

Methodological Approach Live, committed demand projects are identified through desktop exercises using UK Power Networks' internal customer relationship management system and extracted.

The projects are then filtered to show only those where the required import capacity is greater than or equal to 5,000 kVA and the required export capacity is 0 MVA.

These project entries are then cross-referenced with other sources to verify their status. Any discrepancies are manually reviewed and kept or omitted as appropriate. To protect the identity of the demand projects, the required import capacity is rounded and the project names are anonymised by replacing them with an arbitrary sequential number.
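
The filtering and anonymisation steps described above can be sketched as follows. This is an illustration only: the `Project` fields, the function names, and the rounding granularity of 1,000 kVA are assumptions, since the source does not state how capacities are rounded. Battery energy storage entries, which come from separate datasets, are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Project:
    # Hypothetical record layout; the real internal system is not described.
    name: str
    import_kva: float
    export_mva: float


def filter_large_demand(projects, min_import_kva=5000.0):
    """Keep demand-only projects: import at or above the threshold, zero export."""
    return [
        p for p in projects
        if p.import_kva >= min_import_kva and p.export_mva == 0
    ]


def anonymise(projects, round_to_kva=1000):
    """Replace names with sequential numbers and round the import capacity.

    The rounding step (round_to_kva) is an assumed value for illustration.
    """
    return [
        {
            "project": number,
            "import_kva": int(round(p.import_kva / round_to_kva) * round_to_kva),
        }
        for number, p in enumerate(projects, start=1)
    ]


# Made-up example data, not taken from the dataset.
sample = [
    Project("Site A", 7400, 0),
    Project("Site B", 3000, 0),   # dropped: below 5,000 kVA
    Project("Site C", 12000, 5),  # dropped: has export capacity
    Project("Site D", 5000, 0),
]
published = anonymise(filter_large_demand(sample))
```

In this sketch the published rows carry only a sequence number and a rounded capacity, so the original site names never leave the filtering step.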

Quality Control Statement The dataset is primarily built upon internal data relating to live demand projects in UK Power Networks' licence areas. Information about battery energy storage systems is taken from existing datasets relating to Appendix G information that UK Power Networks manages. Data have been checked with both automatic and manual validation methods.

Assurance Statement The dataset is generated through a manual process, conducted by the Distribution System Operator's Regional Development Team. The dataset will be reviewed monthly to assess any changes, and to determine if any updates to the methodology are necessary. This process ensures that the dataset remains relevant and reflective of the live large demand projects UK Power Networks is working on. There are sufficient projects per licence area to assure anonymity of projects. While all reasonable efforts have been made to ensure the accuracy of the information provided in this dataset, neither the licensee nor any of its directors or employees is under any liability for any errors, or for any misstatement on which a user of the data seeks to rely. Please view our Terms and Conditions for more information. The data provided constitutes UK Power Networks’ provisional view of the status at this GSP at the date of publication and is for general information only.

Other Download dataset information: Metadata (JSON)

Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary: https://ukpowernetworks.opendatasoft.com/pages/glossary/

For prospective customers considering a connection to our network, we provide pre-application support on our website to make the connection journey as smooth as possible: Pre-application support and advice | UK Power Networks

We also offer an "Ask the Expert" service, designed for some of your more complex connection questions that go beyond our FAQs. You can request an "Ask the Expert" surgery session, where our specialists can provide more specific technical guidance: Ask the Expert | UK Power Networks

To view this data please register and log in.
Canada Traveller Volumes
kaggle.com
zip
Updated Sep 10, 2023
Cite
Anthony Kwok (2023). Canada Traveller Volumes [Dataset]. https://www.kaggle.com/datasets/anthonynam/canada-traveller-volumes/code
Explore at:
zip (2226105 bytes)
Dataset updated
Sep 10, 2023
Authors
Anthony Kwok
Area covered
Canada
Description
!!! Please use this encoding standard: ISO-8859-1 !!!

Example: pandas.read_csv(path_to_file, encoding='ISO-8859-1')
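
To illustrate why the encoding matters, here is a minimal sketch using only the Python standard library. The sample row and the column names are made up for illustration (the actual file layout is not shown here); the point is that bytes written as ISO-8859-1 with accented characters are generally not valid UTF-8, so the encoding should be passed explicitly.

```python
import csv
import io

# Made-up sample data with an accented character, encoded as ISO-8859-1.
raw = "Port of Entry,Mode,Volumes\nMontréal,Air,120\n".encode("ISO-8859-1")


def read_rows(data: bytes, encoding: str = "ISO-8859-1"):
    """Decode the bytes with the given encoding and parse them as CSV."""
    text = data.decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))


def utf8_fails(data: bytes) -> bool:
    """Return True if the bytes cannot be decoded as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


rows = read_rows(raw)
```

With the explicit encoding, the accented place name is read back intact, while `utf8_fails(raw)` reports that a UTF-8 decode of the same bytes raises an error.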

Dataset Info

This dataset was extracted from the Government of Canada website

Published by: Canada Border Services Agency

License: Open Government Licence - Canada

Data Dictionary

Travel Date: The date when travellers came through Canada

Traveller: A person who is travelling or who often travels

Volumes: An amount or quantity of travellers entering Canada

Year: The calendar year in which the traveller came into Canada

Month: The month the travellers came into Canada

Port of Entry: The location in which the traveller is entering Canada

Mode: The way or manner in which travellers entered Canada

Air: Travellers entering Canada via airplane

Marine: Relating to any vessel and traveller entering Canada via water including ship, boat or craft being used for marine navigation

Rail: Travellers entering Canada via a train or railroad

Land: Travellers entering Canada via land rather than in water or air

Border: A line separating two political or geographical areas

Data: Facts and statistics collected together for reference or analysis on travellers

Highway: A main road connecting to major towns or cities the traveller used to enter Canada

Immigration: The action of travellers coming to live permanently in a foreign country

Large port of entry: A port of entry of considerable or relatively great size

Small port of entry: A port of entry of less than normal or usual size

Region: An area or division, especially part of a country, through which the travellers are entering Canada
National Chargepoint Register
ukpowernetworks.opendatasoft.com
Updated Dec 5, 2024
Cite
(2024). National Chargepoint Register [Dataset]. https://ukpowernetworks.opendatasoft.com/explore/dataset/ozev-ukpn-national-chargepoint-register/
Explore at:
Dataset updated
Dec 5, 2024
License
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
Introduction

5 December 2024: The National Chargepoint Registry (NCR) was decommissioned on 28 November 2024 by the Department for Transport. All public EV chargepoint operators are now required to share open data free of charge on elements such as location, real-time availability, connector types, and payment methods. The archived NCR data will be available on request to users and researchers. For any enquiries contact consumerofferconsult@ozev.gov.uk.

Methodological Approach This dataset was provided by the Office for Zero Emission Vehicles.

Quality Control Statement The data is provided "as is".

Assurance Statement The Open Data team has checked the code for the API pull against source to ensure data accuracy and consistency.

For more information, please visit their website: Department for Transport

The National Chargepoint Register (NCR) is a database of publicly available chargepoints for electric vehicles in the UK, established in 2011. The underlying dataset from the Office for Zero Emission Vehicles (OZEV) is continually updated by chargepoint networks, owners and controllers. Note that we have restricted the coverage to overlap with UK Power Networks' three licence areas: Eastern Power Networks, London Power Networks and South Eastern Power Networks.

Other Download dataset information: Metadata (JSON)

Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary: https://ukpowernetworks.opendatasoft.com/pages/glossary/
State IO Two-Region Economic Input-Output Models for 50 U.S. States...
catalog.data.gov
Updated Aug 31, 2023
Cite
U.S. EPA Office of Research and Development (ORD) (2023). State IO Two-Region Economic Input-Output Models for 50 U.S. States 2012-2017 [Dataset]. https://catalog.data.gov/dataset/state-io-two-region-economic-input-output-models-for-50-u-s-states-2012-2017
Explore at:
Dataset updated
Aug 31, 2023
Dataset provided by
United States Environmental Protection Agency http://www.epa.gov/
Area covered
United States
Description
These are economic models in Make and Use formats, in one-region and two-region versions: the one-region version covers just a U.S. state of interest (SoI), and the two-region version includes both the SoI and the Rest of the U.S. (RoUS). Industry and Commodity output vectors are also provided. Models are available representing annual totals for each year for each state from 2012 to 2017. Variations for "Domestic" forms of models are available. See the associated publication, also available without fees in PubMed, for details. These models were created with stateior v0.1.0 (https://github.com/USEPA/stateior/releases/tag/0.1.0) and can be used in that R software. See https://github.com/USEPA/stateior/tree/0.1.0 for usage details. The provided data link reveals many R Data Format (.RDS) files that can be read into R, along with metadata files in JSON format that provide information on provenance of the data. File names correspond to the definitions in the associated data dictionary (for two-region files) and the associated supporting link (for one-region files). Other files are precursors to the one and two-region models, with data that are used in the model building process and can be read into R. All model files corresponding to the associated publication have the text "0.1.0" in the filename, for example "Census_StateExport_2013_0.1.0.rds". Each file contains all states for the year given in the file name, when a year is included. This dataset is associated with the following publication: Li, M., J. Ferreira, C.D. Court, D. Meyer, M. Li, and W.W. Ingwersen. StateIO - Open Source Economic Input-Output Models for the 50 States of the United States of America. International Regional Science Review. SAGE Publications, THOUSAND OAKS, CA, USA, 46(4): 428-481, (2023).
2018 Methodological Summary and Definitions
data.virginia.gov
healthdata.gov
+1 more
html
Updated Sep 6, 2025
Cite
Substance Abuse and Mental Health Services Administration (2025). 2018 Methodological Summary and Definitions [Dataset]. https://data.virginia.gov/dataset/2018-methodological-summary-and-definitions
Explore at:
html
Dataset updated
Sep 6, 2025
Dataset provided by
Substance Abuse and Mental Health Services Administration https://www.samhsa.gov/
Description
This report summarizes the 2018 NSDUH methods and other supporting information relevant to estimates of substance use and mental health issues, and is organized into five chapters. Chapter 1 is an introduction to the report. Chapter 2 describes the survey, including information about the sample design; data collection procedures; and key aspects of data processing, such as development of analysis weights. Chapter 3 presents technical details on the statistical methods and measurement, such as suppression criteria for unreliable estimates, statistical testing procedures, and issues for selected substance use and mental health measures. Chapter 4 covers special topics related to prescription psychotherapeutic drugs. Chapter 5 describes other sources of data on substance use and mental health issues, including data sources for populations outside the NSDUH target population. Appendix A is a glossary that covers key definitions for use as a resource with the 2018 NSDUH reports and detailed tables. Appendix B provides a list of contributors to the report.