Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Calls in favour of Open Data in research are becoming overwhelming, at both national [@RCKUOpen] and international levels [@Moedas2015, @RSOpen, @ams2016]. I will set out a working definition of Open Data, discuss the key challenges preventing the publication of Open Data from becoming standard practice, and attempt to draw some general solutions to those challenges from field-specific examples.
This dataset shows the organizational structure of the Development and Employment Fund.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This file contains the list of publications and filtering decisions of the systematic literature review conducted for the article "Towards a common definition of open data intermediaries" published in the Digital Government: Research and Practice (DGOV) journal (https://doi.org/10.1145/3585537). The literature search was done on 1 June 2022 and there was no start date set (i.e. all relevant literature up to 1 June 2022 was included).
There are 4 documents in this folder (apart from the README text describing the data in each document):
The authors acknowledge the financial support from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 955569, "Towards a sustainable Open Data ECOsystem" (ODECO).
Licence Ouverte / Open Licence 2.0: https://www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf
License information was derived automatically
This dataset, refreshed once a day, presents the consolidated regional data from January 2021 onwards and the definitive data (from January 2013 to December 2020) produced by the éCO2mix application. The data are compiled from meter readings and completed with estimates. Data are described as consolidated once they have been checked and completed (delivered in the middle of month M+1). They become definitive once all partners have transmitted and verified all of the meter readings (delivered in the second quarter of year Y+1).
You will find, at half-hourly resolution:
Actual consumption. Production broken down by the sectors making up the energy mix. Consumption by the pumps in pumped-storage stations (Stations de Transfert d'Energie par Pompage, STEP). The balance of exchanges with neighbouring regions.
For reference, the definitions of TCO and TCH are given below:
TCO: the coverage rate (Taux de COuverture) of a production sector within a region represents the share of that sector in the region's consumption.
TCH: the load rate (Taux de CHarge) or load factor (FC) of a sector represents its production volume relative to the installed and in-service production capacity of that sector.
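As a rough illustration of these two definitions (a minimal sketch only, with hypothetical figures; it is not part of the éCO2mix data or tooling):

def tco(production_mwh, regional_consumption_mwh):
    # Coverage rate: share of a production sector in the region's consumption, in %.
    return 100.0 * production_mwh / regional_consumption_mwh

def tch(production_mwh, installed_capacity_mw, hours):
    # Load factor: production relative to installed, in-service capacity over a period, in %.
    return 100.0 * production_mwh / (installed_capacity_mw * hours)

# Hypothetical example: a wind fleet producing 1200 MWh over 24 h in a region
# consuming 10000 MWh, with 400 MW of installed capacity.
print(tco(1200, 10000))    # 12.0  -> the sector covers 12% of regional consumption
print(tch(1200, 400, 24))  # 12.5  -> the fleet ran at 12.5% of its installed capacity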
If you would like to consult the "real-time" regional data beyond the latest consolidation, you can follow this link: https://opendata.reseaux-energies.fr/explore/dataset/eco2mix-regional-tr
To find out more, see the éCO2mix website at: http://www.rte-france.com/fr/eco2mix/eco2mix
This dataset is updated automatically once a day. Because robots were downloading the consolidated and definitive regional éCO2mix data at rates out of all proportion to the dataset's update frequency, a quota of 50,000 API calls per user per month has been put in place. If you run into data-access problems because of this quota, please contact us at rte-opendata@rte-france.com
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Slovenian definition extraction training dataset DF_NDF_wiki_slo contains 38613 sentences extracted from the Slovenian Wikipedia. The first sentence of a term's description on Wikipedia is considered a definition, and all other sentences are considered non-definitions.
The corpus consists of the following files, each containing one definition or non-definition sentence per line:
The dataset is described in more detail in Fišer et al. 2010. If you use this resource, please cite:
Fišer, D., Pollak, S., Vintar, Š. (2010). Learning to Mine Definitions from Slovene Structured and Unstructured Knowledge-Rich Resources. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10). https://aclanthology.org/L10-1089/
Reference for training Transformer-based definition extraction models using this dataset: Tran, T. H. H., Podpečan, V., Jemec Tomazin, M., Pollak, S. (2023). Definition Extraction for Slovene: Patterns, Transformer Classifiers and ChatGPT. Proceedings of eLex 2023: Electronic lexicography in the 21st century. Invisible lexicography: everywhere lexical data is used without users realizing they make use of a “dictionary”.
Related resources: Jemec Tomazin, M. et al. (2023). Slovenian Definition Extraction evaluation datasets RSDO-def 1.0, Slovenian language resource repository CLARIN.SI, http://hdl.handle.net/11356/1841
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The New York City Airbnb 2019 Open Data is a dataset containing various details about listed units, where the goal is to predict the rental price of a unit.
This dataset contains the details of units listed in NYC during 2019 and was adapted from the following open Kaggle dataset: https://www.kaggle.com/datasets/dgomonov/new-york-city-airbnb-open-data. That dataset, in turn, was downloaded from the Airbnb data repository: http://insideairbnb.com/get-the-data.
This dataset is licensed under the CC0 1.0 Universal License (https://creativecommons.org/publicdomain/zero/1.0/).
The typical ML task for this dataset is to build a model that predicts the rental price of a unit.
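As a pointer for getting started with that task, the sketch below fits a simple baseline regressor. It is an illustration only: the file name (AB_NYC_2019.csv) and the column names ('price', 'room_type', etc.) are assumptions carried over from the Kaggle source and may need adjusting for this copy of the data.

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

df = pd.read_csv("AB_NYC_2019.csv")  # file and column names assumed from the Kaggle source

# A few numeric features plus one categorical feature, one-hot encoded.
features = ["latitude", "longitude", "minimum_nights", "number_of_reviews", "availability_365"]
X = pd.get_dummies(df[features + ["room_type"]], columns=["room_type"])
y = df["price"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))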
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The primary objective of this project was to acquire historical shoreline information for the whole of the Northern Ireland coastline. A detailed understanding of the coast's shoreline position and geometry over annual to decadal time periods is essential for any management of the coast.
The historical shoreline analysis was based on all available Ordnance Survey maps and aerial imagery. The analysis looked at position and geometry over annual to decadal time periods, providing a dynamic picture of how the coastline has changed since the early 1800s. Once all datasets were collated, the data were interrogated using the ArcGIS package Digital Shoreline Analysis System (DSAS). DSAS is a software package that enables a user to calculate rate-of-change statistics from multiple historical shoreline positions. Rate-of-change was calculated at 25m intervals and displayed both statistically and spatially, allowing areas of retreat/accretion to be identified on any given stretch of coastline.
The DSAS software produces the following rate-of-change statistics:
Net Shoreline Movement (NSM) – the distance between the oldest and the youngest shorelines.
Shoreline Change Envelope (SCE) – a measure of the total change in shoreline movement considering all available shoreline positions and reporting their distances, without reference to their specific dates.
End Point Rate (EPR) – derived by dividing the distance of shoreline movement by the time elapsed between the oldest and the youngest shoreline positions.
Linear Regression Rate (LRR) – determines a rate-of-change statistic by fitting a least-squares regression to all shorelines at a specific transect.
Weighted Linear Regression Rate (WLR) – calculates a weighted linear regression of shoreline change on each transect, taking shoreline uncertainty into account and giving more emphasis to shorelines with a smaller error.
The end product provided by Ulster University is an invaluable tool and digital asset that has helped to visualise shoreline change and assess approximate rates of historical change on any given coastal stretch of the Northern Ireland coast.
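To make these statistics concrete, the short sketch below computes them for a single transect from dated shoreline positions. It is an illustration only and not the DSAS implementation used in the project; the example values are hypothetical.

import numpy as np

# Hypothetical shorelines on one transect: decimal years, positions (m along the
# transect) and position uncertainties (m). Negative rates indicate retreat.
years = np.array([1834.0, 1905.0, 1968.0, 2010.0, 2021.0])
positions = np.array([12.0, 10.5, 8.0, 6.5, 5.0])
uncert = np.array([5.0, 4.0, 2.0, 1.0, 0.5])

nsm = positions[np.argmax(years)] - positions[np.argmin(years)]  # Net Shoreline Movement (m)
sce = positions.max() - positions.min()                          # Shoreline Change Envelope (m)
epr = nsm / (years.max() - years.min())                          # End Point Rate (m/yr)
lrr = np.polyfit(years, positions, 1)[0]                         # Linear Regression Rate (m/yr)
wlr = np.polyfit(years, positions, 1, w=1.0 / uncert)[0]         # Weighted LRR (m/yr)

print(f"NSM={nsm:.1f} m, SCE={sce:.1f} m, EPR={epr:.3f} m/yr, LRR={lrr:.3f} m/yr, WLR={wlr:.3f} m/yr")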
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Frequently used terms and phrases in various Program Guidelines and Applications. For additional information, visit the department Funding page: https://www.austintexas.gov/department/economic-development/funding
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These three datasets support the findings of the paper "Open Data Intermediaries in Developing Countries" published in the Journal of Community Informatics. The paper explores the concept of open data intermediaries using the theoretical framework of Bourdieu’s social model, particularly his species of capital. Secondary data on intermediaries from Emerging Impacts of Open Data in Developing Countries research was analysed according to a working definition of an open data intermediary presented in this paper, and with a focus on how intermediaries are able to link agents in an open data supply chain, including to grassroots communities. The study found that open data supply chains may comprise multiple intermediaries and that multiple forms of capital may be required to connect the supply and use of open data. The effectiveness of intermediaries can be attributed to their proximity to data suppliers or users, and proximity can be expressed as a function of the type of capital that an intermediary possesses. However, because no single intermediary necessarily has all the capital available to link effectively to all sources of power in a field, multiple intermediaries with complementary configurations of capital are more likely to connect between power nexuses. This study concludes that consideration needs to be given to the presence of multiple intermediaries in an open data ecosystem, each of whom may possess different forms of capital to enable the use of open data.
Data:
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
You must agree to the license and terms of use before using the dataset in this repo.
DORE: Definition MOdelling in PoRtuguEse
This repository introduces DORE, a comprehensive corpus of over 100,000 definitions from Portuguese dictionaries. Alongside DORE, we also introduce the models used to perform Portuguese DM. The release of DORE aims to fill the gap in resources for Automatic Definition Generation, or Definition Modelling (DM), in Portuguese. DORE is the first dataset… See the full description on the dataset page: https://huggingface.co/datasets/multidefmod/dore.
All the information about a specific agricultural program offered by FSA, including the rules for eligibility, disbursement, possible repayment options, and continuing service activity.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset is built for time-series Sentinel-2 cloud detection and is stored in the TensorFlow TFRecord format (see https://www.tensorflow.org/tutorials/load_data/tfrecord).
Each file is compressed in 7z format and can be decompressed using Bandizip or 7-Zip.
Dataset Structure:
Each filename can be split into three parts using underscores. The first part indicates whether it is designated for training or validation ('train' or 'val'); the second part indicates the Sentinel-2 tile name, and the last part indicates the number of samples in this file.
Each sample includes:
Sample ID;
Array of time-series 4-band image patches at 10m resolution, shaped as (n_timestamps, 4, 42, 42);
Label list indicating cloud cover status for the center 6×6 pixels of each timestamp;
Ordinal list for each timestamp;
Sample weight list (reserved);
Here is a demonstration function for parsing the TFRecord file:
import tensorflow as tf
def parseRecordDirect(fname):
    # Recover the tile name and sample count from the file name, then pair
    # every raw record in the file with its tile name.
    sep = '/'
    parts = tf.strings.split(fname, sep)
    tn = tf.strings.split(parts[-1], sep='_')[-2]
    nn = tf.strings.to_number(tf.strings.split(parts[-1], sep='_')[-1], tf.dtypes.int64)
    t = tf.data.Dataset.from_tensors(tn).repeat().take(nn)
    t1 = tf.data.TFRecordDataset(fname)
    ds = tf.data.Dataset.zip((t, t1))
    return ds
keys_to_features_direct = {
    'localid': tf.io.FixedLenFeature([], tf.int64, -1),
    'image_raw_ldseries': tf.io.FixedLenFeature((), tf.string, ''),
    'labels': tf.io.FixedLenFeature((), tf.string, ''),
    'dates': tf.io.FixedLenFeature((), tf.string, ''),
    'weights': tf.io.FixedLenFeature((), tf.string, '')
}
class SeriesClassificationDirectDecorder(decoder.Decoder):
    """A tf.Example decoder for tfds classification datasets."""
    # `decoder` is assumed to provide the tfds Decoder base class
    # (e.g. from tensorflow_datasets).

    def __init__(self) -> None:
        super().__init__()

    def decode(self, tid, ds):
        parsed = tf.io.parse_single_example(ds, keys_to_features_direct)
        encoded = parsed['image_raw_ldseries']
        labels_encoded = parsed['labels']
        decoded = tf.io.decode_raw(encoded, tf.uint16)
        label = tf.io.decode_raw(labels_encoded, tf.int8)
        dates = tf.io.decode_raw(parsed['dates'], tf.int64)
        weight = tf.io.decode_raw(parsed['weights'], tf.float32)
        decoded = tf.reshape(decoded, [-1, 4, 42, 42])
        sample_dict = {
            'tid': tid,                    # tile ID
            'dates': dates,                # date list
            'localid': parsed['localid'],  # sample ID
            'imgs': decoded,               # image array
            'labels': label,               # label list
            'weights': weight              # sample weight list
        }
        return sample_dict
def preprocessDirect(tid, record):
    # Same parsing logic as the decoder above, but returning a flat tuple.
    parsed = tf.io.parse_single_example(record, keys_to_features_direct)
    encoded = parsed['image_raw_ldseries']
    labels_encoded = parsed['labels']
    decoded = tf.io.decode_raw(encoded, tf.uint16)
    label = tf.io.decode_raw(labels_encoded, tf.int8)
    dates = tf.io.decode_raw(parsed['dates'], tf.int64)
    weight = tf.io.decode_raw(parsed['weights'], tf.float32)
    decoded = tf.reshape(decoded, [-1, 4, 42, 42])
    return tid, dates, parsed['localid'], decoded, label, weight
t1 = parseRecordDirect('filename here')
dataset = t1.map(preprocessDirect, num_parallel_calls=tf.data.experimental.AUTOTUNE)
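For reference, the sketch below shows one way a single sample with the fields listed above could be serialized into a matching tf.train.Example. It is inferred from keys_to_features_direct and is not the original writer code used to build the dataset.

import numpy as np

def serialize_sample(localid, imgs, labels, dates, weights):
    # imgs: uint16 array (n_timestamps, 4, 42, 42); labels: int8; dates: int64; weights: float32.
    feature = {
        'localid': tf.train.Feature(int64_list=tf.train.Int64List(value=[localid])),
        'image_raw_ldseries': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[imgs.astype(np.uint16).tobytes()])),
        'labels': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[labels.astype(np.int8).tobytes()])),
        'dates': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[dates.astype(np.int64).tobytes()])),
        'weights': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[weights.astype(np.float32).tobytes()])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()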
Class Definition:
0: clear
1: opaque cloud
2: thin cloud
3: haze
4: cloud shadow
5: snow
Dataset Construction:
First, we randomly generate 500 points for each tile, and all these points are aligned to the pixel-grid centers of the 60m-resolution sub-datasets (e.g. B10) for consistency when comparing with other products, because other cloud detection methods may use the cirrus band, which is at 60m resolution, as a feature.
Then, time-series image patches of two shapes are cropped with each point as the center. The patches of shape 42×42 are cropped from the bands at 10m resolution (B2, B3, B4, B8) and are used to construct this dataset, while the patches of shape 348×348 are cropped from the True Colour Image (TCI; see the Sentinel-2 User Guide for details) file and are used for interpreting class labels.
Samples with a large number of timestamps can be time-consuming at the I/O stage, so the time-series patches are divided into groups of no more than 100 timestamps each.
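A minimal sketch of that grouping step is given below; it is illustrative only, since the original splitting code is not distributed with the dataset.

def split_into_groups(timestamps, max_len=100):
    # Split a per-sample time series into chunks of at most `max_len` timestamps.
    return [timestamps[i:i + max_len] for i in range(0, len(timestamps), max_len)]

# e.g. 250 acquisition dates -> groups of 100, 100 and 50 timestamps
groups = split_into_groups(list(range(250)))
print([len(g) for g in groups])  # [100, 100, 50]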
Programmatically generated Data Dictionary document detailing the TxDOT Street Definition service.
The PDF contains service metadata and a complete list of data fields.
For any questions or issues related to the document, please contact the data owner of the service identified in the PDF and in the Credits section of this portal item.
Related Links
TxDOT Street Definition Service URL
TxDOT Street Definition Portal Item
This data set includes rankings of competing definitions of open government by a sample of Canadian journalists, parliamentarians and bloggers.
This dataset represents ridgelines as defined by the Department of Interior (DOI): "Areas within 660 feet of the top of the ridgeline, where a ridgeline has at least 150 feet of vertical elevation gain with a minimum average slope of 10 percent between the ridgeline and the base." The dataset was created using the Geomorphons package from the University of Guelph, which can be found here: Geomorphons Package, and the 3DEP 1/3 arc-second digital elevation model. A TIF data file and a PNG map of the data are provided.
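As a plain-language reading of the DOI criteria above, the short sketch below checks whether a candidate ridge qualifies. It is a rough illustration only, not the Geomorphons/3DEP workflow used to produce the dataset.

def qualifies_as_ridgeline(vertical_gain_ft, horizontal_run_ft):
    # True if the ridge rises at least 150 ft from its base at an average slope of at least 10%.
    avg_slope_pct = 100.0 * vertical_gain_ft / horizontal_run_ft
    return vertical_gain_ft >= 150.0 and avg_slope_pct >= 10.0

RIDGELINE_BUFFER_FT = 660.0  # mapped area: within 660 ft of the top of a qualifying ridgeline

print(qualifies_as_ridgeline(200.0, 1500.0))  # True: 200 ft gain at ~13.3% average slope
print(qualifies_as_ridgeline(120.0, 800.0))   # False: gain is below the 150 ft threshold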
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data collected during the study "Towards High-Value Datasets determination for data-driven development: a systematic literature review" conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun and Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb). It is being made public both to act as supplementary data for the "Towards High-Value Datasets determination for data-driven development: a systematic literature review" paper (the pre-print is available in Open Access here: https://arxiv.org/abs/2305.10234) and so that other researchers can use these data in their own work.
The protocol is intended for the systematic literature review on the topic of high-value datasets, with the aim of gathering information on how the topic of high-value datasets (HVD) and their determination has been reflected in the literature over the years and what has been found by these studies to date, incl. the indicators used in them, involved stakeholders, data-related aspects, and frameworks. The data in this dataset were collected as a result of the SLR over Scopus, Web of Science, and the Digital Government Research Library (DGRL) in 2023.
Methodology
To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic was studied. To this end, the SLR was carried out by searching the digital libraries covered by Scopus, Web of Science (WoS), and the Digital Government Research Library (DGRL).
These databases were queried for the keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), which were applied to the article title, keywords, and abstract to limit the number of papers to those where these objects were primary research objects rather than merely mentioned in the body, e.g., as future work. After deduplication, 11 articles were found to be unique and were further checked for relevance. As a result, a total of 9 articles were examined further. Each study was independently examined by at least two authors.
To attain the objective of our study, we developed the protocol, where the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information.
Test procedure: Each study was independently examined by at least two authors; after an in-depth examination of the full text of the article, the structured protocol was filled in for each study. The structure of the survey is available in the supplementary files (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx). The data collected for each study by two researchers were then synthesized into one final version by a third researcher.
Description of the data in this data set
Protocol_HVD_SLR provides the structure of the protocol. Spreadsheet #1 provides the filled protocol for relevant studies. Spreadsheet #2 provides the list of results after the search over the three indexing databases, i.e. before filtering out irrelevant studies.
The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information
Descriptive information
1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
2) Complete reference - the complete source information to refer to the study
3) Year of publication - the year in which the study was published
4) Journal article / conference paper / book chapter - the type of the paper -{journal article, conference paper, book chapter}
5) DOI / Website - a link to the website where the study can be found
6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science
7) Availability in OA - availability of an article in the Open Access
8) Keywords - keywords of the paper as indicated by the authors
9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}
Approach- and research design-related information
10) Objective / RQ - the research objective / aim, established research questions
11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use-cases, scope of the SLR etc.)
12) Contributions - the contributions of the study
13) Method - whether the study uses a qualitative, quantitative, or mixed-methods approach
14) Availability of the underlying research data - whether there is a reference to publicly available underlying research data, e.g., transcriptions of interviews, collected data, or an explanation of why these data are not shared
15) Period under investigation - period (or moment) in which the study was conducted
16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is it used in the study?
Quality- and relevance- related information
17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)?
18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused on HVD determination; secondary - mentioned but not studied (e.g., as part of discussion, future work etc.))
HVD determination-related information
19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term?
20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, “input -> output")
21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the relationships between these components? (detailed description)
22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
23) Data - what data do HVD cover?
24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)
Format of the files: .xls, .csv (for the first spreadsheet only), .odt, .docx
Licenses or restrictions: CC-BY
For more info, see README.txt
The canton is a territorial subdivision of the borough (arrondissement). It is the electoral district within which a general councillor is elected. The cantons were created, like the departments, by the Act of 22 December 1789. In most cases, a canton comprises several municipalities, but cantons do not always respect municipal boundaries: the most populated municipalities belong to several cantons. A canton belongs to one and only one borough. This layer differs from the pseudo-cantons of the BD Carto layer (#58, N_CANTON_BDC_ddd), because the latter contains the canton to which each municipality is attached (finer definition to be provided).
Big Data and Society Abstract & Indexing - ResearchHelpDesk - Big Data & Society (BD&S) is an open access, peer-reviewed scholarly journal that publishes interdisciplinary work, principally in the social sciences, humanities and computing and their intersections with the arts and natural sciences, about the implications of Big Data for societies. The Journal's key purpose is to provide a space for connecting debates about the emerging field of Big Data practices and how they are reconfiguring academic, social, industry, business, and government relations, expertise, methods, concepts, and knowledge.
BD&S moves beyond usual notions of Big Data and treats it as an emerging field of practice that is not defined by, but generative of, (sometimes) novel data qualities such as high volume and granularity and complex analytics such as data linking and mining. It thus attends to digital content generated through online and offline practices in social, commercial, scientific, and government domains. This includes, for instance, the content generated on the Internet through social media and search engines but also that which is generated in closed networks (commercial or government transactions) and open networks such as digital archives, open government, and crowdsourced data. Critically, rather than settling on a definition, the Journal makes this an object of interdisciplinary inquiries and debates explored through studies of a variety of topics and themes. BD&S seeks contributions that analyze Big Data practices and/or involve empirical engagements and experiments with innovative methods while also reflecting on the consequences for how societies are represented (epistemologies), realized (ontologies) and governed (politics).
Article processing charge (APC): The APC for this journal is currently 1500 USD. Authors who do not have funding for open access publishing can request a waiver from the publisher, SAGE, once their Original Research Article is accepted after peer review. For all other content (Commentaries, Editorials, Demos) and Original Research Articles commissioned by the Editor, the APC will be waived.
Abstract & Indexing: Clarivate Analytics: Social Sciences Citation Index (SSCI); Directory of Open Access Journals (DOAJ); Google Scholar; Scopus
https://opendata.vancouver.ca/pages/licence/
Significant changes to the open data catalogue, including new datasets added, datasets renamed or retired, quarterly or annual updates to high-impact datasets, and changes to data structure or definition. Smaller changes, such as adding or editing records or renaming a field in an existing dataset, are not included.
Note: This log is published in the interest of transparency into the work of the open data program. You can subscribe to updates for a specific dataset by creating an account on the portal and then clicking on the Follow button on the Information tab of any dataset. You can also get updates by subscribing to our email newsletter.
Data currency: New records will be added whenever a significant change is made to the open data catalogue.
Data.ed.gov is the U.S. Department of Education's solution for publishing, finding, and accessing our public data profiles. This open data catalog brings together the Department's data assets in a single location, making them available with their metadata, documentation, and APIs for use by the public. The federal government's Foundations for Evidence-Based Policymaking Act of 2018 (Evidence Act) requires government agencies to make data assets open and machine-readable by default. Data.ed.gov is the U.S. Department of Education's comprehensive data inventory satisfying these requirements while also providing privacy and security. As defined by the Open Definition, open data is data that can be freely used, re-used and redistributed by anyone, subject only, at most, to the requirement to attribute and share-alike. Put simply, open data is data anyone can access, download, and use. Individuals, businesses, and governments can use open education data to bring about social, economic and educational benefits and drive innovation. Share your original analyses, products, and innovations on the Showcase tabs within Data.ed.gov. Browse the data, download it, analyze it, or build apps or other tools using our APIs, and share what you do with our data using our Showcase feature. If you are new to open data, you can learn more and get started with our How-Tos. If you are preparing an article or organizing a data event and would like information or support from the Data.ed.gov team, contact us at odp@ed.gov.