Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Open Data 500, funded by the John S. and James L. Knight Foundation (http://www.knightfoundation.org/) and conducted by the GovLab, is the first comprehensive study of U.S. companies that use open government data to generate new business and develop new products and services. The study aims to:
Provide a basis for assessing the economic value of government open data
Encourage the development of new open data companies
Foster a dialogue between government and business on how government data can be made more useful
The Open Data 500 study is conducted by the GovLab at New York University with funding from the John S. and James L. Knight Foundation. The GovLab works to improve people’s lives by changing how we govern, using technology-enabled solutions and a collaborative, networked approach. As part of its mission, the GovLab studies how institutions can publish the data they collect as open data so that businesses, organizations, and citizens can analyze and use this information.
The Open Data 500 team has compiled our list of companies through (1) outreach campaigns, (2) advice from experts and professional organizations, and (3) additional research.
Outreach Campaign
Mass email to over 3,000 contacts in the GovLab network
Mass email to over 2,000 contacts from OpenDataNow.com
Blog posts on TheGovLab.org and OpenDataNow.com
Social media recommendations
Media coverage of the Open Data 500
Attending presentations and conferences
Expert Advice
Recommendations from government and non-governmental organizations
Guidance and feedback from Open Data 500 advisors
Research
Companies identified for the book, Open Data Now
Companies using datasets from Data.gov
Directory of open data companies developed by Deloitte
Online Open Data Userbase created by Socrata
General research from publicly available sources
The Open Data 500 is not a rating or ranking of companies. It covers companies of different sizes and categories, using various kinds of data.
The Open Data 500 is not a competition, but an attempt to give a broad, inclusive view of the field.
The Open Data 500 study also does not provide a random sample for definitive statistical analysis. Since this is the first thorough scan of companies in the field, it is not yet possible to determine the exact landscape of open data companies.
United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and also as a baseline for future studies of ag research data.

Purpose
As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to:
establish where agricultural researchers in the United States -- land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies -- currently publish their data, including general research data repositories, domain-specific databases, and the top journals
compare how much data is in institutional vs. domain-specific vs. federal platforms
determine which repositories are recommended by top journals that require or recommend the publication of supporting data
ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data

Approach
The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

Search methods
We first compiled a list of known domain-specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects. We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if the institution had a repository for its unique, independent research data if not apparent in the initial web browser search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories.
Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories. Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo. Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Data are provided for journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required?, and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.

Evaluation
We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, the type of resource searched (datasets, data, images, components, etc.), the percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

Results
A summary of the major findings from our data review: Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors. There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
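The coverage metrics described in the Evaluation step reduce to simple ratios and threshold checks. A minimal sketch of that calculation, using hypothetical repository totals and per-term result counts (none of the figures below come from the study):

```python
# Hypothetical illustration of the coverage metrics described above:
# for each search term, compute its share of a repository's total holdings
# and flag the 1% / 5% share and 100 / 500 result-count thresholds.

repository_total = 250_000  # hypothetical total datasets in one repository

term_results = {            # hypothetical result counts per ag search term
    "agriculture": 14_200,
    "crop": 3_900,
    "livestock": 450,
}

for term, count in term_results.items():
    share = count / repository_total
    print(
        f"{term}: {count} results, {share:.2%} of collection, "
        f">=1%: {share >= 0.01}, >=5%: {share >= 0.05}, "
        f">100 results: {count > 100}, >500 results: {count > 500}"
    )
```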
Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation. See the included README file for descriptions of each individual data file in this dataset.

Resources in this dataset:
Resource Title: Journals. File Name: Journals.csv
Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 14-Dec-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute. The dataset forms part of the outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023, and was funded with £13,913.85 of Research England monies held internally by the University of Sheffield as part of their ‘Enhancing Research Cultures’ scheme 2022-2023.

The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2023. This includes due concern for participant anonymity and data management. ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made from reuse. It has been deposited under a CC-BY-NC license.

Overall, this dataset comprises:
1 x Workshop transcript - in .docx file format which can be opened with Microsoft Word, Google Docs, or an open-source equivalent.

The workshop took place on 18-Jul-2023 at the Wave Building, University of Sheffield. All five attendees have read and approved the portion of the transcript containing their own discussion. All workshop attendees have had an opportunity to retract details should they wish to do so. All workshop attendees have chosen whether to be pseudonymised or named directly. The pseudonym or real name can be used to identify individual participant responses in the qualitative coding held within the accompanying datasets from the same project:
Survey Responses: Hanchard M and San Roman Pineda I (2023) Fostering cultures of open qualitative research: Dataset 1 – Survey Responses. The University of Sheffield. DOI: 10.15131/shef.data.23567250.v1
Interviews: Hanchard M and San Roman Pineda I (2023) Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts. The University of Sheffield. DOI: 10.15131/shef.data.23567223.v2

As a limitation, the audio recording of the workshop session that this transcript is based upon is missing a section (due to a recording error) and may contain errors/inaccuracies (due to poor audio conditions within the workshop room). Every effort has been taken to correct these, including participants themselves reviewing their discussion/quotes, but the transcript may still contain minor inaccuracies, typos, and/or other errors in the text - as is noted on the transcript itself.

The project was undertaken by two staff:
Co-investigator: Dr. Itzel San Roman Pineda (Postdoctoral Research Assistant) ORCiD ID: 0000-0002-3785-8057 i.sanromanpineda@sheffield.ac.uk Labelled as ‘Researcher 1’ throughout all project datasets.
Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard (Research Associate) ORCiD ID: 0000-0003-2460-8638 m.s.hanchard@sheffield.ac.uk iHuman Institute, Social Research Institutes, Faculty of Social Science. Labelled as ‘Researcher 2’ throughout all project datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data collected during a study ("Towards High-Value Datasets determination for data-driven development: a systematic literature review") conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb).
It is being made public both to act as supplementary data for the "Towards High-Value Datasets determination for data-driven development: a systematic literature review" paper (pre-print available in Open Access at https://arxiv.org/abs/2305.10234) and to allow other researchers to use these data in their own work.
The protocol supports a Systematic Literature Review (SLR) on the topic of high-value datasets (HVD), with the aim of gathering information on how HVD and their determination have been reflected in the literature over the years and what these studies have found to date, including the indicators used, the stakeholders involved, data-related aspects, and frameworks. The data in this dataset were collected as the result of the SLR over Scopus, Web of Science, and the Digital Government Research library (DGRL) in 2023.
***Methodology***
To understand how HVD determination has been reflected in the literature over the years and what these studies have found to date, all relevant literature covering this topic has been studied. To this end, the SLR was carried out by searching the digital libraries covered by Scopus, Web of Science (WoS), and the Digital Government Research library (DGRL).
These databases were queried with the keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), applied to the article title, keywords, and abstract to limit the results to papers in which these objects were the primary research objects rather than only mentioned in the body, e.g., as future work. After deduplication, 11 unique articles were found and checked for relevance; as a result, a total of 9 articles were further examined. Each study was independently examined by at least two authors.
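For Scopus, a title/abstract/keyword restriction of this kind can be expressed in the Scopus Search API query syntax. The sketch below is illustrative only: the API key is a placeholder, the field syntax assumes the standard Scopus Search API, and it is not part of the published protocol (WoS and DGRL would need their own clients).

```python
# Hedged sketch: running the title/abstract/keyword query above against the
# Scopus Search API (https://api.elsevier.com/content/search/scopus).
# Assumes an Elsevier API key is available; replace the placeholder below.
import requests

API_KEY = "YOUR-ELSEVIER-API-KEY"  # placeholder, not a real key
QUERY = (
    'TITLE-ABS-KEY("open data" OR "open government data") '
    'AND TITLE-ABS-KEY("high-value data*" OR "high value data*")'
)

resp = requests.get(
    "https://api.elsevier.com/content/search/scopus",
    params={"query": QUERY, "count": 25},
    headers={"X-ELS-APIKey": API_KEY, "Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()

# Print title and cover date for each returned record.
for entry in resp.json().get("search-results", {}).get("entry", []):
    print(entry.get("dc:title"), "|", entry.get("prism:coverDate"))
```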
To attain the objective of our study, we developed a protocol in which the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) HVD determination-related information.
***Test procedure***
Each study was independently examined by at least two authors; after an in-depth examination of the full text of the article, the structured protocol was filled in for each study.
The structure of the survey is available in the supplementary files (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx).
The data collected for each study by two researchers were then synthesized in one final version by the third researcher.
***Description of the data in this data set***
Protocol_HVD_SLR provides the structure of the protocol
Spreadsheet #1 provides the filled protocol for relevant studies.
Spreadsheet #2 provides the list of results returned by the search over the three indexing databases, i.e., before filtering out irrelevant studies.
The information on each selected study was collected in four categories:
(1) descriptive information,
(2) approach- and research design- related information,
(3) quality-related information,
(4) HVD determination-related information
Descriptive information
1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
2) Complete reference - the complete source information to refer to the study
3) Year of publication - the year in which the study was published
4) Journal article / conference paper / book chapter - the type of the paper - {journal article, conference paper, book chapter}
5) DOI / Website - a link to the website where the study can be found
6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science
7) Availability in OA - availability of an article in the Open Access
8) Keywords - keywords of the paper as indicated by the authors
9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}
Approach- and research design-related information
10) Objective / RQ - the research objective / aim, established research questions
11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use-cases, scope of the SLR etc.)
12) Contributions - the contributions of the study
13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach?
14) Availability of the underlying research data - whether there is a reference to the publicly available underlying research data, e.g., transcriptions of interviews, collected data, or explanation why these data are not shared?
15) Period under investigation - period (or moment) in which the study was conducted
16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?
Quality- and relevance- related information
17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)?
18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused around the HVD determination, secondary - mentioned but not studied (e.g., as part of discussion, future work etc.))
HVD determination-related information
19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term?
20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, “input -> output”)
21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the relationships between these components? (detailed description)
22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
23) Data - what data do HVD cover?
24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)
***Format of the file***
.xls, .csv (for the first spreadsheet only), .odt, .docx
***Licenses or restrictions***
CC-BY
For more info, see README.txt
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This collection contains five sets of datasets:
1) Publication counts from two multidisciplinary humanities data journals: the Journal of Open Humanities Data and Research Data in the Humanities and Social Sciences (RDJ_JOHD_Publications.csv);
2) A large dataset about the performance of research articles in HSS exported from dimensions.ai (allhumss_dims_res_papers_PUB_ID.csv);
3) A large dataset about the performance of datasets in HSS harvested from the Zenodo REST API (Zenodo.zip);
4) Impact and usage metrics from the papers published in the two journals above (final_outputs.zip);
5) Data from Twitter analytics on tweets from the @up_johd account, with paper DOI and engagement rate (twitter-data.zip).
Please note that, as requested by the Dimensions team, for sets 2 and 4 we only included the Publication IDs from Dimensions rather than the full data. Interested parties only need the Dimensions publication IDs to retrieve the data; even without a Dimensions subscription, they can easily obtain a no-cost agreement with Dimensions for research purposes in order to retrieve the data.
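Dataset 3 was harvested from the Zenodo REST API. As a rough illustration of that kind of harvest (the query terms, page size, and the stats fields read from the response are assumptions, not the project's actual harvesting script):

```python
# Hedged sketch of pulling dataset records from the Zenodo REST API
# (https://zenodo.org/api/records). Query parameters and response fields
# shown here are illustrative, not the project's original code.
import requests

resp = requests.get(
    "https://zenodo.org/api/records",
    params={"q": "humanities", "type": "dataset", "size": 25, "page": 1},
    timeout=30,
)
resp.raise_for_status()

# Each hit carries record metadata and usage statistics.
for hit in resp.json().get("hits", {}).get("hits", []):
    meta = hit.get("metadata", {})
    stats = hit.get("stats", {})
    print(meta.get("title"), "| views:", stats.get("views"),
          "| downloads:", stats.get("downloads"))
```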
H.J. Andrews Online Studies Map is a compilation of study site locations and pertinent GIS layers for the geographic area of H.J. Andrews Experimental Forest (LTER). For questions about content contact the HJ Andrews spatial data administrator via the contacts link at this page: https://andrewsforest.oregonstate.edu/data. Some individual data sets are available at the HJ Andrews Open Data Hub here: http://data-osugisci.opendata.arcgis.com/.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data collected during a study "Transparency of open data ecosystems in smart cities: Definition and assessment of the maturity of transparency in 22 smart cities" (Sustainable Cities and Society (SCS), vol.82, 103906) conducted by Martin Lnenicka (University of Pardubice), Anastasija Nikiforova (University of Tartu), Mariusz Luterek (University of Warsaw), Otmane Azeroual (German Centre for Higher Education Research and Science Studies), Dandison Ukpabi (University of Jyväskylä), Visvaldis Valtenbergs (University of Latvia), Renata Machova (University of Pardubice).
This study inspects smart cities’ data portals and assesses their compliance with transparency requirements for open (government) data by means of the expert assessment of 34 portals representing 22 smart cities, with 36 features.
It is being made public both to act as supplementary data for the paper and to allow other researchers to use these data in their own work, potentially contributing to the improvement of current data ecosystems and to building sustainable, transparent, citizen-centered, and socially resilient open data-driven smart cities.
Purpose of the expert assessment
The data in this dataset were collected by applying the benchmarking framework for assessing the compliance of open (government) data portals with the principles of transparency-by-design proposed by Lněnička and Nikiforova (2021)* to 34 portals that can be considered part of open data ecosystems in smart cities. The portals were assessed by experts against 36 features, which allows them to be ranked and their maturity levels discussed and, based on the results of the assessment, the components and unique models that form the open data ecosystem in the smart city context to be defined.
Methodology
Sample selection: the capitals of the Member States of the European Union and countries of the European Economic Area were selected to ensure a more coherent political and legal framework. They were mapped/cross-referenced with their rank in 5 smart city rankings: IESE Cities in Motion Index, Top 50 smart city governments (SCG), IMD smart city index (SCI), global cities index (GCI), and sustainable cities index (SCI). A purposive sampling method and a systematic search for portals were then carried out to identify relevant websites for each city using two complementary techniques: browsing and searching.
To evaluate the transparency maturity of data ecosystems in smart cities, we used the transparency-by-design framework (Lněnička & Nikiforova, 2021)*. The benchmarking supposes the collection of quantitative data, which makes this task an acceptability task. A six-point Likert scale was applied for evaluating the portals. Each sub-dimension was supplied with a description to ensure common understanding, a drop-down list to select the level at which the respondent (dis)agrees, and a comment field, which was not mandatory. This formed a protocol to be filled in for every portal. Each sub-dimension/feature was assessed on the six-point Likert scale, where strong agreement is assessed with 6 points and strong disagreement with 1 point.
Each website (portal) was evaluated by experts, where a person is considered an expert if they work with open (government) data and data portals daily, i.e., it is the key part of their job; this can include public officials, researchers, and independent organizations. In other words, compliance with the expert profile according to the International Certification of Digital Literacy (ICDL) and its derivation proposed in Lněnička et al. (2021)* is expected to be met. When all individual protocols were collected, mean values and standard deviations (SD) were calculated, and if statistical contradictions/inconsistencies were found, reassessment took place to ensure individual consistency and interrater reliability among experts’ answers.
*Lnenicka, M., & Nikiforova, A. (2021). Transparency-by-design: What is the role of open data portals?. Telematics and Informatics, 61, 101605.
*Lněnička, M., Machova, R., Volejníková, J., Linhartová, V., Knezackova, R., & Hub, M. (2021). Enhancing transparency through open government data: the case of data portals and their features and capabilities. Online Information Review.
Test procedure
(1) perform an assessment of each dimension using its sub-dimensions, mapping out the achievement of each indicator;
(2) all sub-dimensions in one dimension are aggregated, and the average value is calculated based on the number of sub-dimensions – the resulting average stands for the dimension value, giving eight values per portal;
(3) the average value across all dimensions is calculated and then mapped to a maturity level – this value of each portal is also used to rank the portals.
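A minimal sketch of that aggregation, assuming illustrative sub-dimension scores on the 1–6 Likert scale (the dimension names and scores below are hypothetical, not taken from the study, and the maturity mapping itself is omitted):

```python
# Minimal sketch of the scoring procedure described above, with hypothetical
# inputs: average the sub-dimension scores per dimension, then average the
# dimension values into an overall portal score used for ranking.

portal_scores = {  # hypothetical 1-6 Likert scores per sub-dimension
    "dimension_1": [5, 6, 4],
    "dimension_2": [3, 4],
    # ... the remaining dimensions (eight in total) follow the same pattern
}

dimension_values = {
    dim: sum(scores) / len(scores) for dim, scores in portal_scores.items()
}
overall = sum(dimension_values.values()) / len(dimension_values)

print("dimension values:", dimension_values)
print("overall portal value (mapped to a maturity level and used for ranking):",
      round(overall, 2))
```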
Description of the data in this data set
Sheet #1 "comparison_overall" provides results by portal
Sheet #2 "comparison_category" provides results by portal and category
Sheet #3 "category_subcategory" provides the list of categories and their elements
Format of the file
.xls
Licenses or restrictions
CC-BY
For more info, see README.txt
Assessing Risks To Online Polls Public Release Data Main Study
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 23-Jun-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute. The dataset forms part of three outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023:
· Fostering cultures of open qualitative research: Dataset 1 – Survey Responses
· Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts
· Fostering cultures of open qualitative research: Dataset 3 – Coding Book
The project was funded with £13,913.85 of Research England monies held internally by the University of Sheffield - as part of their ‘Enhancing Research Cultures’ scheme 2022-2023.
The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2023. This includes due concern for participant anonymity and data management.
ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made from reuse. It has been deposited under a CC-BY-NC license. Overall, this dataset comprises:
· 15 x Interview transcripts - in .docx file format which can be opened with Microsoft Word, Google Docs, or an open-source equivalent.
All participants have read and approved their transcripts and have had an opportunity to retract details should they wish to do so.
Participants chose whether to be pseudonymised or named directly. The pseudonym can be used to identify individual participant responses in the qualitative coding held within the ‘Fostering cultures of open qualitative research: Dataset 3 – Coding Book’ files.
For recruitment, 14 x participants were selected based on their responses to the project survey, whilst one participant was recruited based on specific expertise.
· 1 x Participant sheet – in .csv format which may be opened with Microsoft Excel, Google Sheets, or an open-source equivalent.
This provides socio-demographic detail on each participant alongside their main field of research and career stage. It includes a RespondentID field/column which can be used to connect interview participants with their responses to the survey questions in the accompanying ‘Fostering cultures of open qualitative research: Dataset 1 – Survey Responses’ files.
The project was undertaken by two staff:
Co-investigator: Dr. Itzel San Roman Pineda ORCiD ID: 0000-0002-3785-8057 i.sanromanpineda@sheffield.ac.uk Postdoctoral Research Assistant Labelled as ‘Researcher 1’ throughout the dataset
Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard ORCiD ID: 0000-0003-2460-8638 m.s.hanchard@sheffield.ac.uk Research Associate iHuman Institute, Social Research Institutes, Faculty of Social Science Labelled as ‘Researcher 2’ throughout the dataset
https://www.verifiedmarketresearch.com/privacy-policy/
Global Government Open Data Management Platform Market size was valued at USD 1.75 Billion in 2024 and is projected to reach USD 3.38 Billion by 2032, growing at a CAGR of 8.54% from 2026 to 2032.
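As a quick sanity check of the stated figures (a rough calculation only; the report does not specify whether the 2024 valuation or a 2026 baseline anchors the CAGR), the standard compound-growth formula can be applied:

```python
# Rough compound-growth check of the figures above. Applying the stated 8.54%
# CAGR to the 2024 valuation over the eight years to 2032 gives a value close
# to the projected USD 3.38 Billion; the exact baseline convention is assumed.
base_value_usd_bn = 1.75   # reported 2024 market size
cagr = 0.0854              # reported compound annual growth rate
years = 2032 - 2024

projected = base_value_usd_bn * (1 + cagr) ** years
print(f"Projected market size after {years} years: USD {projected:.2f} Billion")
# -> roughly USD 3.37 Billion, in line with the reported USD 3.38 Billion
```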
Global Government Open Data Management Platform Market Drivers
Increasing Demand for Transparency and Accountability: There is a growing public demand for transparency in government operations, which drives the adoption of open data initiatives. According to a survey by the World Bank, 85% of respondents in various countries indicated that transparency in government decisions is crucial for reducing corruption, prompting governments to implement open data platforms.
Technological Advancements: Rapid advancements in information and communication technology (ICT) facilitate the development and deployment of open data management platforms. The International Telecommunication Union (ITU) reported that global Internet penetration reached approximately 64% in 2023, enabling more citizens to access open data and engage with government services online.
Government Initiatives and Policies: Many governments are actively promoting open data through policies and initiatives. For instance, the U.S. government's Open Data Initiative, launched in 2013, has led to the publication of over 300,000 datasets on Data.gov. Additionally, the European Union's Open Data Directive, which aims to make public sector data available, is further encouraging governments to embrace open data practices.
By UCI [source]
Comprehensive Dataset on Online Retail Sales and Customer Data
Welcome to this comprehensive dataset offering a wide array of information related to online retail sales. This data set provides an in-depth look at transactions, product details, and customer information documented by an online retail company based in the UK. The scope of the data spans vastly, from granular details about each product sold to extensive customer data sets from different countries.
This transnational data set is a treasure trove of vital business insights as it meticulously catalogues all the transactions that happened during its span. It houses rich transactional records curated by a renowned non-store online retail company based in the UK known for selling unique all-occasion gifts. A considerable portion of its clientele includes wholesalers; ergo, this dataset can prove instrumental for companies looking for patterns or studying purchasing trends among such businesses.
The available attributes within this dataset offer valuable pieces of information:
InvoiceNo: This attribute refers to invoice numbers that are six-digit integral numbers uniquely assigned to every transaction logged in this system. Transactions marked with 'c' at the beginning signify cancellations - adding yet another dimension for purchase pattern analysis.
StockCode: Stock Code corresponds with specific items as they're represented within the inventory system via 5-digit integral numbers; these allow easy identification and distinction between products.
Description: This refers to product names, giving users qualitative knowledge about what kind of items are being bought and sold frequently.
Quantity: These figures ascertain the volume of each product per transaction – important figures that can help understand buying trends better.
InvoiceDate: Invoice Dates detail when each transaction was generated down to precise timestamps – invaluable when conducting time-based trend analysis or segmentation studies.
UnitPrice: Unit prices represent how much each unit retails at — crucial for revenue calculations or cost-related analyses.
Finally, Country: This locational attribute shows where each customer hails from, adding geographical segmentation to your data investigation toolkit.
This dataset was originally collated by Dr Daqing Chen, Director of the Public Analytics group based at the School of Engineering, London South Bank University. His research studies and business cases with this dataset have been published in various papers contributing to establishing a solid theoretical basis for direct, data and digital marketing strategies.
Access to such records can ensure enriching explorations or formulating insightful hypotheses about consumer behavior patterns among wholesalers. Whether it's managing inventory or studying transactional trends over time or spotting cancellation patterns - this dataset is apt for multiple forms of retail analysis.
1. Sales Analysis:
Sales data forms the backbone of this dataset, and it allows users to delve into various aspects of sales performance. You can use the Quantity and UnitPrice fields to calculate metrics like revenue, and further combine them with InvoiceNo information to understand sales over individual transactions.
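A minimal pandas sketch of that calculation, assuming the dataset has been exported locally as a CSV with the columns described above (the file name is a placeholder, and cancelled invoices are dropped via the 'c' prefix noted earlier):

```python
# Minimal sketch: revenue per transaction from Quantity, UnitPrice and InvoiceNo.
# "online_retail.csv" is a placeholder file name for a local export of the data.
import pandas as pd

df = pd.read_csv("online_retail.csv", parse_dates=["InvoiceDate"])

# Drop cancellations, whose InvoiceNo starts with 'c' (handled case-insensitively).
df = df[~df["InvoiceNo"].astype(str).str.upper().str.startswith("C")]

# Line-level revenue, then total revenue per invoice (i.e., per transaction).
df["Revenue"] = df["Quantity"] * df["UnitPrice"]
revenue_per_invoice = df.groupby("InvoiceNo")["Revenue"].sum()

print(revenue_per_invoice.describe())
```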
2. Product Analysis:
Each product in this dataset comes with its unique identifier (StockCode) and its name (Description). You could analyse which products are most popular based on Quantity sold or look at popularity per transaction by considering both Quantity and InvoiceNo.
3. Customer Segmentation:
If you attach specific business logic to the transactions (such as calculating total amounts), you can then use standard machine learning methods or RFM (Recency, Frequency, Monetary) segmentation techniques, combined with 'CustomerID', to understand your customer base's behavior better. Aggregating invoice numbers (which stand for separate transactions) per client will give insights about your clients as well.
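As a rough illustration under the same assumptions as the earlier snippet (a placeholder CSV export and a CustomerID column being present), an RFM table can be derived like this:

```python
# Rough RFM (Recency, Frequency, Monetary) sketch; assumes the same placeholder
# CSV as the earlier snippet and that a CustomerID column is present.
import pandas as pd

df = pd.read_csv("online_retail.csv", parse_dates=["InvoiceDate"])
df = df[~df["InvoiceNo"].astype(str).str.upper().str.startswith("C")]
df = df.dropna(subset=["CustomerID"])
df["Revenue"] = df["Quantity"] * df["UnitPrice"]

# Recency is measured against the day after the last invoice in the data.
snapshot = df["InvoiceDate"].max() + pd.Timedelta(days=1)
rfm = df.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),
    Frequency=("InvoiceNo", "nunique"),
    Monetary=("Revenue", "sum"),
)

# Quartile scores per dimension (1-4); ranking first avoids duplicate bin
# edges in qcut when many customers share the same value.
rfm["R_score"] = pd.qcut(rfm["Recency"].rank(method="first"), 4, labels=[4, 3, 2, 1])
rfm["F_score"] = pd.qcut(rfm["Frequency"].rank(method="first"), 4, labels=[1, 2, 3, 4])
rfm["M_score"] = pd.qcut(rfm["Monetary"].rank(method="first"), 4, labels=[1, 2, 3, 4])
print(rfm.head())
```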
4. Geographical Analysis:
The Country column enables analysts to study purchase patterns across different geographical locations.
Practical applications
Understand what products sell best where – it can help drive tailored marketing strategies. Anomaly detection – identify unusual behaviors that might indicate fraud...
This dataset contains the results of a literature review of experimental nutrient addition studies to determine which nutrient forms were most often measured in the scientific literature. To obtain a representative selection of relevant studies, we searched Web of Science™ using a search string to target experimental studies in artificial and natural lotic systems while limiting irrelevant papers. We screened the titles and abstracts of returned papers for relevance (experimental studies in streams/stream mesocosms that manipulated nutrients). To supplement this search, we sorted the relevant articles from the Web of Science™ search alphabetically by author and sequentially examined the bibliographies for additional relevant articles (screening titles for relevance, and then screening abstracts of potentially relevant articles) until we had obtained a total of 100 articles. If we could not find a relevant article electronically, we moved to the next article in the bibliography. Our goal was not to be completely comprehensive, but to obtain a fairly large sample of published, peer-reviewed studies from which to assess patterns.

We excluded any lentic or estuarine studies from consideration and included only studies that used mesocosms mimicking stream systems (flowing water or stream water source) or that manipulated nutrient concentrations in natural streams or rivers. We excluded studies that used nutrient diffusing substrate (NDS) because these manipulate nutrients on substrates and not in the water column. We also excluded studies examining only nutrient uptake, which rely on measuring dissolved nutrient concentrations with the goal of characterizing in-stream processing (e.g., Newbold et al., 1983). From the included studies, we extracted or summarized the following information: study type, study duration, nutrient treatments, nutrients measured, inclusion of TN and/or TP response to nutrient additions, and a description of how results were reported in relation to the research-management mismatch, if it existed.

Below is information on how the search was conducted:
Search string used for Web of Science advanced search (conducted on 27 September 2016):
TS= (stream OR creek OR river* OR lotic OR brook OR headwater OR tributary) AND TS = (mesocosm OR flume OR "artificial stream" OR "experimental stream" OR "nutrient addition") AND TI= (nitrogen OR phosphorus OR nutrient OR enrichment OR fertilization OR eutrophication)
https://datacatalog.worldbank.org/public-licenses?fragment=cc
The WBG launched the GovTech Maturity Index (GTMI) in 2020 as a composite index that uses 48 key indicators to measure critical aspects of four GovTech focus areas in 198 economies: supporting core government systems, enhancing service delivery, mainstreaming citizen engagement, and fostering GovTech enablers.
The construction of the GTMI is primarily based on the World Bank’s GovTech Dataset. The GTMI Report and GovTech Dataset provide opportunities to replicate the study, identify gaps in digital transformation by comparing the differences among economies and groups of economies, and track changes over time transparently.
The 2020 GovTech dataset contained data/evidence collected from government websites using remotely measurable indicators (due to the COVID-19 pandemic) mostly reflecting de jure practices. The GTMI Team followed a different approach for the 2022 update of the GTMI and underlying GovTech Dataset.
First, the GTMI indicators were revised and extended to explore the performance of existing platforms and to cover less well-known areas, in consultation with 9 relevant organizations and 10 World Bank practices/groups from November 2021 to January 2022. A Central Government (CG) GTMI online survey was launched in March 2022, and 850+ officials from 164 countries agreed to join this exercise to reflect the latest developments and results of their GovTech initiatives. Additionally, a Subnational Government (SNG) GTMI online survey was launched in parallel as a pilot implementation for interested countries. Finally, a data validation phase was included to benefit from the clarifications and updates of all survey participants while checking the survey responses and calculating the GTMI scores and groups.
The GTMI includes 40 updated/expanded GovTech indicators measuring the maturity of four GovTech focus areas. Additionally, 8 highly relevant external indicators measured by other relevant indexes are used in the calculation of GTMI groups.
The 2022 GovTech Dataset presents all indicators based on the CG GTMI survey data submitted by 135 countries directly, as well as the remotely collected data from the web sites of 63 non-participating economies. Additionally, the dataset includes the SNG GTMI data submitted by 113 subnational government entities (states, municipalities) from 16 countries and this expanded the scope of GovTech Dataset considerably.
As a part of the 2022 GTMI update, a GTMI Data Dashboard was launched to create a data visualization portal with maps and graphs aimed at helping the end-user digest and explore the findings of the CG GTMI / GovTech Dataset, as well as the GovTech Projects Database (presenting the details of 1450+ digital government/GovTech projects funded by the WBG in 147 countries since 1995).
The GovTech Dataset is a substantially expanded version of the Digital Government Systems and Services (DGSS) global dataset, originally developed in 2014 and updated every two years to support the preparation of several WBG studies and flagship reports (e.g., 2014 FMIS and Open Budget Data Study; WDR 2016: Digital Dividends; 2018 WBG Digital Adoption Index; WDR 2021: Data for Better Lives; and 2020 GovTech Maturity Index). The dataset will be updated every two years to reflect progress in the GovTech domain globally.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset is part of a research project investigating university students’ experiences during emergency remote education due to the COVID-19 pandemic. A cross-sectional study among undergraduate medical students from the Netherlands was conducted in April 2021. Data were collected through an online survey programmed in Qualtrics. The dataset includes data from 372 students. More information about the research project, the dataset, the measured concepts/variables, and all included items can be found here: https://doi.org/10.25397/eur.23276108
WONDER online databases include county-level Compressed Mortality (death certificates) since 1979; county-level Multiple Cause of Death (death certificates) since 1999; county-level Natality (birth certificates) since 1995; county-level Linked Birth / Death records (linked birth-death certificates) since 1995; state & large metro-level United States Cancer Statistics mortality (death certificates) since 1999; state & large metro-level United States Cancer Statistics incidence (cancer registry cases) since 1999; state and metro-level Online Tuberculosis Information System (TB case reports) since 1993; state-level Sexually Transmitted Disease Morbidity (case reports) since 1984; state-level Vaccine Adverse Event Reporting system (adverse reaction case reports) since 1990; county-level population estimates since 1970. The WONDER web server also hosts the Data2010 system with state-level data for compliance with Healthy People 2010 goals since 1998; the National Notifiable Disease Surveillance System weekly provisional case reports since 1996; the 122 Cities Mortality Reporting System weekly death reports since 1996; the Prevention Guidelines database (book in electronic format) published 1998; the Scientific Data Archives (public use data sets and documentation); and links to other online data sources on the "Topics" page.
Displays traffic study flow count data maintained by the Seattle Department of Transportation. Users can utilize the following definition query for traffic count study data for a particular year. Note: ENTER YEAR is the particular year of interest.
Definition Query: STDY_YEAR=ENTER YEAR AND FLOWMAP = 'Y'
Refresh: Weekly
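If the layer is consumed programmatically rather than in a desktop client, the same definition query can be passed as a where clause to the ArcGIS REST API's query operation. A hedged sketch follows; the feature layer URL is a placeholder, not the actual SDOT service endpoint, and 2018 stands in for ENTER YEAR.

```python
# Hedged sketch: applying the definition query above as a where clause against
# an ArcGIS feature layer's /query endpoint. The service URL is a placeholder;
# substitute the actual SDOT traffic flow count layer.
import requests

LAYER_URL = "https://services.arcgis.com/EXAMPLE/arcgis/rest/services/Traffic_Flow_Counts/FeatureServer/0"

params = {
    "where": "STDY_YEAR = 2018 AND FLOWMAP = 'Y'",  # 2018 stands in for ENTER YEAR
    "outFields": "*",
    "f": "json",
}
resp = requests.get(f"{LAYER_URL}/query", params=params, timeout=30)
resp.raise_for_status()

features = resp.json().get("features", [])
print(f"Returned {len(features)} flow count features for 2018.")
```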
This dataset includes bibliographic information for 501 papers that were published from 2010-April 2017 (time of search) and use online biodiversity databases for research purposes. Our overarching goal in this study is to determine how research uses of biodiversity data developed during a time of unprecedented growth of online data resources. We also determine uses with the highest number of citations, how online occurrence data are linked to other data types, and if/how data quality is addressed. Specifically, we address the following questions:
1.) What primary biodiversity databases have been cited in published research, and which databases have been cited most often?
2.) Is the biodiversity research community citing databases appropriately, and are the cited databases currently accessible online?
3.) What are the most common uses, general taxa addressed, and data linkages, and how have they changed over time?
4.) What uses have the highest impact, as measured through the mean number of citations per year?
5.) Are certain uses applied more often for plants/invertebrates/vertebrates?
6.) Are links to specific data types associated more often with particular uses?
7.) How often are major data quality issues addressed?
8.) What data quality issues tend to be addressed for the top uses?
Relevant papers for this analysis include those that use online and openly accessible primary occurrence records, or those that add data to an online database. Google Scholar (GS) provides full-text indexing, which was important to identify data sources that often appear buried in the methods section of a paper. Our search was therefore restricted to GS.

All authors discussed and agreed upon representative search terms, which were relatively broad to capture a variety of databases hosting primary occurrence records. The terms included: “species occurrence” database (8,800 results), “natural history collection” database (634 results), herbarium database (16,500 results), “biodiversity database” (3,350 results), “primary biodiversity data” database (483 results), “museum collection” database (4,480 results), “digital accessible information” database (10 results), and “digital accessible knowledge” database (52 results)--note that quotations are used as part of the search terms where specific phrases are needed in whole. We downloaded all records returned by each search (or the first 500 if there were more) into a Zotero reference management database.

About one third of the 2500 papers in the final dataset were relevant. Three of the authors with specialized knowledge of the field characterized relevant papers using a standardized tagging protocol based on a series of key topics of interest. We developed a list of potential tags and descriptions for each topic, including: database(s) used, database accessibility, scale of study, region of study, taxa addressed, research use of data, other data types linked to species occurrence data, data quality issues addressed, authors, institutions, and funding sources. Each tagged paper was thoroughly checked by a second tagger.
The final dataset of tagged papers allows us to quantify general areas of research made possible by the expansion of online species occurrence databases, and trends over time. Analyses of this data will be published in a separate quantitative review.
Wastewater-based epidemiology, which is the science of studying community sewage for public health information, is one method of getting information about community health and narcotics consumption. In an effort to prevent and reduce opioid misuse, the City of Tempe is working with scientists from Arizona State University's Biodesign Institute to study the city's wastewater. This feature layer view shows the testing results from wastewater collection areas which are comprised of merged public sewage drainage basins that flow to a shared testing location for the opioid wastewater study. The collection area testing data are provided by scientists from Arizona State University's Biodesign Institute. The wastewater is tested for the presence of prescription opioid parent drug compounds such as Fentanyl, Oxycodone, Codeine, the illegally made opioid parent drug compound Heroin and opioid metabolite drug compounds including Norfentanyl, 6-Acetylmorphine and Noroxycodone.

Additional Information
Source: Arizona State University's Biodesign Institute provided Excel document.
Contact: Kimberly Sotelo
Contact E-Mail: Kimberly_Sotelo@tempe.gov
Data Source Type: Table
Preparation Method: Excel Spreadsheet entered into ArcGIS Survey 123 form which populates ArcGIS Online feature layer table.
Publish Frequency: Weekly, as data becomes available
Publish Method: The collection area testing data are provided by scientists from Arizona State University's Biodesign Institute and published to an ArcGIS Online feature layer table.
Data Dictionary
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study, inspired by Tony Hey’s “The Data Deluge”, was carried out to discover whether the Institutional Repository is a suitable body to manage curation of and access to the source data created at the institution. This study surveyed research scientists at the White Rose Universities to discover their practices of and attitudes to research data storage, sharing and publication. Key role players in the research data management process were interviewed to investigate possible options for a data infrastructure, the capabilities of WRRO, the establishment of a discipline-based repository and issues of metadata assignment.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This archive contains the files submitted to the 4th International Workshop on Data: Acquisition To Analysis (DATA) at SenSys. Files provided in this package are associated with the paper titled "Dataset: Analysis of IFTTT Recipes to Study How Humans Use Internet-of-Things (IoT) Devices"
With the rapid development and usage of Internet-of-Things (IoT) and smart-home devices, researchers continue efforts to improve the ''smartness'' of those devices to address daily needs in people's lives. Such efforts usually begin with understanding evolving user behaviors on how humans utilize the devices and what they expect in terms of their behavior. However, while research efforts abound, there is a very limited number of datasets that researchers can use to both understand how people use IoT devices and to evaluate algorithms or systems for smart spaces. In this paper, we collect and characterize more than 50,000 recipes from the online If-This-Then-That (IFTTT) service to understand a seemingly straightforward but complicated question: ''What kinds of behaviors do humans expect from their IoT devices?'' The dataset we collected contains the basic information of the IFTTT rules, trigger and action event, and how many people are using each rule.
For more detail about this dataset, please refer to the paper listed below.
Don't forget to cite our paper if you used it in some publications :)
The citation information is listed below:
@inbook{10.1145/3485730.3494115,
author = {Yu, Haoxiang and Hua, Jie and Julien, Christine},
title = {Analysis of IFTTT Recipes to Study How Humans Use Internet-of-Things (IoT) Devices},
year = {2021},
isbn = {9781450390972},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3485730.3494115},
abstract = {With the rapid development and usage of Internet-of-Things (IoT) and smart-home devices, researchers continue efforts to improve the "smartness" of those devices to address daily needs in people's lives. Such efforts usually begin with understanding evolving user behaviors on how humans utilize the devices and what they expect in terms of their behavior. However, while research efforts abound, there is a very limited number of datasets that researchers can use to both understand how people use IoT devices and to evaluate algorithms or systems for smart spaces. In this paper, we collect and characterize more than 50,000 recipes from the online If-This-Then-That (IFTTT) service to understand a seemingly straightforward but complicated question: "What kinds of behaviors do humans expect from their IoT devices?" The dataset we collected contains the basic information of the IFTTT rules, trigger and action events, and how many people are using each rule.},
booktitle = {Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems},
pages = {537–541},
numpages = {5}
}