Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Data Can is a dataset for object detection tasks; it contains 899 images annotated with the class "Can".
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
This dataset offers comprehensive details on current and potential developments in the AI employment market. It includes data about job titles, required skills, pay ranges, experience levels, company types, and locations. The dataset aids in determining the most sought-after skills and in making industry-specific comparisons across professions.
Understanding the changing job landscape has become crucial for both businesses and job seekers as artificial intelligence continues to revolutionize industries. The purpose of this dataset is to provide information on hiring trends, pay disparities, and new career pathways in the AI industry. It can be used for data analysis, visualization, and prediction tasks relating to employment patterns in technology-driven areas.
It serves as a valuable resource for analyzing and learning about employment trends in AI.
By Crawl Feeds [source]
GameStop Product Reviews Dataset
Comprehensive and Detailed Customer Reviews and Ratings of Products from GameStop
Data Overview:
This dataset comprises a rich variety of information centered on customer reviews and ratings for products purchased from GameStop. For each review, the data includes detailed aspects such as the product name, brand, SKU (Stock Keeping Unit), helpful and non-helpful votes count, reviewer's name along with their review title & description. Further insights can be found through additional features that outline whether or not the reviewer recommends the product, whether they are a verified purchaser and encompass individual & average ratings for each product.
Other significant facets encapsulated within this valuable resource involve multimedia elements like images posted in reviews. To verify temporal relevance, timestamps revealing when the review was written (reviewed_at) as well as when the data was collected (scraped_at) are provided.
Additionally, URLs for both the specific item pages on GameStop's site (url) and other users' review pages (reviews_link) are included. The total number of customer feedback posts per item is also available in the reviews_count field.
Structure:
The dataset structure presents serialized versions of the aforementioned fields. This includes strings such as 'name', 'brand', and 'review_title'; datetimes including 'reviewed_at' and 'scraped_at'; floating-point numbers such as 'rating' and 'average_rating'; integers representing counts ('helpful_count', 'not_helpful_count'); and boolean flags capturing reviewer recommendations or verified-purchase status ('recommended_review', 'verifed_purchaser'). Null entries appear across several columns, so some cleaning may be needed.
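A short sketch of coercing those serialized fields into their stated types with pandas. The sample rows below are invented, and the exact column encodings in the real export may differ:

```python
import pandas as pd

# Hypothetical sample rows mirroring the fields described above; the
# real export's column names and encodings may differ.
raw = pd.DataFrame({
    "name": ["Controller X", "Headset Y"],
    "rating": ["5.0", "3.5"],
    "helpful_count": ["12", "0"],
    "recommended_review": ["true", "false"],
    "reviewed_at": ["2021-03-01", None],
})

# Coerce the serialized strings into the types listed above: floats
# for ratings, nullable integers for counts, booleans for flags, and
# datetimes for timestamps. errors="coerce" turns malformed entries
# into NaN/NaT instead of raising, tolerating the null entries noted.
typed = raw.assign(
    rating=pd.to_numeric(raw["rating"], errors="coerce"),
    helpful_count=pd.to_numeric(raw["helpful_count"], errors="coerce").astype("Int64"),
    recommended_review=raw["recommended_review"].str.lower().eq("true"),
    reviewed_at=pd.to_datetime(raw["reviewed_at"], errors="coerce"),
)
print(typed.dtypes)
```

Using nullable dtypes (`Int64`) keeps count columns usable even when some rows carry the null entries mentioned above.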
Use Case:
This dataset can serve multiple functions depending largely on user requirements. There are intriguing prospects around tracking consumer sentiment across time periods, which could lend fascinating insights into sales patterns. Another possibility might revolve around determining the best-selling items or brands on GameStop according to customer impressions and sales counts. Additionally, there is potential to link buying trends with whether the product was purchased legitimately or not.
This dataset could also be used by product managers to enhance existing products or create improved versions of them, taking into account customer suggestions from review content. Finally, marketing teams could use this dataset to strategize campaigns by identifying products with positive reviews and scaling promotions for those.
Of course, the versatility of this resource opens up vast domains, ranging from sentiment analysis and recommendation systems using machine learning methodologies to data visualization projects that help demonstrate consumer trends in a more approachable manner.
Sentiment Analysis: Use the 'review_description' field to understand customer sentiment towards specific products. NLP techniques can be deployed to derive sentiments from reviews text, which could help in understanding overall consumer opinion.
Brand Analysis: Use the 'brand' field for comparative analysis of various brands sold on GameStop's platform.
Product Recommendation System: Develop a product recommendation system based on the user's past purchase record represented by 'brand', 'sku', and past reviews.
Customer Segmentation: Analyse fields like 'rating', 'recommended_review', and 'verifed_purchaser' for advanced segmentation of customers.
Product Performance Analysis: By examining fields like average rating (average_rating), number of reviews (reviews_count), and recommendation status (recommended_review), one can gauge how well a product is performing and how it is received by customers.
Review Popularity Analysis: The dataset features two interesting variables - helpful_count and not_helpful_count; these reflect how other users perceived a review’s usefulness in helping them make purchasing decisions.
Time Series Forecasting: This dataset has temporal elements ('reviewed_at') that can be used for forecasting trends over time.
Reviewer Trustworthiness Assessment: The verified purchaser field can be used as an indicator of review trustworthiness or reviewer bias.
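As a toy starting point for the sentiment analysis use case above, a tiny lexicon scorer can be run over the 'review_description' text. The word lists and sample reviews below are invented for illustration; a real pipeline would use a proper NLP model (e.g. VADER or a transformer):

```python
import re

# Minimal lexicon-based sentiment sketch for the 'review_description'
# field; the word lists and reviews are invented for illustration.
POSITIVE = {"great", "love", "excellent", "fun"}
NEGATIVE = {"broken", "terrible", "refund", "disappointed"}

def polarity(text: str) -> int:
    """Return (#positive - #negative) lexicon hits in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

reviews = [
    "Great controller, my kids love it",
    "Arrived broken, terrible experience, want a refund",
]
scores = [polarity(r) for r in reviews]
print(scores)  # → [2, -3]
```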
https://qdr.syr.edu/policies/qdr-standard-access-conditions
Project Overview
Trends toward open science practices, along with advances in technology, have promoted increased data archiving in recent years, thus bringing new attention to the reuse of archived qualitative data. Qualitative data reuse can increase efficiency and reduce the burden on research subjects, since new studies can be conducted without collecting new data. Qualitative data reuse also supports larger-scale, longitudinal research by combining datasets to analyze more participants. At the same time, qualitative research data can increasingly be collected from online sources. Social scientists can access and analyze personal narratives and social interactions through social media such as blogs, vlogs, online forums, and posts and interactions from social networking sites like Facebook and Twitter. These big social data have been celebrated as an unprecedented source of data analytics, able to produce insights about human behavior on a massive scale. However, both types of research also present key epistemological, ethical, and legal issues. This study explores the issues of context, data quality and trustworthiness, data comparability, informed consent, privacy and confidentiality, and intellectual property and data ownership, with a focus on data curation strategies. The research suggests that connecting qualitative researchers, big social researchers, and curators can enhance responsible practices for qualitative data reuse and big social research. This study addressed the following research questions:
RQ1: How is big social data curation similar to and different from qualitative data curation?
RQ1a: How are epistemological, ethical, and legal issues different or similar for qualitative data reuse and big social research?
RQ1b: How can data curation practices such as metadata and archiving support and resolve some of these epistemological and ethical issues?
RQ2: What are the implications of these similarities and differences for big social data curation and qualitative data curation, and what can we learn from combining these two conversations?
Data Description and Collection Overview
The data in this study was collected using semi-structured interviews that centered around specific incidents of qualitative data archiving or reuse, big social research, or data curation. The participants for the interviews were therefore drawn from three categories: researchers who have used big social data, qualitative researchers who have published or reused qualitative data, and data curators who have worked with one or both types of data. Six key issues were identified in a literature review and were then used to structure three interview guides for the semi-structured interviews. The six issues are context, data quality and trustworthiness, data comparability, informed consent, privacy and confidentiality, and intellectual property and data ownership. Participants were limited to those working in the United States. Ten participants from each of the three target populations (big social researchers, qualitative researchers who had published or reused data, and data curators) were interviewed. The interviews were conducted between March 11 and October 6, 2021. When scheduling the interviews, participants received an email asking them to identify a critical incident prior to the interview. The “incident” in the critical incident interviewing technique is a specific example that focuses a participant’s answers to the interview questions. The participants were asked their permission to have the interviews recorded, which was done using the built-in recording technology of Zoom videoconferencing software. The author also took notes during the interviews. Otter.ai speech-to-text software was used to create initial transcriptions of the interview recordings. A hired undergraduate student hand-edited the transcripts for accuracy.
The transcripts were manually de-identified. The author analyzed the interview transcripts using a qualitative content analysis approach. This involved a combination of inductive and deductive coding approaches. After reviewing the research questions, the author used NVivo software to identify chunks of text in the interview transcripts that represented key themes of the research. Because the interviews were structured around each of the six key issues that had been identified in the literature review, the author deductively created a parent code for each of the six key issues. These parent codes were context, data quality and trustworthiness, data comparability, informed consent, privacy and confidentiality, and intellectual property and data ownership. The author then used inductive coding to create sub-codes beneath each of the parent codes for these key issues.
Selection and Organization of Shared Data
The data files consist of 28 of the interview transcripts themselves – transcripts from Big Social Researchers (BSR), Data Curators (DC), and Qualitative Researchers (QR)...
Under the new quarterly data summary (QDS) framework, departments’ spending data is published every quarter to show the taxpayer how the government is spending their money.
The QDS grew out of commitments made in the 2011 Budget and the written ministerial statement on business plans. For the financial year 2012 to 2013 the QDS has been revised and improved in line with action 9 of the Civil Service Reform Plan to provide a common set of data that will enable comparisons of operational performance across government so that departments and individuals can be held to account.
The QDS breaks down the total spend of the department in 3 ways: by budget, by internal operation, and by transaction.
The QDS template is the same for all departments, though the individual detail of grants and policy will differ from department to department. In using this data:
Please note that the quarter 1 2012 to 2013 return for the Department for Transport (DfT) is for the core department only.
Quarterly data summaries for April 2012 are as follows:
Quarterly data summaries for January 2012 are as follows:
Quarterly data summaries for October 2011 are as follows:
This dataset was created to pilot techniques for creating synthetic data from datasets containing sensitive and protected information in the local government context. Synthetic data generation replaces actual data with representative data generated from statistical models; this preserves the key data properties that allow insights to be drawn from the data while protecting the privacy of the people included in the data. We invite you to read the Understanding Synthetic Data white paper for a concise introduction to synthetic data.
This effort was a collaboration of the Urban Institute, Allegheny County’s Department of Human Services (DHS) and CountyStat, and the University of Pittsburgh’s Western Pennsylvania Regional Data Center.
The source data for this project consisted of 1) month-by-month records of services included in Allegheny County's data warehouse and 2) demographic data about the individuals who received the services. As the County’s data warehouse combines this service and client data, this data is referred to as “Integrated Services data”. Read more about the data warehouse and the kinds of services it includes here.
Synthetic data are typically generated from probability distributions or models identified as being representative of the confidential data. For this dataset, a model of the Integrated Services data was used to generate multiple versions of the synthetic dataset. These different candidate datasets were evaluated to select for publication the dataset version that best balances utility and privacy. For high-level information about this evaluation, see the Synthetic Data User Guide.
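The core idea described above, fitting a model to confidential data and then sampling synthetic records from the model rather than releasing the records themselves, can be illustrated with a deliberately simple, single-column toy. The actual project used far richer models, and all numbers here are invented:

```python
import random
import statistics

# Toy illustration: fit a simple model to "confidential" data, then
# sample a synthetic dataset from the model. Values are invented.
random.seed(42)
confidential_ages = [random.gauss(38, 12) for _ in range(1000)]

# The "model" here is just the fitted mean/stdev of the column.
mu = statistics.mean(confidential_ages)
sigma = statistics.stdev(confidential_ages)

# Synthetic records are drawn from the model, not copied from people,
# so no individual's actual value appears in the released data.
synthetic_ages = [random.gauss(mu, sigma) for _ in range(1000)]

# Utility check: key properties (here, the mean) should be preserved.
print(round(statistics.mean(synthetic_ages), 1))
```

Evaluating candidate synthetic datasets, as the project did, amounts to repeating this utility-versus-privacy comparison across many such properties.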
For more information about the creation of the synthetic version of this data, see the technical brief for this project, which discusses the technical decision making and modeling process in more detail.
This disaggregated synthetic data allows for many analyses that are not possible with aggregate data (summary statistics). Broadly, this synthetic version of this data could be analyzed to better understand the usage of human services by people in Allegheny County, including the interplay in the usage of multiple services and demographic information about clients.
Some amount of deviation from the original data is inherent to the synthetic data generation process. Specific examples of limitations (including undercounts and overcounts for the usage of different services) are given in the Synthetic Data User Guide and the technical report describing this dataset's creation.
Please reach out to this dataset's data steward (listed below) to let us know how you are using this data and if you found it to be helpful. Please also provide any feedback on how to make this dataset more applicable to your work, any suggestions of future synthetic datasets, or any additional information that would make this more useful. Also, please copy wprdc@pitt.edu on any such feedback (as the WPRDC always loves to hear about how people use the data that they publish and how the data could be improved).
1) A high-level overview of synthetic data generation as a method for protecting privacy can be found in the Understanding Synthetic Data white paper.
2) The Synthetic Data User Guide provides high-level information to help users understand the motivation, evaluation process, and limitations of the synthetic version of Allegheny County DHS's Human Services data published here.
3) Generating a Fully Synthetic Human Services Dataset: A Technical Report on Synthesis and Evaluation Methodologies describes the full technical methodology used for generating the synthetic data, evaluating the various options, and selecting the final candidate for publication.
4) The WPRDC also hosts the Allegheny County Human Services Community Profiles dataset, which provides annual updates on human-services usage, aggregated by neighborhood/municipality. That data can be explored using the County's Human Services Community Profile web site.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For more information, see the Aquatic Biodiversity Index Factsheet at https://nrm.dfg.ca.gov/FileHandler.ashx?DocumentID=150856.
The California Department of Fish and Wildlife’s (CDFW) Areas of Conservation Emphasis (ACE) is a compilation and analysis of the best-available statewide spatial information in California on biodiversity, rarity and endemism, harvested species, significant habitats, connectivity and wildlife movement, climate vulnerability, climate refugia, and other relevant data (e.g., other conservation priorities such as those identified in the State Wildlife Action Plan (SWAP), stressors, land ownership). ACE addresses both terrestrial and aquatic data. The ACE model combines and analyzes terrestrial information in a 2.5 square mile hexagon grid and aquatic information at the HUC12 watershed level across the state to produce a series of maps for use in non-regulatory evaluation of conservation priorities in California. The model addresses as many of CDFW’s statewide conservation and recreational mandates as feasible using high quality data sources. High value areas statewide and in each USDA Ecoregion were identified. The ACE maps and data can be viewed in the ACE online map viewer, or downloaded for use in ArcGIS. For more detailed information see https://www.wildlife.ca.gov/Data/Analysis/ACE and https://nrm.dfg.ca.gov/FileHandler.ashx?DocumentID=24326.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Under the new QDS framework, departments’ spending data is published every quarter to show the taxpayer how the Government is spending their money. The QDS grew out of commitments made in the 2011 Budget and the Written Ministerial Statement on Business Plans. For the financial year 2012/13 the QDS has been revised and improved in line with Action 9 of the Civil Service Reform Plan to provide a common set of data that will enable comparisons of operational performance across Government so that departments and individuals can be held to account. Q1 2012/13 is the first set of this new data collection and comprises different categories and subsets. As collection proceeds, we expect to be able to make meaningful comparisons of what departments are spending.
The QDS breaks down the total spend of the department in three ways: by Budget, by Internal Operation and by Transaction. At the moment this data is published by individual departments in Excel format; however, in the future the intention is to make this data available centrally through an online application. Over time we will be making further improvements to the quality of the data and its timeliness. We expect that with time this process will allow the public to better understand the performance of each department and government operations in a meaningful way.
The QDS template is the same for all departments, though the individual detail of grants and policy will differ from department to department. In using this data:
1. People should ensure they take full note of the caveats noted in each Department’s return.
2. As the improvement of the QDS is an ongoing process, data quality and completeness will be developed over time and therefore necessary caution should be applied to any comparative analysis undertaken.
Departmental Commentary
The Cabinet Office departmental family includes the Civil Service Commission. The figures for the Government Procurement Service are not included in the figures for Quarter 1.
This report summarizes data on COVID-19 cases and COVID-19 associated deaths by race/ethnicity for the state of Connecticut and the 10 largest Connecticut towns. Data on race/ethnicity are missing on almost half (47%) of reported COVID-19 cases. CT DPH has urged healthcare providers and laboratories to complete information on race/ethnicity for all COVID-19 cases. All data in this report are preliminary; data will be updated as new COVID-19 case reports are received and data errors are corrected. Data on COVID-19 cases and COVID-19-associated deaths were last updated on April 20, 2020 at 3 PM. Information about race and ethnicity are collected on the Connecticut Department of Public Health (DPH) COVID-19 case report form, which is completed by healthcare providers for laboratory-confirmed COVID-19 cases. Information about the race/ethnicity of COVID-19-associated deaths also are collected by the Connecticut Office of the Chief Medical Examiner and shared with DPH. Race/ethnicity categories used in this report are mutually exclusive. People answering ‘yes’ to more than one race category are counted as ‘other’.
https://cubig.ai/store/terms-of-service
1) Data Introduction • The Sample Sales Data is a retail sales dataset of 2,823 orders and 25 columns that includes a variety of sales-related data, including order numbers, product information, quantity, unit price, sales, order date, order status, customer and delivery information.
2) Data Utilization
(1) Sample Sales Data has the following characteristics: • This dataset consists of numerical (sales, quantity, unit price, etc.), categorical (product, country, city, customer name, transaction size, etc.), and date (order date) variables, with missing values in some columns (STATE, ADDRESSLINE2, POSTALCODE, etc.).
(2) Sample Sales Data can be used to: • Analyze sales trends and performance by product: Key variables such as order date, product line, and country can be used to visualize and analyze monthly and yearly sales trends, the proportion of sales by product line, and top sales by country and region. • Segmentation and marketing strategies: Segment customer groups based on customer information, transaction size, and regional data, and use these segments to design targeted marketing and customized promotion strategies.
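The monthly-sales-by-product-line analysis suggested above can be sketched with a pandas groupby. The rows below are invented stand-ins, and the upper-case column names (ORDERDATE, PRODUCTLINE, SALES) are an assumption about the dataset's schema:

```python
import pandas as pd

# Invented sample rows; column names assume an upper-case schema
# (ORDERDATE, PRODUCTLINE, SALES) which may differ in the real file.
orders = pd.DataFrame({
    "ORDERDATE": ["2003-01-06", "2003-01-09", "2003-02-11"],
    "PRODUCTLINE": ["Motorcycles", "Classic Cars", "Motorcycles"],
    "SALES": [2871.0, 2765.9, 3884.3],
})
orders["ORDERDATE"] = pd.to_datetime(orders["ORDERDATE"])

# Aggregate total sales per (month, product line) pair.
monthly = (
    orders
    .groupby([orders["ORDERDATE"].dt.to_period("M"), "PRODUCTLINE"])["SALES"]
    .sum()
)
print(monthly)
```

The same pattern extends to yearly trends (`dt.to_period("Y")`) or per-country totals by swapping the grouping keys.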
This report describes the quality assurance arrangements for the registered provider (RP) Tenant Satisfaction Measures statistics, providing more detail on the regulatory and operational context for data collections which feed these statistics and the safeguards that aim to maximise data quality.
The statistics we publish are based on data collected directly from local authority registered providers (LARPs) and from private registered providers (PRPs) through the Tenant Satisfaction Measures (TSM) return. We use the data collected through these returns extensively as a source of administrative data. The United Kingdom Statistics Authority (UKSA) encourages public bodies to use administrative data for statistical purposes and, as such, we publish these data.
These data are first being published in 2024, following the first collection and publication of the TSM.
In February 2018, the UKSA published the Code of Practice for Statistics. This sets standards for organisations producing and publishing statistics, ensuring quality, trustworthiness and value.
These statistics are drawn from our TSM data collection and are being published for the first time in 2024 as official statistics in development.
Official statistics in development are official statistics that are undergoing development. Over the next year we will review these statistics and consider areas for improvement to guidance, validations, data processing and analysis. We will also seek user feedback with a view to improving these statistics to meet user needs and to explore issues of data quality and consistency.
Until September 2023, ‘official statistics in development’ were called ‘experimental statistics’. Further information can be found on the Office for Statistics Regulation website: https://www.ons.gov.uk/methodology/methodologytopicsandstatisticalconcepts/guidetoofficialstatisticsindevelopment.
We are keen to increase understanding of the data, including its accuracy, reliability, and value to users. Please complete the form at https://forms.office.com/e/cetNnYkHfL or email feedback, including suggestions for improvements or queries about the source data or processing, to enquiries@rsh.gov.uk.
We intend to publish these statistics in Autumn each year, with the data pre-announced in the release calendar.
All data and additional information (including a list of individuals, if any, with 24-hour pre-release access) are published on our statistics pages.
The data used in the production of these statistics are classed as administrative data. In 2015 the UKSA published a regulatory standard for the quality assurance of administrative data. As part of our compliance with the Code of Practice, and in the context of other statistics published by the UK Government and its agencies, we have determined that the statistics drawn from the TSMs are likely to be categorised as low quality risk – medium public interest (with a requirement for basic/enhanced assurance).
The publication of these statistics can be considered as medium public interest.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
A required step for presenting results of clinical studies is the declaration of participants’ demographic and baseline characteristics, as required by FDAAA 801. The common workflow to accomplish this task is to export the clinical data from the electronic data capture system in use and import it into statistical software such as SAS or IBM SPSS. This software requires trained users, who have to implement the analysis individually for each item. These expenditures may become an obstacle for small studies. The objective of this work is to design, implement, and evaluate an open source application, called ODM Data Analysis, for the semi-automatic analysis of clinical study data.
Methods
The system requires clinical data in the CDISC Operational Data Model (ODM) format. After the file is uploaded, its syntax and the data-type conformity of the collected data are validated. The completeness of the study data is determined, and basic statistics, including illustrative charts for each item, are generated. Datasets from four clinical studies have been used to evaluate the application’s performance and functionality.
Results
The system is implemented as an open source web application (available at https://odmanalysis.uni-muenster.de) and is also provided as a Docker image, which enables easy distribution and installation on local systems. Study data is only stored in the application while the calculations are performed, which is compliant with data protection requirements. Analysis times are below half an hour, even for larger studies with over 6,000 subjects.
Discussion
Medical experts have confirmed the usefulness of this application for obtaining an overview of their collected study data for monitoring purposes and for generating descriptive statistics without further user interaction. The semi-automatic analysis has its limitations and cannot replace the complex analysis of statisticians, but it can be used as a starting point for their examination and reporting.
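The completeness check described above (what fraction of subjects supplied a value for each item) can be sketched in a few lines. This is a toy illustration, not the application's actual code, and the ODM-like snippet below is heavily simplified: namespaces and most required ODM elements (StudyEventData, FormData, ItemGroupData) are omitted.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Heavily simplified ODM-like snippet for illustration only.
odm = """
<ODM>
  <ClinicalData>
    <SubjectData SubjectKey="001">
      <ItemData ItemOID="I.AGE" Value="54"/>
      <ItemData ItemOID="I.SEX" Value="F"/>
    </SubjectData>
    <SubjectData SubjectKey="002">
      <ItemData ItemOID="I.AGE" Value="61"/>
    </SubjectData>
  </ClinicalData>
</ODM>
"""
root = ET.fromstring(odm)
subjects = root.findall(".//SubjectData")

# Count, per item OID, how many subjects provided a non-empty value.
counts = Counter(
    item.get("ItemOID")
    for subj in subjects
    for item in subj.findall(".//ItemData")
    if item.get("Value") not in (None, "")
)
completeness = {oid: n / len(subjects) for oid, n in counts.items()}
print(completeness)  # → {'I.AGE': 1.0, 'I.SEX': 0.5}
```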
About the Dataset
This data set contains claims information for meal reimbursement for sites participating in CACFP as child centers for the program year 2024-2025. This includes Child Care Centers, At-Risk centers, Head Start sites, Outside School Hours sites, and Emergency Shelters. The CACFP program year begins October 1 and ends September 30.
This dataset only includes claims submitted by CACFP sites operating as child centers. Sites can participate in multiple CACFP sub-programs. Each record (row) represents monthly meals data for a single site and for a single CACFP center sub-program.
To filter data for a specific CACFP center program, select "View Data" to open the Exploration Canvas filter tools. Select the program(s) of interest from the Program field. A filtering tutorial can be found HERE.
For meals data on CACFP participants operating as Day Care Homes, Adult Day Care Centers, or child care centers for previous program years, please refer to the corresponding “Child and Adult Care Food Programs (CACFP) – Meal Reimbursement” dataset for that sub-program available on the State of Texas Open Data Portal.
An overview of all CACFP data available on the Texas Open Data Portal can be found at our TDA Data Overview - Child and Adult Care Food Programs page.
An overview of all TDA Food and Nutrition data available on the Texas Open Data Portal can be found at our TDA Data Overview - Food and Nutrition Open Data page.
More information about accessing and working with TDA data on the Texas Open Data Portal can be found on the SquareMeals.org website on the TDA Food and Nutrition Open Data page.
About Dataset Updates
TDA aims to post new program year data by December 15 of the active program year. Participants have 60 days to file monthly reimbursement claims. Dataset updates will occur daily until 90 days after the close of the program year. After 90 days from the close of the program year, the dataset will be updated at six months and one year from the close of the program year before becoming archived. Archived datasets will remain published but will not be updated. Any data posted during the active program year is subject to change.
About the Agency
The Texas Department of Agriculture administers 12 U.S. Department of Agriculture nutrition programs in Texas including the National School Lunch and School Breakfast Programs, the Child and Adult Care Food Programs (CACFP), and the summer meal programs. TDA’s Food and Nutrition division provides technical assistance and training resources to partners operating the programs and oversees the USDA reimbursements they receive to cover part of the cost associated with serving food in their facilities. By working to ensure these partners serve nutritious meals and snacks, the division adheres to its mission — Feeding the Hungry and Promoting Healthy Lifestyles.
For more information on these programs, please visit our website.
Global COVID-19 surveys conducted by National Statistical Offices. This dataset has several columns that contain different types of information. Here's a brief explanation of each column:
1. **Country**: This column likely contains the names of the countries for which the survey data is collected. Each row represents data related to a specific country.
2. **Category**: This column might contain information about the type or category of the survey. It could include categories such as healthcare, economic impact, public sentiment, etc. This helps in categorizing the surveys.
3. **Title and Link**: These columns may contain the title or name of the specific survey and a link to the source or webpage where more information about the survey can be found. The link can be useful for referencing the original source of the data.
4. **Description**: This column likely contains a brief description or summary of the survey's objectives, methodology, or key findings. It provides additional context for the survey data.
5. **Source**: This column may contain information about the organization or agency that conducted the survey. It's essential for understanding the authority behind the data.
6. **Date Added**: This column probably contains the date when the survey data was added to the dataset. This helps track the freshness of the data and can be useful for historical analysis.
With this dataset, you can perform various types of analysis, including but not limited to:
- Country-based analysis: analyze survey data for specific countries to understand the impact of COVID-19 in different regions.
- Category-based analysis: group surveys by category and analyze trends or patterns related to healthcare, economics, or public sentiment.
- Temporal analysis: examine how survey data has evolved over time by using the "Date Added" column to track changes and trends.
- Source-based analysis: assess the reliability and credibility of the data by considering the source of the surveys.
- Data visualization: create visual representations like charts, graphs, and maps to make the data more understandable and informative.
Before conducting any analysis, it's essential to clean and preprocess the data, handle missing values, and ensure data consistency. Additionally, consider the research questions or insights you want to gain from the dataset, which will guide your analysis approach.
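As a minimal sketch, the analysis angles above can be combined in a few lines of pandas. The rows below are invented placeholders that only mirror the column layout described above; in practice you would load the real file with `pd.read_csv`:

```python
import pandas as pd

# Hypothetical rows mirroring the described columns (Country, Category,
# Title, Source, Date Added); replace with pd.read_csv("surveys.csv").
surveys = pd.DataFrame({
    "Country": ["Kenya", "Kenya", "Peru"],
    "Category": ["Health", "Economic impact", "Health"],
    "Title": ["Survey A", "Survey B", "Survey C"],
    "Source": ["NSO Kenya", "NSO Kenya", "INEI"],
    "Date Added": ["2020-05-01", "2020-06-15", "2020-05-20"],
})

# Parse "Date Added" so temporal analysis is possible.
surveys["Date Added"] = pd.to_datetime(surveys["Date Added"])

# Category-based analysis: how many surveys per category?
by_category = surveys["Category"].value_counts()

# Country-based analysis: most recent survey added per country.
latest_per_country = surveys.groupby("Country")["Date Added"].max()

print(by_category.to_dict())
print(latest_per_country.to_dict())
```

The same `groupby` pattern extends to source-based analysis by grouping on the "Source" column instead.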
This checklist serves to guide a general review of a data package submitted to the Eawag Research Data Repository. A general review, as opposed to a domain-specific review, can be conducted by people without expertise in the scientific field the data package relates to.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
1. National/Regional Policies
1.1 Paris Agreement Ratification
I believe the Paris Agreement has fostered countries' ambition to revise and implement their climate goals. However, not all of them have ratified it. I think the countries that have not ratified the agreement might take a different direction, or be less ambitious in putting climate change adaptation and mitigation onto their political agendas, processes, and actions.
Data Source: https://treaties.un.org/Pages/ViewDetails.aspx?src=TREATY&mtdsg_no=XXVII-7-d&chapter=27&clang=_en
1.2 Climate Change Ambitions
While the majority of countries have promised to take action against climate change through the Paris Agreement, not all of them are working towards the 1.5°C goal at the same level. Climate Action Tracker is an organization tracking the "ambition level" and progress of countries, which I believe could be a fruitful source of data.
Overview: https://climateactiontracker.org/publications/paris-agreement-benchmarks/
Data source: https://climateactiontracker.org/data-portal/
1.3 Carbon Pricing
Some countries and regions implement carbon pricing mechanisms, which have proven to be efficient at decreasing carbon emissions. The World Bank provides a dashboard with carbon pricing data and information about the countries.
Overview: https://carbonpricingdashboard.worldbank.org/
Data source: https://carbonpricingdashboard.worldbank.org/map_data
2. Economy
2.1 Composition of the Sectors
I know it is already shared by others, but the World Bank also provides further information on countries' economic structures. One thing that I believe could be useful beyond GDP is the sector composition of a country, which could play a role in its emission reduction. While it is easier for services to reach net zero, it is harder for manufacturing (this also holds for companies: it is much easier for a service company to reach net-zero emissions, but it could be very difficult for a steel production or processing plant to be emission-free).
Overview: https://data.worldbank.org/indicator/NV.IND.MANF.ZS
Data source: http://wdi.worldbank.org/table/4.2
2.2 Innovation Index
Combating climate change requires fundamental changes in the systems we live in. Thus, innovation (technological, business model, political, social, and more) is necessary at all levels, and I believe the Global Innovation Index (GII) can be used as a proxy to measure innovative activity.
Overview: https://www.globalinnovationindex.org/home
Data source: https://www.globalinnovationindex.org/analysis-indicator
3. Low-Carbon Technologies
Development, production, and adoption of clean energy technologies are vital for low-carbon transitions. While the latest developments in solar technology have made it both the cheapest and a clean energy source, there is still a long way to go before Carbon Capture and Storage reaches the reliability needed to be considered a commercially feasible option. The IEA provides information related to low-carbon RD&D, but it has limited country data (mostly OECD countries).
Overview: https://www.iea.org/fuels-and-technologies
Data source: https://www.iea.org/reports/energy-technology-rdd-budgets-2020
4. Development & Just Transitions
4.1 Energy Access
Today there are still millions of people without access to electricity and clean cooking. While for some countries the challenge is finding ways to decrease emissions, for others it is ensuring their population's "reliable, affordable and clean energy access" (SDG 7, UNDP). The World Bank provides data on electricity production, sources, and the percentage of the population with access to electricity by country as part of the World Development Indicators.
Overview: https://data.worldbank.org/indicator/EG.ELC.ACCS.ZS
Data source: http://wdi.worldbank.org/table/3.7
4.2 Bonus: Environmental Justice (no dataset uploaded, just qualitative data)
The Environmental Justice Atlas is a citizen-led mapping tool that shows conflicts related to environmental injustices. The data cannot be fully downloaded, is subject to restrictive data use, and I am not sure it could even be quantified. But I believe it could be useful for thinking about the social aspects of transitions. https://ejatlas.org/
The Government published Business Plan quarterly data summaries (QDS) on 18 July 2011.
They provide a quarterly snapshot on how each department is spending its budget, the results it has achieved and how it is deploying its workforce.
The QDS follows commitments made at Budget 2011 and the Written Ministerial Statement on Business Plans. Their primary purpose is to make more of the management information currently held by government available to members of the public on a regular basis. This information is not audited, and the quality and accuracy of the data need to improve dramatically. However, over time, with improvements in data quality and timeliness, the public will be able to judge the performance of each department in a meaningful and understandable manner.
We intend for an annual version of this information to be formally laid in Parliament in the Annual Report and Accounts for July 2011/12 onwards.
The information is presented in a re-usable format.
The QDS template is the same for all departments, though many of the individual indicators are unique to the department (especially input and impact indicators).
This is the first time Government has published this kind of information, and while this is a good start, there is room for improvement. Before using this data, people should take full note of the caveats in each department's measurement annex and treat it with the necessary caution.
At the moment, people should not be using this data to make direct comparisons between departments for several reasons. Firstly, the business of each department is unique and it does not make sense to compare some measures across all departments. Secondly, many of the measures are not directly comparable because they do not have common definitions, time periods, or data collection processes.
We will update the QDS regularly each quarter, with the next publication following in October 2011.
Quarterly Data Summary (QDS)
Under the new QDS framework, departments' spending data is published every quarter to show the taxpayer how the Government is spending their money. The QDS grew out of commitments made in the 2011 Budget and the Written Ministerial Statement on Business Plans. For the financial year 2012/13, the QDS has been revised and improved in line with Action 9 of the Civil Service Reform Plan to provide a common set of data that will enable comparisons of operational performance across Government, so that departments and individuals can be held to account.
The QDS breaks down the total spend of the department in three ways: by Budget, by Internal Operation and by Transaction. At the moment this data is published by individual departments in Excel format; however, in the future the intention is to make this data available centrally through an online application.
Over time we will be making further improvements to the quality of the data and its timeliness. We expect that with time this process will allow the public to better understand the performance of each department and government operations in a meaningful way.
The QDS template is the same for all departments, though the individual detail of grants and policy will differ from department to department. In using this data:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data Analysis is the process that supports decision-making and informs arguments in empirical studies. Descriptive statistics, Exploratory Data Analysis (EDA), and Confirmatory Data Analysis (CDA) are the approaches that compose Data Analysis (Xia & Gong, 2014). An EDA comprises a set of statistical and data mining procedures to describe data. We ran an EDA to provide statistical facts and inform conclusions. The mined facts support arguments that influence the Systematic Literature Review of DL4SE.
The Systematic Literature Review of DL4SE requires formal statistical modeling to refine the answers to the proposed research questions and to formulate new hypotheses to be addressed in the future. Hence, we introduce DL4SE-DA, a set of statistical processes and data mining pipelines that uncover hidden relationships in the Deep Learning literature reported in Software Engineering. These hidden relationships are collected and analyzed to illustrate the state of the art of DL techniques employed in the software engineering context.
Our DL4SE-DA is a simplified version of classical Knowledge Discovery in Databases, or KDD (Fayyad et al., 1996). The KDD process extracts knowledge from a structured DL4SE database, which was the product of multiple iterations of data gathering and collection from the inspected literature. The KDD process involves five stages:
Selection. This stage was led by the taxonomy process explained in section xx of the paper. After collecting all the papers and creating the taxonomies, we organized the data into the 35 features or attributes that you can find in the repository. In fact, we manually engineered features from the DL4SE papers. Some of the features are venue, year published, type of paper, metrics, data scale, type of tuning, learning algorithm, SE data, and so on.
Preprocessing. The preprocessing consisted of transforming the features into the correct type (nominal), removing outliers (papers that do not belong to DL4SE), and re-inspecting the papers to extract missing information produced by the normalization process. For instance, we normalized the feature “metrics” into “MRR”, “ROC or AUC”, “BLEU Score”, “Accuracy”, “Precision”, “Recall”, “F1 Measure”, and “Other Metrics”, where “Other Metrics” refers to unconventional metrics found during the extraction. The same normalization was applied to other features such as “SE Data” and “Reproducibility Types”. This separation into more detailed classes contributes to a better understanding and classification of each paper by the data mining tasks or methods.
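A normalization step like the one above can be sketched as a simple substring-based mapper. The canonical labels come from the text; the raw variants and the matching rule are hypothetical illustrations, not the authors' actual procedure:

```python
# Canonical metric classes from the paper; the lowercase patterns on the
# left are invented examples of raw strings one might meet during extraction.
CANONICAL_METRICS = {
    "mrr": "MRR",
    "mean reciprocal rank": "MRR",
    "auc": "ROC or AUC",
    "roc": "ROC or AUC",
    "bleu": "BLEU Score",
    "accuracy": "Accuracy",
    "precision": "Precision",
    "recall": "Recall",
    "f1": "F1 Measure",
}

def normalize_metric(raw: str) -> str:
    """Map a raw metric string to its canonical class, or 'Other Metrics'."""
    key = raw.strip().lower()
    for pattern, label in CANONICAL_METRICS.items():
        if pattern in key:
            return label
    return "Other Metrics"

print(normalize_metric("Mean Reciprocal Rank"))  # MRR
print(normalize_metric("ROUGE-L"))               # Other Metrics
```

The catch-all "Other Metrics" class mirrors the paper's handling of unconventional metrics found during extraction.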
Transformation. In this stage, we did not apply any data transformation method except for the clustering analysis, where we performed a Principal Component Analysis to reduce the 35 features to 2 components for visualization purposes. Furthermore, PCA also allowed us to identify the number of clusters that exhibits the maximum reduction in variance; in other words, it helped us identify the number of clusters to use when tuning the explainable models.
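As an illustration of this transformation step, the sketch below reduces a feature table to 2 components with PCA and scans candidate cluster counts for the point of diminishing variance reduction. The random matrix stands in for the real 35-feature table, and the KMeans inertia scan is a common stand-in for the cluster-count selection described above, not the authors' exact procedure:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 35))  # 128 papers x 35 features (synthetic)

# Two components are enough to plot the papers in a scatter chart.
coords = PCA(n_components=2).fit_transform(X)

# Within-cluster variance (inertia) for each candidate number of clusters;
# the "elbow" where the drop flattens suggests a cluster count to tune with.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(2, 8)}
print(coords.shape)
```

Plotting `inertias` against k and picking the elbow gives a principled starting point for the cluster count used downstream.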
Data Mining. In this stage, we used three distinct data mining tasks: Correlation Analysis, Association Rule Learning, and Clustering. We decided that the goal of the KDD process should be oriented to uncover hidden relationships among the extracted features (correlations and association rules) and to categorize the DL4SE papers for a better segmentation of the state of the art (clustering). A clear explanation is provided in the subsection “Data Mining Tasks for the SLR of DL4SE”.
Interpretation/Evaluation. We used the knowledge discovery process to automatically find patterns in our papers that resemble “actionable knowledge”. This actionable knowledge was generated by conducting a reasoning process on the data mining outcomes, which produced an argument support analysis (see this link).
We used RapidMiner as our software tool to conduct the data analysis. The procedures and pipelines were published in our repository.
Overview of the most meaningful association rules. Rectangles represent both premises and conclusions. An arrow connecting a premise with a conclusion implies that, given the premise, the conclusion is associated. E.g., given that an author used Supervised Learning, we can conclude that their approach is irreproducible, with a certain support and confidence.
Support = the number of occurrences in which the statement is true, divided by the total number of statements.
Confidence = the support of the statement divided by the number of occurrences of the premise.
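These two definitions can be computed directly over a set of "transactions" (here, one per paper, listing the attributes it exhibits). The toy data below is invented purely to exercise the formulas:

```python
# Hypothetical attribute sets, one per paper.
transactions = [
    {"Supervised Learning", "Irreproducible"},
    {"Supervised Learning", "Irreproducible"},
    {"Supervised Learning", "Reproducible"},
    {"Unsupervised Learning", "Irreproducible"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(premise, conclusion, transactions):
    """support(premise | conclusion) / support(premise)."""
    return (support(premise | conclusion, transactions)
            / support(premise, transactions))

# Rule: Supervised Learning => Irreproducible
print(support({"Supervised Learning", "Irreproducible"}, transactions))   # 0.5
print(confidence({"Supervised Learning"}, {"Irreproducible"}, transactions))
```

With these four transactions, the rule holds in 2 of 4 papers (support 0.5), and in 2 of the 3 papers that use supervised learning (confidence about 0.67).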
Under the new quarterly data summary framework, departments' spending data is published every quarter to show the taxpayer how the government is spending their money.
The QDS grew out of commitments made in the 2011 Budget and the written ministerial statement on business plans. For the financial year 2012 to 2013, the QDS has been revised and improved in line with Action 9 of the [Civil Service Reform Plan](http://www.civilservice.gov.uk/reform) to provide a common set of data that will enable comparisons of operational performance across government, so that departments and individuals can be held to account.
The QDS breaks down the total spend of the department in 3 ways: by budget, by internal operation and by transaction. At the moment this data is published by individual departments in Excel format; however, in the future the intention is to make this data available centrally through an online application.
Over time we will be making further improvements to the quality of the data and its timeliness. We expect that with time this process will allow the public to better understand the performance of each department and government operations in a meaningful way.
The QDS template is the same for all departments, though the individual detail of grants and policy will differ from department to department. In using this data:
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.
The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.
The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information from June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6.
The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information from June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22.
The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information from June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada. To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.
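The small-count suppression rule can be sketched as a simple pre-publication filter. The threshold of 5 comes from the text above; the column names and example rows are hypothetical, not the dataset's actual schema:

```python
# Town rows with fewer than 5 cases or positive NAAT tests over the past
# 7 days are masked before publication. Field names are illustrative.
SUPPRESSION_THRESHOLD = 5

def suppress(row):
    """Return a copy of the row with small counts masked as None."""
    masked = dict(row)
    if (row["cases_7day"] < SUPPRESSION_THRESHOLD
            or row["pos_naat_7day"] < SUPPRESSION_THRESHOLD):
        masked["cases_7day"] = None
        masked["pos_naat_7day"] = None
    return masked

rows = [
    {"town": "Hartford", "cases_7day": 42, "pos_naat_7day": 37},
    {"town": "Union", "cases_7day": 3, "pos_naat_7day": 2},
]
published = [suppress(r) for r in rows]
print(published)
```

Note that both counts are masked together whenever either falls below the threshold, so a suppressed row leaks neither value.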
The COVID-19 state summary includes the following metrics, along with the change from the data reported the previous day:
- COVID-19 Cases (confirmed and probable)
- COVID-19 Tests Reported (molecular and antigen)
- Daily Test Positivity
- Patients Currently Hospitalized with COVID-19
- COVID-19-Associated Deaths
Additional notes: The cumulative count of tests reported for 1/17/2021 includes 286,103 older tests from previous dates, which had been missing from previous reports due to a data processing error. The older tests were added to the cumulative count of tests reported, but they were not included in the calculation of change from the previous reporting day or daily percent test positivity.
Starting 5/10/2021, the date field will represent the date this data was updated on data.ct.gov. Previously the date the data was pulled by DPH was listed, which typically coincided with the date before the data was published on data.ct.gov. This change was made to standardize the COVID-19 data sets on data.ct.gov.
Starting April 4, 2022, negative rapid antigen and rapid PCR test results for SARS-CoV-2 are no longer required to be reported to the Connecticut Department of Public Health. Negative test results from laboratory-based molecular (PCR/NAAT) tests are still required to be reported, as are all positive results from both molecular (PCR/NAAT) and antigen tests.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Data Can is a dataset for object detection tasks - it contains Can annotations for 899 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).