Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The data contained in the table describes COVID-19 in Canada in terms of number of cases and deaths at the provincial and national levels from January 31, 2020 to present time. It also describes the number of tests performed and the number of people recovered. The values displayed in the table are provided by the Public Health Infobase, managed by the Health Promotion and Chronic Disease Prevention Branch (HPCDPB) of the Public Health Agency of Canada (PHAC). The values are updated daily.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Overview
The COVID-19 Patient Recovery Dataset is a synthetic collection of anonymized records for around 70,000 COVID-19 patients, intended to support classification tasks in machine learning and epidemiological research. It includes detailed clinical and demographic information: symptoms, existing health issues, vaccination status, COVID-19 variants, treatment details, and outcomes related to recovery or mortality. Typical prediction targets are patient recovery (recovered), mortality (death), disease severity (severity), and the need for intensive care (icu_admission), using algorithms such as Logistic Regression, Random Forest, XGBoost, or Neural Networks. The dataset also supports exploratory data analysis (EDA), statistical modeling, and time-series studies of patterns in COVID-19 outcomes.
The data is synthetic and reflects realistic trends found in public health data, based on sources like WHO reports. It ensures privacy and follows ethical guidelines. Dates are provided in Excel serial format, meaning 44447 corresponds to September 8, 2021, and can be converted to standard dates using Python’s datetime or Excel. With 70,000 records and 28 columns, this dataset serves as a valuable resource for data scientists, researchers, and students interested in health-related machine learning or pandemic trends.
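The date conversion mentioned above can be sketched as follows. This is a minimal illustration that assumes the serial numbers follow the Excel 1900 date system (day 0 = 1899-12-30); the helper name `excel_serial_to_date` is hypothetical and not part of the dataset.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: Excel 1900 date system, where serial 0 maps to 1899-12-30.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial: Optional[float]) -> Optional[date]:
    """Convert an Excel serial day number to a date; missing values pass through."""
    if serial is None:
        return None
    # Any fractional part (time of day) is discarded.
    return EXCEL_EPOCH + timedelta(days=int(serial))


print(excel_serial_to_date(44447))  # 2021-09-08
```

Passing `None` through unchanged keeps the helper usable on columns with missing dates.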
Data Source and Collection
Source: Synthetic data based on public health patterns from sources like the World Health Organization (WHO). It includes placeholder URLs.
Collection Period: Simulated from early 2020 to mid-2022, covering the Alpha, Delta, and Omicron waves.
Number of Records: 70,000.
File Format: CSV, which works with Pandas, R, Excel, and more.
Data Quality Notes:
About 5% of the values are missing in fields like symptoms_2, symptoms_3, treatment_given_2, and date.
There are rare inconsistencies, such as between recovery/death flags and dates, which may need some preprocessing.
Unique, anonymized patient IDs.
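The inconsistencies noted above could be flagged before modeling with a check along these lines. This is only a sketch: it assumes each record has been loaded as a dictionary keyed by the column names listed in the schema, with missing values represented as `None`. The function name `find_issues` and the sample record are hypothetical.

```python
def find_issues(row: dict) -> list:
    """Return a list of consistency problems found in one patient record."""
    issues = []
    recovered = row.get("recovered")
    death = row.get("death")
    if recovered == 1 and death == 1:
        issues.append("recovered and death both set")
    if recovered == 1 and row.get("date_of_recovery") is None:
        issues.append("recovered without date_of_recovery")
    if death == 1 and row.get("date_of_death") is None:
        issues.append("death without date_of_death")
    if recovered == 0 and row.get("date_of_recovery") is not None:
        issues.append("date_of_recovery without recovered flag")
    if death == 0 and row.get("date_of_death") is not None:
        issues.append("date_of_death without death flag")
    return issues


# Hypothetical record, for illustration only.
sample = {"recovered": 1, "death": 0, "date_of_recovery": None, "date_of_death": None}
print(find_issues(sample))
```

Whether flagged rows should be dropped, corrected, or kept depends on the analysis; the check only surfaces them.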
| Column Name | Data Type |
|---|---|
| patient_id | String |
| country | String |
| region/state | String |
| date_reported | Integer |
| age | Integer |
| gender | String |
| comorbidities | String |
| symptoms_1 | String |
| symptoms_2 | String |
| symptoms_3 | String |
| severity | String |
| hospitalized | Integer |
| icu_admission | Integer |
| ventilator_support | Integer |
| vaccination_status | String |
| variant | String |
| treatment_given_1 | String |
| treatment_given_2 | String |
| days_to_recovery | Integer |
| recovered | Integer |
| death | Integer |
| date_of_recovery | Integer |
| date_of_death | Integer |
| tests_conducted | Integer |
| test_type | String |
| hospital_name | String |
| doctor_assigned | String |
| source_url | String |
Key Column Details
patient_id: Unique identifier (e.g., P000001).
country: Reporting country (e.g., India, USA, Brazil, Germany, China, Pakistan, South Africa, UK).
region/state: Sub-national region (e.g., Sindh, California, São Paulo, Beijing).
date_reported, date_of_recovery, date_of_death: Excel serial dates (convert using datetime(1899,12,30) + timedelta(days=value)).
age: Patient age (1–100 years).
gender: Male or Female.
comorbidities: Pre-existing conditions (e.g., Diabetes, Hypertension, Cancer, Heart Disease, Asthma, None).
symptoms_1, symptoms_2, symptoms_3: Reported symptoms (e.g., Cough, Fever, Fatigue, Loss of Smell, Sore Throat, or empty).
severity: Case severity (Mild, Moderate, Severe, Critical).
hospitalized, icu_admission, ventilator_support: Binary (1 = Yes, 0 = No).
vaccination_status: None, Partial, Full, or Booster.
variant: COVID-19 variant (Omicron, Delta, Alpha).
treatment_given_1, treatment_given_2: Treatments administered (e.g., Antibiotics, Remdesivir, Oxygen, Steroids, Paracetamol, or empty).
days_to_recovery: Days from report to recovery (5–30, or empty if not recovered).
recovered, death: Binary outcomes (1 = Yes, 0 = No; generally mutually exclusive).
tests_conducted: Number of tests (1–5).
test_type: PCR or Antigen.
hospital_name: Fictional hospital (e.g., Aga Khan, Mayo Clinic, NHS Trust).
doctor_assigned: Fictional doctor name (e.g., Dr. Smith, Dr. Müller).
source_url: Placeholder.
Summary Statistics
Total Patients: 70,000.
Age: Mean ~50 years, Min 1, Max 100, evenly distributed.
Gender: ~50% Male, ~50% Female.
Top Countries: USA (20%), India (18%), Brazil (15%), China (12%), Germany (10%).
Comorbidities: Diabetes (25%), Hypertension (20%), Cancer (15%), Heart Disease (15%), Asthma (10%), None (15%).
Severity: Mild (60%), Moderate (25%), Severe (10%), Critical (5%).
Recovery Rate: ~60% recovered (recovered=1), ~30% deceased (death=1), ~10% unresolved (both 0).
Vaccination: None (40%), Full (30%), Partial (15%), Booster (15%).
Variants: Omicron (50%), Delt...
*** The County of Santa Clara Public Health Department discontinued updates to the COVID-19 data tables effective June 30, 2025. The COVID-19 data tables will be removed from the Open Data Portal on December 30, 2025. For current information on COVID-19 in Santa Clara County, please visit the Respiratory Virus Dashboard [sccphd.org/respiratoryvirusdata]. For any questions, please contact phinternet@phd.sccgov.org ***
The dataset provides the number of new and cumulative COVID-19 cases over time among Santa Clara County residents. A case is someone who tests positive for COVID-19 using viral testing performed in a lab. Source: California Reportable Disease Information Exchange. Data Notes: Cases are reported by the date that the specimens were collected for testing. Values for the most recent 5 days are likely to increase as additional results are received.
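As a rough sketch of how the two series relate, and of one way to set aside the still-provisional tail, consider the helpers below. The function names and the list-based input are assumptions for illustration. The 5-day lag comes from the data note above.

```python
from itertools import accumulate


def cumulative_cases(new_cases):
    """Running total of new cases, in the same date order as the input."""
    return list(accumulate(new_cases))


def split_provisional(daily, lag_days=5):
    """Split a date-ordered series into settled values and the provisional tail."""
    if lag_days <= 0:
        return list(daily), []
    return list(daily[:-lag_days]), list(daily[-lag_days:])


print(cumulative_cases([3, 0, 5]))  # [3, 3, 8]
```

Trend estimates are usually safer on the settled part only, since the most recent counts may still rise.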
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the data used in "A Power-Law-Based Approach to Mapping COVID-19 Cases in the United States" by Bin Jiang and Chris de Rijke. The input data on COVID-19 cases was obtained from https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data, which is maintained by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). The data retrieved and used for this study are available in the "RawInputData" folder, where daily and weekly cases are stored separately. Shapefiles and administrative borders used for the figures and calculations are available in the "Figure2and4" folder. A combination of these shapefiles and the database is used to visualize the COVID-19 cases. Processed power-law data can be found in the "Powerlawdata" folder in an Excel file. All power-law results are available in a raw format. Finally, the data used to create the time animation (http://lifegis.hig.se/COVID19/) is available in the "TimeAnimation" folder, where a combination of organized shapefiles and raw .csv data is used for the visualization.
https://creativecommons.org/publicdomain/zero/1.0/
I am no longer updating this dataset. Its purpose was to track changes in testing over time. Since then, better-documented open datasets have become available, for example: https://ourworldindata.org/grapher/full-list-total-tests-for-covid-19
For data related to testing in India, you can refer to the API endpoints provided by covid19india.org: https://api.covid19india.org
I am trying to highlight the relationship between the number of tests conducted and the number of confirmed cases. Is this metric important? We will find out, either via experience or through rigorous analysis.
Number of actual cases >> Number of confirmed cases
The dataset has been updated with a concatenated file. Thanks to @Kamil Kiljan for suggesting the update. Filename: TestsConducted_AllDates_ddMMMYYYY
What's inside is more than just rows and columns. Please check the data definitions & change logs below
Update: March 31st 2020
The original location has not seen any new updates. Hence I have taken the information from a different source. Added source information
Wiki page on Covid-19 Testing
Check file: Tests_Conducted_31Mar2020.csv
Update: March 24th 2020. The data has been scraped from the following web page: Coronovirus Testing Data
The copyright for the splash image belongs to Jim Huylebroek for The New York Times. NYTimes: Can't get tested? Maybe you are in the wrong country
The notebook used for extracting the information: Notebook for web-scraping & extracting information
Notebook illustrating insights that can be derived from the dataset: Test, Test and Test
This data can be used in conjunction with the following: 1. Health expenditure per capita and number of hospital beds per 1,000 people 2. Intervention measures employed by individual governments
Also, please read Nate Silver's critique on how the number of positive cases doesn't mean anything unless we know how many tests were conducted and what the testing strategy was.
Date 09th June 2020 Updated.
Date 01st June 2020 Updated.
Date 23rd May 2020 Updated.
Date 11th May 2020
Concatenated all older datasets into a single file : TestsConducted_AllDates_ddbbbYYYY.csv
Notebook used for concatenating the datasets: Kernel Link
The April 15th file didn't have the 'Tests' column populated, so it was calculated in the updated file. If you are not comfortable using the calculated values, please drop those rows using the following code:
df = df.drop(df[df['FileDate']=='15April2020'].index)
Date: 8th May 2020 Updated.
Date: 5th May 2020
Updated. No change in data structure.
Replaced the Excel file with a CSV. This is for data before 31st March: Tests_Conducted_DEPRECEATED.csv
Date: 1st May 2020
Updated to date. Minor changes in column names:
Tests -> Tested
Tests /millionpeople -> Tested /millionpeople
New Column % added
Date: 26th April 2020
This was long delayed!
Date: April 15th 2020
Latest file: Tests_Conducted_15April2020.csv
Note that column names have changed in this file. This was because they were changed in the source file.
Positive / thousand (has changed to) Positive /millionpeople
New columns added:
Tests /millionpeople
and Date
TODO: normalize the column names & data with previous version.
Date: April 7th 2020
Latest file: Tests_Conducted_07April2020.csv
Date: April 5th 2020
Latest file: Tests_Conducted_05April2020.csv
Please note that older files are not being removed. This should give an indication of the change in the number of tests conducted over time.
Date: March 31st 2020
Latest file: Tests_Conducted_31Mar2020.csv
Abstract
The dataset provided here contains the efforts of independent data aggregation, quality control, and visualization of the University of Arizona (UofA) COVID-19 testing programs for the 2019 novel Coronavirus pandemic. The dataset is provided in the form of machine-readable tables in comma-separated value (.csv) and Microsoft Excel (.xlsx) formats.
Additional Information
As part of the UofA response to the 2019-20 Coronavirus pandemic, testing was conducted on students, staff, and faculty prior to the start of the academic year and throughout the school year. Testing was done at the UofA Campus Health Center and through their program called "Test All Test Smart" (TATS). These tests identify active cases of SARS-CoV-2 infection using the reverse transcription polymerase chain reaction (RT-PCR) test and the Antigen test. Because the Antigen test provided more rapid diagnosis, it was used heavily from three weeks prior to the start of the Fall semester and throughout the academic year.
As these tests were occurring, results were provided on the COVID-19 websites. First, beginning in early March, the Campus Health Alerts website reported the total number of positive cases. Later, numbers were provided for the total number of tests (March 12 and thereafter). According to the website, these numbers were updated daily for positive cases and weekly for total tests. These numbers were reported until early September, when they were included in the reporting for the TATS program.
For the TATS program, numbers were provided through the UofA COVID-19 Update website. Initially, on August 21, the numbers provided were the total number (July 31 and thereafter) of tests and positive cases. Later (August 25), additional information was provided where both PCR and Antigen testing were available. Here, the daily numbers were also included. On September 3, this website then provided both the Campus Health and TATS data. Here, PCR and Antigen were combined and referred to as "Total", and daily and cumulative numbers were provided.
No official data dashboard was available until September 16, and aside from the information provided on these websites, the full dataset was not made publicly available. As such, the authors of this dataset independently aggregated data from multiple sources. These data were made publicly available through a Google Sheet, with graphical illustration provided through the spreadsheet and on social media. The goal of providing the data and illustrations publicly was to provide factual information and to understand the infection rate of SARS-CoV-2 in the UofA community.
Because of differences in reported data between Campus Health and the TATS program, the dataset provides Campus Health numbers on September 3 and thereafter. TATS numbers are provided beginning on August 14, 2020.
Description of Dataset Content
The following terms are used in describing the dataset.
1. "Report Date" is the date and time at which the website was updated to reflect the new numbers
2. "Test Date" is the date of testing/sample collection
3. "Total" is the combination of Campus Health and TATS numbers
4. "Daily" is the new data associated with the Test Date
5. "To Date (07/31--)" provides the cumulative numbers from 07/31 and thereafter
6. "Sources" provides the source of information. The number prior to the colon refers to the number of sources. Here, "UACU" refers to the UA COVID-19 Update page, and "UARB" refers to the UA Weekly Re-Entry Briefing. "SS" and "WBM" refer to screenshot (manually acquired) and "Wayback Machine" (see Reference section for links), with initials provided to indicate which author recorded the values. These screenshots are available in the records.zip file.
The dataset is distinguished, where available, by the testing program and the methods of testing. Where data are not available, calculations are made to fill in missing data (e.g., extrapolating backwards on the total number of tests based on daily numbers that are deemed reliable). Where errors are found (by comparing to previous numbers), those are reported on the above Google Sheet with specifics noted.
For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests, trouble downloading, etc.) can be directed to data-management@arizona.edu
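The backward extrapolation mentioned above might look roughly like the sketch below. The authors' actual procedure is not described in detail, so this is an assumption: it takes a trusted cumulative total and walks backwards by subtracting reliable daily counts. The function name and the numbers are illustrative only.

```python
def backfill_cumulative(known_total, daily_counts):
    """Reconstruct earlier cumulative totals from a known later total.

    known_total: cumulative total at the end of the last day in daily_counts.
    daily_counts: daily new tests, oldest first.
    Returns the cumulative total at the end of each day, oldest first.
    """
    totals = [known_total]
    # Walk backwards: the total before a day is the total after it minus that day's count.
    for count in reversed(daily_counts[1:]):
        totals.append(totals[-1] - count)
    totals.reverse()
    return totals


# Illustrative numbers, not taken from the dataset.
print(backfill_cumulative(100, [10, 20, 30]))  # [50, 70, 100]
```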
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data for the paper entitled “Navigating Civic Space in a Time of COVID-19: The Case of Mozambique”, published by IESE in 2021.
Between January and December 2020, several events stimulated debate about civic space in the context of COVID-19 in Mozambique. This database, created for the ‘Navigating Civic Space in a time of COVID-19’ research project, illustrates some of the main events and debates in Mozambique. It contains pieces of media and official documents focused on civic space and the pandemic in the country, covering scenarios such as restricting space, opening space, changing rules, changing discourse, civic action, and popular protest, together with their respective actors. The database was created in Microsoft Excel.
Abstract: This report builds on the Action for Empowerment and Accountability (A4EA) research project Navigating Civic Space, which seeks to analyze the extent to which COVID-19 is contributing to the opening or closing of space for civic action in three countries, namely Mozambique, Nigeria and Pakistan. Focusing on Mozambique, the report uses theoretical and methodological tools produced by the work stream and is structured around three main research questions: What is known about the civic space trend in Mozambique before the arrival of the COVID-19 pandemic? How are different actors responding to the COVID-19 pandemic affecting the civic space in Mozambique? What are the medium- and long-term implications for governance in Mozambique?
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset reports the daily cumulative number of patients affected by COVID-19 from 24th of February to 29th of April 2020 gathered from the following website https://github.com/pcm-dpc/COVID-19. Data are produced and published by the Italian Civil Protection Department. Excel was used to build a time-series sheet. Data were subsequently fitted on the basis of a logistic function to capture the onset of the different epidemic phases in each Italian region.
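For orientation, a logistic curve for cumulative cases can be written as below. The three-parameter form (plateau `K`, growth rate `r`, midpoint `t0`) is an assumption, since the description does not state which parameterization or fitting routine was used. The parameter values in the example are made up.

```python
import math


def logistic(t, K, r, t0):
    """Cumulative cases at time t under a logistic curve with plateau K."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def daily_increase(t, K, r, t0):
    """New cases between day t and day t + 1 implied by the curve."""
    return logistic(t + 1, K, r, t0) - logistic(t, K, r, t0)


# At the midpoint t0 the curve reaches half of its plateau.
print(logistic(30, K=1000, r=0.2, t0=30))  # 500.0
```

Under this form, growth accelerates before `t0` and decelerates after it, which is one way to mark a change of epidemic phase.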
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This data record contains 7 data files in .xlsx file format.
The 6 .xlsx files are: Additional file 3_France_rem1.xlsx, Additional file 4_Germany_rem1.xlsx, Additional file 5_Italy_rem1.xlsx, Additional file 6_UK_rem1.xlsx, Additional file 7_US_rem1.xlsx and Additional file 8_Indonesia_rem1.xlsx.
The Excel files contain data on COVID-19 cumulative- and daily-number of cases in France, Germany, Italy, the UK, the US and Indonesia from the first day of official recording to June 30th, 2020, and their daily average products, marginal products, and production elasticities based on the data's exponential moving average (EMA). The Bayesian probabilities of meeting a policy target given a production elasticity range, are also included in the Excel files.
Study aims and methodology:
Physical distancing measures to control the COVID-19 pandemic come at a heavy short-term economic cost. But easing the measures too early carries a high risk of transmission re-escalations. To assess if physical distancing can be relaxed, a number of epidemic indicators are used, most notably the reproduction number R. Many developing countries, however, have limited capacities to estimate R accurately. This study aims to demonstrate how health production function can be used to assess the state of COVID-19 transmission and to determine a risk-based physical distancing relaxation policy.
The authors established a short-run health production function, representing the cumulative number of COVID-19 cases, from the standard Susceptible-Infected-Recovered (SIR) model.
This study employed time as the health input variable. Because only one variable was used, the authors were dealing with a short-run health production function. The cumulative number of COVID-19 cases was employed as the health status output. For more details on the methodology, please read the related article.
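The quantities named above can be sketched with textbook production-function definitions, taking time (in days) as the single input and cumulative cases as the output. These definitions, the EMA seeding, and the smoothing factor are assumptions made for illustration. The related article should be consulted for the exact formulas the authors used.

```python
def ema(values, alpha):
    """Exponential moving average, seeded with the first value."""
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def production_measures(cumulative):
    """Per-day (t, average product, marginal product, elasticity); days count from 1."""
    rows = []
    for t in range(2, len(cumulative) + 1):
        total = cumulative[t - 1]
        average_product = total / t                 # output per unit of input (time)
        marginal_product = total - cumulative[t - 2]  # extra output from one more day
        elasticity = marginal_product / average_product if average_product else None
        rows.append((t, average_product, marginal_product, elasticity))
    return rows


# Illustrative cumulative counts, not taken from the data files.
for row in production_measures([10, 20, 40]):
    print(row)
```

Under these definitions, an elasticity above 1 means cases are growing faster than their running average, and a value falling toward 0 suggests the curve is flattening.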
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains information on COVID-19 cases and deaths in 50 Muslim-majority countries compared to the 50 richest non-Muslim countries. The aim of the dataset is to investigate the differences in COVID-19 incidence between these two groups and to explore potential reasons for these disparities. The Muslim-majority countries in the sample had populations that were more than 50.0% Muslim, while the non-Muslim countries were selected based on their GDP, excluding any Muslim-majority countries listed. The data was collected on September 18, 2020, and includes the percentage of Muslim population per country, GDP, population count, and total number of COVID-19 cases and deaths. The dataset was transferred via an Excel spreadsheet on September 23, 2020 and analyzed using three different Average Treatment Methods (ATE) to validate the results. The dataset was published as a preprint and is associated with a manuscript titled "Fifty Muslim-majority countries have fewer COVID-19 cases and deaths than the 50 richest non-Muslim countries", which also lists the sources of the data. The percentage of Muslim population per country was obtained from World Population Review. The GDP per country, population count, and total number of COVID-19 cases and deaths were obtained from Worldometers.
| Column Name | Description |
|---|---|
| Country | Name of the country. |
| % Muslim Population | The percentage of Muslim population in the country. |
| Top GDP Countries | The top 50 countries in terms of GDP, excluding any Muslim-majority countries listed. |
| Country With A Muslim Majority | Whether the country has a Muslim majority. |
| Population | Population count of the country. |
| Total Cases | Total number of COVID-19 cases in the country. |
| Total Deaths | Total number of COVID-19 deaths in the country. |
| Total Cases/Pop | Ratio of total COVID-19 cases to the population. |
| Total Deaths/Pop | Ratio of total COVID-19 deaths to the population. |
| Total Deaths/Total Cases | Ratio of total COVID-19 deaths to total COVID-19 cases in the country. |
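The three ratio columns follow directly from the count columns. A minimal sketch is shown below. The function name is hypothetical and the input numbers are invented for illustration.

```python
def derived_ratios(population, total_cases, total_deaths):
    """Compute the ratio columns of the table from the raw counts."""
    return {
        "Total Cases/Pop": total_cases / population,
        "Total Deaths/Pop": total_deaths / population,
        # Undefined when a country reports no cases.
        "Total Deaths/Total Cases": total_deaths / total_cases if total_cases else None,
    }


# Invented numbers, not taken from the dataset.
print(derived_ratios(population=1000, total_cases=50, total_deaths=5))
```

Recomputing the ratios this way is also a quick check that the published ratio columns agree with the counts.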
- Comparative analysis: Researchers can use this dataset to compare the COVID-19 cases and deaths between Muslim-majority and non-Muslim countries. This can help to identify any disparities or differences in the response to the pandemic.
- Trend analysis: Over time, this dataset can be used to track the changes in the COVID-19 cases and deaths in Muslim-majority and non-Muslim countries. This can help to identify trends and patterns that may inform future research.
- Geographical analysis: This dataset can be used to explore the geographical distribution of COVID-19 cases and deaths in Muslim-majority and non-Muslim countries. This can help to identify hotspots and areas that may require special attention.
- Demographic analysis: Researchers can use the data to explore the impact of demographic factors on the spread and severity of the pandemic in Muslim-majority and non-Muslim countries. This can help to identify any patterns or correlations that may inform future research and policy decisions.
- Economic analysis: The data can be used to explore the economic impact of the pandemic on Muslim-majority and non-Muslim countries. By comparing the GDP and other economic indicators in these countries, researchers can identify any patterns or trends that may inform economic policy decisions.
If you use this dataset in your work or studies, please credit the original source.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dengue is a significant public health problem in mostly tropical countries, including Timor-Leste, and it continued to draw attention from the health sector during the COVID-19 pandemic. The goal of this study is therefore to evaluate, using Geographic Information Systems (GIS), the dengue incidence rate in comparison with the cumulative number of COVID-19 cases, together with associated dengue risk factors, including the fatality rate of dengue infection, in each municipality of Timor-Leste during the COVID-19 pandemic. A descriptive study using GIS was performed to provide a spatial-temporal mapping of dengue cases. Secondary data, sourced from the Department of Health Statistics Information under the Ministry of Health Timor-Leste, were collected for the period of the COVID-19 outbreak in 2020–2021. These data were grounded at the municipal (province) level. Quantum GIS and Microsoft Excel were used to analyze the data. During the COVID-19 outbreak (2020–2021), dengue spread nationwide. The number of municipalities with both high dengue case counts and high cumulative COVID-19 numbers increased. High numbers of dengue cases associated with high cumulative COVID-19 numbers were found in municipalities with urban characteristics. In terms of severity, dengue fever (DF) was most commonly reported, with a total of 1,556 cases, followed by dengue hemorrhagic fever (DHF) and dengue shock syndrome (DSS). Most cases were reported in the months of the monsoon season, such as December, January, and March. Dengue GIS mapping helps in understanding the disease's presence and dynamic nature over time.
Background
Hybrid immunity (a combination of natural and vaccine-induced immunity) provides additional immune protection against coronavirus disease 2019 (COVID-19) reinfection. Today, people are commonly both infected and vaccinated; hence, hybrid immunity is the norm. However, the extent to which hybrid immunity mitigates the risk of Omicron variant reinfection, and the durability of its protection, remain uncertain. This meta-analysis aims to explore hybrid immunity's mitigation of the risk of Omicron variant reinfection and its protective durability, to provide a new evidence base for the development and optimization of immunization strategies and to improve the public’s awareness of and participation in COVID-19 vaccination, especially in vulnerable and at-risk populations.
Methods
Embase, PubMed, Web of Science, Chinese National Knowledge Infrastructure, and Wanfang databases were searched for publicly available literature up to 10 June 2024. Two researchers independently completed the data extraction and risk of bias assessment and cross-checked each other. The Newcastle-Ottawa Scale assessed the risk of bias in included cohort and case–control studies, while criteria recommended by the Agency for Health Care Research and Quality (AHRQ) evaluated cross-sectional studies. The extracted data were synthesized in an Excel spreadsheet according to the predefined items to be collected. The outcome was Omicron variant reinfection, reported as an Odds Ratio (OR) with its 95% confidence interval (CI) and Protective Effectiveness (PE) with 95% CI. The data were pooled using a random- or fixed-effects model based on the I² test. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines were followed.
Results
Thirty-three articles were included. Compared with the natural immunity group, the hybrid immunity (booster vaccination) group had the highest level of mitigation in the risk of reinfection (OR = 0.43, 95% CI: 0.34–0.56), followed by the complete vaccination group (OR = 0.58, 95% CI: 0.45–0.74), and lastly the incomplete vaccination group (OR = 0.64, 95% CI: 0.44–0.93). Compared with the complete vaccination-only group, the hybrid immunity (complete vaccination) group mitigated the risk of reinfection by 65% (OR = 0.35, 95% CI: 0.27–0.46), and the hybrid immunity (booster vaccination) group mitigated the risk of reinfection by an additional 29% (OR = 0.71, 95% CI: 0.61–0.84) compared with the hybrid immunity (complete vaccination) group. The effectiveness of hybrid immunity (incomplete vaccination) in mitigating the risk of reinfection was 37.88% (95% CI, 28.88–46.89%) within 270–364 days, and decreased to 33.23% (95% CI, 23.80–42.66%) within 365–639 days, whereas the effectiveness after complete vaccination was 54.36% (95% CI, 50.82–57.90%) within 270–364 days, and the effectiveness of booster vaccination was 73.49% (95% CI, 68.95–78.04%) within 90–119 days.
Conclusion
Hybrid immunity was significantly more protective than natural or vaccination-induced immunity, and booster doses were associated with enhanced protection against Omicron. Although its protective effects waned over time, vaccination remains a crucial measure for controlling COVID-19.
Systematic review registration
https://www.crd.york.ac.uk/PROSPERO/, identifier CRD42024539682.
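The percentage reductions quoted alongside the odds ratios above (65% for OR = 0.35, 29% for OR = 0.71) are consistent with the common conversion PE = (1 - OR) × 100. The sketch below applies that relation. Whether the time-window PE figures in the abstract were derived the same way is not stated, so treat this as an assumption. The function names are illustrative.

```python
def protective_effectiveness(odds_ratio):
    """Percent reduction in odds implied by an odds ratio: (1 - OR) * 100."""
    return (1.0 - odds_ratio) * 100.0


def pe_interval(or_lower, or_upper):
    """Map an OR confidence interval to a PE interval; note the bounds swap."""
    return protective_effectiveness(or_upper), protective_effectiveness(or_lower)


print(round(protective_effectiveness(0.35)))  # 65
print(round(protective_effectiveness(0.71)))  # 29
```

The bounds swap because a smaller odds ratio corresponds to a larger protective effect.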
These datasets are framed around predicting short-term electricity load; this forecasting problem is known in the research field as short-term load forecasting (STLF). The datasets address the STLF problem for the Panama power system, in which the forecasting horizon is one week with hourly steps, for a total of 168 hours. They are useful for training and testing forecasting models and comparing their results with the power system operator's official forecast (take a look at real-time electricity load). The datasets include historical load, a vast set of weather variables, holidays, and historical weekly load forecast features. More information on the context of these datasets, a literature review of forecasting techniques suitable for them, and results from testing a set of machine learning models are available in the article Short-Term Electricity Load Forecasting with Machine Learning (Aguilar Madrid, E.; Antonio, N. Short-Term Electricity Load Forecasting with Machine Learning. Information 2021, 12, 50. https://doi.org/10.3390/info12020050).
The main objectives around these datasets are: 1. Evaluate the power system operator official forecasts (weekly pre-dispatch forecast) against the real load, on weekly basis. 2. Develop, train and test forecasting models to improve the operator official weekly forecasts (168 hours), in different scenarios.
The following considerations should be kept in mind when comparing forecasting results with the weekly pre-dispatch forecast: 1. Saturday is the first day of each weekly forecast; accordingly, Friday is the last day. 2. The first full week starting on a Saturday should be considered the first week of the year when numbering the weeks. 3. A 72-hour gap of unseen records should be left before the first day to forecast. In other words, the forecast for the next week should use records only up to the last hour of each Tuesday. 4. Make sure to train and test keeping the chronological order of records.
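The timing rules above can be sketched as a small helper that derives the training cutoff and the 168-hour forecast window for a given week. It assumes hourly timestamps that label the start of each hour and a forecast week beginning on Saturday at 00:00. The function and constant names are illustrative.

```python
from datetime import datetime, timedelta

HORIZON_HOURS = 168  # one week of hourly steps
GAP_HOURS = 72       # unseen records before the first forecast hour


def forecast_window(week_start: datetime):
    """Return (last_training_hour, first_forecast_hour, last_forecast_hour)."""
    # datetime.weekday(): Monday is 0, so Saturday is 5.
    if week_start.weekday() != 5 or (week_start.hour, week_start.minute) != (0, 0):
        raise ValueError("week_start must be a Saturday at 00:00")
    # The last usable record is the hour just before the 72-hour gap begins.
    last_training_hour = week_start - timedelta(hours=GAP_HOURS + 1)
    last_forecast_hour = week_start + timedelta(hours=HORIZON_HOURS - 1)
    return last_training_hour, week_start, last_forecast_hour


cutoff, first, last = forecast_window(datetime(2020, 1, 4))
print(cutoff, first, last)
```

For the week starting Saturday 4 January 2020, the cutoff falls on Tuesday 31 December 2019 at 23:00 and the window ends on Friday 10 January 2020 at 23:00.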
Data sources provide hourly records, from January 2015 until June 2020. The data composition is the following: 1. Historical electricity load, available on daily post-dispatch reports, from the grid operator (ETESA, CND). 2. Historical weekly forecasts available on weekly pre-dispatch reports, both from ETESA, CND. 3. Calendar information related to school periods, from Panama's Ministry of Education, published in official gazette. 4. Calendar information related to holidays, from "When on Earth?" website. 5. Weather variables, such as temperature, relative humidity, precipitation, and wind speed, for three main cities in Panama, from Earthdata.
The original data sources provide the post-dispatch electricity load in individual Excel files on a daily basis and the weekly pre-dispatch electricity load forecast in individual Excel files on a weekly basis, both with hourly granularity. Holiday and school period data are scattered across websites and PDF files. Weather data is available in daily NetCDF files.
For simplicity, the published datasets are already pre-processed by merging all data sources on the date-time index: 1. A CSV file containing all records in a single continuous dataset with all variables. 2. A CSV file containing the load forecast from weekly pre-dispatch reports. 3. Two Excel files containing suggested regressors and 14 pairs of training/testing datasets as described in the PDF file.
These 14 pairs of training/testing datasets are selected according to these testing criteria: 1. A testing week for each month before the lockdown due to COVID-19. 2. Select testing weeks containing holidays. 3. Plus, two testing weeks during the lockdown.