100+ datasets found
  1. Coronavirus (Covid-19) Data in the United States

    • nytimes.com
    • openicpsr.org
    • +2 more
    Cite
    New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
    Dataset provided by
    New York Times
    Description

    The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

    Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

    We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

    The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
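    Because the files hold cumulative counts, daily new cases have to be derived by differencing consecutive values. The sketch below is a minimal illustration under stated assumptions: the function name and the sample numbers are hypothetical and are not taken from the repository.

```python
def daily_new_cases(cumulative):
    """Turn a chronologically ordered list of cumulative counts into daily increments."""
    new = []
    previous = 0
    for total in cumulative:
        # The increment for a day is that day's total minus the prior day's total.
        new.append(total - previous)
        previous = total
    return new


# Made-up cumulative series for one hypothetical county.
example = [1, 1, 2, 5, 5, 9]
print(daily_new_cases(example))  # [1, 0, 1, 3, 0, 4]
```

    If a source revises its totals downward, an increment computed this way can be negative, so downstream code should not assume non-negative values.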

  2. Afrobarometer Survey 1 1999-2000, Merged 7 Country - Botswana, Lesotho,...

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1 more
    Updated Apr 27, 2021
    Cite
    Institute for Democracy in South Africa (IDASA) (2021). Afrobarometer Survey 1 1999-2000, Merged 7 Country - Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia, Zimbabwe [Dataset]. https://microdata.worldbank.org/index.php/catalog/889
    Dataset updated
    Apr 27, 2021
    Dataset provided by
    Ghana Centre for Democratic Development (CDD-Ghana)
    Institute for Democracy in South Africa (IDASA)
    Michigan State University (MSU)
    Time period covered
    1999 - 2000
    Area covered
    Africa, Malawi, Zambia, Namibia, South Africa, Zimbabwe, Botswana, Lesotho
    Description

    Abstract

    Round 1 of the Afrobarometer survey was conducted from July 1999 through June 2001 in 12 African countries to solicit public opinion on democracy, governance, markets, and national identity. The full 12-country dataset was pieced together from different projects: Round 1 of the Afrobarometer survey, the old Southern African Democracy Barometer, and similar surveys done in West and East Africa.

    The 7-country dataset is a subset of the Round 1 survey dataset. It combines the data for the 7 Southern African countries surveyed alongside other African countries in Round 1, 1999-2000 (Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia and Zimbabwe). It is a useful dataset because, in contrast to the full 12-country Round 1 dataset, all countries in this dataset were surveyed with the identical questionnaire.

    Geographic coverage

    Botswana Lesotho Malawi Namibia South Africa Zambia Zimbabwe

    Analysis unit

    Basic units of analysis that the study investigates include: individuals and groups

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    A new sample has to be drawn for each round of Afrobarometer surveys. Whereas the standard sample size for Round 3 surveys will be 1200 cases, a larger sample size will be required in societies that are extremely heterogeneous (such as South Africa and Nigeria), where the sample size will be increased to 2400. Other adaptations may be necessary within some countries to account for the varying quality of the census data or the availability of census maps.

    The sample is designed as a representative cross-section of all citizens of voting age in a given country. The goal is to give every adult citizen an equal and known chance of selection for interview. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible. A randomly selected sample of 1200 cases allows inferences to national adult populations with a margin of sampling error of no more than plus or minus 2.5 percent with a confidence level of 95 percent. If the sample size is increased to 2400, the confidence interval shrinks to plus or minus 2 percent.
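    As a rough cross-check on how the margin of error depends on sample size, the textbook formula for a proportion under simple random sampling can be evaluated directly. This is only a sketch under stated assumptions: it takes the worst case p = 0.5 and z = 1.96 for 95 percent confidence, and it ignores the clustering and stratification of the actual design. Under those assumptions it gives about 2.8 points for 1200 cases and 2.0 points for 2400, so the 2.5 percent quoted above is the source's own figure and is not reproduced by this simplified calculation.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error in percentage points for a proportion under simple random sampling."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


for n in (1200, 2400):
    print(n, round(margin_of_error(n), 1))
# 1200 2.8
# 2400 2.0
```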

    Sample Universe

    The sample universe for Afrobarometer surveys includes all citizens of voting age within the country. In other words, we exclude anyone who is not a citizen and anyone who has not attained this age (usually 18 years) on the day of the survey. Also excluded are areas determined to be either inaccessible or not relevant to the study, such as those experiencing armed conflict or natural disasters, as well as national parks and game reserves. As a matter of practice, we have also excluded people living in institutionalized settings, such as students in dormitories and persons in prisons or nursing homes.

    What to do about areas experiencing political unrest? On the one hand we want to include them because they are politically important. On the other hand, we want to avoid stretching out the fieldwork over many months while we wait for the situation to settle down. It was agreed at the 2002 Cape Town Planning Workshop that it is difficult to come up with a general rule that will fit all imaginable circumstances. We will therefore make judgments on a case-by-case basis on whether or not to proceed with fieldwork or to exclude or substitute areas of conflict. National Partners are requested to consult Core Partners on any major delays, exclusions or substitutions of this sort.

    Sample Design

    The sample design is a clustered, stratified, multi-stage, area probability sample.

    To repeat the main sampling principle, the objective of the design is to give every sample element (i.e. adult citizen) an equal and known chance of being chosen for inclusion in the sample. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible.
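    Sampling with probability proportionate to population size can be sketched with the standard systematic method: lay the units end to end by population, pick a random start, and step through at a fixed interval. This is an illustrative sketch, not the Afrobarometer procedure itself; the function name and the example populations are hypothetical.

```python
import random


def pps_systematic(sizes, n, rng=None):
    """Select n units with probability proportional to size (systematic PPS).

    Returns a list of unit indices. A unit whose size exceeds the sampling
    interval can be selected more than once.
    """
    rng = rng or random.Random()
    total = sum(sizes)
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)
    targets = [start + k * interval for k in range(n)]

    selected = []
    cumulative = 0
    next_target = 0
    for index, size in enumerate(sizes):
        cumulative += size
        # Every target that falls inside this unit's population range selects it.
        while next_target < n and targets[next_target] < cumulative:
            selected.append(index)
            next_target += 1
    # Guard against floating-point rounding leaving the last target unassigned.
    while len(selected) < n:
        selected.append(len(sizes) - 1)
    return selected


# Hypothetical populations of four enumeration areas.
populations = [100, 200, 300, 400]
print(pps_systematic(populations, 2, random.Random(0)))
```

    Larger units cover a longer stretch of the cumulative population line, so a target is more likely to land in them, which is what makes the selection probability proportional to size.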

    In a series of stages, geographically defined sampling units of decreasing size are selected. To ensure that the sample is representative, the probability of selection at various stages is adjusted as follows:

    The sample is stratified by key social characteristics in the population such as sub-national area (e.g. region/province) and residential locality (urban or rural). The area stratification reduces the likelihood that distinctive ethnic or language groups are left out of the sample. And the urban/rural stratification is a means to make sure that these localities are represented in their correct proportions.

    Wherever possible, and always in the first stage of sampling, random sampling is conducted with probability proportionate to population size (PPPS). The purpose is to guarantee that larger (i.e., more populated) geographical units have a proportionally greater probability of being chosen into the sample.

    The sampling design has four stages:

    A first-stage to stratify and randomly select primary sampling units;

    A second-stage to randomly select sampling start-points;

    A third stage to randomly choose households;

    A final-stage involving the random selection of individual respondents

    We shall deal with each of these stages in turn.

    STAGE ONE: Selection of Primary Sampling Units (PSUs)

    The primary sampling units (PSUs) are the smallest well-defined geographic units for which reliable population data are available. In most countries, these will be Census Enumeration Areas (EAs). Most national census data and maps are broken down to the EA level. In the text that follows we use the acronyms PSU and EA interchangeably because, when census data are employed, they refer to the same unit.

    We strongly recommend that NIs use official national census data as the sampling frame for Afrobarometer surveys. Where recent or reliable census data are not available, NIs are asked to inform the relevant Core Partner before they substitute any other demographic data. Where the census is out of date, NIs should consult a demographer to obtain the best possible estimates of population growth rates. These should be applied to the outdated census data in order to make projections of population figures for the year of the survey. It is important to bear in mind that population growth rates vary by area (region) and (especially) between rural and urban localities. Therefore, any projected census data should include adjustments to take such variations into account.

    Indeed, we urge NIs to establish collegial working relationships with professionals in the national census bureau, not only to obtain the most recent census data, projections, and maps, but also to gain access to sampling expertise. NIs may even commission a census statistician to draw the sample to Afrobarometer specifications, provided that provision for this service has been made in the survey budget.

    Regardless of who draws the sample, the NIs should thoroughly acquaint themselves with the strengths and weaknesses of the available census data and the availability and quality of EA maps. The country and methodology reports should cite the exact census data used, its known shortcomings, if any, and any projections made from the data. At minimum, the NI must know the size of the population and the urban/rural population divide in each region in order to specify how to distribute population and PSU's in the first stage of sampling. National investigators should obtain this written data before they attempt to stratify the sample.

    Once this data is obtained, the sample population (either 1200 or 2400) should be stratified, first by area (region/province) and then by residential locality (urban or rural). In each case, the proportion of the sample in each locality in each region should be the same as its proportion in the national population as indicated by the updated census figures.

    Having stratified the sample, it is then possible to determine how many PSUs should be selected for the country as a whole, for each region, and for each urban or rural locality.

    The total number of PSUs to be selected for the whole country is determined by calculating the maximum degree of clustering of interviews one can accept in any PSU. Because PSUs (which are usually geographically small EAs) tend to be socially homogeneous, we do not want to select too many people in any one place. Thus, the Afrobarometer has established a standard of no more than 8 interviews per PSU. For a sample size of 1200, the sample must therefore contain 150 PSUs/EAs (1200 divided by 8). For a sample size of 2400, there must be 300 PSUs/EAs.

    These PSUs should then be allocated proportionally to the urban and rural localities within each regional stratum of the sample. Let's take a couple of examples from a country with a sample size of 1200. If the urban locality of Region X in this country constitutes 10 percent of the current national population, then the sample for this stratum should be 15 PSUs (calculated as 10 percent of 150 PSUs). If the rural population of Region Y constitutes 4 percent of the current national population, then the sample for this stratum should be 6 PSUs (4 percent of 150 PSUs).
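    The arithmetic in the last two paragraphs can be sketched as follows. The function names are hypothetical; the figures (8 interviews per PSU, samples of 1200 and 2400, strata of 10 and 4 percent) come from the text above.

```python
MAX_INTERVIEWS_PER_PSU = 8  # standard stated in the text


def total_psus(sample_size, per_psu=MAX_INTERVIEWS_PER_PSU):
    """Number of PSUs needed so that no PSU holds more than per_psu interviews."""
    return sample_size // per_psu


def psus_for_stratum(sample_size, population_share):
    """PSUs allocated to a stratum in proportion to its share of the national population."""
    return round(total_psus(sample_size) * population_share)


print(total_psus(1200))              # 150
print(total_psus(2400))              # 300
print(psus_for_stratum(1200, 0.10))  # 15 (urban Region X)
print(psus_for_stratum(1200, 0.04))  # 6  (rural Region Y)
```

    When shares do not divide evenly, rounding each stratum separately can make the allocations sum to slightly more or less than the national total, so a real allocation would need a reconciliation step.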

    The next step is to select particular PSUs/EAs using random methods. Using the above example of the rural localities in Region Y, let us say that you need to pick 6 sample EAs out of a census list that contains a total of 240 rural EAs in Region Y. But which 6? If the EAs created by the national census bureau are of equal or roughly equal population size, then selection is relatively straightforward. Just number all EAs consecutively, then make six selections using a table of random numbers. This procedure, known as simple random sampling (SRS), will

  3. Dataset of artists who created We Love You

    • workwithdata.com
    Updated May 8, 2025
    Cite
    Work With Data (2025). Dataset of artists who created We Love You [Dataset]. https://www.workwithdata.com/datasets/artists?f=1&fcol0=j0-artwork&fop0=%3D&fval0=We+Love+You&j=1&j0=artworks
    Dataset updated
    May 8, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about artists. It has 1 row and is filtered to rows where the artwork is We Love You. It features 9 columns including birth date, death date, country, and gender.

  4. COVID-19 case rate per 100,000 population and percent test positivity in the...

    • data.ct.gov
    • catalog.data.gov
    application/rdfxml +5
    Updated Jun 23, 2022
    Cite
    Department of Public Health (2022). COVID-19 case rate per 100,000 population and percent test positivity in the last 14 days by town - ARCHIVE [Dataset]. https://data.ct.gov/Health-and-Human-Services/COVID-19-case-rate-per-100-000-population-and-perc/hree-nys2
    Available download formats: application/rssxml, xml, csv, json, tsv, application/rdfxml
    Dataset updated
    Jun 23, 2022
    Dataset authored and provided by
    Department of Public Health
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

    The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

    The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .

    The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .

    The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

    This dataset includes a count and rate per 100,000 population for COVID-19 cases, a count of COVID-19 molecular diagnostic tests, and a percent positivity rate for tests among people living in community settings for the previous two-week period. Dates are based on date of specimen collection (cases and positivity).

    A person is considered a new case only upon their first COVID-19 testing result because a case is defined as an instance or bout of illness. If they are tested again subsequently and are still positive, it still counts toward the test positivity metric but they are not considered another case.

    Percent positivity is calculated as the number of positive tests among community residents conducted during the 14 days divided by the total number of positive and negative tests among community residents during the same period. If someone was tested more than once during that 14 day period, then those multiple test results (regardless of whether they were positive or negative) are included in the calculation.
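    The two town-level metrics described above, the case rate per 100,000 population and the percent positivity, can be sketched as follows. The function names and example numbers are hypothetical; only the formulas follow the description.

```python
def case_rate_per_100k(cases, population):
    """Cases per 100,000 residents over the reporting period."""
    return cases * 100_000 / population


def percent_positivity(positive_tests, negative_tests):
    """Positive tests as a percentage of all positive and negative tests.

    Counts are of tests, not people, so repeat tests of the same person
    are all included. Returns None when no tests were reported.
    """
    total = positive_tests + negative_tests
    if total == 0:
        return None
    return positive_tests * 100 / total


# Made-up figures for one town over a 14-day period.
print(case_rate_per_100k(30, 20_000))  # 150.0
print(percent_positivity(45, 855))     # 5.0
```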

    These case and test counts do not include cases or tests among people residing in congregate settings, such as nursing homes, assisted living facilities, or correctional facilities.

    These data are updated weekly and reflect the previous two full Sunday-Saturday (MMWR) weeks (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf).

    DPH note about change from 7-day to 14-day metrics: Prior to 10/15/2020, these metrics were calculated using a 7-day average rather than a 14-day average. The 7-day metrics are no longer being updated as of 10/15/2020 but the archived dataset can be accessed here: https://data.ct.gov/Health-and-Human-Services/COVID-19-case-rate-per-100-000-population-and-perc/s22x-83rd

    As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.

    With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).

    Additional notes: As of 11/5/2020, CT DPH has added antigen testing for SARS-CoV-2 to reported test counts in this dataset. The tests included in this dataset include both molecular and antigen tests. Molecular tests reported include polymerase chain reaction (PCR) and nucleic acid amplification (NAAT) tests.

    The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used.

    Data suppression is applied when the rate is <5 cases per 100,000 or if there are <5 cases within the town. Information on why data suppression rules are applied can be found online here: https://www.cdc.gov/cancer/uscs/technical_notes/stat_methods/suppression.htm
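    The suppression rule in the preceding paragraph can be expressed as a simple check. This is a sketch of the rule as worded there; the function name and example values are hypothetical, and the published dataset may apply additional conditions.

```python
def is_suppressed(cases, population):
    """Return True when a town's figures would be suppressed under the stated rule.

    The rule: suppress if there are fewer than 5 cases in the town, or if the
    rate is below 5 cases per 100,000 population.
    """
    rate = cases * 100_000 / population
    return cases < 5 or rate < 5


print(is_suppressed(3, 10_000))    # True: fewer than 5 cases
print(is_suppressed(10, 50_000))   # False: 10 cases, rate of 20 per 100,000
print(is_suppressed(5, 200_000))   # True: rate of 2.5 per 100,000
```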

  5. MGD: Music Genre Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 28, 2021
    Cite
    Danilo B. Seufitelli (2021). MGD: Music Genre Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4778562
    Dataset updated
    May 28, 2021
    Dataset provided by
    Gabriel P. Oliveira
    Mirella M. Moro
    Mariana O. Silva
    Danilo B. Seufitelli
    Anisio Lacerda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MGD: Music Genre Dataset

    Over recent years, the world has seen a dramatic change in the way people consume music, moving from physical records to streaming services. Since 2017, such services have become the main source of revenue within the global recorded music market. Therefore, this dataset is built using data from Spotify, which provides a weekly chart of the 200 most streamed songs for each country and territory in which it is present, as well as an aggregated global chart.

    Considering that countries behave differently when it comes to musical tastes, we use chart data from global and regional markets from January 2017 to December 2019, considering eight of the top 10 music markets according to IFPI: United States (1st), Japan (2nd), United Kingdom (3rd), Germany (4th), France (5th), Canada (8th), Australia (9th), and Brazil (10th).

    We also provide information about the hit songs and artists present in the charts, such as all collaborating artists within a song (since the charts only provide the main ones) and their respective genres, which is the core of this work. MGD also provides data about musical collaboration, as we build collaboration networks based on artist partnerships in hit songs. Therefore, this dataset contains:

    Genre Networks: Success-based genre collaboration networks

    Genre Mapping: Genre mapping from Spotify genres to super-genres

    Artist Networks: Success-based artist collaboration networks

    Artists: Some artist data

    Hit Songs: Hit Song data and features

    Charts: Enhanced data from Spotify Weekly Top 200 Charts
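    To illustrate what a collaboration network built from artist partnerships in hit songs looks like, the sketch below counts how often each pair of artists appears on the same song. It is a generic co-occurrence construction under stated assumptions, not necessarily the method used to build MGD; the function name and the artist labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def collaboration_edges(songs):
    """Count co-appearances of artist pairs.

    songs is an iterable of artist lists, one list per song. The result maps
    each unordered pair (stored as a sorted tuple) to the number of songs
    the two artists share, which can serve as an edge weight.
    """
    edges = Counter()
    for artists in songs:
        # Deduplicate and sort so that (A, B) and (B, A) map to the same edge.
        for a, b in combinations(sorted(set(artists)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical hit songs with their credited artists.
hits = [["A", "B"], ["A", "B", "C"], ["C"]]
edges = collaboration_edges(hits)
print(edges[("A", "B")])  # 2
print(edges[("A", "C")])  # 1
```

    A solo song contributes no edges, so artists who never collaborate would have to be added as isolated nodes separately.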

    This dataset was originally built for a conference paper at ISMIR 2020. If you make use of the dataset, please also cite the following paper:

    Gabriel P. Oliveira, Mariana O. Silva, Danilo B. Seufitelli, Anisio Lacerda, and Mirella M. Moro. Detecting Collaboration Profiles in Success-based Music Genre Networks. In Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR 2020), 2020.

    @inproceedings{ismir/OliveiraSSLM20,
      title = {Detecting Collaboration Profiles in Success-based Music Genre Networks},
      author = {Gabriel P. Oliveira and Mariana O. Silva and Danilo B. Seufitelli and Anisio Lacerda and Mirella M. Moro},
      booktitle = {21st International Society for Music Information Retrieval Conference},
      pages = {726--732},
      year = {2020}
    }

  6. ORBIT: A real-world few-shot dataset for teachable object recognition...

    • city.figshare.com
    bin
    Updated May 31, 2023
    Cite
    Daniela Massiceti; Lida Theodorou; Luisa Zintgraf; Matthew Tobias Harris; Simone Stumpf; Cecily Morrison; Edward Cutrell; Katja Hofmann (2023). ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision [Dataset]. http://doi.org/10.25383/city.14294597.v3
    Available download formats: bin
    Dataset updated
    May 31, 2023
    Dataset provided by
    City, University of London
    Authors
    Daniela Massiceti; Lida Theodorou; Luisa Zintgraf; Matthew Tobias Harris; Simone Stumpf; Cecily Morrison; Edward Cutrell; Katja Hofmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Object recognition predominately still relies on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variation that these applications will face when deployed in the real-world. To close this gap, we present the ORBIT dataset, grounded in a real-world application of teachable object recognizers for people who are blind/low vision. We provide a full, unfiltered dataset of 4,733 videos of 588 objects recorded by 97 people who are blind/low-vision on their mobile phones, and a benchmark dataset of 3,822 videos of 486 objects collected by 77 collectors. The code for loading the dataset, computing all benchmark metrics, and running the baseline models is available at https://github.com/microsoft/ORBIT-Dataset

    This version comprises several zip files:

    - train, validation, test: benchmark dataset, organised by collector, with raw videos split into static individual frames in jpg format at 30FPS
    - other: data not in the benchmark set, organised by collector, with raw videos split into static individual frames in jpg format at 30FPS (please note that the train, validation, test, and other files make up the unfiltered dataset)
    - *_224: as for the benchmark, but static individual frames are scaled down to 224 pixels.
    - *_unfiltered_videos: full unfiltered dataset, organised by collector, in mp4 format.

  7. Financial News Sentiment Classification Dataset

    • kaggle.com
    Updated Jan 17, 2022
    Cite
    PercyZheng (2022). Financial News Sentiment Classification Dataset [Dataset]. https://www.kaggle.com/datasets/percyzheng/sentiment-classification-selflabel-dataset
    Dataset updated
    Jan 17, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    PercyZheng
    Description

    Context

    Due to the limited number of sentiment classification datasets available online, I labeled more than 200 news titles (from well-known financial websites such as CNBC and the Financial Times) with 3 sentiment categories. This dataset contains relatively recent information, which may be helpful for predicting new trends (such as COVID-19). My labeling standard is based on two other existing datasets, so you may judge some sentences differently. If you do similar labeling work, I hope you will share your data with us. I would also appreciate an upvote!

  8. COVID-19 case rate per 100,000 population and percent test positivity in the...

    • datasets.ai
    • data.ct.gov
    • +1 more
    23, 40, 55, 8
    Updated Sep 8, 2024
    Cite
    State of Connecticut (2024). COVID-19 case rate per 100,000 population and percent test positivity in the last 7 days by town - ARCHIVE [Dataset]. https://datasets.ai/datasets/covid-19-case-rate-per-100000-population-and-percent-test-positivity-in-the-last-7-days-by
    Available download formats: 23, 55, 40, 8
    Dataset updated
    Sep 8, 2024
    Dataset authored and provided by
    State of Connecticut
    Description

    DPH note about change from 7-day to 14-day metrics: As of 10/15/2020, this dataset is no longer being updated. Starting on 10/15/2020, these metrics will be calculated using a 14-day average rather than a 7-day average. The new dataset using 14-day averages can be accessed here: https://data.ct.gov/Health-and-Human-Services/COVID-19-case-rate-per-100-000-population-and-perc/hree-nys2

    As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.

    With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).

    This dataset includes a weekly count and weekly rate per 100,000 population for COVID-19 cases, a weekly count of COVID-19 PCR diagnostic tests, and a weekly percent positivity rate for tests among people living in community settings. Dates are based on date of specimen collection (cases and positivity).

    A person is considered a new case only upon their first COVID-19 testing result because a case is defined as an instance or bout of illness. If they are tested again subsequently and are still positive, it still counts toward the test positivity metric but they are not considered another case.

    These case and test counts do not include cases or tests among people residing in congregate settings, such as nursing homes, assisted living facilities, or correctional facilities.

    These data are updated weekly; the previous week period for each dataset is the previous Sunday-Saturday, known as an MMWR week (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf). The date listed is the date the dataset was last updated and corresponds to a reporting period of the previous MMWR week. For instance, the data for 8/20/2020 corresponds to a reporting period of 8/9/2020-8/15/2020.
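    The mapping from an update date to its reporting period can be sketched with the standard library. The function name is hypothetical; the Sunday-Saturday convention and the 8/20/2020 example come from the paragraph above.

```python
from datetime import date, timedelta


def previous_mmwr_week(update_date):
    """Return (start, end) of the last full Sunday-Saturday week before update_date's week."""
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    days_since_sunday = (update_date.weekday() + 1) % 7
    current_week_start = update_date - timedelta(days=days_since_sunday)
    start = current_week_start - timedelta(days=7)
    end = current_week_start - timedelta(days=1)
    return start, end


start, end = previous_mmwr_week(date(2020, 8, 20))
print(start, end)  # 2020-08-09 2020-08-15
```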

    Notes: 9/25/2020: Data for Mansfield and Middletown for the week of Sept 13-19 were unavailable at the time of reporting due to delays in lab reporting.

  9. Dataset of artists who created Graaf Willem I van Holland

    • workwithdata.com
    Updated May 8, 2025
    Cite
    Work With Data (2025). Dataset of artists who created Graaf Willem I van Holland [Dataset]. https://www.workwithdata.com/datasets/artists?f=1&fcol0=j0-artwork&fop0=%3D&fval0=Graaf+Willem+I+van+Holland&j=1&j0=artworks
    Dataset updated
    May 8, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about artists. It has 1 row and is filtered to rows where the artwork is Graaf Willem I van Holland. It features 9 columns including birth date, death date, country, and gender.

  10. The Marshall Project: COVID Cases in Prisons

    • data.world
    csv, zip
    Updated Apr 6, 2023
    Cite
    The Associated Press (2023). The Marshall Project: COVID Cases in Prisons [Dataset]. https://data.world/associatedpress/marshall-project-covid-cases-in-prisons
    Available download formats: csv, zip
    Dataset updated
    Apr 6, 2023
    Authors
    The Associated Press
    Time period covered
    Jul 31, 2019 - Aug 1, 2021
    Description

    Overview

    The Marshall Project, the nonprofit investigative newsroom dedicated to the U.S. criminal justice system, has partnered with The Associated Press to compile data on the prevalence of COVID-19 infection in prisons across the country. The Associated Press is sharing this data as the most comprehensive current national source of COVID-19 outbreaks in state and federal prisons.

    Lawyers, criminal justice reform advocates and families of the incarcerated have worried about what was happening in prisons across the nation as coronavirus began to take hold in the communities outside. Data collected by The Marshall Project and AP shows that hundreds of thousands of prisoners, workers, correctional officers and staff have caught the illness as prisons became the center of some of the country’s largest outbreaks. And thousands of people — most of them incarcerated — have died.

    In December, as COVID-19 cases spiked across the U.S., the news organizations also shared cumulative rates of infection among prison populations, to better gauge the total effects of the pandemic on prison populations. The analysis found that by mid-December, one in five state and federal prisoners in the United States had tested positive for the coronavirus -- a rate more than four times higher than the general population.

    This data, which is updated weekly, is an effort to track how those people have been affected and where the crisis has hit the hardest.

    Methodology and Caveats

    The data tracks the number of COVID-19 tests administered to people incarcerated in all state and federal prisons, as well as the staff in those facilities. It is collected on a weekly basis by Marshall Project and AP reporters who contact each prison agency directly and verify published figures with officials.

    Each week, the reporters ask every prison agency for the total number of coronavirus tests administered to its staff members and prisoners, the cumulative number who tested positive among staff and prisoners, and the numbers of deaths for each group.

    The time series data is aggregated to the system level; there is one record for each prison agency on each date of collection. Not all departments could provide data for the exact date requested, and the data indicates the date for the figures.

    To estimate the rate of infection among prisoners, we collected population data for each prison system before the pandemic, roughly in mid-March, in April, June, July, August, September and October. Beginning the week of July 28, we updated all prisoner population numbers, reflecting the number of incarcerated adults in state or federal prisons. Prior to that, population figures may have included additional populations, such as prisoners housed in other facilities, which were not captured in our COVID-19 data. In states with unified prison and jail systems, we include both detainees awaiting trial and sentenced prisoners.

    To estimate the rate of infection among prison employees, we collected staffing numbers for each system. Where current data was not publicly available, we acquired other numbers through our reporting, including calling agencies or from state budget documents. In six states, we were unable to find recent staffing figures: Alaska, Hawaii, Kentucky, Maryland, Montana, Utah.

    To calculate the cumulative COVID-19 impact on prisoner and prison worker populations, we aggregated prisoner and staff COVID case and death data up through Dec. 15. Because population snapshots do not account for movement in and out of prisons since March, and because many systems have significantly slowed the number of new people being sent to prison, it’s difficult to estimate the total number of people who have been held in a state system since March. To be conservative, we calculated our rates of infection using the largest prisoner population snapshots we had during this time period.
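    The conservative rate calculation described above can be sketched roughly as follows. This is only an illustration, not the newsrooms' actual code: the function name, argument names, and the sample figures in the usage note are hypothetical, and the sketch assumes population snapshots arrive as a simple list in which missing values are None.

```python
def rate_per_100k(cumulative_cases, population_snapshots):
    """Cumulative cases per 100,000 people.

    Uses the largest available population snapshot as the denominator,
    which yields the lowest (most conservative) rate.
    """
    # Drop snapshots that were never collected (assumed to be None).
    snapshots = [p for p in population_snapshots if p is not None]
    if not snapshots:
        return None
    denominator = max(snapshots)
    if denominator <= 0:
        return None
    # Multiply before dividing to limit floating-point rounding.
    return cumulative_cases * 100_000 / denominator
```

    For example, with made-up figures, rate_per_100k(500, [10000, 12500, None, 9000]) divides by 12,500, the largest snapshot, rather than by the most recent one.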

    As with all COVID-19 data, our understanding of the spread and impact of the virus is limited by the availability of testing. Epidemiology and public health experts say that aside from a few states that have recently begun aggressively testing in prisons, it is likely that there are more cases of COVID-19 circulating undetected in facilities. Sixteen prison systems, including the Federal Bureau of Prisons, would not release information about how many prisoners they are testing.

    Corrections departments in Indiana, Kansas, Montana, North Dakota and Wisconsin report coronavirus testing and case data for juvenile facilities; West Virginia reports figures for juvenile facilities and jails. For consistency of comparison with other state prison systems, we removed those facilities from our data that had been included prior to July 28. For these states we have also removed staff data. Similarly, Pennsylvania’s coronavirus data includes testing and cases for those who have been released on parole. We removed these tests and cases for prisoners from the data prior to July 28. The staff cases remain.

    About the Data

    There are four tables in this data:

    • covid_prison_cases.csv contains weekly time series data on tests, infections and deaths in prisons. The first dates in the table are on March 26. Any questions that a prison agency could not or would not answer are left blank.

    • prison_populations.csv contains snapshots of the population of people incarcerated in each of these prison systems for whom data on COVID testing and cases are available. This varies by state and may not always be the entire number of people incarcerated in each system. In some states, it may include other populations, such as those on parole or held in state-run jails. This data is primarily for use in calculating rates of testing and infection, and we would not recommend using these numbers to compare the change in how many people are being held in each prison system.

    • staff_populations.csv contains a one-time, recent snapshot of the headcount of workers for each prison agency, collected as close to April 15 as possible.

    • covid_prison_rates.csv contains the rates of cases and deaths for prisoners. There is one row for every state and federal prison system and an additional row with the National totals.

    Queries

    The Associated Press and The Marshall Project have created several queries to help you use this data:

    Get your state's prison COVID data: Provides each week's data from just your state and calculates a cases-per-100000-prisoners rate, a deaths-per-100000-prisoners rate, a cases-per-100000-workers rate and a deaths-per-100000-workers rate here

    Rank all systems' most recent data by cases per 100,000 prisoners here

    Find what percentage of your state's total cases and deaths -- as reported by Johns Hopkins University -- occurred within the prison system here

    Attribution

    In stories, attribute this data to: “According to an analysis of state prison cases by The Marshall Project, a nonprofit investigative newsroom dedicated to the U.S. criminal justice system, and The Associated Press.”

    Contributors

    Many reporters and editors at The Marshall Project and The Associated Press contributed to this data, including: Katie Park, Tom Meagher, Weihua Li, Gabe Isman, Cary Aspinwall, Keri Blakinger, Jake Bleiberg, Andrew R. Calderón, Maurice Chammah, Andrew DeMillo, Eli Hager, Jamiles Lartey, Claudia Lauer, Nicole Lewis, Humera Lodhi, Colleen Long, Joseph Neff, Michelle Pitcher, Alysia Santo, Beth Schwartzapfel, Damini Sharma, Colleen Slevin, Christie Thompson, Abbie VanSickle, Adria Watson, Andrew Welsh-Huggins.

    Questions

    If you have questions about the data, please email The Marshall Project at info+covidtracker@themarshallproject.org or file a Github issue.

    To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

  11. Human Resource Data Set (The Company)

    • kaggle.com
    Updated Jan 10, 2025
    Cite
    Koluit (2025). Human Resource Data Set (The Company) [Dataset]. https://www.kaggle.com/datasets/koluit/human-resource-data-set-the-company/versions/940
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 10, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Koluit
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    Similar to others who have created HR data sets, we felt that the lack of data out there for HR was limiting. It is very hard for someone to test new systems or learn People Analytics in the HR space. The only dataset most HR practitioners have is their real employee data, and there are a lot of reasons why you would not want to use that when experimenting. We hope that by providing this dataset with an ever-growing variation of data points, others can learn and grow their HR data analytics and systems knowledge.

    Some example test cases where someone might use this dataset:

    • HR Technology Testing and Mock-Ups: engagement survey tools, HCM tools, BI tools

    • Learning To Code For People Analytics: Python/R/SQL

    • HR Tech and People Analytics Educational Courses/Tools

    Content

    The core data CompanyData.txt has the basic demographic data about a worker. We treat this as the core data that you can join future data sets to.

    Please read the Readme.md for additional information about this along with the Changelog for additional updates as they are made.

    Acknowledgements

    Initial names, addresses, and ages were generated using FakenameGenerator.com. All additional details including Job, compensation, and additional data sets were created by the Koluit team using random generation in Excel.

    Inspiration

    Our hope is that this data is used in the HR or Research space to experiment and learn using HR data. Some examples of how we hope this data will be used are listed above.

    Contact Us

    Have any suggestions for additions to the data? See any issues with our data? Want to use it for your project? Please reach out to us! https://koluit.com/ ryan@koluit.com

  12. Amount of data created, consumed, and stored 2010-2023, with forecasts to...

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Explore at:
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 2024
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

    Storage capacity also growing

    Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.

  13. ‘COVID-19 case rate per 100,000 population and percent test positivity in...

    • analyst-2.ai
    Updated Feb 13, 2022
    + more versions
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘COVID-19 case rate per 100,000 population and percent test positivity in the last 14 days by town’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-covid-19-case-rate-per-100000-population-and-percent-test-positivity-in-the-last-14-days-by-town-d334/760f38b9/?iid=006-223&v=presentation
    Explore at:
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘COVID-19 case rate per 100,000 population and percent test positivity in the last 14 days by town’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/d5e87e00-5f12-4c5e-9fb7-9718e5dbef35 on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    This dataset includes a count and rate per 100,000 population for COVID-19 cases, a count of COVID-19 molecular diagnostic tests, and a percent positivity rate for tests among people living in community settings for the previous two-week period. Dates are based on date of specimen collection (cases and positivity).

    A person is considered a new case only upon their first COVID-19 testing result because a case is defined as an instance or bout of illness. If they are tested again subsequently and are still positive, it still counts toward the test positivity metric but they are not considered another case.

    Percent positivity is calculated as the number of positive tests among community residents conducted during the 14 days divided by the total number of positive and negative tests among community residents during the same period. If someone was tested more than once during that 14 day period, then those multiple test results (regardless of whether they were positive or negative) are included in the calculation.
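    As a rough sketch of the percent positivity calculation just described (positive tests divided by all positive and negative tests in the 14-day window, with repeat tests for the same person counted each time), the following Python function may help. The function and argument names are hypothetical and are not taken from the source dataset.

```python
def percent_positivity(positive_tests, negative_tests):
    """Percent of tests in the window that were positive.

    Both arguments are counts of tests, not of people, so a person
    tested several times contributes several results.
    """
    total_tests = positive_tests + negative_tests
    if total_tests == 0:
        # No tests in the window: positivity is undefined.
        return None
    return 100 * positive_tests / total_tests
```

    With made-up counts, percent_positivity(25, 475) works out to 5 percent, since 25 of 500 tests were positive.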

    These case and test counts do not include cases or tests among people residing in congregate settings, such as nursing homes, assisted living facilities, or correctional facilities.

    These data are updated weekly and reflect the previous two full Sunday-Saturday (MMWR) weeks (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf).

    DPH note about change from 7-day to 14-day metrics: Prior to 10/15/2020, these metrics were calculated using a 7-day average rather than a 14-day average. The 7-day metrics are no longer being updated as of 10/15/2020 but the archived dataset can be accessed here: https://data.ct.gov/Health-and-Human-Services/COVID-19-case-rate-per-100-000-population-and-perc/s22x-83rd

    As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.

    With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).

    Additional notes: As of 11/5/2020, CT DPH has added antigen testing for SARS-CoV-2 to reported test counts in this dataset. The tests included in this dataset include both molecular and antigen tests. Molecular tests reported include polymerase chain reaction (PCR) and nucleic acid amplification (NAAT) tests.

    The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used.

    Data suppression is applied when the rate is <5 cases per 100,000 or if there are <5 cases within the town. Information on why data suppression rules are applied can be found online here: https://www.cdc.gov/cancer/uscs/technical_notes/stat_methods/suppression.htm
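    The suppression rule stated above could be expressed as the small check below. It is a sketch only: the function name and the sample numbers are hypothetical, and it assumes the rate is computed from the town's case count and population in the usual per-100,000 way.

```python
def is_suppressed(case_count, population):
    """Return True when a town's figures would be suppressed.

    Suppression applies if there are fewer than 5 cases, or if the
    rate is below 5 cases per 100,000 population.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    rate_per_100k = case_count * 100_000 / population
    return case_count < 5 or rate_per_100k < 5
```

    For instance, 6 cases in a hypothetical population of 200,000 gives a rate of 3 per 100,000, so the row would be suppressed even though the count itself is above 5.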

    --- Original source retains full ownership of the source dataset ---

  14. Empathy dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 18, 2024
    Cite
    Mathematical Research Data Initiative (2024). Empathy dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7683906
    Explore at:
    Dataset updated
    Dec 18, 2024
    Dataset authored and provided by
    Mathematical Research Data Initiative
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The database for this study (Briganti et al. 2018; the same for the Braun study analysis) was composed of 1973 French-speaking students in several universities or schools for higher education in the following fields: engineering (31%), medicine (18%), nursing school (16%), economic sciences (15%), physiotherapy (4%), psychology (11%), law school (4%) and dietetics (1%). The subjects were 17 to 25 years old (M = 19.6 years, SD = 1.6 years); 57% were female and 43% were male. Even though the full dataset was composed of 1973 participants, only 1270 answered the full questionnaire: missing data are handled using pairwise complete observations in estimating a Gaussian Graphical Model, meaning that all available information from every subject is used.

    The feature set is composed of 28 items meant to assess the four following components: fantasy, perspective taking, empathic concern and personal distress. In the questionnaire, the items are mixed; reversed items (items 3, 4, 7, 12, 13, 14, 15, 18, 19) are present. Items are scored from 0 to 4, where “0” means “Doesn’t describe me very well” and “4” means “Describes me very well”; reverse-scoring is calculated afterwards. The questionnaires were anonymized. The reanalysis of the database in this retrospective study was approved by the ethical committee of the Erasmus Hospital.
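    Reverse-scoring on a 0 to 4 scale is conventionally done by subtracting the raw score from the scale maximum. The sketch below assumes that convention; the description does not spell out the formula, and the function name and the dictionary layout are hypothetical. The set of reversed items is copied from the list above.

```python
# Items listed as reversed in the dataset description.
REVERSED_ITEMS = {3, 4, 7, 12, 13, 14, 15, 18, 19}
MAX_SCORE = 4  # items are scored from 0 to 4


def reverse_score(responses):
    """Apply reverse-scoring to one respondent's answers.

    `responses` maps item number (1..28) to a raw score 0..4,
    or to None when the item was left unanswered.
    """
    scored = {}
    for item, raw in responses.items():
        if raw is None:
            # Keep missing answers missing.
            scored[item] = None
            continue
        if not 0 <= raw <= MAX_SCORE:
            raise ValueError(f"item {item}: score {raw} outside 0..{MAX_SCORE}")
        scored[item] = MAX_SCORE - raw if item in REVERSED_ITEMS else raw
    return scored
```

    Under this assumption a raw 0 on item 3 becomes 4, while a raw 3 on item 1 (not reversed) is left unchanged.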

    Size: A dataset of size 1973*28

    Number of features: 28

    Ground truth: No

    Type of Graph: Mixed graph

    The following gives the description of the variables:

    Feature FeatureLabel Domain Item meaning from Davis 1980

    001 1FS Green I daydream and fantasize, with some regularity, about things that might happen to me.

    002 2EC Purple I often have tender, concerned feelings for people less fortunate than me.

    003 3PT_R Yellow I sometimes find it difficult to see things from the “other guy’s” point of view.

    004 4EC_R Purple Sometimes I don’t feel very sorry for other people when they are having problems.

    005 5FS Green I really get involved with the feelings of the characters in a novel.

    006 6PD Red In emergency situations, I feel apprehensive and ill-at-ease.

    007 7FS_R Green I am usually objective when I watch a movie or play, and I don’t often get completely caught up in it. (Reversed)

    008 8PT Yellow I try to look at everybody’s side of a disagreement before I make a decision.

    009 9EC Purple When I see someone being taken advantage of, I feel kind of protective towards them.

    010 10PD Red I sometimes feel helpless when I am in the middle of a very emotional situation.

    011 11PT Yellow I sometimes try to understand my friends better by imagining how things look from their perspective.

    012 12FS_R Green Becoming extremely involved in a good book or movie is somewhat rare for me. (Reversed)

    013 13PD_R Red When I see someone get hurt, I tend to remain calm. (Reversed)

    014 14EC_R Purple Other people’s misfortunes do not usually disturb me a great deal. (Reversed)

    015 15PT_R Yellow If I’m sure I’m right about something, I don’t waste much time listening to other people’s arguments. (Reversed)

    016 16FS Green After seeing a play or movie, I have felt as though I were one of the characters.

    017 17PD Red Being in a tense emotional situation scares me.

    018 18EC_R Purple When I see someone being treated unfairly, I sometimes don’t feel very much pity for them. (Reversed)

    019 19PD_R Red I am usually pretty effective in dealing with emergencies. (Reversed)

    020 20FS Green I am often quite touched by things that I see happen.

    021 21PT Yellow I believe that there are two sides to every question and try to look at them both.

    022 22EC Purple I would describe myself as a pretty soft-hearted person.

    023 23FS Green When I watch a good movie, I can very easily put myself in the place of a leading character.

    024 24PD Red I tend to lose control during emergencies.

    025 25PT Yellow When I’m upset at someone, I usually try to “put myself in his shoes” for a while.

    026 26FS Green When I am reading an interesting story or novel, I imagine how I would feel if the events in the story were happening to me.

    027 27PD Red When I see someone who badly needs help in an emergency, I go to pieces.

    028 28PT Yellow Before criticizing somebody, I try to imagine how I would feel if I were in their place.

    More information about the dataset is contained in empathy_description.html file.

  15. FOI-01915 - Datasets - Open Data Portal

    • opendata.nhsbsa.net
    Updated Jun 5, 2024
    Cite
    (2024). FOI-01915 - Datasets - Open Data Portal [Dataset]. https://opendata.nhsbsa.net/dataset/foi-01915
    Explore at:
    Dataset updated
    Jun 5, 2024
    Description

    On 10 May you clarified:

    The dates I'm requesting are from 2010 to the present day as this was when this current government came into power

    Response

    I can confirm that the NHSBSA holds the information you have requested:

    • 1,081,286 cases have paid the penalty charge in full.

    • 219,940 cases have paid both the penalty charge and the surcharge in full.

    • No one has been taken to court.

    Please read the below notes to ensure correct understanding of the data:

    • We do not hold data for how many individual people have paid a fine. The data provided is based on the number of cases, rather than the number of individuals, where a fine has been paid.

    • We have included any cases that are classed as fully paid and have paid either the penalty charge or both the penalty charge and surcharge.

    • This data is correct as of 20th May 2024.

    • The Prescription Exemption Checking Service started in 2014. The data provided is therefore from 2014 to 20th May 2024.

    Publishing this response

    Please note that this information will be published on our Freedom of Information disclosure log at: https://opendata.nhsbsa.net/dataset/foi-01915

  16. Covid-19 Highest City Population Density

    • kaggle.com
    Updated Mar 25, 2020
    Cite
    lookfwd (2020). Covid-19 Highest City Population Density [Dataset]. https://www.kaggle.com/lookfwd/covid19highestcitypopulationdensity/tasks
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 25, 2020
    Dataset provided by
    Kaggle
    Authors
    lookfwd
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This is a dataset of the most highly populated city (if applicable) in a form easy to join with the COVID19 Global Forecasting (Week 1) dataset. You can see how to use it in this kernel

    Content

    There are four columns. The first two correspond to the columns from the original COVID19 Global Forecasting (Week 1) dataset. The other two is the highest population density, at city level, for the given country/state. Note that some countries are very small and in those cases the population density reflects the entire country. Since the original dataset has a few cruise ships as well, I've added them there.

    Acknowledgements

    Thanks a lot to Kaggle for this competition that gave me the opportunity to look closely at some data and understand this problem better.

    Inspiration

    Summary: I believe that the square root of the population density should relate to the logistic growth factor of the SIR model. I think the SEIR model isn't applicable due to any intervention being too late for a fast-spreading virus like this, especially in places with dense populations.

    After playing with the data provided in COVID19 Global Forecasting (Week 1) (and everything else online or media) a bit, one thing becomes clear. They have nothing to do with epidemiology. They reflect sociopolitical characteristics of a country/state and, more specifically, the reactivity and attitude towards testing.

    The testing method used (PCR tests) means that what we measure could potentially be a proxy for the number of people infected during the last 3 weeks, i.e the growth (with lag). It's not how many people have been infected and recovered. Antibody or serology tests would measure that, and by using them, we could go back to normality faster... but those will arrive too late. Way earlier, China will have experimentally shown that it's safe to go back to normal as soon as your number of newly infected per day is close to zero.


    My view, as a person living in NYC, about this virus, is that by the time governments react to media pressure, to lock down or even test, it's too late. In dense areas, everyone susceptible has already had ample opportunities to be infected. Especially for a virus with 5-14 days lag between infections and symptoms, a period during which hosts spread it all over on the subway, the conditions are hopeless. Active populations have already been exposed, mostly asymptomatic and recovered. Sensitive/older populations are more self-isolated/careful in affluent societies (maybe this isn't the case in North Italy). As the virus finishes exploring the active population, it starts penetrating the more isolated ones. At this point in time, the first fatalities happen. Then testing starts. Then the media and the lockdown. Lockdown seems overly effective because it coincides with the tail of the disease spread. It helps slow down the virus exploring the long tail of the sensitive population, and we should all contribute by doing it, but it doesn't cause the end of the disease. If it did, then as soon as people were back in the streets (see China), there would be repeated outbreaks.

    Smart politicians will test a lot because it will make their condition look worse. It helps them demand more resources. At the same time, they will have a low rate of fatalities due to large denominator. They can take credit for managing well a disproportionally major crisis - in contrast to people who didn't test.

    We were lucky this time. We, Westerners, have woken up to the potential of a pandemic. I'm sure we will give further resources for prevention. Additionally, we will be more open-minded, helping politicians to have more direct responses. We will also require them to be more responsible in their messages and reactions.

  17. student data analysis

    • kaggle.com
    Updated Nov 17, 2023
    Cite
    maira javeed (2023). student data analysis [Dataset]. https://www.kaggle.com/datasets/mairajaveed/student-data-analysis
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 17, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    maira javeed
    Description

    In this project, we aim to analyze and gain insights into the performance of students based on various factors that influence their academic achievements. We have collected data related to students' demographic information, family background, and their exam scores in different subjects.

    Key Objectives:

    1. Performance Evaluation: Evaluate and understand the academic performance of students by analyzing their scores in various subjects.

    2. Identifying Underlying Factors: Investigate factors that might contribute to variations in student performance, such as parental education, family size, and student attendance.

    3. Visualizing Insights: Create data visualizations to present the findings effectively and intuitively.

    Dataset Details:

    • The dataset used in this analysis contains information about students, including their age, gender, parental education, lunch type, and test scores in subjects like mathematics, reading, and writing.

    Analysis Highlights:

    • We will perform a comprehensive analysis of the dataset, including data cleaning, exploration, and visualization to gain insights into various aspects of student performance.

    • By employing statistical methods and machine learning techniques, we will determine the significant factors that affect student performance.

    Why This Matters:

    Understanding the factors that influence student performance is crucial for educators, policymakers, and parents. This analysis can help in making informed decisions to improve educational outcomes and provide support where it is most needed.

    Acknowledgments:

    We would like to express our gratitude to [mention any data sources or collaborators] for making this dataset available.

    Please Note:

    This project is meant for educational and analytical purposes. The dataset used is fictitious and does not represent any specific educational institution or individuals.

  18. NETFLIX Stock Data 2025

    • kaggle.com
    Updated Jun 13, 2025
    Cite
    Umer Haddii (2025). NETFLIX Stock Data 2025 [Dataset]. https://www.kaggle.com/datasets/umerhaddii/netflix-stock-data-2025
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 13, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Umer Haddii
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Netflix, Inc. is an American media company engaged in paid streaming and the production of films and series.

    Market cap

    Market capitalization of Netflix (NFLX)
    
    Market cap: $517.08 Billion USD
    
    

    As of June 2025 Netflix has a market cap of $517.08 Billion USD. This makes Netflix the world's 19th most valuable company by market cap according to our data. The market capitalization, commonly called market cap, is the total market value of a publicly traded company's outstanding shares and is commonly used to measure how much a company is worth.

    Revenue

    Revenue for Netflix (NFLX)
    
    Revenue in 2025: $40.17 Billion USD
    

    According to Netflix's latest financial reports, the company's current revenue (TTM) is $40.17 Billion USD. In 2024 the company made revenue of $39.00 Billion USD, an increase over its 2023 revenue of $33.72 Billion USD. Revenue is the total amount of income that a company generates by the sale of goods or services. Unlike earnings, no expenses are subtracted.

    Earnings

    Earnings for Netflix (NFLX)
    
    Earnings in 2025 (TTM): $11.31 Billion USD
    
    

    According to Netflix's latest financial reports, the company's current earnings (TTM) are $11.31 Billion USD. In 2024 the company had earnings of $10.70 Billion USD, an increase over its 2023 earnings of $7.02 Billion USD. The earnings displayed on this page are the company's pretax income.

    End of Day market cap according to different sources

    On Jun 12th, 2025 the market cap of Netflix was reported to be:

    $517.08 Billion USD by Yahoo Finance

    $517.08 Billion USD by CompaniesMarketCap

    $517.21 Billion USD by Nasdaq

    Content

    Geography: USA

    Time period: May 2002 - June 2025

    Unit of analysis: Netflix Stock Data 2025

    Variables

    Variable: Description
    • date: date
    • open: The price at market open.
    • high: The highest price for that day.
    • low: The lowest price for that day.
    • close: The price at market close, adjusted for splits.
    • adj_close: The closing price after adjustments for all applicable splits and dividend distributions. Data is adjusted using appropriate split and dividend multipliers, adhering to Center for Research in Security Prices (CRSP) standards.
    • volume: The number of shares traded on that day.
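
    As a rough illustration of how the variables above might be used, the sketch below parses rows with these column names and computes day-over-day returns from adj_close. The sample rows are made up for the example and are not real Netflix prices; the exact header spelling in the downloaded file, and the helper names load_rows and daily_returns, are assumptions.

```python
import csv
import io

# Made-up sample rows for illustration only; these are NOT real Netflix prices.
# The header names follow the variable list above; the real file may differ.
SAMPLE_CSV = """date,open,high,low,close,adj_close,volume
2025-01-02,100.0,110.0,95.0,105.0,105.0,1000
2025-01-03,105.0,108.0,101.0,102.0,102.0,1500
"""

NUMERIC_COLUMNS = ("open", "high", "low", "close", "adj_close", "volume")


def load_rows(text):
    """Parse CSV text into dicts, converting the numeric columns to float."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {"date": raw["date"]}
        for name in NUMERIC_COLUMNS:
            row[name] = float(raw[name])
        rows.append(row)
    return rows


def daily_returns(rows):
    """Simple day-over-day return from adj_close (one value per row after the first)."""
    return [
        (cur["adj_close"] - prev["adj_close"]) / prev["adj_close"]
        for prev, cur in zip(rows, rows[1:])
    ]


rows = load_rows(SAMPLE_CSV)
returns = daily_returns(rows)
```

    Using adj_close rather than close for returns is a common choice because it accounts for dividends as well as splits, but either column could be substituted depending on the analysis.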

    Acknowledgements

    This dataset belongs to me. I’m sharing it here for free. You may do with it as you wish.

  19. United States Age Group Population Dataset: A Complete Breakdown of United...

    • neilsberg.com
    csv, json
    Updated Jul 24, 2024
    + more versions
    Cite
    Neilsberg Research (2024). United States Age Group Population Dataset: A Complete Breakdown of United States Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/aabf26b9-4983-11ef-ae5d-3860777c1fe6/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Jul 24, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Variables measured
    Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we first analyzed and categorized the data for each of the age groups. We divided ages between 0 and 85 into buckets of roughly 5 years each. For ages over 85, we aggregated the data into a single group. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the United States population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total United States population. The dataset can be used to understand the population distribution of the United States by age. For example, using this dataset, we can identify the largest age group in the United States.

    Key observations

    The largest age group in the United States was 30 to 34 years, with a population of 22.71 million (6.86%), according to the ACS 2018-2022 5-Year Estimates. At the same time, the smallest age group in the United States was 80 to 84 years, with a population of 6.25 million (1.89%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Variables / Data Columns

    • Age Group: This column displays the age group in consideration
    • Population: The population for the specific age group in the United States is shown in this column.
    • % of Total Population: This column displays the population of each age group as a percentage of the total United States population. Please note that the percentages may not sum to exactly 100% due to rounding of values.
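
    The relationship between the Population and % of Total Population columns can be sketched as follows. This is a minimal illustration under assumptions: the counts are hypothetical placeholders rather than ACS figures, the helper name percent_of_total is invented for the example, and rounding to two decimals is assumed from the percentages quoted in this listing.

```python
def percent_of_total(populations):
    """Return each group's share of the total, as a percentage rounded to 2 decimals."""
    total = sum(populations.values())
    return {
        group: round(100 * count / total, 2)
        for group, count in populations.items()
    }


# Hypothetical counts chosen only to show the rounding effect; not ACS data.
sample = {"Under 5 years": 1, "5 to 9 years": 1, "10 to 14 years": 1}

shares = percent_of_total(sample)

# Each share rounds to 33.33, so the rounded shares add up to 99.99, not 100.
rounded_sum = sum(shares.values())
```

    This is why a check such as "percentages sum to 100" should allow a small tolerance when validating the file.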

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for United States Population by Age. You can refer to the same here

  20. Spotify Million Playlist: Recsys Challenge 2018 Dataset

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Apr 9, 2022
    Cite
    AIcrowd (2022). Spotify Million Playlist: Recsys Challenge 2018 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6425592
    Explore at:
    Dataset updated
    Apr 9, 2022
    Dataset authored and provided by
    AIcrowd
    Description

    Spotify Million Playlist Dataset Challenge

    Summary

    The Spotify Million Playlist Dataset Challenge consists of a dataset and evaluation to enable research in music recommendations. It is a continuation of the RecSys Challenge 2018, which ran from January to July 2018. The dataset contains 1,000,000 playlists, including playlist titles and track titles, created by users on the Spotify platform between January 2010 and October 2017. The evaluation task is automatic playlist continuation: given a seed playlist title and/or initial set of tracks in a playlist, to predict the subsequent tracks in that playlist. This is an open-ended challenge intended to encourage research in music recommendations, and no prizes will be awarded (other than bragging rights).

    Background

    Playlists like Today’s Top Hits and RapCaviar have millions of loyal followers, while Discover Weekly and Daily Mix are just a couple of our personalized playlists made especially to match your unique musical tastes.

    Our users love playlists too. In fact, the Digital Music Alliance, in their 2018 Annual Music Report, state that 54% of consumers say that playlists are replacing albums in their listening habits.

    But our users don’t love just listening to playlists, they also love creating them. To date, over 4 billion playlists have been created and shared by Spotify users. People create playlists for all sorts of reasons: some playlists group together music categorically (e.g., by genre, artist, year, or city), by mood, theme, or occasion (e.g., romantic, sad, holiday), or for a particular purpose (e.g., focus, workout). Some playlists are even made to land a dream job, or to send a message to someone special.

    The other thing we love here at Spotify is playlist research. By learning from the playlists that people create, we can learn all sorts of things about the deep relationship between people and music. Why do certain songs go together? What is the difference between “Beach Vibes” and “Forest Vibes”? And what words do people use to describe which playlists?

    By learning more about the nature of playlists, we may also be able to suggest other tracks that a listener would enjoy in the context of a given playlist. This can make playlist creation easier, and ultimately help people find more of the music they love.

    Dataset

    To enable this type of research at scale, in 2018 we sponsored the RecSys Challenge 2018, which introduced the Million Playlist Dataset (MPD) to the research community. Sampled from the over 4 billion public playlists on Spotify, this dataset of 1 million playlists consists of over 2 million unique tracks by nearly 300,000 artists, and represents the largest public dataset of music playlists in the world. The dataset includes public playlists created by US Spotify users between January 2010 and November 2017. The challenge ran from January to July 2018, and received 1,467 submissions from 410 teams. A summary of the challenge and the top scoring submissions was published in the ACM Transactions on Intelligent Systems and Technology.

    In September 2020, we re-released the dataset as an open-ended challenge on AIcrowd.com. The dataset can now be downloaded by registered participants from the Resources page.

    Each playlist in the MPD contains a playlist title, the track list (including track IDs and metadata), and other metadata fields (last edit time, number of playlist edits, and more). All data is anonymized to protect user privacy. Playlists are sampled with some randomization, are manually filtered for playlist quality and to remove offensive content, and have some dithering and fictitious tracks added to them. As such, the dataset is not representative of the true distribution of playlists on the Spotify platform, and must not be interpreted as such in any research or analysis performed on the dataset.

    Dataset Contains

    1000 examples of each scenario:

    • Title only (no tracks)
    • Title and first track
    • Title and first 5 tracks
    • First 5 tracks only
    • Title and first 10 tracks
    • First 10 tracks only
    • Title and first 25 tracks
    • Title and 25 random tracks
    • Title and first 100 tracks
    • Title and 100 random tracks
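
    To make the scenarios above concrete, the sketch below shows one way a seed for playlist continuation could be cut from a full playlist. It is an illustration only, not the challenge's own tooling: the function name make_seed, its parameters, and the plain list-of-strings track representation are all assumptions.

```python
import random


def make_seed(title, tracks, n_seed, keep_title=True, pick_random=False, rng=None):
    """Split a playlist into visible seed tracks and held-out tracks to predict.

    n_seed must not exceed len(tracks) when pick_random is True.
    """
    if pick_random:
        # Random scenario: reveal n_seed tracks drawn from anywhere in the playlist.
        rng = rng or random.Random(0)
        seed_idx = set(rng.sample(range(len(tracks)), n_seed))
    else:
        # First-N scenario: reveal the opening n_seed tracks.
        seed_idx = set(range(n_seed))
    seed = [t for i, t in enumerate(tracks) if i in seed_idx]
    held_out = [t for i, t in enumerate(tracks) if i not in seed_idx]
    return {
        "title": title if keep_title else None,
        "seed_tracks": seed,
        "held_out": held_out,
    }


# Hypothetical 30-track playlist with placeholder track identifiers.
tracks = [f"track_{i}" for i in range(30)]

title_only = make_seed("road trip", tracks, 0)                      # title, no tracks
first5 = make_seed("road trip", tracks, 5, keep_title=False)        # first 5 tracks only
random25 = make_seed("road trip", tracks, 25, pick_random=True)     # title and 25 random tracks
```

    Keeping the original track order in both lists means the held-out tracks can be scored in playlist order if an order-sensitive metric is used.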

    Download Link

    Full Details: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge
    Download Link: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge/dataset_files
