98 datasets found
  1. Global Country Information Dataset 2023

    • kaggle.com
    zip
    Updated Jul 8, 2023
    + more versions
    Cite
    Nidula Elgiriyewithana ⚡ (2023). Global Country Information Dataset 2023 [Dataset]. https://www.kaggle.com/datasets/nelgiriyewithana/countries-of-the-world-2023
    Explore at:
    Available download formats: zip (24063 bytes)
    Dataset updated
    Jul 8, 2023
    Authors
    Nidula Elgiriyewithana ⚡
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.

    Key Features

    • Country: Name of the country.
    • Density (P/Km2): Population density measured in persons per square kilometer.
    • Abbreviation: Abbreviation or code representing the country.
    • Agricultural Land (%): Percentage of land area used for agricultural purposes.
    • Land Area (Km2): Total land area of the country in square kilometers.
    • Armed Forces Size: Size of the armed forces in the country.
    • Birth Rate: Number of births per 1,000 population per year.
    • Calling Code: International calling code for the country.
    • Capital/Major City: Name of the capital or major city.
    • CO2 Emissions: Carbon dioxide emissions in tons.
    • CPI: Consumer Price Index, a measure of inflation and purchasing power.
    • CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.
    • Currency_Code: Currency code used in the country.
    • Fertility Rate: Average number of children born to a woman during her lifetime.
    • Forested Area (%): Percentage of land area covered by forests.
    • Gasoline_Price: Price of gasoline per liter in local currency.
    • GDP: Gross Domestic Product, the total value of goods and services produced in the country.
    • Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.
    • Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.
    • Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.
    • Largest City: Name of the country's largest city.
    • Life Expectancy: Average number of years a newborn is expected to live.
    • Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.
    • Minimum Wage: Minimum wage level in local currency.
    • Official Language: Official language(s) spoken in the country.
    • Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.
    • Physicians per Thousand: Number of physicians per thousand people.
    • Population: Total population of the country.
    • Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.
    • Tax Revenue (%): Tax revenue as a percentage of GDP.
    • Total Tax Rate: Overall tax burden as a percentage of commercial profits.
    • Unemployment Rate: Percentage of the labor force that is unemployed.
    • Urban Population: Percentage of the population living in urban areas.
    • Latitude: Latitude coordinate of the country's location.
    • Longitude: Longitude coordinate of the country's location.

    Potential Use Cases

    • Analyze population density and land area to study spatial distribution patterns.
    • Investigate the relationship between agricultural land and food security.
    • Examine carbon dioxide emissions and their impact on climate change.
    • Explore correlations between economic indicators such as GDP and various socio-economic factors.
    • Investigate educational enrollment rates and their implications for human capital development.
    • Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.
    • Study labor market dynamics through indicators such as labor force participation and unemployment rates.
    • Investigate the role of taxation and its impact on economic development.
    • Explore urbanization trends and their social and environmental consequences.
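
    As a small illustration of the first use case, the sketch below recomputes population density from the Population and Land Area (Km2) columns and compares it with the reported Density (P/Km2) value. The sample rows are made-up placeholders, not values from the dataset, and the assumption that numbers arrive as comma-formatted strings is ours; the parsing may need adjusting for the actual file.

```python
import csv
import io

# Hypothetical sample: made-up rows that only mimic the column names
# listed under Key Features. Real values and formatting may differ.
SAMPLE_CSV = """Country,Density (P/Km2),Land Area (Km2),Population
Exampleland,50,"10,000","500,000"
Samplestan,120,"2,500","300,000"
"""


def parse_number(text):
    """Convert a string such as '10,000' to a float (assumed formatting)."""
    return float(text.replace(",", "").strip())


def recomputed_density(row):
    """Population divided by land area, in persons per square kilometer."""
    return parse_number(row["Population"]) / parse_number(row["Land Area (Km2)"])


def density_table(csv_text):
    """Return (country, reported density, recomputed density) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["Country"], parse_number(row["Density (P/Km2)"]), recomputed_density(row))
        for row in reader
    ]


if __name__ == "__main__":
    for country, reported, recomputed in density_table(SAMPLE_CSV):
        print(f"{country}: reported={reported:.1f}, recomputed={recomputed:.1f}")
```

    A large gap between the reported and recomputed figures would suggest that the two columns come from different years or sources, which is plausible for a dataset compiled from multiple sources.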

    Data Source: This dataset was compiled from multiple data sources

    If this was helpful, a vote is appreciated ❤️ Thank you 🙂

  2. South Range, MI Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Cite
    Neilsberg Research (2025). South Range, MI Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b254570a-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Range, Michigan
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of South Range by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of South Range across both sexes and to determine which sex constitutes the majority.

    Key observations

    Males form a slight majority, accounting for 52.64% of the total population. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender:

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: This column shows the population of each gender in South Range.
    • % of Total Population: This column displays the percentage distribution of each gender as a proportion of South Range total population. Please note that the sum of all percentages may not equal one due to rounding of values.
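
    The % of Total Population column can be reproduced from the Population column. The sketch below is a minimal illustration with made-up counts (not the South Range figures); the function name percent_shares is hypothetical.

```python
def percent_shares(counts, digits=2):
    """Return each group's share of the total as a rounded percentage."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total population must be positive")
    return {group: round(100 * n / total, digits) for group, n in counts.items()}


if __name__ == "__main__":
    # Made-up counts for illustration only.
    shares = percent_shares({"Male": 3, "Female": 1})
    print(shares)
```

    Because each share is rounded independently, the rounded values are not guaranteed to add up exactly, which is the caveat noted in the column description.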

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for South Range Population by Race & Ethnicity.

  3. Football Manager 2023: 90k+ Player Stats

    • kaggle.com
    zip
    Updated Oct 1, 2025
    Cite
    Siddhraj Thakor (2025). Football Manager 2023: 90k+ Player Stats [Dataset]. https://www.kaggle.com/datasets/siddhrajthakor/football-manager-2023-dataset
    Explore at:
    Available download formats: zip (9373378 bytes)
    Dataset updated
    Oct 1, 2025
    Authors
    Siddhraj Thakor
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Football Manager Players Dataset

    Overview

    Dive into the ultimate treasure trove for football enthusiasts, data analysts, and gaming aficionados! The Football Manager Players Dataset is a comprehensive collection of player data extracted from a popular football management simulation game, offering an unparalleled look into the virtual world of football talent. This dataset includes detailed attributes for thousands of players across multiple leagues worldwide, making it a goldmine for analyzing player profiles, scouting virtual stars, and building predictive models for football strategies.

    Whether you're a data scientist exploring sports analytics, a football fan curious about your favorite virtual players, or a game developer seeking inspiration, this dataset is your ticket to unlocking endless possibilities!

    Dataset Description

    This dataset is a meticulously curated compilation of player statistics from five CSV files, merged into a single, unified dataset (merged_players.csv). It captures a diverse range of attributes for players from various clubs, nations, and leagues, including top-tier competitions like the English Premier Division, Argentina's Premier Division, and lower divisions across the globe.

    Key Features

    • Rich Player Attributes: Over 70 columns covering essential metrics such as:
      • Basic Info: UID, Name, Date of Birth (DOB), Nationality, Height, Weight, Age
      • Club & Position: Club, Position (e.g., AM, DM, GK), Based (league/division)
      • Performance Stats: Caps, Appearances (AT Apps), Goals (AT Gls), League Appearances, League Goals
      • Technical Skills: Acceleration, Passing, Dribbling, Finishing, Tackling, and more
      • Mental Attributes: Work Rate, Vision, Leadership, Determination
      • Physical Attributes: Pace, Strength, Stamina, Agility
      • Market Value: Transfer Value (e.g., $0 to millions)
      • Miscellaneous: Preferred Foot, Media Handling, Injury Proneness
    • Global Coverage: Players from diverse regions, including Europe (England, Spain, Italy), South America (Argentina, Brazil), Asia (South Korea, China), Africa (Ivory Coast, Burkina Faso), and North America (USA, Mexico).
    • Varied Player Types: From young prospects (15–18 years old) to veteran stars (up to 45 years old), including amateurs, youth players, and professionals.
    • Realistic Insights: Includes attributes like Media Description (e.g., "Young winger," "Veteran striker") and injury status, mirroring real-world football dynamics.

    Dataset Size

    • Rows: Thousands of player records (exact count depends on deduplication).
    • Columns: 70+ attributes per player.
    • File: merged_players.csv (UTF-8 encoded for compatibility with special characters).

    Potential Use Cases

    • Sports Analytics:
      • Analyze player attributes to identify key traits for success by position (e.g., what makes a top goalkeeper?).
      • Predict transfer values based on skills, age, and performance stats.
      • Cluster players by playing style or potential using machine learning.
    • Scouting & Strategy:
      • Build a dream team by filtering players based on specific attributes (e.g., high Pace and Dribbling for wingers).
      • Compare young talents vs. experienced veterans for team-building strategies.
    • Gaming & Modding:
      • Create custom Football Manager databases or mods.
      • Analyze game balance by studying attribute distributions.
    • Visualization:
      • Develop interactive dashboards to explore player stats by league, nationality, or position.
      • Map player origins to visualize global football talent distribution.
    • Education & Research:
      • Use as a teaching tool for data science, exploring data cleaning, merging, and analysis.
      • Study correlations between mental/physical attributes and in-game performance.

    Why This Dataset Stands Out

    • Comprehensive: Covers every aspect of a player's profile, from technical skills to personality traits.
    • Diverse: Includes players from top-tier to lower divisions, offering a broad spectrum of talent.
    • Engaging: Perfect for football fans and data enthusiasts alike, blending gaming with real-world analytics.
    • Ready-to-Use: Merged and cleaned for immediate analysis, with consistent column structure across all records.

    Getting Started

    1. Download: Grab merged_players.csv and load it into your favorite tool (Python/pandas, R, Excel, etc.).
    2. Explore: Check out columns like Transfer Value, Position, and Media Description to start your analysis.
    3. Analyze: Use Python (e.g., pandas, scikit-learn) or visualization tools (e.g., Tableau, Power BI) to uncover insights.
    4. Share: Build models, visualizations, or scouting reports and share your findings with the Kaggle community!
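
    A minimal sketch of steps 1 to 3 using only the Python standard library. The header names (Name, Position, Age, Pace, Dribbling) are assumed to match the attribute names listed under Key Features; the real merged_players.csv may spell them differently. The sample rows are invented placeholders, and the thresholds are arbitrary.

```python
import csv
import io

# Invented placeholder rows; header names are an assumption based on the
# attribute list in the description, not verified against the real file.
SAMPLE_CSV = """Name,Position,Age,Pace,Dribbling
Player A,AM,17,16,15
Player B,GK,31,9,6
Player C,AM,24,14,17
"""


def load_players(handle):
    """Read player rows from an open CSV file or file-like object."""
    return list(csv.DictReader(handle))


def fast_dribblers(players, min_pace=14, min_dribbling=14):
    """Keep players whose Pace and Dribbling both reach the thresholds."""
    return [
        p for p in players
        if int(p["Pace"]) >= min_pace and int(p["Dribbling"]) >= min_dribbling
    ]


if __name__ == "__main__":
    # For the real file, one would open it instead, for example:
    # with open("merged_players.csv", encoding="utf-8", newline="") as f:
    #     players = load_players(f)
    players = load_players(io.StringIO(SAMPLE_CSV))
    for p in fast_dribblers(players):
        print(p["Name"], p["Position"], p["Age"])
```

    The same filter works unchanged on the downloaded file as long as the assumed column names exist; with 70+ columns, a library such as pandas is likely more convenient for anything beyond simple filtering.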

    Example Questions to Explore

    • Which young players (<18 years) have the highest poten...
  4. Simulation Data Set

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects against identification of the spatial locations used in the analysis.

    This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

    It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”.

    Metadata (including data dictionary)

    • y: Vector of binary responses (1: adverse outcome, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    Code

    Abstract

    We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

    Description

    • “CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
    • “Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

    Optional Information (complete as necessary)

    Required R packages:

    • For running “CWVS_LMC.txt”:
      • msm: Sampling from the truncated normal distribution
      • mnormt: Sampling from the multivariate normal distribution
      • BayesLogit: Sampling from the Polya-Gamma distribution
    • For running “Results_Summary.txt”:
      • plotrix: Plotting the posterior means and credible intervals

    Instructions for Use

    Reproducibility (Mandatory)

    What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.

    How to use the information:

    • Load the “Simulated_Dataset.RData” workspace
    • Run the code contained in “CWVS_LMC.txt”
    • Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”.

    Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:

    Data

    The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

    Availability

    Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement.

    Description

    Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects against identification of the spatial locations used in the analysis.

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
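
    The weekly standardization described in this entry (subtract the week's median, divide by the week's IQR) can be sketched as follows. This is an illustrative Python version, not the authors' R code; the function names are hypothetical, the input values are made up, and the quartile convention (the "inclusive" method of statistics.quantiles) is an assumption, since the description does not say which one was used.

```python
import statistics


def iqr(values):
    """Interquartile range using statistics.quantiles (inclusive method)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def standardize_by_week(exposures):
    """Standardize each column (week) of a list-of-rows exposure matrix.

    Each value becomes (value - weekly median) / weekly IQR.
    """
    n_weeks = len(exposures[0])
    columns = [[row[j] for row in exposures] for j in range(n_weeks)]
    centers = [statistics.median(col) for col in columns]
    spreads = [iqr(col) for col in columns]
    if any(spread == 0 for spread in spreads):
        raise ValueError("a week with zero IQR cannot be standardized")
    return [
        [(row[j] - centers[j]) / spreads[j] for j in range(n_weeks)]
        for row in exposures
    ]


if __name__ == "__main__":
    # Made-up exposures: 5 individuals (rows) by 2 weeks (columns).
    raw = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
    for row in standardize_by_week(raw):
        print(row)
```

    After this transformation the two made-up weeks become identical even though their raw scales differ by a factor of ten, which illustrates why withholding the medians and IQRs makes it harder to recover the original exposure levels.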

  5. NBA All time Stats (1947-Present)

    • kaggle.com
    zip
    Updated Sep 22, 2024
    Cite
    gonzalo-gigena (2024). NBA All time Stats (1947-Present) [Dataset]. https://www.kaggle.com/datasets/gonzalogigena/nba-all-time-stats
    Explore at:
    Available download formats: zip (162077732 bytes)
    Dataset updated
    Sep 22, 2024
    Authors
    gonzalo-gigena
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset provides a comprehensive collection of NBA (National Basketball Association) all-time statistics sourced from Basketball Reference, a leading platform for basketball statistics and analysis. The dataset covers a wide range of categories including player statistics, team performance metrics, and historical records spanning several decades of NBA history.

    Content

    • NBA, ABA and BAA
    • Results of every match
    • Basic and advanced box scores of 74,000+ games
    • 30+ teams
    • All time player Stats
    • Rookies by year

    You can find all the available datasets in the GitHub repository.

    Take a look at their glossary for detailed column descriptions.

    The dataset is updated daily to incorporate the latest player and team statistics, ensuring that users have access to the most recent data for their analyses and research.

    In addition to the existing dataset covering player and team statistics, I’m actively working on expanding the dataset to include play-by-play data and shot charts.

  6. OECD Revenue Statistics

    • kaggle.com
    Updated Feb 13, 2024
    Cite
    willian oliveira gibin (2024). OECD Revenue Statistics [Dataset]. http://doi.org/10.34740/kaggle/dsv/7620457
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 13, 2024
    Dataset provided by
    Kaggle
    Authors
    willian oliveira gibin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    OECD Revenue Statistics: Comparative Tables

    Introduction

    The OECD Revenue Statistics database provides detailed and internationally comparable data on the taxes and social contributions paid by businesses and individuals in OECD countries. The data is collected annually from national governments and covers a wide range of taxes, including personal income tax, corporate income tax, social security contributions, and value-added tax.

    Data

    The database is divided into two main parts:

    Part 1: Revenue by Level of Government

    This part of the database provides data on the total revenue collected by each level of government (central, state, and local) in each OECD country. The data is broken down by type of tax and by source of revenue (e.g., taxes on income, profits, and capital gains; taxes on goods and services; social security contributions).

    Part 2: Revenue by Tax Type

    This part of the database provides data on the revenue collected from each type of tax in each OECD country. The data is broken down by level of government and by source of revenue.

    Uses

    The OECD Revenue Statistics database can be used for a variety of purposes, including:

    Cross-country comparisons of tax levels and structures

    The database can be used to compare the tax levels and structures of different OECD countries. This information can be used by policymakers to assess the effectiveness of their tax systems and to identify potential areas for reform.

    Analysis of the impact of tax policies

    The database can be used to analyze the impact of tax policies on economic growth, income distribution, and other outcomes. This information can be used by policymakers to design tax policies that are more effective and efficient.

    Research on tax policy

    The database can be used by researchers to study the effects of tax policy on a variety of economic outcomes. This research can help to inform the design of tax policy and to improve our understanding of the economic effects of taxation.

    Conclusion

    The OECD Revenue Statistics database is a valuable resource for policymakers, researchers, and anyone interested in the taxation of businesses and individuals in OECD countries. The database provides detailed and internationally comparable data on a wide range of taxes, making it an essential tool for understanding the tax systems of OECD countries.

    Data Access

    The OECD Revenue Statistics database is available online to subscribers. Subscribers can access the data through the OECD's website.

  7. Meta data and supporting documentation

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Meta data and supporting documentation [Dataset]. https://catalog.data.gov/dataset/meta-data-and-supporting-documentation
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    We include a description of the data sets in the meta-data as well as sample code and results from a simulated data set.

    This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

    It can be accessed through the following means: The R code is available online here: https://github.com/warrenjl/SpGPCW.

    Format:

    Abstract

    The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

    Availability

    Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    Description

    Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects against identification of the spatial locations used in the analysis.

    File format: R workspace file.

    Metadata (including data dictionary)

    • y: Vector of binary responses (1: preterm birth, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate).

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
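
    The data dictionary in this entry implies some shape relationships between the objects. The sketch below mirrors them in plain Python lists to show how they fit together; it is not part of the authors' R code, the function name check_dimensions is hypothetical, and the expectation that alpha_true has one entry per exposure period (length m) is an assumption that the description does not state explicitly.

```python
def check_dimensions(y, x, z, n, m, p, alpha_true):
    """Return a list of problems; an empty list means everything is consistent."""
    problems = []
    if len(y) != n:
        problems.append(f"y has {len(y)} entries, expected n={n}")
    if any(value not in (0, 1) for value in y):
        problems.append("y must contain only 0 (control) or 1 (preterm birth)")
    if len(x) != n or any(len(row) != p for row in x):
        problems.append(f"x must have n={n} rows and p={p} columns")
    if len(z) != n or any(len(row) != m for row in z):
        problems.append(f"z must have n={n} rows and m={m} columns")
    # Assumption: alpha_true holds one value per exposure period.
    if len(alpha_true) != m:
        problems.append(f"alpha_true has {len(alpha_true)} entries, expected m={m}")
    return problems


if __name__ == "__main__":
    # Tiny made-up example: 2 individuals, 3 exposure weeks, 1 covariate.
    issues = check_dimensions(
        y=[1, 0],
        x=[[1.0], [1.0]],
        z=[[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]],
        n=2, m=3, p=1,
        alpha_true=[0.0, 0.5, 0.0],
    )
    print(issues or "dimensions are consistent")
```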

  8. World Development Indicators

    • kaggle.com
    zip
    Updated Sep 17, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Data Regina (2025). World Development Indicators [Dataset]. https://www.kaggle.com/datasets/dataregina/world-development-indicators
    Explore at:
    Available download formats: zip (77227006 bytes)
    Dataset updated
    Sep 17, 2025
    Authors
    Data Regina
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The World Development Indicators (WDI) database, published by the World Bank, is a comprehensive collection of global development data, providing key economic, social, and environmental statistics. It includes almost 1,500 indicators covering more than 200 countries and territories, with data spanning several decades.

    WDI serves as a vital resource for policymakers, researchers, businesses, and analysts seeking to understand global trends and make data-driven decisions. The database covers a wide range of topics, including economic growth, education, health, poverty, trade, energy, infrastructure, governance, and environmental sustainability.

    The indicators are sourced from reputable national and international agencies, ensuring high-quality, consistent, and comparable data. Users can access the database through interactive online tools, API services, and downloadable datasets, facilitating detailed analysis and visualization.

    WDI is also used for tracking progress on the Sustainable Development Goals (SDGs) and other global development initiatives. By providing accessible and reliable statistics, it helps to inform policy discussions and strategies globally.

    Whether for academic research, policy planning, or economic analysis, the World Development Indicators database is an essential tool for understanding and addressing global development challenges.

  9. ERA5 post-processed daily statistics on single levels from 1940 to present

    • cds.climate.copernicus.eu
    grib
    Updated Dec 3, 2025
    + more versions
    Cite
    ECMWF (2025). ERA5 post-processed daily statistics on single levels from 1940 to present [Dataset]. http://doi.org/10.24381/cds.4991cf48
    Explore at:
    Available download formats: grib
    Dataset updated
    Dec 3, 2025
    Dataset provided by
    European Centre for Medium-Range Weather Forecasts (http://ecmwf.int/)
    Authors
    ECMWF
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. This catalogue entry provides post-processed ERA5 hourly single-level data aggregated to daily time steps. In addition to the data selection options found on the hourly page, the following options can be selected for the daily statistic calculation:

    • The daily aggregation statistic (daily mean, daily max, daily min, daily sum*)
    • The sub-daily frequency sampling of the original data (1 hour, 3 hours, 6 hours)
    • The option to shift to any local time zone in UTC (no shift means the statistic is computed from UTC+00:00)

    *The daily sum is only available for the accumulated variables (see ERA5 documentation for more details). Users should be aware that the daily aggregation is calculated during the retrieval process and is not part of a permanently archived dataset. For more details on how the daily statistics are calculated, including demonstrative code, please see the documentation. For more details on the hourly data used to calculate the daily statistics, please refer to the ERA5 hourly single-level data catalogue entry and the documentation found therein.
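To make the three options concrete (aggregation statistic, sub-daily sampling, time zone shift), here is a minimal Python sketch of a daily aggregation over hourly values. It is not the CDS implementation. The function `daily_statistic` and its parameters are hypothetical, timestamps are assumed to be naive datetimes in UTC, and applying the sub-daily sampling on UTC hours before the shift is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_statistic(hourly, stat="mean", utc_offset_hours=0, step_hours=1):
    """Aggregate (utc_datetime, value) pairs into one value per local day.

    stat: one of "mean", "max", "min", "sum".
    utc_offset_hours: shift applied before grouping by calendar day.
    step_hours: keep only samples whose UTC hour is a multiple of this.
    """
    reducers = {
        "mean": lambda values: sum(values) / len(values),
        "max": max,
        "min": min,
        "sum": sum,
    }
    if stat not in reducers:
        raise ValueError(f"unknown statistic: {stat}")
    shift = timedelta(hours=utc_offset_hours)
    groups = defaultdict(list)
    for timestamp, value in hourly:
        if timestamp.hour % step_hours != 0:
            continue  # sub-daily sampling (assumed to use UTC hours)
        groups[(timestamp + shift).date()].append(value)
    return {day: reducers[stat](values) for day, values in sorted(groups.items())}


if __name__ == "__main__":
    start = datetime(2020, 1, 1)
    series = [(start + timedelta(hours=h), float(h)) for h in range(48)]
    print(daily_statistic(series, stat="mean"))
    print(daily_statistic(series, stat="max", utc_offset_hours=1))
```

Note that a non-zero shift leaves the first and last local days with fewer than 24 samples unless the request is padded with extra hours at each end.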

  10. Weekly Pennsylvania COVID-19 Vaccinations Stats Archive

    • catalog.data.gov
    • data.wprdc.org
    • +1more
    Updated May 14, 2023
    + more versions
    Cite
    Western Pennsylvania Regional Data Center (2023). Weekly Pennsylvania COVID-19 Vaccinations Stats Archive [Dataset]. https://catalog.data.gov/dataset/weekly-pennsylvania-covid-19-vaccinations-stats-archive
    Explore at:
    Dataset updated
    May 14, 2023
    Dataset provided by
    Western Pennsylvania Regional Data Center
    Area covered
    Pennsylvania
    Description

    Weekly archive of some State of Pennsylvania datasets found in this list: https://data.pa.gov/browse?q=vaccinations For most of these datasets, the "date_saved" field is the date that the WPRDC pulled the data from the state data portal, and the archive combines all the saved records into one table. The exception is the "COVID-19 Vaccinations by Day by County of Residence Current Health (archive)", which is already published by the state as an entire history. The "date_updated" field is based on the "updatedAt" field of the corresponding data.pa.gov dataset. Changes to this field have turned out not to be a good indicator of whether records have been updated, which is why we are archiving this data on a weekly basis without regard to the "updatedAt" value. The "date_saved" field is the one you should sort on to see the variation in vaccinations over time. Most of the source tables have gone through schema changes or expansions. In some cases, we've kept the old archives under a separate resource with something like "[Orphaned Schema]" added to the resource name. In other cases, we've adjusted our schema to accommodate new column names, but there will be a date range during which the new columns have null values because we did not start pulling them until we became aware of them.

  11. Subjective wellbeing, 'Worthwhile', percentage of responses in range 0-6

    • data.europa.eu
    • ckan.publishing.service.gov.uk
    • +2more
    html, sparql
    Updated Oct 11, 2021
    + more versions
    Cite
    Ministry of Housing, Communities and Local Government (2021). Subjective wellbeing, 'Worthwhile', percentage of responses in range 0-6 [Dataset]. https://data.europa.eu/data/datasets/subjective-wellbeing-worthwhile-percentage-of-responses-in-range-0-6
    Explore at:
    Available download formats: html, sparql
    Dataset updated
    Oct 11, 2021
    Dataset authored and provided by
    Ministry of Housing, Communities and Local Government
    License

    http://reference.data.gov.uk/id/open-government-licence

    Description

    Percentage of responses in range 0-6 out of 10 (corresponding to 'low wellbeing') for 'Worthwhile' in the First ONS Annual Experimental Subjective Wellbeing survey.

    The Office for National Statistics has included the four subjective well-being questions below on the Annual Population Survey (APS), the largest of their household surveys.

    • Overall, how satisfied are you with your life nowadays?
    • Overall, to what extent do you feel the things you do in your life are worthwhile?
    • Overall, how happy did you feel yesterday?
    • Overall, how anxious did you feel yesterday?

    This dataset presents results from the second of these questions, "Overall, to what extent do you feel the things you do in your life are worthwhile?" Respondents answer these questions on an 11 point scale from 0 to 10 where 0 is ‘not at all’ and 10 is ‘completely’. The well-being questions were asked of adults aged 16 and older.

    Well-being estimates for each unitary authority or county are derived using data from those respondents who live in that place. Responses are weighted to the estimated population of adults (aged 16 and older) as at end of September 2011.

    The data cabinet also makes available the proportion of people in each county and unitary authority that answer with ‘low wellbeing’ values. For the ‘worthwhile’ question answers in the range 0-6 are taken to be low wellbeing.

    This dataset contains the percentage of responses in the range 0-6. It also contains the standard error, the sample size and lower and upper confidence limits at the 95% level.
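As a rough illustration of how a percentage of low-wellbeing responses, its standard error and 95% confidence limits relate to one another, the sketch below uses the textbook formula for a simple random sample. This is an assumption made for illustration only: the published estimates are weighted, so ONS figures will not match an unweighted calculation like this one. The function name `low_wellbeing_summary` is hypothetical.

```python
import math


def low_wellbeing_summary(scores, low_max=6, z=1.96):
    """Summarize 0-10 scores: percent in 0..low_max, SE, and 95% limits.

    Assumes an unweighted simple random sample (not the ONS method).
    """
    n = len(scores)
    low = sum(1 for score in scores if 0 <= score <= low_max)
    percent = 100.0 * low / n
    standard_error = math.sqrt(percent * (100.0 - percent) / n)
    return {
        "percent": percent,
        "standard_error": standard_error,
        "sample_size": n,
        "lower": max(0.0, percent - z * standard_error),
        "upper": min(100.0, percent + z * standard_error),
    }


if __name__ == "__main__":
    answers = [5] * 20 + [8] * 80  # 20 low-wellbeing answers out of 100
    print(low_wellbeing_summary(answers))
```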

    The ONS survey covers the whole of the UK, but this dataset only includes results for counties and unitary authorities in England, for consistency with other statistics available at this website.

    At this stage the estimates are considered ‘experimental statistics’, published at an early stage to involve users in their development and to allow feedback. Feedback can be provided to the ONS via this email address.

    The APS is a continuous household survey administered by the Office for National Statistics. It covers the UK, with the chief aim of providing between-census estimates of key social and labour market variables at a local area level. Apart from employment and unemployment, the topics covered in the survey include housing, ethnicity, religion, health and education. When a household is surveyed all adults (aged 16+) are asked the four subjective well-being questions.

    The 12 month Subjective Well-being APS dataset is a sub-set of the general APS as the well-being questions are only asked of persons aged 16 and above, who gave a personal interview and proxy answers are not accepted. This reduces the size of the achieved sample to approximately 120,000 adult respondents in England.

    The original data is available from the ONS website.

    Detailed information on the APS and the Subjective Wellbeing dataset is available here.

    As well as collecting data on well-being, the Office for National Statistics has published widely on the topic of wellbeing. Papers and further information can be found here.

  12. councilconstsvcs

    • data.cityofnewyork.us
    csv, xlsx, xml
    Updated Jun 3, 2025
    + more versions
    Cite
    New York City Council (NYCC) (2025). councilconstsvcs [Dataset]. https://data.cityofnewyork.us/City-Government/councilconstsvcs/kxhn-274p
    Explore at:
    Available download formats: xml, csv, xlsx
    Dataset updated
    Jun 3, 2025
    Authors
    New York City Council (NYCC)
    Description

    The dataset comes from CouncilStat, which is used by many NYC Council district offices to enter and track constituent cases that can range from issues around affordable housing to potholes and pedestrian safety. This dataset aggregates the information that individual staff have input. However, district staff handle a wide range of complex issues. Each office uses the program differently, and thus records cases differently, so comparisons between accounts may be difficult. Not all offices use the program. For more info: http://labs.council.nyc/districts/data/

  13. Lands Area Statistics

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +3more
    Updated Jun 5, 2025
    + more versions
    Cite
    U.S. Forest Service (2025). Lands Area Statistics [Dataset]. https://catalog.data.gov/dataset/lands-area-statistics
    Explore at:
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    U.S. Department of Agriculture Forest Service (http://fs.fed.us/)
    Description

    This dataset fulfills a request from multiple regional units as an operational aid that provides an authoritative companion to the Land Areas Report sliced along multiple administrative and political boundaries. This dataset powers a dynamic and interactive dashboard called Forests by the Numbers. The Forests by the Numbers series is expected to continue with other assets (roads, waterways, recreation areas, etc) to act as a quick reference for internal operations and the public. This dataset covers National Forest System Lands including federally owned units of forest, range, and related land consisting of national forests, purchase units, national grasslands, land utilization project areas, experimental forest areas, experimental range areas, designated experimental areas, other land areas, water areas, and interests in lands that are administered by the U.S. Department of Agriculture (USDA) Forest Service or designated for administration through the Forest Service. Each polygon is attributed with ownership (USDA Forest Service or Non-FS), wilderness status, Forest Service Administrative jurisdiction and geopolitical membership.

  14. Data from: Datasets used to train the Generative Adversarial Networks used...

    • opendata.cern.ch
    Updated 2021
    Cite
    ATLAS collaboration (2021). Datasets used to train the Generative Adversarial Networks used in ATLFast3 [Dataset]. http://doi.org/10.7483/OPENDATA.ATLAS.UXKX.TXBN
    Explore at:
    Dataset updated
    2021
    Dataset provided by
    CERN Open Data Portal
    Authors
    ATLAS collaboration
    Description

    Three datasets are available, each consisting of 15 csv files. Each file contains the voxelised shower information obtained from single particles produced at the front of the calorimeter in the |η| range (0.2-0.25) simulated in the ATLAS detector. Two datasets contain photon events with different statistics; the larger sample has about 10 times the number of events as the other. The other dataset contains pions. The pion dataset and the photon dataset with the lower statistics were used to train the corresponding two GANs presented in the AtlFast3 paper SIMU-2018-04.

    The information in each file is a table; the rows correspond to the events and the columns to the voxels. The voxelisation procedure is described in the AtlFast3 paper linked above and in the dedicated PUB note ATL-SOFT-PUB-2020-006. In summary, the detailed energy deposits produced by ATLAS were converted from x,y,z coordinates to local cylindrical coordinates defined around the particle 3-momentum at the entrance of the calorimeter. The energy deposits in each layer were then grouped in voxels and for each voxel the energy was stored in the csv file. For each particle, there are 15 files corresponding to the 15 energy points used to train the GAN. The name of the csv file defines both the particle and the energy of the sample used to create the file.

    The size of the voxels is described in the binning.xml file. Software tools to read the XML file and manipulate the spatial information of voxels are provided in the FastCaloGAN repository.

    Updated on February 10th 2022. A new dataset photons_samples_highStat.tgz was added to this record and the binning.xml file was updated accordingly.

    Updated on April 18th 2023. A new dataset pions_samples_highStat.tgz was added to this record.

  15. Daily and monthly minimum, maximum and range of eReefs hydrodynamic model...

    • researchdata.edu.au
    Updated Oct 27, 2020
    Cite
    Lafond,Gael; Hammerton,Marc; Smith, Aaron; Lawrey, Eric (2020). Daily and monthly minimum, maximum and range of eReefs hydrodynamic model outputs - temperature, water elevation (AIMS, Source: CSIRO) [Dataset]. https://researchdata.edu.au/ereefs-aims-csiro-model-outputs/3766488
    Explore at:
    Dataset updated
    Oct 27, 2020
    Dataset provided by
    Australian Ocean Data Network
    Authors
    Lafond,Gael; Hammerton,Marc; Smith, Aaron; Lawrey, Eric
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2010 - Nov 30, 2022
    Area covered
    Description

    This derived dataset contains basic statistical products derived from the eReefs CSIRO hydrodynamic model v2.0 outputs at both 1 km and 4 km resolution and v4.0 at 4 km for both a daily and monthly aggregation period. The statistics generated are daily minimum, maximum, mean and range. For monthly aggregations there are monthly mean of the daily minimum, maximum and range, and the monthly minimum, maximum and range. The dataset only calculates statistics for the temperature and water elevation (eta).

    These are generated by the AIMS eReefs Platform (https://ereefs.aims.gov.au/). These statistical products are derived from the original hourly model outputs available via the National Computing Infrastructure (NCI) (https://thredds.nci.org.au/thredds/catalogs/fx3/catalog.html).

    The data is re-gridded from the original curvilinear grid used by the eReefs model into a regular grid so the data files can be easily loaded into standard GIS software. These products are made available via a THREDDS server (https://thredds.ereefs.aims.gov.au/thredds/) in NetCDF format and
    This data set contains two (2) products, based on the periods over which the statistics are determined: daily, and monthly.

    Method:
    Data files are processed in two stages. The daily files are calculated from the original hourly files, then the monthly files are calculated from the daily files. See Technical Guide to Derived Products from CSIRO eReefs Models for details on the regridding process.

    Data Dictionary:

    Daily statistics:
    The following variables can be found in the Daily statistics product:

    - temp_mean: mean temperature for each grid cell for the day.
    - temp_min: minimum temperature for each grid cell for the day.
    - temp_max: maximum temperature for each grid cell for the day.
    - temp_range: difference between maximum and minimum temperatures for each grid cell for the day.

    - eta_mean: mean surface elevation for each grid cell for the day.
    - eta_min: minimum surface elevation for each grid cell for the day.
    - eta_max: maximum surface elevation for each grid cell for the day.
    - eta_range: difference between maximum and minimum surface elevation for each grid cell for the day.

    Depths:

    Depths at 1km resolution: -2.35m, -5.35m, -18.0m, -49.0m
    Depths at 4km resolution: -1.5m, -5.55m, -17.75m, -49.0m

    Monthly statistics:

    The following variables can be found in the Monthly statistics product:

    - temp_min_min: the minimum value of the "temp_min" variable from the Daily statistics product. This equates to the minimum temperature for each grid cell for the corresponding month.
    - temp_min_mean: the mean value of the "temp_min" variable from the Daily statistics product. This equates to the mean minimum temperature for each grid cell for the corresponding month.
    - temp_max_max: the maximum value of the "temp_max" variable from the Daily statistics product. This equates to the maximum temperature for each grid cell for the corresponding month.
    - temp_max_mean: the mean value of the "temp_max" variable from the Daily statistics product. This equates to the mean maximum temperature for each grid cell for the corresponding month.
    - temp_mean: the mean value of the "temp_mean" variable from the Daily statistics product. This equates to the mean temperature for each grid cell for the corresponding month.
    - temp_range_mean: the mean value of the "temp_range" variable from the Daily statistics product. This equates to the mean range of temperatures for each grid cell for the corresponding month.
    - eta_min_min: the minimum value of the "eta_min" variable from the Daily statistics product. This equates to the minimum surface elevation for each grid cell for the corresponding month.
    - eta_min_mean: the mean value of the "eta_min" variable from the Daily statistics product. This equates to the mean minimum surface elevation for each grid cell for the corresponding month.
    - eta_max_max: the maximum value of the "eta_max" variable from the Daily statistics product. This equates to the maximum surface elevation for each grid cell for the corresponding month.
    - eta_max_mean: the mean value of the "eta_max" variable from the Daily statistics product. This equates to the mean maximum surface elevation for each grid cell for the corresponding month.
    - eta_mean: the mean value of the "eta_mean" variable from the Daily statistics product. This equates to the mean surface elevation for each grid cell for the corresponding month.
    - eta_range_mean: the mean value of the "eta_range" variable from the Daily statistics product. This equates to the mean range of surface elevations for each grid cell for the corresponding month.

    Depths:
    Depths at 1km resolution: -2.35m, -5.35m, -18.0m, -49.0m
    Depths at 4km resolution: -1.5m, -5.55m, -17.75m, -49.0m
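The two-stage processing in the data dictionary (hourly to daily, then daily to monthly) can be sketched for a single grid cell and the temperature variables as below. This is a simplified illustration, not the AIMS eReefs Platform code: the function names are hypothetical, and real products operate on gridded NetCDF arrays rather than Python lists. The eta variables follow the same pattern.

```python
from statistics import mean


def daily_stats(hourly_temps):
    """Stage 1: reduce one day's hourly temperatures for one grid cell."""
    lowest, highest = min(hourly_temps), max(hourly_temps)
    return {
        "temp_mean": mean(hourly_temps),
        "temp_min": lowest,
        "temp_max": highest,
        "temp_range": highest - lowest,
    }


def monthly_stats(daily):
    """Stage 2: reduce a month's daily statistics for the same grid cell."""
    return {
        "temp_min_min": min(d["temp_min"] for d in daily),
        "temp_min_mean": mean([d["temp_min"] for d in daily]),
        "temp_max_max": max(d["temp_max"] for d in daily),
        "temp_max_mean": mean([d["temp_max"] for d in daily]),
        "temp_mean": mean([d["temp_mean"] for d in daily]),
        "temp_range_mean": mean([d["temp_range"] for d in daily]),
    }


if __name__ == "__main__":
    days = [daily_stats([20.0, 22.0, 24.0]), daily_stats([21.0, 23.0, 28.0])]
    print(monthly_stats(days))
```

Computing the monthly values from the daily files rather than from the hourly files is why, for example, the monthly temp_mean is a mean of daily means.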

    What does this dataset show:

    The temperature statistics show that inshore areas along the coast get significantly warmer in summer and cooler in winter than offshore areas. The daily temperature range is lower in winter, with most areas experiencing a temperature change of 0.2 - 0.3 degrees Celsius. In summer months the daily temperature range approximately doubles, with upwelling areas in the Capricorn Bunker group, off the outer edge of the Pompey sector of reefs and on the east side of Torres Strait seeing daily temperature ranges of 0.7 - 1.2 degrees Celsius.

    Limitations:

    This dataset is based on spatial and temporal models and so is an estimate of the environmental conditions. It is not based on in-water measurements, and thus will have a spatially varying level of error in the modelled values. It is important to consider whether the model results are fit for the intended purpose.

    Change Log:
    2025-10-29: Updated the metadata title from 'eReefs AIMS-CSIRO Statistics of hydrodynamic model outputs' to 'Daily and monthly minimum, maximum and range of eReefs hydrodynamic model outputs - temperature, water elevation (AIMS, Source: CSIRO)'. Improve the introduction text. Corrected deprecated link to NCI THREDDS. Added a description of what the dataset shows.

  16. S3 Dataset

    • figshare.com
    • portalinvestigacion.um.es
    zip
    Updated Apr 13, 2021
    Cite
    Juan Manuel Espín López; Alberto Huertas Celdrán; Javier G. Marín-Blázquez; Francisco Esquembre Martínez; Gregorio Martínez Pérez (2021). S3 Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.14410229.v2
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 13, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Juan Manuel Espín López; Alberto Huertas Celdrán; Javier G. Marín-Blázquez; Francisco Esquembre Martínez; Gregorio Martínez Pérez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The S3 dataset contains the behavior (sensors, statistics of applications, and voice) of 21 volunteers interacting with their smartphones for more than 60 days. The users are diverse: males and females aged 18 to 70 were included in the dataset generation. The wide age range is a key aspect, given the impact of age on smartphone usage. To generate the dataset, the volunteers installed a prototype of the smartphone application on their Android mobile phones.

    All attributes of the different kinds of data are written in a vector. The dataset contains the following vectors:

    Sensors: This type of vector contains data from smartphone sensors (accelerometer and gyroscope) acquired in a given window of time. Each vector is obtained every 20 seconds, and the monitored features are:
    - Average of accelerometer and gyroscope values.
    - Maximum and minimum of accelerometer and gyroscope values.
    - Variance of accelerometer and gyroscope values.
    - Peak-to-peak (max-min) of X, Y, Z coordinates.
    - Magnitude for gyroscope and accelerometer.

    Statistics: These vectors contain data about the different applications used by the user recently. Each vector of statistics is calculated every 60 seconds and contains:
    - Foreground application counters (number of different and total apps) for the last minute and the last day.
    - Most common app ID and the number of usages in the last minute and the last day.
    - ID of the currently active app.
    - ID of the last active app prior to the current one.
    - ID of the application most frequently utilized prior to the current application.
    - Bytes transmitted and received through the network interfaces.

    Voice: This kind of vector is generated when the microphone is active in a call or voice note. The speaker vector is an embedding, extracted from the audio, and it contains information about the user's identity. This vector is usually named "x-vector" in the Speaker Recognition field, and it is calculated following the steps detailed in "egs/sitw/v2" for the Kaldi library, with the models available for the extraction of the embedding.

    A summary of the details of the collected database:
    - Users: 21
    - Sensors vectors: 417,128
    - Statistics app's usage vectors: 151,034
    - Speaker vectors: 2,720
    - Call recordings: 629
    - Voice messages: 2,091
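The per-window sensor features listed for the Sensors vectors (average, maximum, minimum, variance, peak-to-peak and magnitude) can be illustrated with a short Python sketch for one sensor and one window of (x, y, z) readings. The function name `window_features` and the feature key names are hypothetical. The description does not define "magnitude" or the variance convention, so the mean Euclidean norm and the population variance used here are assumptions.

```python
import math
from statistics import mean, pvariance


def window_features(samples):
    """Compute summary features for one window of (x, y, z) sensor readings."""
    features = {}
    for axis, values in zip("xyz", zip(*samples)):
        features[f"{axis}_mean"] = mean(values)
        features[f"{axis}_max"] = max(values)
        features[f"{axis}_min"] = min(values)
        # Assumption: population variance over the window.
        features[f"{axis}_var"] = pvariance(values)
        features[f"{axis}_peak_to_peak"] = max(values) - min(values)
    # Assumption: magnitude is the mean Euclidean norm of the readings.
    features["magnitude_mean"] = mean(
        [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    )
    return features


if __name__ == "__main__":
    window = [(3.0, 4.0, 0.0), (3.0, 0.0, 4.0)]
    print(window_features(window))
```

In the dataset one such vector is produced every 20 seconds, with the same features computed for both the accelerometer and the gyroscope.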

  17. nyc council constit serv 6

    • data.cityofnewyork.us
    csv, xlsx, xml
    Updated Jun 3, 2025
    + more versions
    Cite
    New York City Council (NYCC) (2025). nyc council constit serv 6 [Dataset]. https://data.cityofnewyork.us/City-Government/nyc-council-constit-serv-6/6wkm-9y7m
    Explore at:
    Available download formats: xlsx, csv, xml
    Dataset updated
    Jun 3, 2025
    Authors
    New York City Council (NYCC)
    Area covered
    New York
    Description

    The dataset comes from CouncilStat, which is used by many NYC Council district offices to enter and track constituent cases that can range from issues around affordable housing to potholes and pedestrian safety. This dataset aggregates the information that individual staff have input. However, district staff handle a wide range of complex issues. Each office uses the program differently, and thus records cases differently, so comparisons between accounts may be difficult. Not all offices use the program. For more info: http://labs.council.nyc/districts/data/

  18. UFC Fight Statistics Data

    • kaggle.com
    zip
    Updated Feb 2, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alex Magnus (2025). UFC Fight Statistics Data [Dataset]. https://www.kaggle.com/datasets/alexmagnus24/ufc-fight-statistics-july-2016-nov-2024
    Explore at:
    Available download formats: zip (1147818 bytes)
    Dataset updated
    Feb 2, 2025
    Authors
    Alex Magnus
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset has web-scraped each UFC fight from July 2016 until November 2024. These statistics have been accumulated round-by-round and include outcome and event attributes.

    The disparities dataset finds each statistical difference between Fighter1 (Red Corner) and Fighter2 (Blue Corner).

    UFC Fight Statistics Metadata:

    Event Information:

    • Fighter1, Fighter2: Names of competing fighters
    • Winner?: Fight winner
    • Fight Method: Method of victory (KO, Submission, Decision, etc.)
    • Time: Fight ending time
    • Time Format: Format of time measurement
    • Referee: Official referee
    • Finish Details or Judges Scorecard: Fight ending specifics or scoring details
    • Bout: Fight category/type
    • Event Name: Name of UFC event
    • Location: Venue/city of event
    • Date: Event date
    • Rounds: Total rounds fought

    Per-Fighter, Per-Round Statistics (F1R1-R5, F2R1-R5): For each fighter (F1/F2) in each round (R1-R5):

    Strike Statistics:

    • Knockdowns: Number of knockdowns
    • Total Strike Landed/Missed: All strikes attempted
    • Non-Sig. Strike Landed/Missed: Non-significant strikes attempted
    • Sig. Strike Landed/Missed: Significant strikes attempted

    Strike Location:

    • Head: Strikes to head region
    • Body: Strikes to body region
    • Leg: Strikes to leg region

    Position Statistics:

    • Distance: Strikes at range
    • Clinch: Strikes in clinch
    • Ground: Strikes on the ground

    Grappling Statistics:

    • TD Completed/Missed: Takedowns attempted
    • Sub. Att: Submission attempts
    • Rev.: Position reversals
    • Ctrl Time (Minutes): Ground control duration

    Note: Statistics are separated by fighter (F1/F2) and tracked individually for each round (R1-R5), rather than using differentials.

    UFC Fight Disparities Metadata:

    Core Fight Information:

    • Fighter1, Fighter2: Names of competing fighters
    • 'F1 Winner?' or 'F2 Winner?': Binary indicators of fight outcome
    • Fight Method: Method of victory (KO, Submission, Decision, etc.)
    • Rounds: Total number of rounds fought

    Per-Round Statistics (R1-R5): Each round has the following metrics with differential ("Disp.") between fighters:

    Strike Statistics:

    • Knockdowns: Number of knockdowns scored
    • Total Strike Landed/Missed: All strikes attempted, landed or missed
    • Non-Sig. Strike Landed/Missed: Non-significant strikes attempted
    • Sig. Strike Landed/Missed: Significant strikes attempted

    Strike Location:

    • Head: Strikes to head region
    • Body: Strikes to body region
    • Leg: Strikes to leg region

    Position Statistics:

    • Distance: Strikes thrown at range
    • Clinch: Strikes thrown in clinch
    • Ground: Strikes thrown on the ground

    Grappling Statistics:

    • TD Completed/Missed: Takedowns attempted and success rate
    • Sub. Att: Submission attempts
    • Rev.: Position reversals
    • Ctrl Time (Minutes): Ground control time

    Note: This structure repeats for each round (R1-R5), with "Disp." indicating the differential between fighters for each metric.
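The relationship between the per-fighter statistics and the disparities table can be sketched in Python as a per-round difference between the two fighters. The function name `round_disparities`, the dictionary layout and the exact column labels are hypothetical, and the sign convention (Fighter1 minus Fighter2) is an assumption, since the description only says "Disp." is the differential between fighters.

```python
def round_disparities(f1_stats, f2_stats):
    """Return per-metric differentials for one round.

    f1_stats and f2_stats map metric names to numbers for Fighter1
    (Red Corner) and Fighter2 (Blue Corner). Only metrics present for
    both fighters are compared. Assumed sign: Fighter1 minus Fighter2.
    """
    return {
        f"{metric} Disp.": f1_stats[metric] - f2_stats[metric]
        for metric in f1_stats
        if metric in f2_stats
    }


if __name__ == "__main__":
    red_r1 = {"Knockdowns": 1, "Sig. Strike Landed": 20, "Sub. Att": 0}
    blue_r1 = {"Knockdowns": 0, "Sig. Strike Landed": 25, "Sub. Att": 2}
    print(round_disparities(red_r1, blue_r1))
```

Under this convention a positive value means Fighter1 recorded more of that metric in the round, and a negative value means Fighter2 did.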

  19. Evaluating NFL Safety Range

    • kaggle.com
    zip
    Updated Jan 7, 2021
    Cite
    Kirk Williams II (2021). Evaluating NFL Safety Range [Dataset]. https://www.kaggle.com/kirkwilliamsii/using-ngs-data-to-evaluate-nfl-safety-range
    Explore at:
    Available download formats: zip (421153222 bytes)
    Dataset updated
    Jan 7, 2021
    Authors
    Kirk Williams II
    Description

    Using NGS Data to Evaluate NFL Safety Range

    Kirk Douglas Williams, II


    Abstract: Our analysis uses RStudio and a Random Forest algorithm to show which NFL safeties covered the most ground on the football field in 2018; in other words, which safety is the “rangiest”. A new NFL stat that ranks safeties by the ground they cover, or by how close they come to making a play on the ball, could solve many front-office and player-personnel problems around the league. Why does covering ground and being close to the ball matter so much for a safety? As an NFL safety, you are not only the last line of support in your team’s defense, you are also relied on countless times in coverage. A particular defensive back might not be as good as the receiver they are covering, but a strong range stat can scare a lot more quarterbacks off throwing in that direction. In this analysis, we use RStudio and a Random Forest model to predict whether a safety can get within two yards of the ball at arrival. We hope our research will be informative and useful for the future expansion of sports analytics in the NFL.

    1. Introduction When you think of an elite defender, what comes to mind? Is it the stats, their consistency, their name, or their accolades? When watching your favorite NFL team, you tend to notice a high-performing player who you feel lacks the exposure he deserves. Many people believe that a lot of players are undervalued, especially because of lack of exposure, poorly performing teams, or small-market cities. Next Gen Stats allow us to take a deep dive into what does not show up on a regular stat sheet. The goal of this research is to figure out who those elite defenders are, specifically when it comes to the passing game. The Next Gen Stats fields that guided us to our results included playId, gameId, speed, etc. We combined certain attributes to observe who those elite defenders are. We calculated stats that tell us when the ball arrives after the pass was thrown, the angle the player took to the ball after it was thrown, and the amount of time the ball was in the air. Using RStudio, we created functions, assigned variables, and calculated angles to formulate stats that help us decide which players are quick to the ball after the pass arrives.
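As a rough illustration of the kind of quantities discussed here, the Python sketch below computes the distance between a player and the ball, the bearing from the player to the ball, and the "within two yards of the ball at arrival" label mentioned in the abstract. It is not the author's R code. The function names are hypothetical, positions are assumed to be (x, y) field coordinates in yards, and the bearing is only one possible reading of "angle to ball".

```python
import math


def distance_yards(player_xy, ball_xy):
    """Straight-line distance between player and ball, in yards."""
    return math.hypot(ball_xy[0] - player_xy[0], ball_xy[1] - player_xy[1])


def angle_to_ball_degrees(player_xy, ball_xy):
    """Bearing from player to ball, in degrees from the +x axis (assumed definition)."""
    dx = ball_xy[0] - player_xy[0]
    dy = ball_xy[1] - player_xy[1]
    return math.degrees(math.atan2(dy, dx))


def within_range(player_xy, ball_xy, threshold_yards=2.0):
    """Label used as the prediction target: is the player within the threshold?"""
    return distance_yards(player_xy, ball_xy) <= threshold_yards


if __name__ == "__main__":
    safety_at_arrival = (10.0, 10.0)
    ball_at_arrival = (11.0, 11.0)
    print(distance_yards(safety_at_arrival, ball_at_arrival))
    print(within_range(safety_at_arrival, ball_at_arrival))
```

A label like the one returned by `within_range` is the sort of binary target a Random Forest classifier could be trained to predict.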

    2. Body Those who are considered the best safeties of all time often have one thing in common when people rank them: they were constantly around the ball, whether in the passing game or stuffing the run. With the help of data analytics, we can now objectively see who those impactful players are. Data analytics is important to football because it gives us a deeper understanding of a player’s value on the field, for example, knowing which defenders are quickest to the ball, particularly in zone coverage. In the NFL, coaches, scouts, and general managers make tough decisions every day, balancing the needs of their players and the satisfaction of fans while keeping cap space and outside revenue in consideration. With data analytics, we can help make player personnel decisions easier for those coaches, scouts, and general managers. One of the hardest decisions a team can make is whether to invest in a certain player. From the moment that player is drafted, the organization is taking a calculated gamble on the team’s future. We created stats called “distance ball arrival to player thrown”, the distance that the ball arrives at the player after the football is thrown, and “distance ball arrival to player arrival”, the distance that the ball arrives at the player after the football arrives at its intended target. Finally, there is “angle to ball”, the angle the player took from the football. When evaluating a player’s future with these statistics, you can make the case for whether they deserve that next payday or not, and help make personnel decisions on the future of that player. The NFL is one of the best organizations at marketing its players, with teams deciding who should represent them for years to come.
In 2019, Andrew McCarty, a writer for “TheSpun”, listed the NFL’s top ten most marketable players. Nine were offensive players and only one was a defensive player. Of those nine offensive players, six were quarterbacks. With the help of analytics, we can make a case for the NFL to brand more defensive players. For examp...
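
The range features named in this abstract (distance to the ball, angle to the ball, and the "within two yards at arrival" target) could be computed roughly as below. This is a minimal sketch, not the authors' method: the original analysis uses RStudio, Python is swapped in here for illustration, and the function names and the (x, y) field coordinates in yards are assumptions.

```python
import math

def distance_to_ball(player_xy, ball_xy):
    """Straight-line distance in yards between a player and the ball location."""
    return math.hypot(ball_xy[0] - player_xy[0], ball_xy[1] - player_xy[1])

def angle_to_ball(player_xy, ball_xy):
    """Direction from the player to the ball in degrees, counterclockwise from the +x axis."""
    return math.degrees(math.atan2(ball_xy[1] - player_xy[1], ball_xy[0] - player_xy[0]))

def within_range(player_xy_at_arrival, ball_xy, threshold_yards=2.0):
    """Binary target: was the player within the threshold of the ball when it arrived?"""
    return distance_to_ball(player_xy_at_arrival, ball_xy) <= threshold_yards
```

A label such as `within_range(...)` is the kind of yes/no outcome a Random Forest classifier could be trained to predict from the distance and angle features.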

  20. Coastal population (1, 5 and 10km from coast)

    • pacificdata.org
    • pacific-data.sprep.org
    csv
    Updated Nov 13, 2023
    SPC (2023). Coastal population (1, 5 and 10km from coast) [Dataset]. https://pacificdata.org/data/dataset/coastal-population-1-5-and-10km-from-coast-df-pop-coast
    Available download formats: csv
    Dataset updated
    Nov 13, 2023
    Dataset provided by
    SPC
    Time period covered
    Jan 1, 2010 - Dec 31, 2021
    Description

    Proportion of population living in 1, 5 and 10km buffer zones for Pacific Island Countries and Territories, determined using most recent Population and Housing Census. Number of people living in 1,5 and 10km buffer zones determined by apportioning population projections.
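
The apportioning step described above amounts to multiplying a projected population by the share of people living in each buffer zone. The sketch below illustrates that arithmetic only; the function name, the dictionary layout, and the numbers in the usage note are hypothetical and are not taken from the dataset.

```python
def apportion(projected_population, buffer_shares):
    """Split a projected population across buffer zones.

    buffer_shares maps a zone label (e.g. "1km") to the proportion of the
    population living within that distance of the coast, as a fraction 0..1.
    """
    return {zone: round(projected_population * share)
            for zone, share in buffer_shares.items()}
```

For example, `apportion(1000, {"1km": 0.5, "5km": 0.8, "10km": 1.0})` gives 500, 800 and 1000 people for the three zones (placeholder figures).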

    Find more Pacific data on PDH.stat.

Nidula Elgiriyewithana ⚡ (2023). Global Country Information Dataset 2023 [Dataset]. https://www.kaggle.com/datasets/nelgiriyewithana/countries-of-the-world-2023

Global Country Information Dataset 2023

A Comprehensive Dataset Empowering In-Depth Analysis and Cross-Country Insights

8 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (24063 bytes)
Dataset updated
Jul 8, 2023
Authors
Nidula Elgiriyewithana ⚡
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description


This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.


Key Features

  • Country: Name of the country.
  • Density (P/Km2): Population density measured in persons per square kilometer.
  • Abbreviation: Abbreviation or code representing the country.
  • Agricultural Land (%): Percentage of land area used for agricultural purposes.
  • Land Area (Km2): Total land area of the country in square kilometers.
  • Armed Forces Size: Size of the armed forces in the country.
  • Birth Rate: Number of births per 1,000 population per year.
  • Calling Code: International calling code for the country.
  • Capital/Major City: Name of the capital or major city.
  • CO2 Emissions: Carbon dioxide emissions in tons.
  • CPI: Consumer Price Index, a measure of inflation and purchasing power.
  • CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.
  • Currency_Code: Currency code used in the country.
  • Fertility Rate: Average number of children born to a woman during her lifetime.
  • Forested Area (%): Percentage of land area covered by forests.
  • Gasoline_Price: Price of gasoline per liter in local currency.
  • GDP: Gross Domestic Product, the total value of goods and services produced in the country.
  • Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.
  • Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.
  • Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.
  • Largest City: Name of the country's largest city.
  • Life Expectancy: Average number of years a newborn is expected to live.
  • Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.
  • Minimum Wage: Minimum wage level in local currency.
  • Official Language: Official language(s) spoken in the country.
  • Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.
  • Physicians per Thousand: Number of physicians per thousand people.
  • Population: Total population of the country.
  • Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.
  • Tax Revenue (%): Tax revenue as a percentage of GDP.
  • Total Tax Rate: Overall tax burden as a percentage of commercial profits.
  • Unemployment Rate: Percentage of the labor force that is unemployed.
  • Urban Population: Percentage of the population living in urban areas.
  • Latitude: Latitude coordinate of the country's location.
  • Longitude: Longitude coordinate of the country's location.

Potential Use Cases

  • Analyze population density and land area to study spatial distribution patterns.
  • Investigate the relationship between agricultural land and food security.
  • Examine carbon dioxide emissions and their impact on climate change.
  • Explore correlations between economic indicators such as GDP and various socio-economic factors.
  • Investigate educational enrollment rates and their implications for human capital development.
  • Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.
  • Study labor market dynamics through indicators such as labor force participation and unemployment rates.
  • Investigate the role of taxation and its impact on economic development.
  • Explore urbanization trends and their social and environmental consequences.
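
As one illustration of the use cases above, the sketch below computes a Pearson correlation between two indicator columns using only the Python standard library. It assumes the CSV headers match the feature names listed here and that numeric cells may carry formatting such as commas, `$`, or `%`; neither assumption is confirmed by this page. The sample rows are made-up placeholders, not values from the dataset.

```python
import csv
import io
import math

def to_number(text):
    """Parse strings such as '1,234', '$5.6' or '58.1%'; return None for blank cells."""
    cleaned = text.replace(",", "").replace("$", "").replace("%", "").strip()
    return float(cleaned) if cleaned else None

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def column_correlation(rows, col_a, col_b):
    """Correlate two columns, skipping rows where either value is missing."""
    pairs = []
    for row in rows:
        a, b = to_number(row[col_a]), to_number(row[col_b])
        if a is not None and b is not None:
            pairs.append((a, b))
    xs, ys = zip(*pairs)
    return pearson(xs, ys)

# Placeholder rows for illustration only (hypothetical values, not real data).
SAMPLE = """Country,Life Expectancy,Infant Mortality
A,80,3
B,70,20
C,60,37
D,,50
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
r = column_correlation(rows, "Life Expectancy", "Infant Mortality")
```

With the real file, `rows` would come from `csv.DictReader` over the downloaded CSV instead of the inline sample, and the column names should be checked against the actual header row first.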

Data Source: This dataset was compiled from multiple data sources

If this was helpful, a vote is appreciated ❤️ Thank you 🙂
