https://creativecommons.org/publicdomain/zero/1.0/
About Dataset
UPDATE: Source code used for collecting this data released here
Context
YouTube (the world-famous video sharing website) maintains a list of the top trending videos on the platform. According to Variety magazine, “To determine the year’s top-trending videos, YouTube uses a combination of factors including measuring users interactions (number of views, shares, comments and likes). Note that they’re not the most-viewed videos overall for the calendar year”. Top performers on the YouTube trending list are music videos (such as the famously viral “Gangnam Style”), celebrity and/or reality TV performances, and the random dude-with-a-camera viral videos that YouTube is well-known for.
This dataset is a daily record of the top trending YouTube videos.
Note that this dataset is a structurally improved version of this dataset.
Content
This dataset includes several months (and counting) of data on daily trending YouTube videos. Data is included for the US, GB, DE, CA, and FR regions (USA, Great Britain, Germany, Canada, and France, respectively), with up to 200 listed trending videos per day.
EDIT: Now includes data from RU, MX, KR, JP and IN regions (Russia, Mexico, South Korea, Japan and India respectively) over the same time period.
Each region’s data is in a separate file. Data includes the video title, channel title, publish time, tags, views, likes and dislikes, description, and comment count.
The data also includes a category_id field, which varies between regions. To retrieve the categories for a specific video, find it in the associated JSON. One such file is included for each of the five regions in the dataset.
For more information on specific columns in the dataset refer to the column metadata.
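The category lookup described above can be sketched as follows. This is a minimal sketch, assuming each region's JSON follows the YouTube Data API video-category response shape (an `items` list whose entries carry an `id` and a `snippet.title`). The sample JSON, the CSV excerpt, and all values are illustrative and not taken from the dataset.

```python
import csv
import io
import json

# Hypothetical excerpt of one region's category JSON (assumed shape:
# an "items" list, each item with "id" and "snippet" -> "title").
CATEGORY_JSON = """
{"items": [
  {"id": "10", "snippet": {"title": "Music"}},
  {"id": "24", "snippet": {"title": "Entertainment"}}
]}
"""

# Hypothetical excerpt of the same region's CSV file.
VIDEOS_CSV = (
    "title,category_id,views\n"
    "Some music video,10,1000\n"
    "Some talk show clip,24,500\n"
    "Uncategorised upload,99,10\n"
)


def load_category_names(json_text):
    """Map category_id (as a string) to its human-readable title."""
    payload = json.loads(json_text)
    return {item["id"]: item["snippet"]["title"] for item in payload["items"]}


def attach_category_names(csv_text, categories):
    """Return the CSV rows as dicts with an extra 'category_name' key."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # category_id values vary between regions, so use the JSON
        # that belongs to the same region as the CSV.
        row["category_name"] = categories.get(row["category_id"], "Unknown")
        rows.append(row)
    return rows


categories = load_category_names(CATEGORY_JSON)
videos = attach_category_names(VIDEOS_CSV, categories)
for video in videos:
    print(video["title"], "->", video["category_name"])
```

Falling back to "Unknown" keeps rows whose category_id is missing from the region's JSON instead of dropping them.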
Acknowledgements
This dataset was collected using the YouTube API.
Inspiration
Possible uses for this dataset could include:
Sentiment analysis in a variety of forms.
Categorising YouTube videos based on their comments and statistics.
Training ML algorithms like RNNs to generate their own YouTube comments.
Analysing what factors affect how popular a YouTube video will be.
Statistical analysis over time.
For further inspiration, see the kernels on this dataset!
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is from tekkum; the original file was in xlsx format.
While numerous studies have explored the factors influencing life expectancy, most have focused on demographic variables, economic indicators, and mortality rates. However, there has been limited examination of the impact of immunization coverage, health expenditures, and educational attainment on life expectancy. This study seeks to address these gaps by developing a comprehensive dataset with no missing values, utilizing data from many years across 193 different countries. Key immunizations such as Hepatitis B, Polio, and Diphtheria, along with factors like GDP, schooling, and health expenditure, are included in this dataset. This approach aims to identify the most significant predictors of life expectancy, allowing countries to prioritize interventions that could most effectively improve the health and longevity of their populations.
The success of this analysis relies heavily on the accuracy and completeness of the data. The dataset used in this project has been sourced from the Global Health Observatory (GHO) data repository of the World Health Organization (WHO), which tracks health metrics and related factors for countries worldwide. The corresponding economic data was obtained from the United Nations. From the broad range of health-related variables available, this study focuses on those that are most representative and critical to understanding life expectancy. The dataset includes data for 193 countries and has been meticulously merged into a single file containing 22 columns and 2,938 rows, representing 20 predictive variables. The variables were categorized into four main groups: Immunization-related factors, Mortality factors, Economic factors, and Social factors. Countries with many missing values were excluded, and some missing values were filled in using Bayesian Ridge regression.
This dataset aims to answer the following key questions:
Do the selected predictive factors significantly impact life expectancy, and which variables are the most influential?
Should countries with a lower life expectancy (below 65 years) increase healthcare expenditure to improve their population's lifespan?
How do infant and adult mortality rates influence life expectancy across different regions?
What is the relationship between life expectancy and lifestyle factors such as alcohol consumption?
How does educational attainment, as measured by years of schooling, affect human lifespan?
Is there a positive or negative correlation between alcohol consumption and life expectancy?
What is the impact of immunization coverage on life expectancy, particularly regarding diseases like Hepatitis B, Polio, and Diphtheria?
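The correlation questions above can be approached with a Pearson correlation coefficient. The following is a minimal sketch using only the standard library; the two value lists are made-up placeholders rather than rows from the dataset, so the sign and size of the result say nothing about the real data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


# Placeholder values (hypothetical): alcohol consumption per country
# and life expectancy in years for the same countries.
alcohol = [0.5, 2.0, 4.5, 7.0, 9.5]
life_expectancy = [60.0, 65.0, 70.0, 74.0, 79.0]

r = pearson_r(alcohol, life_expectancy)
print(f"Pearson r = {r:.3f}")
```

A coefficient near +1 or -1 indicates a strong linear association, but it does not by itself show that one factor causes the other.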
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This map shows the percentage of high school dropouts by county. Counties are shaded based on quartile distribution. The lighter shaded counties have lower percentages of high school dropouts. The darker shaded counties have higher percentages of high school dropouts. New York State Community Health Indicator Reports (CHIRS) were developed in 2012, and are updated annually to consolidate and improve data linkages for the health indicators included in the County Health Assessment Indicators (CHAI) for all communities in New York. The CHIRS present data for more than 300 health indicators that are organized by 15 different health topics. Data are provided for all 62 New York State counties, 11 regions (including New York City), the State excluding New York City, and New York State. For more information, check out: http://www.health.ny.gov/statistics/chac/indicators/. The "About" tab contains additional details concerning this dataset.
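The quartile shading described above can be sketched as follows. The map's exact quartile method is not stated, so this sketch assumes the "inclusive" method of Python's `statistics.quantiles`; the county names and percentages are hypothetical.

```python
from statistics import quantiles

# Hypothetical dropout percentages; names and values are illustrative.
DROPOUT_RATES = {
    "County A": 1.0, "County B": 2.0, "County C": 3.0, "County D": 4.0,
    "County E": 5.0, "County F": 6.0, "County G": 7.0, "County H": 8.0,
}


def quartile_classes(rates):
    """Assign each county a class from 1 (lightest) to 4 (darkest)."""
    q1, q2, q3 = quantiles(rates.values(), n=4, method="inclusive")
    classes = {}
    for county, rate in rates.items():
        if rate <= q1:
            classes[county] = 1
        elif rate <= q2:
            classes[county] = 2
        elif rate <= q3:
            classes[county] = 3
        else:
            classes[county] = 4
    return classes


classes = quartile_classes(DROPOUT_RATES)
for county, shade in classes.items():
    print(county, "-> quartile", shade)
```

Class 1 corresponds to the lighter shading (lower percentages) and class 4 to the darker shading (higher percentages).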
This is one of three datasets related to the Prevention Agenda Tracking Indicators county level data posted on this site. Each dataset consists of county level data for 68 health tracking indicators and sub-indicators for the Prevention Agenda 2013-2017: New York State’s Health Improvement Plan. A health tracking indicator is a metric through which progress on a certain area of health improvement can be assessed. The indicators are organized by the Priority Area of the Prevention Agenda as well as the Focus Area under each Priority Area. Each dataset includes tracking indicators for the five Priority Areas of the Prevention Agenda 2013-2017. The latest-data dataset includes the most recent county level data for all indicators. The trend dataset includes the most recent county level data and historical data, where available. Each dataset also includes the Prevention Agenda 2017 state targets for the indicators. Sub-indicators are included in these datasets to measure health disparities among socioeconomic groups. For more information, check out: http://www.health.ny.gov/prevention/prevention_agenda/2013-2017/ and https://www.health.ny.gov/PreventionAgendaDashboard, or go to the “About” tab.
WARNING: This is a pre-release dataset and its field names and data structures are subject to change. It should be considered pre-release until the end of March 2025. The schema changed in February 2025 - please see below. We will post a roadmap of upcoming changes, but service URLs and schema are now stable. For deployment status of new services in February 2025, see https://gis.data.ca.gov/pages/city-and-county-boundary-data-status. Additional roadmap and status links at the bottom of this metadata.
This dataset is continuously updated as the source data from CDTFA is updated, as often as many times a month. If you require unchanging point-in-time data, export a copy for your own use rather than using the service directly in your applications.
Purpose
County boundaries along with third party identifiers used to join in external data. Boundaries are from the California Department of Tax and Fee Administration (CDTFA). These boundaries are the best available statewide data source in that CDTFA receives changes in incorporation and boundary lines from the Board of Equalization, who receives them from local jurisdictions for tax purposes. Boundary accuracy is not guaranteed, and though CDTFA works to align boundaries based on historical records and local changes, errors will exist. If you require a legal assessment of boundary location, contact a licensed surveyor.
This dataset joins in multiple attributes and identifiers from the US Census Bureau and Board on Geographic Names to facilitate adding additional third party data sources. In addition, we attach attributes of our own to ease and reduce common processing needs and questions. Finally, coastal buffers are separated into separate polygons, leaving the land-based portions of jurisdictions and coastal buffers in adjacent polygons. This layer removes the coastal buffer polygons.
This feature layer is for public use.
Related Layers
This dataset is part of a grouping of many datasets:
Cities: Only the city boundaries and attributes, without any unincorporated areas
  With Coastal Buffers
  Without Coastal Buffers
Counties: Full county boundaries and attributes, including all cities within as a single polygon
  With Coastal Buffers
  Without Coastal Buffers (this dataset)
Cities and Full Counties: A merge of the other two layers, so polygons overlap within city boundaries. Some customers require this behavior, so we provide it as a separate service.
  With Coastal Buffers
  Without Coastal Buffers
City and County Abbreviations
Unincorporated Areas (Coming Soon)
Census Designated Places
Cartographic Coastline
  Polygon
  Line source (Coming Soon)
Working with Coastal Buffers
The dataset you are currently viewing excludes the coastal buffers for cities and counties that have them in the source data from CDTFA. In the versions where they are included, they remain as a second polygon on cities or counties that have them, with all the same identifiers, and a value in the COASTAL field indicating if it's an ocean or a bay buffer. If you wish to have a single polygon per jurisdiction that includes the coastal buffers, you can run a Dissolve on the version that has the coastal buffers on all the fields except OFFSHORE and AREA_SQMI to get a version with the correct identifiers.
Point of Contact
California Department of Technology, Office of Digital Services, odsdataservices@state.ca.gov
Field and Abbreviation Definitions
CDTFA_COUNTY: CDTFA county name. For counties, this will be the name of the polygon itself. For cities, it is the name of the county the city polygon is within.
CDTFA_COPRI: county number followed by the 3-digit city primary number used in the Board of Equalization's 6-digit tax rate area numbering system. The boundary data originate with CDTFA's teams managing tax rate information, so this field is preserved and flows into this dataset.
CENSUS_GEOID: numeric geographic identifiers from the US Census Bureau
CENSUS_PLACE_TYPE: City, County, or Town, stripped off the census name for identification purpose.
GNIS_PLACE_NAME: Board on Geographic Names authorized nomenclature for area names published in the Geographic Name Information System
GNIS_ID: The numeric identifier from the Board on Geographic Names that can be used to join these boundaries to other datasets utilizing this identifier.
CDT_COUNTY_ABBR: Abbreviations of county names - originally derived from CalTrans Division of Local Assistance and now managed by CDT. Abbreviations are 3 characters.
CDT_NAME_SHORT: The name of the jurisdiction (city or county) with the word "City" or "County" stripped off the end. Some changes may come to how we process this value to make it more consistent.
AREA_SQMI: The area of the administrative unit (city or county) in square miles, calculated in EPSG 3310 California Teale Albers.
OFFSHORE: Indicates if the polygon is a coastal buffer. Null for land polygons. Additional values include "ocean" and "bay".
PRIMARY_DOMAIN: Currently empty/null for all records. Placeholder field for official URL of the city or county
CENSUS_POPULATION: Currently null for all records. In the future, it will include the most recent US Census population estimate for the jurisdiction.
GlobalID: While all of the layers we provide in this dataset include a GlobalID field with unique values, we do not recommend you make any use of it. The GlobalID field exists to support offline sync, but is not persistent, so data keyed to it will be orphaned at our next update. Use one of the other persistent identifiers, such as GNIS_ID or GEOID instead.
Boundary Accuracy
County boundaries were originally derived from a 1:24,000 accuracy dataset, with improvements made in some places to boundary alignments based on research into historical records and boundary changes as CDTFA learns of them. City boundary data are derived from pre-GIS tax maps, digitized at BOE and CDTFA, with adjustments made directly in GIS for new annexations, detachments, and corrections.
Boundary accuracy within the dataset varies. While CDTFA strives to correctly include or exclude parcels from jurisdictions for accurate tax assessment, this dataset does not guarantee that a parcel is placed in the correct jurisdiction. When a parcel is in the correct jurisdiction, this dataset cannot guarantee accurate placement of boundary lines within or between parcels or rights of way. This dataset also provides no information on parcel boundaries. For exact jurisdictional or parcel boundary locations, please consult the county assessor's office and a licensed surveyor.
CDTFA's data is used as the best available source because BOE and CDTFA receive information about changes in jurisdictions which otherwise need to be collected independently by an agency or company to compile into usable map boundaries. CDTFA maintains the best available statewide boundary information.
CDTFA's source data notes the following about accuracy:
City boundary changes and county boundary line adjustments filed with the Board of Equalization per Government Code 54900. This GIS layer contains the boundaries of the unincorporated county and incorporated cities within the state of California. The initial dataset was created in March of 2015 and was based on the State Board of Equalization tax rate area boundaries. As of April 1, 2024, the maintenance of this dataset is provided by the California Department of Tax and Fee Administration for the purpose of determining sales and use tax rates. The boundaries are continuously being revised to align with aerial imagery when areas of conflict are discovered between the original boundary provided by the California State Board of Equalization and the boundary made publicly available by local, state, and federal government. Some differences may occur between actual recorded boundaries and the boundaries used for sales and use tax purposes. The boundaries in this map are representations of taxing jurisdictions for the purpose of determining sales and use tax rates and should not be used to determine precise city or county boundary line locations.
Boundary Processing
These data make a structural change from the source data. While the full boundaries provided by CDTFA include coastal buffers of varying sizes, many users need boundaries to end at the shoreline of the ocean or a bay. As a result, after examining existing city and county boundary layers, these datasets provide a coastline cut generally along the ocean facing coastline. For county boundaries in northern California, the cut runs near the Golden Gate Bridge, while for cities, we cut along the bay shoreline and into the edge of the Delta at the boundaries of Solano, Contra Costa, and Sacramento counties.
In the services linked above, the versions that include the coastal buffers contain them as a second (or third) polygon for the city or county, with the value in the COASTAL field set to whether it's a bay or ocean polygon. These can be processed back into a single polygon by dissolving on all the fields you wish to keep, since the attributes, other than the COASTAL field and geometry attributes (like areas) remain the same between the polygons for this purpose.
Slivers
In cases where a city or county's boundary ends near a coastline, our coastline data may cross back and forth many times while roughly paralleling the jurisdiction's boundary, resulting in many polygon slivers. We post-process the data to remove these slivers using a city/county boundary priority algorithm. That is, when the data run parallel to each other, we discard the coastline cut and keep the CDTFA-provided boundary, even if it extends into the ocean a small amount. This processing supports consistent boundaries for Fort Bragg, Point Arena, San Francisco, Pacifica, Half Moon Bay, and Capitola, in addition to others. More information on this algorithm will be provided soon.
Coastline Caveats
Some cities have buffers extending into water bodies that we do not cut at the shoreline. These include South Lake Tahoe and Folsom, which extend into neighboring lakes, and
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Will all children be able to read by 2030? The ability to read with comprehension is a foundational skill that every education system around the world strives to impart by late in primary school—generally by age 10. Moreover, attaining the ambitious Sustainable Development Goals (SDGs) in education requires first achieving this basic building block, and so does improving countries’ Human Capital Index scores. Yet past evidence from many low- and middle-income countries has shown that many children are not learning to read with comprehension in primary school. To understand the global picture better, we have worked with the UNESCO Institute for Statistics (UIS) to assemble a new dataset with the most comprehensive measures of this foundational skill yet developed, by linking together data from credible cross-national and national assessments of reading. This dataset covers 115 countries, accounting for 81% of children worldwide and 79% of children in low- and middle-income countries. The new data allow us to estimate the reading proficiency of late-primary-age children, and we also provide what are among the first estimates (and the most comprehensive, for low- and middle-income countries) of the historical rate of progress in improving reading proficiency globally (for the 2000-17 period). The results show that 53% of all children in low- and middle-income countries cannot read age-appropriate material by age 10, and that at current rates of improvement, this “learning poverty” rate will have fallen only to 43% by 2030. Indeed, we find that the goal of all children reading by 2030 will be attainable only with historically unprecedented progress. The high rate of “learning poverty” and slow progress in low- and middle-income countries is an early warning that all the ambitious SDG targets in education (and likely of social progress) are at risk. 
Based on this evidence, we suggest a new medium-term target to guide the World Bank’s work in low- and middle-income countries: cut learning poverty by at least half by 2030. This target, together with improved measurement of learning, can serve as an evidence-based tool to accelerate progress to get all children reading by age 10. For further details, please refer to https://thedocs.worldbank.org/en/doc/e52f55322528903b27f1b7e61238e416-0200022022/original/Learning-poverty-report-2022-06-21-final-V7-0-conferenceEdition.pdf
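To make the size of the proposed target concrete, the arithmetic can be sketched as below. This assumes the "at least half" reduction is measured against the 53% learning poverty rate quoted above; the description does not state the baseline explicitly, so treat that as an assumption.

```python
def halved_target(baseline_rate):
    """Rate that corresponds to cutting the baseline by half."""
    return baseline_rate / 2


baseline = 53.0        # learning poverty rate (%) quoted in the description
projected_2030 = 43.0  # projected rate (%) at current rates of improvement

target = halved_target(baseline)  # assumed baseline: the 53% figure
gap = projected_2030 - target     # percentage points still to close

print(f"Halved target: {target}%")
print(f"Gap between projection and target: {gap} percentage points")
```

Under that assumption, the projected 2030 rate would remain well above the halved target, which matches the description's point that unprecedented progress is needed.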
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: In these datasets, a person is defined as up to date if they have received at least one dose of an updated COVID-19 vaccine. The Centers for Disease Control and Prevention (CDC) recommends that certain groups, including adults ages 65 years and older, receive additional doses.
On 6/16/2023 CDPH replaced the booster measures with a new “Up to Date” measure based on CDC’s new recommendations, replacing the primary series, boosted, and bivalent booster metrics. The definition of “primary series complete” has not changed and is based on previous recommendations that CDC has since simplified. A person cannot complete their primary series with a single dose of an updated vaccine. Whereas the booster measures were calculated using the eligible population as the denominator, the new up to date measure uses the total estimated population. Please note that the rates for some groups may change since the up to date measure is calculated differently than the previous booster and bivalent measures.
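The effect of the denominator change can be illustrated with a small calculation. All counts below are hypothetical placeholders chosen for round numbers; only the choice of denominator (eligible population versus total estimated population) comes from the description above.

```python
def coverage_rate(people_counted, denominator):
    """Coverage as a percentage of the chosen denominator population."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * people_counted / denominator


# Hypothetical counts for one group.
people_counted = 30_000
eligible_population = 60_000   # denominator of the earlier booster measures
total_population = 100_000     # denominator of the up to date measure

old_style = coverage_rate(people_counted, eligible_population)
new_style = coverage_rate(people_counted, total_population)

print(f"Eligible-population denominator: {old_style}%")
print(f"Total-population denominator:    {new_style}%")
```

The same count yields a lower percentage against the larger denominator, which is one reason rates for some groups may shift between the two measures.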
This data is from the same source as the Vaccine Progress Dashboard at https://covid19.ca.gov/vaccination-progress-data/ which summarizes vaccination data at the county level by county of residence. Where county of residence was not reported in a vaccination record, the county of provider that vaccinated the resident is included. This applies to less than 1% of vaccination records. The sum of county-level vaccinations does not equal statewide total vaccinations due to out-of-state residents vaccinated in California.
These data do not include doses administered by the following federal agencies who received vaccine allocated directly from CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.
Totals for the Vaccine Progress Dashboard and this dataset may not match, as the Dashboard totals doses by Report Date and this dataset totals doses by Administration Date. Dose numbers may also change for a particular Administration Date as data is updated.
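The mismatch between the two totals can be sketched with a handful of hypothetical dose records. Each record below is an (administration date, report date) pair invented for illustration; the real record layout is not described here.

```python
from collections import Counter
from datetime import date

# Hypothetical dose records: (administration_date, report_date).
DOSES = [
    (date(2023, 1, 1), date(2023, 1, 1)),
    (date(2023, 1, 1), date(2023, 1, 2)),
    (date(2023, 1, 2), date(2023, 1, 2)),
    (date(2023, 1, 2), date(2023, 1, 4)),
]


def daily_totals(records, index):
    """Count doses per day using the date at the given tuple position."""
    return Counter(record[index] for record in records)


by_administration = daily_totals(DOSES, 0)  # how this dataset totals doses
by_report = daily_totals(DOSES, 1)          # how the Dashboard totals doses

for day in sorted(set(by_administration) | set(by_report)):
    print(day, by_administration[day], by_report[day])
```

The overall number of doses is the same in both views, but the per-day totals differ whenever a dose is reported after the day it was administered.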
Previous updates:
On March 3, 2023, following the release of HPI 3.0 in 2022, the previous equity scores were updated to reflect more recent community survey information. This change represents an improvement to the way CDPH monitors health equity by using the latest and most accurate community data available. The HPI uses a collection of data sources and indicators to calculate a measure of community conditions ranging from the most to the least healthy based on economic, housing, and environmental measures.
Starting on July 13, 2022, the denominator for calculating vaccine coverage has been changed from age 5+ to all ages to reflect new vaccine eligibility criteria. Previously the denominator was changed from age 16+ to age 12+ on May 18, 2021, then changed from age 12+ to age 5+ on November 10, 2021, to reflect previous changes in vaccine eligibility criteria. The previous datasets based on age 16+ and age 5+ denominators have been uploaded as archived tables.
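The denominator history above can be expressed as a small date lookup. This is a sketch that assumes each change applies from the stated date onward (inclusive); the function and constant names are illustrative.

```python
from datetime import date

# Denominator before any of the listed changes.
INITIAL_DENOMINATOR = "age 16+"

# (effective date, denominator) pairs from the description, oldest first.
DENOMINATOR_CHANGES = [
    (date(2021, 5, 18), "age 12+"),
    (date(2021, 11, 10), "age 5+"),
    (date(2022, 7, 13), "all ages"),
]


def coverage_denominator(on):
    """Return the denominator population in use on the given date."""
    current = INITIAL_DENOMINATOR
    for effective, group in DENOMINATOR_CHANGES:
        # Assumption: a change applies on its effective date and after.
        if on >= effective:
            current = group
    return current


print(coverage_denominator(date(2021, 6, 1)))
print(coverage_denominator(date(2023, 1, 1)))
```

A lookup like this helps when comparing coverage figures across dates, since rates computed against different denominators are not directly comparable.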
Starting on May 29, 2021, the methodology for calculating on-hand inventory in the shipped/delivered/on-hand dataset changed. Please see the accompanying data dictionary for details. In addition, this dataset is now available down to the ZIP code level.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This chart shows the rate of hospitalizations for short-term complications of diabetes for the most recent data year by age range and county. It also shows the 2017 objective by age range. This chart is based on one of three datasets related to the Prevention Agenda Tracking Indicators county level data posted on this site. Each dataset consists of county level data for 68 health tracking indicators and sub-indicators for the Prevention Agenda 2013-2017: New York State’s Health Improvement Plan. A health tracking indicator is a metric through which progress on a certain area of health improvement can be assessed. The indicators are organized by the Priority Area of the Prevention Agenda as well as the Focus Area under each Priority Area. Each dataset includes tracking indicators for the five Priority Areas of the Prevention Agenda 2013-2017. The most recent year dataset includes the most recent county level data for all indicators. The trend dataset includes the most recent county level data and historical data, where available. Each dataset also includes the Prevention Agenda 2017 state targets for the indicators. Sub-indicators are included in these datasets to measure health disparities among socioeconomic groups. For more information, check out: http://www.health.ny.gov/prevention/prevention_agenda/2013-2017/ and https://www.health.ny.gov/PreventionAgendaDashboard. The "About" tab contains additional details concerning this dataset.
This report uses recent economic modelling to relate cognitive skills – as measured by PISA and other international instruments – to economic growth, demonstrating that relatively small improvements to labour force skills can have a large impact on the future well-being of a nation. The report also shows that it is the quality of learning outcomes, not the length of schooling, which makes the difference. A modest goal of all OECD countries boosting their average PISA scores by 25 points over the next 20 years would increase OECD gross domestic product by USD 115 trillion over the lifetime of the generation born in 2010. More aggressive goals could result in gains in the order of USD 260 trillion.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset provides a comparative analysis of education and health indicators across top countries, including Poland, Finland, Italy, and the USA, among others. The data covers a range of indicators related to education, such as literacy rates, enrollment rates, and education spending, as well as health indicators such as life expectancy, infant mortality rates, and healthcare spending. The data is sourced from various official and publicly available data sources, including the World Bank, the United Nations, and country-specific government websites. Researchers, analysts, and educators can use this dataset to gain insights into the education and health outcomes of different countries, as well as to identify areas for improvement and best practices. The dataset is ideal for cross-country comparative analysis and can be used to inform policy-making, research, and educational programs.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Note: The schema changed in February 2025 - please see below. We will post a roadmap of upcoming changes, but service URLs and schema are now stable. For deployment status of new services beginning in February 2025, see https://gis.data.ca.gov/pages/city-and-county-boundary-data-status. Additional roadmap and status links at the bottom of this metadata.This dataset is regularly updated as the source data from CDTFA is updated, as often as many times a month. If you require unchanging point-in-time data, export a copy for your own use rather than using the service directly in your applications. PurposeCounty boundaries along with third party identifiers used to join in external data. Boundaries are from the California Department of Tax and Fee Administration (CDTFA). These boundaries are the best available statewide data source in that CDTFA receives changes in incorporation and boundary lines from the Board of Equalization, who receives them from local jurisdictions for tax purposes. Boundary accuracy is not guaranteed, and though CDTFA works to align boundaries based on historical records and local changes, errors will exist. If you require a legal assessment of boundary location, contact a licensed surveyor.This dataset joins in multiple attributes and identifiers from the US Census Bureau and Board on Geographic Names to facilitate adding additional third party data sources. In addition, we attach attributes of our own to ease and reduce common processing needs and questions. Finally, coastal buffers are separated into separate polygons, leaving the land-based portions of jurisdictions and coastal buffers in adjacent polygons. This feature layer is for public use. 
Related LayersThis dataset is part of a grouping of many datasets:Cities: Only the city boundaries and attributes, without any unincorporated areasWith Coastal BuffersWithout Coastal BuffersCounties: Full county boundaries and attributes, including all cities within as a single polygonWith Coastal Buffers (this dataset)Without Coastal BuffersCities and Full Counties: A merge of the other two layers, so polygons overlap within city boundaries. Some customers require this behavior, so we provide it as a separate service.With Coastal BuffersWithout Coastal BuffersCity and County AbbreviationsUnincorporated Areas (Coming Soon)Census Designated PlacesCartographic CoastlinePolygonLine source (Coming Soon)State BoundaryWith Bay CutsWithout Bay Cuts Working with Coastal Buffers The dataset you are currently viewing includes the coastal buffers for cities and counties that have them in the source data from CDTFA. In the versions where they are included, they remain as a second polygon on cities or counties that have them, with all the same identifiers, and a value in the COASTAL field indicating if it"s an ocean or a bay buffer. If you wish to have a single polygon per jurisdiction that includes the coastal buffers, you can run a Dissolve on the version that has the coastal buffers on all the fields except OFFSHORE and AREA_SQMI to get a version with the correct identifiers. Point of ContactCalifornia Department of Technology, Office of Digital Services, gis@state.ca.gov Field and Abbreviation DefinitionsCDTFA_COUNTY: CDTFA county name. For counties, this will be the name of the polygon itself. For cities, it is the name of the county the city polygon is within.CDTFA_COPRI: county number followed by the 3-digit city primary number used in the Board of Equalization"s 6-digit tax rate area numbering system. 
The boundary data originate with CDTFA's teams managing tax rate information, so this field is preserved and flows into this dataset.CENSUS_GEOID: numeric geographic identifiers from the US Census BureauCENSUS_PLACE_TYPE: City, County, or Town, stripped off the census name for identification purpose.GNIS_PLACE_NAME: Board on Geographic Names authorized nomenclature for area names published in the Geographic Name Information SystemGNIS_ID: The numeric identifier from the Board on Geographic Names that can be used to join these boundaries to other datasets utilizing this identifier.CDT_COUNTY_ABBR: Abbreviations of county names - originally derived from CalTrans Division of Local Assistance and now managed by CDT. Abbreviations are 3 characters.CDT_NAME_SHORT: The name of the jurisdiction (city or county) with the word "City" or "County" stripped off the end. Some changes may come to how we process this value to make it more consistent.AREA_SQMI: The area of the administrative unit (city or county) in square miles, calculated in EPSG 3310 California Teale Albers.OFFSHORE: Indicates if the polygon is a coastal buffer. Null for land polygons. Additional values include "ocean" and "bay".PRIMARY_DOMAIN: Currently empty/null for all records. Placeholder field for official URL of the city or countyCENSUS_POPULATION: Currently null for all records. In the future, it will include the most recent US Census population estimate for the jurisdiction.GlobalID: While all of the layers we provide in this dataset include a GlobalID field with unique values, we do not recommend you make any use of it. The GlobalID field exists to support offline sync, but is not persistent, so data keyed to it will be orphaned at our next update. Use one of the other persistent identifiers, such as GNIS_ID or GEOID instead. 
Boundary Accuracy
County boundaries were originally derived from a 1:24,000 accuracy dataset, with improvements made in some places to boundary alignments based on research into historical records and boundary changes as CDTFA learns of them. City boundary data are derived from pre-GIS tax maps, digitized at BOE and CDTFA, with adjustments made directly in GIS for new annexations, detachments, and corrections.
Boundary accuracy within the dataset varies. While CDTFA strives to correctly include or exclude parcels from jurisdictions for accurate tax assessment, this dataset does not guarantee that a parcel is placed in the correct jurisdiction. When a parcel is in the correct jurisdiction, this dataset cannot guarantee accurate placement of boundary lines within or between parcels or rights of way. This dataset also provides no information on parcel boundaries. For exact jurisdictional or parcel boundary locations, please consult the county assessor's office and a licensed surveyor.
CDTFA's data is used as the best available source because BOE and CDTFA receive information about changes in jurisdictions which otherwise need to be collected independently by an agency or company to compile into usable map boundaries. CDTFA maintains the best available statewide boundary information.
CDTFA's source data notes the following about accuracy:
City boundary changes and county boundary line adjustments filed with the Board of Equalization per Government Code 54900. This GIS layer contains the boundaries of the unincorporated county and incorporated cities within the state of California. The initial dataset was created in March of 2015 and was based on the State Board of Equalization tax rate area boundaries. As of April 1, 2024, the maintenance of this dataset is provided by the California Department of Tax and Fee Administration for the purpose of determining sales and use tax rates.
The boundaries are continuously being revised to align with aerial imagery when areas of conflict are discovered between the original boundary provided by the California State Board of Equalization and the boundary made publicly available by local, state, and federal government. Some differences may occur between actual recorded boundaries and the boundaries used for sales and use tax purposes. The boundaries in this map are representations of taxing jurisdictions for the purpose of determining sales and use tax rates and should not be used to determine precise city or county boundary line locations.
Boundary Processing
These data make a structural change from the source data. While the full boundaries provided by CDTFA include coastal buffers of varying sizes, many users need boundaries to end at the shoreline of the ocean or a bay. As a result, after examining existing city and county boundary layers, these datasets provide a coastline cut generally along the ocean facing coastline. For county boundaries in northern California, the cut runs near the Golden Gate Bridge, while for cities, we cut along the bay shoreline and into the edge of the Delta at the boundaries of Solano, Contra Costa, and Sacramento counties. In the services linked above, the versions that include the coastal buffers contain them as a second (or third) polygon for the city or county, with the value in the COASTAL field set to whether it's a bay or ocean polygon. These can be processed back into a single polygon by dissolving on all the fields you wish to keep, since the attributes, other than the COASTAL field and geometry attributes (like areas) remain the same between the polygons for this purpose.
Slivers
In cases where a city or county's boundary ends near a coastline, our coastline data may cross back and forth many times while roughly paralleling the jurisdiction's boundary, resulting in many polygon slivers.
We post-process the data to remove these slivers using a city/county boundary priority algorithm. That is, when the data run parallel to each other, we discard the coastline cut and keep the CDTFA-provided boundary, even if it extends into the ocean a small amount. This processing supports consistent boundaries for Fort Bragg, Point Arena, San Francisco, Pacifica, Half Moon Bay, and Capitola, in addition to others. More information on this algorithm will be provided soon.
Coastline Caveats
Some cities have buffers extending into water bodies that we do not cut at the shoreline. These include South Lake Tahoe and Folsom, which extend into neighboring lakes, and San Diego and surrounding cities that extend into San Diego Bay, which our shoreline encloses. If you have feedback on the exclusion of these
https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
Title: Mortality Rate (Under-5, Per 1000 Live Births)
Subtitle: Exploring global trends in child survival and health advancements.
Detailed Description:
This dataset contains the under-5 mortality rate, measured as the number of deaths per 1,000 live births for children under five years of age. Sourced from the World Bank, it highlights progress in child survival and health outcomes globally over decades.
Key Highlights:
- Annual data for countries worldwide.
- Metric: Mortality rate (under-5, per 1000 live births).
- Use cases: Analyze trends, compare regional disparities, and correlate mortality rates with health and economic indicators.
Data Cleaning:
Visualizations:
Descriptive Analysis:
Create a Kaggle notebook with:
1. Data Cleaning: Show how missing or inconsistent values are handled.
2. EDA: Include visualizations like heatmaps, scatterplots, and line charts.
3. Insights: Highlight significant findings, such as countries with notable improvements in child survival.
4. Optional Predictive Modeling: Use regression or time-series models to project future trends.
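As a starting point for the data cleaning and insights steps, here is a minimal sketch using only the Python standard library. It assumes the file uses a wide layout with one column per year and blank cells for missing values; the column names (`Country Name`, `Country Code`) and every number in the sample are placeholders for illustration, not real figures from the dataset.

```python
import csv
import io

# Synthetic placeholder rows in a wide "one column per year" layout.
# Column names and all values are illustrative assumptions, not real data.
SAMPLE_CSV = """Country Name,Country Code,2000,2010,2020
Country A,AAA,100.0,,60.0
Country B,BBB,50.0,40.0,
Country C,CCC,,,
"""


def wide_to_long(text):
    """Turn wide rows into (country, year, rate) records, dropping blank cells."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for key, value in row.items():
            # Year columns are the ones whose header is all digits.
            if key.isdigit() and value and value.strip():
                records.append((row["Country Name"], int(key), float(value)))
    return records


def change_per_country(records):
    """Latest available rate minus earliest available rate, per country."""
    by_country = {}
    for country, year, rate in records:
        by_country.setdefault(country, []).append((year, rate))
    changes = {}
    for country, series in by_country.items():
        series.sort()
        # A change needs at least two observed years.
        if len(series) >= 2:
            changes[country] = series[-1][1] - series[0][1]
    return changes


records = wide_to_long(SAMPLE_CSV)
changes = change_per_country(records)
print(changes)
```

Dropping blank cells rather than filling them keeps the sketch neutral about imputation; a country with fewer than two observed years simply produces no change value.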
GitHub Link: https://github.com/yourusername/Under5_Mortality_Trends
Kaggle Link: https://www.kaggle.com/datasets/yourusername/under5-mortality-rate
Post Title:
📉 Global Trends in Under-5 Mortality Rates 🌍
Post Body:
I’m excited to share my latest dataset on under-5 mortality rates (per 1,000 live births), sourced from the World Bank. This dataset highlights progress in global health and child survival, spanning decades and covering countries worldwide.
📂 Explore the Dataset:
- GitHub Repository: https://github.com/yourusername/Under5_Mortality_Trends
- Kaggle Dataset: https://www.kaggle.com/datasets/yourusername/under5-mortality-rate
Child survival is a fundamental measure of global health progress. This dataset is ideal for:
- Trend Analysis: Explore how under-5 mortality rates have evolved globally.
- Regional Comparisons: Identify disparities in child survival rates across regions.
- Correlations: Study the relationship between mortality rates and economic indicators like healthcare expenditure or GDP per capita.
📈 Get Involved:
- Use the dataset for your own analyses and visualizations.
- Share your insights and findings.
- Upvote the Kaggle dataset to help others discover it!
❓ What trends or correlations do you find in the data?
- Which country or region has shown the most improvement?
- What factors would you analyze further?
Let me know your thoughts, and feel free to share this resource with others who might benefit! 🌟
This is one of three datasets related to the Prevention Agenda Tracking Indicators county level data posted on this site. Each dataset consists of county level data for 68 health tracking indicators and sub-indicators for the Prevention Agenda 2013-2017: New York State’s Health Improvement Plan. A health tracking indicator is a metric through which progress on a certain area of health improvement can be assessed. The indicators are organized by the Priority Area of the Prevention Agenda as well as the Focus Area under each Priority Area. Each dataset includes tracking indicators for the five Priority Areas of the Prevention Agenda 2013-2017. The most recent year dataset includes the most recent county level data for all indicators. The trend dataset includes the most recent county level data and historical data, where available. Each dataset also includes the Prevention Agenda 2017 state targets for the indicators. Sub-indicators are included in these datasets to measure health disparities among socioeconomic groups. For more information, check out: http://www.health.ny.gov/prevention/prevention_agenda/2013-2017/ and https://www.health.ny.gov/PreventionAgendaDashboard, or go to the “About” tab.
High-resolution crop maps over large spatial extents are fundamental to many agricultural applications; however, generating high-quality crop maps consistently across space and time remains a challenge. In this study, we improved a workflow for operational crop mapping and developed the first openly available, annual, 10-m spatial resolution maize and soybean maps over the Contiguous United States (CONUS) from 2019 to 2022. We obtained all available Sentinel-2 surface reflectance data between May and October for every year, applied quality assurance, corrected the bidirectional reflectance distribution function (BRDF) effects, and generated 10-day analysis ready data (ARD) composites. We then derived multi-temporal metrics from the 10-day ARD as training features for the national-scale wall-to-wall mapping. We implemented a stratified, two-stage cluster sampling, and then conducted annual field surveys and collected ground data. Utilizing the training data with Sentinel-2 multi-temporal metrics, we trained random forest models generalized for annual maize and soybean classification separately. Validated using field data from the two-stage cluster sample, our annual maps achieved consistent overall accuracies (OA) greater than 95% with standard errors of less than 1%. User’s accuracies (UAs) and producer’s accuracies (PAs) for maize were higher than 91% and 84% across the years, and UAs and PAs for soybean were greater than 88% and 82%, respectively. To illustrate the substantial improvement of the 10-m map over existing datasets, e.g., the 30-m Cropland Data Layer (CDL), we aggregated the 10-m maps to 30-m spatial resolution and quantified the amount of 30-m mixed pixels that can be reduced at field, regional, and national levels. The counties with the most maize and soybean production in Iowa, Illinois and Nebraska had the lowest reduction in mixed pixels, ranging from 1% to 10%, whereas southern counties had a higher reduction in mixed pixels.
Overall, the median percentages of mixed maize and soybean pixels across all counties were 14% and 16%, respectively, illustrating the substantial benefits of 10-m maps over 30-m maps. With more Sentinel-2-like data available from continuous observations and incoming satellite missions, we anticipate that 10-m crop maps will greatly benefit long-term monitoring for agricultural practices from the field to global scales.
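The aggregation from 10 m to 30 m described above can be illustrated with a small sketch. It assumes a 30-m pixel counts as "mixed" when its 3 x 3 block of 10-m pixels contains both crop and non-crop values; that definition, the function name, and the toy grid are assumptions for illustration and may differ from the study's own procedure.

```python
def mixed_pixel_fraction(grid10, factor=3):
    """Share of coarse cells containing both crop (1) and non-crop (0) pixels.

    grid10 is a 2-D list of 0/1 values at 10-m resolution. Each coarse cell
    covers a factor x factor block; trailing rows or columns that do not fill
    a whole block are ignored.
    """
    rows = len(grid10) // factor
    cols = len(grid10[0]) // factor
    mixed = 0
    for r in range(rows):
        for c in range(cols):
            block = [
                grid10[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            crop = sum(block)
            # Mixed means neither all crop nor all non-crop.
            if 0 < crop < len(block):
                mixed += 1
    return mixed / (rows * cols)


# Toy 6 x 6 grid (synthetic): two pure 30-m cells on top, two mixed below.
toy_grid = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
]
fraction = mixed_pixel_fraction(toy_grid)
print(fraction)  # 0.5
```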
Background
Timely linkage to care (LTC) is key in the HIV care continuum, as it enables people newly diagnosed with HIV (PNWH) to benefit from HIV treatment at the earliest stage. Previous studies have found LTC disparities by individual factors, but data are limited beyond the individual level, especially at the county level. This study examined the temporal and geographic variations of county-level LTC status across 46 counties in South Carolina (SC) from 2010 to 2018 and the association of county-level characteristics with LTC status.
Methods
All adults newly diagnosed with HIV from 2010 to 2018 in SC were included in this study. County-level LTC status was defined as 1 = “high LTC (≥ yearly national LTC percentage)” and 0 = “low LTC (< yearly national LTC percentage)”. A generalized estimating equation model with stepwise selection was employed to examine the relationship between 29 county-level characteristics and LTC status.
Results
The number of counties with high LTC in SC decreased from 34 to 21 from 2010 to 2018. In the generalized estimating equation model, six out of 29 factors were significantly associated with LTC status. Counties with a higher percentage of males (OR = 0.07, 95%CI: 0.02~0.29) and persons with at least four years of college (OR = 0.07, 95%CI: 0.02~0.34) were less likely to have high LTC. However, counties with more mental health centers per PNWH (OR = 45.09, 95%CI: 6.81~298.55) were more likely to have high LTC.
Conclusions
Factors associated with demographic characteristics and healthcare resources contributed to the variations of LTC status at the county level. Interventions that increase access to mental health facilities could help improve LTC.
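The binary outcome defined in the Methods can be sketched as follows. This covers only the coding step (1 when a county's LTC percentage is at or above the national percentage for the same year, otherwise 0), not the generalized estimating equation model. County names and percentages are synthetic placeholders, and the function names are hypothetical.

```python
def classify_counties(county_ltc, national_ltc):
    """Map {(county, year): pct} to {(county, year): 0 or 1}.

    A county-year is coded 1 (high LTC) when its percentage is greater than
    or equal to the national percentage for that year, otherwise 0.
    """
    return {
        (county, year): int(pct >= national_ltc[year])
        for (county, year), pct in county_ltc.items()
    }


def high_ltc_counts(status):
    """Count counties coded as high LTC in each year."""
    counts = {}
    for (_county, year), flag in status.items():
        counts[year] = counts.get(year, 0) + flag
    return counts


# Synthetic placeholder values, not figures from the study.
national = {2010: 70.0, 2018: 80.0}
county = {
    ("County A", 2010): 75.0,
    ("County B", 2010): 70.0,  # equal to the national value counts as high
    ("County C", 2010): 60.0,
    ("County A", 2018): 79.9,
    ("County B", 2018): 85.0,
    ("County C", 2018): 70.0,
}
status = classify_counties(county, national)
counts = high_ltc_counts(status)
print(counts)  # {2010: 2, 2018: 1}
```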
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This chart shows the trend in percentage of preterm births for Albany county. It also shows the 2024 objective. To view the chart for a different county, create a new chart under the "Visualize" tab. This chart is based on the Prevention Agenda Tracking Indicators county level trend data set posted on this site. Each dataset consists of county level data for 68 health tracking indicators and sub-indicators for the Prevention Agenda 2019-2024: New York State’s Health Improvement Plan. A health tracking indicator is a metric through which progress on a certain area of health improvement can be assessed. The indicators are organized by the Priority Area of the Prevention Agenda as well as the Focus Area under each Priority Area. Each dataset includes tracking indicators for the five Priority Areas of the Prevention Agenda 2013-2018. The most recent year dataset includes the most recent county level data for all indicators. The trend dataset includes the most recent county level data and historical data, where available. Each dataset also includes the Prevention Agenda 2018 state targets for the indicators. Sub-indicators are included in these datasets to measure health disparities among socioeconomic groups. For more information, check out: https://www.health.ny.gov/prevention/prevention_agenda/. The "About" tab contains additional details concerning this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Machine learning (ML) and deep learning (DL) models are being increasingly employed for medical imagery analyses, with both approaches used to enhance the accuracy of classification/prediction in the diagnoses of various cancers, tumors and bloodborne diseases. To date however, no review of these techniques and their application(s) within the domain of white blood cell (WBC) classification in blood smear images has been undertaken, representing a notable knowledge gap with respect to model selection and comparison. Accordingly, the current study sought to comprehensively identify, explore and contrast ML and DL methods for classifying WBCs. Following development and implementation of a formalized review protocol, a cohort of 136 primary studies published between January 2006 and May 2023 were identified from the global literature, with the most widely used techniques and best-performing WBC classification methods subsequently ascertained. Studies derived from 26 countries, with highest numbers from high-income countries including the United States (n = 32) and The Netherlands (n = 26). While WBC classification was originally rooted in conventional ML, there has been a notable shift toward the use of DL, and particularly convolutional neural networks (CNN), with 54.4% of identified studies (n = 74) including the use of CNNs, and particularly in concurrence with larger datasets and bespoke features e.g., parallel data pre-processing, feature selection, and extraction. While some conventional ML models achieved up to 99% accuracy, accuracy was shown to decrease in concurrence with decreasing dataset size. Deep learning models exhibited improved performance for more extensive datasets and exhibited higher levels of accuracy in concurrence with increasingly large datasets. Availability of appropriate datasets remains a primary challenge, potentially resolvable using data augmentation techniques. 
Moreover, medical training of computer science researchers is recommended to improve current understanding of leucocyte structure and subsequent selection of appropriate classification models. Likewise, it is critical that future health professionals be made aware of the power, efficacy, precision and applicability of computer science, soft computing and artificial intelligence contributions to medicine, and particularly in areas like medical imaging.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Water is vital for life and local water pollution can damage the environment and affect human health. Governments and private institutions monitor and regulate water quality to protect the environment and populations. The consequences of pollution can reach far and wide, costing companies significant amounts in cleanup costs and loss of reputation. Most countries have official accredited laboratories and sampling teams that use varied technology, global expertise and local knowledge to provide water quality monitoring for different types of water and varied sampling locations. However, one of the main problems in monitoring and assessing water quality, and in meeting minimum standards of potability or usability, is that the analysis of samples relies on local data. In many cases, because of the methodology or technique used or the expertise of the staff handling the samples, the resulting datasets contain a large amount of missing information. How serious this is depends on the analysis to be carried out. For example, a water quality index estimated from such samples may be biased by the loss of information.
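A quick way to see how much information is lost is to measure missingness before computing any index. The sketch below assumes each sample is a record with `None` for a missing measurement; the parameter names (`ph`, `turbidity`, `nitrate`) and values are hypothetical placeholders, not columns or figures from this dataset.

```python
def missing_share(rows):
    """Fraction of samples with a missing value, per parameter."""
    fields = list(rows[0].keys())
    return {
        field: sum(row[field] is None for row in rows) / len(rows)
        for field in fields
    }


def complete_rows(rows):
    """Samples in which every parameter was measured."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Synthetic placeholder samples; None marks a missing measurement.
samples = [
    {"ph": 7.1, "turbidity": None, "nitrate": 2.0},
    {"ph": None, "turbidity": 3.5, "nitrate": None},
    {"ph": 6.8, "turbidity": 4.0, "nitrate": 1.5},
    {"ph": 7.4, "turbidity": None, "nitrate": 2.5},
]
shares = missing_share(samples)
usable = complete_rows(samples)
print(shares)
print(len(usable), "of", len(samples), "samples are complete")
```

In this toy case an index that needs every parameter could use only one of four samples, which is the kind of information loss the paragraph above describes.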
This dataset was used to prepare the manuscript: Efficient improvement for water quality analysis with large amount of missing data. D. Sierra-Porta, M. Tobón-Ospino. This manuscript is being submitted to Sustainable Production and Consumption (2022 Elsevier), Publication of the Institution of Chemical Engineers.
The United States Census Bureau’s International Dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the data set includes midyear population figures broken down by age and gender assignment at birth. Additionally, they provide time-series data for attributes including fertility rates, birth rates, death rates, and migration rates.
The full documentation is available here. For basic field details, please see the data dictionary.
Note: The U.S. Census Bureau provides estimates and projections for countries and areas that are recognized by the U.S. Department of State that have a population of at least 5,000.
This dataset was created by the United States Census Bureau.
Which countries have made the largest improvements in life expectancy? Based on current trends, how long will it take each country to catch up to today’s best performers?
You can use Kernels to analyze, share, and discuss this data on Kaggle, but if you’re looking for real-time updates and bigger data, check out the data on BigQuery, too: https://cloud.google.com/bigquery/public-data/international-census.
The map shows the incidence rate of confirmed high blood lead levels per 1,000 tested children less than 72 months old. A high blood lead level is 10 micrograms or higher per deciliter. Counties are shaded based on quartile distribution. The lighter shaded counties have a lower incidence rate of high blood lead levels. The darker shaded counties have a higher incidence rate of high blood lead levels. New York State Community Health Indicator Reports (CHIRS) were developed in 2012, and are updated annually to consolidate and improve data linkages for the health indicators included in the County Health Assessment Indicators (CHAI) for all communities in New York. The CHIRS present data for more than 300 health indicators that are organized by 15 different health topics. Data are provided for all 62 New York State counties, 11 regions (including New York City), the State excluding New York City, and New York State. For more information, check out: http://www.health.ny.gov/statistics/chac/indicators/. The "About" tab contains additional details concerning this dataset.
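The rate and the quartile shading can be reproduced roughly as follows. This is a sketch under stated assumptions: the counts are synthetic placeholders, the cut points come from Python's `statistics.quantiles` with the "inclusive" method, and a rate equal to a cut point falls in the lower class. The map's actual quartile method is not documented here and may differ.

```python
import bisect
import statistics


def incidence_per_1000(cases, tested):
    """Confirmed high blood lead cases per 1,000 tested children."""
    return 1000 * cases / tested


def quartile_classes(rates):
    """Assign each county a shade class 0 (lightest) to 3 (darkest)."""
    cuts = statistics.quantiles(sorted(rates.values()), n=4, method="inclusive")
    # bisect_left puts a rate equal to a cut point in the lower class.
    return {name: bisect.bisect_left(cuts, rate) for name, rate in rates.items()}


# Synthetic placeholder counts: county -> (confirmed cases, children tested).
counts = {
    "County A": (2, 1000),
    "County B": (5, 1000),
    "County C": (12, 2000),
    "County D": (9, 1000),
    "County E": (30, 2000),
}
rates = {name: incidence_per_1000(c, t) for name, (c, t) in counts.items()}
classes = quartile_classes(rates)
print(rates)
print(classes)
```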
https://creativecommons.org/publicdomain/zero/1.0/
About Dataset UPDATE: Source code used for collecting this data released here
Context YouTube (the world-famous video sharing website) maintains a list of the top trending videos on the platform. According to Variety magazine, “To determine the year’s top-trending videos, YouTube uses a combination of factors including measuring users interactions (number of views, shares, comments and likes). Note that they’re not the most-viewed videos overall for the calendar year”. Top performers on the YouTube trending list are music videos (such as the famously viral “Gangnam Style”), celebrity and/or reality TV performances, and the random dude-with-a-camera viral videos that YouTube is well-known for.
This dataset is a daily record of the top trending YouTube videos.
Note that this dataset is a structurally improved version of this dataset.
Content This dataset includes several months (and counting) of data on daily trending YouTube videos. Data is included for the US, GB, DE, CA, and FR regions (USA, Great Britain, Germany, Canada, and France, respectively), with up to 200 listed trending videos per day.
EDIT: Now includes data from RU, MX, KR, JP and IN regions (Russia, Mexico, South Korea, Japan and India respectively) over the same time period.
Each region’s data is in a separate file. Data includes the video title, channel title, publish time, tags, views, likes and dislikes, description, and comment count.
The data also includes a category_id field, which varies between regions. To retrieve the categories for a specific video, find it in the associated JSON. One such file is included for each of the five regions in the dataset.
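Looking up a category name for a video might look like the sketch below. It assumes the per-region JSON follows the layout of a YouTube Data API video category list response, with an `items` array whose entries carry a string `id` and a `snippet.title`; the inline sample, the video rows, and the function name are illustrative stand-ins rather than contents of the actual files.

```python
import json

# Inline stand-in for one region's category file (assumed layout).
SAMPLE_CATEGORY_JSON = """
{"items": [
  {"id": "1", "snippet": {"title": "Film & Animation"}},
  {"id": "10", "snippet": {"title": "Music"}}
]}
"""


def build_category_lookup(text):
    """Map integer category ids to category titles for one region."""
    payload = json.loads(text)
    # Ids are strings in the JSON; convert so they match integer ids in the data.
    return {int(item["id"]): item["snippet"]["title"] for item in payload["items"]}


lookup = build_category_lookup(SAMPLE_CATEGORY_JSON)

# Hypothetical video rows; only category_id matters for the lookup.
videos = [
    {"title": "Example video", "category_id": 10},
    {"title": "Another video", "category_id": 99},
]
labelled = [
    dict(video, category=lookup.get(video["category_id"], "Unknown"))
    for video in videos
]
print(labelled)
```

Because category ids vary between regions, the lookup should be built from the JSON file that belongs to the same region as the video data.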
For more information on specific columns in the dataset refer to the column metadata.
Acknowledgements This dataset was collected using the YouTube API.
Inspiration Possible uses for this dataset could include:
- Sentiment analysis in a variety of forms
- Categorising YouTube videos based on their comments and statistics.
- Training ML algorithms like RNNs to generate their own YouTube comments.
- Analysing what factors affect how popular a YouTube video will be.
- Statistical analysis over time.
For further inspiration, see the kernels on this dataset!