25 datasets found
  1. Equity Report Data: Geography

    • data.sandiegocounty.gov
    Updated May 21, 2025
    Cite
    Various (2025). Equity Report Data: Geography [Dataset]. https://data.sandiegocounty.gov/dataset/Equity-Report-Data-Geography/p6uw-qxpv
    Explore at:
    Available download formats: application/geo+json, csv, kmz, kml, xlsx, xml
    Dataset updated
    May 21, 2025
    Dataset authored and provided by
    Various
    Description

    This dataset contains the geographic data used to create maps for the San Diego County Regional Equity Indicators Report led by the Office of Equity and Racial Justice (OERJ). The full report can be found here: https://data.sandiegocounty.gov/stories/s/7its-kgpt

    Demographic data from the report can be found here: https://data.sandiegocounty.gov/dataset/Equity-Report-Data-Demographics/q9ix-kfws

    Filter by the Indicator column to select data for a particular indicator map.

    Export notes: Dataset may not automatically open correctly in Excel due to geospatial data. To export the data for geospatial analysis, select Shapefile or GEOJSON as the file type. To view the data in Excel, export as a CSV but do not open the file. Then, open a blank Excel workbook, go to the Data tab, select “From Text/CSV,” and follow the prompts to import the CSV file into Excel. Alternatively, use the exploration options in "View Data" to hide the geographic column prior to exporting the data.
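    For readers who would rather script the step than import through Excel, a CSV export can be filtered by indicator and stripped of its geographic column in a few lines. This is a minimal sketch using only the Python standard library. The `Indicator` column is named in the description above; the geometry column name (`the_geom`) and the sample rows are assumptions made for illustration, not values from the dataset.

```python
import csv
import io

# Placeholder rows standing in for a CSV export; values are invented for illustration.
SAMPLE_CSV = """Indicator,Geography,the_geom
Poverty,ZCTA,"MULTIPOLYGON (((0 0, 1 0, 1 1, 0 0)))"
Life Expectancy,SRA,"MULTIPOLYGON (((2 2, 3 2, 3 3, 2 2)))"
Poverty,ZCTA,"MULTIPOLYGON (((4 4, 5 4, 5 5, 4 4)))"
"""


def filter_indicator(csv_text, indicator, geometry_column="the_geom"):
    """Return rows for one indicator, without the (assumed) geometry column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        if row["Indicator"] == indicator:
            row.pop(geometry_column, None)  # drop the bulky geospatial text
            rows.append(row)
    return rows


poverty_rows = filter_indicator(SAMPLE_CSV, "Poverty")
print(len(poverty_rows), poverty_rows[0])
```

    With a real export, the text would come from the downloaded file rather than an inline string, and the geometry column name should be checked against the file's header row.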

    USER NOTES: 4/7/2025 - The maps and data have been removed for the Health Professional Shortage Areas indicator due to inconsistencies with the data source leading to some missing health professional shortage areas. We are working to fix this issue, including exploring possible alternative data sources.

    5/21/2025 - The following changes were made to the 2023 report data (Equity Report Year = 2023):

    • Self-Sufficiency Wage - a typo in the indicator name was fixed (changed sufficienct to sufficient) and the percent for one PUMA was corrected from 56.9 to 59.9 (PUMA = San Diego County (Northwest)--Oceanside City & Camp Pendleton). Notes were made consistent for all rows where geography = ZCTA. A note was added to all rows where geography = PUMA.
    • Voter registration - label "92054, 92051" was renamed to be in numerical order and is now "92051, 92054". Removed data from the percentile column because the categories are not true percentiles.
    • Employment - data was corrected to show the percent of the labor force that is employed (ages 16 and older). Previously, the data was the percent of the population 16 years and older that is in the labor force.
    • 3- and 4-Year-Olds Enrolled in School - percents are now rounded to one decimal place.
    • Poverty - the last two categories/percentiles changed because the 80th percentile cutoff was corrected by 0.01 and one ZCTA was reassigned to a different percentile as a result.
    • Low Birthweight - the 33th percentile label was corrected to be written as the 33rd percentile.
    • Life Expectancy - corrected the category and percentile assignment for SRA CENTRAL SAN DIEGO.
    • Parks and Community Spaces - corrected the category assignment for six SRAs.

    5/21/2025 - Data was uploaded for Equity Report Year 2025. The following changes were made relative to the 2023 report year. Adverse Childhood Experiences - added geographic data for 2025 report. No calculation of bins nor corresponding percentiles due to small number of geographic areas. Low Birthweight - no calculation of bins nor corresponding percentiles due to small number of geographic areas.

    Prepared by: Office of Evaluation, Performance, and Analytics and the Office of Equity and Racial Justice, County of San Diego, in collaboration with the San Diego Regional Policy & Innovation Center (https://www.sdrpic.org).

  2. Waterworks — water supply system_reporting

    • data.europa.eu
    • gimi9.com
    unknown
    Updated Feb 7, 2022
    Cite
    (2022). Waterworks — water supply system_reporting [Dataset]. https://data.europa.eu/88u/dataset/https-data-norge-no-node-1495
    Explore at:
    Available download formats: unknown
    Dataset updated
    Feb 7, 2022
    License

    http://spdx.org/licenses/NLOD-2.0

    Description

    The data sets provide an overview of selected data on waterworks registered with the Norwegian Food Safety Authority. The information has been reported by the waterworks through application processing or other reporting to the Authority. The drinking water regulations require, among other things, annual reporting, and the Authority has created a separate form service for it. The data sets include public and private waterworks that supply 50 people or more. In addition, all municipally owned businesses with their own water supply are included regardless of size. The data sets also contain decommissioned facilities, for those who wish to view historical data.

    There are data sets for the following supervisory objects:

    1. Water supply system (also includes analyses of drinking water)
    2. Transport system
    3. Treatment facility
    4. Intake point (also includes analyses of the water source)

    Below you will find datasets for: 1. Water supply system_reporting

    In addition, there is a file (information.txt) that gives an overview of when the extracts were produced and how many lines each file contains. The extracts are produced weekly.

    For the water supply system, transport system and intake point data sets, it is also possible to see historical data on what was included in the annual reporting. These files have the _reporting ending in the file name. To make use of that information, link the _reporting file to its parent ("moder") file to get names and other static information.

    Descriptions of the data fields (i.e. metadata) for the individual data sets are provided in separate files in PDF format.

    If you double-click a CSV file and it opens directly in Excel, the characters æ, ø and å will not be displayed correctly. To see the character set correctly in Excel:

    1. Start Excel and open a new spreadsheet.
    2. Select Data, then From Text, and press Import.
    3. Select delimited data and file origin 65001: Unicode (UTF-8), tick "My data has headers", and press Next.
    4. Remove tab as the separator, select semicolon instead, and press Next.
    5. Complete the wizard.

    The data sets can also be imported into a separate database and compiled as desired. There are link keys in the files that make it possible to link the files together. The waterworks are responsible for the quality of the data sets.
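    The same two ideas, reading a semicolon-separated UTF-8 file so that æ, ø and å survive, and linking a _reporting file to its parent file through a link key, can also be sketched outside Excel. The example below uses only the Python standard library. The file names, the column names (`id`, `name`, `year`, `persons_supplied`) and the sample values are hypothetical; the real link keys are documented in the metadata PDFs.

```python
import csv
import os
import tempfile


def read_semicolon_csv(path):
    """Read a semicolon-separated, UTF-8 encoded CSV into a list of dicts."""
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter=";"))


def link_reporting(parent_rows, reporting_rows, key):
    """Attach static parent fields (such as names) to each reporting row."""
    parents = {row[key]: row for row in parent_rows}
    linked = []
    for row in reporting_rows:
        merged = dict(parents.get(row[key], {}))  # parent fields first
        merged.update(row)                        # reporting fields win on clashes
        linked.append(merged)
    return linked


# Hypothetical file contents and column names, written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    parent_path = os.path.join(folder, "parent.csv")
    report_path = os.path.join(folder, "parent_reporting.csv")
    with open(parent_path, "w", encoding="utf-8", newline="") as f:
        f.write("id;name\n1;Blåbærøya vannverk\n")
    with open(report_path, "w", encoding="utf-8", newline="") as f:
        f.write("id;year;persons_supplied\n1;2020;120\n")
    linked_rows = link_reporting(
        read_semicolon_csv(parent_path), read_semicolon_csv(report_path), key="id"
    )

print(linked_rows[0]["name"], linked_rows[0]["year"])
```

    Passing `encoding="utf-8"` explicitly plays the role of the "65001: Unicode (UTF-8)" file-origin setting, and `delimiter=";"` plays the role of choosing semicolon as the separator.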

    Purpose: Make data for drinking water supply available to the public.

  3. Asthma ED Visit Rates by ZIP

    • kaggle.com
    Updated Jan 22, 2023
    Cite
    The Devastator (2023). Asthma ED Visit Rates by ZIP [Dataset]. https://www.kaggle.com/datasets/thedevastator/asthma-ed-visit-rates-by-zip
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 22, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    Asthma ED Visit Rates by ZIP

    Counts and Rates by Age Group in California

    By Health [source]

    About this dataset

    This dataset presents a comprehensive look at asthma-related emergency department visits among California residents. Using age-adjusted rates and county FIPS codes, it gives rates per 10,000 people and shows how the condition affects different age groups by ZIP Code. An associated map view lets users explore the data quickly.

    How to use the dataset

    This dataset contains counts and rates of asthma related emergency department visits by ZIP Code and age group in California. This data can be useful when doing research on asthma related trends or attempting to find correlations between environmental factors, prevalence of disease and geography.

    • Select a year for analysis - the latest year for which data is available is the default selection, but other years are also listed in the dropdown menu.
    • Select an age group to analyze - use the provided dropdown menus to select one or more age groups (all ages, 0-17, 18+), for example to compare two age groups.
    • Define a geographical area by selecting the ZIP code or county FIPS code you wish to obtain data for, based on its availability or importance to your research question.
    • View and download relevant data - after selecting all of the desired criteria (year, age group(s), ZIP code/county FIPS code), click “View Data” then “Download” at the bottom right corner of the window that opens.
    • Analyze the information found - use software such as Microsoft Excel or open-source programs like OpenOffice Calc to gain insight into your downloaded dataset through statistical calculations, graphs, etc. In particular, look out for anomalies that may warrant further investigation.

    Research Ideas

    • Identifying the geographic clusters of asthma sufferers by analyzing the rate of emergency department visits with geographic mapping.
    • Developing outreach initiatives to areas with a high rate of ED visits for asthma to provide education, interventions and resources designed towards increasing preventive care and reducing preventable complications due to lack of access or knowledge about available services in these communities.
    • Assessing disparities in ED visit rates for asthma between age groups as well as between urban and rural areas or different socio-economic groups within counties or ZIP codes in order to identify areas where there is a need for increased interventions, services and other resources related to asthma care in order to reduce the burden or severity of this chronic condition among particularly vulnerable population groups

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: Open Database License (ODbL) v1.0

    You are free to:
    • Share - copy and redistribute the material in any medium or format.
    • Adapt - remix, transform, and build upon the material for any purpose, even commercially.

    You must:
    • Give appropriate credit - provide a link to the license, and indicate if changes were made.
    • ShareAlike - distribute your contributions under the same license as the original.
    • Keep intact - all notices that refer to this license, including copyright notices.
    • No additional restrictions - you may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

    Columns

    File: Asthma_Emergency_Department_Visit_Rates_by_ZIP_Code.csv

    | Column name | Description |
    |:------------|:------------|
    | Year | The year the data was collected. (Integer) |
    | ZIP code | The ZIP code of the area the data was collected from. (String...

  4. Mayor Election 2014 Düsseldorf

    • data.europa.eu
    csv, json
    Updated May 30, 2025
    Cite
    Düsseldorf (2025). Mayor Election 2014 Düsseldorf [Dataset]. https://data.europa.eu/data/datasets/851e793b-50ac-4e57-91fc-a10418b8bb56?locale=en
    Explore at:
    Available download formats: json, csv(33995), csv(272), csv(928), csv(5542), csv(510), csv(51497), csv(3583), csv(1575)
    Dataset updated
    May 30, 2025
    Dataset authored and provided by
    Düsseldorf
    License

    http://dcat-ap.de/def/licenses/other-closed

    Area covered
    Düsseldorf
    Description

    The data set contains the results of the mayoral election on 25 May 2014 and the mayoral runoff election on 15 June 2014 in the City of Düsseldorf.

    The local elections took place on 25 May 2014. Because no clear majority was reached, there was a runoff election of the mayor on 15 June 2014.

    An authority may set up different territorial levels to present the election results, from the lowest level (voting districts) through constituencies and districts up to the level of the city or municipality, district and constituency. However, not all levels are necessary for each type of election. For each of the territorial levels that an authority has set up, there is a file containing an overview of those areas for which quick messages (early result reports) have already been received.

    Further data sets contain information on the division of electoral areas for local elections and the division of voting districts.

    Information on terms in the field of ‘Elections’ can be found in the Election ABC of the interactive learning platform for election workers of the City of Düsseldorf.

    The files are encoded in UTF-8. By default, Excel does not display the umlauts in the files correctly. You can avoid this as follows:

    Excel 2003: From the ‘Data’ menu, select ‘Import external data’ -> ‘Import data’. The ‘Select data source’ dialog opens. Select the file you want to open and press the ‘Open’ button. Then set the file origin to ‘65001: Unicode (UTF-8)’ and continue with the ‘Next’ button. In the next dialog, set the separator to ‘Semicolon’ instead of ‘Tab’ and continue with the ‘Next’ button again. Then select the ‘Text’ option as the data format of the columns and exit the wizard with the ‘Finish’ button. Use the ‘OK’ button to finish the procedure; the UTF-8 encoded data is then displayed correctly in Microsoft Excel.

    Excel 2010: On the ‘Data’ tab, in the section ‘Retrieve external data’, select the option ‘From text’. The ‘Import text file’ dialog opens. Select the file you want to open and press the ‘Open’ button. Then set the file origin to ‘65001: Unicode (UTF-8)’ and continue with the ‘Next’ button. In the next dialog, set the separator to ‘Semicolon’ instead of ‘Tab’ and continue with the ‘Next’ button again. Then select the ‘Text’ option as the data format of the columns and exit the wizard with the ‘Finish’ button. Use the ‘OK’ button to finish the procedure; the UTF-8 encoded data is then displayed correctly in Microsoft Excel.

    The files contain the following column information:

    • Number: Constituency number
    • Name: Name of the constituency
    • MaxQuick Messages: Maximum number of quick messages
    • AnzQuick Messages: Number of quick messages already recorded
    • Eligible voters: Number of eligible voters
    • Filed under: Number of ballot papers submitted
    • Turnout: Voter turnout at the respective view level
    • valid Voting List: Number of valid ballot papers
    • valid: Number of valid votes cast
    • invalid Voting List: Number of invalid ballot papers
    • invalid: Number of invalid votes cast

    In addition, the following fields are available for each party (example of one party called ‘A Party’):

    • A Party: Number of total votes of the party
    • A-Party_Proz: Percentage of total votes of the party from the total result
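    The two derived fields, turnout and a party's percentage, are simple ratios, and recomputing them is a quick consistency check on a downloaded file. The sketch below is illustrative only. It assumes that the "total result" behind the `_Proz` field is the number of valid votes and that turnout is ballot papers submitted divided by eligible voters; the description does not spell out either definition, and the figures used are invented.

```python
def party_share(party_votes, valid_votes):
    """Percentage of valid votes won by one party, rounded to two decimals."""
    if valid_votes == 0:
        return 0.0
    return round(100.0 * party_votes / valid_votes, 2)


def turnout(ballots_submitted, eligible_voters):
    """Turnout as a percentage of eligible voters, rounded to two decimals."""
    if eligible_voters == 0:
        return 0.0
    return round(100.0 * ballots_submitted / eligible_voters, 2)


# Invented example figures, not taken from the Düsseldorf files.
share = party_share(party_votes=250, valid_votes=1000)
rate = turnout(ballots_submitted=1020, eligible_voters=2000)
print(share, rate)
```

    If recomputed values differ from the published ones by more than rounding, the assumed denominators are probably not the ones the files use.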

  5. Bulk Data Provider | Verified B2B & B2C Databases in Excel – India’s Trusted...

    • bulkdataprovider.com
    Updated Jul 4, 2025
    Cite
    Bulk data Provider (2025). Bulk Data Provider | Verified B2B & B2C Databases in Excel – India’s Trusted Source [Dataset]. https://bulkdataprovider.com/blog/articles/bulk-data-provider-verified-b2b-b2c-databases-in-excel-india-s-trusted-source/
    Explore at:
    Dataset updated
    Jul 4, 2025
    Dataset authored and provided by
    Bulk data Provider
    Description

    🚀 Bulk Data Provider – Your Trusted Source for Verified B2B & B2C Databases in India

    Meta Description: Looking for a reliable bulk data provider? Get verified B2B and B2C databases for marketing, telecalling, and lead generation from India’s leading source, Bulk Data…

  6. Verden Source LLC

    • data.cityofchicago.org
    Updated Dec 2, 2025
    Cite
    City of Chicago (2025). Verden Source LLC [Dataset]. https://data.cityofchicago.org/Community-Economic-Development/Verden-Source-LLC/qkv5-pk99
    Explore at:
    Available download formats: application/geo+json, csv, xlsx, kml, kmz, xml
    Dataset updated
    Dec 2, 2025
    Authors
    City of Chicago
    Description

    This dataset contains all current and active business licenses issued by the Department of Business Affairs and Consumer Protection. This dataset contains a large number of records/rows of data and may not be viewable in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Notepad or WordPad, to view and search.

    Data fields requiring description are detailed below.

    APPLICATION TYPE: 'ISSUE' is the record associated with the initial license application. 'RENEW' is a subsequent renewal record. All renewal records are created with a term start date and term expiration date. 'C_LOC' is a change of location record. It means the business moved. 'C_CAPA' is a change of capacity record. Only a few license types may file this type of application. 'C_EXPA' only applies to businesses that have liquor licenses. It means the business location expanded.

    LICENSE STATUS: 'AAI' means the license was issued.

    Business license owners may be accessed at: http://data.cityofchicago.org/Community-Economic-Development/Business-Owners/ezma-pppn To identify the owner of a business, you will need the account number or legal name.

    Data Owner: Business Affairs and Consumer Protection

    Time Period: Current

    Frequency: Data is updated daily
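
    Because the export may be too large for Excel, one option is to stream the CSV row by row instead of opening it whole. The sketch below counts records per APPLICATION TYPE using the Python standard library and the code meanings listed above. The exact header spellings (`APPLICATION TYPE`, `LICENSE STATUS`, and the hypothetical `LICENSE ID`) and the sample rows are assumptions; check them against the header row of the real export.

```python
import csv
import io
from collections import Counter

# Code meanings as given in the field descriptions above.
APPLICATION_TYPES = {
    "ISSUE": "initial license application",
    "RENEW": "renewal",
    "C_LOC": "change of location",
    "C_CAPA": "change of capacity",
    "C_EXPA": "expansion of a liquor-licensed location",
}


def count_application_types(lines):
    """Count rows per APPLICATION TYPE, reading one row at a time."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row["APPLICATION TYPE"]] += 1
    return counts


# Invented sample; with a real export, pass an open file object instead.
sample = io.StringIO(
    "LICENSE ID,APPLICATION TYPE,LICENSE STATUS\n"
    "1,ISSUE,AAI\n"
    "2,RENEW,AAI\n"
    "3,RENEW,AAI\n"
)
counts = count_application_types(sample)
for code, n in counts.items():
    print(code, APPLICATION_TYPES.get(code, "unknown"), n)
```

    Since `csv.DictReader` consumes its input lazily, memory use stays flat regardless of how many rows the file holds.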

  7. HelpSteer: AI Alignment Dataset

    • kaggle.com
    zip
    Updated Nov 22, 2023
    Cite
    The Devastator (2023). HelpSteer: AI Alignment Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/helpsteer-ai-alignment-dataset
    Explore at:
    Available download formats: zip (16614333 bytes)
    Dataset updated
    Nov 22, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    HelpSteer: AI Alignment Dataset

    Real-World Helpfulness Annotated for AI Alignment

    By Huggingface Hub [source]

    About this dataset

    HelpSteer is an open-source dataset designed to support AI alignment through fair, team-oriented annotation. The dataset provides 37,120 samples, each containing a prompt and a response along with five human-annotated attributes scored between 0 and 4, with higher scores indicating better quality. Combining machine learning and natural language processing methods with annotation by data experts, HelpSteer aims to provide a set of standardized values that can be used to measure alignment in human-machine interactions. With responses rated for correctness, coherence, complexity, helpfulness and verbosity, it is intended to help organizations build more reliable AI models and thereby improve user experience.

    How to use the dataset

    How to Use HelpSteer: An Open-Source AI Alignment Dataset

    HelpSteer is an open-source dataset designed to help researchers create models with AI Alignment. The dataset consists of 37,120 different samples each containing a prompt, a response and five human-annotated attributes used to measure these responses. This guide will give you a step-by-step introduction on how to leverage HelpSteer for your own projects.

    Step 1 - Choosing the Data File

    HelpSteer contains two data files, one for training and one for validation. To start exploring the dataset, download train.csv and validation.csv from the Kaggle page linked above or from the Google Drive repository attached here: [link]. Each sample has 7 columns describing a single response: prompt (given), response (submitted), helpfulness, correctness, coherence, complexity and verbosity. The five attribute columns take values between 0 and 4, where a higher value indicates a higher rating in the respective category.

    Step 2 - Exploratory Data Analysis (EDA)

    Once you have the file loaded into your workspace or preferred software environment (e.g. libraries such as Pandas/NumPy, or even Microsoft Excel), explore it further by running some basic EDA commands that summarize each feature's distribution and note potential trends or points of interest. For example, which traits polarize the responses most? Are there outliers that might signal something interesting? Plotting these results often reveals patterns that can be used later, during feature engineering in the modeling phase.

    Step 3 - Data Preprocessing

    Your EDA should yield some hypotheses about which features matter most when estimating the attribute scores of unseen responses. Before starting any modeling, preprocessing such as cleaning up missing entries and handling outliers is highly recommended. If you are unsure about the allowed value range of a specific attribute, refer back to the description on the Kaggle page. Do not rush this stage: low-quality data can lead to poor results later, once a model is deployed.
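
    Steps 2 and 3 can be sketched together in a small standard-library script: summarize each attribute and set aside rows whose scores are missing or fall outside the 0 to 4 range. The column names come from the description above; the sample rows are invented, and treating a bad row as "skip and count" is one possible preprocessing choice, not a recommendation from the dataset authors.

```python
import csv
import io
from statistics import mean

ATTRIBUTES = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]


def summarize(csv_file):
    """Return per-attribute mean scores and the number of rows rejected."""
    totals = {name: [] for name in ATTRIBUTES}
    invalid_rows = 0
    for row in csv.DictReader(csv_file):
        try:
            scores = {name: int(row[name]) for name in ATTRIBUTES}
        except (KeyError, TypeError, ValueError):
            invalid_rows += 1  # missing or non-numeric score
            continue
        if any(not 0 <= value <= 4 for value in scores.values()):
            invalid_rows += 1  # outside the documented 0-4 range
            continue
        for name, value in scores.items():
            totals[name].append(value)
    means = {name: mean(values) for name, values in totals.items() if values}
    return means, invalid_rows


# Invented rows; with the real data, pass an open train.csv file object.
sample = io.StringIO(
    "prompt,response,helpfulness,correctness,coherence,complexity,verbosity\n"
    "p1,r1,4,4,4,2,2\n"
    "p2,r2,2,3,4,1,1\n"
    "p3,r3,,3,4,1,1\n"
)
means, invalid_rows = summarize(sample)
print(means, invalid_rows)
```

    The same summary could be produced with Pandas; the standard library is used here only to keep the sketch dependency-free.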

    Research Ideas

    • Designating and measuring conversational AI engagement goals: Researchers can utilize the HelpSteer dataset to design evaluation metrics for AI engagement systems.
    • Identifying conversational trends: By analyzing the annotations and data in HelpSteer, organizations can gain insights into what makes conversations more helpful, cohesive, complex or consistent across datasets or audiences.
    • Training Virtual Assistants: Train artificial intelligence algorithms on this dataset to develop virtual assistants that respond effectively to customer queries with helpful answers

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    **License: [CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication](https://creativecommons.org/pu...

  8. Data from: S1 Dataset -

    • figshare.com
    xlsx
    Updated Apr 19, 2024
    Cite
    David Mahwera; Erick Killel; Ninael Jonas; Adam Hancy; Anna Zangira; Aika Lekey; Rose Msaki; Doris Katana; Rogath Kishimba; Debora Charwe; Fatma Abdallah; Geofrey Chiduo; Ray Masumo; Germana Leyna; Geofrey Mchau (2024). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0299025.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Apr 19, 2024
    Dataset provided by
    PLOS ONE
    Authors
    David Mahwera; Erick Killel; Ninael Jonas; Adam Hancy; Anna Zangira; Aika Lekey; Rose Msaki; Doris Katana; Rogath Kishimba; Debora Charwe; Fatma Abdallah; Geofrey Chiduo; Ray Masumo; Germana Leyna; Geofrey Mchau
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Background

    The evaluation of surveillance systems has been recommended by the World Health Organization (WHO) to identify their performance and areas for improvement. Universal salt iodization (USI), as one of the surveillance systems in Tanzania, needs periodic evaluation for its optimal function. This study aimed at evaluating the USI surveillance system in Tanzania from January to December 2021 to find out whether the system meets its intended objectives by evaluating its attributes; this was the first evaluation of the USI surveillance system since its establishment in 2010. The USI surveillance system is key for monitoring performance towards the attainment of universal salt iodization (90%).

    Methodology

    This evaluation was guided by the Centers for Disease Control and Prevention (CDC) Guidelines for Evaluating Public Health Surveillance Systems (MMWR) to evaluate USI 2021 data. The study was conducted in Kigoma region in March 2022. Both purposive and convenience sampling were used to select the region, district, and ward for the study. The study involved reviewing documents used in the USI system and interviewing the key informants in the USI program. Data analysis was done in Microsoft Excel and presented in tables and graphs.

    Results

    A total of 1715 salt samples were collected in the year 2021, with 279 (16%) identified as non-iodized salt. The majority of the system attributes (66.7%) had a good performance with a score of three, 22.2% had a moderate performance with a score of two, and one attribute had poor performance with a score of one. Data quality, completeness and sensitivity were 100%; acceptability was 91.6%; for simplicity, 83% were able to collect data on a single sample in < 2 minutes; system stability in terms of performance was >75%; and the usefulness of the system had poor performance.

    Conclusion

    Although the system attributes were found to be working well overall, for proper surveillance of the USI system the core attributes need to be strengthened. Key variables that measure the system performance must be included from the primary data source and well integrated from the Local Government (district and regions) to the Ministry of Health information systems.

  9. 3.07 AZ Merit Data (summary)

    • catalog.data.gov
    • data-academy.tempe.gov
    • +13more
    Updated Jan 17, 2025
    Cite
    City of Tempe (2025). 3.07 AZ Merit Data (summary) [Dataset]. https://catalog.data.gov/dataset/3-07-az-merit-data-summary-55307
    Explore at:
    Dataset updated
    Jan 17, 2025
    Dataset provided by
    City of Tempe
    Description

    This page provides data for the 3rd Grade Reading Level Proficiency performance measure. The dataset includes the student performance results on the English/Language Arts section of the AzMERIT from Fall 2017 and Spring 2018. Data is representative of students in third grade in public elementary schools in Tempe. This includes schools from both Tempe Elementary and Kyrene districts. Results are by school and provide the total number of students tested, total percentage passing and percentage of students scoring at each of the four levels of proficiency. The performance measure dashboard is available at 3.07 3rd Grade Reading Level Proficiency.

    Additional Information

    Source: Arizona Department of Education
    Contact: Ann Lynn DiDomenico
    Contact E-Mail: Ann_DiDomenico@tempe.gov
    Data Source Type: Excel/CSV
    Preparation Method: Filters on original dataset: within "Schools" tab, School District [select Tempe School District and Kyrene School District]; School Name [deselect Kyrene SD not in Tempe city limits]; Content Area [select English Language Arts]; Test Level [select Grade 3]; Subgroup/Ethnicity [select All Students]. Remove irrelevant fields; add Fiscal Year.
    Publish Frequency: Annually as data becomes available
    Publish Method: Manual
    Data Dictionary

  10. Uniquely Popular Businesses

    • kaggle.com
    zip
    Updated Jan 22, 2023
    Cite
    The Devastator (2023). Uniquely Popular Businesses [Dataset]. https://www.kaggle.com/datasets/thedevastator/uniquely-popular-businesses
    Explore at:
    Available download formats: zip (48480 bytes)
    Dataset updated
    Jan 22, 2023
    Authors
    The Devastator
    Description

    Uniquely Popular Businesses

    Rankings of Business Categories in Seattle & NYC Neighborhoods

    By data.world's Admin [source]

    About this dataset

    This dataset contains data used to analyze the uniquely popular business types in the neighborhoods of Seattle and New York City. We used publicly available neighborhood-level shapefiles to identify neighborhoods, and then crossed that information against Yelp's Business Category API to find businesses operating within each neighborhood. The ratio of businesses from each category was compared with its ratio in the entire city to identify significant differences in each neighborhood.

    Any single business with more than one category was repeated for each one, however none of them were ever recorded twice for any single category. Moreover, if a certain business type didn't make up at least 1% of a particular neighborhood's businesses overall it was removed from the analysis altogether.
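
    The comparison and the 1% cutoff described above can be sketched as a small function. The parameter names mirror the `nCount`, `neighborhoodTotal`, `cCount` and `cityTotal` columns of the data files. Dividing the neighborhood share by the citywide share is an assumed way to express the comparison, since the description does not give the exact formula, and the numbers in the example are invented.

```python
def uniqueness_ratio(n_count, neighborhood_total, c_count, city_total, min_share=0.01):
    """Ratio of neighborhood share to citywide share, or None below the cutoff."""
    neighborhood_share = n_count / neighborhood_total
    if neighborhood_share < min_share:
        return None  # under 1% of the neighborhood's businesses: excluded
    city_share = c_count / city_total
    return neighborhood_share / city_share


# Invented figures: 25 of 100 neighborhood businesses vs 125 of 1000 citywide.
ratio = uniqueness_ratio(n_count=25, neighborhood_total=100, c_count=125, city_total=1000)
# 1 of 200 businesses is 0.5% of the neighborhood, so this type is dropped.
dropped = uniqueness_ratio(n_count=1, neighborhood_total=200, c_count=125, city_total=1000)
print(ratio, dropped)
```

    A ratio above 1 means the business type is more common in the neighborhood than in the city as a whole.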

    The data available here is free to use under MIT license, with appropriate attribution given back to Yelp for providing this information. It is an invaluable resource for researchers across different disciplines looking into consumer behavior or clustering within urban areas!

    How to use the dataset

    How to Use This Dataset

    To get started using this dataset:
    • Download the appropriate file for the area you’re researching, either top5_Seattle.csv or top5_NewYorkCity.csv, from the Kaggle site which hosts this dataset (https://www.kaggle.com/puddingmagazine/uniquely-popular-businesses).
    • Read through the information for each column in the Columns section of this description.
    • Take note of the columns relevant to your analysis, such as nCount (the number of businesses of a type in a neighborhood), rank (how popular that business type is overall) and neighborhoodTotal (the total number of businesses in a particular neighborhood).
    • Load your selected file into an application designed for data analysis, such as Jupyter Notebook, Microsoft Excel or Power BI.
    • Begin analyzing where certain types of unique businesses are most common by subsetting rows for specific neighborhoods; alternatively, run regression-based analyses of how a business type's rank varies across neighborhoods.

    If you have any questions about interpreting data from this source, please reach out.

    Research Ideas

    • Analyzing the unique business trends in Seattle and New York City to identify potential investment opportunities.
    • Creating a tool that helps businesses understand what local competition they face by neighborhood.
    • Exploring the distinctions between neighborhoods by plotting the different businesses they have in comparison with each other and other cities.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: top5_Seattle.csv

    | Column name | Description |
    |:------------------|:------------|
    | neighborhood | Name of the neighborhood. (String) |
    | yelpAlias | The Yelp-specified Alias for the business type. (String) |
    | yelpTitle | The Title given to this business type by Yelp. (String) |
    | nCount | Number of businesses with this type within a particular neighborhood. (Integer) |
    | neighborhoodTotal | Total number of businesses located within that particular region. (Integer) |
    | cCount | Number of businesses with this storefront within an entire city. (Integer) |
    | cityTotal | Total number of all types of storefronts within an entire city. (Integer) |
    ...

  11. 1.05 Feeling of Safety in Your Neighborhood (summary)

    • catalog.data.gov
    • performance.tempe.gov
    • +7more
    Updated Nov 1, 2025
    Cite
    City of Tempe (2025). 1.05 Feeling of Safety in Your Neighborhood (summary) [Dataset]. https://catalog.data.gov/dataset/1-05-feeling-of-safety-in-your-neighborhood-summary-8efc2
    Explore at:
    Dataset updated
    Nov 1, 2025
    Dataset provided by
    City of Tempe
    Description

    Tempe’s trust data for this measure is collected every month and comes from the “Safety” result from the monthly administered Police Sentiment Survey. There is one question which feeds into these results: "When it comes to the threat of crime, how safe do you feel in your neighborhood?" Benchmark data is from cohorts of communities with similar characteristics, such as size, population density, and region. This data is collected every month and quarter via a recurring report.

    This page provides data for the Feeling of Safety in Your Neighborhood performance measure. The performance measure dashboard is available at 1.05 Feeling of Safety in Your Neighborhood.

    Data Dictionary

    Additional Information
    Source: Zencity
    Contact: Amber Asburry
    Contact email: strategic_management_innovation@tempe.gov
    Data Source Type: Excel, CSV
    Preparation Method: Take the "Safety" score from the Police Sentiment Survey. This score includes the average of the top two results from the question underneath this area on the report. These months are then averaged to get the quarterly score.
    Publish Frequency: Monthly
    Publish Method: Manual
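
    The last step of the preparation method, averaging monthly scores into a quarterly score, can be sketched as follows. This is a minimal illustration only: the function name and the input layout (a mapping from year and month to a score) are assumptions, and the example scores are made up rather than taken from the survey.

```python
from collections import defaultdict
from statistics import mean


def quarterly_scores(monthly_scores):
    """Average monthly scores into quarters.

    `monthly_scores` maps (year, month) to a monthly "Safety" score.
    Returns a dict mapping (year, quarter) to the mean of that quarter's months.
    """
    by_quarter = defaultdict(list)
    for (year, month), score in monthly_scores.items():
        quarter = (month - 1) // 3 + 1
        by_quarter[(year, quarter)].append(score)
    return {key: mean(values) for key, values in sorted(by_quarter.items())}


# Hypothetical monthly scores, not real survey results.
example = {(2025, 1): 60, (2025, 2): 63, (2025, 3): 66, (2025, 4): 70}
print(quarterly_scores(example))
```

    A quarter with fewer than three months of data is averaged over whatever months are present, which may or may not match how the published figures treat partial quarters.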

  12. Graph Data Integration Platform Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 22, 2025
    + more versions
    Cite
    Growth Market Reports (2025). Graph Data Integration Platform Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/graph-data-integration-platform-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Aug 22, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Graph Data Integration Platform Market Outlook



    According to our latest research, the global graph data integration platform market size reached USD 2.1 billion in 2024, reflecting robust adoption across industries. The market is projected to grow at a CAGR of 18.4% from 2025 to 2033, reaching approximately USD 10.7 billion by 2033. This significant growth is fueled by the increasing need for advanced data management and analytics solutions that can handle complex, interconnected data across diverse organizational ecosystems. The rapid digital transformation and the proliferation of big data have further accelerated the demand for graph-based data integration platforms.
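
    Readers who want to check how a starting value, a CAGR and an end value relate can use the standard compound-growth formula. The sketch below is an illustration, not the report's method: the function names are hypothetical, and treating 2024 to 2033 as 9 compounding periods is an assumption, so the printed results need not reproduce the headline figures exactly.

```python
def project_value(start_value, annual_rate, years):
    """Compound `start_value` at `annual_rate` for `years` periods."""
    return start_value * (1 + annual_rate) ** years


def implied_cagr(start_value, end_value, years):
    """Annual growth rate that takes `start_value` to `end_value` in `years` periods."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the description (USD billions); the period count of 9
# (2024 to 2033) is an assumption about the report's base-year convention.
print(round(project_value(2.1, 0.184, 9), 2))
print(round(implied_cagr(2.1, 10.7, 9), 4))
```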




    The primary growth factor driving the graph data integration platform market is the exponential increase in data complexity and volume within enterprises. As organizations collect vast amounts of structured and unstructured data from multiple sources, traditional relational databases often struggle to efficiently process and analyze these data sets. Graph data integration platforms, with their ability to map, connect, and analyze relationships between data points, offer a more intuitive and scalable solution. This capability is particularly valuable in sectors such as BFSI, healthcare, and telecommunications, where real-time data insights and dynamic relationship mapping are crucial for decision-making and operational efficiency.




    Another significant driver is the growing emphasis on advanced analytics and artificial intelligence. Modern enterprises are increasingly leveraging AI and machine learning to extract actionable insights from their data. Graph data integration platforms enable the creation of knowledge graphs and support complex analytics, such as fraud detection, recommendation engines, and risk assessment. These platforms facilitate seamless integration of disparate data sources, enabling organizations to gain a holistic view of their operations and customers. As a result, investment in graph data integration solutions is rising, particularly among large enterprises seeking to enhance their analytics capabilities and maintain a competitive edge.




    The surge in regulatory requirements and compliance mandates across various industries also contributes to the expansion of the graph data integration platform market. Organizations are under increasing pressure to ensure data accuracy, lineage, and transparency, especially in highly regulated sectors like finance and healthcare. Graph-based platforms excel in tracking data provenance and relationships, making it easier for companies to comply with regulations such as GDPR, HIPAA, and others. Additionally, the shift towards hybrid and multi-cloud environments further underscores the need for robust data integration tools capable of operating seamlessly across different infrastructures, further boosting market growth.




    From a regional perspective, North America currently dominates the graph data integration platform market, accounting for the largest share due to early adoption of advanced data technologies, a strong presence of key market players, and significant investments in digital transformation initiatives. However, Asia Pacific is expected to witness the fastest growth over the forecast period, driven by rapid industrialization, expanding IT infrastructure, and increasing adoption of cloud-based solutions among enterprises in countries like China, India, and Japan. Europe also remains a significant contributor, supported by stringent data privacy regulations and a mature digital economy.





    Component Analysis



    The component segment of the graph data integration platform market is bifurcated into software and services. The software segment currently commands the largest market share, reflecting the critical role of robust graph database engines, visualization tools, and integration frameworks in managing and analyzing complex data relationships. These software solutions are designed to deliver high scalability, flexibility, and real-time proces

  13. Waterworks — intake point_reporting

    • data.europa.eu
    • gimi9.com
    unknown
    Updated Feb 7, 2022
    Cite
    (2022). Waterworks — intake point_reporting [Dataset]. https://data.europa.eu/data/datasets/https-data-norge-no-node-1499/
    Available download formats: unknown
    Dataset updated
    Feb 7, 2022
    License

    http://spdx.org/licenses/NLOD-2.0

    Description

    The data sets provide an overview of selected data on waterworks registered with the Norwegian Food Safety Authority. The information has been reported by the waterworks through application processing or other reporting to the Norwegian Food Safety Authority. Drinking water regulations require, among other things, annual reporting, and the Norwegian Food Safety Authority has created a separate form service for such reporting. The data sets include public or private waterworks that supply 50 people or more. In addition, all municipally owned businesses with their own water supply are included regardless of size. The data sets also contain decommissioned facilities, for those who wish to view historical data, i.e. data for previous years.

    There are data sets for the following supervisory objects:
    1. Water supply system. It also includes analysis of drinking water.
    2. Transport system
    3. Treatment facility
    4. Intake point. It also includes analysis of the water source.

    Below you will find datasets for: 4. Intake point_reporting. In addition, there is a file (information.txt) that provides an overview of when the extracts were produced and how many lines there are in the individual files. The extracts are produced weekly.

    Furthermore, for the data sets water supply system, transport system and intake point, it is possible to see historical data on what is included in the annual reporting. To make use of that information, the file must be linked to the “moder” file to get names and other static information. These files have the _reporting ending in the file name. Descriptions of the data fields (i.e. metadata) in the individual data sets appear in separate files, available in PDF format.

    If you double-click the CSV file and it opens directly in Excel, the characters æ, ø and å will not display correctly. To see the character set correctly in Excel, you must:
    - start Excel and a new spreadsheet
    - select Data and then From Text, and press Import
    - select delimited data and file origin 65001: Unicode (UTF-8), tick "My data has headers" and press Next
    - remove tab as separator and select semicolon as separator, then press Next
    - otherwise, complete the import

    The data sets can also be imported into a separate database and compiled as desired. There are link keys in the files that make it possible to link the files together. The waterworks are responsible for the quality of the datasets.
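
    Outside Excel, the same two settings (UTF-8 encoding and a semicolon separator) can be applied when parsing the files in code. The sketch below is a minimal illustration using Python's standard library; the function names and the two-column sample are hypothetical, and whether the real files start with a byte order mark is not stated, so the decoder used here accepts both cases.

```python
import csv
import io


def parse_semicolon_csv(raw_bytes):
    """Decode UTF-8 bytes (with or without a BOM) and parse ';'-separated rows."""
    text = raw_bytes.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def read_semicolon_csv(path):
    """Read a semicolon-separated UTF-8 file from disk."""
    with open(path, "rb") as handle:
        return parse_semicolon_csv(handle.read())


# Hypothetical two-column sample; the real files have different fields.
sample = "navn;kommune\nBlåbærvann;Tromsø\n".encode("utf-8")
print(parse_semicolon_csv(sample))
```

    Rows come back as dictionaries keyed by the header line, so the link keys mentioned above could be used to match a _reporting row to its master record.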

    Purpose: Make data for drinking water supply available to the public.

  14. Yelp Reviews Sentiment Dataset

    • kaggle.com
    zip
    Updated Nov 25, 2022
    Cite
    The Devastator (2022). Yelp Reviews Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/yelp-reviews-sentiment-dataset/code
    Available download formats: zip (169587518 bytes)
    Dataset updated
    Nov 25, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Yelp Reviews Sentiment Dataset

    A Challenge for Natural Language Processing

    By Huggingface Hub [source]

    About this dataset

    The Yelp Reviews Polarity dataset is a collection of Yelp reviews that have been labeled as positive or negative. This dataset is perfect for natural language processing tasks such as sentiment analysis

    How to use the dataset

    This YELP reviews dataset is a great natural language processing dataset for anyone looking to get started with text classification. The data is split into two files: train.csv and test.csv. The training set contains 7,000 reviews with labels (0 = negative, 1 = positive), and the test set contains 3,000 unlabeled reviews.

    To get started with this dataset, download the two CSV files and put them in the same directory. Then, open up train.csv in your favorite text editor or spreadsheet software (I like using Microsoft Excel). Next, take a look at the first few rows of data to get a feel for what you're working with:

    text | label
    So there is no way for me to plug it in here in the US unless I go by... | 0
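
    As an alternative to opening the file in a spreadsheet, a first look at the label balance can be sketched in a few lines. This is a minimal illustration: the function name is hypothetical, the sample reviews are made up, and only the text and label column names come from the description.

```python
import csv
import io
from collections import Counter


def label_counts(csv_stream):
    """Count how many reviews carry each label in a text/label CSV."""
    reader = csv.DictReader(csv_stream)
    return Counter(row["label"] for row in reader)


# Made-up reviews in the documented text/label layout.
sample = io.StringIO(
    "text,label\n"
    '"Great food, friendly staff.",1\n'
    '"Cold fries and a long wait.",0\n'
    '"Would come back again.",1\n'
)
print(label_counts(sample))
```

    Labels are read as strings, so "0" and "1" would need converting to integers before training a classifier.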

    Research Ideas

    • This dataset could be used to train a machine learning model to classify Yelp reviews as positive or negative.
    • This dataset could be used to train a machine learning model to predict the star rating of a Yelp review based on the text of the review.
    • This dataset could be used to build a natural language processing system that generates fake Yelp reviews

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv

    | Column name | Description |
    |:--------------|:----------------------------------|
    | text | The text of the review. (string) |
    | label | The label of the review. (string) |

    File: test.csv

    | Column name | Description |
    |:--------------|:----------------------------------|
    | text | The text of the review. (string) |
    | label | The label of the review. (string) |

    Acknowledgements

    If you use this dataset in your research, please credit Huggingface Hub.

  15. Google Certificate BellaBeats Capstone Project

    • kaggle.com
    zip
    Updated Jan 5, 2023
    Cite
    Jason Porzelius (2023). Google Certificate BellaBeats Capstone Project [Dataset]. https://www.kaggle.com/datasets/jasonporzelius/google-certificate-bellabeats-capstone-project
    Available download formats: zip (169161 bytes)
    Dataset updated
    Jan 5, 2023
    Authors
    Jason Porzelius
    Description

    Introduction: I have chosen to complete a data analysis project for the second course option, Bellabeats, Inc., using a locally hosted database program, Excel for both my data analysis and visualizations. This choice was made primarily because I live in a remote area and have limited bandwidth and inconsistent internet access. Therefore, completing a capstone project using web-based programs such as R Studio, SQL Workbench, or Google Sheets was not a feasible choice. I was further limited in which option to choose as the datasets for the ride-share project option were larger than my version of Excel would accept. In the scenario provided, I will be acting as a Junior Data Analyst in support of the Bellabeats, Inc. executive team and data analytics team. This combined team has decided to use an existing public dataset in hopes that the findings from that dataset might reveal insights which will assist in Bellabeat's marketing strategies for future growth. My task is to provide data driven insights to business tasks provided by the Bellabeats, Inc.'s executive and data analysis team. In order to accomplish this task, I will complete all parts of the Data Analysis Process (Ask, Prepare, Process, Analyze, Share, Act). In addition, I will break each part of the Data Analysis Process down into three sections to provide clarity and accountability. Those three sections are: Guiding Questions, Key Tasks, and Deliverables. For the sake of space and to avoid repetition, I will record the deliverables for each Key Task directly under the numbered Key Task using an asterisk (*) as an identifier.

    Section 1 - Ask:

    A. Guiding Questions:
    1. Who are the key stakeholders and what are their goals for the data analysis project?
    2. What is the business task that this data analysis project is attempting to solve?

    B. Key Tasks:
    1. Identify key stakeholders and their goals for the data analysis project.
      *The key stakeholders for this project are as follows:
      -Urška Sršen and Sando Mur - co-founders of Bellabeats, Inc.
      -Bellabeats marketing analytics team. I am a member of this team.

    2. Identify the business task.
      *The business task is:
      -As provided by co-founder Urška Sršen, the business task for this project is to gain insight into how consumers are using their non-BellaBeats smart devices in order to guide upcoming marketing strategies for the company which will help drive future growth. Specifically, the researcher was tasked with applying insights driven by the data analysis process to 1 BellaBeats product and presenting those insights to BellaBeats stakeholders.

    Section 2 - Prepare:

    A. Guiding Questions:
    1. Where is the data stored and organized?
    2. Are there any problems with the data?
    3. How does the data help answer the business question?

    B. Key Tasks:

    1. Research and communicate the source of the data, and how it is stored/organized to stakeholders. *The data source used for our case study is FitBit Fitness Tracker Data. This dataset is stored in Kaggle and was made available through user Mobius in an open-source format. Therefore, the data is public and available to be copied, modified, and distributed, all without asking the user for permission. These datasets were generated by respondents to a distributed survey via Amazon Mechanical Turk reportedly (see credibility section directly below) between 03/12/2016 thru 05/12/2016.
      *Reportedly (see credibility section directly below), thirty eligible Fitbit users consented to the submission of personal tracker data, including output related to steps taken, calories burned, time spent sleeping, heart rate, and distance traveled. This data was broken down into minute, hour, and day level totals. This data is stored in 18 CSV documents. I downloaded all 18 documents into my local laptop and decided to use 2 documents for the purposes of this project as they were files which had merged activity and sleep data from the other documents. All unused documents were permanently deleted from the laptop. The 2 files used were: -sleepDay_merged.csv -dailyActivity_merged.csv

    2. Identify and communicate to stakeholders any problems found with the data related to credibility and bias.
      *As will be more specifically presented in the Process section, the data seems to have credibility issues related to the reported time frame of the data collected. The metadata seems to indicate that the data collected covered roughly 2 months of FitBit tracking. However, upon my initial data processing, I found that only 1 month of data was reported.
      *As will be more specifically presented in the Process section, the data has credibility issues related to the number of individuals who reported FitBit data. Specifically, the metadata communicates that 30 individual users agreed to report their tracking data. My initial data processing uncovered 33 individual ...
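
    The two credibility checks described here, counting distinct users and finding the date span actually covered, can be sketched as follows. The author did this work in Excel; this Python version is only an illustration. The function name, the "Id" and "ActivityDate" field names and the month/day/year date format are assumptions about the CSV layout, and the rows are made up.

```python
from datetime import datetime


def summarize_tracking(rows, id_field="Id", date_field="ActivityDate", date_format="%m/%d/%Y"):
    """Return (distinct user count, first date, last date) for tracker rows."""
    user_ids = set()
    dates = []
    for row in rows:
        user_ids.add(row[id_field])
        dates.append(datetime.strptime(row[date_field], date_format).date())
    return len(user_ids), min(dates), max(dates)


# Made-up rows; field names and date format are assumptions about the CSV layout.
rows = [
    {"Id": "1001", "ActivityDate": "4/12/2016"},
    {"Id": "1001", "ActivityDate": "4/13/2016"},
    {"Id": "1002", "ActivityDate": "5/12/2016"},
]
users, first_day, last_day = summarize_tracking(rows)
print(users, first_day, last_day)
```

    Comparing the returned user count and date range with the figures in the metadata is one way to surface the kind of mismatch noted above.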

  16. Student Performance Factors (Excel Analysis)

    • kaggle.com
    zip
    Updated Nov 17, 2025
    Cite
    Kino (2025). Student Performance Factors (Excel Analysis) [Dataset]. https://www.kaggle.com/datasets/kinozyne/student-performance-factors-excel-analysis
    Available download formats: zip (973447 bytes)
    Dataset updated
    Nov 17, 2025
    Authors
    Kino
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    📊 Student Performance Analysis

    Project: Data Analysis using Excel Pivot Tables & Charts

    Executive Summary

    Based on the analysis of 6,607 students, this project identifies that active student habits (Attendance, Tutoring) are stronger predictors of success than environmental factors (Income, Resources).

    Key Insights

    1. Show Up: Attendance is the #1 driver of success.
    2. Get Help: Students attending 6 tutoring sessions/week scored 5 points higher on average.
    3. Sleep Myth: Sleep duration showed no correlation with exam scores.
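
    A claim such as "no correlation" can be checked with a Pearson coefficient. The project itself used Excel; the sketch below is a hypothetical Python equivalent with made-up values, and the variable names are not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values, not taken from the dataset.
sleep_hours = [6, 7, 8, 7, 6]
exam_scores = [70, 65, 72, 68, 71]
print(round(pearson(sleep_hours, exam_scores), 3))
```

    A coefficient near 0 indicates little linear relationship; the function raises a division error if either sequence is constant.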

    Tools Used

    • Microsoft Excel: Pivot Tables, Advanced Charting, Statistical Analysis, Data Cleaning.

    Source of Dataset(.csv)

  17. Cyclistic Bike-share

    • kaggle.com
    zip
    Updated May 15, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Arsenio Clark (2023). Cyclistic Bike-share [Dataset]. https://www.kaggle.com/datasets/arsenioclark/cyclistic-bike-share
    Available download formats: zip (590509171 bytes)
    Dataset updated
    May 15, 2023
    Authors
    Arsenio Clark
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Introduction

    This case study is based on Cyclistic, a bike sharing company in Chicago. I will perform the tasks of a junior data analyst to answer business questions, following a process that includes these phases: ask, prepare, process, analyze, share and act.

    Background

    Cyclistic is a bike sharing company that operates 5828 bikes within 692 docking stations. The company has been around since 2016 and separates itself from the competition by offering a variety of bike services, including assistive options. Lily Moreno is the director of the marketing team and will receive the insights from this analysis.

    Case Study and business task

    Lily Moreno's perspective on how to generate more income by marketing Cyclistic's services correctly includes converting casual riders (one-day passes and/or pay-per-ride customers) into annual riders with a membership. Annual riders are more profitable than casual riders according to the finance analysts. She would rather see a campaign that converts casual riders into annual riders than campaigns targeting new customers. So her strategy as the manager of the marketing team is simply to maximize the number of annual riders by converting casual riders.

    In order to make a data-driven decision, Moreno needs the following insights:

    • A better understanding of how casual riders and annual riders differ
    • Why a casual rider would become an annual one
    • How digital media can affect the marketing tactics

    Moreno has directed me to the first question: how do casual riders and annual riders differ?

    Stakeholders

    • Lily Moreno, manager of the marketing team
    • Cyclistic Marketing team
    • Executive team

    Data sources and organization

    The data used in this report is made available and licensed by Motivate International Inc. Personal data is hidden to protect personal information. The data covers the past 12 months (03/2022 – 02/2023) of the bike share dataset.

    Merging all 12 monthly bike share files returned 5,785,180 rows, all of which were included in this analysis.

    Data security and limitations: Personal information is secured and hidden to prevent unlawful use. Original files are backed up in folders and subfolders.

    Tools and documentation of cleaning process

    The tool used for data verification and data cleaning is Microsoft Excel. The original files made accessible by Motivate International Inc. are backed up in their original format and in separate files.

    Microsoft Excel is used to look through the dataset and get an overview of the content. I performed simple checks of the data by filtering, sorting, formatting and standardizing the data to make it easily mergeable. In Excel, I also changed data types to the right format, removed data that was incomplete or incorrect, created new columns to subtract and reformat existing columns, and deleted empty cells. These tasks are easily done in spreadsheets and provide an initial cleaning of the data.

    Limitations

    Microsoft Excel has a limit of 1,048,576 rows, while the 12 months of data combined total 5,785,180 rows. When the 12 months of data are combined into one table/sheet, Excel is no longer efficient, so I switched over to R programming.
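
    The merge step and the row-limit check can be sketched in code. The author used R for this; the version below is a hypothetical Python stand-in, and the function names, column names and sample rows are assumptions made for illustration.

```python
import csv
import io

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit cited in the text


def merge_monthly(streams):
    """Concatenate rows from several CSV streams that share a header."""
    merged = []
    for stream in streams:
        merged.extend(csv.DictReader(stream))
    return merged


def fits_in_excel(row_count, header_rows=1):
    """True if the data rows plus header fit in a single worksheet."""
    return row_count + header_rows <= EXCEL_MAX_ROWS


# Two made-up monthly extracts; column names are hypothetical.
march = io.StringIO("ride_id,member_casual\nA1,member\nA2,casual\n")
april = io.StringIO("ride_id,member_casual\nB1,casual\n")
rides = merge_monthly([march, april])
print(len(rides), fits_in_excel(len(rides)), fits_in_excel(5_785_180))
```

    Holding several million rows as dictionaries uses a lot of memory, so a real merge of this size may be better streamed row by row into a single output file.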

  18. Bee Colony Dynamics

    • kaggle.com
    zip
    Updated Dec 6, 2022
    Cite
    The Devastator (2022). Bee Colony Dynamics [Dataset]. https://www.kaggle.com/datasets/thedevastator/uncovering-bee-colony-dynamics-with-usda-statist
    Available download formats: zip (141016 bytes)
    Dataset updated
    Dec 6, 2022
    Authors
    The Devastator
    Description

    Bee Colony Dynamics

    U.S. Survey and Census Data

    By Brenda Griffith [source]

    About this dataset

    This dataset provides access to the rich and expansive bee colony contribution information captured by the United States Department of Agriculture National Agricultural Statistics Service Quick Stats Dataset. Here, you will find survey data by state and census data by county, which can help inform decision-making and research related to beekeeping practices.
    Our dataset also covers Bee Informed Partnership's original data on bee colony loss in the U.S., including percentages of colonies exclusive to each state, total winter all loss, number of beekeepers exclusive to each state, as well as overall percentage of populations experiencing losses during a given period. With this valuable data at hand, it becomes easier for policymakers, business owners and researchers to make well informed decisions that help protect our nation's most important pollinators!

    How to use the dataset

    The first three datasets contain information on colonies and their management, while BIP Bee Loss Clean contains information specifically on bee losses during specific periods and years across all 50 states. For each dataset there is a “Search Criteria” Excel file that gives more detail about what data was collected, which helps when interpreting query results or performing feature selection or dimensionality reduction. Depending on the use case, different levels of aggregation may be useful: state-level and county-level data can be combined to draw connections between them over time. The datasets include fields such as year, period, state code, agricultural district code, watershed code, region, and various others related to the characteristics of colonies and their management in different states over multiple years (from 2017–2020). Choose only the columns that serve your analysis; loading unnecessary columns consumes more processing resources than needed and reduces efficiency.

    You may also look at similar data collections, such as the EPA's official Pollutant Release Inventory (PRI) dataset, which covers pollutant releases (emissions and transfers) and can provide additional insight into pollutants released across various water bodies and farms within the US and their impacts on ecosystems and ecosystem services. When mining the data, pay attention to computing resources and avoid redundant preprocessing steps where possible.

    Research Ideas

    • Developing an app or device to help beekeepers track their hive health and productivity over time. This could include tracking population of bees, diseases and pests in their hives, and performance in producing honey.
    • Analyzing the data to identify geographical regions with the most favorable conditions for successful bee colonies, which can then be used to better inform prospective new beekeepers as well as eager honey consumers looking for ethically sourced honey products.
    • Examining how trends in pollen sources available affect beehive production by correlating census data with changes in local flora or climate over time (e.g., studying areas experiencing severe drought vs those with abundant blooming plants)

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: Dataset copyright by authors - You are free to: - Share - copy and redistribute the material in any medium or format for any purpose, even commercially. - Adapt - remix, transform, and build upon the material for any purpose, even commercially. - You must: - Give appropriate credit - Provide a link to the license, and indicate if changes were made. - ShareAlike - You must distribute your contributions u...

  19. Global Country Information Dataset 2023

    • kaggle.com
    zip
    Updated Jul 8, 2023
    + more versions
    Cite
    Nidula Elgiriyewithana ⚡ (2023). Global Country Information Dataset 2023 [Dataset]. https://www.kaggle.com/datasets/nelgiriyewithana/countries-of-the-world-2023
    Available download formats: zip (24063 bytes)
    Dataset updated
    Jul 8, 2023
    Authors
    Nidula Elgiriyewithana ⚡
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.

    Key Features

    • Country: Name of the country.
    • Density (P/Km2): Population density measured in persons per square kilometer.
    • Abbreviation: Abbreviation or code representing the country.
    • Agricultural Land (%): Percentage of land area used for agricultural purposes.
    • Land Area (Km2): Total land area of the country in square kilometers.
    • Armed Forces Size: Size of the armed forces in the country.
    • Birth Rate: Number of births per 1,000 population per year.
    • Calling Code: International calling code for the country.
    • Capital/Major City: Name of the capital or major city.
    • CO2 Emissions: Carbon dioxide emissions in tons.
    • CPI: Consumer Price Index, a measure of inflation and purchasing power.
    • CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.
    • Currency_Code: Currency code used in the country.
    • Fertility Rate: Average number of children born to a woman during her lifetime.
    • Forested Area (%): Percentage of land area covered by forests.
    • Gasoline_Price: Price of gasoline per liter in local currency.
    • GDP: Gross Domestic Product, the total value of goods and services produced in the country.
    • Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.
    • Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.
    • Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.
    • Largest City: Name of the country's largest city.
    • Life Expectancy: Average number of years a newborn is expected to live.
    • Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.
    • Minimum Wage: Minimum wage level in local currency.
    • Official Language: Official language(s) spoken in the country.
    • Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.
    • Physicians per Thousand: Number of physicians per thousand people.
    • Population: Total population of the country.
    • Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.
    • Tax Revenue (%): Tax revenue as a percentage of GDP.
    • Total Tax Rate: Overall tax burden as a percentage of commercial profits.
    • Unemployment Rate: Percentage of the labor force that is unemployed.
    • Urban Population: Percentage of the population living in urban areas.
    • Latitude: Latitude coordinate of the country's location.
    • Longitude: Longitude coordinate of the country's location.

    Potential Use Cases

    • Analyze population density and land area to study spatial distribution patterns.
    • Investigate the relationship between agricultural land and food security.
    • Examine carbon dioxide emissions and their impact on climate change.
    • Explore correlations between economic indicators such as GDP and various socio-economic factors.
    • Investigate educational enrollment rates and their implications for human capital development.
    • Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.
    • Study labor market dynamics through indicators such as labor force participation and unemployment rates.
    • Investigate the role of taxation and its impact on economic development.
    • Explore urbanization trends and their social and environmental consequences.
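
The first use case above can be sketched in a few lines. This is a minimal illustration, not the dataset author's method. It assumes the CSV stores numbers as strings that may contain thousands separators, and the two sample rows are made-up placeholders rather than real values. Column names follow the Key Features list.

```python
import csv
import io

# Made-up placeholder rows; real values come from the downloaded CSV.
SAMPLE_CSV = """Country,Population,Land Area (Km2)
Country A,"1,000,000","50,000"
Country B,"250,000","1,000"
"""

def parse_number(text):
    """Convert a string such as '1,000,000' to a float; return None if blank."""
    cleaned = text.replace(",", "").strip()
    return float(cleaned) if cleaned else None

def population_density(rows):
    """Return {country: persons per square kilometer} for rows with usable values."""
    densities = {}
    for row in rows:
        population = parse_number(row["Population"])
        area = parse_number(row["Land Area (Km2)"])
        if population is None or not area:
            continue  # skip missing values and zero area
        densities[row["Country"]] = population / area
    return densities

densities = population_density(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for country, density in sorted(densities.items()):
    print(f"{country}: {density:.1f} persons per km2")
```

Comparing the recomputed figure with the Density (P/Km2) column is one quick consistency check before running any spatial analysis.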

    Data Source: This dataset was compiled from multiple data sources.

    If this was helpful, a vote is appreciated ❤️ Thank you 🙂

  20. Covid-19 Food Insecurity Data

    • kaggle.com
    zip
    Updated Sep 13, 2021
    Cite
    Jack Ogozaly (2021). Covid-19 Food Insecurity Data [Dataset]. https://www.kaggle.com/datasets/jackogozaly/pulse-survey-food-insecurity-data
    Explore at:
    Available download formats: zip (6230854 bytes)
    Dataset updated
    Sep 13, 2021
    Authors
    Jack Ogozaly
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    What's in the Data?

    This dataset tracks food insecurity across different demographics from 4/23/2020 to 8/23/2021. It contains fields such as Race, Education, Sex, State, Income, etc. If you're looking for a dataset to examine Covid-19's impact on food insecurity for different demographics, then here you are!

    Data Source

    This data is from the United States Census Bureau's Pulse Survey. The Pulse Survey is a frequently updated survey designed to collect data on how people's lives have been impacted by the coronavirus. Specifically, this dataset is a cleaned-up version of the "Food Sufficiency for Households, in the Last 7 Days, by Select Characteristics" tables.

    The original form of this data can be found at: https://www.census.gov/programs-surveys/household-pulse-survey/data.html

    What was done to this data?

    The original form of this data was split into 36 Excel files containing ~67 sheets each. The data was in a non-tidy format, and the questions were not entirely standard across files. This dataset is my attempt to combine all these different files, tidy the data up, and merge slightly different questions together.
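
Tidying here means moving from one column per response category to one row per observation. A minimal sketch of that reshaping step follows, using invented column names and values, since the source does not list the actual sheet headers. It is not the author's actual pipeline.

```python
def to_tidy(wide_rows, id_column, value_columns):
    """Turn one wide row per group into one (group, category, value) row per cell."""
    tidy = []
    for row in wide_rows:
        for column in value_columns:
            tidy.append({
                id_column: row[id_column],
                "category": column,
                # A sheet that lacks this category yields None (an NA).
                "value": row.get(column),
            })
    return tidy

# Hypothetical wide rows; column names and numbers are placeholders.
wide = [
    {"group": "A", "enough_food": 10, "sometimes_not_enough": 2},
    {"group": "B", "enough_food": 7},  # second category absent in this sheet
]
tidy = to_tidy(wide, "group", ["enough_food", "sometimes_not_enough"])
print(len(tidy))
```

Forcing every sheet into the same set of categories is what produces empty cells wherever a sheet did not ask that question.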

    Why are there so many NAs?

    The large number of NAs is a consequence of how messy the data was originally and of forcing it into a tidy format. Just filter the NAs out for the question you want to analyze and you'll be fine.
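
Filtering out the NAs for one question might look like the following. The field names `question`, `state`, and `value`, the sample rows, and the literal "NA" marker are all assumptions for illustration. Check the actual CSV header before reusing this.

```python
import csv
import io

# Placeholder data; the real file's header and NA marker may differ.
SAMPLE = """question,state,value
food_sufficiency,CA,120
food_sufficiency,TX,NA
free_groceries,CA,NA
"""

def rows_for_question(reader, question, na_markers=("", "NA")):
    """Keep rows for one question whose value is not an NA marker."""
    return [
        row for row in reader
        if row["question"] == question and row["value"] not in na_markers
    ]

kept = rows_for_question(csv.DictReader(io.StringIO(SAMPLE)), "food_sufficiency")
print(kept)
```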

Cite
Various (2025). Equity Report Data: Geography [Dataset]. https://data.sandiegocounty.gov/dataset/Equity-Report-Data-Geography/p6uw-qxpv

Equity Report Data: Geography

Explore at:
Available download formats: application/geo+json, csv, kmz, kml, xlsx, xml
Dataset updated
May 21, 2025
Dataset authored and provided by
Various
Description

This dataset contains the geographic data used to create maps for the San Diego County Regional Equity Indicators Report led by the Office of Equity and Racial Justice (OERJ). The full report can be found here: https://data.sandiegocounty.gov/stories/s/7its-kgpt

Demographic data from the report can be found here: https://data.sandiegocounty.gov/dataset/Equity-Report-Data-Demographics/q9ix-kfws

Filter by the Indicator column to select data for a particular indicator map.

Export notes: Dataset may not automatically open correctly in Excel due to geospatial data. To export the data for geospatial analysis, select Shapefile or GEOJSON as the file type. To view the data in Excel, export as a CSV but do not open the file. Then, open a blank Excel workbook, go to the Data tab, select “From Text/CSV,” and follow the prompts to import the CSV file into Excel. Alternatively, use the exploration options in "View Data" to hide the geographic column prior to exporting the data.
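
For readers who would rather script the export than go through Excel, a rough Python equivalent follows. It assumes the CSV export has an `Indicator` column (named in the description above) and a geometry column called `the_geom`. The geometry column name is hypothetical, so adjust it to the real header.

```python
import csv
import sys

def load_indicator(path, indicator, geometry_column="the_geom"):
    """Read a CSV export, keep rows for one indicator, drop the geometry column."""
    # Geometry cells can be very long; raise the csv field size limit.
    csv.field_size_limit(min(sys.maxsize, 2**31 - 1))
    with open(path, newline="", encoding="utf-8") as handle:
        rows = []
        for row in csv.DictReader(handle):
            if row.get("Indicator") != indicator:
                continue
            row.pop(geometry_column, None)  # hide the geographic column
            rows.append(row)
    return rows
```

A call such as `load_indicator("equity_geography.csv", "Poverty")` would return the matching rows as dictionaries. Both the file name and the exact spelling of the indicator value are placeholders here.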

USER NOTES: 4/7/2025 - The maps and data have been removed for the Health Professional Shortage Areas indicator due to inconsistencies with the data source leading to some missing health professional shortage areas. We are working to fix this issue, including exploring possible alternative data sources.

5/21/2025 - The following changes were made to the 2023 report data (Equity Report Year = 2023).
• Self-Sufficiency Wage - a typo in the indicator name was fixed (changed sufficienct to sufficient) and the percent for one PUMA corrected from 56.9 to 59.9 (PUMA = San Diego County (Northwest)--Oceanside City & Camp Pendleton). Notes were made consistent for all rows where geography = ZCTA. A note was added to all rows where geography = PUMA.
• Voter registration - label "92054, 92051" was renamed to be in numerical order and is now "92051, 92054". Removed data from the percentile column because the categories are not true percentiles.
• Employment - Data was corrected to show the percent of the labor force that are employed (ages 16 and older). Previously, the data was the percent of the population 16 years and older that are in the labor force.
• 3- and 4-Year-Olds Enrolled in School - percents are now rounded to one decimal place.
• Poverty - the last two categories/percentiles changed because the 80th percentile cutoff was corrected by 0.01 and one ZCTA was reassigned to a different percentile as a result.
• Low Birthweight - the 33th percentile label was corrected to be written as the 33rd percentile.
• Life Expectancy - Corrected the category and percentile assignment for SRA CENTRAL SAN DIEGO.
• Parks and Community Spaces - corrected the category assignment for six SRAs.

5/21/2025 - Data was uploaded for Equity Report Year 2025. The following changes were made relative to the 2023 report year. Adverse Childhood Experiences - added geographic data for 2025 report. No calculation of bins nor corresponding percentiles due to small number of geographic areas. Low Birthweight - no calculation of bins nor corresponding percentiles due to small number of geographic areas.

Prepared by: Office of Evaluation, Performance, and Analytics and the Office of Equity and Racial Justice, County of San Diego, in collaboration with the San Diego Regional Policy & Innovation Center (https://www.sdrpic.org).
