This dataset contains the geographic data used to create maps for the San Diego County Regional Equity Indicators Report led by the Office of Equity and Racial Justice (OERJ). The full report can be found here: https://data.sandiegocounty.gov/stories/s/7its-kgpt
Demographic data from the report can be found here: https://data.sandiegocounty.gov/dataset/Equity-Report-Data-Demographics/q9ix-kfws
Filter by the Indicator column to select data for a particular indicator map.
Export notes: Dataset may not automatically open correctly in Excel due to geospatial data. To export the data for geospatial analysis, select Shapefile or GEOJSON as the file type. To view the data in Excel, export as a CSV but do not open the file. Then, open a blank Excel workbook, go to the Data tab, select “From Text/CSV,” and follow the prompts to import the CSV file into Excel. Alternatively, use the exploration options in "View Data" to hide the geographic column prior to exporting the data.
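For readers who prefer a script to the Excel import steps, the following is a minimal sketch of the same idea in Python: read the exported CSV, keep only rows for one indicator, and drop the geographic column. The `Indicator` column name comes from the description above; the geometry column name `the_geom` and the function name are assumptions for illustration and should be checked against the actual export.

```python
import csv
import io

# Geometry cells can be very long, so raise the csv module's field size limit.
csv.field_size_limit(10**9)

def load_indicator_rows(csv_text, indicator, geometry_column="the_geom"):
    """Return rows for one indicator with the (assumed) geometry column removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        if row.get("Indicator") != indicator:
            continue  # keep only the requested indicator map
        row.pop(geometry_column, None)  # hypothetical column name; adjust as needed
        rows.append(row)
    return rows
```

To use it on a downloaded file, read the file as text with `encoding="utf-8"` and pass the contents as `csv_text`.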
USER NOTES: 4/7/2025 - The maps and data have been removed for the Health Professional Shortage Areas indicator due to inconsistencies with the data source leading to some missing health professional shortage areas. We are working to fix this issue, including exploring possible alternative data sources.
5/21/2025 - The following changes were made to the 2023 report data (Equity Report Year = 2023). Self-Sufficiency Wage - a typo in the indicator name was fixed (changed sufficienct to sufficient) and the percent for one PUMA corrected from 56.9 to 59.9 (PUMA = San Diego County (Northwest)--Oceanside City & Camp Pendleton). Notes were made consistent for all rows where geography = ZCTA. A note was added to all rows where geography = PUMA. Voter registration - label "92054, 92051" was renamed to be in numerical order and is now "92051, 92054". Removed data from the percentile column because the categories are not true percentiles. Employment - Data was corrected to show the percent of the labor force that are employed (ages 16 and older). Previously, the data was the percent of the population 16 years and older that are in the labor force. 3- and 4-Year-Olds Enrolled in School - percents are now rounded to one decimal place. Poverty - the last two categories/percentiles changed because the 80th percentile cutoff was corrected by 0.01 and one ZCTA was reassigned to a different percentile as a result. Low Birthweight - the 33th percentile label was corrected to be written as the 33rd percentile. Life Expectancy - Corrected the category and percentile assignment for SRA CENTRAL SAN DIEGO. Parks and Community Spaces - corrected the category assignment for six SRAs.
5/21/2025 - Data was uploaded for Equity Report Year 2025. The following changes were made relative to the 2023 report year. Adverse Childhood Experiences - added geographic data for 2025 report. No calculation of bins nor corresponding percentiles due to small number of geographic areas. Low Birthweight - no calculation of bins nor corresponding percentiles due to small number of geographic areas.
Prepared by: Office of Evaluation, Performance, and Analytics and the Office of Equity and Racial Justice, County of San Diego, in collaboration with the San Diego Regional Policy & Innovation Center (https://www.sdrpic.org).
https://data.norge.no/nlod/en/2.0/
The data sets provide an overview of selected data on waterworks registered with the Norwegian Food Safety Authority. The information has been reported by the waterworks through application processing or other reporting to the Norwegian Food Safety Authority. Among other things, the drinking water regulations require annual reporting, and the Norwegian Food Safety Authority has created a separate form service for such reporting. The data sets include public and private waterworks that supply 50 people or more. In addition, all municipally owned businesses with their own water supply are included regardless of size. The data sets also contain decommissioned facilities, for those who wish to view historical data, i.e. data for previous years or earlier.
There are data sets for the following supervisory objects:
1. Water supply system. It also includes analysis of drinking water.
2. Transport system
3. Treatment facility
4. Intake point. It also includes analysis of the water source.
Below you will find the data sets for item 4, intake point analysis (intake point_analysis). In addition, there is a file (information.txt) that provides an overview of when the extracts were produced and how many lines there are in the individual files. The extracts are produced weekly.
Furthermore, for the data sets water supply system, transport system and intake point, it is possible to see historical data on what was included in the annual reporting. To make use of that information, the file must be linked to the parent (“moder”) file to get names and other static information. These files have the _reporting ending in the file name.
Descriptions of the data fields (i.e. metadata) in the individual data sets appear in separate files, which are available in PDF format.
If you double-click the CSV file and it opens directly in Excel, the characters æ, ø and å will not display correctly. To see the character set correctly in Excel, you must:
1. Start Excel and a new spreadsheet.
2. Select Data and then From Text, and press Import.
3. Select delimited data and file origin 65001: Unicode (UTF-8), tick "My data has headers", and press Next.
4. Remove tab as separator and select semicolon as separator, then press Next.
5. Otherwise, complete the import.
The data sets can also be imported into a separate database and compiled as desired. There are link keys in the files that make it possible to link the files together. The waterworks are responsible for the quality of the data sets.
Purpose: Make information on the supply of drinking water available to the public.
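As an alternative to the Excel import, the same two requirements (UTF-8 decoding and semicolon separators) and the linking of a _reporting file to its parent file can be sketched in Python. This is only an illustration: the function names are hypothetical, and the actual link key column must be taken from the PDF field descriptions.

```python
import csv
import io

def read_semicolon_csv(raw_bytes):
    """Decode UTF-8 bytes (tolerating a byte order mark) and parse semicolon-separated rows."""
    text = raw_bytes.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))

def link_to_parent(report_rows, parent_rows, key):
    """Attach parent fields (names, static information) to each reporting row via a link key."""
    parents = {row[key]: row for row in parent_rows}
    linked = []
    for row in report_rows:
        parent = parents.get(row[key], {})  # rows without a parent keep only their own fields
        linked.append({**parent, **row})
    return linked
```

Decoding as UTF-8 before parsing is what preserves æ, ø and å; the `key` argument stands in for whichever link key the two files share.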
This page provides data for the 3rd Grade Reading Level Proficiency performance measure.
The dataset includes student performance results on the English/Language Arts section of the AzMERIT from Fall 2017 and Spring 2018. Data is representative of students in third grade in public elementary schools in Tempe. This includes schools from both the Tempe Elementary and Kyrene districts. Results are by school and provide the total number of students tested, the total percentage passing, and the percentage of students scoring at each of the four levels of proficiency.
The performance measure dashboard is available at 3.07 3rd Grade Reading Level Proficiency.
Additional Information
Source: Arizona Department of Education
Contact: Ann Lynn DiDomenico
Contact E-Mail: Ann_DiDomenico@tempe.gov
Data Source Type: Excel/CSV
Preparation Method: Filters on original dataset: within the "Schools" tab, School District [select Tempe School District and Kyrene School District]; School Name [deselect Kyrene SD not in Tempe city limits]; Content Area [select English Language Arts]; Test Level [select Grade 3]; Subgroup/Ethnicity [select All Students]. Remove irrelevant fields; add Fiscal Year.
Publish Frequency: Annually as data becomes available
Publish Method: Manual
Data Dictionary
Tempe’s trust data for this measure is collected every month and comes from the “Safety” result of the monthly Police Sentiment Survey. One question feeds into these results: "When it comes to the threat of crime, how safe do you feel in your neighborhood?" Benchmark data is from cohorts of communities with similar characteristics, such as size, population density, and region. This data is collected every month and quarter via a recurring report.
This page provides data for the Feeling of Safety in Your Neighborhood performance measure. The performance measure dashboard is available at 1.05 Feeling of Safety in Your Neighborhood.
Data Dictionary
Additional Information
Source: Zencity
Contact: Amber Asburry
Contact email: strategic_management_innovation@tempe.gov
Data Source Type: Excel, CSV
Preparation Method: Take the "Safety" score from the Police Sentiment Survey. This score includes the average of the top two results from the question underneath this area on the report. These months are then averaged to get the quarterly score.
Publish Frequency: Monthly
Publish Method: Manual
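The last step of the preparation method, averaging monthly scores into a quarterly score, can be sketched as follows. This is a minimal illustration that assumes calendar quarters and a simple unweighted mean; the description does not say whether fiscal quarters or response weighting are used, and the function name is hypothetical.

```python
from collections import defaultdict

def quarterly_scores(monthly_scores):
    """Average monthly scores keyed by (year, month) into scores keyed by (year, quarter)."""
    buckets = defaultdict(list)
    for (year, month), score in monthly_scores.items():
        quarter = (month - 1) // 3 + 1  # calendar quarters: Jan-Mar is quarter 1 (assumption)
        buckets[(year, quarter)].append(score)
    return {key: sum(values) / len(values) for key, values in buckets.items()}
```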
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Also saved as Readme.txt
SupplTable1_June2023.xlsx: Supplementary Table 1: Directional data as reported by Mitchell et al 2021. This table has been filtered for repeated samples, and data not made available from the original data source.
Column Headings:
Section, stratigraphic section;
Sample Name, sample identification of Mitchell et al 2021;
Bin Age (Ma), age assigned by Mitchell et al 2021;
Age (Ma), age of samples as reported by Mitchell et al 2021;
GDec (o), GINc (o), Geographic Directions as reported by Mitchell et al 2021;
SDec (o), SInc (o), stratigraphic directions as reported by Mitchell et al 2021;
SPDec (o), SPInc (o), stratigraphic directions reported as northern hemisphere directions by Mitchell et al 2021.
SupplTable2_June2023.xlsx: Supplementary Table 2: 1 million year binned average directions and poles for Apiro and Furlo sections.
Section, Stratigraphic section;
Age (Ma), binned ages from Suppl. Table 1;
dec (o) and inc (o), Fisher directions of age bins;
n, number of individual directions in each bin;
alpha95 (o), circle of confidence radius for Fisher Means;
Plon (o), Plat (o), A95 (o), poles calculated from directions in Suppl. Table 1 using section locations as reported in Mitchell et al 2021, binned and averaged and presented here. These poles are used for Supplementary Figures 1 and 2 in the Cottrell et al 2023 manuscript.
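For readers unfamiliar with the dec, inc, n and alpha95 columns, the sketch below shows the standard Fisher (1953) mean direction and 95% confidence radius computed from a set of (declination, inclination) pairs. It is an illustration of the textbook formulas only; the README does not state which software produced the tabulated values, and the function name is hypothetical.

```python
import math

def fisher_mean(directions, p=0.05):
    """Return (mean_dec, mean_inc, n, alpha95) in degrees for two or more (dec, inc) pairs."""
    n = len(directions)
    x = y = z = 0.0
    for dec, inc in directions:
        d, i = math.radians(dec), math.radians(inc)
        x += math.cos(i) * math.cos(d)  # north component of the unit vector
        y += math.cos(i) * math.sin(d)  # east component
        z += math.sin(i)                # down component
    r = math.sqrt(x * x + y * y + z * z)  # resultant vector length
    mean_dec = math.degrees(math.atan2(y, x)) % 360.0
    mean_inc = math.degrees(math.asin(max(-1.0, min(1.0, z / r))))
    # cos(alpha) = 1 - ((N - R) / R) * ((1 / p)^(1 / (N - 1)) - 1), clamped for rounding error
    cos_a = 1.0 - ((n - r) / r) * ((1.0 / p) ** (1.0 / (n - 1)) - 1.0)
    alpha95 = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
    return mean_dec, mean_inc, n, alpha95
```

Tightly clustered directions give a resultant length close to n and therefore a small alpha95.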
Figure1data.xlsx: Table of published paleomagnetic poles as presented in Mitchell et al 2021. The published age bin (included for comparison) is corrected in the last column for bin age assignment based on data provided in the Mitchell et al 2021 publication. This data set forms the basis of Figure 1 in Cottrell et al 2023 manuscript.
Figure2data.xlsx: Excel file of select data columns of two samples (C10AD253 and C10AD268) presented as reverse and normal polarity examples from the data set of Mitchell et al 2021. Each file is presented on a separate tab in a format compatible with MagIC database format. Original data can be downloaded via the link provided in Mitchell et al 2021. Original data files were filtered using python and bash scripts to select demagnetization step, magnetization moment, and stratigraphic corrected directions.
Column headings
Sample: sample name
Demag Step (oC) - demagnetization step in degrees C
Moment (emu) - Magnetization moment in electromagnetic units
SDec (o) - declination corrected for geographic direction and strike/bedding dip
SInc (o) - inclination corrected for geographic direction and strike/bedding dip
Figure3data.zip: Zipped folder of hysteresis data presented in Figure 3 of Cottrell et al 2023. Data were collected on a Princeton Measurements Alternating Gradient Force Magnetometer Model 2900 with a P1 probe. The probe and diamagnetic/paramagnetic adjusted hysteresis data file and first order reversal curve files for each sample are provided. The PDF and text output of forcsensei (publicly available Python code for evaluating first order reversal curves) are also provided.
Figure4data.xlsx: Excel file of data presented in Figure 4. The original demagnetization data files can be found in the link provided in Mitchell et al 2021. Magnetization moment of ~580 degrees C of each data file (580 degrees specifically was not always used as a demagnetization step by the original authors) was used to calculate the percent natural remanent magnetization remaining as normalized by the zero demagnetization step magnetization moment. Any sample line designated excursion or transition was removed from the analysis. Samples were grouped based on Chron designation into Normal (32n, 33n, 34n) or Reverse (32r, 33r). Histograms of % NRM remaining after demagnetization to 580 degrees were plotted in Figure 4.
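The normalization described for Figure 4 can be sketched as below: take the moment at the demagnetization step nearest 580 degrees C and express it as a percentage of the zero-step moment. Choosing the nearest step is an assumption about how "~580" was handled, and the function name is hypothetical; see Methods in Cottrell et al 2023 for the procedure actually used.

```python
def percent_nrm_remaining(steps, target=580.0):
    """steps: list of (demag_step_degC, moment) pairs for one sample.

    Returns (step_used, percent of the zero-step moment remaining at that step).
    """
    nrm = next(moment for step, moment in steps if step == 0)  # zero-step moment
    # Assumption: use the step closest to the target temperature.
    step_used, moment = min(steps, key=lambda pair: abs(pair[0] - target))
    return step_used, 100.0 * moment / nrm
```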
Figure5data.xlsx: Data file for plotting Figure 5. Originally presented in Mitchell et al 2021, and filtered for repeated data lines and files not made available for download. See Supplementary Data Table 1 for full details.
Column headings
Section - sedimentary section
Sample Name - sample designation assigned by Mitchell et al 2021
Bin Age - Bin Age in millions of years as assigned by Mitchell et al 2021
Age - Age in millions of years as determined by Mitchell et al 2021
Chron - Chron assignment as determined by Mitchell et al 2021
SDec - Stratigraphic declination as presented by Mitchell et al 2021
SInc - Stratigraphic inclination as presented by Mitchell et al 2021
NRMleft580 - percent NRM remaining after demagnetization to ~580 degrees. See Methods in Cottrell et al 2023 for details.
Figure6data.xlsx: Excel data file of filtered directional data presented in Mitchell et al 2021. Characteristic remanent magnetization directions and Low temperature component fits as presented in Mitchell et al 2021, and filtered for excluded data lines, repeated measurement lines, and only for the Apiro sedimentary section. Chron 33r and 33n data are presented in separate tabs.
Column headings:
Section - sedimentary section
Sample Name - sample name as presented in Mitchell et al 2021
Bin Age - as presented in Mitchell et al 2021
Age - as presented in Mitchell et al 2021
Chron - chron assignment as presented in Mitchell et al 2021
GDec - geographic declination direction of the characteristic remanent magnetization, as presented in Mitchell et al 2021
GInc - geographic inclination direction of the characteristic remanent magnetization, as presented in Mitchell et al 2021
NRMleft580 - percent natural remanent magnetization remaining after demagnetization to ~580 degrees
LTGDec - low temperature geographic declination direction as presented by Mitchell et al 2021
LTGInc - low temperature geographic inclination direction as presented by Mitchell et al 2021
Figure 7 is a statistical model based on input parameters; there is no data associated with it.
This dataset contains all current and active business licenses issued by the Department of Business Affairs and Consumer Protection. This dataset contains a large number of records/rows of data and may not be viewable in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Notepad or Wordpad, to view and search.
Data fields requiring description are detailed below.
APPLICATION TYPE: 'ISSUE' is the record associated with the initial license application. 'RENEW' is a subsequent renewal record. All renewal records are created with a term start date and term expiration date. 'C_LOC' is a change of location record. It means the business moved. 'C_CAPA' is a change of capacity record. Only a few license types may file this type of application. 'C_EXPA' only applies to businesses that have liquor licenses. It means the business location expanded.
LICENSE STATUS: 'AAI' means the license was issued.
Business license owners may be accessed at: http://data.cityofchicago.org/Community-Economic-Development/Business-Owners/ezma-pppn To identify the owner of a business, you will need the account number or legal name.
Data Owner: Business Affairs and Consumer Protection
Time Period: Current
Frequency: Data is updated daily
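Because the export can be too large for Excel, a short script is another way to search it. The sketch below streams a CSV one row at a time and counts records per APPLICATION TYPE value; the function name is hypothetical and the column header is assumed to match the field name given above.

```python
import csv
from collections import Counter

def count_by_field(lines, field="APPLICATION TYPE"):
    """Stream CSV lines and count rows per value of one column without loading the whole file."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[field]] += 1
    return counts
```

An open file object can be passed directly as `lines`, for example one opened with `newline=""` and `encoding="utf-8"`.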
The data sets provide an overview of selected data on waterworks registered with the Norwegian Food Safety Authority. The information has been reported by the waterworks through application processing or other reporting to the Norwegian Food Safety Authority. Among other things, the drinking water regulations require annual reporting, and the Norwegian Food Safety Authority has created a separate form service for such reporting. The data sets include public and private waterworks that supply 50 people or more. In addition, all municipally owned businesses with their own water supply are included regardless of size. The data sets also contain decommissioned facilities, for those who wish to view historical data, i.e. data for previous years or earlier.
There are data sets for the following supervisory objects:
1. Water supply system. It also includes analysis of drinking water.
2. Transport system
3. Treatment facility
4. Intake point. It also includes analysis of the water source.
Below you will find the data sets for item 1, water supply system. In addition, there is a file (information.txt) that provides an overview of when the extracts were produced and how many lines there are in the individual files. The extracts are produced weekly.
Furthermore, for the data sets water supply system, transport system and intake point, it is possible to see historical data on what was included in the annual reporting. To make use of that information, the file must be linked to the parent (“moder”) file to get names and other static information. These files have the _reporting ending in the file name.
Descriptions of the data fields (i.e. metadata) in the individual data sets appear in separate files, which are available in PDF format.
If you double-click the CSV file and it opens directly in Excel, the characters æ, ø and å will not display correctly. To see the character set correctly in Excel, you must:
1. Start Excel and a new spreadsheet.
2. Select Data and then From Text, and press Import.
3. Select delimited data and file origin 65001: Unicode (UTF-8), tick "My data has headers", and press Next.
4. Remove tab as separator and select semicolon as separator, then press Next.
5. Otherwise, complete the import.
The data sets can also be imported into a separate database and compiled as desired. There are link keys in the files that make it possible to link the files together. The waterworks are responsible for the quality of the data sets.
Purpose: Make information on drinking water supply available to the public.
Introduction: I have chosen to complete a data analysis project for the second course option, Bellabeats, Inc., using a locally hosted database program, Excel for both my data analysis and visualizations. This choice was made primarily because I live in a remote area and have limited bandwidth and inconsistent internet access. Therefore, completing a capstone project using web-based programs such as R Studio, SQL Workbench, or Google Sheets was not a feasible choice. I was further limited in which option to choose as the datasets for the ride-share project option were larger than my version of Excel would accept. In the scenario provided, I will be acting as a Junior Data Analyst in support of the Bellabeats, Inc. executive team and data analytics team. This combined team has decided to use an existing public dataset in hopes that the findings from that dataset might reveal insights which will assist in Bellabeat's marketing strategies for future growth. My task is to provide data driven insights to business tasks provided by the Bellabeats, Inc.'s executive and data analysis team. In order to accomplish this task, I will complete all parts of the Data Analysis Process (Ask, Prepare, Process, Analyze, Share, Act). In addition, I will break each part of the Data Analysis Process down into three sections to provide clarity and accountability. Those three sections are: Guiding Questions, Key Tasks, and Deliverables. For the sake of space and to avoid repetition, I will record the deliverables for each Key Task directly under the numbered Key Task using an asterisk (*) as an identifier.
Section 1 - Ask: A. Guiding Questions: Who are the key stakeholders and what are their goals for the data analysis project? What is the business task that this data analysis project is attempting to solve?
B. Key Tasks:
1. Identify key stakeholders and their goals for the data analysis project.
*The key stakeholders for this project are as follows:
-Urška Sršen and Sando Mur - co-founders of Bellabeats, Inc.
-Bellabeats marketing analytics team. I am a member of this team.
2. Identify the business task.
*The business task is:
-As provided by co-founder Urška Sršen, the business task for this project is to gain insight into how consumers are using their non-BellaBeats smart devices in order to guide upcoming marketing strategies for the company which will help drive future growth. Specifically, the researcher was tasked with applying insights driven by the data analysis process to 1 BellaBeats product and presenting those insights to BellaBeats stakeholders.
Section 2 - Prepare: A. Guiding Questions: Where is the data stored and organized? Are there any problems with the data? How does the data help answer the business question?
B. Key Tasks:
1. Research and communicate the source of the data, and how it is stored/organized, to stakeholders.
*The data source used for our case study is FitBit Fitness Tracker Data. This dataset is stored in Kaggle and was made available through user Mobius in an open-source format. Therefore, the data is public and available to be copied, modified, and distributed, all without asking the user for permission. These datasets were generated by respondents to a survey distributed via Amazon Mechanical Turk, reportedly (see credibility section directly below) between 03/12/2016 and 05/12/2016.
*Reportedly (see credibility section directly below), thirty eligible Fitbit users consented to the submission of personal tracker data, including output related to steps taken, calories burned, time spent sleeping, heart rate, and distance traveled. This data was broken down into minute, hour, and day level totals. This data is stored in 18 CSV documents. I downloaded all 18 documents to my local laptop and decided to use 2 documents for the purposes of this project, as they were files which had merged activity and sleep data from the other documents. All unused documents were permanently deleted from the laptop. The 2 files used were:
-sleepDay_merged.csv
-dailyActivity_merged.csv
2. Identify and communicate to stakeholders any problems found with the data related to credibility and bias.
*As will be more specifically presented in the Process section, the data seems to have credibility issues related to the reported time frame of the data collected. The metadata seems to indicate that the data collected covered roughly 2 months of FitBit tracking. However, upon my initial data processing, I found that only 1 month of data was reported.
*As will be more specifically presented in the Process section, the data has credibility issues related to the number of individuals who reported FitBit data. Specifically, the metadata communicates that 30 individual users agreed to report their tracking data. My initial data processing uncovered 33 individual IDs in the dailyActivity_merged dataset.
*Due to the small number of participants (...
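The two credibility checks described above (how many distinct participant IDs appear, and what date range the records actually cover) were done in Excel for this project. For comparison, a minimal Python sketch of the same checks follows. The column names `Id` and `ActivityDate` and the month/day/year date format are assumptions about the dailyActivity_merged file, and the function name is hypothetical.

```python
import csv
import io
from datetime import datetime

def summarize_participants(csv_text, id_col="Id", date_col="ActivityDate"):
    """Return (number of distinct IDs, earliest date, latest date) found in the CSV text."""
    ids = set()
    dates = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        ids.add(row[id_col])
        # Assumed date format: month/day/four-digit year
        dates.append(datetime.strptime(row[date_col], "%m/%d/%Y").date())
    return len(ids), min(dates), max(dates)
```

Comparing the returned count and date range with the figures stated in the metadata reproduces the kind of discrepancy check described in this section.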
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This Kaggle dataset comes from an output dataset that powers my March Madness Data Analysis dashboard in Domo.
- Click here to view this dashboard: Dashboard Link
- Click here to see this dashboard featured in a Domo blog post: Hoops, Data, and Madness: Unveiling the Ultimate NCAA Dashboard
This dataset offers one of the most robust resources you will find for discovering key insights through data science and data analytics using historical NCAA Division 1 men's basketball data. This data, sourced from KenPom, goes as far back as 2002 and is updated with the latest 2025 data. This dataset is meticulously structured to provide every piece of information that I could pull from this site as an open-source tool for March Madness analysis.
Key features of the dataset include:
- Historical Data: Provides all historical KenPom data from 2002 to 2025 from the Efficiency, Four Factors (Offense & Defense), Point Distribution, Height/Experience, and Misc. Team Stats endpoints from KenPom's website. Please note that the Height/Experience data only goes as far back as 2007, but every other source contains data from 2002 onward.
- Data Granularity: This dataset features an individual line item for every NCAA Division 1 men's basketball team in every season that contains every KenPom metric that you can possibly think of. This dataset has the ability to serve as a single source of truth for your March Madness analysis and provide you with the granularity necessary to perform any type of analysis you can think of.
- 2025 Tournament Insights: Contains all seed and region information for the 2025 NCAA March Madness tournament. Please note that I will continually update this dataset with the seed and region information for previous tournaments as I continue to work on this dataset.
These datasets were created by downloading the raw CSV files for each season for the various sections on KenPom's website (Efficiency, Offense, Defense, Point Distribution, Summary, Miscellaneous Team Stats, and Height). All of these raw files were uploaded to Domo and imported into a dataflow using Domo's Magic ETL. In these dataflows, all of the column headers for each of the previous seasons are standardized to the current 2025 naming structure so all of the historical data can be viewed under the exact same field names. All of these cleaned datasets are then appended together, and some additional clean up takes place before ultimately creating the intermediate (INT) datasets that are uploaded to this Kaggle dataset. Once all of the INT datasets were created, I joined all of the tables together on the team name and season so all of these different metrics can be viewed under one single view. From there, I joined an NCAAM Conference & ESPN Team Name Mapping table to add a conference field in its full length and respective acronyms they are known by as well as the team name that ESPN currently uses. Please note that this reference table is an aggregated view of all of the different conferences a team has been a part of since 2002 and the different team names that KenPom has used historically, so this mapping table is necessary to map all of the teams properly and differentiate the historical conferences from their current conferences. From there, I join a reference table that includes all of the current NCAAM coaches and their active coaching lengths because the active current coaching length typically correlates to a team's success in the March Madness tournament. I also join another reference table to include the historical post-season tournament teams in the March Madness, NIT, CBI, and CIT tournaments, and I join another reference table to differentiate the teams who were ranked in the top 12 in the AP Top 25 during week 6 of the respective NCAA season. 
After some additional data clean-up, all of this cleaned data exports into the "DEV _ March Madness" file that contains the consolidated view of all of this data.
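The header standardization and append steps described above were built in Domo's Magic ETL. As a rough illustration of the same idea outside Domo, the sketch below renames older column headers to a current naming structure, tags each row with its season, and appends everything into one list. The function name, the `Season` field, and any header names passed in are hypothetical, not the actual KenPom or Domo field names.

```python
def standardize_and_append(seasons, rename_map):
    """seasons: {season_year: list of row dicts}. Rename old headers, tag the season, append."""
    combined = []
    for season, rows in sorted(seasons.items()):
        for row in rows:
            # Map each old header to its current name; unknown headers pass through unchanged.
            clean = {rename_map.get(col, col): value for col, value in row.items()}
            clean["Season"] = season
            combined.append(clean)
    return combined
```

Once every season shares the same field names, joining the result to other tables on team name and season works the same way for all years.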
This dataset provides users with the flexibility to export data for further analysis in platforms such as Domo, Power BI, Tableau, Excel, and more. This dataset is designed for users who wish to conduct their own analysis, develop predictive models, or simply gain a deeper understanding of the intricacies that result in the excitement that Division 1 men's college basketball provides every year in March. Whether you are using this dataset for academic research, personal interest, or professional interest, I hope this dataset serves as a foundational tool for exploring the vast landscape of college basketball's most riveting and anticipated event of its season.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
The evaluation of surveillance systems has been recommended by the World Health Organization (WHO) to identify performance and areas for improvement. Universal salt iodization (USI), as one of the surveillance systems in Tanzania, needs periodic evaluation for its optimal function. This study evaluated the USI surveillance system in Tanzania from January to December 2021 to find out whether the system meets its intended objectives by evaluating its attributes. This was the first evaluation of the USI surveillance system since its establishment in 2010. The USI surveillance system is key for monitoring performance towards the attainment of universal salt iodization (90%).
Methodology
This evaluation was guided by the Centers for Disease Control and Prevention (CDC) Guidelines for Evaluating Public Health Surveillance Systems (MMWR) to evaluate USI 2021 data. The study was conducted in Kigoma region in March 2022. Both purposive and convenience sampling were used to select the region, district, and ward for the study. The study involved reviewing documents used in the USI system and interviewing the key informants in the USI program. Data analysis was done in Microsoft Excel and presented in tables and graphs.
Results
A total of 1715 salt samples were collected in the year 2021, with 279 (16%) identified as non-iodized salt. The majority of the system attributes (66.7%) had a good performance with a score of three, 22.2% had a moderate performance with a score of two, and one attribute had a poor performance with a score of one. Data quality, completeness, and sensitivity were 100%; acceptability was 91.6%; for simplicity, 83% were able to collect data on a single sample in < 2 minutes; the system stability in terms of performance was >75%; and the usefulness of the system had poor performance.
Conclusion
Although the system attributes were found to be working well overall, for proper surveillance of the USI system, the core attributes need to be strengthened. Key variables that measure the system performance must be included from the primary data source and well integrated from the Local Government (district and regions) to the Ministry of Health information systems.
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset tracks food insecurity across different demographics starting 4/23/2020 to 8/23/2021. It contains fields such as Race, Education, Sex, State, Income, etc. If you're looking for a dataset to examine Covid-19's impact on food insecurity for different demographics, then here you are!
This data is from the United States Census Bureau's Pulse Survey. The Pulse Survey is a frequently updated survey designed to collect data on how people's lives have been impacted by the coronavirus. Specifically, this dataset is a cleaned-up version of the "Food Sufficiency for Households, in the Last 7 Days, by Select Characteristics" tables.
The original form of this data can be found at: https://www.census.gov/programs-surveys/household-pulse-survey/data.html
The original form of this data was split into 36 Excel files containing ~67 sheets each. The data was in a non-tidy format, and the questions were not entirely standardized. This dataset is my attempt to combine all these different files, tidy the data up, and merge slightly different questions together.
The large number of NAs is a consequence of how messy the original data was and of forcing it into a tidy format. Just filter the NAs out for the question you want to analyze and you'll be fine.
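As a minimal sketch of the NA filtering described above, using only the Python standard library: the column names and sample rows here are hypothetical and are not the dataset's real schema, so swap in the actual question column you want to analyze.

```python
import csv
import io

# Hypothetical sample; column names are illustrative, not the dataset's real schema.
SAMPLE = """State,Race,Enough_Food,Sometimes_Not_Enough
CA,Asian,1200,NA
CA,White,NA,300
TX,Black,800,150
"""

def rows_with_value(csv_text, column, na_tokens=("NA", "")):
    """Return the rows whose `column` holds a real value rather than an NA token."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[column].strip() not in na_tokens]

# Keep only rows that actually answer the (hypothetical) "Enough_Food" question.
kept = rows_with_value(SAMPLE, "Enough_Food")
```

Filtering per question, rather than dropping every row that contains any NA, keeps rows that are missing values only for questions you are not analyzing.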
Google Data Analytics How Does a Bike-Share Navigate Speedy Success?
This is a case study project to complete the Google Data Analytics Certification. In this project I followed the steps of the data analysis process: ask, prepare, process, analyze, share, and act. In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system at any time. The company's director of marketing has set a clear goal: convert casual riders into annual members, which will make the company more profitable. To do that, the analyst team needs to better understand how annual and casual riders differ, why casual riders would buy a membership, and how digital media could affect the marketing tactics. How do annual members and casual riders use Cyclistic bikes differently?
Ask Three questions will guide the future marketing program: 1. How do annual members and casual riders use Cyclistic bikes differently? 2. Why would casual riders buy Cyclistic annual memberships? 3. How can Cyclistic use digital media to influence casual riders to become members?
Prepare In this part of the data analysis process we try to answer some guiding questions about our data source and data quality, and perform the following tasks: 1. Download data and store it appropriately. 2. Identify how it’s organized. 3. Sort and filter the data. 4. Determine the credibility of the data. Data Source: https://divvy-tripdata.s3.amazonaws.com/index.html Data License Agreement: https://www.divvybikes.com/data-license-agreement
Process
For the data processing part of this project I used Excel, R, MS SQL, T-SQL and Tableau
Excel - was used to check data integrity and to sort and filter individual months of data
SQL / T-SQL - I chose to work on the 12-month dataset from 202011 to 202110, which was too big to process in Excel, so I chose to use SQL for data cleaning and processing
R - I also used R programming for data cleaning, visualizations and report generation
Tableau - Used the output dataset from SQL and R to generate visualizations in Tableau
Analyze 1. Aggregate the data so it’s useful and accessible. 2. Organize and format the data. 3. Perform calculations. 4. Identify trends and relationships.
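The aggregation step can be sketched as below. This is a standard-library Python illustration, not the SQL or R code used in the project; it assumes trip records with the fields `member_casual`, `started_at`, and `ended_at`, and the sample rows are made up.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format

def mean_ride_minutes(trips):
    """Average ride length in minutes per rider type, skipping non-positive durations."""
    totals = defaultdict(lambda: [0.0, 0])  # rider type -> [sum of minutes, ride count]
    for trip in trips:
        start = datetime.strptime(trip["started_at"], FMT)
        end = datetime.strptime(trip["ended_at"], FMT)
        minutes = (end - start).total_seconds() / 60
        if minutes <= 0:
            continue  # data-cleaning step: drop rides that end before they start
        acc = totals[trip["member_casual"]]
        acc[0] += minutes
        acc[1] += 1
    return {kind: total / count for kind, (total, count) in totals.items()}

# Made-up sample rows for illustration only.
trips = [
    {"member_casual": "member", "started_at": "2021-06-05 10:00:00", "ended_at": "2021-06-05 10:10:00"},
    {"member_casual": "member", "started_at": "2021-06-06 08:00:00", "ended_at": "2021-06-06 08:20:00"},
    {"member_casual": "casual", "started_at": "2021-06-05 14:00:00", "ended_at": "2021-06-05 14:40:00"},
    {"member_casual": "casual", "started_at": "2021-06-05 09:00:00", "ended_at": "2021-06-05 08:59:00"},
]
result = mean_ride_minutes(trips)
```

The same grouping could be extended with the weekday of `started_at` to compare weekend and weekday usage.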
Share This is a case study project to complete the Google Data Analytics Certification and has been published on Kaggle
Act Based on my analysis, I recommend that the Cyclistic marketing team - Focus on weekend events and use social media to advertise - Give discounts to casual riders, since they ride for longer periods of time - Encourage casual riders to become members
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the datasets and data sources, analysis code, and workflow associated with the manuscript "Comparing the Effects of Euclidean Distance Matching and Dynamic Time Warping in the Clustering of COVID-19 Evolution". The following resources are provided:
Data Files:
time_series_data.csv: A curated time series dataset with dates as rows and NUTS 2 regions as columns. Each column is labeled using a 4-letter abbreviation format "CC.RR", where "CC" represents the country code and "RR" represents the region code. This same abbreviation is also included in the accompanying GeoJSON file.
geometry_data.geojson: A GeoJSON file representing the spatial boundaries of the NUTS 2 regions, with the same 4-letter abbreviations used in the CSV file. EPSG:4326.
COVID19_data_sources.xlsx: This Excel file contains important metadata regarding the sources of COVID-19 data used in this study. It includes:
Code:
analysis.py: A Python script used to process and analyze the data. This code can be run using Python 3.x. The libraries required to run this script are listed in the first lines of the code. The code is organized in numbered sections (1), (2), ... and sub-sections (1a), (1b), ... Run the script one (sub-)section at a time so that the output stays manageable and does not arrive all at once.
Workflow:
workflow.png: A detailed workflow according to the Knowledge Discovery in Databases (KDD) process, outlining the steps involved in processing and analyzing the data, including the methods used. This workflow provides a comprehensive guide to reproducing the analysis presented in the paper.
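To illustrate the two distance measures named in the manuscript title, here is a minimal sketch in plain Python. It is the textbook formulation with absolute differences as the local cost, not necessarily the variant implemented in analysis.py, and the two series are made-up values rather than data from time_series_data.csv.

```python
import math

def euclidean(a, b):
    """Point-by-point Euclidean distance; both series must have the same length."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Classic dynamic time warping distance using |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i points of a with the first j points of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Two made-up series with the same wave shape, shifted by one time step.
region_a = [0, 0, 1, 2, 1, 0]
region_b = [0, 1, 2, 1, 0, 0]
d_euclid = euclidean(region_a, region_b)
d_dtw = dtw(region_a, region_b)
```

Euclidean matching penalizes the one-step shift, while DTW can align the two waves and reports no difference, which is the kind of contrast a clustering comparison would pick up.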
According to our latest research, the global graph data integration platform market size reached USD 2.1 billion in 2024, reflecting robust adoption across industries. The market is projected to grow at a CAGR of 18.4% from 2025 to 2033, reaching approximately USD 10.7 billion by 2033. This significant growth is fueled by the increasing need for advanced data management and analytics solutions that can handle complex, interconnected data across diverse organizational ecosystems. The rapid digital transformation and the proliferation of big data have further accelerated the demand for graph-based data integration platforms.
The primary growth factor driving the graph data integration platform market is the exponential increase in data complexity and volume within enterprises. As organizations collect vast amounts of structured and unstructured data from multiple sources, traditional relational databases often struggle to efficiently process and analyze these data sets. Graph data integration platforms, with their ability to map, connect, and analyze relationships between data points, offer a more intuitive and scalable solution. This capability is particularly valuable in sectors such as BFSI, healthcare, and telecommunications, where real-time data insights and dynamic relationship mapping are crucial for decision-making and operational efficiency.
Another significant driver is the growing emphasis on advanced analytics and artificial intelligence. Modern enterprises are increasingly leveraging AI and machine learning to extract actionable insights from their data. Graph data integration platforms enable the creation of knowledge graphs and support complex analytics, such as fraud detection, recommendation engines, and risk assessment. These platforms facilitate seamless integration of disparate data sources, enabling organizations to gain a holistic view of their operations and customers. As a result, investment in graph data integration solutions is rising, particularly among large enterprises seeking to enhance their analytics capabilities and maintain a competitive edge.
The surge in regulatory requirements and compliance mandates across various industries also contributes to the expansion of the graph data integration platform market. Organizations are under increasing pressure to ensure data accuracy, lineage, and transparency, especially in highly regulated sectors like finance and healthcare. Graph-based platforms excel in tracking data provenance and relationships, making it easier for companies to comply with regulations such as GDPR, HIPAA, and others. Additionally, the shift towards hybrid and multi-cloud environments further underscores the need for robust data integration tools capable of operating seamlessly across different infrastructures, further boosting market growth.
From a regional perspective, North America currently dominates the graph data integration platform market, accounting for the largest share due to early adoption of advanced data technologies, a strong presence of key market players, and significant investments in digital transformation initiatives. However, Asia Pacific is expected to witness the fastest growth over the forecast period, driven by rapid industrialization, expanding IT infrastructure, and increasing adoption of cloud-based solutions among enterprises in countries like China, India, and Japan. Europe also remains a significant contributor, supported by stringent data privacy regulations and a mature digital economy.
The component segment of the graph data integration platform market is bifurcated into software and services. The software segment currently commands the largest market share, reflecting the critical role of robust graph database engines, visualization tools, and integration frameworks in managing and analyzing complex data relationships. These software solutions are designed to deliver high scalability, flexibility, and real-time proces
The Emissions & Generation Resource Integrated Database (eGRID) is a comprehensive source of data on the environmental characteristics of almost all electric power generated in the United States. These environmental characteristics include air emissions for nitrogen oxides, sulfur dioxide, carbon dioxide, methane, and nitrous oxide; emissions rates; net generation; resource mix; and many other attributes.
eGRID2010 contains the complete release of year 2007 data, as well as years 2005 and 2004 data. Excel spreadsheets, full documentation, summary data, eGRID subregion and NERC region representational maps, and GHG emission factors are included in this data set. The Archived data in eGRID2002 contain years 1996 through 2000 data.
For year 2007 data, the first Microsoft Excel workbook, Plant, contains boiler, generator, and plant spreadsheets. The second Microsoft Excel workbook, Aggregation, contains data aggregated at the state, electric generating company, parent company, power control area, eGRID subregion, NERC region, and U.S. total levels. The third Microsoft Excel workbook, ImportExport, contains state import-export data, as well as U.S. generation and consumption data for years 2007, 2005, and 2004. For eGRID data for years 2005 and 2004, a user-friendly web application, eGRIDweb, is available to select, view, print, and export specified data.
Load, wind and solar, prices in hourly resolution. This data package contains different kinds of timeseries data relevant for power system modelling, namely electricity prices, electricity consumption (load) as well as wind and solar power generation and capacities. The data is aggregated either by country, control area or bidding zone. Geographical coverage includes the EU and some neighbouring countries. All variables are provided in hourly resolution. Where original data is available in higher resolution (half-hourly or quarter-hourly), it is provided in separate files. This package version only contains data provided by TSOs and power exchanges via ENTSO-E Transparency, covering the period 2015-mid 2020. See previous versions for historical data from a broader range of sources. All data processing is conducted in Python/pandas and has been documented in the Jupyter notebooks linked below.
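For users who start from one of the higher-resolution files, the sketch below shows one way to bring quarter-hourly values to hourly resolution by averaging. It uses only the Python standard library rather than the package's own pandas notebooks, and both the averaging rule and the sample values are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

def hourly_mean(readings):
    """Average (ISO timestamp, value) pairs into one mean value per clock hour."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        hour = datetime.fromisoformat(timestamp).replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}

# Made-up quarter-hourly load values in MW.
quarter_hourly = [
    ("2020-01-01T00:00:00", 100.0),
    ("2020-01-01T00:15:00", 110.0),
    ("2020-01-01T00:30:00", 120.0),
    ("2020-01-01T00:45:00", 130.0),
    ("2020-01-01T01:00:00", 200.0),
]
hourly = hourly_mean(quarter_hourly)
```

Averaging suits quantities expressed as power (MW); energy totals per interval would be summed instead.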
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The O*NET Database contains hundreds of standardized and occupation-specific descriptors on almost 1,000 occupations covering the entire U.S. economy. The database, which is available to the public at no cost, is continually updated by a multi-method data collection program. Sources of data include: job incumbents, occupational experts, occupational analysts, employer job postings, and customer/professional association input.
Data content areas include:
The Bureau of the Census has released Census 2000 Summary File 1 (SF1) 100-Percent data. The file includes the following population items: sex, age, race, Hispanic or Latino origin, household relationship, and household and family characteristics. Housing items include occupancy status and tenure (whether the unit is owner or renter occupied). SF1 does not include information on incomes, poverty status, overcrowded housing or age of housing. These topics will be covered in Summary File 3. Data are available for states, counties, county subdivisions, places, census tracts, block groups, and, where applicable, American Indian and Alaskan Native Areas and Hawaiian Home Lands. The SF1 data are available on the Bureau's web site and may be retrieved from American FactFinder as tables, lists, or maps. Users may also download a set of compressed ASCII files for each state via the Bureau's FTP server. There are over 8000 data items available for each geographic area. The full listing of these data items is available here as a downloadable compressed database file named TABLES.ZIP. The uncompressed file is in FoxPro database file (dbf) format and may be imported into ACCESS, EXCEL, and other software formats. While all of this information is useful, the Office of Community Planning and Development has downloaded selected information for all states and areas and is making this information available on the CPD web pages. The tables and data items selected are those used in the CDBG and HOME allocation formulas plus topics most pertinent to the Comprehensive Housing Affordability Strategy (CHAS), the Consolidated Plan, and similar overall economic and community development plans. The information is contained in five compressed (zipped) dbf tables for each state. When uncompressed, the tables are ready for use with FoxPro and can be imported into ACCESS, EXCEL, and other spreadsheet, GIS and database software. The data are at the block group summary level.
The first two characters of the file name are the state abbreviation. The next two letters are BG for block group. The last part of the file name describes the contents. Each record is labeled with the code and name of the city and county in which it is located so that the data can be summarized to higher-level geography. The GEO file contains standard Census Bureau geographic identifiers for each block group, such as the metropolitan area code and congressional district code. The only data included in this table are total population and total housing units. POP1 and POP2 contain selected population variables, and selected housing items are in the HU file. The MA05 table data is only for use by State CDBG grantees for the reporting of the racial composition of beneficiaries of Area Benefit activities. The complete package for a state consists of the dictionary file named TABLES and the five data files for the state. The logical record number (LOGRECNO) links the records across tables.
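The naming convention and the LOGRECNO link can be sketched as follows. The example file name "CABGPOP1.dbf" and every field name other than LOGRECNO are hypothetical, chosen only to match the pattern described above, and the records are plain dictionaries rather than rows read from a real dbf file.

```python
def parse_table_name(filename):
    """Split a file name into state abbreviation, summary level, and contents label."""
    stem = filename.rsplit(".", 1)[0].upper()
    if len(stem) <= 4 or stem[2:4] != "BG":
        raise ValueError(f"unexpected file name: {filename}")
    return {"state": stem[:2], "level": stem[2:4], "contents": stem[4:]}

def join_on_logrecno(*tables):
    """Merge records from several tables that share the same LOGRECNO."""
    merged = {}
    for table in tables:
        for record in table:
            merged.setdefault(record["LOGRECNO"], {}).update(record)
    return merged

# Hypothetical records; only the LOGRECNO field name comes from the description.
geo = [{"LOGRECNO": "0000001", "TOTPOP": 1200}, {"LOGRECNO": "0000002", "TOTPOP": 850}]
pop1 = [{"LOGRECNO": "0000001", "AGE_UNDER_18": 300}, {"LOGRECNO": "0000002", "AGE_UNDER_18": 190}]

info = parse_table_name("CABGPOP1.dbf")
block_groups = join_on_logrecno(geo, pop1)
```

Once the tables are joined per block group, the city and county labels on each record can be used to roll values up to higher-level geography.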
The 2005/6 Household Income and Expenditure Survey is the second nationwide survey of households undertaken by Solomon Islands Statistics Office (SISO) since 1992.
The primary objectives of the HIES include: • Re-basing of the weights of the current basket of goods and services in the Consumer Price Index (CPI). The survey also aimed to provide data on household consumption expenditure patterns that will help form weights reflecting the relative importance that consumers attach to commodities and services; • Obtaining relevant data for the purpose of updating the series of national accounts aggregates, particularly the Gross Domestic Product.
The secondary objectives of the HIES were to: • Obtain data on housing and general demographic characteristics of households; • Obtain data on poverty measures, income and income inequality measures; • Obtain relevant data for the Millennium Development Goals (MDG), particularly health and education; and • Obtain other relevant data where necessary
The field data collection exercise was undertaken from October 2005 to March 2006, and seasonality effects on expenditure were not fully considered.
National. The HIES operation covered both the Urban and Rural areas, focusing on Honiara, Other Urban Areas and the Rural Areas of the nine (9) provinces, and aimed to produce estimates at the national and provincial levels only.
The survey targeted private households, whilst collective households in hospitals, hotels, prisons and educational institutions were excluded. A household is considered in scope for the survey if it has resided in the Solomon Islands for the last 12 months or more, or, if not, intends to live in the Solomon Islands for the next 12 months.
Sample survey data [ssd]
Survey Design The survey was based on a two-stage sampling strategy using probability proportional to size (PPS) selection and random selection. The strategy for selection of each area type is slightly different depending also on enumerator workload schedule and the need to accommodate estimates at the National and Provincial level as well as Urban and Rural splits.
The Survey was designed to collect data for national and provincial level estimates and covered both urban and rural areas. The survey covered Honiara, provincial centers and rural areas within these provinces.
The sampling scheme used was a stratified two-stage design with the Enumeration Areas (EA) as the Primary Sampling Unit (PSU) and the households within the sample areas as the secondary sampling unit (SSU). In the first stage the EAs were selected with probability proportional to their population size based on the 1999 population census. In the second stage households were selected using systematic sampling with a random start. The next step was allocating the sample to each province proportional to the square root of its population. This should mean that estimates for each province would have roughly the same level of accuracy. The sample was then split for each province between the provincial centers (considered to be urban) and the remaining rural population. Given the need for urban and rural estimates, the sample was split between the two areas proportional to the square root of the population based on the 1999 census. The last step in the process involved modifying the final counts to accommodate the workloads for interviewers during the fieldwork. The interviewers were expected to be in the field for six months and could accommodate 10 households per month (60 households in total). It was desirable to have the total workload for each province divisible by 60 to give each interviewer an even-sized workload and have the sample spread out evenly across each month.
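The allocation rule described above (proportional to the square root of the population, then adjusted to whole interviewer workloads of 60 households) can be sketched in Python. The province names, populations, and total sample size below are made up, and rounding to the nearest multiple of 60 is only one plausible reading of how the counts were modified.

```python
import math

def sqrt_allocation(populations, total_sample, workload=60):
    """Allocate a sample proportional to sqrt(population), rounded to whole workloads."""
    roots = {name: math.sqrt(pop) for name, pop in populations.items()}
    total_root = sum(roots.values())
    allocation = {}
    for name, root in roots.items():
        raw = total_sample * root / total_root
        # Round to the nearest multiple of one interviewer workload, with at least one workload.
        allocation[name] = max(workload, round(raw / workload) * workload)
    return allocation

# Hypothetical provinces and populations, not figures from the 1999 census.
populations = {"Province A": 10_000, "Province B": 40_000, "Province C": 90_000}
allocation = sqrt_allocation(populations, total_sample=1_200)
```

Here Province C has nine times the population of Province A but receives only about three times the sample, which is what evens out accuracy across provinces.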
Since Honiara (capital of Solomon Islands) consists of a mix of high-income, middle-income and low-income areas, it was advisable to group the EAs by the income class best suited to their situation. Thus for Honiara the EA list was sorted by income group category for selection. The number of EAs to select from Honiara is simply the desired sample size (480 households) divided by the number of households to be selected from each EA. It was decided that 10 households should be selected from each selected EA. Therefore the number of EAs selected was (480 / 10) = 48 EAs.
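The Honiara arithmetic and the second-stage household selection can be sketched as below. This is a generic systematic-sampling routine with a random start, not the procedure code used by the Statistics Office; the EA size of 75 households and the fixed seed are assumptions for illustration.

```python
import random

def systematic_sample(units, n, rng):
    """Pick n units at a fixed interval from an ordered list, starting at a random offset."""
    if n <= 0 or n > len(units):
        raise ValueError("n must be between 1 and len(units)")
    step = len(units) / n
    start = rng.random() * step
    # min() guards against a floating-point index landing one past the end.
    return [units[min(int(start + i * step), len(units) - 1)] for i in range(n)]

households_per_ea = 10
num_eas = 480 // households_per_ea  # EAs to select in Honiara

# Hypothetical EA listing of 75 households, numbered 1..75.
ea_households = list(range(1, 76))
selected = systematic_sample(ea_households, households_per_ea, random.Random(0))
```

Because the interval is at least one household wide, the selected households are distinct and spread across the whole listing.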
Face-to-face [f2f]
The HIES is a relatively complex survey, and data were collected through the following questionnaires and associated sections: • Household Control Form – household composition and particulars; • Household Expenditure Form – housing amenities, facilities and major household expenditure on tenure, fixed capital, land, property etc; • Personal Income Form – Income pattern of household members and other income earning activities; • Household Diary – Daily expenditure by type of goods and services • An additional health module was included – health facility utilization, immunization, motherhood, mortality, breast feeding & family planning, Malaria and miscellaneous
The Statistics Programme at the Secretariat of the Pacific Community (SPC) provided assistance in data processing. A HIES data entry program was set up in CSPro version 2.6, and data entry ran from November 2005, soon after the first workload was registered in the Statistics Office, until May 2006. Logic procedures for data editing were prepared in Microsoft Access, and data editing for all questionnaires was done in CSPro, except for the Diary, where the editing was done in Microsoft Excel. Data management queries were done in Microsoft Access and the production of tables was done in Microsoft Excel. This report was prepared in Microsoft Word. A 5 per cent data verification was done to check the accuracy of data input, and data edit checks were carried out for completeness, consistency and accuracy, including outliers. Anomalies in the data were amended appropriately.
Response Rates A sample of 4,320 households was planned for the country and about 3,822 households (88.5%) responded favorably satisfying the survey requirements.
Non-Response Despite efforts made by the enumerators and follow up attempts by the supervisors in most of the cases, there was non-response encountered during the survey.
Non-response by households was due mainly to the following: • The household was out of scope of the survey • Dwelling was vacant or not being lived in • The household could not be contacted after a number of attempts • Household excluded for other reasons like death in the family, refusals, customary reasons etc
Error Measurements No formal measures of sample errors have been calculated for the survey results.
Non-sampling errors cannot be readily measured. These included: o Response difficulties caused by misunderstanding, by both households and interviewers, of what the survey and its instruments required. o The questionnaires were in English, which is at best a second language for interviewers and respondents. o The fact that some expenditures are seasonal and would not have been picked up in the survey period. o The exclusion of remote areas and institutions from the sampling frame.
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
This statistical report presents information on obesity, physical activity and diet, drawn together from a variety of sources. The topics covered include: Obesity related hospital admissions. Prescription items for the treatment of obesity. Adult obesity prevalence. Childhood obesity prevalence. Physical activity levels among adults and children. Diet among adults and children, including trends in purchases, and consumption of food and drink and energy intake. Each section provides an overview of the key findings from these sources, as well as providing sources of further information and links to relevant documents and sources. Some of the data have been published previously by NHS Digital. A data visualisation tool at the link below allows users to select obesity related hospital admissions data for any Local Authority (as contained in Excel tables 3, 7 and 11 of this publication), along with time series data from 2013/14. Regional and national comparisons are also provided.
This dataset contains the geographic data used to create maps for the San Diego County Regional Equity Indicators Report led by the Office of Equity and Racial Justice (OERJ). The full report can be found here: https://data.sandiegocounty.gov/stories/s/7its-kgpt
Demographic data from the report can be found here: https://data.sandiegocounty.gov/dataset/Equity-Report-Data-Demographics/q9ix-kfws
Filter by the Indicator column to select data for a particular indicator map.
Export notes: Dataset may not automatically open correctly in Excel due to geospatial data. To export the data for geospatial analysis, select Shapefile or GEOJSON as the file type. To view the data in Excel, export as a CSV but do not open the file. Then, open a blank Excel workbook, go to the Data tab, select “From Text/CSV,” and follow the prompts to import the CSV file into Excel. Alternatively, use the exploration options in "View Data" to hide the geographic column prior to exporting the data.
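As an alternative to the manual Excel import, a CSV export can be trimmed programmatically before opening it. The sketch below uses the Python standard library to keep the rows for one indicator and drop the geographic column. Only the Indicator column name comes from this description; the "geometry" column name, the other columns, and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical export; only the "Indicator" column name comes from the dataset description.
SAMPLE = """Indicator,Geography,Percent,geometry
Poverty,ZCTA,12.5,"POLYGON ((0 0, 1 0, 1 1, 0 0))"
Employment,PUMA,94.1,"POLYGON ((2 2, 3 2, 3 3, 2 2))"
"""

def filter_indicator(csv_text, indicator, drop=("geometry",)):
    """Keep rows for one indicator and omit the geographic column(s) from the output."""
    reader = csv.DictReader(io.StringIO(csv_text))
    keep = [col for col in reader.fieldnames if col not in drop]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row["Indicator"] == indicator:
            writer.writerow(row)
    return out.getvalue()

excel_friendly = filter_indicator(SAMPLE, "Poverty")
```

The resulting text contains no geometry strings, so it should open in a spreadsheet without the long geospatial cells.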
USER NOTES: 4/7/2025 - The maps and data have been removed for the Health Professional Shortage Areas indicator due to inconsistencies with the data source leading to some missing health professional shortage areas. We are working to fix this issue, including exploring possible alternative data sources.
5/21/2025 - The following changes were made to the 2023 report data (Equity Report Year = 2023). Self-Sufficiency Wage - a typo in the indicator name was fixed (changed sufficienct to sufficient) and the percent for one PUMA corrected from 56.9 to 59.9 (PUMA = San Diego County (Northwest)--Oceanside City & Camp Pendleton). Notes were made consistent for all rows where geography = ZCTA. A note was added to all rows where geography = PUMA. Voter registration - label "92054, 92051" was renamed to be in numerical order and is now "92051, 92054". Removed data from the percentile column because the categories are not true percentiles. Employment - Data was corrected to show the percent of the labor force that are employed (ages 16 and older). Previously, the data was the percent of the population 16 years and older that are in the labor force. 3- and 4-Year-Olds Enrolled in School - percents are now rounded to one decimal place. Poverty - the last two categories/percentiles changed because the 80th percentile cutoff was corrected by 0.01 and one ZCTA was reassigned to a different percentile as a result. Low Birthweight - the 33th percentile label was corrected to be written as the 33rd percentile. Life Expectancy - Corrected the category and percentile assignment for SRA CENTRAL SAN DIEGO. Parks and Community Spaces - corrected the category assignment for six SRAs.
5/21/2025 - Data was uploaded for Equity Report Year 2025. The following changes were made relative to the 2023 report year. Adverse Childhood Experiences - added geographic data for 2025 report. No calculation of bins nor corresponding percentiles due to small number of geographic areas. Low Birthweight - no calculation of bins nor corresponding percentiles due to small number of geographic areas.
Prepared by: Office of Evaluation, Performance, and Analytics and the Office of Equity and Racial Justice, County of San Diego, in collaboration with the San Diego Regional Policy & Innovation Center (https://www.sdrpic.org).